WSC2, Barnaul, March 2003. Environmental Applications of Chemometrics. Pentti Minkkinen, Lappeenranta University of Technology. E-mail: Pentti.Minkkinen@lut.fi
General
• Many environmental data sets (problems) are a challenge to the data analyst: they are multivariate, contain long time series and missing values, and are affected by the adoption of new analytical methods
• Needs: data compression, visualization, modeling
• Problem types: classification, process modeling, monitoring and detection of trends, extracting new information from old data
• Standard chemometric methods (PCA, PLS and DPLS) can be used for many different problems
Contents
• Dependence of the emissions of a diesel engine on its running speed and load
• Effect of exposure to vanadium dust in an industrial environment
• Effects of industrial effluents in the recipient lake
• Multivariate study on urban aerosol samples
• Periodicities of the surface level fluctuations of two large Finnish lakes
Emissions of a Diesel Engine
A = 1600 rpm; B = 2600 rpm
Ji Ping Shi et al., Environ. Sci. & Technol. 34 (No. 5, 2000) 748-755
Covariance and correlation matrices of X

cov(X) =
          SO4      NO3      ORG      OCO      elC      PAH      TOT
SO4     12.97     2.70   -30.34   -36.40     3.65    0.007   -18.42
NO3      2.67     0.93    -1.54    -1.85    -5.14    0.004    -3.71
ORG    -30.34    -1.54   141.68   169.99   -97.92     0.02    42.33
OCO    -36.40    -1.85   169.99   203.96  -117.49     0.03    50.80
elC      3.65    -5.14   -97.92  -117.49   121.15    -0.04     2.81
PAH     0.007    0.004     0.02     0.03    -0.04    0.000   -0.001
TOT    -18.42    -3.71    42.33    50.80     2.81   -0.002    33.18

corcoef(X) =
          SO4      NO3      ORG      OCO      elC      PAH      TOT
SO4      1.00     0.78    -0.71    -0.71     0.09     0.31    -0.89
NO3      0.78     1.00    -0.13    -0.13    -0.49     0.67    -0.67
ORG     -0.71    -0.13     1.00     1.00    -0.75     0.30     0.62
OCO     -0.71    -0.13     1.00     1.00    -0.75     0.30     0.62
elC      0.09    -0.49    -0.75    -0.75     1.00    -0.54     0.04
PAH      0.31     0.67     0.30     0.30    -0.54     1.00    -0.04
TOT     -0.89    -0.67     0.62     0.62     0.04    -0.04     1.00
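As a rough illustration of how matrices like cov(X) and corcoef(X) are obtained from a data matrix, the sketch below computes both with NumPy. The data values are hypothetical (they are not the diesel measurements), and the (n - 1) denominator is an assumption about the convention used on the slide.

```python
import numpy as np

# Hypothetical data: 6 samples (rows) x 3 variables (columns); NOT the diesel data.
X = np.array([
    [12.0, 3.1, 40.0],
    [ 9.5, 2.8, 44.0],
    [ 4.0, 1.9, 52.0],
    [ 6.2, 4.0, 31.0],
    [ 3.1, 2.2, 29.0],
    [ 2.0, 1.5, 47.0],
])

def cov_and_corr(X):
    """Return the sample covariance and correlation matrices of the columns of X."""
    Xc = X - X.mean(axis=0)               # mean-centre each variable
    cov = Xc.T @ Xc / (X.shape[0] - 1)    # assumed (n - 1) denominator
    s = np.sqrt(np.diag(cov))             # standard deviation of each variable
    corr = cov / np.outer(s, s)           # r_ij = c_ij / (s_i * s_j)
    return cov, corr

cov, corr = cov_and_corr(X)
print(np.round(cov, 2))
print(np.round(corr, 2))
```

The correlation matrix is the covariance matrix of the autoscaled variables, which is why variables with very different magnitudes (such as PAH and OCO) become comparable in it.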
Diesel: Variables OAT (one at a time)
Figure: bar chart of SO4, NO3, orgC, oco, elC, PAH and tot at loads 100, 50 and 25, for 2600 rpm and 1600 rpm.
Two at a time
Figure: pairwise scatter plots of the samples A25, A50, A100, B25, B50 and B100. Panels: NO3, org, elC, oco, PAH and tot vs. SO4; org, oco, elC, PAH and tot vs. NO3; oco, elC, PAH and tot vs. org; elC, PAH and tot vs. oco; PAH and tot vs. elC; tot vs. PAH.
PCA: X = T P', A = 3

Object scores, T (6 objects x 3 components) =
  2.80   -2.12    0.39
 -0.35   -0.48   -1.40
 -2.25   -1.29    0.15
  1.12    1.89    0.34
  0.63    1.81   -0.24
 -1.95    0.19    0.76

Variable loadings, P' (3 components x 7 variables, columns in the order SO4, NO3, ORG, OCO, elC, PAH, TOT) =
  0.47    0.24   -0.49   -0.49    0.24   -0.04   -0.43
 -0.26   -0.53   -0.21   -0.21    0.50   -0.51    0.22
  0.06    0.01   -0.12   -0.12    0.43    0.71    0.53

Index of determination, R2 (cumulative, %) = 52.8   90.7   99.0
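The decomposition X = T P' with A = 3 can be sketched with a singular value decomposition. This is a minimal illustration, not the computation used for the slide: the data are random placeholders, and autoscaling before PCA is an assumption.

```python
import numpy as np

def pca_svd(X, n_components, scale=True):
    """PCA by SVD: returns preprocessed X, scores T, loadings P, cumulative R2 (%)."""
    Xc = X - X.mean(axis=0)                       # mean-centre
    if scale:
        Xc = Xc / X.std(axis=0, ddof=1)           # autoscale (assumed preprocessing)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]    # object scores
    P = Vt[:n_components].T                       # variable loadings (columns)
    r2_cum = 100 * np.cumsum(s**2)[:n_components] / np.sum(s**2)
    return Xc, T, P, r2_cum

# Hypothetical data with the same shape as the diesel example: 6 objects x 7 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 7))

Xc, T, P, r2 = pca_svd(X, n_components=3)
E = Xc - T @ P.T                                  # residual matrix after A = 3
print(np.round(r2, 1))                            # cumulative index of determination
```

The loading vectors have unit length and the score columns sum to zero, which is also what the T and P' matrices on the slide show within rounding.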
Scores and loadings: PC1 vs. PC2
Figure: two panels. Loadings plot, P1 (52.8 %) vs. P2 (90.7 %), for the variables SO4, NO3, org, oco, elC, PAH and tot; scores plot, T1 (52.8 %) vs. T2 (90.7 %), for the objects A25, A50, A100, B25, B50 and B100. The percentages on the axes are cumulative.
Biplot of scores and loadings
Figure: scores and loadings overlaid, T1, P1 (52.8 %) vs. T2, P2 (90.7 %).
3-D graph of the scores (99 % variance explained)
Figure: the six objects plotted against T1, T2 and T3.
Can we make a predictive model?
Given X (emissions), can we predict Y (engine speed and load)? This is an inverse calibration problem for PLS.

Y =
Speed (rpm)   Load
1600          100
1600           50
1600           25
2600          100
2600           50
2600           25
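PLS: Partial Least Squares, or Projection to Latent Structures
Figure: the X block (variables x1 ... xk) and the Y block (variables y1 ... yn) are each projected onto latent vectors ti and ui, which are linked by the inner relation.
X = TP' + E
Y = UQ' + F
U = T d + G
bpls = W (P'W)^(-1) Q'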
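The PLS relations above can be illustrated with a compact NIPALS implementation. This is a sketch under stated assumptions, not the software used for the slides: the function name pls_nipals and the random test data are hypothetical, and the inner-relation coefficient d is absorbed into the Y loadings q, a common simplification.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-12, max_iter=500):
    """NIPALS PLS2 on preprocessed X (n x k) and Y (n x m).

    Returns loading weights W, X loadings P, Y loadings Q, scores T and
    the regression matrix B = W (P'W)^(-1) Q'.
    """
    X, Y = X.copy(), Y.copy()
    n, k = X.shape
    m = Y.shape[1]
    W, P = np.zeros((k, n_components)), np.zeros((k, n_components))
    Q, T = np.zeros((m, n_components)), np.zeros((n, n_components))
    for a in range(n_components):
        u = Y[:, np.argmax((Y**2).sum(axis=0))].copy()  # start from a Y column
        t_old = np.zeros(n)
        for _ in range(max_iter):
            w = X.T @ u / (u @ u)
            w /= np.linalg.norm(w)        # loading weights, unit length
            t = X @ w                     # X scores
            q = Y.T @ t / (t @ t)         # Y loadings (inner coefficient absorbed)
            u = Y @ q / (q @ q)           # Y scores
            if np.linalg.norm(t - t_old) < tol * np.linalg.norm(t):
                break
            t_old = t
        p = X.T @ t / (t @ t)             # X loadings
        X -= np.outer(t, p)               # deflate X
        Y -= np.outer(t, q)               # deflate Y
        W[:, a], P[:, a], Q[:, a], T[:, a] = w, p, q, t
    B = W @ np.linalg.inv(P.T @ W) @ Q.T
    return W, P, Q, T, B

def autoscale(A):
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

# Hypothetical data with the shape of the diesel problem: 6 objects, 7 x, 2 y.
rng = np.random.default_rng(1)
Xs = autoscale(rng.normal(size=(6, 7)))
Ys = autoscale(rng.normal(size=(6, 2)))

W, P, Q, T, B = pls_nipals(Xs, Ys, n_components=3)
Y_hat = Xs @ B                            # prediction from the regression matrix
```

Predicting through B gives the same result as predicting through the scores (T Q'), which is what makes the regression coefficients a convenient summary of the model.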
PLS results between autoscaled LOG(X) and autoscaled Y

Percent Variance Captured by PLS Model
         -----X-Block-----    -----Y-Block-----
LV #     This LV    Total     This LV    Total
----     -------   -------    -------   -------
 1        54.15     54.15      43.21     43.21
 2        35.23     89.38      45.29     88.50
 3         9.03     98.41       3.89     92.39
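The preprocessing named in the title (log transform followed by autoscaling) and the relation between the "This LV" and "Total" columns can be sketched as follows. The small X matrix is hypothetical, and the logarithm base is an assumption; after autoscaling the base no longer matters, because changing it only multiplies each column by a constant.

```python
import numpy as np

def autoscale(X):
    """Mean-centre each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Hypothetical positive measurements (4 samples x 2 variables); not the diesel data.
X = np.array([[12.0, 0.020],
              [ 5.0, 0.035],
              [ 3.0, 0.040],
              [ 8.0, 0.025]])
Xs = autoscale(np.log10(X))               # autoscaled LOG(X), base 10 assumed

# "This LV" columns copied from the table; "Total" is their running sum.
x_this_lv = np.array([54.15, 35.23, 9.03])
y_this_lv = np.array([43.21, 45.29, 3.89])
x_total = np.cumsum(x_this_lv)
y_total = np.cumsum(y_this_lv)
print(np.round(x_total, 2))
print(np.round(y_total, 2))
```

The running sums reproduce the Total columns of the table, so three latent variables capture about 98 % of the X variance and about 92 % of the Y variance.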
Diesel, PLS model
Figure: biplot of PLS loading weights (W) and Y variable loadings, W1, Q1 vs. W2, Q2, showing the X variables SO4, NO3, org, oco, elC, PAH and tot together with the Y variables rev and Load.
Regression coefficients from PLS
Figure: two panels of regression coefficients for the seven X variables (SO4, NO3, org, oco, elC, PAH, tot), one for ENGINE REVOLUTIONS and one for ENGINE LOAD.
Diesel: PLS prediction
Figure: two panels for the objects A25, A50, A100, B25, B50 and B100: predicted vs. measured engine revolutions, and predicted vs. measured load.
Clinical effects of exposure to vanadium dust
Data measured by Lauri Pyy et al.
26 clinical variables on blood serum were measured on two matched groups: a test group (18 persons exposed to vanadium dust in a V2O5 factory) and a control group (17 persons not exposed to vanadium dust).
Exposure to vanadium - comparison by variables OAT
Figure: scaled concentrations (horizontal axis) plotted against variable number (vertical axis) for the 26 variables: Gluc, Alb, Cl, K, Na, Crea, Urea, Urat, Ca, PI, Bil, B-cf, Afos, Alat, Asat, LD, Chol, Trig, Fe, gCT, IGE, IGA, IGG, IGM, BSP, Prot.
PCA SCORE PLOT
Figure: T1 (17.5 %) vs. T2 (29.9 %), with objects marked V (exposed group) and C (control group).
Figure: the descriptor matrix X (objects of class 1 and class 2) is related by PLS to the dummy matrix Y.
Construction of the dummy or indicator matrix for DPLS (PLS discriminant analysis), which is used to find the projections of the X space that best discriminate the classes of the training set.
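A dummy (indicator) matrix of this kind can be built directly from the class labels. The sketch below is only an illustration: the function name dummy_matrix is hypothetical, the labels V and C follow the score plots, and the group sizes (18 exposed, 17 controls) are taken from the description of the vanadium data. The 0/1 coding is an assumption; other codings such as -1/+1 are also in use.

```python
import numpy as np

def dummy_matrix(labels):
    """Return the sorted class names and a 0/1 indicator matrix, one column per class."""
    classes = sorted(set(labels))
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    return classes, Y

# 18 exposed persons (V) followed by 17 controls (C).
labels = ["V"] * 18 + ["C"] * 17
classes, Y = dummy_matrix(labels)

print(classes)            # column order of Y
print(Y.shape)            # one row per person, one column per class
print(Y.sum(axis=0))      # number of objects in each class
```

This Y then takes the place of a measured response block in an ordinary PLS regression on the (autoscaled) descriptor matrix X.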
D-PLS SCORE PLOT
Figure: T1 vs. T2, with objects marked V (exposed group) and C (control group).

Percent Variance Captured by PLS Model
         -----X-Block-----    -----Y-Block-----
LV #     This LV    Total     This LV    Total
----     -------   -------    -------   -------
 1        11.8      11.9       76.2      76.2
 2         5.6      17.4       13.3      89.5

BIPLOT OF THE D-PLS MODEL
Figure: object scores (V, C) and loading weights of the 26 clinical variables overlaid, T1, W1 vs. T2, W2.
Effect of Industrial Effluents on Trace Element Patterns of Aquatic Plants
Field work: Jukka Särkkä
Analyses: Inkeri Yliruokanen
Study Area in Lake Päijänne
1 = Jyväsjärvi: industrial and municipal effluents, follow-up after remedial measures
2 = Tiirinselkä: heavily polluted by industrial effluents
3 = Judinsalonselkä: intermediate zone
4 = Tehinselkä: clean area
Aquatic plants (3 species of Nympheids) were collected in two summers, with a follow-up at Site 1. Samples were analysed for ash % and 13 minor and trace elements.
SPECIES: A and a = Potamogeton natans, B and b = Polygonum amphibium, C and c = Nuphar luteum
CLASSIFICATION ACCORDING TO SPECIES
Figure: score plot, T1 (49.1 %) vs. T2 (69.5 %), of the samples labelled by species and site (A1 to A4, B1, B3, B4, C1 to C4, ar, br, cr).
Nympheids
Figure: two 3-D panels. SAMPLE SCORES (T1, T2, T3) and VARIABLE LOADINGS (P1, P2, P3) for A% (ash), Sr, Zn, Ba, La, Mn, Ce, Cu, Pr, Pb, Y, Rb, V and Fe.
EFFECT OF REMEDIAL MEASURES ON JYVÄSJÄRVI (Site 1 and 1r)
Figure: score plot, T1 (65.3 %) vs. T2 (85.5 %). DPLS model with Site 1 and Site 4 objects, other objects fitted to this model.
Figure: the same DPLS model shown in two panels with different scaling from the previous figure; the information is still the same.
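Building a discriminant model on two classes only and then fitting the remaining objects to it can be sketched as below. This is an illustration under assumptions, not the original computation: it uses a single -1/+1 dummy variable (PLS1), random placeholder data with 14 variables, and hypothetical function names. The key point is that the other objects are scaled with the training mean and standard deviation and projected with the training weights.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 on preprocessed X and centred y. Returns scores T and projection matrix R."""
    X, y = X.copy(), y.copy()
    n, k = X.shape
    W, P = np.zeros((k, n_components)), np.zeros((k, n_components))
    T = np.zeros((n, n_components))
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)            # loading weights
        t = X @ w                         # scores
        p = X.T @ t / (t @ t)             # X loadings
        q = (y @ t) / (t @ t)             # y loading
        X -= np.outer(t, p)               # deflate X
        y -= q * t                        # deflate y
        W[:, a], P[:, a], T[:, a] = w, p, t
    R = W @ np.linalg.inv(P.T @ W)        # maps preprocessed X directly to scores
    return T, R

def project(X_new, mean, std, R):
    """Scale new objects with the TRAINING mean and std, then compute their scores."""
    return ((X_new - mean) / std) @ R

# Hypothetical data: 6 objects per training class, 8 other objects, 14 variables.
rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(0.0, 1.0, (6, 14)),    # stand-in for Site 1
                     rng.normal(1.0, 1.0, (6, 14))])   # stand-in for Site 4
y = np.array([-1.0] * 6 + [1.0] * 6)                   # assumed -1/+1 dummy coding
X_other = rng.normal(0.5, 1.0, (8, 14))                # objects not used in the model

mean, std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
T_train, R = pls1_fit((X_train - mean) / std, y - y.mean(), n_components=2)
T_other = project(X_other, mean, std, R)               # scores of the fitted objects
```

Plotting T_other in the same score space as T_train shows where the fitted objects fall relative to the two training classes.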