INVESTIGATION OF MAIN CONTAMINATION SOURCES OF HEAVY METAL IONS IN FISH, SEDIMENTS, AND WATERS FROM CATALONIA RIVERS USING DIFFERENT MULTIWAY DATA ANALYSIS METHODS
Emma Peré-Trepat (1) and Romà Tauler (2, *)
(1) Dept. of Analytical Chemistry, Universitat de Barcelona, Diagonal 647, 08028 Barcelona, Spain
(2) IIQAB-CSIC, Jordi Girona 18-26, 08034 Barcelona, Spain
(*) e-mail: rtaqam@iiqab.csic.es
Outline: • Introduction and motivations of this work • Environmental data tables and chemometrics models and methods • Example of application: metal contamination sources in fish, sediment and surface water river samples. • Conclusions
Introduction and motivations of this work • Polluting and toxic chemical compounds are a threat to the environment and to health, and they call for urgent measures and actions • Environmental monitoring studies produce huge amounts of multivariate data ordered in large data tables (data matrices) • The bottleneck in the study of these environmental data tables is their analysis and interpretation • There is a need for chemometric analysis (statistical and numerical analysis of multivariate chemical data) of these data tables!
What kind of information can be obtained from chemometric analysis of environmental multivariate data tables? • Detection, identification, interpretation and resolution of the main sources of contamination • Distribution of these contamination sources in the environment: geographically, temporally, by environmental compartment (air, water, sediments, biota, ...) • Distinction between point and diffuse contamination sources • Quantitative apportionment of these sources • ...
Introduction and motivations of this work • In this work, different chemometric multiway data analysis methods are compared for the resolution of the environmental sources of 11 metal ions in 17 river samples of fish, sediment and water at the same site locations of Catalonia (NE Spain). • Two-way bilinear model based methods: MA-PCA (Matrix Augmentation Principal Component Analysis); MA-MCR-ALS (Matrix Augmentation Multivariate Curve Resolution Alternating Least Squares) • Three-way trilinear model based methods: PARAFAC; TUCKER3; MCR-ALS trilinear; MCR-ALS TUCKER3
Introduction and motivations of this work • Special attention will be paid to: • Finding ways to compare results obtained using bilinear and trilinear models for three-way data: getting profiles in three modes from bilinear models of three-way data • Adaptation of MCR-ALS to the fulfillment of PARAFAC and TUCKER3 trilinear models • Reliability of solutions: calculation of boundaries of bands of feasible solutions • Integration of Geostatistics and Chemometrics in the investigation of environmental data
Outline: • Introduction and motivations of this work • Environmental data tables and chemometrics models and methods • Example of application: metal contamination sources in fish, sediment and river surface water samples. • Conclusions
Environmental data tables (two-way data) • Data table or matrix X with I samples (rows) and J variables (columns) • Variables: concentrations of chemicals, physical properties, biological properties, other ... • Entries may be missing ('m') or below the limit of detection (<LOD) [Figure: example data matrix with its plot of variables (columns) and plot of samples (rows)]
Environmental three-way data sets • Measured data usually consist of concentrations of different chemical compounds (variables) measured in different samples at different times/situations/conditions/compartments. • Data are ordered in a two-way or in a three-way data table according to their structure. • Three-way data sets have three measurement modes: variables mode (concentrations of chemical compounds), samples mode, and times/situations/conditions/compartments mode [Figure: three-way data cube with axes samples, variables and time/compartment]
Chemometric models to describe environmental measurements • Models for what? • Models for: • identification of contamination sources? • exploration of contamination sources? • interpretation of contamination sources? • resolution of environmental sources? • apportionment/quantitation of environmental sources? • ...?
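Chemometric models to describe environmental measurements Bilinear models for two-way data: $d_{ij} = \sum_{n=1}^{N} x_{in}\, y_{nj} + e_{ij}$, with $i = 1,\dots,I$ and $j = 1,\dots,J$ • $d_{ij}$ is the concentration of chemical contaminant j in sample i (an element of the I × J data matrix D) • n = 1,...,N are a reduced number of independent environmental sources • $x_{in}$ is the amount of source n in sample i • $y_{nj}$ is the amount of contaminant j in source n • $e_{ij}$ is the part of $d_{ij}$ not explained by the N sources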
Chemometric models to describe environmental measurements Bilinear models for two-way data: $\mathbf{D} = \mathbf{X}\mathbf{Y}^T + \mathbf{E}$, with D (I × J), X (I × N), $\mathbf{Y}^T$ (N × J), E (I × J) and N << I or J • PCA: X orthogonal, $\mathbf{Y}^T$ orthonormal; $\mathbf{Y}^T$ in the direction of maximum variance; unique solutions but without physical meaning. Identification and interpretation! • MCR-ALS: X and $\mathbf{Y}^T$ non-negative; X or $\mathbf{Y}^T$ normalization; other constraints (unimodality, local rank, ...); non-unique solutions but with physical meaning. Resolution and apportionment!
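As an illustration of the two bilinear decompositions on this slide (D = X YT + E), the following is a minimal sketch on synthetic data. The function names `pca_svd` and `nn_als` are hypothetical, and the clipped alternating least squares loop is only a simplified stand-in for MCR-ALS (no normalization, no convergence test, no additional constraints); it is not the authors' implementation.

```python
import numpy as np

def pca_svd(D, n):
    """Truncated SVD of D: scores X have orthogonal columns, loadings Yt orthonormal rows."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :n] * s[:n], Vt[:n, :]

def nn_als(D, n, n_iter=100, seed=0):
    """Simplified MCR-ALS-like loop (assumption): least squares, then clipping to >= 0."""
    rng = np.random.default_rng(seed)
    Yt = rng.random((n, D.shape[1]))              # random non-negative initial loadings
    for _ in range(n_iter):
        X = np.clip(D @ np.linalg.pinv(Yt), 0.0, None)
        Yt = np.clip(np.linalg.pinv(X) @ D, 0.0, None)
    return X, Yt

# Synthetic non-negative data: 17 samples, 11 variables, 2 sources (illustrative only)
rng = np.random.default_rng(1)
X_true = rng.random((17, 2))
Yt_true = rng.random((2, 11))
D = X_true @ Yt_true

X_pca, Yt_pca = pca_svd(D, 2)
X_mcr, Yt_mcr = nn_als(D, 2)

# Percentage of explained variance (no mean-centering, so sums of squares are used)
explained_pca = 100 * (1 - np.sum((D - X_pca @ Yt_pca) ** 2) / np.sum(D ** 2))
```

The PCA loadings are orthonormal but generally contain negative values, whereas the clipped ALS profiles are non-negative by construction; that difference is what makes the second kind of solution easier to read as source compositions.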
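Chemometric models to describe environmental measurements Extension of bilinear models for simultaneous analysis of multiple two-way data sets (matrix augmentation strategy): $\mathbf{D}_{aug} = [\mathbf{D}_1; \mathbf{D}_2; \dots; \mathbf{D}_K] = [\mathbf{X}_1; \mathbf{X}_2; \dots; \mathbf{X}_K]\,\mathbf{Y}^T + \mathbf{E}_{aug} = \mathbf{X}_{aug}\mathbf{Y}^T + \mathbf{E}_{aug}$, with each $\mathbf{D}_k$ (I × J), each $\mathbf{X}_k$ (I × N) and a common $\mathbf{Y}^T$ (N × J) • PCA: orthogonality; maximum variance • MCR: non-negativity, natural constraints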
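The matrix augmentation strategy on this slide can be sketched as follows. The dimensions (17 sites, 11 metals, 3 compartments) come from the slides; the random matrices are placeholders rather than the real data, and a truncated SVD is used only as a stand-in for whichever bilinear method (PCA or MCR-ALS) is applied to the augmented matrix.

```python
import numpy as np

I, J, K = 17, 11, 3                      # sites, metals, compartments (from the slides)
rng = np.random.default_rng(0)
D_k = [rng.random((I, J)) for _ in range(K)]   # placeholder matrices, not real data

# Column-wise augmentation: the K matrices are stacked on top of each other
D_aug = np.vstack(D_k)                   # shape (K*I, J)

# Bilinear decomposition of the augmented matrix (truncated SVD as a stand-in)
N = 2
U, s, Vt = np.linalg.svd(D_aug, full_matrices=False)
X_aug, Yt = U[:, :N] * s[:N], Vt[:N, :]

# One block of scores per compartment, all sharing the same loadings Yt
X_k = np.split(X_aug, K, axis=0)
```

Each `X_k[k] @ Yt` approximates the corresponding `D_k[k]`, so the loadings are common to all compartments while the scores are free to differ between them.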
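Chemometric models to describe environmental measurements Trilinear models for three-way data: $d_{ijk} = \sum_{n=1}^{N} x_{in}\, y_{nj}\, z_{nk} + e_{ijk}$, with $i = 1,\dots,I$; $j = 1,\dots,J$; $k = 1,\dots,K$ • $d_{ijk}$ is the concentration of chemical contaminant j in sample i at time (condition) k (an element of slice $\mathbf{D}_k$) • n = 1,...,N are a reduced number of independent environmental sources • $x_{in}$ is the amount of source n in sample i • $y_{nj}$ is the amount of contaminant j in source n • $z_{nk}$ is the contribution of source n to compartment k • $e_{ijk}$ is the part of $d_{ijk}$ not explained by the N sources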
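The trilinear model of this slide (d_ijk as a sum over sources n of x_in · y_nj · z_nk) can be written in one line with `numpy.einsum`. This is only a sketch on random numbers; storing Y as a (J, N) array and Z as a (K, N) array is a layout choice made here, not something stated in the slides.

```python
import numpy as np

def trilinear(X, Y, Z):
    """d_ijk = sum_n x_in * y_nj * z_nk, with X (I, N), Y (J, N), Z (K, N)."""
    return np.einsum('in,jn,kn->ijk', X, Y, Z)

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2
X, Y, Z = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))
D = trilinear(X, Y, Z)                   # noise-free three-way array (I, J, K)

# Slice k is a bilinear model whose scores are X rescaled by the k-th row of Z
D_1 = X @ np.diag(Z[1]) @ Y.T
```

The last line shows why the trilinear model is a restricted bilinear model: every slice shares the same profiles in X and Y, and only their relative amounts (the row of Z) change from one compartment to the next.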
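Three-way data models [Figure: data cube D (I, J, K) with its three modes: X-mode (samples, I, Ni components), Y-mode (variables, J, Nj components) and Z-mode (conditions, K, Nk components), and the corresponding loading matrices X, Y and Z]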
PARAFAC (trilinear model) [Figure: D decomposed into X, YT and Z] • The same number of components in the three modes: Ni = Nj = Nk = N • No interactions between components • Different slices Xk are decomposed into bilinear profiles having the same shape!
Tucker3 models [Figure: D decomposed into X, YT, Z and the core array G] • Different number of components in the different modes: Ni, Nj and Nk may differ • Interaction between components in different modes is possible • In PARAFAC, Ni = Nj = Nk = N and the core array G is a superdiagonal identity cube
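The relationship between Tucker3 and PARAFAC stated on this slide can be checked numerically. The sketch below uses hypothetical helper names (`tucker3`, `superdiagonal_core`) and random loadings; the order of the model sizes (sites, metals, compartments) in the [1 2 2] example is an assumption made for illustration.

```python
import numpy as np

def tucker3(X, Y, Z, G):
    """d_ijk = sum_pqr x_ip * y_jq * z_kr * g_pqr (core G has shape (Ni, Nj, Nk))."""
    return np.einsum('ip,jq,kr,pqr->ijk', X, Y, Z, G)

def superdiagonal_core(N):
    """Core array with ones on the superdiagonal and zeros elsewhere."""
    G = np.zeros((N, N, N))
    idx = np.arange(N)
    G[idx, idx, idx] = 1.0
    return G

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2
X, Y, Z = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))

# PARAFAC is Tucker3 with a superdiagonal identity core
D_parafac = np.einsum('in,jn,kn->ijk', X, Y, Z)
D_tucker = tucker3(X, Y, Z, superdiagonal_core(N))

# A Tucker3 model with different numbers of components per mode, e.g. [1 2 2]
G_122 = rng.random((1, 2, 2))
D_122 = tucker3(rng.random((I, 1)), rng.random((J, 2)), rng.random((K, 2)), G_122)
```

Off-superdiagonal elements of G are what allow a component in one mode to interact with a different component in another mode; setting them to zero removes those interactions and recovers PARAFAC.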
Guidelines for method selection (resolution purposes) [Figure: methods arranged according to deviations from trilinearity (mild, medium, strong) and array size (small, medium, large); methods shown: PARAFAC, PARAFAC2, TUCKER, MCR, PCA, SVD, ...] Journal of Chemometrics, 2001, 15, 749-771
INTEGRATION OF CHEMOMETRICS—GEOSTATISTICS (Geographical Information Systems, GIS)
Outline: • Introduction and motivations of this work • Environmental data tables • Chemometrics bilinear and trilinear models and methods • Example of application: metal contamination sources in fish, sediment and river surface water samples. • Conclusions
METAL CONTAMINATION SOURCES IN SEDIMENTS, FISH AND WATERS FROM CATALONIA RIVERS USING MULTIWAY DATA ANALYSIS METHODS
Emma Peré-Trepat (UB), Mónica Flo, Montserrat Muñoz, Antoni Ginebreda (ACA), Marta Terrado, Romà Tauler (CSIC)
[Map of Catalonia with the 17 numbered sampling sites; map labels: France, Pyrenees, Aragón, Barcelona, Mediterranean Sea]
Sampling sites (number, river, location, site code):
1. RIU MUGA, Castelló d´Empúries, J052
2. RIU FLUVIÀ, Besalú, J022
3. RIU FLUVIÀ, L´Armentera, J011
4. RIU TER, Manlleu, J034
5. RIU TERRI, Sant Julià de Ramis, J028
6. RIU TER, Clomers, J112
7. RIU TORDERA, Fogars de Tordera, J062
8. RIU CONGOST, La Garriga, J037
9. RIU LLOBREGAT, El Pont de Vilomara, J031
10. RIU CARDENER, Castellgali, J002
11. RIU LLOBREGAT, Abrera, J084
12. RIU LLOBREGAT, Martorell, J005
13. RIU LLOBREGAT, Sant Joan Despí, J049
14. RIU FOIX, Castellet, J008
15. RIU FRANCOLÍ, La Masó, J059
16. RIU EBRE, Flix, J056
17. RIU SEGRE, Térmens, J207
17 river sites, 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), 3 environmental compartments: fish (barb, 'bagra comuna', bleak, carp and trout), sediment and water samples
Missing data ('m') • Unknown values produce empty holes in data matrices • When they are few and evenly distributed, they may be estimated by PCA imputation (or another method) Below LOD values (<LOD) • This is a common problem in environmental data tables • If most of the values are below LOD, data matrices are sparse • For calculations, it is better either to use the experimental values or to set them to LOD/2, instead of setting them to zero or to LOD
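The LOD/2 substitution recommended on this slide could look like the sketch below. The function name `replace_below_lod`, the numbers and the per-column LOD values are all hypothetical; it also assumes that below-LOD entries are stored as numbers smaller than the LOD and that missing values are stored as NaN, which the slides do not specify.

```python
import numpy as np

def replace_below_lod(values, lod):
    """Replace entries below the limit of detection by LOD/2; NaN (missing) is left as is."""
    values = np.asarray(values, dtype=float)
    lod = np.broadcast_to(np.asarray(lod, dtype=float), values.shape)
    return np.where(values < lod, lod / 2.0, values)

# Hypothetical table: 3 samples x 2 variables, with one LOD per variable (column)
data = np.array([[0.3,    12.0],
                 [0.002,   0.0],
                 [np.nan, 45.0]])
lod = np.array([0.01, 1.0])

cleaned = replace_below_lod(data, lod)
```

Missing values are deliberately not touched here, since the slide treats them as a separate problem (for instance PCA imputation) from values below the LOD.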
Preliminary data description: Use of descriptive statistics • Individual sample plots • Individual variable plots • Descriptive statistics (Excel Statistics) • Histograms/Box plots • Binary correlation between variables • ... [Figure: box plots of values by column number (1 to 50), annotated with outliers, upper whisker, upper quartile, median, lower quartile and lower whisker]
Effect of different data pre-treatments: Sediment samples [Figure: panels for raw, mean-centred, auto-scaled and scaled data; variables: As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn] Mo is eliminated
Data Pretreatment • No mean-centering was applied, to allow an improved physical interpretation of factors (application of non-negativity constraints instead of orthogonality constraints) and the comparison of results using MCR-ALS methods • Two scaling possibilities: • First, data matrix augmentation and then column scaling to equal variance (each column element divided by its standard deviation) • First, column scaling of each data matrix separately and then data matrix augmentation • Variables with nearly no changes and equal or close to their limit of detection were excluded from scaling and divided by 20 instead (to avoid giving them too much weight)
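The two scaling orders described on this slide can be compared with a short sketch. The helper `scale_columns`, the random data and the choice of which columns are downweighted are hypothetical, and the sample standard deviation (`ddof=1`) is an assumption, since the slide does not say which estimator was used.

```python
import numpy as np

def scale_columns(D, unscaled=(), downweight=20.0):
    """Divide each column by its standard deviation, without mean-centering.
    Columns listed in `unscaled` are divided by `downweight` instead."""
    D = np.asarray(D, dtype=float)
    scale = D.std(axis=0, ddof=1)
    scale[np.asarray(unscaled, dtype=int)] = downweight
    return D / scale

rng = np.random.default_rng(0)
blocks = [rng.random((17, 11)) * 10 for _ in range(3)]   # placeholder fish, sediment, water

# Option 1: augment first, then scale the columns of the augmented matrix
scaled_after = scale_columns(np.vstack(blocks))

# Option 2: scale each matrix separately, then augment
scaled_before = np.vstack([scale_columns(B) for B in blocks])

# Downweighting instead of scaling for two (hypothetical) near-constant columns
water_scaled = scale_columns(blocks[2], unscaled=(2, 3))
```

With option 1 a metal keeps its relative level across compartments, while option 2 gives every metal unit variance inside each compartment; neither option centers the data, so non-negative data stay non-negative.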
Description of scaled data: metal distribution in the three compartments. Cd, Co and Ld in water were not scaled, only downweighted. [Figure: scaled concentrations of the metals (variables) in the three compartments]
Description of scaled data: different sites in the three compartments [Figure: scaled data by sample site; site labels in plotting order: Llobregat, Tordera, Segre, Ter, Llobregat, Foix, Congost, Cardener, Fluvià, Muga, Llobregat, Terri, Ebre, Francolí, Ter, Fluvià, Llobregat]
Unit variance scaled concentrations boxplot [Figure: three box-plot panels (Fish, Sediment, Water) of scaled values for the 11 metals: As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn]
THREE-WAY DATA ARRAY MATRICIZING or MATRIX AUGMENTATION
SVD of augmented data matrices in the three directions
[Figure: three-way array (sites × contaminants × compartments: Fish, Sediment, Water) augmented column-wise, row-wise and tube-wise; plot of the singular values from SVD column-wise (variables), SVD row-wise (samples) and SVD tube-wise (type)]
Singular values by augmentation direction:
        column     row        tube
s1      40.2619    43.2553    41.3302
s2      16.7504     9.2823    19.4850
s3       9.4963     8.5312    14.3739
How many components are needed to explain each mode?
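The comparison of singular values in the three augmentation directions can be sketched as below. The function name is hypothetical, the data are synthetic, and the mapping between the labels "column-wise", "row-wise" and "tube-wise" and the three unfoldings is an assumption, since the slide does not define it explicitly.

```python
import numpy as np

def unfolding_singular_values(D):
    """Singular values of three unfoldings of a three-way array D (I, J, K)."""
    I, J, K = D.shape
    unfoldings = {
        # K slices (I x J) stacked vertically: one column per variable
        "column-wise": D.transpose(2, 0, 1).reshape(K * I, J),
        # one row per sample; columns run over all (variable, condition) pairs
        "row-wise": D.reshape(I, J * K),
        # one row per condition; columns run over all (sample, variable) pairs
        "tube-wise": D.transpose(2, 0, 1).reshape(K, I * J),
    }
    return {name: np.linalg.svd(M, compute_uv=False) for name, M in unfoldings.items()}

# Synthetic trilinear array with two components (illustrative only)
rng = np.random.default_rng(0)
X, Y, Z = rng.random((17, 2)), rng.random((11, 2)), rng.random((3, 2))
D = np.einsum('in,jn,kn->ijk', X, Y, Z)

sv = unfolding_singular_values(D)
```

For this noise-free two-component array, only two singular values are non-zero in every direction; with real data, the point at which the singular values level off in each direction suggests how many components each mode needs.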
Bilinear modelling of three-way data (matrix augmentation or matricizing, stretching, unfolding) • MA-PCA • MA-MCR-ALS [Figure: the three-way array (sites × metals × compartments) is unfolded into the augmented data matrix Daug, with the fish (F), sediment (S) and water (W) blocks stacked over sites, and decomposed into the augmented scores matrix Xaug and the loadings YT (contaminants)]
Explained variances using bilinear models (profiles in two modes)
MA-PCA of scaled data without scores refolding [Figure: MA-PCA metal loadings (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores against sample index (0 to 50); annotations: water samples; sediment and fish samples (Ba, As, Cu, Zn); water soluble metal ions]
MA-MCR-ALS of scaled data with nn and without scores refolding [Figure: MA-MCR-ALS metal loadings (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores against sample index (0 to 50), compared with MA-PCA; annotations: sediment and fish samples (Ba, Zn, Cu, As); water samples] More easily interpretable!!!
Calculation of the boundaries of feasible band solutions (Journal of Chemometrics, 2001, 15, 627-646) [Figure: maximum and minimum boundaries of the feasible bands] Almost no rotation ambiguities are present in non-negative environmental profiles calculated by MCR-ALS (very different from spectroscopy!!!!!)
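Bilinear modelling of three-way data (matrix augmentation or matricizing, stretching, unfolding) • PCA or MCR-ALS of Daug gives the augmented scores Xaug (fish, sediment and water blocks over sites) and the loadings Y (contaminants) • Scores refolding strategy!!! (applied only to the final augmented scores) • Loadings recalculation in two modes from augmented scores: each augmented score profile is refolded into a sites × compartments (F, S, W) matrix and decomposed by SVD into a sites profile (X) and a compartments profile (Z) [Figure: refolding of the augmented scores of components i and ii into xi, xii (sites) and zi, zii (compartments)]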
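The scores refolding strategy can be sketched as follows, assuming the augmented scores are ordered compartment by compartment (all sites for fish, then sediment, then water). The function name `refold_scores` and the sign convention are choices made here for illustration, not details taken from the slides.

```python
import numpy as np

def refold_scores(X_aug, I, K):
    """Refold each augmented score column (K*I,) into an (I, K) matrix and take
    its first SVD component: sites profile X (I, N) and compartments profile Z (K, N)."""
    N = X_aug.shape[1]
    X, Z = np.zeros((I, N)), np.zeros((K, N))
    for n in range(N):
        M = X_aug[:, n].reshape(K, I).T          # (I, K): sites by compartments
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        x, z = U[:, 0] * s[0], Vt[0]
        if z.sum() < 0:                          # fix the sign ambiguity of the SVD
            x, z = -x, -z
        X[:, n], Z[:, n] = x, z
    return X, Z

# Augmented scores that follow a trilinear model exactly (synthetic, illustrative)
rng = np.random.default_rng(0)
I, K, N = 17, 3, 2
X0, Z0 = rng.random((I, N)), rng.random((K, N))
X_aug = np.vstack([X0 * Z0[k] for k in range(K)])   # shape (K*I, N)

X, Z = refold_scores(X_aug, I, K)
```

Because the refolding is applied only once, after the bilinear model has been fitted, the fit of the bilinear model itself is unchanged; the step only summarizes the augmented scores as one profile per additional mode.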
Explained variances using trilinear models (profiles in three modes)
MA-PCA of scaled data with nn and scores refolding [Figure: metal loadings (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) from MA-PCA + refolding compared with MA-PCA] Little differences in samples mode!!!
MA-MCR-ALS of scaled data with scores refolding [Figure: profiles from MA-MCR-ALS + refolding compared with MA-MCR-ALS]
Trilinear modelling of three-way data: PARAFAC [Figure: the three-way array D (sites × contaminants × compartments F, S, W) is decomposed by PARAFAC into X (sites), Y (metals) and Z (compartments)]
PARAFAC of scaled data [Figure: profiles from PARAFAC compared with MA-PCA (bilinear)]
MA-MCR-ALS Trilinear constraint • MCR-ALS of the augmented data matrix gives Xaug (fish, sediment and water blocks over sites) and Y (contaminants) • TRILINEARITY CONSTRAINT (ALS iteration step): 1) selection of the species profile; 2) folding and SVD; 3) rebuilding of the augmented scores and substitution of the species profile • This constraint is applied at each step of the ALS optimization, independently for each component • Every augmented score profile that is wanted to follow the trilinear model is refolded • Loadings recalculation in two modes from augmented scores: X (sites) and Z (compartments F, S, W) [Figure: scheme of the trilinearity constraint within MCR-ALS]
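One possible reading of the trilinearity constraint step (select a score profile, fold it, take the first SVD component, rebuild the augmented profile and substitute it) is sketched below. The function name, the `components` argument and the compartment-by-compartment ordering of the augmented scores are assumptions made for this illustration.

```python
import numpy as np

def trilinearity_constraint(X_aug, I, K, components=None):
    """Force the selected columns of X_aug (K*I, N) to follow a trilinear model by
    replacing each folded (I, K) profile with its rank-one SVD approximation."""
    X_new = np.array(X_aug, dtype=float)
    comps = range(X_new.shape[1]) if components is None else components
    for n in comps:
        M = X_new[:, n].reshape(K, I).T                  # folding: sites by compartments
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        M1 = s[0] * np.outer(U[:, 0], Vt[0])             # rank-one approximation
        X_new[:, n] = M1.T.reshape(-1)                   # rebuild the augmented column
    return X_new

rng = np.random.default_rng(0)
I, K = 17, 3
X_aug = rng.random((K * I, 2))           # unconstrained augmented scores (synthetic)

# Apply the constraint to the first component only; the second stays bilinear
X_con = trilinearity_constraint(X_aug, I, K, components=[0])
```

In a full MCR-ALS run this function would be called on the scores at every iteration, before the loadings are updated, which is what distinguishes it from refolding applied only once to the final scores. Applying it to a subset of components gives a mixed model in which some components are trilinear and others remain bilinear.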
MA-MCR-ALS of scaled data with nn, trilinearity (without scores refolding) [Figure: metal loadings (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores against sample index (0 to 50) from MA-MCR-ALS nn + trilinear compared with MA-MCR-ALS nn]
Calculation of the boundaries of feasible band solutions (Journal of Chemometrics, 2001, 15, 627-646) No rotation ambiguities are present in trilinear non-negative environmental profiles calculated by MCR-ALS (very different to spectroscopy!!!!!)
MA-MCR-ALS of scaled data with nn, trilinearity and with scores refolding [Figure: profiles from MA-MCR-ALS nn + trilinear compared with PARAFAC nn]
Tucker3 modelling of three-way data: Model (1,2,2) [Figure: the three-way array D (sites × metals × compartments F, S, W) is decomposed by TUCKER3 into X (sites), Y (metals), Z (compartments) and the core array G]
Tucker models with non-negativity constraints • Models compared: [2 3 3], [3 3 3], [1 3 3], [3 2 3], [2 2 2], [2 2 3], [1 2 2], [1 2 3] • Parsimonious model: [1 2 2]
Tucker3 of scaled data [Figure: profiles in the three modes (sites, the 11 metals, the 3 compartments) from TUCKER3, model [1 2 2], compared with PARAFAC, model [2 2 2]]