Chemometric Methods for GC x GC. LCDR Gregory J. Hall, Glenn S. Frysinger. Department of Science, U.S. Coast Guard Academy, New London, Connecticut. gregory.hall@uscga.edu
LCDR Gregory J. Hall • 1995: B.S. Marine Science, U.S. Coast Guard Academy • 1995 – 1997: Operations Officer, USCGC SPAR • 1997 – 1998: M.S. Chemistry, Tufts University • 1998 – 2000: Rotating Military Faculty, USCGA • 2000: Appointed to the PCTS • 2002 – 2004: Ph.D. sabbatical, Tufts University • 2006: Ph.D. Chemistry, Tufts University, “Chemometric Characterization and Classification of Estuarine Water through Multidimensional Fluorescence”
Permanent Commissioned Teaching Staff (PCTS) • About 23 officers ranked from LT to CAPT • Provide the “interpreters” between the military and civilian faculty and leadership for the college • Teaching, Service, and Scholarship expected • Ph.D. required
What IS Chemometrics? Chemometrics is the chemical discipline that uses mathematical, statistical and other methods employing formal logic to design or select optimal measurement procedures and experiments, and to provide maximum relevant chemical information by analyzing chemical data. (D. L. Massart, Chemometrics, Elsevier, NY, 1988)
Chemometrics already covered and to come • Difference Chromatograms • Property Modeling • Clustering • Chromatograph Prediction • Mass Spec searching • Template Construction • XICs • Retention Indices You are all already chemometricians!
Today • Data Structures – How I view GC x GC data • Variance - PCA • Classification – SIMCA, PCR-DA • Regression – PLS • Peak Resolution - PARAFAC • Preprocessing – Alignment • The way forward, humble opinions
Data – GC x GC - FID. [Figure: a single chromatogram is a “two way” data object (first dimension × second dimension, filled with intensity values; 3 dimensions). A dataset is a chromatogram stack X (first dimension × second dimension × sample), a “three way” data object with 4 dimensions. The array modes are labeled I, J, and K.]
Data – GC x GC - TOF. [Figure: dataset X with axes First Dimension, Second Dimension, m/z, and Sample (Date?); a “four way” data object with 5 dimensions!]
Principal Components Analysis (PCA). [Figure: samples plotted against variable 1, variable 2, and variable 3, with the PC 1 and PC 2 directions and the Q and T2 statistics marked.]
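Principal Components Analysis (PCA): $\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}} + \mathbf{E}$, where $\mathbf{X}$ is the data (one row per sample), $\mathbf{T}$ holds the sample scores, $\mathbf{P}$ holds the “components”, $\mathbf{T}\mathbf{P}^{\mathsf{T}}$ is the “model”, and $\mathbf{E}$ holds the residuals. Goal: variance capture.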
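The PCA model on this slide (data = model + residuals) can be sketched with a singular value decomposition. This is a minimal illustration, not the PLS Toolbox implementation used in the talk; the synthetic data, the function name `pca`, and the choice of mean-centering are assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """Return scores T, loadings P, residuals E, variance captured, and the column mean."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # mean-center each variable (assumed preprocessing)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]     # scores: one row per sample
    P = Vt[:n_components].T                        # loadings: one column per component
    E = Xc - T @ P.T                               # residuals not described by the model
    captured = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return T, P, E, captured, mean

rng = np.random.default_rng(0)
# Hypothetical data: 15 samples, 200 variables, two underlying patterns plus small noise
X = rng.normal(size=(15, 2)) @ rng.normal(size=(2, 200)) + 0.01 * rng.normal(size=(15, 200))
T, P, E, captured, mean = pca(X, n_components=2)
print(f"fraction of variance captured by 2 PCs: {captured:.4f}")
```

The fraction of variance captured is the quantity quoted later in the deck when a model is described as "4 PCs = 98% variance".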
Multi-way Principal Components Analysis (MPCA). Our data: 15 x 410,000. Wise, B. M.; Gallagher, N. B.; Bro, R.; Shaver, J. M.; Windig, W.; Koch, R. S. PLS Toolbox 4.0; Eigenvector Research, Inc.: Wenatchee, WA, 2006.
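MPCA is commonly described as unfolding a multi-way array into an ordinary matrix before running PCA. The sketch below shows only that unfolding step on a tiny hypothetical stack. It is an assumption, not something stated on the slide, that the 15 x 410,000 figure is samples by unfolded data points, and the array sizes and variable names here are illustrative only.

```python
import numpy as np

n_samples, n_first, n_second = 4, 6, 5                 # hypothetical, far smaller than real data
rng = np.random.default_rng(1)
stack = rng.random((n_samples, n_first, n_second))     # "three way" chromatogram stack

# Unfold: every chromatogram becomes one long row of intensities
X = stack.reshape(n_samples, n_first * n_second)

# Anything with one value per unfolded variable (for example a loading vector)
# can be folded back into the chromatogram plane for plotting
stand_in_loading = X.mean(axis=0)                      # stand-in for a real loading vector
loading_image = stand_in_loading.reshape(n_first, n_second)
print(X.shape, loading_image.shape)
```

Folding a loading vector back into the two retention-time axes is what makes loadings plots over the GC × GC plane possible.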
GC × GC/MS TIC of Fire Debris. [Figure: total ion chromatogram; first dimension Time (min), 0 to 40; second dimension Time (s), 0.0 to 4.0.] Samples: 6 clean carpet samples, 5 gasoline samples, 6 “doped” carpet samples.
PCA Model Specifics • Only two carpet classes included • 4 PCs = 98% variance • Two random samples per class left out, all gasoline samples left out of “training set” • Left out samples “projected” onto the model later.
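Projecting left-out samples onto an existing model can be sketched as follows. The example assumes synthetic data and computes the two fit statistics named in the PCA figures, Q (squared residual) and T2 (taken here to be Hotelling's T^2, scaled by the training score variances). The function name `project` and all data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical training set: 10 samples, 50 variables, 2 underlying patterns
patterns = rng.normal(size=(2, 50))
X_train = rng.normal(size=(10, 2)) @ patterns + 0.01 * rng.normal(size=(10, 50))

mean = X_train.mean(axis=0)
U, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
k = 2
P = Vt[:k].T                                        # loadings of the training model
score_var = s[:k] ** 2 / (X_train.shape[0] - 1)     # variance of each training score

def project(X_new):
    """Project new samples onto the training model; return scores, Q, and T^2."""
    Xc = X_new - mean                               # center with the TRAINING mean
    T = Xc @ P
    resid = Xc - T @ P.T
    Q = (resid ** 2).sum(axis=1)                    # squared distance from the model plane
    T2 = (T ** 2 / score_var).sum(axis=1)           # distance from the center within the plane
    return T, Q, T2

like_training = rng.normal(size=(1, 2)) @ patterns  # built from the same patterns
unlike_training = 3.0 * rng.normal(size=(1, 50))    # unrelated to the model
_, Q_in, T2_in = project(like_training)
_, Q_out, T2_out = project(unlike_training)
print(Q_in[0], Q_out[0])
```

A sample that does not follow the training patterns shows a much larger Q than one that does, which is the basis for flagging the left-out samples.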
PC 1 - Loadings. [Figure: loadings plotted over the GC × GC plane; first dimension Time (min), 0 to 50; second dimension Time (s), 0 to 2.0.] Red = positive loadings, correlated. Blue = negative loadings, anti-correlated.
PC 2 - Loadings. [Figure: loadings plotted over the GC × GC plane; first dimension Time (min), 0 to 50; second dimension Time (s), 0 to 2.0.] Chemically interpretable results! Next step - classification
Principal Components Regression Discriminant Analysis (PCR-DA). [Figure: schematic relating the data X and the class variable Y through a regression vector, alongside the PCA diagram (PC 1, PC 2, Q, and T2 over variables 1 to 3); three GC × GC panels with first dimension Time (min), 0 to 50, and second dimension Time (s), 0 to 2.0.]
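One common way to set up PCR-DA is to regress a 0/1 class variable on the PCA scores and map the coefficients back to the original variables, which yields a regression vector. The sketch below follows that reading with synthetic data. The "marker" region, the helper `make`, the 0.5 decision threshold, and the use of 2 PCs are all assumptions for illustration and do not reproduce the model in the talk.

```python
import numpy as np

rng = np.random.default_rng(3)
n_vars = 40
marker = np.zeros(n_vars)
marker[10:15] = 1.0                       # hypothetical signal present only in class 1
background = rng.random(n_vars)

def make(n, doped):
    """Hypothetical samples: shared background plus noise, plus the marker if doped."""
    X = background + 0.05 * rng.normal(size=(n, n_vars))
    return X + doped * marker * rng.uniform(0.8, 1.2, size=(n, 1))

X = np.vstack([make(6, 0), make(6, 1)])
y = np.array([0.0] * 6 + [1.0] * 6)       # 1 = member of the class of interest

x_mean, y_mean = X.mean(axis=0), y.mean()
U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
k = 2
T, P = U[:, :k] * s[:k], Vt[:k].T         # PCA scores and loadings
q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)   # regress class variable on scores
b = P @ q                                 # regression vector in the original variable space

def predict(X_new):
    return (X_new - x_mean) @ b + y_mean

test_samples = np.vstack([make(2, 0), make(2, 1)])
pred = predict(test_samples)
labels = (pred > 0.5).astype(int)         # assumed threshold halfway between 0 and 1
print(pred.round(2), labels)
```

Like a loading vector, `b` has one entry per variable, so for unfolded GC × GC data it can be folded back and displayed over the chromatogram plane.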
Regression Vector. [Figure: regression vector plotted over the GC × GC plane; first dimension Time (min), 0 to 50; second dimension Time (s), 0 to 2.0.] Red = positive loadings. Blue = negative loadings.
Regression Vector Zoom. [Figure: enlarged region of the regression vector plot.]
Principal Components Regression Predictions. [Figure: sample scores on the regression vector (vertical axis, -0.2 to 1.8) versus sample number (1 to 17), with groups labeled Gasoline, Arson Debris, and Unaltered Carpet.] Discriminant Analysis: 1 = Member of Arson Class
Classification – Soft Independent Model of Class Analogy (SIMCA). [Figure: samples in the space of variable 1, variable 2, and variable 3 (axes x, y, z), with a class PCA model showing PC 1, PC 2, Q, and T2.]
SIMCA Model Specifics • PCA modeled for 2 classes – Arson, not Arson • Each model had 2 PCs with 99% variance captured • One random sample per class left out, all gasoline samples left out of “training set” • Left out samples “projected” onto each model later.
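The SIMCA idea of one PCA model per class, with left-out samples projected onto each model, can be sketched as below. This is a simplified illustration under stated assumptions: the data are synthetic, the membership limits (3 times the largest training Q and T^2, the `slack` parameter) are hypothetical stand-ins for proper statistical limits, and the "nearest class" rule based on combined reduced Q and T^2 is one possible choice rather than the rule used in the talk.

```python
import numpy as np

def fit_stats(model, x):
    """Q residual and T^2 of one sample with respect to one class model."""
    xc = x - model["mean"]
    t = xc @ model["P"]
    r = xc - model["P"] @ t
    return float(r @ r), float((t ** 2 / model["score_var"]).sum())

def fit_class_model(X, n_pcs, slack=3.0):
    """PCA model of one class with simple, hypothetical Q and T^2 limits."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    model = {"mean": mean, "P": Vt[:n_pcs].T,
             "score_var": s[:n_pcs] ** 2 / (X.shape[0] - 1)}
    q, t2 = zip(*(fit_stats(model, x) for x in X))
    model["q_limit"], model["t2_limit"] = slack * max(q), slack * max(t2)
    return model

def classify(models, x):
    """Return (classes the sample belongs to, nearest class)."""
    distance, members = {}, []
    for name, m in models.items():
        q, t2 = fit_stats(m, x)
        rq, rt = q / m["q_limit"], t2 / m["t2_limit"]   # reduced statistics
        distance[name] = float(np.hypot(rq, rt))
        if rq <= 1.0 and rt <= 1.0:
            members.append(name)
    return members, min(distance, key=distance.get)

rng = np.random.default_rng(4)
n_vars = 30
carpet_profile = rng.random(n_vars)        # hypothetical matrix signal
gasoline_profile = rng.random(n_vars)      # hypothetical accelerant signal

def carpet(n):
    return carpet_profile + 0.02 * rng.normal(size=(n, n_vars))

def doped(n):
    amount = rng.uniform(0.9, 1.1, size=(n, 1))
    return carpet(n) + amount * gasoline_profile

models = {"carpet": fit_class_model(carpet(20), 2),
          "doped": fit_class_model(doped(20), 2)}
gasoline = 3.0 * gasoline_profile + 0.02 * rng.normal(size=n_vars)

result_carpet = classify(models, carpet(1)[0])
result_doped = classify(models, doped(1)[0])
result_gasoline = classify(models, gasoline)
print(result_carpet, result_doped, result_gasoline)
```

Because each class has its own model, a sample can belong to one class, to several, or to none, while still having a nearest class; those are the same four outputs reported on the SIMCA results slide.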
Arson “Case” SIMCA Results. [Figure: four panels plotting, for each sample, In Doped Class (0 or 1), In Carpet Class (0 or 1), Not in any Class (0 or 1), and Nearest Class (1 or 2); samples are grouped as Carpet, Doped, and Gasoline. Legend: Carpet Samples, Carpet Test, Doped Samples, Doped Test, Gasoline Test.]
Arson “Case” SIMCA Fit Statistics. [Figure: T^2 residuals versus Q residuals, shown as Fit Statistics for Doped Carpet Class and Fit Statistics for Carpet Class, with two panels per class at different axis scales. Legend: Carpet Samples, Carpet Test, Doped Samples, Doped Test, Gasoline Test.]
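Parallel Factor Analysis (PARAFAC): $\underline{\mathbf{X}} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r + \underline{\mathbf{E}}$, where $\underline{\mathbf{X}}$ and $\underline{\mathbf{E}}$ are three-way arrays with modes I, J, and K, and the vectors $\mathbf{a}_r$, $\mathbf{b}_r$, $\mathbf{c}_r$ are the columns of the loading matrices A, B, and C (R columns each). For the three factors drawn on the slide: $\underline{\mathbf{X}} = \mathbf{a}_1 \circ \mathbf{b}_1 \circ \mathbf{c}_1 + \mathbf{a}_2 \circ \mathbf{b}_2 \circ \mathbf{c}_2 + \mathbf{a}_3 \circ \mathbf{b}_3 \circ \mathbf{c}_3 + \underline{\mathbf{E}}$.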
Parallel Factor Analysis (PARAFAC), GC x GC - FID. [Figure: PARAFAC applied to the chromatogram stack X (modes I, J, K). Factor 1 (vectors a1, b1, c1) and Factor 2 (a2, b2, c2) each comprise a first dimension loading, a second dimension loading, and a sample score.]
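A trilinear PARAFAC model is often fitted by alternating least squares (ALS), solving for one loading matrix at a time while holding the other two fixed. The sketch below is a bare-bones ALS on a synthetic stack of two overlapping peaks. It has no convergence check, constraints, or careful initialization, and the array layout (first dimension, second dimension, sample) and all names are assumptions for the example.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: column r is kron(U[:, r], V[:, r])."""
    return np.einsum("ir,jr->ijr", U, V).reshape(U.shape[0] * V.shape[0], -1)

def parafac_als(X, rank, n_iter=1000, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    X0 = X.reshape(I, J * K)                        # unfold along mode 0
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)     # unfold along mode 1
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)     # unfold along mode 2
    for _ in range(n_iter):
        A = X0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

def peak(n, center, width):
    x = np.arange(n)
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Hypothetical data: two overlapping analytes measured in four samples
A_true = np.column_stack([peak(20, 8, 2.0), peak(20, 11, 2.0)])    # first dimension profiles
B_true = np.column_stack([peak(15, 5, 1.5), peak(15, 7, 1.5)])     # second dimension profiles
C_true = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [0.5, 3.0]])  # amounts per sample
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)

A, B, C = parafac_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The recovered columns are determined only up to scaling and ordering of the factors, so in practice each factor is normalized before its loadings are interpreted as pure-component profiles.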
Parallel Factor Analysis (PARAFAC), GC x GC - TOF. Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
Parallel Factor Analysis (PARAFAC), GC x GC - TOF. [Figure: PARAFAC applied to a GC x GC - TOF sample X (modes I, J, K). Factor 1 (vectors a1, b1, c1) and Factor 2 (a2, b2, c2) each comprise a first dimension loading, a second dimension loading, and an m/z score.]
Parallel Factor Analysis (PARAFAC), GC x GC - TOF: “Complex Environmental Sample”. Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
PARAFAC Results. Sinha, A. E.; Fraga, C. G.; Prazen, B. J.; Synovec, R. E. Journal of Chromatography A 2004, 1027, 269-277.
GC × GC/MS Peak Deconvolution: PARAFAC? [Figure: GCImage screen capture; sample NIJ0221, 100 µg 75% Wx gasoline / nylon carpet matrix.]
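Partial Least Squares (PLS): $\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}} + \mathbf{E}$ and $\mathbf{Y} = \mathbf{T}\mathbf{Q}^{\mathsf{T}} + \mathbf{F}$, where $\mathbf{X}$ (samples × variables) and $\mathbf{Y}$ (samples × properties) are the data, $\mathbf{T}$ holds the sample scores on the “latent variables”, $\mathbf{T}\mathbf{P}^{\mathsf{T}}$ and $\mathbf{T}\mathbf{Q}^{\mathsf{T}}$ are the “model”, and $\mathbf{E}$ and $\mathbf{F}$ are the residuals.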
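For a single property, a PLS model of this form can be built with the NIPALS-style PLS1 recursion: each latent variable is chosen to covary with the property, then removed from both blocks. The sketch below is a minimal version on synthetic two-component mixtures. The data, the function name `pls1_fit`, and the choice of 2 latent variables are assumptions, and this is not the PLS Toolbox implementation.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 for one property y; returns the regression vector and the centering values."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)       # weight: direction in X most covariant with y
        t = E @ w                       # scores on this latent variable
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        c = f @ t / tt                  # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - c * t                   # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector in original variables
    return b, x_mean, y_mean

rng = np.random.default_rng(5)
# Hypothetical mixtures: the analyte of interest plus one interferent, 50 variables
profile_a, profile_b = rng.random(50), rng.random(50)
conc_a, conc_b = rng.uniform(0, 1, 20), rng.uniform(0, 1, 20)
X = np.outer(conc_a, profile_a) + np.outer(conc_b, profile_b) + 0.001 * rng.normal(size=(20, 50))

b, x_mean, y_mean = pls1_fit(X[:15], conc_a[:15], n_lv=2)    # calibrate on 15 samples
predicted = (X[15:] - x_mean) @ b + y_mean                    # predict the 5 left out
max_error = np.abs(predicted - conc_a[15:]).max()
print(f"largest prediction error: {max_error:.4f}")
```

Two latent variables suffice here only because the synthetic data contain exactly two varying components; with real chromatograms the number would normally be chosen by cross-validation.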
PLS Results: Naphthalenes in Jet Fuel. Johnson, K. J.; Prazen, B. J.; Young, D. C.; Synovec, R. E. Journal of Separation Science 2004, 27, 410-416.
Alignment Strategy 1 • Experimental Design
Alignment Strategy 2 • Templates / Peak Tables
Alignment Strategy 3 • Retention Index
Alignment Strategy 4 • Piecewise Correlation Maximization. Pierce, K. M.; Wood, L. F.; Wright, B. W.; Synovec, R. E. Analytical Chemistry 2005, 77, 7735-7743.
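The general idea behind piecewise correlation maximization can be illustrated in one dimension: cut the signal into windows and, for each window, pick the integer shift that maximizes correlation with a target signal. The sketch below is a simplified illustration of that idea only and should not be read as the published algorithm of Pierce et al.; the window length, shift limit, and synthetic peaks are assumptions.

```python
import numpy as np

def correlation(a, b):
    """Pearson correlation, returning 0 for a flat segment."""
    a, b = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def piecewise_align(sample, target, window, max_shift):
    """Shift each window of `sample` by the lag that best matches `target`."""
    aligned, shifts = sample.copy(), []
    for start in range(0, len(sample), window):
        stop = min(start + window, len(sample))
        best_shift, best_score = 0, -np.inf
        for shift in range(-max_shift, max_shift + 1):
            lo, hi = start + shift, stop + shift
            if lo < 0 or hi > len(sample):
                continue                      # skip shifts that run off the signal
            score = correlation(sample[lo:hi], target[start:stop])
            if score > best_score:
                best_shift, best_score = shift, score
        aligned[start:stop] = sample[start + best_shift:stop + best_shift]
        shifts.append(best_shift)
    return aligned, shifts

def peak(n, center, width=3.0):
    x = np.arange(n)
    return np.exp(-0.5 * ((x - center) / width) ** 2)

target = peak(120, 30) + peak(120, 90)
sample = peak(120, 33) + peak(120, 88)        # first peak late by 3, second early by 2
aligned, shifts = piecewise_align(sample, target, window=60, max_shift=5)
print(shifts)
```

Because each window is shifted independently, different parts of the chromatogram can be corrected by different amounts, which a single global shift cannot do.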
Alignment Strategy 5 • “Warping”. Kaczmarek, K.; Walczak, B.; de Jong, S.; Vandeginste, B. G. M. Journal of Chemical Information and Computer Sciences 2003, 43, 978-986.
Alignment Strategy Proposal # 1 • Anchor Warping
Alignment Strategy Proposal # 2 • DTW – Piecewise Hybrid • 2nd Dimension: Piecewise • 1st Dimension: DTW • Alkanes?
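For readers unfamiliar with DTW (dynamic time warping), the sketch below shows the textbook dynamic-programming form on two short one-dimensional signals. It illustrates only the DTW building block, not the proposed hybrid, and the signals and function name are hypothetical.

```python
import numpy as np

def dtw_path(a, b):
    """Classic DTW: return the total warped distance and the matching index path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Walk back from the end to recover which points were matched
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(cost[n, m]), path[::-1]

a = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0])   # peak at index 3
b = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0])   # same peak, one point earlier
total, path = dtw_path(a, b)
pointwise = float(np.abs(a - b).sum())
print(total, pointwise, path)
```

Unconstrained DTW will stretch or compress wherever it lowers the distance, so practical use on chromatograms usually adds limits on how far the path may stray from the diagonal.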
Humble Opinions • GC x GC is tremendously interesting data • Tremendous amounts of work possible, even with data that presently exists. Good alignment will open up even more possibilities • Include the Chemist in the analysis • Include the Chemometrician in the experimental design
Future? 1. More PCA, PCR, PLS, PARAFAC 2. Regression certainty calculations 3. NPLS, NPLS-DA 4. Holistic, automatic alignment strategies • 2D COW or DTW? • PARAFAC 2? 5. User driven alignment strategies • Anchor warping 6. Inclusion on m/z axis • Purity, CODA?
Acknowledgements U.S. Coast Guard Academy Alexander Trust You all!