Proposed Model for Variable Selection Using Genetic Algorithm

Chimiometrie 2009. Proposed model for Challenge 2009. Patrícia Valderrama (pativalderrama@gmail.com, patricia.valderrama@agroparistech.fr)

1st step) Models development
Variable selection: Genetic Algorithm, iPLS
Obs.: It is impossible to estimate the RMSEP, because that requires reference values for X_TST.
In the error expression, yref is the reference value, yest is the value estimated by the model, and I is the number of samples.
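
The symbols defined on this slide (yref, yest, I) match the usual root mean square error, RMSE = sqrt(sum((yref - yest)^2) / I). The formula itself does not survive in the transcript, so this form is an assumption. A minimal Python sketch with hypothetical example values:

```python
import math

def rmse(y_ref, y_est):
    """Root mean square error between reference and estimated values.

    Assumes the plain form sqrt(sum((yref - yest)^2) / I); the slide's
    exact expression is not available in the transcript.
    """
    if len(y_ref) != len(y_est) or len(y_ref) == 0:
        raise ValueError("y_ref and y_est must be non-empty and of equal length")
    n_samples = len(y_ref)
    squared_errors = sum((r - e) ** 2 for r, e in zip(y_ref, y_est))
    return math.sqrt(squared_errors / n_samples)

# Hypothetical values: every estimate is off by exactly 0.5.
y_ref = [10.0, 12.0, 14.0, 16.0]
y_est = [10.5, 11.5, 14.5, 15.5]
print(rmse(y_ref, y_est))
```

The same function gives RMSEC when applied to calibration samples and RMSEP when applied to prediction samples, which is why RMSEP cannot be computed without reference values for X_TST.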

2nd step) Models with Variable Selection
Best Model

3rd step) Outliers detection for the Best Model
Best Model Optimized
Outlier detection in the calibration matrix was based on:
• Extreme Leverages (zero outliers)
• Unmodeled Residuals in Spectra (zero outliers)
• Unmodeled Residuals in Dependent Variables (7 outliers)
Total outliers in calibration = 7
Outlier detection in the validation matrix was based on:
• Extreme Leverages (129 outliers)
• Unmodeled Residuals in Spectra (106 outliers)
Total outliers in validation = 153

3rd step) Outliers detection in calibration and validation matrix
Based on: Extreme Leverages
Leverage represents how distant a sample is from the center of the data. According to ASTM E1655-00, samples with a leverage (hi) higher than a limit value should be removed from the calibration set.
In the leverage expressions, T represents the scores of all calibration samples, ti is the score vector of a particular sample, A is the number of latent variables, and n is the number of samples.
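
The leverage test can be sketched as follows. The equations are missing from the transcript, so two forms are assumed here: the standard leverage h_i = t_i^T (T^T T)^-1 t_i, and the limit 3(A + 1)/n that is commonly quoted for mean-centered models. The function names and the score matrix are hypothetical, and the sketch assumes orthogonal score columns so that T^T T is diagonal.

```python
def leverages(T):
    """Leverage h_i = t_i^T (T^T T)^-1 t_i for each row t_i of the score matrix T.

    Assumption: the score columns are mutually orthogonal, so T^T T is
    diagonal and its inverse is taken element by element.
    """
    n, A = len(T), len(T[0])
    gram = [[sum(T[i][a] * T[i][b] for i in range(n)) for b in range(A)]
            for a in range(A)]
    scale = max(gram[a][a] for a in range(A))
    for a in range(A):
        for b in range(A):
            if a != b and abs(gram[a][b]) > 1e-9 * scale:
                raise ValueError("score columns are not orthogonal")
    return [sum(T[i][a] ** 2 / gram[a][a] for a in range(A)) for i in range(n)]

def leverage_limit(n_latent, n_samples):
    """Assumed limit value 3(A + 1)/n."""
    return 3.0 * (n_latent + 1) / n_samples

def leverage_outliers(T):
    """Indices of samples whose leverage exceeds the assumed limit."""
    limit = leverage_limit(len(T[0]), len(T))
    return [i for i, h in enumerate(leverages(T)) if h > limit]

# Hypothetical scores: 20 samples, 2 latent variables, last sample far from the center.
T = [[-1.0, 1.0 if i % 2 == 0 else -1.0] for i in range(18)]
T += [[-1.0, 0.0], [19.0, 0.0]]
print(leverage_outliers(T))
```

With these made-up scores the last sample has a leverage of 361/380, well above the assumed limit of 0.45, while the others stay below 0.06.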

3rd step) Outliers detection in calibration and validation matrix
Based on: Unmodeled Residuals in Spectra
Outliers based on unmodeled residuals in the spectral data were identified by comparing the standard deviation of the total residuals, s(e), with the standard deviation of the residuals of a particular sample, s(ei). If a sample presents s(ei) > 2s(e), the sample should be removed from the calibration set.
n = number of samples
J = number of variables
A = number of latent variables
Xi,j = absorbance value of sample i at wavelength j
X̂i,j = value of Xi,j estimated with A latent variables
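
A minimal sketch of the s(ei) > 2s(e) rule, under a simplifying assumption: both standard deviations are taken as plain root mean squares of the residuals Xi,j minus their estimates. The slide's definitions of n, J and A suggest a degrees-of-freedom correction that is not recoverable from the transcript, so it is omitted here. The function name and the data are hypothetical.

```python
import math

def spectral_residual_outliers(X, X_hat, factor=2.0):
    """Flag samples whose spectral residual s(e_i) exceeds factor * s(e).

    Simplification: s(e_i) and s(e) are root mean squares of the residuals,
    without any degrees-of-freedom correction.
    """
    n, J = len(X), len(X[0])
    squared = [[(X[i][j] - X_hat[i][j]) ** 2 for j in range(J)] for i in range(n)]
    s_each = [math.sqrt(sum(row) / J) for row in squared]
    s_total = math.sqrt(sum(sum(row) for row in squared) / (n * J))
    flagged = [i for i in range(n) if s_each[i] > factor * s_total]
    return s_each, s_total, flagged

# Hypothetical data: 10 spectra with 3 variables; the last one is poorly modeled.
X = [[0.5, 0.6, 0.7] for _ in range(10)]
X_hat = [[v - 0.01 for v in row] for row in X[:9]] + [[v - 0.1 for v in X[9]]]
s_each, s_total, flagged = spectral_residual_outliers(X, X_hat)
print(flagged)
```

Note that with this simplified form a single sample can only exceed twice the total when there are more than four samples, since s(e_i)^2 can never exceed n times s(e)^2.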

3rd step) Outliers detection in calibration matrix
Based on: Unmodeled Residuals in Dependent Variables
Outliers are identified by comparing the root mean square error of calibration (RMSEC) with the absolute error of each sample. If the difference between a sample's reference value (yi) and its estimate (ŷi) is larger than 2 times the RMSEC, the sample is identified as an outlier.
n = number of samples
J = number of variables
A = number of latent variables
yi = reference value for sample i
ŷi = estimated value for sample i
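
The rule above can be illustrated with a short sketch. RMSEC is computed here with a plain 1/n denominator, which is an assumption: the slide's own expression may include a degrees-of-freedom term, and it is not available in the transcript. The function name and the values are hypothetical.

```python
import math

def y_residual_outliers(y_ref, y_est, factor=2.0):
    """Flag samples whose absolute error exceeds factor * RMSEC.

    Assumption: RMSEC = sqrt(sum((y_i - yhat_i)^2) / n), with no
    degrees-of-freedom correction.
    """
    errors = [abs(r - e) for r, e in zip(y_ref, y_est)]
    rmsec = math.sqrt(sum(err ** 2 for err in errors) / len(errors))
    flagged = [i for i, err in enumerate(errors) if err > factor * rmsec]
    return rmsec, flagged

# Hypothetical calibration set: nine errors of 0.1 and one error of 1.0.
y_ref = [float(i) for i in range(10)]
y_est = [y + 0.1 for y in y_ref[:9]] + [y_ref[9] + 1.0]
rmsec, flagged = y_residual_outliers(y_ref, y_est)
print(rmsec, flagged)
```

In practice the model is usually rebuilt after the flagged samples are removed and the test is repeated, since RMSEC itself changes once outliers leave the calibration set.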

4th step) Figures of Merit for the Best Model Optimized
• Accuracy
• Fit
• Precision (impossible to estimate, because it requires replicates of the validation samples)
• Sensitivity
• Analytical Sensitivity
• Selectivity
• Linearity
• Limit of Detection (LOD)
• Limit of Quantification (LOQ)
• Signal-to-noise ratio

4th step) Figures of Merit for the Best Model Optimized
Accuracy: This parameter reports the closeness of agreement between the reference value and the value found by the calibration model. In chemometrics, it is generally expressed as the root mean square error of calibration (RMSEC) or of prediction (RMSEP). However, RMSEP is a global parameter that incorporates both systematic and random errors. Hence, an F-test on the RMSEC/RMSEP of two methods is not appropriate for comparing accuracy. A better indicator is the regression of found versus nominal concentration values, with estimation of the slope and intercept of the linear regression and consideration of the elliptical joint confidence region.
The ellipses contain the ideal point (1, 0), for slope and intercept respectively, showing that the reference calibration values and the PLS results do not present a significant difference at the 99% confidence level.
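
One way to check whether the ideal point lies inside the elliptical joint confidence region is sketched below. It uses the standard ordinary least squares form of the region, n(b0 - B0)^2 + 2(sum x)(b0 - B0)(b1 - B1) + (sum x^2)(b1 - B1)^2 <= 2 s^2 F, with the ideal point B0 = 0 (intercept) and B1 = 1 (slope). The slides do not state which regression variant was used, so this is an assumption, and the function name and data are hypothetical. The critical F value is passed in by the caller (taken from a table) to keep the sketch free of external libraries.

```python
def ejcr_contains_ideal(x_ref, y_found, f_crit):
    """Fit y_found = b0 + b1 * x_ref by OLS and test whether the point
    (intercept 0, slope 1) lies inside the joint confidence ellipse.

    f_crit is the tabulated F value for 2 and n - 2 degrees of freedom
    at the chosen confidence level.
    """
    n = len(x_ref)
    sx, sy = sum(x_ref), sum(y_found)
    sxx = sum(x * x for x in x_ref)
    sxy = sum(x * y for x, y in zip(x_ref, y_found))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(x_ref, y_found))
    s2 = sse / (n - 2)  # residual variance of the regression
    d0, d1 = intercept - 0.0, slope - 1.0
    q = n * d0 ** 2 + 2.0 * sx * d0 * d1 + sxx * d1 ** 2
    return slope, intercept, q <= 2.0 * s2 * f_crit

# Hypothetical reference and found values; 30.82 is the tabulated F value
# for 2 and 3 degrees of freedom at the 99% level.
x_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_found = [1.1, 1.9, 3.05, 3.95, 5.0]
slope, intercept, inside = ejcr_contains_ideal(x_ref, y_found, f_crit=30.82)
print(slope, intercept, inside)
```

A result of `inside == True` corresponds to the situation described on the slide: slope and intercept are not significantly different from 1 and 0 at the chosen confidence level.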

4th step) Figures of Merit for the Best Model Optimized
Fit: Net Analyte Signal versus Reference Values. Pseudo-univariate presentation of the multivariate calibration model.

4th step) Figures of Merit for the Best Model Optimized
Sensitivity: This parameter is the fraction of the analytical signal due to the increase in the concentration of a particular analyte at unit concentration.
Sensitivity = 2.3932 x 10^-5
Analytical Sensitivity: The inverse of this parameter reports the minimum concentration difference between two samples that can be determined by the model, considering that the spectral noise represents the largest source of error.
Analytical sensitivity = 0.5737
The minimum concentration difference between two samples that can be determined by the model is (analytical sensitivity)^-1 = 1.7431
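
The reported inverse can be checked with simple arithmetic. The sketch also derives the instrumental noise implied by the two reported values, assuming the common definition analytical sensitivity = sensitivity / noise; that definition and the variable names are assumptions, since the expressions are missing from the transcript.

```python
# Values reported on the slide.
sensitivity = 2.3932e-5
analytical_sensitivity = 0.5737

# Minimum distinguishable concentration difference: the inverse of the
# analytical sensitivity.
min_difference = 1.0 / analytical_sensitivity

# Implied instrumental noise, ASSUMING analytical sensitivity = sensitivity / noise.
implied_noise = sensitivity / analytical_sensitivity

print(round(min_difference, 4))  # 1.7431, matching the slide
print(implied_noise)
```

Under that assumption the implied noise is on the order of 4 x 10^-5 in signal units.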

4th step) Figures of Merit for the Best Model Optimized
Selectivity: Fraction of the signal utilized in the quantification = 0.21
Linearity: In multivariate calibration, a linear model should present errors with random behavior.

4th step) Figures of Merit for the Best Model Optimized
Limit of Detection: Following IUPAC recommendations, the LOD can be defined as the minimum detectable value of the net signal (or concentration).
LOD = 5.7518
Limit of Quantification: The ability to quantify is generally expressed in terms of the signal or analyte concentration value that will produce estimates having a specified relative standard deviation, usually 10%.
LOQ = 17.4296
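
The two reported limits are consistent with the expressions often used in multivariate calibration, LOD = 3.3 x noise / sensitivity and LOQ = 10 x noise / sensitivity. These expressions are not in the transcript and are assumed here; the function name is hypothetical. The sketch checks that the reported LOQ/LOD ratio is close to 10/3.3 and that both values are close to 3.3 and 10 times the inverse analytical sensitivity reported on an earlier slide.

```python
def lod_loq(noise, sensitivity):
    """Assumed expressions: LOD = 3.3 * noise / SEN, LOQ = 10 * noise / SEN."""
    lod = 3.3 * noise / sensitivity
    loq = 10.0 * noise / sensitivity
    return lod, loq

# Values reported on the slides.
reported_lod, reported_loq = 5.7518, 17.4296
analytical_sensitivity = 0.5737  # sensitivity / noise, as reported earlier

ratio = reported_loq / reported_lod
print(round(ratio, 3))  # 3.03, close to 10 / 3.3

# With noise / sensitivity = 1 / analytical sensitivity:
lod, loq = lod_loq(noise=1.0, sensitivity=analytical_sensitivity)
print(lod, loq)
```

The small differences from the reported values are compatible with rounding of the analytical sensitivity to four decimals, although the transcript does not confirm that this is how the values were obtained.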

4th step) Figures of Merit for the Best Model Optimized
Signal-to-noise ratio: How much the net analyte signal exceeds the instrumental noise.
Max = 26.1264
Min = 9.5815
