
Confidence Intervals in PCA based on Bootstrapping

CAC–2010. Hamid Babamoradi, Frans van den Berg, Åsmund Rinnan. Quality & Technology Group, Department of Food Science.


Presentation Transcript


  1. Confidence Intervals in PCA based on Bootstrapping. CAC–2010. Hamid Babamoradi, Frans van den Berg, Åsmund Rinnan. Quality & Technology Group, Department of Food Science.

  2. Data set: NIR spectra of 2-Propanol and Water mixtures. 41 samples (40 through 60 mole percent of 2-Propanol) + 2 samples (50%) with 1-Ethanol + 2 samples (50%) with 1-Propanol (impure samples). [Figure: NIR spectra at 30 °C; axes: wavelength, sample (45).]

  3. Data set

  4. Principal Component Analysis (PCA): $X = TP^{T} + E$. Results:
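
The decomposition $X = TP^{T} + E$ can be illustrated numerically. The following is a minimal sketch, assuming PCA is computed by an SVD of the column-centered data; the function name `pca` and the random test matrix are illustrative and do not come from the presentation.

```python
import numpy as np

def pca(X, n_components):
    """Return scores T, loadings P and residuals E with Xc = T @ P.T + E.

    Assumption: X is mean-centered column-wise before the SVD.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components].T                      # loadings (orthonormal columns)
    E = Xc - T @ P.T                             # residuals
    return T, P, E

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))     # hypothetical data: 20 samples, 5 variables
T, P, E = pca(X, n_components=2)
```

Here `T @ P.T + E` reproduces the centered data exactly, and the columns of `P` are orthonormal.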

  5. 10 numbers with mean $\bar{x} = 100.0$. A confidence interval brackets the true mean: $x_{Lo} \le \mu = E(\bar{x}) \le x_{Up}$, where $x_{Lo}$ and $x_{Up}$ are called Confidence Limits (CLs). [Figure: distribution curve over the range 98 to 102.]

  6. The Bootstrap: the original sample has mean $\bar{x} = 100.0$; the bootstrap samples give $\bar{x}^{*1} = 99.8$, $\bar{x}^{*2} = 101.6$, ..., $\bar{x}^{*1000} = 100.9$. There are 1000 bootstrap estimates of the mean. Reference: Wehrens R, Putter H, Buydens LMC. The bootstrap: a tutorial. Chemom. Intell. Lab. Syst. 2000; 54: 35–52.
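
The resampling of the mean can be sketched in a few lines. This is a minimal sketch using only the Python standard library; the ten numbers are hypothetical (chosen to have a mean of 100.0, they are not the values behind the slide), and `bootstrap_means` is an illustrative name.

```python
import random
import statistics

def bootstrap_means(data, n_boot=1000, seed=0):
    """Draw n_boot resamples of data (with replacement) and return their means."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]

# Hypothetical sample of 10 numbers (not the values used on the slide).
data = [99.1, 100.4, 98.7, 101.2, 100.0, 99.6, 100.9, 99.3, 101.5, 99.3]
boot = bootstrap_means(data)   # 1000 bootstrap estimates of the mean
```

Each entry of `boot` plays the role of one $\bar{x}^{*b}$.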

  7. Bootstrap CIs: Percentile is one of the methods to build CIs from bootstrap estimates. Order the bootstrap estimates of M: $M^{*1} \le M^{*2} \le \dots \le M^{*999} \le M^{*1000}$. The lower and upper CLs are the $(B\alpha)$th and $(B(1-\alpha))$th ordered elements; in this case ($B = 1000$, $\alpha = 0.025$), the 25th and 975th ordered elements. Bootstrap Percentile CI = [98.85, 101.04] at the 95% confidence level. [Figure: histogram of the bootstrap estimates; y-axis: abundance; x-axis: 98 to 102.]
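
The percentile rule can be written out directly. The following is a minimal sketch for the mean, assuming `alpha` is the tail probability on each side (0.025 for a 95% interval, which gives the 25th and 975th ordered elements for B = 1000); the data and the name `percentile_ci` are hypothetical.

```python
import random
import statistics

def percentile_ci(data, n_boot=1000, alpha=0.025, seed=0):
    """Percentile bootstrap CI for the mean (alpha = tail probability per side)."""
    rng = random.Random(seed)
    n = len(data)
    # Ordered bootstrap estimates M*1 <= M*2 <= ... <= M*B.
    boot = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_rank = max(round(n_boot * alpha), 1)           # 25 for B = 1000
    hi_rank = min(round(n_boot * (1 - alpha)), n_boot)  # 975 for B = 1000
    return boot[lo_rank - 1], boot[hi_rank - 1]       # ranks are 1-based

# Hypothetical sample of 10 numbers (not the values used on the slide).
data = [99.1, 100.4, 98.7, 101.2, 100.0, 99.6, 100.9, 99.3, 101.5, 99.3]
lower, upper = percentile_ci(data)
```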

  8. Bootstrap CIs in PCA: PCA of the original data $X$ gives $T$ and $P$. PCA of each bootstrap sample $X^{*1}, \dots, X^{*B}$ gives $T^{*b}$ and $P^{*b}$ for $b = 1, \dots, B$.

  9. Bootstrap CIs in PCA, re-sampling: (1) Non-parametric: bootstrap samples are constructed by random sampling with replacement from the rows (samples) of $X$. (2) Semi-parametric: an appropriate PCA model is fitted to $X$, then re-sampling is done by random sampling with replacement from the rows of $E$. (3) Parametric: the re-sampling procedure is similar to semi-parametric, but a specific distribution is assumed for $E$.
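
The three re-sampling schemes can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the PCA model is fitted by SVD, the bootstrap sample is rebuilt as fitted part plus resampled residuals, and the parametric variant assumes independent zero-mean normal noise per variable. All function names are hypothetical.

```python
import numpy as np

def _fit(X, n_components):
    """Return column means, centered PCA reconstruction and residuals E."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_hat = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
    return mean, X_hat, Xc - X_hat

def nonparametric_resample(X, rng):
    """Draw rows (samples) of X with replacement."""
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    return X[idx]

def semiparametric_resample(X, n_components, rng):
    """Keep the fitted model, draw rows of E with replacement."""
    mean, X_hat, E = _fit(X, n_components)
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    return mean + X_hat + E[idx]

def parametric_resample(X, n_components, rng):
    """Keep the fitted model, draw E* from an assumed normal distribution."""
    mean, X_hat, E = _fit(X, n_components)
    sd = E.std(axis=0)                       # assumption: one sd per variable
    return mean + X_hat + rng.normal(0.0, sd, size=X.shape)

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6))                 # hypothetical data
Xb_non = nonparametric_resample(X, rng)
Xb_semi = semiparametric_resample(X, 2, rng)
Xb_par = parametric_resample(X, 2, rng)
```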

  10. Bootstrap CIs in PCA, pre-processing: For techniques that pre-process samples together, e.g., MSC, the order is (1) re-sampling, (2) pre-processing. Mean-centering must be done after re-sampling and pre-processing.
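
The ordering can be made concrete with a short sketch. It assumes a basic form of multiplicative scatter correction (MSC) in which each spectrum is regressed on the mean spectrum of the current set; the function names and the synthetic spectra are hypothetical.

```python
import numpy as np

def msc(X):
    """Multiplicative scatter correction against the mean spectrum of X."""
    ref = X.mean(axis=0)
    corrected = np.empty_like(X, dtype=float)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, deg=1)   # x ~ intercept + slope * ref
        corrected[i] = (x - intercept) / slope
    return corrected

def bootstrap_preprocess(X, rng):
    """One non-parametric bootstrap sample, pre-processed in the stated order."""
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    Xb = X[idx]                      # 1. re-sampling
    Xb = msc(Xb)                     # 2. pre-processing on the resampled set
    return Xb - Xb.mean(axis=0)      # 3. mean-centering last

rng = np.random.default_rng(0)
base = 2.0 + np.sin(np.linspace(0.0, 3.0, 50))        # hypothetical spectrum shape
offsets = rng.uniform(-0.2, 0.2, size=(15, 1))
gains = rng.uniform(0.8, 1.2, size=(15, 1))
X = offsets + gains * base + 0.001 * rng.normal(size=(15, 50))
Xb = bootstrap_preprocess(X, rng)
```

Because MSC uses the mean spectrum of the set it is applied to, running it before re-sampling would let every bootstrap sample share the reference of the original data.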

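  11. Bootstrap CIs in PCA, rotational ambiguity: $X = TP^{T} + E = TQQ^{-1}P^{T} + E$. The components obtained from the bootstrap samples $X^{*1}, \dots, X^{*B}$ (PC1*, PC2*) can therefore be rotated relative to PC1 and PC2 of the original model. [Figure: PC1 and PC2 compared with PC1* and PC2*.]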
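
The identity $TP^{T} = TQQ^{-1}P^{T}$ is easy to check numerically. This is a small sketch with hypothetical random scores and loadings and an arbitrary rotation angle.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(6, 2))   # hypothetical scores
P = rng.normal(size=(4, 2))   # hypothetical loadings

theta = 0.7                   # arbitrary rotation angle (illustrative)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

T_rot = T @ Q                      # rotated scores
P_rot = P @ np.linalg.inv(Q).T     # chosen so that P_rot.T = Q^{-1} P^T

same_fit = np.allclose(T @ P.T, T_rot @ P_rot.T)   # identical reconstruction
same_scores = np.allclose(T, T_rot)                # but different scores
```

Here `same_fit` is true while `same_scores` is false: the fitted part of the model is unchanged even though the individual components differ.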

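  12. Bootstrap CIs in PCA, Orthogonal Procrustes Rotation: to resolve the rotational ambiguity $X = TP^{T} + E = TQQ^{-1}P^{T} + E$, compute the SVD $USV^{T} = (T^{*b})^{T}T$ and set $Q = UV^{T}$. Options: (1) rotation using $T$, (2) rotation using $P$, (3) rotation using a combination of $T$ and $P$. [Figure: PC1* and PC2* rotated onto PC1 and PC2.]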
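
The Procrustes step can be sketched with a single SVD. In this illustration the rotation is applied to loadings (option 2); the same function would apply to scores when the rows of the two score matrices correspond. The function name and test matrices are hypothetical.

```python
import numpy as np

def procrustes_rotation(A_boot, A_ref):
    """Orthogonal Q minimizing ||A_boot @ Q - A_ref|| in the Frobenius norm.

    Follows U S V^T = A_boot^T A_ref, Q = U V^T.
    """
    U, _, Vt = np.linalg.svd(A_boot.T @ A_ref)
    return U @ Vt

rng = np.random.default_rng(1)
P_ref = np.linalg.qr(rng.normal(size=(8, 2)))[0]    # hypothetical reference loadings
flip_swap = np.array([[0.0, -1.0],
                      [1.0,  0.0]])                 # swaps the PCs and flips one sign
P_boot = P_ref @ flip_swap + 0.01 * rng.normal(size=(8, 2))

Q = procrustes_rotation(P_boot, P_ref)
P_aligned = P_boot @ Q                              # bootstrap loadings rotated onto P_ref
```

After alignment, each column of `P_aligned` can be compared element-wise with the matching column of `P_ref` across bootstrap samples.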

  13. Bootstrap CIs in PCA, bootstrap CI methods: (1) Percentile: first-order accurate (coverage errors of CIs go to zero at rate $1/\sqrt{n}$); transformation respecting (CIs transform correctly if $\theta$ is changed to some function $f(\theta)$). (2) Studentized (bootstrap-t): estimates the distribution of $t$ directly from the data (two nested bootstrap loops); second-order accurate (rate $1/n$); not transformation respecting. (3) Bias-corrected and accelerated (BCa): a modified version of the Percentile method that uses two parameters called bias-correction and acceleration; second-order accurate (rate $1/n$); transformation respecting.
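
The BCa adjustment can be sketched for a scalar statistic. This is a minimal sketch, assuming the standard formulation with a bias-correction from the share of bootstrap estimates below the original estimate and an acceleration from jackknife (leave-one-out) estimates; `alpha` is the tail probability per side, ranks are rounded from `level * B`, and the data and function name are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def bca_ci(data, stat=fmean, n_boot=1999, alpha=0.025, seed=0):
    """BCa bootstrap CI for stat(data); alpha is the tail probability per side."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    nd = NormalDist()

    # Bias-correction z0: share of bootstrap estimates below the original one.
    frac = sum(b < theta_hat for b in boot) / n_boot
    frac = min(max(frac, 1 / (n_boot + 1)), n_boot / (n_boot + 1))  # keep inv_cdf finite
    z0 = nd.inv_cdf(frac)

    # Acceleration a from jackknife (leave-one-out) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jmean = fmean(jack)
    num = sum((jmean - j) ** 3 for j in jack)
    den = 6.0 * sum((jmean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted_level(p):
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def ordered_element(level):
        rank = min(max(round(level * n_boot), 1), n_boot)   # 1-based rank
        return boot[rank - 1]

    return ordered_element(adjusted_level(alpha)), ordered_element(adjusted_level(1 - alpha))

# Hypothetical sample of 10 numbers (not data from the presentation).
data = [99.1, 100.4, 98.7, 101.2, 100.0, 99.6, 100.9, 99.3, 101.5, 99.3]
lower, upper = bca_ci(data)
```

With `z0 = 0` and `a = 0` the adjusted levels reduce to `alpha` and `1 - alpha`, which recovers the Percentile interval.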

  14. One-component PCA model with 95% BCa CIs (B = 1999). [Figure: PC1 (93.7%) scores versus mole percent of 2-Propanol (40 to 60), two panels; "Loading with CIs" and "CIs for Loading (zero mean)" versus wavelength (1170 to 1210 nm).]

  15. Pure spectra. [Figure: pure spectra of 1-Propanol, Water, Ethanol and 2-Propanol versus wavelength (1170 to 1210 nm); "Loading with CIs" and "CIs for Loading (zero mean)" versus wavelength (1170 to 1210 nm).]

  16. Second data set: NIR spectra of 41 pure samples at 30 °C and 40 °C. [Figure: NIR spectra at 30 °C and NIR spectra at 40 °C; axes: wavelength, sample (41).]

  17. Conclusions: The bootstrap is a promising method for estimating CIs in PCA. Bootstrap ideas are generally hard to implement since they come from pure statistics. There are many options for bootstrapping, but not all of them are viable in practice. A good combination of options can provide reliable CIs.

  18. Thank you for your attention. Hamid Babamoradi, Frans van den Berg, Åsmund Rinnan. hamba@life.ku.dk. Quality & Technology Group, Department of Food Science.
