Critical levels in projection

Alexey Pomerantsev, Semenov Institute of Chemical Physics, Moscow. Topics: the projection approach; the score distance (SD, distance within the model) and the orthogonal distance (OD, distance to the model); where they are applied: PLS/PCR influence plots, SIMCA classification, MSPC.


Presentation Transcript


  1. Critical levels in projection. Alexey Pomerantsev, Semenov Institute of Chemical Physics, Moscow. WSC-6

  2. Projection approach

  3. Scores & Orthogonal Distances. OD: distance to the model. SD: distance within the model.

  4. Where applied: PLS/PCR influence plot; SIMCA classification; MSPC.

  5. Giants battle at ICS-L, April 2007. Svante Wold: "The ratios of residual variances of PCA are fairly well F-distributed. This is easy - the shape of the distribution of a ratio of two variances usually looks like an F." Barry Wise: "No, the residuals from PCA don't follow an F-distribution unless you fuss with the degrees of freedom, and there are better alternatives in any case."

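  6. Full PCA Decomposition: $X = T P^t$, where $X$ is $I \times J$, $T$ is $I \times K$, $P$ is $J \times K$, and $K = \mathrm{rank}(X) \le \min(I, J)$; $\Lambda = T^t T = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$.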

  7. Truncated PCA Decomposition: $X = T_A P_A^t + E_A$, $A \le K$, where $T_A$ is $I \times A$, $P_A$ is $J \times A$, and $E_A$ is $I \times J$.

  8. Score distance (SD), $h_i$. Leverage $= h_i + 1/I$; Mahalanobis distance $= (h_i)^{1/2}$.

  9. Orthogonal distance (OD), $v_i$. Variance per sample $= v_i/J$; Q statistic $= v_i$.
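
The slide formulas for $h_i$ and $v_i$ were images and did not survive transcription. A minimal sketch, assuming the usual definitions for a PCA model with mutually orthogonal score columns, $h_i = \sum_a t_{ia}^2 / \lambda_a$ with $\lambda_a = \sum_i t_{ia}^2$, and $v_i = \sum_j e_{ij}^2$. The function name and the toy matrices are illustrative only and are not taken from the presentation.

```python
from math import sqrt

def score_and_orthogonal_distances(T, E):
    """Return (h, v) for scores T (I x A) and residuals E (I x J), given as lists of rows.

    Assumes the columns of T are mutually orthogonal, as PCA scores are.
    """
    n_comp = len(T[0])
    # lambda_a: sum of squared scores of component a (a diagonal element of T^t T)
    lam = [sum(row[a] ** 2 for row in T) for a in range(n_comp)]
    # score distance: squared scores weighted by 1/lambda_a
    h = [sum(row[a] ** 2 / lam[a] for a in range(n_comp)) for row in T]
    # orthogonal distance: sum of squared residuals of each sample
    v = [sum(e * e for e in row) for row in E]
    return h, v

# Toy data (made up): I = 4 samples, A = 2 components, J = 2 variables
T = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
E = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
h, v = score_and_orthogonal_distances(T, E)

n_samples, n_vars = len(E), len(E[0])
leverage = [hi + 1.0 / n_samples for hi in h]     # leverage = h_i + 1/I
mahalanobis = [sqrt(hi) for hi in h]              # Mahalanobis = h_i^(1/2)
variance_per_sample = [vi / n_vars for vi in v]   # v_i / J
```

Under these assumed definitions the $h_i$ of the calibration samples sum to $A$, which is a quick sanity check on an implementation.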

  10. Distribution of distances: the shape? With $x = h/h_0$ or $x = v/v_0$: $x \sim \chi^2(N)/N$, where $N$ is the number of degrees of freedom (DoF), $E(x) = 1$, $D(x) = 2/N$.

  11. Example: Leon Rusinov data. I=1440, A=6. SD: $N_h$=5; OD: $N_v$=1.

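  12. Distribution of distances: DoF? For $x = h/h_0$ or $x = v/v_0$, the sample $x_1, \ldots, x_I \sim \chi^2(N)/N$ with unknown $N$. Two estimators: the method of moments, and the interquartile approach, which uses the IQR (the range left after cutting a quarter of the points from each end) of the ordered sample $x_{(1)} \le x_{(2)} \le \ldots \le x_{(I-1)} \le x_{(I)}$.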
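
A minimal sketch of both estimators, under stated assumptions. The method of moments follows from $E(x) = 1$ and $D(x) = 2/N$: for a distance $d = d_0 x$, the mean estimates $d_0$ and $N = 2 d_0^2 / D(d)$. The interquartile version assumes that $N$ is found by matching the sample IQR/median ratio to that of $\chi^2(N)/N$; the slide does not spell this out. Because the Python standard library has no chi-square quantile function, the Wilson-Hilferty approximation is swapped in for the quantiles. All names and the simulated sample are illustrative.

```python
import random
from statistics import NormalDist, fmean, quantiles, variance

def chi2_quantile_wh(p, n):
    """Approximate p-quantile of chi-square(n) by Wilson-Hilferty."""
    c = 2.0 / (9.0 * n)
    z = NormalDist().inv_cdf(p)
    return n * (1.0 - c + z * c ** 0.5) ** 3

def moments_estimate(d):
    """Method of moments: E(d) = d0 and D(d) = 2*d0**2/N give (d0, N)."""
    d0 = fmean(d)
    return d0, 2.0 * d0 ** 2 / variance(d)

def iqr_estimate(d):
    """Match the sample IQR/median to that of chi-square(N)/N (assumed form)."""
    q1, med, q3 = quantiles(d, n=4)
    target = (q3 - q1) / med

    def ratio(n):
        spread = chi2_quantile_wh(0.75, n) - chi2_quantile_wh(0.25, n)
        return spread / chi2_quantile_wh(0.5, n)

    lo, hi = 1.0, 1000.0          # search range for N (an arbitrary choice)
    for _ in range(100):          # bisection: ratio(n) decreases as n grows
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    n = 0.5 * (lo + hi)
    d0 = med * n / chi2_quantile_wh(0.5, n)   # median of d is d0 * q_0.5(N) / N
    return d0, n

# Simulated distances (made up): d = d0 * chi2(N)/N with N = 5, d0 = 0.05
random.seed(0)
TRUE_N, TRUE_D0 = 5, 0.05
sample = [TRUE_D0 * random.gammavariate(TRUE_N / 2, 2.0) / TRUE_N for _ in range(5000)]
d0_mom, n_mom = moments_estimate(sample)
d0_iqr, n_iqr = iqr_estimate(sample)
```

The moment estimator uses every point, so a single gross outlier can inflate the variance and pull $N$ down; the quartile-based version ignores the tails of the sample.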

  13. Type I error $\gamma$, I=100. $\gamma$=0.01: 1 point is out; $\gamma$=0.05: 5 points are out; $\gamma$=0.1: 11 points are out; $\gamma$=0.2: 22 points are out; $\gamma$=0.4: 43 points are out.
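
A minimal sketch of how such counts arise, assuming the critical level of a distance is $d_{crit} = d_0 \, \chi^{-2}(1-\gamma, N)/N$, which follows from $x = d/d_0 \sim \chi^2(N)/N$. The sample is simulated rather than taken from the presentation, and the chi-square quantile is approximated by Wilson-Hilferty because the standard library has no exact one. On average $\gamma I$ points fall outside the level; a single sample of I=100 will scatter around that.

```python
import random
from statistics import NormalDist

def chi2_quantile_wh(p, n):
    """Approximate p-quantile of chi-square(n) by Wilson-Hilferty."""
    c = 2.0 / (9.0 * n)
    z = NormalDist().inv_cdf(p)
    return n * (1.0 - c + z * c ** 0.5) ** 3

def critical_level(gamma, d0, n):
    """Level exceeded with probability gamma when d/d0 ~ chi2(n)/n."""
    return d0 * chi2_quantile_wh(1.0 - gamma, n) / n

def count_out(d, gamma, d0, n):
    """Number of distances above the critical level for type I error gamma."""
    crit = critical_level(gamma, d0, n)
    return sum(1 for x in d if x > crit)

# Simulated orthogonal distances (made up): I = 100, N = 20, d0 = 1
random.seed(1)
N, D0, I = 20, 1.0, 100
v = [D0 * random.gammavariate(N / 2, 2.0) / N for _ in range(I)]

GAMMAS = (0.01, 0.05, 0.1, 0.2, 0.4)
counts = {g: count_out(v, g, D0, N) for g in GAMMAS}
for g in GAMMAS:
    print(f"gamma={g}: expected {g * I:.0f}, observed {counts[g]}")
```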

  14. SIM data, MSPC task: I=100, J=25, A=5, $\gamma$=0.05.

  15. SD & OD values

  16. DoF estimates (interquartile approach, method of moments): $N_h$=5.7, $N_v$=21.6; $N_h$=5.0, $N_v$=20.0.

  17. Acceptance areas: conventional. I=100, $\gamma$=0.05.

  18. Acceptance areas: Sum of CHIs. I=100, $\gamma$=0.05.

  19. Acceptance areas: Ratio of CHIs. I=100, $\gamma$=0.05.

  20. Wilson-Hilferty approximation for $\chi^2$
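
For reference, the Wilson-Hilferty result in its usual textbook form, written for the variable $x \sim \chi^2(N)/N$ introduced on slide 10. The slide's own formula was an image and is not reproduced here, so the notation below is an assumption; the second argument of $\mathcal{N}$ is the variance.

```latex
x^{1/3} = \left(\frac{\chi^2(N)}{N}\right)^{1/3}
\;\approx\; \mathcal{N}\!\left(1 - \frac{2}{9N},\; \frac{2}{9N}\right)
\quad\Longrightarrow\quad
z = \frac{x^{1/3} - \left(1 - \dfrac{2}{9N}\right)}{\sqrt{2/(9N)}}
\;\approx\; \mathcal{N}(0, 1)
```

Applied to $h/h_0$ with $N_h$ and to $v/v_0$ with $N_v$, this maps both distances onto approximately standard normal variables.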

  21. Acceptance areas: Wilson-Hilferty. I=100, $\gamma$=0.05.

  22. Modified Wilson-Hilferty approximation: $1 - \gamma = P_0 + P_1 + P_2 + P_3 = \Phi(r) - \tfrac{1}{4}\exp(-\tfrac{1}{2} r^2)$, $r = r(\gamma)$.
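
One reading consistent with the final expression, offered as an assumption since the slide's figure is lost: for two independent standard normal variables, $P_0 = \tfrac14$ is the quadrant where both are negative, $P_1 = P_2 = \tfrac12\left(\Phi(r) - \tfrac12\right)$ are the two strips where one variable is negative and the other lies in $(0, r]$, and $P_3 = \tfrac14\left(1 - \exp(-\tfrac12 r^2)\right)$ is the quarter disc of radius $r$ in the positive quadrant. The four terms sum to $\Phi(r) - \tfrac14\exp(-\tfrac12 r^2)$. The equation has no closed-form solution for $r$, so a minimal sketch solves it by bisection; the function names are illustrative.

```python
from math import exp
from statistics import NormalDist

def coverage(r):
    """1 - gamma as a function of r: Phi(r) - exp(-r^2 / 2) / 4."""
    return NormalDist().cdf(r) - 0.25 * exp(-0.5 * r * r)

def radius(gamma, hi=10.0):
    """Solve coverage(r) = 1 - gamma for r by bisection on [0, hi]."""
    target = 1.0 - gamma
    if not coverage(0.0) <= target < coverage(hi):
        raise ValueError("gamma is outside the range this sketch handles")
    lo = 0.0
    for _ in range(200):          # coverage(r) increases with r for r >= 0
        mid = 0.5 * (lo + hi)
        if coverage(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r_05 = radius(0.05)   # radius for type I error gamma = 0.05
```

Since `coverage(0)` equals 0.25, this form can only represent $\gamma \le 0.75$, which covers every value of $\gamma$ used in the slides.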

  23. Acceptance areas: modified Wilson-Hilferty. I=100, $\gamma$=0.05.

  24. Areas validation: variation of $\gamma$

  25. BMT data, SIMCA: I=45, J=3501, A=2, $N_h$=3, $N_v$=2, $\gamma$=0.025.

  26. Extremes & Outliers in calibration set. Calibration set: I=45, $\gamma I = 0.025 \times 45 = 1.125$, $I_{out} = 2$. For outliers, $\gamma = 1 - (1 - \alpha)^{1/I}$, where $\alpha$ is the significance level for outliers. (Plot labels: extreme, outlier.)
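
A short worked example of the two quantities on this slide. The expected number of extremes is the product $\gamma I$, which for $\gamma$=0.025 and I=45 evaluates to 1.125. The outlier formula can be read as requiring $(1 - \gamma)^I = 1 - \alpha$, that is, with probability $1 - \alpha$ none of $I$ independent regular samples crosses the outlier level. The value $\alpha$=0.05 below is an assumption for illustration and does not appear on the slide; the function names are likewise illustrative.

```python
def expected_extremes(gamma, n_samples):
    """Expected number of samples beyond the critical level: gamma * I."""
    return gamma * n_samples

def outlier_gamma(alpha, n_samples):
    """Per-sample level gamma = 1 - (1 - alpha)^(1/I) for outlier detection."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_samples)

I = 45                                   # calibration set size from the slide
n_extremes = expected_extremes(0.025, I)

ALPHA = 0.05                             # assumed significance level for outliers
gamma_out = outlier_gamma(ALPHA, I)
print(f"expected extremes: {n_extremes}")
print(f"outlier gamma for alpha={ALPHA}: {gamma_out:.6f}")
```

The outlier level is far stricter than the level for extremes: it is roughly $\alpha / I$ for small $\alpha$.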

  27. SIMCA classification without G07-4. New set: $I_{new}$=30 (10 genuine + 20 fakes); $\gamma I_{new} = 0.025 \times 10 = 0.25$ (over the 10 genuine samples); $I_{out}$=3.

  28. What's up? This classification is completely wrong, but Oxana will explain how to fix it.

  29. GRAIN data. Influence plots for X and Y: I=123, J=118, A=4, $\gamma$=0.01, $N_h$=5.7, $N_v$=3.0, $N_u$=1.0.

  30. Orthogonal distance to Y

  31. Back to WSC-4

  32. Boundary samples (WSC-4): training set, Model 1, boundary subset, l=19.

  33. Influence plots for X and Y. Legend: Calibration; Boundary (SIC).

  34. I<30: Box or Egg?

  35. Conclusion 1. The $\chi^2$-distribution can be used in the modeling of the score and orthogonal distances.

  36. Conclusion 2. Any classification problem should be solved with respect to a given type I error. Five such acceptance areas have been presented, but only two are recommended. (Figure labels: I>30, I<30.)

  37. Conclusion 3. Estimation of DoF is a key challenge in projection modeling. A data-driven estimator of DoF, rather than a theory-driven one, should be used. The method of moments is effective but sensitive to outliers. The IQR estimator is a robust but less effective alternative. More examples will be demonstrated in the subsequent presentation by Oxana.
