Critical levels in projection. Alexey Pomerantsev Semenov Institute of Chemical Physics, Moscow. Projection approach. OD : distance to the model. SD : distance within the model. Scores & Orthogonal Distances. PLS/PCR Influence plot. SIMCA Classification. MSPC. Where applied.
Critical levels in projection. Alexey Pomerantsev, Semenov Institute of Chemical Physics, Moscow. WSC-6
Projection approach
Scores & Orthogonal Distances. SD: distance within the model. OD: distance to the model.
Where applied: PLS/PCR influence plot; SIMCA classification; MSPC.
Giants battle at ICS-L, April 2007. "The ratios of residual variances of PCA are fairly well F-distributed. This is easy - the shape of the distribution of a ratio of two variances usually looks like an F." (Svante Wold) "No, the residuals from PCA don't follow an F-distribution unless you fuss with the degrees of freedom, and there are better alternatives in any case." (Barry Wise)
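Full PCA Decomposition. $X = T P^t$, where $X$ is $I \times J$, $T$ is $I \times K$, and $P$ is $J \times K$; $K = \mathrm{rank}(X) \le \min(I, J)$; $\Lambda = T^t T = \mathrm{diag}(\lambda_1, \dots, \lambda_K)$.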
Truncated PCA Decomposition. $X = T_A P_A^t + E_A$, where $T_A$ is $I \times A$, $P_A$ is $J \times A$, $E_A$ is $I \times J$, and $A \le K$.
Score distance (SD), $h_i$. Leverage $= h_i + 1/I$; Mahalanobis distance $= (h_i)^{1/2}$.
Orthogonal distance (OD), $v_i$. Variance per sample $= v_i/J$; Q statistic $= v_i$.
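The slide's own formulas for $h_i$ and $v_i$ did not survive in this transcript. The sketch below assumes the usual PCA definitions, $h_i = t_i^t (T_A^t T_A)^{-1} t_i$ and $v_i = \sum_j e_{ij}^2$; these are an assumption, not text from the slide. The function names and the toy matrices are hypothetical.

```python
def score_distances(T):
    """SD: h_i = sum_a t_ia^2 / lambda_a, with lambda_a = sum_i t_ia^2.

    Assumes the columns of T are orthogonal PCA scores, so T^t T is diagonal.
    """
    n_comp = len(T[0])
    lam = [sum(row[a] ** 2 for row in T) for a in range(n_comp)]
    return [sum(row[a] ** 2 / lam[a] for a in range(n_comp)) for row in T]


def orthogonal_distances(E):
    """OD: v_i = sum_j e_ij^2, the squared length of each residual row."""
    return [sum(e ** 2 for e in row) for row in E]


# Toy data (hypothetical): I = 4 samples, A = 2 components, J = 3 variables.
T = [[1.0, 0.0], [0.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]
E = [[1.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 3.0]]

h = score_distances(T)
v = orthogonal_distances(E)
leverage = [h_i + 1.0 / len(T) for h_i in h]   # leverage = h_i + 1/I
variance_per_sample = [v_i / len(E[0]) for v_i in v]  # v_i / J
```

With these definitions the score distances sum to $A$, which is a quick sanity check on any implementation.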
Distribution of distances: the shape? $x = h/h_0$ or $x = v/v_0$; $x \sim \chi^2(N)/N$, where $N$ is the number of degrees of freedom (DoF); $E(x) = 1$, $D(x) = 2/N$.
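The two moments quoted on this slide follow from the standard moments of the chi-squared distribution. The last line is only an illustration of how a critical level could be read off the model: it assumes the scaled chi-squared model holds exactly, writes $\gamma$ for the type I error, and uses $\chi^{-2}(p, N)$ as shorthand for the $p$-quantile of $\chi^2(N)$.

```latex
\begin{align*}
E\left[\chi^2(N)\right] = N, \quad D\left[\chi^2(N)\right] = 2N
  &\;\Rightarrow\; E(x) = \frac{N}{N} = 1, \quad D(x) = \frac{2N}{N^2} = \frac{2}{N}, \\
P\left(h \le h_{\mathrm{crit}}\right) = 1 - \gamma
  &\;\Rightarrow\; h_{\mathrm{crit}} = \frac{h_0}{N}\,\chi^{-2}(1 - \gamma,\, N).
\end{align*}
```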
Example: Leon Rusinov data. $I=1440$, $A=6$, $N_h=5$, $N_v=1$. [Figure panels: SD, OD]
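Distribution of distances: DoF? $x = h/h_0$ or $x = v/v_0$; $x_1, \dots, x_I \sim \chi^2(N)/N$, $N = ?$ Two estimators: the Method of Moments and the Interquartile Approach. The latter uses the IQR of the ordered sample $x_{(1)} \le x_{(2)} \le \dots \le x_{(I-1)} \le x_{(I)}$, with ¼ of the points below the IQR and ¼ above it.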
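The slide names the two estimators without giving their equations, so the following is a sketch under stated assumptions rather than the author's implementation. The moment estimator uses $E(h) = h_0$ and $D(h) = 2h_0^2/N$, which follow from the scaled chi-squared model. The IQR estimator assumes that the sample median and IQR are matched to the corresponding chi-squared quantiles. All function names are hypothetical, and the chi-squared quantile is computed with the standard library only.

```python
import math
import statistics


def chi2_cdf(x, n):
    """Chi-squared CDF for n (possibly fractional) DoF, via the power series
    of the regularized lower incomplete gamma function P(n/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = n / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total) and k < 10000:
        k += 1
        term *= z / (a + k)
        total += term
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def chi2_ppf(p, n):
    """Chi-squared quantile by bisection on the CDF (for moderate p)."""
    lo, hi = 0.0, n + 10.0 * math.sqrt(2.0 * n) + 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dof_moments(d):
    """Method of moments: h0 = mean, N = 2 * mean^2 / variance."""
    h0 = statistics.fmean(d)
    return h0, 2.0 * h0 ** 2 / statistics.variance(d)


def sample_median_iqr(d):
    """Median and interquartile range of a sample."""
    q1, q2, q3 = statistics.quantiles(d, n=4)
    return q2, q3 - q1


def dof_iqr(median, iqr, n_lo=0.5, n_hi=200.0):
    """Assumed IQR estimator: find N whose chi-squared IQR/median ratio
    equals the sample ratio, then scale h0 so that the medians agree."""
    def ratio(n):
        q25, q50, q75 = (chi2_ppf(p, n) for p in (0.25, 0.5, 0.75))
        return (q75 - q25) / q50

    target = iqr / median
    lo, hi = n_lo, n_hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:   # the ratio shrinks as N grows
            lo = mid
        else:
            hi = mid
    n = 0.5 * (lo + hi)
    return median * n / chi2_ppf(0.5, n), n


# Moment estimate for a tiny hypothetical sample.
h0_mom, n_mom = dof_moments([1.0, 2.0, 3.0, 4.0, 5.0])

# IQR estimate from the exact median and IQR of chi2(2): 2 ln 2 and 2 ln 3.
h0_iqr, n_iqr = dof_iqr(2.0 * math.log(2.0), 2.0 * math.log(3.0))
```

For real data, `sample_median_iqr(d)` supplies the two inputs of `dof_iqr`. The moment estimate depends on the sample variance and so reacts strongly to a single extreme value, while the quartile-based estimate does not.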
Type I error $\gamma$, $I=100$. $\gamma=0.01$: 1 point is out; $\gamma=0.05$: 5 points are out; $\gamma=0.1$: 11 points are out; $\gamma=0.2$: 22 points are out; $\gamma=0.4$: 43 points are out.
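The counts on this slide can be compared with the expected number of extremes, $\gamma I$. The short check below does that arithmetic. The standard deviation it reports assumes independent samples (a binomial count), which the slide does not state; the function name is hypothetical.

```python
import math

I = 100
# Observed numbers of points outside the acceptance area, as quoted on the slide.
observed = {0.01: 1, 0.05: 5, 0.1: 11, 0.2: 22, 0.4: 43}


def expected_extremes(gamma, n_samples):
    """Mean and standard deviation of the number of extremes,
    assuming independent samples (binomial model)."""
    mean = gamma * n_samples
    sd = math.sqrt(n_samples * gamma * (1.0 - gamma))
    return mean, sd


for gamma, n_out in observed.items():
    mean, sd = expected_extremes(gamma, I)
    print(f"gamma={gamma}: expected {mean:.0f} +/- {sd:.1f}, observed {n_out}")
```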
SIM Data. MSPC task. $I=100$, $J=25$, $A=5$, $\gamma=0.05$.
SD & OD values
DoF Estimates. Interquartile Approach; Method of Moments. $N_h = 5.7$, $N_v = 21.6$; $N_h = 5.0$, $N_v = 20.0$.
Acceptance areas: conventional. $I=100$, $\gamma=0.05$.
Acceptance areas: Sum of CHIs. $I=100$, $\gamma=0.05$.
Acceptance areas: Ratio of CHIs. $I=100$, $\gamma=0.05$.
Acceptance areas: Wilson-Hilferty. $I=100$, $\gamma=0.05$.
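The transcript does not record how this acceptance area is constructed, so the sketch below shows only the Wilson-Hilferty transformation itself: for $x \sim \chi^2(N)/N$, the cube root $x^{1/3}$ is approximately normal with mean $1 - 2/(9N)$ and variance $2/(9N)$. The function name is hypothetical.

```python
import math


def wilson_hilferty_z(x, n):
    """Map x ~ chi2(n)/n to an approximately standard normal score."""
    mean = 1.0 - 2.0 / (9.0 * n)
    sd = math.sqrt(2.0 / (9.0 * n))
    return (x ** (1.0 / 3.0) - mean) / sd


# Check against a tabulated value: the 0.95 quantile of chi2(20) is about 31.410,
# so its z-score should be close to the normal 0.95 quantile, about 1.645.
z = wilson_hilferty_z(31.410 / 20.0, 20.0)
```

Applied to $h/h_0$ and $v/v_0$ with their own DoF, the transformation puts both distances on a common, approximately normal scale.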
Modified Wilson-Hilferty approximation. $1 - \gamma = P_0 + P_1 + P_2 + P_3 = \Phi(r) - \frac{1}{4}\exp\left(-\frac{1}{2} r^2\right)$, with $r = r(\gamma)$.
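The equation defines $r$ only implicitly, so it has to be solved numerically. One reading that reproduces the formula, offered as an assumption because the slide's picture is lost, is that for two independent standard normal scores $P_0 = \frac{1}{4}$ (both negative), $P_1 = P_2 = \frac{1}{2}\left(\Phi(r) - \frac{1}{2}\right)$ (one negative, the other between 0 and $r$), and $P_3 = \frac{1}{4}\left(1 - e^{-r^2/2}\right)$ (a quarter circle of radius $r$). The sketch below solves for $r(\gamma)$ by bisection; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def coverage(r):
    """Right-hand side of the slide's equation: Phi(r) - exp(-r^2 / 2) / 4."""
    return NormalDist().cdf(r) - 0.25 * math.exp(-0.5 * r * r)


def radius(gamma, tol=1e-12):
    """Solve coverage(r) = 1 - gamma for r >= 0 by bisection.

    coverage(0) = 0.25 and coverage increases with r, so a solution
    exists only for 0 < gamma < 0.75.
    """
    if not 0.0 < gamma < 0.75:
        raise ValueError("gamma must lie in (0, 0.75)")
    target = 1.0 - gamma
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r = radius(0.05)
```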
Acceptance areas: modified Wilson-Hilferty. $I=100$, $\gamma=0.05$.
BMT Data. SIMCA. $I=45$, $J=3501$, $A=2$, $N_h=3$, $N_v=2$, $\gamma=0.025$.
Extremes & Outliers in calibration set. Calibration set: $I=45$, $\gamma I = 0.025 \times 45 = 1.125$, $I_{out}=2$. $\alpha$ is the significance level for outliers: $\gamma = 1-(1-\alpha)^{1/I}$. [Figure labels: extreme, outlier]
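The two quantities on this slide are simple to compute. The sketch below evaluates the expected number of extremes, $\gamma I$, and the per-sample level from $\gamma = 1-(1-\alpha)^{1/I}$. The value $\alpha = 0.05$ is a hypothetical choice for illustration, since the slide gives no number for it, and the function name is also hypothetical.

```python
def outlier_level(alpha, n_samples):
    """Per-sample level gamma such that the chance of at least one of
    n_samples independent points exceeding it is alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_samples)


I = 45
gamma = 0.025                      # type I error quoted for the BMT data
expected_extremes = gamma * I      # expected number of extremes in the calibration set

alpha = 0.05                       # hypothetical outlier significance level
gamma_out = outlier_level(alpha, I)
```

The outlier level is much smaller than the type I error for extremes, so the outlier boundary lies outside the boundary for extremes.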
SIMCA Classification without G07-4. New set: $I_{new}=30$ (10 genuine + 20 fakes), $\gamma I_{new} = 0.025 \times 10 = 0.25$, $I_{out}=3$.
What's up? This classification is completely wrong, but Oxana will explain how to fix it.
GRAIN Data. Influence plots. $I=123$, $J=118$, $A=4$, $\gamma=0.01$, $N_h=5.7$, $N_v=3.0$, $N_u=1.0$. [Figure panels: X, Y]
Orthogonal distance to Y
Back to WSC-4
Boundary samples (WSC-4). Training set; Model 1; Boundary subset, l=19.
Influence plots for X and Y. [Figure panels: X, Y; legend: Calibration, Boundary (SIC)]
Box or Egg? [Figure label: $I<30$]
Conclusion 1. The $\chi^2$-distribution can be used to model the score and orthogonal distances.
Conclusion 2. Any classification problem should be solved with respect to a given type I error. Five acceptance areas have been presented, but only two are recommended. [Figure labels: $I>30$, $I<30$]
Conclusion 3. Estimation of DoF is a key challenge in projection modeling. A data-driven estimator of DoF should be used rather than a theory-driven one. The method of moments is efficient but sensitive to outliers. The IQR estimator is a robust but less efficient alternative. More examples will be demonstrated in the subsequent presentation by Oxana.