5. Multiway calibration
Quimiometria Teórica e Aplicada (Theoretical and Applied Chemometrics), Instituto de Química - UNICAMP
Multiway regression problems
• e.g. batch reaction monitoring
[Figure: process measurements X (batch × process variable × time) related to product quality Y (batch × product quality)]
Multiway regression problems
• e.g. tandem mass spectrometry (MS-MS)
[Figure: MS-MS spectra X1 ... X5, one per sample (parent ion m/z × daughter ion m/z), related to compound concentrations (samples × compound)]
Some terminology
• zero-order: univariate calibration (OLS, ordinary least squares). Cannot handle interferents.
• first-order: multivariate calibration (ridge regression, PCR, PLS etc.). Can handle interferents if they are present in the training set.
• N-PLS(?)
• second-order: second-order advantage (PARAFAC, restricted Tucker, GRAM, RBL etc.). Can handle unknown interferents (although see work of K. Faber).
Multiway calibration methods
• PARAFAC (already discussed on first day)
• (Unfold-PLS)
• Multiway PCR
• N-PLS
• MCovR (multiway covariates regression) (see work of Smilde & Gurden)
• GRAM, NBRA, RBL (see work of Kowalski et al.)
Unfold-PLS
• Matricize (or ‘unfold’) the data and use standard two-way PLS.
[Figure: the three-way array X (I × J × K) is unfolded slice by slice into a two-way matrix (I × JK); Y is I × M]
• But if a multiway structure exists in the data, multiway methods have some important advantages!
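The unfolding step can be sketched in a few lines of NumPy. This is only an illustration of the matricization; the function name `unfold` and the column ordering (index k·J + j) are assumptions made here, not something fixed by the slide.

```python
import numpy as np

def unfold(X):
    """Matricize a three-way array (I x J x K) into a two-way matrix (I x JK)."""
    I, J, K = X.shape
    # After the transpose the array is (I x K x J); reshaping places
    # element X[i, j, k] in column k*J + j of the unfolded matrix.
    return X.transpose(0, 2, 1).reshape(I, K * J)

X = np.arange(24, dtype=float).reshape(2, 3, 4)  # I=2, J=3, K=4
X_unf = unfold(X)
print(X_unf.shape)  # (2, 12)
```

Any standard two-way PLS routine could then be applied to `X_unf` and Y, at the cost of ignoring the multiway structure.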
Two-way PCR
Standard PCR for X (I × J) and y (I × 1).
• Calculate PCA model of X:
  $X = TP^{T} + E$
• Use PCA scores for ordinary regression:
  $y = Tb + E$
  $b = (T^{T}T)^{-1}T^{T}y$
• Make predictions for new samples:
  $T_{new} = X_{new}P$
  $y_{new} = T_{new}b$
[Figure: X decomposed into scores T and loadings $P^{T}$ plus residual E; y regressed on T with coefficients b]
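The three PCR steps map directly onto a short NumPy sketch. The function names `pcr_fit` and `pcr_predict` are hypothetical, the PCA is computed via SVD, and the data are assumed to be centred already (the slide's formulas do not show a centring step).

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Fit PCR: PCA of X, then least-squares regression of y on the scores."""
    # PCA via SVD; the loadings P are the leading right singular vectors
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T            # (J x F)
    T = X @ P                          # scores, (I x F)
    # b = (T'T)^-1 T'y
    b = np.linalg.solve(T.T @ T, T.T @ y)
    return P, b

def pcr_predict(X_new, P, b):
    """Predict: T_new = X_new P, y_new = T_new b."""
    return (X_new @ P) @ b
```

A call such as `P, b = pcr_fit(X, y, 2)` followed by `pcr_predict(X_new, P, b)` mirrors the calibration and prediction steps; the number of components would normally be chosen by validation.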
Multiway PCR
Multiway PCR for X (I × J × K) and y (I × 1).
• Calculate multiway model (X matricized to I × JK; $\odot$ is the Khatri-Rao, i.e. column-wise Kronecker, product):
  $X = A(C \odot B)^{T} + E$
• Use scores for regression:
  $y = A\,b_{PCR} + E$
  $b_{PCR} = (A^{T}A)^{-1}A^{T}y$
• Make predictions for new samples:
  $A_{new} = X_{new}P(P^{T}P)^{-1}$, where $P = (C \odot B)$
  $y_{new} = A_{new}\,b_{PCR}$
[Figure: X decomposed into scores A and loadings $B^{T}$, $C^{T}$ plus residual E; y regressed on A with coefficients $b_{PCR}$]
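The regression and prediction steps of multiway PCR can be sketched as follows, assuming the loadings B and C have already been obtained from a PARAFAC fit (not shown here). All function names are hypothetical, and X is assumed to be unfolded so that column k·J + j holds element (j, k), which matches the row ordering of C ⊙ B.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r equals kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J = B.shape[0]
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

def scores_from_loadings(X_unf, B, C):
    """Least-squares scores: A = X P (P'P)^-1 with P = C (Khatri-Rao) B."""
    P = khatri_rao(C, B)                       # (JK x R)
    return X_unf @ P @ np.linalg.inv(P.T @ P)

def multiway_pcr_fit(X_unf, y, B, C):
    """Regress y on the sample-mode scores: b = (A'A)^-1 A'y."""
    A = scores_from_loadings(X_unf, B, C)
    return np.linalg.solve(A.T @ A, A.T @ y)

def multiway_pcr_predict(X_new_unf, B, C, b_pcr):
    """Project new samples onto the fixed loadings, then apply b."""
    return scores_from_loadings(X_new_unf, B, C) @ b_pcr
```

Because only the sample-mode scores enter the regression, the number of regressors equals the number of PARAFAC components, however large J and K are.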
N-PLS
• N-PLS is a direct extension of standard two-way PLS for N-way arrays.
• The advantages of N-PLS are the same as for any multiway analysis:
  • a more parsimonious model
  • loadings which are easier to plot and interpret
N-PLS
• The standard two-way PLS algorithm (see ‘Multivariate Calibration’ by Martens and Næs): steps 1-4 [equations not recovered]
• The N-PLS algorithm (R. Bro) uses PARAFAC-type loadings, but is otherwise very similar: steps 1-4 [equations not recovered]
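Since the slide's equations are not available, the following is only a rough sketch of how a tri-linear PLS1 model (three-way X, single y) is commonly outlined: a weight vector per mode from the SVD of the X-y covariance matrix, scores from projecting X onto the combined weights, regression of y on the scores, then deflation. The function names, the unfolding order (column j·K + k) and the omission of centring are assumptions, and the details may differ from the algorithm shown in the original slide.

```python
import numpy as np

def tri_pls1_fit(X, y, n_components):
    """Sketch of tri-linear PLS1 for X (I x J x K) and y (I,)."""
    I, J, K = X.shape
    Xr = X.reshape(I, J * K).astype(float)    # column j*K + k holds x_ijk
    y = np.asarray(y, dtype=float)
    yr = y.copy()
    W, T = [], []
    for _ in range(n_components):
        Z = (Xr.T @ yr).reshape(J, K)         # Z[j, k] = sum_i y_i x_ijk
        U, _, Vt = np.linalg.svd(Z)
        w = np.kron(U[:, 0], Vt[0])           # rank-one (PARAFAC-type) weights
        t = Xr @ w                            # scores for this component
        W.append(w)
        T.append(t)
        Tm = np.column_stack(T)
        b = np.linalg.lstsq(Tm, y, rcond=None)[0]  # regress y on all scores so far
        Xr = Xr - np.outer(t, w)              # deflate X
        yr = y - Tm @ b                       # deflate y
    return np.column_stack(W), b

def tri_pls1_predict(X_new, W, b):
    """Compute scores for new samples with the stored weights, then apply b."""
    Xr = X_new.reshape(X_new.shape[0], -1).astype(float)
    T = []
    for f in range(W.shape[1]):
        t = Xr @ W[:, f]
        T.append(t)
        Xr = Xr - np.outer(t, W[:, f])
    return np.column_stack(T) @ b
```

The contrast with unfold-PLS is in the weights: each component uses J + K weight parameters rather than J·K, which is where the more parsimonious model comes from.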
Other methods
• Multiway covariates regression (MCovR)
  • different to PLS-type models
  • choice of structure on X (PARAFAC, Tucker, unfold etc.)
  • sometimes loadings are easier to interpret
• Restricted Tucker, GRAM, RBL, NBRA etc.
  • for more specialized use
  • second-order advantage, i.e. able to handle unknown interferents
[Figure: restricted loadings A with entries fixed to 1 and 0; standard: N components, mixture: N + M components]
Conclusions
• There are a number of different calibration methods for multiway data.
• N-PLS is an extension of two-way PLS for multiway data.
• All the normal guidelines for multivariate regression still apply!
  • watch out for outliers
  • don’t apply the model outside of the calibration range
Outliers (1)
• Outliers are objects which are very different from the rest of the data. These can have a large effect on the regression model and should be removed.
[Figure: regression plot with one outlying point labelled "bad experiment"; annotation: "Remove outlier"]
Outliers (2)
• Outliers can also be found in the model space or in the residuals.
[Figure: scores plot, Scores PC 2 vs Scores PC 1]
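One common way to look for both kinds of outlier is to compute, for each sample, a leverage in the score space and a residual sum of squares from a PCA model. The sketch below is an illustration under stated assumptions: the function name `pca_diagnostics` is hypothetical, the data are mean-centred inside the function, and no statistical limits are computed, so the values are only meant to be compared between samples.

```python
import numpy as np

def pca_diagnostics(X, n_components):
    """Return per-sample leverage (model space) and Q residuals (residual space)."""
    Xc = X - X.mean(axis=0)                    # mean-centre the columns
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                    # loadings
    T = Xc @ P                                 # scores
    # Leverage: squared scores scaled by each component's sum of squares
    leverage = np.sum(T**2 / np.sum(T**2, axis=0), axis=1)
    # Q residual: squared distance of each sample from the model plane
    E = Xc - T @ P.T
    q_resid = np.sum(E**2, axis=1)
    return leverage, q_resid
```

A sample with unusually high leverage is extreme within the model space (it would sit far out in the scores plot), while a high Q residual marks a sample that the model does not describe well.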
Model extrapolation...
• Univariate example: mean height vs age of a group of young children
• A strong linear relationship between height and age is seen.
• For young children, height and age are correlated.
Moore, D.S. and McCabe, G.P., Introduction to the Practice of Statistics (1989).
... can be dangerous!
• Linear model was valid for this age range...
• ...but is not valid for 30 year olds!
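The danger is easy to reproduce numerically. The ages and heights below are made-up illustrative values, not the data from Moore and McCabe; a straight line fitted to them describes the calibration range well but gives an absurd answer at 30 years.

```python
import numpy as np

# Hypothetical calibration data (NOT the original data set):
# age in months and mean height in cm for young children
age_months = np.array([18.0, 20.0, 22.0, 24.0, 26.0, 28.0])
height_cm = np.array([76.0, 77.5, 78.5, 80.0, 81.5, 82.5])

# Ordinary least-squares straight line: height = slope * age + intercept
slope, intercept = np.polyfit(age_months, height_cm, 1)

def predict_height(age):
    """Predicted height in cm from the fitted line."""
    return slope * age + intercept

inside = predict_height(23.0)          # within the calibration range
outside = predict_height(30 * 12.0)    # 30 years old: far outside the range
print(f"23 months: {inside:.1f} cm, 30 years: {outside:.1f} cm")
```

With these numbers the prediction for a 30 year old comes out at roughly 3 m, which is why a model should not be applied outside its calibration range.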