This presentation provides an overview of regression and calibration methods: Multiple Linear Regression (MLR), Ridge Regression (RR), Principal Component Regression (PCR), and Partial Least Squares (PLS). It discusses their advantages, limitations, and prediction diagnostics.
Regression / Calibration MLR, RR, PCR, PLS
Paul Geladi
Head of Research, NIRCE
Unit of Biomass Technology and Chemistry
Swedish University of Agricultural Sciences, Umeå
Technobothnia, Vasa
paul.geladi@btk.slu.se / paul.geladi@syh.fi
y = a + bx + e
[Figure: a straight line fitted to x-y data, with offset a, slope b, and error e]
[Figures: the same x-y data fitted four ways: linear fit, underfit, overfit, and quadratic fit]
y = f(x) works sometimes, and only for a few variables. Measurement noise! ∞ possible functions.
[Diagram: data matrix X (I rows × K columns) and response vector y (I × 1)]
y = f(x) is simplified by the linear approximation:
y = b0 + b1x1 + b2x2 + ... + bKxK + f
Nomenclature
y = b0 + b1x1 + b2x2 + ... + bKxK + f
- y: response
- xk: predictors
- bk: regression coefficients
- b0: offset, constant
- f: residual
[Diagram: X (I × K) and y (I × 1), both mean-centered, so b0 drops out]
y = b1x1 + b2x2 + ... + bKxK + f
written once for each of the I samples.
In matrix notation: y = Xb + f
[Diagram: y (I × 1) equals X (I × K) times b (K × 1) plus f (I × 1)]
X, y: known, measurable. b, f: unknown. As it stands there is no solution: f must be constrained (least squares minimizes f'f).
The MLR solution: Multiple Linear Regression, also called Ordinary Least Squares (OLS).
b = (X'X)^-1 X'y (least squares)
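A minimal numeric sketch of this formula, assuming numpy; the data and variable names are illustrative, not from the slides:

```python
import numpy as np

# Hypothetical data: I samples, K predictors.
rng = np.random.default_rng(0)
I, K = 20, 3
X = rng.normal(size=(I, K))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=I)

# Mean-center X and y so b0 drops out (as on the earlier slide).
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# b = (X'X)^-1 X'y: solve the normal equations rather than forming the inverse.
b = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)
f = yc - Xc @ b  # residual vector
```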
Problems?

3b1 + 4b2 = 1
4b1 + 5b2 = 0
One solution

3b1 + 4b2 = 1
4b1 + 5b2 = 0
b1 + b2 = 4
No solution

3b1 + 4b2 + b3 = 1
4b1 + 5b2 + b3 = 0
∞ solutions
b = (X'X)^-1 X'y
- K > I: ∞ solutions
- I > K: no exact solution
- error in X
- error in y
- the inverse may not exist
- the inverse may be unstable
3b1 + 4b2 + e1 = 1
4b1 + 5b2 + e2 = 0
b1 + b2 + e3 = 4
With residuals e, a least-squares solution exists.
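A quick numeric check of these systems, again assuming numpy; np.linalg.solve handles the invertible square case, and np.linalg.lstsq returns the least-squares b for the inconsistent one:

```python
import numpy as np

# One solution: the square system is invertible.
A = np.array([[3.0, 4.0],
              [4.0, 5.0]])
print(np.linalg.solve(A, np.array([1.0, 0.0])))  # [-5.  4.]

# Adding b1 + b2 = 4 makes the system inconsistent; least squares
# returns the b that minimizes the summed squared residuals e.
A3 = np.array([[3.0, 4.0],
               [4.0, 5.0],
               [1.0, 1.0]])
b, *_ = np.linalg.lstsq(A3, np.array([1.0, 0.0, 4.0]), rcond=None)
print(b)
```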
Wanted solution
- I ≥ K
- no inverse problems
- no noise in X
Diagnostics
y = Xb + f
SStot = SSmod + SSres
R² = SSmod / SStot = 1 - SSres / SStot
Coefficient of determination
Diagnostics
y = Xb + f
SSres = f'f
RMSEC = [SSres / (I - A)]^1/2
Root Mean Squared Error of Calibration (A = number of fitted parameters or components)
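A small helper computing both calibration diagnostics; the function name and signature are hypothetical:

```python
import numpy as np

def calibration_diagnostics(y, y_fit, A):
    """R² and RMSEC for a fitted calibration (illustrative helper)."""
    f = y - y_fit                             # residuals
    ss_res = f @ f                            # SSres = f'f
    ss_tot = (y - y.mean()) @ (y - y.mean())  # SStot
    r2 = 1.0 - ss_res / ss_tot                # R² = 1 - SSres/SStot
    rmsec = np.sqrt(ss_res / (len(y) - A))    # RMSEC = [SSres/(I-A)]^1/2
    return r2, rmsec
```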
Ridge Regression (RR)
b = (X'X)^-1 X'y
The identity matrix I is the easiest matrix to invert, so:
b = (X'X + kI)^-1 X'y
with the ridge constant k as small as possible.
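A one-line ridge sketch under the same assumptions (mean-centered X and y; illustrative names):

```python
import numpy as np

def ridge(X, y, k):
    """b = (X'X + kI)^-1 X'y for mean-centered X and y (illustrative)."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)
```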
Problems
- choice of the ridge constant
- no diagnostics
Principal Component Regression (PCR)
- only A ≤ I is needed, not I ≥ K
- easy inversion
Principal Component Regression (PCR)
[Diagram: PCA decomposes X (I × K) into a score matrix T (I × A)]
- A ≤ I
- T orthogonal
- noise in X removed
Principal Component Regression (PCR)
y = Td + f
d = (T'T)^-1 T'y
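A PCR sketch, assuming the PCA is done by SVD of the mean-centered X; all names are illustrative. Because the scores T are orthogonal, T'T is diagonal and the inversion reduces to a division:

```python
import numpy as np

def pcr(X, y, n_components):
    """PCR sketch: PCA scores via SVD, then regress y on T (illustrative)."""
    # PCA by SVD of mean-centered X: X = U S V'; scores T = U S.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * S[:n_components]  # I x A score matrix
    P = Vt[:n_components].T                     # K x A loadings
    # d = (T'T)^-1 T'y; T'T is diagonal, so this is just a division.
    d = (T.T @ y) / (S[:n_components] ** 2)
    return P @ d  # regression vector in terms of the original K variables
```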
Problem: how many components should be used?
Advantages
- PCA is done on the data: outliers and classes become visible
- noise in X removed
Partial Least Squares (PLS)
[Diagrams: X is decomposed into scores t with weights w' and loadings p', and Y into scores u with loadings q' (the outer relationship); the score vectors t and u are linked by the inner relationship; A components are extracted on each side]
Advantages
- X decomposed
- Y decomposed
- noise in X left out
- noise in Y left out
PCR and PLS are one-component-at-a-time methods. After each component, a residual is calculated. The next component is calculated on the residual.
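A sketch of this deflation scheme for PLS1 (a single y), following the classical NIPALS steps; the function and variable names are illustrative, and X and y are assumed mean-centered:

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 NIPALS sketch: one component at a time, deflating X and y."""
    X, y = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)   # weight vector w
        t = X @ w                # X score vector t
        p = X.T @ t / (t @ t)    # X loading p
        q = (y @ t) / (t @ t)    # inner relationship coefficient
        X -= np.outer(t, p)      # deflate: next component uses the residual
        y -= t * q
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Regression vector expressed in the original X variables.
    return W @ np.linalg.solve(P.T @ W, Q)
```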
Another view
y = Xb + f
y = XbRR + fRR
y = XbPCR + fPCR
y = XbPLS + fPLS
[Diagram: a calibration set Xcal (I × K) with ycal, and a test set Xtest (J × K) with ytest and predictions yhat]
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
PRESS = ftest'ftest
RMSEP = [PRESS / J]^1/2
Root Mean Squared Error of Prediction
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
R²test = Q² = 1 - ftest'ftest / ytest'ytest
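A matching helper for the test-set diagnostics (hypothetical names; ytest is assumed centered with the calibration mean, consistent with the formulas above):

```python
import numpy as np

def prediction_diagnostics(Xtest, ytest, b):
    """RMSEP and Q² on a test set of J samples (illustrative helper)."""
    yhat = Xtest @ b
    f = ytest - yhat                      # test residuals
    press = f @ f                         # PRESS = f'f
    rmsep = np.sqrt(press / len(ytest))   # RMSEP = [PRESS/J]^1/2
    q2 = 1.0 - press / (ytest @ ytest)    # Q² = 1 - f'f / y'y
    return rmsep, q2
```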