Regression / Calibration MLR, RR, PCR, PLS
Paul Geladi, Head of Research, NIRCE Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå / Technobothnia, Vasa. paul.geladi@btk.slu.se paul.geladi@syh.fi
Univariate regression: y = a + bx + e (offset a, slope b, residual e)
[Figure: straight-line fit of y versus x, with the slope and offset indicated]
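As a sketch, the univariate fit in NumPy (the data values here are invented for illustration):

```python
# Fit y = a + b*x by least squares; the data points are made up.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, a = np.polyfit(x, y, deg=1)   # slope b, offset a
e = y - (a + b * x)              # residuals
```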
[Figures: y versus x scatter; a linear fit that underfits; an overfit; a quadratic fit]
y = f(x) works sometimes, and only for a few variables. Measurement noise! ∞ possible functions.
[Figure: data matrix X (I samples × K variables) and response vector y (I × 1)]
y = f(x) is simplified by a linear approximation: y = b0 + b1x1 + b2x2 + ... + bKxK + f
Nomenclature
y = b0 + b1x1 + b2x2 + ... + bKxK + f
y: response
xk: predictors
bk: regression coefficients
b0: offset, constant
f: residual
X and y mean-centered, so b0 drops out. [Figure: mean-centered X (I × K) and y]
yi = b1xi1 + b2xi2 + ... + bKxiK + fi (one such equation for each of the I samples)
In matrix notation: y = Xb + f, with y (I × 1), X (I × K), b (K × 1) and f (I × 1)
X, y: known, measurable. b, f: unknown. There is no solution as such: f must be constrained.
The MLR solution: Multiple Linear Regression, Ordinary Least Squares (OLS)
b = (X'X)^-1 X'y (least squares). Problems?
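A minimal NumPy sketch of this OLS solution; the array shapes and data values below are assumptions for illustration only:

```python
# OLS/MLR sketch: b = (X'X)^-1 X'y, assuming X (I x K) and y (I,)
# are already mean-centered. The data is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))                    # I = 20, K = 3
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(20)

# Solving the normal equations directly is safer than forming the inverse.
b = np.linalg.solve(X.T @ X, X.T @ y)
f = y - X @ b                                       # residual vector
```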
3b1 + 4b2 = 1
4b1 + 5b2 = 0
→ one solution

3b1 + 4b2 = 1
4b1 + 5b2 = 0
b1 + b2 = 4
→ no solution

3b1 + 4b2 + b3 = 1
4b1 + 5b2 + b3 = 0
→ ∞ solutions
b = (X'X)^-1 X'y
- K > I: ∞ solutions
- I > K: no exact solution
- error in X
- error in y
- inverse may not exist
- inverse may be unstable
3b1 + 4b2 + e1 = 1
4b1 + 5b2 + e2 = 0
b1 + b2 + e3 = 4
→ a least-squares solution exists once residuals e are allowed
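These cases can be checked numerically; a sketch with NumPy's least-squares solver, which returns the unique solution, the least-squares compromise, or one (minimum-norm) member of the ∞ solutions, respectively:

```python
# The three small systems above, solved with least squares.
import numpy as np

A1 = np.array([[3., 4.], [4., 5.]]);           y1 = np.array([1., 0.])
A2 = np.array([[3., 4.], [4., 5.], [1., 1.]]); y2 = np.array([1., 0., 4.])
A3 = np.array([[3., 4., 1.], [4., 5., 1.]]);   y3 = np.array([1., 0.])

for A, y in [(A1, y1), (A2, y2), (A3, y3)]:
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(b, A @ b - y)   # coefficients and residuals
```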
Wanted solution
• I ≥ K
• no inversion problems
• no noise in X
Diagnostics: y = Xb + f
SStot = SSmod + SSres
R2 = SSmod / SStot = 1 - SSres / SStot (coefficient of determination)
Diagnostics: y = Xb + f
SSres = f'f
RMSEC = [SSres / (I - A)]^1/2, with A the number of fitted components
(Root Mean Squared Error of Calibration)
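A sketch of these calibration diagnostics as a function, assuming a mean-centered y and the residual f from the OLS sketch above:

```python
# Calibration diagnostics: R2 and RMSEC, assuming y is mean-centered
# (so SStot = y'y) and A fitted parameters/components.
import numpy as np

def calibration_diagnostics(y, f, A):
    ss_res = f @ f
    ss_tot = y @ y
    r2 = 1.0 - ss_res / ss_tot
    rmsec = np.sqrt(ss_res / (len(y) - A))
    return r2, rmsec

# e.g. r2, rmsec = calibration_diagnostics(y, f, A=3)
```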
Ridge Regression (RR)
b = (X'X)^-1 X'y: the identity matrix I is the easiest matrix to invert, so
b = (X'X + kI)^-1 X'y, with the ridge constant k as small as possible
Problems
- choice of the ridge constant
- no diagnostics
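A minimal ridge regression sketch; the value of k used here is an arbitrary illustration, not a recommendation:

```python
# Ridge regression: b = (X'X + kI)^-1 X'y, with I the identity matrix.
import numpy as np

def ridge(X, y, k):
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(K), X.T @ y)

# e.g. b_rr = ridge(X, y, k=0.1) with X, y from the OLS sketch above
```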
Principal Component Regression (PCR)
• I ≥ K
• easy inversion
Principal Component Regression (PCR)
PCA decomposes X (I × K) into A score vectors T (I × A)
• A ≤ I
• T orthogonal
• noise in X removed
Principal Component Regression (PCR)
y = Td + f
d = (T'T)^-1 T'y (T is orthogonal, so T'T is diagonal and trivial to invert)
Problem: how many components should be used?
Advantages
- PCA is done on the data
- outliers become visible
- classes become visible
- noise in X removed
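A PCR sketch under the assumptions above (mean-centered X and y; A chosen by the user):

```python
# PCR: PCA on X via the SVD, then regress y on the first A scores T.
import numpy as np

def pcr(X, y, A):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :A] * s[:A]                    # scores, I x A, orthogonal
    d = np.linalg.solve(T.T @ T, T.T @ y)   # T'T is diagonal: trivial inverse
    return Vt[:A].T @ d                     # b in the original K variables

# e.g. b_pcr = pcr(X, y, A=2) with X, y from the OLS sketch above
```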
PLS (Partial Least Squares)
[Figure: X block with scores t and weights w'; Y block with scores u and loadings q']
Outer relationship: the scores t summarize X; the scores u summarize Y
Inner relationship: the X scores t predict the Y scores u
[Figure: after A components, X is described by scores t and loadings p', Y by scores u and loadings q']
Advantages
- X decomposed
- Y decomposed
- noise in X left out
- noise in Y left out
PCR and PLS are one-component-at-a-time methods: after each component, a residual is calculated, and the next component is calculated on that residual.
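A sketch of this one-component-at-a-time idea for PLS with a single response (a NIPALS-style PLS1 loop; mean-centered X and y assumed):

```python
# PLS1, one component at a time: compute a component, deflate X and y,
# and compute the next component on the residuals.
import numpy as np

def pls1(X, y, A):
    Xr, yr = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(A):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # X weights w
        t = Xr @ w                    # X scores t
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings p
        c = (yr @ t) / tt             # inner-relationship coefficient
        Xr -= np.outer(t, p)          # deflate X: residual for next component
        yr -= c * t                   # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)   # regression vector b_PLS

# e.g. b_pls = pls1(X, y, A=2) with X, y from the OLS sketch above
```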
Another view
y = Xb + f
y = XbRR + fRR
y = XbPCR + fPCR
y = XbPLS + fPLS
Each method yields its own regression vector b and residual f.
[Figure: calibration set Xcal (I × K) with ycal; test set Xtest (J × K) with predictions yhat and measured ytest]
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
PRESS = ftest'ftest
RMSEP = [PRESS / J]^1/2 (Root Mean Squared Error of Prediction)
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
R2test = Q2 = 1 - ftest'ftest / ytest'ytest
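A sketch of these prediction diagnostics, assuming Xtest and ytest are centered with the calibration means and b comes from any of the methods above:

```python
# Prediction diagnostics on a test set of J samples: RMSEP and Q2.
import numpy as np

def prediction_diagnostics(Xtest, ytest, b):
    yhat = Xtest @ b
    ftest = ytest - yhat
    press = ftest @ ftest                # PRESS = ftest'ftest
    rmsep = np.sqrt(press / len(ytest))  # [PRESS / J]^1/2
    q2 = 1.0 - press / (ytest @ ytest)   # assumes ytest is mean-centered
    return rmsep, q2
```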