
From linearity to nonlinear additive spline modeling in Partial Least-Squares regression

Scuola della Società Italiana di Statistica, Capua, 2004/09/15. Jean-François Durand, Montpellier II University.

Presentation Transcript


  1. Scuola della Società Italiana di Statistica, Capua, 2004/09/15. From linearity to nonlinear additive spline modeling in Partial Least-Squares regression. Jean-François Durand, Montpellier II University

  2. Main effects Linear Partial Least-Squares (PLSL)
     • p predictors (continuous or categorical)
     • Learning data matrices: X (n × p), r = rank(X), and Y (n × q)
     • q responses (continuous or categorical):
       - continuous: regression model
       - categorical, coded as q indicator variables: classification model
     • All variables are standardized with respect to …
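The preprocessing on slide 2 can be sketched in NumPy. This is a minimal sketch, assuming uniform observation weights 1/n for the standardization (the slide's exact weighting is lost) and one categorical response coded as indicator columns; `standardize` and `indicator_coding` are hypothetical helper names, and the data are made up.

```python
import numpy as np

def standardize(M):
    """Center each column and scale it to unit variance.

    Assumes uniform observation weights 1/n (np.std's default ddof=0);
    the slide's exact weighting is not recoverable.
    """
    M = np.asarray(M, dtype=float)
    std = M.std(axis=0)
    std[std == 0] = 1.0  # constant columns are only centered, never divided by 0
    return (M - M.mean(axis=0)) / std

def indicator_coding(labels):
    """Code one categorical response as 0/1 indicator columns, one per class."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    return (labels[:, None] == classes[None, :]).astype(float), classes

# Hypothetical learning data: n = 6 observations, p = 2 predictors
X = np.array([[1.0, 10.0], [2.0, 11.0], [3.0, 9.0],
              [4.0, 12.0], [5.0, 8.0], [6.0, 10.0]])
labels = np.array(["a", "b", "a", "c", "b", "a"])

Xs = standardize(X)
Y, classes = indicator_coding(labels)  # classification model: q = 3 indicators
Ys = standardize(Y)                    # responses are standardized too
print(classes, Y.shape)  # ['a' 'b' 'c'] (6, 3)
```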

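  3. k latent variables $t^1, \dots, t^k$
     • Algorithm: set $X_0 = X$, $Y_0 = Y$; for $i = 1, \dots, k$:
       (1) $(w^i, c^i) = \arg\max_{\|w\| = \|c\| = 1} \operatorname{cov}(X_{i-1} w,\, Y_{i-1} c)$, giving $t^i = X_{i-1} w^i$
       (2) Once $t^i$ is obtained, « partial » regressions on $t^i$ are made: $X_i = X_{i-1} - P_{t^i} X_{i-1}$ and $Y_i = Y_{i-1} - P_{t^i} Y_{i-1}$, where $P_t = t\,t' / (t' t)$
     • and next $t^{i+1}$ is computed on the remaining information $(X_i, Y_i)$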
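Steps (1) and (2) can be sketched as follows. This is a minimal NumPy sketch, assuming centered data, unit-norm weights and uniform observation weights; it uses the fact that the maximizers of $\operatorname{cov}(X_{i-1} w, Y_{i-1} c)$ are the leading singular vectors of $X_{i-1}' Y_{i-1}$. The function name `pls_latent_variables` and the simulated data are hypothetical.

```python
import numpy as np

def pls_latent_variables(X, Y, k):
    """Return T = [t^1 ... t^k] and the weights W = [w^1 ... w^k].

    X (n x p) and Y (n x q) are assumed centered (and standardized).
    """
    Xi, Yi = X.astype(float).copy(), Y.astype(float).copy()
    T, W = [], []
    for _ in range(k):
        # Step (1): the unit vector w maximizing cov(Xi w, Yi c) is the
        # leading left singular vector of the cross-product Xi' Yi.
        w = np.linalg.svd(Xi.T @ Yi, full_matrices=False)[0][:, 0]
        t = Xi @ w
        # Step (2): partial regressions of Xi and Yi on t; the residuals are
        # the "remaining information" used for the next latent variable.
        tt = t @ t
        Xi = Xi - np.outer(t, t @ Xi) / tt
        Yi = Yi - np.outer(t, t @ Yi) / tt
        T.append(t)
        W.append(w)
    return np.column_stack(T), np.column_stack(W)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = X @ np.array([[1.0], [0.5], [0.0], [-0.3]]) + 0.1 * rng.standard_normal((50, 1))
Y = Y - Y.mean(axis=0)

T, W = pls_latent_variables(X, Y, k=3)
print(np.round(T.T @ T, 6))  # off-diagonal entries vanish: the t^i are orthogonal
```

Because each $X_i$ is orthogonal to $t^1, \dots, t^i$, every new latent variable is uncorrelated with the previous ones, which is why the final OLS on $t^1, \dots, t^k$ is well conditioned.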

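  4. OLS model on the k latent variables:
     $\hat Y^j(k) = \sum_{i=1}^{k} \hat\gamma^j_i\, t^i = \sum_{l=1}^{p} \beta^j_l(k)\, X^l$, the second form holding because each $t^i$ is a linear combination of the columns of X
     • the « coordinate » linear function $x \mapsto \beta^j_l(k)\, x$ of $X^l$ is the main effect of $X^l$ on the response $Y^j$
     • To summarize: PLSL(X, Y)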
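Rewriting the OLS fit on the latent variables as coefficients on the original predictors can be sketched like this. It is a minimal NumPy sketch under the same assumptions as slide 3 (centered data, unit-norm weights); it tracks a matrix `M` with $X_{i-1} = X M$ so that each $t^i = X r^i$. The name `plsl_coefficients` and the data are hypothetical.

```python
import numpy as np

def plsl_coefficients(X, Y, k):
    """Linear PLS: OLS of Y on k latent variables, rewritten on the columns of X.

    X, Y are assumed centered. Returns (beta, T, c) with X @ beta = T @ c,
    so beta[l, j] is the slope of the coordinate function of X^l for Y^j.
    """
    p = X.shape[1]
    Xi = X.astype(float).copy()
    M = np.eye(p)  # invariant: Xi = X @ M
    R, T = [], []
    for _ in range(k):
        # Xi' Y equals Xi' Y_{i-1}: Xi is orthogonal to the previous t's,
        # so deflating Y explicitly is not needed to get the next weight.
        w = np.linalg.svd(Xi.T @ Y, full_matrices=False)[0][:, 0]
        t = Xi @ w
        loading = Xi.T @ t / (t @ t)
        R.append(M @ w)                        # t = X @ (M @ w)
        T.append(t)
        Xi = Xi - np.outer(t, loading)         # partial regression of Xi on t
        M = M @ (np.eye(p) - np.outer(w, loading))
    R, T = np.column_stack(R), np.column_stack(T)
    c = np.linalg.lstsq(T, Y, rcond=None)[0]   # OLS of Y on t^1..t^k
    return R @ c, T, c

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = X @ rng.standard_normal((4, 2)) + 0.1 * rng.standard_normal((50, 2))
Y = Y - Y.mean(axis=0)

beta, T, c = plsl_coefficients(X, Y, k=2)
for l in range(X.shape[1]):
    print(f"main effect of X{l + 1} on Y1: x -> {beta[l, 0]:+.3f} * x")
```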

  5. The dimension of the model: k
     • chosen by cross-validation (CV or GCV)
     • if k = r, PLSL(X, Y) = OLS(X, Y)
     • if Y = X, PLSL(X, Y = X) = PCA(X)
     • Pruning step: variable subset selection (CV or GCV)
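The two special cases on slide 5 can be checked numerically. A minimal NumPy sketch on random data (hypothetical `plsl` helper, centered data assumed): with k = rank(X) the PLS coefficients match the least-squares solution, and with Y = X the latent variables coincide, up to sign, with the principal component scores $s_i u_i$ from the SVD of X.

```python
import numpy as np

def plsl(X, Y, k):
    """Linear PLS on centered X, Y; returns (beta, T) with X @ beta = fitted Y."""
    p = X.shape[1]
    Xi, M = X.astype(float).copy(), np.eye(p)  # invariant: Xi = X @ M
    R, T = [], []
    for _ in range(k):
        w = np.linalg.svd(Xi.T @ Y, full_matrices=False)[0][:, 0]
        t = Xi @ w
        loading = Xi.T @ t / (t @ t)
        R.append(M @ w)
        T.append(t)
        Xi = Xi - np.outer(t, loading)
        M = M @ (np.eye(p) - np.outer(w, loading))
    R, T = np.column_stack(R), np.column_stack(T)
    c = np.linalg.lstsq(T, Y, rcond=None)[0]
    return R @ c, T

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = rng.standard_normal((40, 1))
Y = Y - Y.mean(axis=0)

# k = r = rank(X): PLSL coincides with OLS
beta_pls, _ = plsl(X, Y, k=np.linalg.matrix_rank(X))
beta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(beta_pls, beta_ols))

# Y = X: the latent variables are the principal components of X (up to sign)
_, T = plsl(X, X, k=2)
U, s, _ = np.linalg.svd(X, full_matrices=False)
print(np.allclose(np.abs(T), np.abs(U[:, :2] * s[:2])))
```

Intermediate values of k, between these two extremes, are where cross-validation earns its keep.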

  6. Maps of the observations

  7. Main effects Partial Least-Squares Splines (PLSS)
     • Additive model through k latent variables: $\hat Y^j(k) = \sum_{l=1}^{p} s^j_l(X^l)$
     • the « coordinate » spline function $s^j_l$ of $X^l$ is the main effect of $X^l$ on the response $Y^j$
     • To summarize: PLSS(X, Y) = PLSL(B, Y), where B is the spline coding matrix of X
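The identity PLSS(X, Y) = PLSL(B, Y) can be sketched by coding each predictor with a spline basis, running linear PLS on the centered coding matrix, and splitting the coefficients by predictor to recover each coordinate spline. This is a minimal NumPy sketch: for brevity it uses a truncated power basis (which spans the same spline space as B-splines with the same degree and knots) with the constant column dropped, since centering absorbs it; the knots assume predictors in [0, 1]. All names and data are hypothetical.

```python
import numpy as np

def spline_coding(x, degree=3, knots=(0.25, 0.5, 0.75)):
    """Truncated power basis of one predictor, constant column dropped:
    x, ..., x^d and (x - kappa)_+^d for each knot."""
    cols = [x ** e for e in range(1, degree + 1)]
    cols += [np.clip(x - kappa, 0.0, None) ** degree for kappa in knots]
    return np.column_stack(cols)

def plsl(X, Y, k):
    """Linear PLS on centered X, Y; returns beta with X @ beta = fitted Y."""
    p = X.shape[1]
    Xi, M = X.astype(float).copy(), np.eye(p)  # invariant: Xi = X @ M
    R, T = [], []
    for _ in range(k):
        w = np.linalg.svd(Xi.T @ Y, full_matrices=False)[0][:, 0]
        t = Xi @ w
        loading = Xi.T @ t / (t @ t)
        R.append(M @ w)
        T.append(t)
        Xi = Xi - np.outer(t, loading)
        M = M @ (np.eye(p) - np.outer(w, loading))
    R, T = np.column_stack(R), np.column_stack(T)
    return R @ np.linalg.lstsq(T, Y, rcond=None)[0]

def plss(X, Y, k, degree=3, knots=(0.25, 0.5, 0.75)):
    """PLSS(X, Y) = PLSL(B, Y): returns the fitted values and the coordinate
    spline functions evaluated at the observations (shape n x p x q)."""
    blocks = [spline_coding(X[:, l], degree, knots) for l in range(X.shape[1])]
    B = np.column_stack(blocks)
    B = B - B.mean(axis=0)              # center the spline coding matrix
    y_mean = Y.mean(axis=0)
    beta = plsl(B, Y - y_mean, k)
    # block l of B times its coefficients gives the main effect s_l(X^l)
    edges = np.cumsum([0] + [blk.shape[1] for blk in blocks])
    effects = np.stack([B[:, a:b] @ beta[a:b]
                        for a, b in zip(edges[:-1], edges[1:])], axis=1)
    return y_mean + B @ beta, effects

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(80, 3))  # predictors in [0, 1] to match the knots
Y = (np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2)[:, None] \
    + 0.1 * rng.standard_normal((80, 1))
fitted, effects = plss(X, Y, k=4)
print(effects.shape)  # (80, 3, 1): one spline main effect per predictor and response
```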

  8. Pruning step: parsimonious models obtained by selecting main effects according to the range of the spline functions
     • Validation of the new models: CV or GCV
     • Principal components maps
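The ranking behind the pruning step can be sketched by computing, for each predictor, the range (max minus min) of its coordinate spline function over the observations. This minimal NumPy sketch assumes the main-effect values are already available; the values below are made up and `rank_by_range` is a hypothetical name. Any cut-off for dropping a predictor is left to the CV/GCV validation of the reduced model.

```python
import numpy as np

def rank_by_range(effects, names):
    """Rank main effects by the range max - min of their values over the observations."""
    ranges = effects.max(axis=0) - effects.min(axis=0)
    order = np.argsort(ranges)[::-1]  # largest range first
    return [(names[i], float(ranges[i])) for i in order]

# Hypothetical main-effect values s_l(x_il): n = 5 observations, p = 3 predictors
effects = np.array([[ 0.9,  0.05, -0.4],
                    [-1.1,  0.02,  0.3],
                    [ 0.2, -0.03,  0.1],
                    [ 0.5,  0.01, -0.2],
                    [-0.5, -0.05,  0.2]])
ranking = rank_by_range(effects, ["X1", "X2", "X3"])
print(ranking)
```

Here X2 has the smallest range, so it would be the first candidate for removal; the model without it is then refitted and kept only if CV or GCV does not deteriorate.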

  9. Tuning parameters
     • the PLS dimension k (CV or GCV)
     • the spline space for each predictor: the degree d and the « knots » (their number K and their locations)
     • dimension of the spline space: d + 1 + K
     Advantages of PLSS
     • handles collinearity of predictors
     • handles a small ratio #observations / #predictors
     • main effect spline functions that are easy to interpret
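The dimension d + 1 + K can be checked by building the B-spline basis directly. This is a minimal NumPy sketch of the Cox-de Boor recursion, assuming clamped knots on a hypothetical interval [0, 1] (boundary knots repeated d + 1 times) and an illustrative choice of degree 3 with 3 interior knots; `bspline_basis` is a hypothetical name. The basis functions are nonnegative and sum to one at every point.

```python
import numpy as np

def bspline_basis(x, degree, interior_knots, lo=0.0, hi=1.0):
    """Evaluate the d + 1 + K B-spline basis functions at points x in [lo, hi]."""
    x = np.asarray(x, dtype=float)
    t = np.concatenate([[lo] * (degree + 1), interior_knots, [hi] * (degree + 1)])
    # degree 0: indicators of the knot intervals [t_j, t_{j+1})
    B = np.zeros((x.size, len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # assign x == hi to the last non-empty interval so the right end is covered
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == hi, last] = 1.0
    # Cox-de Boor recursion: each pass lowers the column count by one
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(t) - d - 1))
        for j in range(len(t) - d - 1):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            if left > 0:
                B_new[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (t[j + d + 1] - x) / right * B[:, j + 1]
        B = B_new
    return B  # len(t) - degree - 1 = d + 1 + K columns

x = np.linspace(0.0, 1.0, 11)
B = bspline_basis(x, degree=3, interior_knots=[0.25, 0.5, 0.75])
print(B.shape)  # (11, 7): d + 1 + K = 3 + 1 + 3
```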

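  10. Multivariate Additive PLS Splines: MAPLSS (bivariate interactions)
      • Model cast in the ANOVA decomposition:
        $\hat Y^j(k) = \sum_{l} s^j_l(X^l) + \sum_{l < l'} s^j_{ll'}(X^l, X^{l'})$
      • ANOVA spline functions: main effects $s^j_l$ and bivariate interactions $s^j_{ll'}$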
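A bivariate interaction term needs its own columns in B. One common coding, assumed here rather than taken from the slides, is the row-wise tensor product of the two univariate bases: each column is the product of one basis function of $X^l$ and one of $X^{l'}$. This minimal NumPy sketch uses a hypothetical degree-1 hinge basis to keep the column counts small.

```python
import numpy as np

def hinge_basis(x, knots):
    """Degree-1 truncated power basis (constant dropped): x and (x - kappa)_+."""
    return np.column_stack([x] + [np.clip(x - kappa, 0.0, None) for kappa in knots])

def interaction_coding(B1, B2):
    """Row-wise tensor product: column a * m2 + b is B1[:, a] * B2[:, b]."""
    n = B1.shape[0]
    return (B1[:, :, None] * B2[:, None, :]).reshape(n, -1)

rng = np.random.default_rng(3)
x1, x2 = rng.uniform(size=20), rng.uniform(size=20)
B1 = hinge_basis(x1, knots=[0.5])        # 2 columns
B2 = hinge_basis(x2, knots=[0.3, 0.7])   # 3 columns
B12 = interaction_coding(B1, B2)
print(B12.shape)  # (20, 6): the interaction block has 2 * 3 columns
```

Under this coding the MAPLSS matrix is [B1 | B2 | B12], and linear PLS on it fits both main effects and the interaction surface.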

  11. The curse of dimensionality
      • The price of nonlinearity: expansion of the dimension of B
        MAPLSS(X, Y) = PLSL(B, Y), where B is the spline coding matrix of X with interactions
      • Example: p predictors → (p - 1)p / 2 possible interactions; spline dimension = 10 for each predictor
      • Hence the necessity of eliminating non-influential interactions
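The arithmetic of the example can be made concrete. This sketch assumes each interaction is coded as a tensor product of two 10-dimensional spline bases (10 × 10 = 100 columns), which is an assumption rather than a figure from the slide; `maplss_columns` is a hypothetical name and the values of p are illustrative.

```python
from math import comb

def maplss_columns(p, spline_dim=10):
    """Columns of B for p predictors with all bivariate interactions,
    assuming each interaction is coded by a tensor product of two bases."""
    main = p * spline_dim
    pairs = comb(p, 2)                      # (p - 1) p / 2 possible interactions
    total = main + pairs * spline_dim ** 2
    return main, pairs, total

for p in (5, 10, 50):
    print(p, maplss_columns(p))
# 5 (50, 10, 1050)
# 10 (100, 45, 4600)
# 50 (500, 1225, 123000)
```

The interaction columns grow quadratically in p and quickly dwarf the main effects, which is why candidate interactions must be screened before fitting.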

  12. 1) Automatic selection of candidate interactions:
      • Denote CRIT(k) = … or …
      • each interaction i is separately added to the main effects model m and evaluated
      • Rule: order the interactions by decreasing CRIT(k); refuse one if CRIT(k) < 0
      2) Add the ordered candidates step by step to the main effects model, and accept a model only if it significantly improves CV
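The two selection phases can be sketched as plain Python. Since the exact CRIT(k) formulas are not recoverable, this sketch takes precomputed CRIT values as input. It also assumes that "improves CV" means lowering a CV error by at least a hypothetical threshold `min_gain`. The CRIT values and the toy CV function are made up for illustration.

```python
def order_candidates(crit):
    """Phase 1: drop interactions with CRIT(k) < 0, sort the rest by decreasing CRIT."""
    kept = [(pair, value) for pair, value in crit.items() if value >= 0]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in kept]

def forward_add(candidates, cv_error, min_gain=0.01):
    """Phase 2: add ordered candidates one by one to the main-effects model and
    keep one only if it lowers the CV error by at least min_gain."""
    model = []                      # main-effects model: no interaction yet
    best = cv_error(model)
    for pair in candidates:
        err = cv_error(model + [pair])
        if best - err >= min_gain:
            model, best = model + [pair], err
    return model, best

# Hypothetical CRIT(k) values and a toy CV error, for illustration only
crit = {(1, 2): 0.30, (1, 3): -0.05, (2, 3): 0.12, (3, 4): 0.01}
gain = {(1, 2): 0.20, (2, 3): 0.05, (3, 4): 0.001}

def toy_cv_error(model):
    return 1.0 - sum(gain[pair] for pair in model)

candidates = order_candidates(crit)
model, err = forward_add(candidates, toy_cv_error)
print(candidates)  # [(1, 2), (2, 3), (3, 4)]
print(model)       # [(1, 2), (2, 3)]
```

In a real fit, `cv_error` would refit MAPLSS with the given interactions and return its cross-validated error.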

  13. 3) Pruning step: selection of main effects and interactions according to the range of the ANOVA functions (CV/GCV)
      Advantages of MAPLSS:
      • inherits the advantages of PLSL and PLSS
      • captures the most influential bivariate interactions
      • easily interpretable ANOVA function plots
      Disadvantages of MAPLSS:
      • no higher-order interactions
      • no automatic selection of spline parameters

  14. Bibliography
      • J. F. Durand. Local Polynomial Additive Regression through PLS and Splines: PLSS. Chemometrics and Intelligent Laboratory Systems 58, 235-246, 2001.
      • J. F. Durand and R. Lombardo. Interactions terms in nonlinear PLS via additive spline transformations. In « Between Data Science and Applied Data Analysis », Studies in Classification, Data Analysis, and Knowledge Organization, Eds. M. Schader, W. Gaul and M. Vichi, Springer, 22-29, 2003.
