Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius. The potential of Functional Data Analysis for Chemometrics. The Potential of FDA for Chemometrics. Introduction to FDA Introduction to Chemometrics Using FDA in chemometrics For prediction For Analysis Of Variance
Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius The potential of Functional Data Analysis for Chemometrics
The Potential of FDA for Chemometrics • Introduction to FDA • Introduction to Chemometrics • Using FDA in chemometrics • For prediction • For Analysis Of Variance • Conclusions
What is Functional Data Analysis? • Developed by Ramsay & Silverman (1997) • Analyse Data • By approximating it • Using some kind of functional basis • Mainly for longitudinal data • High correlation between neighbouring datapoints
Why use FDA? • Treat the data as a single entity <-> individual observations • Make a function of your data • Derivatives • Reduce the amount of data • Noise -> smoothing • Impose some known properties on the data • Monotonicity, non-negativity, smoothness, ...
Basis Functions? • Polynomials: 1, t, t², t³, ... • Fourier: 1, sin(ωt), cos(ωt), sin(2ωt), cos(2ωt) • Splines • Wavelets • Depends on your data
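As a minimal sketch of what "approximating data with a functional basis" can look like, the following Python snippet (NumPy assumed; the signal, the noise level and the number of harmonics are invented for illustration and do not come from the slides) fits a small Fourier basis to a noisy periodic signal by least squares. The 200 noisy samples are reduced to 5 coefficients, and the fitted function is smooth.

```python
import numpy as np

def fourier_basis(t, n_harmonics, period):
    """Design matrix with columns 1, sin(wt), cos(wt), sin(2wt), cos(2wt), ..."""
    omega = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(k * omega * t))
        cols.append(np.cos(k * omega * t))
    return np.column_stack(cols)

# Synthetic example data (an assumption, not taken from the slides).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200, endpoint=False)
signal = 2.0 + np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)
y = signal + 0.1 * rng.standard_normal(t.size)

# Least-squares projection of the noisy samples onto the basis.
Phi = fourier_basis(t, n_harmonics=2, period=1.0)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# The functional representation evaluated on the original grid.
smooth = Phi @ coef
```

The same pattern applies to the other bases listed above: only the function that builds the design matrix changes.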
Chemometrics • Measure optical properties of material • Transmission or reflection of light • At a large number of wavelengths • Use these properties to predict something else
Why Chemometrics? • Fast • Cheap • Non-destructive • Environment-friendly
Classical methods • Ignore correlation between neighbouring wavelengths:
FDA in chemometrics • NIR spectra • Absorption peaks • Width and height • Basis: B-splines • ~ shape of absorption peaks • Preserve the vicinity constraint
Spline Functions • Piecewise polynomials of order m, joined at the knots • Fast evaluation • Continuity of derivatives • Up to order m-2 • At the L interior knots • Degrees of freedom: L + m • Flexible
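The count of L + m degrees of freedom can be checked numerically. The sketch below builds a B-spline basis with the Cox-de Boor recursion, assuming a clamped knot vector (boundary knots repeated m times) and equally spaced interior knots; the helper name `bspline_basis` and all numeric settings are illustrative choices, not part of the slides.

```python
import numpy as np

def bspline_basis(x, interior_knots, order, lo, hi):
    """Evaluate all B-spline basis functions of the given order at x."""
    # Clamped knot vector: each boundary knot is repeated `order` times.
    t = np.concatenate([[lo] * order, interior_knots, [hi] * order])
    x = np.asarray(x, dtype=float)
    # Order-1 (piecewise constant) functions: indicators of knot intervals.
    B = np.zeros((x.size, len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Assign the right end point to the last non-empty interval.
    B[x == hi, len(t) - order - 1] = 1.0
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, order):
        Bn = np.zeros((x.size, len(t) - 1 - d))
        for j in range(len(t) - 1 - d):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            if left > 0:
                Bn[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                Bn[:, j] += (t[j + d + 1] - x) / right * B[:, j + 1]
        B = Bn
    return B

L, m = 5, 4                                   # 5 interior knots, cubic splines
x = np.linspace(0.0, 1.0, 101)
interior = np.linspace(0.0, 1.0, L + 2)[1:-1]
B = bspline_basis(x, interior, m, 0.0, 1.0)
n_basis = B.shape[1]                          # number of basis functions
```

Each column of `B` is one basis function evaluated on the grid; the basis functions are non-negative and sum to one at every point.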
Constructing a spline basis • Order • What to use the model for • Mostly cubic splines (order 4) • Number and position of knots • Use enough • Look at the data • Beware of overfitting
Position of knots • More variation -> more knots
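One possible way to turn "more variation -> more knots" into a rule is to place knots at equal quantiles of the accumulated absolute curvature of a reference curve. This is only a heuristic sketched here for illustration, not the procedure used by the authors; the helper `knots_by_variation` and the synthetic peak are hypothetical.

```python
import numpy as np

def knots_by_variation(x, y, n_knots):
    """Place interior knots where the curve y(x) bends the most."""
    # Absolute second derivative as a simple measure of local variation.
    curvature = np.abs(np.gradient(np.gradient(y, x), x))
    # Cumulative variation, rescaled to [0, 1]; the small floor avoids
    # a division by zero for perfectly straight curves.
    cum = np.cumsum(curvature + 1e-12)
    cum = (cum - cum[0]) / (cum[-1] - cum[0])
    # Equal steps in cumulative variation give dense knots in busy regions.
    targets = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    return np.interp(targets, cum, x)

# Synthetic curve with one narrow peak at x = 0.7 (an assumption).
x = np.linspace(0.0, 1.0, 500)
y = np.exp(-((x - 0.7) / 0.03) ** 2)
knots = knots_by_variation(x, y, n_knots=9)
```

For this curve all knots end up around the peak, while the flat regions receive none; in practice a few evenly spread knots are usually added as well.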
FDA for prediction • Functional regression models • P-Spline Regression (Marx and Eilers) • Non-Parametric Functional Data Analysis (Ferraty and Vieu)
Functional Regression Models • Project spectra to spline basis • Apply Multivariate Linear Regression to the spline coefficients • Great reduction in system complexity • Natural shape of absorption peaks is used
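The two steps above (projection onto a spline basis, then linear regression on the coefficients) can be sketched as follows. The spectra are synthetic (a single Gaussian peak with a random baseline), and the knot layout, sample sizes and the helper `bspline_basis` are assumptions made for the example rather than the settings of the original study.

```python
import numpy as np

def bspline_basis(x, interior_knots, order, lo, hi):
    """B-spline design matrix via the Cox-de Boor recursion (clamped knots)."""
    t = np.concatenate([[lo] * order, interior_knots, [hi] * order])
    x = np.asarray(x, dtype=float)
    B = np.zeros((x.size, len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    B[x == hi, len(t) - order - 1] = 1.0
    for d in range(1, order):
        Bn = np.zeros((x.size, len(t) - 1 - d))
        for j in range(len(t) - 1 - d):
            left, right = t[j + d] - t[j], t[j + d + 1] - t[j + 1]
            if left > 0:
                Bn[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                Bn[:, j] += (t[j + d + 1] - x) / right * B[:, j + 1]
        B = Bn
    return B

# Synthetic spectra: peak height is the property to predict.
rng = np.random.default_rng(1)
w = np.linspace(0.0, 1.0, 200)                 # rescaled wavelength axis
peak = np.exp(-((w - 0.5) / 0.1) ** 2)
n = 60
height = rng.uniform(0.5, 1.5, n)
baseline = rng.normal(0.0, 0.1, n)
X = (height[:, None] * peak + baseline[:, None]
     + 0.01 * rng.standard_normal((n, w.size)))
y = height

# Step 1: project every spectrum onto 12 cubic B-splines (200 -> 12 numbers).
B = bspline_basis(w, np.linspace(0.0, 1.0, 10)[1:-1], 4, 0.0, 1.0)
C = np.linalg.lstsq(B, X.T, rcond=None)[0].T   # shape (n, 12)

# Step 2: ordinary linear regression of y on the spline coefficients.
train, test = slice(0, 40), slice(40, n)
D_train = np.column_stack([np.ones(40), C[train]])
beta, *_ = np.linalg.lstsq(D_train, y[train], rcond=None)

# Root mean squared error of prediction on the held-out spectra.
y_pred = np.column_stack([np.ones(n - 40), C[test]]) @ beta
rmsep = np.sqrt(np.mean((y_pred - y[test]) ** 2))
```

The regression now has 13 parameters instead of one per wavelength, which is the reduction in system complexity mentioned above.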
Functional Regression Models: case study • 420 samples of hog manure • Reflectance spectra • Total nitrogen (TN) and dry matter (DM) content • PLS and Functional Regression applied
P-Spline Regression (PSR) • By Marx and Eilers • Construct with B-splines: • Use roughness parameter on • Minimize • Full spectra are used for regression
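The formulas of this slide did not survive, so the following is a sketch of P-spline signal regression as it is commonly described for Marx and Eilers' method, under stated assumptions: the coefficient function is written as a B-spline expansion with coefficients `alpha`, the roughness penalty acts on second-order differences of `alpha`, and the intercept is left unpenalised. The data are synthetic, and the names `psr_fit` and `bspline_basis` are hypothetical.

```python
import numpy as np

def bspline_basis(x, interior_knots, order, lo, hi):
    """B-spline design matrix via the Cox-de Boor recursion (clamped knots)."""
    t = np.concatenate([[lo] * order, interior_knots, [hi] * order])
    x = np.asarray(x, dtype=float)
    B = np.zeros((x.size, len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    B[x == hi, len(t) - order - 1] = 1.0
    for d in range(1, order):
        Bn = np.zeros((x.size, len(t) - 1 - d))
        for j in range(len(t) - 1 - d):
            left, right = t[j + d] - t[j], t[j + d + 1] - t[j + 1]
            if left > 0:
                Bn[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                Bn[:, j] += (t[j + d + 1] - x) / right * B[:, j + 1]
        B = Bn
    return B

def psr_fit(X, y, B, lam, diff_order=2):
    """Minimise |y - a0 - X B alpha|^2 + lam * |D alpha|^2 over (a0, alpha)."""
    U = X @ B                                   # full spectra enter through X
    K = U.shape[1]
    D = np.diff(np.eye(K), n=diff_order, axis=0)   # difference matrix
    A = np.column_stack([np.ones(len(y)), U])
    P = np.zeros((K + 1, K + 1))
    P[1:, 1:] = lam * D.T @ D                   # intercept is not penalised
    theta = np.linalg.solve(A.T @ A + P, A.T @ y)
    return theta[0], theta[1:], D

# Synthetic spectra (assumption): the response is the height of one peak.
rng = np.random.default_rng(2)
w = np.linspace(0.0, 1.0, 100)
peak = np.exp(-((w - 0.5) / 0.1) ** 2)
y = rng.uniform(0.5, 1.5, 50)
X = y[:, None] * peak + 0.01 * rng.standard_normal((50, w.size))

# Cubic B-splines with 3 interior knots give 7 coefficients.
B = bspline_basis(w, np.linspace(0.0, 1.0, 5)[1:-1], 4, 0.0, 1.0)
a0, alpha, D = psr_fit(X, y, B, lam=1e-3)
beta_w = B @ alpha                              # smooth coefficient function

# A larger roughness parameter gives smoother coefficients.
_, alpha_smooth, _ = psr_fit(X, y, B, lam=10.0)
rough_small = np.sum((D @ alpha) ** 2)
rough_large = np.sum((D @ alpha_smooth) ** 2)
```

In practice the roughness parameter is usually chosen by cross-validation on the prediction error.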
P-Spline Regression: case study • 121 samples of seed pills • y is % humidity • PLS: RMSEP = 1.19 • PSR: RMSEP = 1.115 • # B-spline coefficients = 7 • λ = 0.001
Non-Parametric Functional Data Analysis • By F. Ferraty and P. Vieu • No regression model is involved • Prediction by applying local kernel functions in function space • No good results so far
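Prediction with local kernel functions in function space can be illustrated with a kernel-weighted average of the training responses, where the weights depend on the distance between curves. The sketch below assumes an L2 distance and a Gaussian kernel with a fixed bandwidth `h`; these are choices made for the example and are not necessarily those of Ferraty and Vieu, and the data are synthetic.

```python
import numpy as np

def l2_distance(curves, curve, dw):
    """Approximate L2 distance between each row of `curves` and `curve`."""
    return np.sqrt(np.sum((curves - curve) ** 2, axis=1) * dw)

def kernel_predict(X_train, y_train, x_new, h, dw):
    """Kernel-weighted average of training responses (no regression model)."""
    d = l2_distance(X_train, x_new, dw)
    weights = np.exp(-0.5 * (d / h) ** 2)       # Gaussian kernel (assumption)
    return np.sum(weights * y_train) / np.sum(weights)

# Synthetic training curves: one peak whose height equals the response.
w = np.linspace(0.0, 1.0, 100)
dw = w[1] - w[0]
peak = np.exp(-((w - 0.5) / 0.1) ** 2)
y_train = np.linspace(0.5, 1.5, 21)
X_train = y_train[:, None] * peak

# Predict the response of a new curve with peak height 1.0.
x_new = 1.0 * peak
y_hat = kernel_predict(X_train, y_train, x_new, h=0.05, dw=dw)
```

Curves close to the new curve dominate the average, so the bandwidth `h` controls how local the prediction is.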
FDA in ANOVA setting: FANOVA • ANOVA: • “Study the relation between a response variable and one or more explanatory variables” • $y_{ig} = \mu + \alpha_g + \varepsilon_{ig}$ • $\mu$ is the overall mean • $\alpha_g$ are the effects of belonging to a group $g$ • $\varepsilon_{ig}$ are the residuals
FANOVA: theory • Constraint: • Introduce so that • Introduce functional aspect: • Constraint: introduce
FANOVA: goal and solution • Goal: estimate from • Solution:
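Because the formulas of the FANOVA slides are missing, the sketch below follows the usual textbook formulation of one-way functional ANOVA rather than the authors' exact notation: every curve is modelled as an overall mean function `mu(t)` plus a group effect `alpha_g(t)` plus a residual, with the group effects summing to zero at every `t`. The design matrix `Z`, the extra constraint row and the synthetic data are assumptions of this example.

```python
import numpy as np

def fanova_fit(Y, groups):
    """Least-squares estimates of mu(t) and alpha_g(t) on a common grid."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    G = labels.size
    # Design matrix: a column of ones plus one indicator column per group.
    Z = np.zeros((Y.shape[0], G + 1))
    Z[:, 0] = 1.0
    for j, g in enumerate(labels):
        Z[groups == g, j + 1] = 1.0
    # Extra row enforcing sum_g alpha_g(t) = 0 (pseudo-observation of zero).
    Z_aug = np.vstack([Z, np.r_[0.0, np.ones(G)]])
    Y_aug = np.vstack([Y, np.zeros((1, Y.shape[1]))])
    # One least-squares problem per grid point, solved simultaneously.
    beta, *_ = np.linalg.lstsq(Z_aug, Y_aug, rcond=None)
    return beta[0], beta[1:]

# Synthetic curves (assumption): 3 groups of 5 curves on 50 grid points.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 50)
groups = np.repeat([0, 1, 2], 5)
bump = np.exp(-((t - 0.5) / 0.1) ** 2)
Y = (np.sin(2 * np.pi * t) + groups[:, None] * bump
     + 0.2 * rng.standard_normal((15, t.size)))

mu, alpha = fanova_fit(Y, groups)   # mu: (50,), alpha: (3, 50)
```

With this constraint, `mu + alpha[g]` reproduces the pointwise mean curve of group `g`.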
FANOVA: significance testing • Locally: • Globally:
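The test statistics of this slide were lost. One common way to test locally is a pointwise F-ratio at every grid point, and one common way to test globally is a permutation test on the maximum of that F-curve. The sketch below implements both on synthetic one-way data; it should be read as an illustration under these assumptions and not as the authors' procedure.

```python
import numpy as np

def pointwise_f(Y, groups):
    """One-way ANOVA F-ratio computed separately at every grid point."""
    labels = np.unique(groups)
    n, G = Y.shape[0], labels.size
    grand = Y.mean(axis=0)
    ssb = np.zeros(Y.shape[1])                  # between-group sum of squares
    sse = np.zeros(Y.shape[1])                  # within-group sum of squares
    for g in labels:
        Yg = Y[groups == g]
        mean_g = Yg.mean(axis=0)
        ssb += Yg.shape[0] * (mean_g - grand) ** 2
        sse += ((Yg - mean_g) ** 2).sum(axis=0)
    return (ssb / (G - 1)) / (sse / (n - G))

def permutation_pvalue(Y, groups, n_perm, rng):
    """Global p-value: how often a shuffled labelling beats max F(t)."""
    observed = pointwise_f(Y, groups).max()
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(groups)
        count += pointwise_f(Y, shuffled).max() >= observed
    return (1 + count) / (1 + n_perm)

# Synthetic curves (assumption): groups differ only around t = 0.5.
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 50)
groups = np.repeat([0, 1, 2], 10)
bump = np.exp(-((t - 0.5) / 0.1) ** 2)
Y = groups[:, None] * bump + 0.2 * rng.standard_normal((30, t.size))

F = pointwise_f(Y, groups)                      # local view: one F per t
p_global = permutation_pvalue(Y, groups, n_perm=199, rng=rng)
```

Plotting `F` against `t` shows where along the curve the groups differ, while `p_global` summarises the evidence over the whole domain.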
FANOVA: case study • Spectra of manure • 4 types of animals: dairy, beef, calf, hog • 3 ambient temperatures: 4°C, 12°C, 20°C • 3 sample temperatures: 4°C, 12°C, 20°C • 9 replicates • => 324 samples • Model:
Conclusions • Splines are a good basis for fitting spectral data • Using FDA, the vicinity constraint can be included in prediction models in chemometrics • FANOVA is a good tool to explore the variance in spectral data