Definition and overview of chemometrics. Paul Geladi, Head of Research, NIRCE; Chairperson, NIR Nord. Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå; Technobothnia, Vasa. paul.geladi@btk.slu.se, paul.geladi@syh.fi. Project geography.
Chemometrics: mathematics, statistics and computer science in chemistry
Similar fields • Biometrics ±1900 • Psychometrics ±1930 • Econometrics ±1950 • Technometrics ±1960
Chemometrics • Design of Experiments (DOE) • Exploratory Data Analysis • Classification • Regression and Calibration
Design of Experiments • The most important approach, wherever a design is possible • Uses: • ANOVA • F-test • t-test • Plots • Response surfaces
Design of Experiments • Model: $y = b_0 + b_1x_1 + b_2x_2 + \dots + b_Kx_K + b_{11}x_1^2 + b_{22}x_2^2 + \dots + b_{KK}x_K^2 + b_{12}x_1x_2 + \dots + e$ • Factors $x_1, x_2, \dots, x_K$ are changed systematically • Response $y$ is measured and modeled
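The second-order model above can be fitted by ordinary least squares once the design fixes the factor settings. The sketch below assumes a 3² full factorial design (two factors at coded levels -1, 0, +1), NumPy, and made-up "true" coefficients used only to simulate a response; the helper `quadratic_design_matrix` is hypothetical. The slide does not prescribe any particular design or fitting routine.

```python
import itertools
import numpy as np

def quadratic_design_matrix(X):
    """Hypothetical helper. Columns: 1, x1..xK, x1^2..xK^2, then x_i*x_j for i < j."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, j] for j in range(k)]
    cols += [X[:, j] ** 2 for j in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(cols)

# Assumed design: 3^2 full factorial, factors at coded levels -1, 0, +1 (9 runs).
levels = [-1.0, 0.0, 1.0]
X = np.array(list(itertools.product(levels, repeat=2)))

# Made-up coefficients b0, b1, b2, b11, b22, b12, used only to simulate y.
b_true = np.array([10.0, 2.0, -1.5, 0.5, -0.8, 1.2])
rng = np.random.default_rng(0)
D = quadratic_design_matrix(X)
y = D @ b_true + rng.normal(scale=0.1, size=len(X))  # e: small random error

# Least-squares estimates of the model coefficients.
b_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
print(np.round(b_hat, 2))
```

With 9 runs and 6 coefficients the fit leaves 3 degrees of freedom for the residual, which is what ANOVA and F-tests on the model would use.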
Exploratory Data Analysis • Design not possible • Sampling situations • Find structure • Find groupings • Find outliers
Classification • Check for groupings = UNSUPERVISED • Existing groupings = SUPERVISED • Visualize groupings • Classify • Test
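For the supervised case (groupings already known), one very simple classifier assigns a new object to the class whose mean vector is nearest. Nearest-centroid classification is chosen here only as an illustration; the slide names no method. The sketch assumes NumPy, hypothetical helper names, and made-up two-variable data; the last step is the "Test" stage, using objects whose class is known.

```python
import numpy as np

def fit_centroids(X, labels):
    """Hypothetical helper: mean vector (centroid) of each known class."""
    classes = np.unique(labels)
    return classes, np.array([X[labels == c].mean(axis=0) for c in classes])

def classify(X, classes, centroids):
    """Assign each object (row) to the class with the nearest centroid (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Made-up training set: two existing groups, K = 2 variables.
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
                    [4.0, 4.2], [4.1, 3.8], [3.9, 4.0]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
classes, centroids = fit_centroids(X_train, y_train)

# Test: classify new objects whose true class is known.
X_test = np.array([[0.9, 1.1], [4.2, 4.1]])
y_test = np.array(["A", "B"])
pred = classify(X_test, classes, centroids)
print(pred, "accuracy:", np.mean(pred == y_test))
```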
Regression / Calibration • Two types of variables X / y • Relationship linear / nonlinear • Model • Diagnostics • Residual
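In the simplest univariate case, calibration means fitting a straight line between a measured x and a reference y and then inspecting the residuals. The sketch assumes NumPy and made-up calibration data; RMSE is used as one common summary diagnostic, not one the slide specifies.

```python
import numpy as np

# Made-up calibration data: x = instrument reading, y = reference value.
x = np.array([0.10, 0.20, 0.30, 0.40, 0.50, 0.60])
y = np.array([1.02, 2.05, 2.96, 4.01, 5.03, 5.95])

# Linear model y = b0 + b1*x by least squares (polyfit returns the highest degree first).
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x
residuals = y - y_hat

# Diagnostics: the residuals themselves and the root mean squared error.
rmse = np.sqrt(np.mean(residuals ** 2))
print(f"b0={b0:.3f}, b1={b1:.3f}, RMSE={rmse:.3f}")
print("residuals:", np.round(residuals, 3))
```

A plot of the residuals against x is the usual next step: a systematic pattern there points to a nonlinear relationship.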
[Figure: y plotted against x]
Multivariate Data Analysis • Sampled data and designs with too many responses: • Mining • Hospitals • Agriculture • Food industry • More
Nomenclature • Samples are objects • What is measured on the object is a variable
[Figure: I samples, each measured as a spectrum of K values and represented as a vector]
A vector is a collection of numbers. By convention it is always a column vector: $\mathbf{a} = \begin{bmatrix} 12 \\ 3.6 \\ 11.1 \\ 5.9 \\ 34 \\ 0.5 \\ 1.4 \\ 17 \end{bmatrix}$
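The transpose of a vector is a row vector: $\mathbf{a}^\mathrm{T} = \begin{bmatrix} 12 & 3.6 & 11.1 & 5.9 & 34 & 0.5 & 1.4 & 17 \end{bmatrix}$. The symbols for the transpose are ′ and T, written $\mathbf{a}'$ or $\mathbf{a}^\mathrm{T}$.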
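In NumPy a plain 1-D array has no row/column orientation, so the column-vector convention has to be made explicit with a shape of (n, 1). The sketch below, assuming NumPy, stores the slide's numbers that way and transposes them.

```python
import numpy as np

# The slide's numbers stored as an explicit column vector (shape 8 x 1).
a = np.array([12, 3.6, 11.1, 5.9, 34, 0.5, 1.4, 17]).reshape(-1, 1)
a_T = a.T  # transpose: a row vector (shape 1 x 8)

print(a.shape, a_T.shape)  # (8, 1) (1, 8)

# Row vector times column vector gives a 1 x 1 result: the sum of squares.
print((a_T @ a).item())
```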
The Data Matrix • A data matrix is a vector of vectors • Size: I objects (rows) by K variables (columns)
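Stacking the object vectors gives the I × K data matrix. The sketch assumes NumPy, the common chemometrics convention of one row per object, and made-up numbers for I = 4 objects measured on K = 3 variables.

```python
import numpy as np

# Made-up example: I = 4 objects, each measured on K = 3 variables.
objects = [
    np.array([1.0, 2.0, 3.0]),
    np.array([1.5, 2.5, 2.0]),
    np.array([0.5, 1.0, 4.0]),
    np.array([2.0, 3.0, 1.0]),
]

# Stack the object vectors as rows: X has shape (I, K).
X = np.vstack(objects)
I, K = X.shape
print(I, K)      # 4 3
print(X[1])      # all K variables measured on object 2
print(X[:, 2])   # variable 3 across all I objects
```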
[Figure: data matrix example, particle size histograms for all samples; axis: particle area]
[Figure: data matrix example with axes "Times in batch reaction" and "NIR wavelengths"]
Problem • I and K can be large • The variables are correlated • Univariate statistics do not apply
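The correlation problem can be made visible with the correlation matrix of the K variables: large off-diagonal values mean the variables do not carry independent information, so analysing them one at a time misses the structure. The sketch assumes NumPy and synthetic data in which variable 2 is built from variable 1 and variable 3 is independent.

```python
import numpy as np

rng = np.random.default_rng(1)
I = 50  # hypothetical number of objects

# Synthetic data: variable 2 largely follows variable 1; variable 3 is independent.
x1 = rng.normal(size=I)
x2 = 2.0 * x1 + rng.normal(scale=0.3, size=I)
x3 = rng.normal(size=I)
X = np.column_stack([x1, x2, x3])  # data matrix, shape (I, K) with K = 3

# Correlation matrix of the variables (rowvar=False: columns are variables).
R = np.corrcoef(X, rowvar=False)
print(np.round(R, 2))
```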
Example: 3 variables (blood oxygen, iron, hemoglobin) measured on I patients
[Figure: the I patients plotted as points in a three-dimensional space with axes Hb, Fe and O2]
Properties of multivariate space • Rotation: vectors unchanged, distances unchanged • Translation: vectors changed, distances unchanged • Rescaling (change of units): everything changes
Consequences • We can move the coordinate system around • The relative distances between objects do not change • We can rotate the coordinate system • Scale changes are important • Move the coordinate system to the center of the data • Scale properly
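These properties can be checked numerically by comparing all distances between objects before and after each operation. The sketch assumes NumPy and made-up data whose three variables have very different units. Mean centering is the translation, an orthogonal matrix from a QR decomposition stands in for the rotation (it may also include a reflection, which likewise preserves distances), and division by the standard deviation (autoscaling) is the rescaling.

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up data: I = 10 objects, K = 3 variables on very different scales.
X = rng.normal(loc=[5.0, 100.0, 0.3], scale=[1.0, 20.0, 0.05], size=(10, 3))

def pairwise_distances(A):
    """Euclidean distances between all pairs of objects (rows of A)."""
    diff = A[:, None, :] - A[None, :, :]
    return np.linalg.norm(diff, axis=2)

D0 = pairwise_distances(X)

# Translation: move the origin to the center of the data (mean centering).
Xc = X - X.mean(axis=0)

# Rotation: multiply by an orthogonal matrix Q taken from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Xr = Xc @ Q

# Rescaling: divide each variable by its standard deviation (autoscaling).
Xs = Xc / X.std(axis=0, ddof=1)

print(np.allclose(pairwise_distances(Xc), D0))  # True: translation keeps distances
print(np.allclose(pairwise_distances(Xr), D0))  # True: rotation keeps distances
print(np.allclose(pairwise_distances(Xs), D0))  # False: rescaling changes them
```

Because rescaling changes the distances, the choice of scaling changes every result computed from them, which is why "scale properly" is a separate step.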
Vectors (physics) $\mathbf{x} = [x_1, x_2, x_3]$, $\|\mathbf{x}\| = (x_1^2 + x_2^2 + x_3^2)^{1/2}$
Geometry $c^2 = a^2 + b^2$ (right triangle with legs $a$, $b$ and hypotenuse $c$)
Vectors (K dimensions) $\mathbf{x} = [x_1, x_2, \dots, x_K]$, $\|\mathbf{x}\| = (x_1^2 + x_2^2 + \dots + x_K^2)^{1/2}$
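The length formula is the same for any K, and the distance between two objects is the length of their difference vector. The sketch uses only the Python standard library; `norm` and `distance` are illustrative helper names.

```python
import math

def norm(x):
    """Length of a K-dimensional vector: square root of the sum of squares."""
    return math.sqrt(sum(xk ** 2 for xk in x))

def distance(x, y):
    """Euclidean distance between two objects: length of their difference vector."""
    return norm([xk - yk for xk, yk in zip(x, y)])

print(norm([3, 4]))        # 5.0, Pythagoras: c^2 = a^2 + b^2
print(norm([1, 2, 2]))     # 3.0, three dimensions
print(norm([1, 1, 1, 1]))  # 2.0, K = 4 uses the same formula
print(distance([1, 2, 3, 4], [2, 2, 3, 5]))
```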
Problem • We cannot see in more than 3 dimensions • Paper and computer screens offer 2 to 2.5 dimensions
[Figure: the patient data in the three-dimensional space with axes Hb, Fe and O2]
Projection • Onto a 2D plane (screen, paper) • Many projections are possible • Find a good one • Find a few good ones • What is good?
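Mechanically, a projection onto a plane takes each centered object and computes its coordinates along two orthonormal directions spanning that plane; those two coordinates are what gets drawn on paper or a screen. The sketch assumes NumPy, made-up data, and two hand-picked planes. It deliberately does not answer "what is good?": it only shows that different planes give different 2D pictures of the same objects.

```python
import numpy as np

def project_onto_plane(X, u, v):
    """Coordinates of each object (row of X) in the plane spanned by u and v.

    u and v are orthonormalized first (QR), so this is an orthogonal projection.
    """
    B, _ = np.linalg.qr(np.column_stack([u, v]))  # K x 2, orthonormal columns
    return X @ B                                  # I x 2 coordinates for plotting

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3))   # made-up data: I = 20 objects, K = 3 variables
Xc = X - X.mean(axis=0)        # move the coordinate system to the center first

# Two hand-picked planes; each gives a different 2D view of the same data.
T1 = project_onto_plane(Xc, [1, 0, 0], [0, 1, 0])  # plane of variables 1 and 2
T2 = project_onto_plane(Xc, [1, 1, 0], [0, 1, 1])  # an oblique plane
print(T1.shape, T2.shape)  # (20, 2) (20, 2)
```

An orthogonal projection can only shorten vectors, never lengthen them, so distances seen in the plane underestimate the true distances in K dimensions.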