
Choosing the Right Calibration Model – A DoE Approach to an Old Problem

Geir Rune Flåten and Anthony D. Walmsley


Presentation Transcript


  1. Choosing the Right Calibration Model – A DoE Approach to an Old Problem Geir Rune Flåten and Anthony D. Walmsley

  2. Contents • The Old Calibration Problem • Design of Experiments • Rationale for the approach • Some examples: Process Raman Data; Visible Spectra of Metal Ions • Conclusions

  3. The Old Calibration Problem • The general form of the calibration model is $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{E}$, where b contains the regression coefficients describing the relationship between the response values and the measurements X. In a typical spectroscopic application, y is a vector containing the quality measurements (e.g. concentrations), X contains the spectra of the samples (one spectrum per row), and E contains the residuals.

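  4. Calibration • The calibration problem seems simple, but there is a huge number of different calibration techniques, all of which aim to find the regression vector b that minimises the residuals E. • Solving the equation directly for b, $\hat{\mathbf{b}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}$, is equivalent to least squares regression (LS). This requires $\mathbf{X}^{\mathsf{T}}\mathbf{X}$ to be invertible, which is not the case when there are more variables (wavenumbers) than samples, as is typical for spectra.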
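A minimal NumPy sketch of the LS solution of y = Xb + E on synthetic, noise-free data. The data and the helper name `least_squares_calibration` are illustrative assumptions, not taken from the presentation. `np.linalg.lstsq` is used instead of an explicit matrix inverse; it also returns a minimum-norm solution when XᵀX is singular.

```python
import numpy as np


def least_squares_calibration(X, y):
    """Ordinary least squares: return b minimising ||y - X b||^2.

    np.linalg.lstsq avoids forming (X^T X)^-1 explicitly and returns the
    minimum-norm solution if X^T X happens to be singular.
    """
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b


# Synthetic, noise-free example: 20 samples x 3 variables (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
b_true = np.array([1.0, -2.0, 0.5])
y = X @ b_true

b_hat = least_squares_calibration(X, y)
E = y - X @ b_hat  # residuals; numerically zero for noise-free data
print(b_hat)
```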

  5. Calibration • For spectroscopic data, partial least squares regression (PLSR) and principal component regression (PCR) are probably the most widely used calibration techniques. These are both based on a dimension reduction of X before calculating the regression vector b.
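The two dimension-reduction methods can be sketched in a few lines of NumPy, assuming X and y are already pre-treated (no centring is done inside the functions). `pcr` regresses y on the first A principal component scores, and `pls1` is a textbook NIPALS PLS1 for a single response. Both function names are hypothetical, and this is an illustration rather than the implementation used in the study. When as many components are used as there are variables, both methods reproduce the LS solution, which the usage lines compare against.

```python
import numpy as np


def pcr(X, y, n_components):
    """Principal component regression with A = n_components (hypothetical helper)."""
    # Right singular vectors of X are the PCA loadings.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T                     # loadings, p x A
    T = X @ P                                   # scores, n x A
    q, *_ = np.linalg.lstsq(T, y, rcond=None)   # regress y on the scores
    return P @ q                                # regression vector b in X-space


def pls1(X, y, n_components):
    """NIPALS PLS1 for a single response (hypothetical helper)."""
    Xa, ya = X.astype(float).copy(), y.astype(float).copy()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))
    P = np.zeros((n_vars, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xa.T @ ya                 # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xa @ w                    # score
        tt = t @ t
        P[:, a] = Xa.T @ t / tt       # X loading
        q[a] = ya @ t / tt            # y loading
        W[:, a] = w
        Xa -= np.outer(t, P[:, a])    # deflate X
        ya -= q[a] * t                # deflate y
    # Regression vector for the original X: b = W (P^T W)^-1 q
    return W @ np.linalg.solve(P.T @ W, q)


rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
y = X @ np.array([0.5, -1.0, 2.0, 0.0, 1.5]) + rng.normal(scale=0.1, size=30)
b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
for a in (1, 3, 5):
    print(a, np.round(pcr(X, y, a), 3), np.round(pls1(X, y, a), 3))
```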

  6. Calibration • The number of components to include in the model is a critical parameter for dimension reduction methods. • In theory, the number of components can be any number between zero and the number of samples, or the number of wavenumbers if that is the smaller dimension. • There is also a great range of pre-treatment procedures advocated in the literature.

  7. Calibration • The number of possible calibration models is very high. The challenge for the analyst is to choose the best one, or more precisely, the most useful one. • The criteria for a good model are, to some extent, problem-dependent, but a good description of the calibration samples and the ability to predict the quality measurement well for new samples are usually required. • Good fitting ability and good prediction ability are partly contradictory. • In the extreme case, a model that fits the calibration samples perfectly is unable to predict samples that differ from the calibration samples. • The traditional statistical approach to choosing calibration models overemphasises the fitting aspect. • In chemical applications it is necessary to balance the two aspects, as the predictive ability is often far more important than the model's description of the calibration samples.
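The trade-off between fitting and prediction can be illustrated on simulated data. The sketch below (hypothetical helper names, synthetic data) computes RMSEC on the calibration samples and RMSEP on separate validation samples as the number of PCR components grows. RMSE is taken here as the root of the mean squared residual; some texts divide by degrees of freedom instead. Because the score subspaces are nested, RMSEC can only decrease as components are added, whereas RMSEP has no such guarantee.

```python
import numpy as np


def pcr(X, y, n_components):
    """Principal component regression (hypothetical helper)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T
    q, *_ = np.linalg.lstsq(X @ P, y, rcond=None)
    return P @ q


def rmse(y_true, y_pred):
    """Root mean squared error (no degrees-of-freedom correction)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Simulated data: only 3 of 10 variables carry signal (illustrative assumption).
rng = np.random.default_rng(1)
n_vars = 10
b_true = np.zeros(n_vars)
b_true[:3] = [1.0, 0.5, -0.8]
X_cal = rng.normal(size=(25, n_vars))
y_cal = X_cal @ b_true + rng.normal(scale=0.3, size=25)
X_val = rng.normal(size=(25, n_vars))
y_val = X_val @ b_true + rng.normal(scale=0.3, size=25)

rmsec, rmsep = [], []
for a in range(1, n_vars + 1):
    b = pcr(X_cal, y_cal, a)
    rmsec.append(rmse(y_cal, X_cal @ b))   # fit to calibration samples
    rmsep.append(rmse(y_val, X_val @ b))   # prediction of new samples

for a, (c, p) in enumerate(zip(rmsec, rmsep), start=1):
    print(f"A={a:2d}  RMSEC={c:.3f}  RMSEP={p:.3f}")
```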

  8. Design of Experiments • The problem of choosing the right calibration model for a given dataset can be seen as an optimisation problem with many input variables and one or more response variables. • Such a problem is well suited to an experimental design approach. • Experimental design can therefore be used to choose the right calibration model. • The starting point is to identify the variables (or factors) relevant to the data at hand, and to determine the levels at which these factors are to be varied. • The factors can be different kinds of pre-treatment, the type of regression method, the number of components, etc.

  9. Pre-treatment • There are two main reasons for pre-treating data: • Filtering noise and irrelevant features from the data • Removing irregularities such as (weak) non-linearity

  10. Pre-treatment • Scaling • Mean centring of the data is more or less standard in chemometric analyses, especially for the rank reduction methods. • It often allows the analyst to use one component fewer in the model. • It is performed by subtracting the mean of each column from all the elements in that column of the data matrix. • In auto scaling the data are mean centred and, additionally, all the elements in each column are divided by the standard deviation of the column. This gives all the variables unit variance, which is useful when the variables have different units. • Auto scaling is rarely used for spectroscopic data, and it is therefore not included in the experimental designs presented here.
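Mean centring and auto scaling are column-wise operations. The sketch below assumes samples in rows and variables in columns; the function names are illustrative. The column means (and standard deviations) computed from the calibration set should be stored and reused when pre-treating new samples.

```python
import numpy as np


def mean_centre(X):
    """Subtract each column's mean; return the centred data and the means."""
    mean = X.mean(axis=0)
    return X - mean, mean


def auto_scale(X):
    """Mean centre, then divide each column by its standard deviation.

    Uses the sample standard deviation (ddof=1); a column with zero
    variance would cause a division by zero.
    """
    Xc, mean = mean_centre(X)
    std = X.std(axis=0, ddof=1)
    return Xc / std, mean, std


# Two variables on very different scales (illustrative values).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])
Xc, mean = mean_centre(X)
Xa, _, std = auto_scale(X)

# New samples must be treated with the calibration statistics, not their own.
x_new = np.array([[2.5, 40.0]])
x_new_centred = x_new - mean
print(Xc, Xa, x_new_centred, sep="\n")
```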

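  11. Pre-treatment • Derivation • Derivation (differentiation) is used to remove baseline effects from the data: the first derivative removes a constant baseline offset, and the second derivative also removes a linear baseline slope. The first derivative can be used, but the second derivative is also occasionally found useful, e.g. computed by the finite difference $z_{ij} = x_{i,j+1} - 2x_{ij} + x_{i,j-1}$ • Z is the matrix of the second derivatives • X is the raw data matrix • i indexes the samples and j indicates the wavenumbers.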
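Two ways of computing spectral derivatives are sketched below, assuming evenly spaced wavenumbers along the columns of X. `second_difference` is the plain finite difference z_ij = x_{i,j+1} − 2x_ij + x_{i,j−1}. `savgol_derivative` is a Savitzky-Golay derivative, obtained by fitting a local polynomial to each window by least squares. The function names, window width and polynomial order are assumptions chosen for illustration. `scipy.signal.savgol_filter` (with its `deriv` argument) is a library alternative.

```python
import numpy as np
from math import factorial


def second_difference(X, spacing=1.0):
    """Finite-difference second derivative along the columns (wavenumbers).

    z_ij = (x_{i,j+1} - 2 x_ij + x_{i,j-1}) / h^2; the first and last
    columns are lost.
    """
    return (X[:, 2:] - 2.0 * X[:, 1:-1] + X[:, :-2]) / spacing ** 2


def savgol_coefficients(half_window, polyorder, deriv):
    """Savitzky-Golay weights for the deriv-th derivative at the window centre.

    Requires deriv <= polyorder < 2 * half_window + 1.
    """
    offsets = np.arange(-half_window, half_window + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of pinv(A) maps window values to the least-squares polynomial
    # coefficient c_deriv; the derivative at offset 0 is deriv! * c_deriv.
    return factorial(deriv) * np.linalg.pinv(A)[deriv]


def savgol_derivative(X, half_window=3, polyorder=2, deriv=1, spacing=1.0):
    """Apply the Savitzky-Golay derivative to each row of X.

    'valid' correlation drops half_window points at each end of the spectrum.
    """
    c = savgol_coefficients(half_window, polyorder, deriv)
    rows = [np.correlate(row, c, mode="valid") for row in X]
    return np.array(rows) / spacing ** deriv


# Two noise-free "spectra" that are quadratic in the channel index j.
j = np.arange(20.0)
X = np.vstack([j ** 2, 3 * j ** 2 + j])
print(second_difference(X)[:, :3])
print(savgol_derivative(X, half_window=3, polyorder=2, deriv=1)[:, :3])
```

A polynomial of order 2 reproduces quadratics exactly, so on these toy spectra both methods return the exact derivatives. On real, noisy spectra the Savitzky-Golay window also smooths the result.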

  12. Box-Cox transformations • The Box-Cox family of transformations was proposed as a remedy for non-linearity in the data. • In chemometrics, the Box-Cox transformation is traditionally simplified to so-called root transformations.
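The Box-Cox family is x^(λ) = (x^λ − 1)/λ for λ ≠ 0 and ln x for λ = 0, and it is defined only for strictly positive data. The sketch below implements this family alongside the two simplified transforms used as levels in the design: the square root and log10(x + 1). For λ = 0.5, the Box-Cox result is simply 2√x − 2, an affine function of the square root. This is one way to see why the root transformation is a reasonable simplification, particularly once the data are mean centred. The function names are illustrative.

```python
import numpy as np


def box_cox(X, lam):
    """Box-Cox transform (x**lam - 1) / lam, or ln(x) when lam == 0.

    Defined only for strictly positive data.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(X)
    return (X ** lam - 1.0) / lam


def root_transform(X):
    """Element-wise square root (second level of the Box-Cox factor)."""
    return np.sqrt(np.asarray(X, dtype=float))


def log_transform(X):
    """Element-wise log10(x + 1) (third level of the Box-Cox factor)."""
    return np.log10(np.asarray(X, dtype=float) + 1.0)


x = np.array([1.0, 4.0, 9.0])
print(box_cox(x, 0.5))            # same as 2 * sqrt(x) - 2
print(2 * root_transform(x) - 2)
print(log_transform(x))
```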

  13. Pre-treatment • Orthogonal Signal Correction (OSC) • OSC is a fairly new pre-treatment method suggested by Wold’s research group. The original idea was to remove the information in the data matrix that is irrelevant to (orthogonal to) the variation in the response vector.
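The idea behind OSC can be sketched as follows. This is a deliberately simplified, single-pass variant written for illustration, not Wold's published iterative algorithm. It takes the score vector of the first principal component of mean-centred X, removes the projection of that score onto y so that it becomes orthogonal to y, and then deflates X by the score and its loading. Because the removed score is orthogonal to y, the covariance Xᵀy is left unchanged, and the usage lines check this. Applying the function twice would correspond to two OSC components. The function name and the simulated data are assumptions.

```python
import numpy as np


def osc_one_component(X, y):
    """Remove one y-orthogonal component from mean-centred X.

    Simplified single-pass illustration, not Wold's iterative OSC algorithm.
    Fails (division by zero) if the first PC score is exactly parallel to y.
    """
    # 1. Score vector of the first principal component of X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    t = X @ Vt[0]
    # 2. Remove the part of the score that lies along y.
    t = t - y * (y @ t) / (y @ y)
    # 3. Loading for this orthogonal score, then deflate X.
    p = X.T @ t / (t @ t)
    X_osc = X - np.outer(t, p)
    return X_osc, t, p


# Simulated data: a large nuisance factor plus a smaller y-related factor.
rng = np.random.default_rng(3)
n_samples, n_vars = 40, 6
y = rng.normal(size=n_samples)
nuisance = 5.0 * rng.normal(size=n_samples)
X = (np.outer(y, rng.normal(size=n_vars))
     + np.outer(nuisance, rng.normal(size=n_vars))
     + 0.05 * rng.normal(size=(n_samples, n_vars)))
X = X - X.mean(axis=0)
y = y - y.mean()

X_osc, t, p_osc = osc_one_component(X, y)
print("t . y =", t @ y)
print("X^T y unchanged:", np.allclose(X_osc.T @ y, X.T @ y))
```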

  14. The experimental design approach • The general idea of experimental design is to vary the input factors systematically over a range of pre-defined levels in order to find their effects on the output variable(s) of a given process. At the same time, possible interaction effects among the input variables can be found. After choosing the appropriate kind of experimental design, the first step is to choose which variables to include in the study and at which levels they should be varied. Next, the effects of the input variables on the output variable(s) are calculated. The results are studied and, if necessary, the design is refined and new experiments are run.

  15. Rationale for the approach

  16. Process Raman Data • The data set consists of 89 samples with 1925 variables of Raman spectra in the Raman shift range 400–2000 cm⁻¹, characterising the naphtha feed into a distillation column. • Some of the samples are spiked, some are not spiked, and some exhibit laser-induced fluorescence. • The reference values are obtained by gas chromatography. Originally, 17 compounds were measured, but in this work the results for only one of them are reported.

  17. Visible Spectra of Metal Ions • The data were sampled at Hull University as part of a metrological ring study for determining the precision of chemometric methods. • The data contain VIS spectra with 1301 variables in the wavelength range 350 to 1000 nm, describing a series of standards containing various amounts of the four metal ions Co2+, Cr3+, Cu2+ and Ni2+ in nitric acid. • The solutions are made from aqueous standards.

  18. Visible Spectra of Metal Ions • The concentrations are in the range 1000 to 4000 ppm, and the concentration levels are varied according to a full factorial design. • The design was augmented with ten validation experiments spanning the same concentration range, including at least one sample with zero concentration for each of the components. • Only the results from analysing Cr3+ are shown. • The concentration levels for Cr3+ in the design are 1000 and 1400 ppm, and this is also the concentration range of the validation samples, apart from the one with zero concentration.

  19. The experimental design • The experimental design used is a mixed-level design with six variables: two variables are varied at two levels, three variables at three levels and one variable at eight levels.

  20. The Experimental Design • First factor: regression method • Varied between PLS and PCR. • Second factor: scaling • At the low level the data are not mean centred; at the high level the data are mean centred. • Third factor: Box-Cox transformation • The Box-Cox transformation variable has three levels: at the first level the data are left unchanged, at the second level an element-wise square root transformation is performed, and at the third level the scalar one is added to all elements in the data matrix before the element-wise base-10 logarithm is taken. • Fourth factor: OSC • At the first level of the OSC variable no OSC pre-treatment is performed; at the next two levels, one and two OSC components are extracted, respectively. • Fifth factor: derivation • The first- and second-order derivatives are computed using the Savitzky-Golay procedure. • Sixth factor: number of components • The number of components was varied from one to eight.
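The candidate models can be enumerated directly from the factor levels listed above. The sketch assumes a full factorial over those levels (the slides only say "mixed level design"). It also assumes that the three levels of the derivation factor are no derivative, first derivative and second derivative. The dictionary keys and level labels are illustrative. Under these assumptions there are 2 × 2 × 3 × 3 × 3 × 8 = 864 candidate models.

```python
from itertools import product

# Factor levels from the design slide; keys and labels are illustrative, and
# the derivation levels (0 = none, 1 = first, 2 = second) are an assumption.
FACTORS = {
    "regression": ["PLS", "PCR"],
    "mean_centring": [False, True],
    "box_cox": ["none", "sqrt", "log10(x+1)"],
    "osc_components": [0, 1, 2],
    "derivative": [0, 1, 2],
    "n_components": list(range(1, 9)),
}


def full_factorial(factors):
    """Every combination of factor levels, one dict per run."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = full_factorial(FACTORS)
print(len(design))  # 864
print(design[0])
```

Each run would then be evaluated by applying its pre-treatments, fitting the model and recording the responses (e.g. RMSEC and RMSEP).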

  21. Process Raman Data

  22. The effects of the different variables on the RMSEC values for the metal data. The plots along the main diagonal of the plot matrix show the main effects, and the other plots show interaction effects between pairs of variables. The levels of the variables indicated by the column labels correspond to the different lines in the plots, while the levels for the row labels are found along the abscissas in the plots.
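The quantities shown in such a plot matrix are averages of the response over the design runs. The sketch below assumes that each run is stored as a dictionary of factor levels with one response value (e.g. RMSEP) per run. The main effect of a factor is the mean response at each of its levels (the diagonal panels). An interaction panel shows the mean response for every combination of levels of two factors. The function names and the toy responses are hypothetical; the responses are generated from a simple additive formula for illustration only.

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def main_effects(design, response):
    """Mean response at each level of each factor (diagonal panels)."""
    effects = {}
    for name in design[0]:
        groups = defaultdict(list)
        for run, r in zip(design, response):
            groups[run[name]].append(r)
        effects[name] = {level: mean(values) for level, values in groups.items()}
    return effects


def interaction_means(design, response, a, b):
    """Mean response for each (level of a, level of b) pair (off-diagonal panels)."""
    groups = defaultdict(list)
    for run, r in zip(design, response):
        groups[(run[a], run[b])].append(r)
    return {levels: mean(values) for levels, values in groups.items()}


# Toy 2 x 2 design with a made-up additive response (not results from the study).
design = [{"regression": m, "mean_centring": c}
          for m, c in product(["PLS", "PCR"], [False, True])]
response = [1.0 + (0.2 if run["regression"] == "PCR" else 0.0)
            - (0.3 if run["mean_centring"] else 0.0) for run in design]

print(main_effects(design, response))
print(interaction_means(design, response, "regression", "mean_centring"))
```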

  23. The effects of the different variables on the RMSEP values for the metal data

  24. The effects of the different variables on the RMSEC values for the naphtha data

  25. The effects of the different variables on the RMSEP values for the naphtha data

  26. The effects of the different variables on the PRESS values for the naphtha data

  27. Conclusions • Experimental design can be used to choose the parameter settings for the best calibration model. • Both pre-treatment and the number of components are incorporated. • The plot matrices introduced give the analyst a visual tool that reveals the effects of all the variables considered. Using this tool, the analyst can quickly and easily choose a limited number of models to compare. • Additionally, the plot matrices can be a useful pedagogical tool when communicating with less experienced staff. • Of course, the experience of a skilled chemometrician cannot be replaced, but experimental design can give an objective overview of the effects of the different variables.

  28. Conclusions • Experimental design gives a unified approach to calibration that is intuitive, easy to employ and knowledge-accumulating, in the sense that experimental designs can be kept, reused and extended as new knowledge emerges. • The same experimental design can be used for very different data sets, as shown in this work. • Analysts working with calibration problems are likely to be familiar with experimental design, and experimental design tools are most likely available in the software package used to build the calibration model.

  29. Acknowledgements • Geir Rune Flaten • Vicki Loades • Selena Richards • Ruth Wellock • Zaid Rawi
