Kernel Density Estimation Theory and Application in Discriminant Analysis Thomas Ledl Universität Wien
Contents: • Introduction • Theory • Aspects of Application • Simulation Study • Summary
Introduction | Theory | Application Aspects | Simulation Study | Summary

Introduction
25 observations: Which distribution?
Theory
Kernel density estimator model:

    f̂_h(x) = 1/(n·h) · Σᵢ K((x − Xᵢ)/h)

with the kernel K(·) and the bandwidth h to choose.
Kernel and bandwidth choices: triangular vs. Gaussian kernel K(·); „small“ h vs. „large“ h.
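As a sketch of how the kernel and the bandwidth interact, here is a minimal NumPy implementation of the estimator above. The sample, grid, and bandwidth values are illustrative assumptions, not the slide's data:

```python
import numpy as np

def kde(x_grid, data, h, kernel="gaussian"):
    """fhat(x) = 1/(n*h) * sum_i K((x - X_i)/h), evaluated on a grid."""
    u = (x_grid[:, None] - data[None, :]) / h
    if kernel == "gaussian":
        k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    elif kernel == "triangular":
        k = np.clip(1.0 - np.abs(u), 0.0, None)
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    return k.sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(1)
data = rng.normal(size=25)              # 25 observations, as in the example
grid = np.linspace(-6, 6, 601)
f_small = kde(grid, data, h=0.2)        # "small" h: wiggly, undersmoothed
f_large = kde(grid, data, h=1.0)        # "large" h: smooth, possibly oversmoothed
f_tri = kde(grid, data, h=0.5, kernel="triangular")
```

Plotting the three curves over a histogram of `data` reproduces the qualitative picture: the estimate's shape depends far more on h than on the choice of K(·).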
Question 1: Which choice of K(·) and h is best for a descriptive purpose?
Classification:
Classification: Levelplot – LDA (based on the assumption of a multivariate normal distribution)
Classification: Levelplot – KDE classifier
Question 2: How does classification based on KDE perform in more than two dimensions?
Theory – essential issues:
• Optimization criteria
• Improvements of the standard model
• Resulting optimal choices of the model parameters K(·) and h
Optimization criteria – Lp-distances:

    Lp(f̂, f) = ( ∫ |f̂(x) − f(x)|^p dx )^(1/p)
[Figure: comparison of two densities f(·) and g(·)]
L1-distance: IAE = ∫ |f̂(x) − f(x)| dx („integrated absolute error“)
L2-distance: ISE = ∫ (f̂(x) − f(x))² dx („integrated squared error“)
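These distances are easy to evaluate numerically when the true density is known, as in a simulation setting. A minimal sketch (sample size, bandwidth, and the standard-normal target are illustrative assumptions):

```python
import numpy as np

def gauss_kde(x_grid, data, h):
    """Gaussian-kernel density estimate on a grid."""
    u = (x_grid[:, None] - data[None, :]) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)).sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(2)
data = rng.normal(size=200)             # sample from a known density f = N(0, 1)
grid = np.linspace(-6, 6, 1201)
dx = grid[1] - grid[0]
fhat = gauss_kde(grid, data, h=0.4)
f_true = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)

ise = ((fhat - f_true) ** 2).sum() * dx   # L2: integrated squared error
iae = np.abs(fhat - f_true).sum() * dx    # L1: integrated absolute error
sup = np.abs(fhat - f_true).max()         # L-infinity: maximum vertical distance
```

Repeating this over many samples and averaging the ISE values approximates the MISE discussed below.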
L∞-distance: minimization of the maximum vertical distance.
Other ideas:
• Consider horizontal distances for a more intuitive fit (Marron and Tsybakov, 1995)
• Compare the number and position of modes
Overview of some minimization criteria:
• L1-distance = IAE – difficult mathematical tractability
• L2-distance = ISE, MISE, AMISE, ... – most commonly used
• L∞-distance = maximum difference – does not consider the overall fit
• „Modern“ criteria, which include a kind of measure of the horizontal distances – difficult mathematical tractability
ISE, MISE, AMISE, ...
• ISE is a random variable
• MISE = E(ISE), the expectation of the ISE
• AMISE = Taylor approximation of the MISE, easier to calculate
The AMISE-optimal bandwidth:

    h_opt = [ R(K) / ( μ₂(K)² · R(f″) · n ) ]^(1/5)

where R(g) = ∫ g(x)² dx and μ₂(K) = ∫ x² K(x) dx.
The AMISE-optimal bandwidth depends on the kernel function K(·); the kernel-dependent part is minimized by the „Epanechnikov kernel“ K(u) = (3/4)(1 − u²) for |u| ≤ 1.
The AMISE-optimal bandwidth also depends on R(f″), a functional of the unknown density f(·). How to proceed?
Data-driven bandwidth selection methods
Leave-one-out selectors:
• Maximum likelihood cross-validation
• Least-squares cross-validation (Bowman, 1984)
Criteria based on substituting R(f″) in the AMISE formula:
• „Normal rule“ („rule of thumb“; Silverman, 1986)
• Plug-in methods (Sheather and Jones, 1991; Park and Marron, 1990)
• Smoothed bootstrap
Least-squares cross-validation (LSCV)
• The undisputed selector in the 1980s
• Gives an unbiased estimator of the ISE (up to a term not depending on h)
• Suffers from more than one local minimizer – no agreement about which one to use
• Bad convergence rate for the resulting bandwidth h_opt
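A numerical sketch of LSCV for the Gaussian kernel, where the first term of the criterion has a closed form; the sample and the search grid for h are illustrative assumptions:

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def lscv_score(h, x):
    """LSCV(h) = int fhat^2 dx - (2/n) * sum_i fhat_{-i}(x_i).
    For the Gaussian kernel, int fhat^2 = (1/n^2) sum_ij N(x_i - x_j; 0, 2h^2)."""
    n = len(x)
    d = x[:, None] - x[None, :]
    int_f2 = gauss(d / (h * np.sqrt(2))).sum() / (n**2 * h * np.sqrt(2))
    k = gauss(d / h) / h                  # leave-one-out: drop the diagonal terms
    loo_mean = (k.sum() - np.trace(k)) / (n * (n - 1))
    return int_f2 - 2.0 * loo_mean

rng = np.random.default_rng(3)
x = rng.normal(size=200)
hs = np.linspace(0.05, 1.5, 60)
scores = [lscv_score(h, x) for h in hs]
h_lscv = hs[int(np.argmin(scores))]       # beware: the curve may have several local minima
```

Plotting `scores` against `hs` often shows the multiple-local-minima problem mentioned above directly.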
Normal rule („rule of thumb“; Silverman, 1986)
• Assumes f(x) to be N(μ, σ²)
• The easiest selector
• Often oversmooths the function
The resulting bandwidth (for the Gaussian kernel) is:

    h_opt = 1.06 · σ̂ · n^(−1/5)
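The rule is a one-liner in practice; the sample below is an illustrative assumption:

```python
import numpy as np

def normal_rule_bandwidth(x):
    """Rule of thumb for the Gaussian kernel: h = 1.06 * sigma_hat * n^(-1/5)."""
    return 1.06 * np.std(x, ddof=1) * len(x) ** (-1 / 5)

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
h = normal_rule_bandwidth(x)    # about 1.06 * 1 * 1000**(-0.2) ≈ 0.27 here
```

For data that are far from normal (skewed or multimodal), this h tends to be too large, which is the oversmoothing noted above.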
Plug-in methods (Sheather and Jones, 1991; Park and Marron, 1990)
• Do not substitute R(f″) in the AMISE formula, but estimate it via R(f⁽⁴⁾), R(f⁽⁴⁾) via R(f⁽⁶⁾), etc.
• Another parameter i to choose (the number of stages to go back) – one stage is mostly sufficient
• Better rates of convergence
• Do not finally circumvent the problem of the unknown density, either
The multivariate case: the scalar bandwidth h becomes a bandwidth matrix H:

    f̂_H(x) = (1/n) · Σᵢ |H|^(−1/2) K( H^(−1/2) (x − Xᵢ) )
Issues of generalization in d dimensions
• Up to d² bandwidth parameters (the entries of H) instead of one
• Unstable estimates
• Bandwidth selectors are essentially straightforward to generalize
• For plug-in methods it is „too difficult“ to give succinct expressions for d > 2 dimensions
Application aspects – essential issues:
• Curse of dimensionality
• Connection between goodness-of-fit and optimal classification
• Two methods for discriminatory purposes
The „curse of dimensionality“
• The data „disappears“ into the distribution tails in high dimensions
• As d grows, a good fit in the tails is desired!
The „curse of dimensionality“
• Much data is necessary to maintain a constant estimation error in high dimensions
Goodness-of-fit vs. optimal classification
AMISE-optimal parameter choice:
• L2-optimal
• Worse fit in the tails
• Calculation-intensive for large n
Optimal classification (in high dimensions):
• L1-optimal (misclassification rate)
• Estimation of the tails is important
• Many observations required for a reasonable fit
Method 1:
• Reduce the data onto a subspace which allows a reasonably accurate estimation but does not destroy too much information – a „trade-off“
• Use the multivariate kernel density concept to estimate the class densities
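A sketch of Method 1 under illustrative assumptions: PCA as the subspace reduction, and a product-Gaussian kernel with a single common bandwidth as the multivariate KDE. The classes, dimensions, and bandwidth are made up for the example:

```python
import numpy as np

def kde_nd(points, data, h):
    """Product-Gaussian KDE in d dimensions (bandwidth matrix H = h^2 * I)."""
    d = data.shape[1]
    diff = (points[:, None, :] - data[None, :, :]) / h
    k = np.exp(-0.5 * (diff**2).sum(axis=2))
    return k.sum(axis=1) / (len(data) * (h * np.sqrt(2.0 * np.pi)) ** d)

def pca(X, k):
    """Mean and first k principal components of X (the dimension-reduction step)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]

rng = np.random.default_rng(5)
class0 = rng.normal(0.0, 1.0, size=(150, 5))    # two classes in 5 dimensions
class1 = rng.normal(1.5, 1.0, size=(150, 5))
mu, comps = pca(np.vstack([class0, class1]), k=2)   # keep 2 of 5 dimensions
Z0, Z1 = (class0 - mu) @ comps.T, (class1 - mu) @ comps.T

# classify new class-1 points by comparing the estimated class densities
new = (rng.normal(1.5, 1.0, size=(20, 5)) - mu) @ comps.T
pred = (kde_nd(new, Z1, h=0.5) > kde_nd(new, Z0, h=0.5)).astype(int)
```

The choice k=2 is exactly the trade-off the slide describes: low enough for a stable estimate, high enough to retain the class separation.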
Method 2:
• Use the univariate concept to „normalize“ the data nonparametrically
• Use classical methods like LDA and QDA for classification
• Drawback: calculation-intensive
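One way to read the „normalize nonparametrically“ step is as the marginal transformation z = Φ⁻¹(F̂(x)), with F̂ a kernel-smoothed CDF estimate. The sketch below (fixed bandwidth and exponential test data are illustrative assumptions) makes each variable approximately standard normal, after which LDA or QDA can be applied:

```python
import numpy as np
from statistics import NormalDist

_nd = NormalDist()
_phi = np.vectorize(_nd.cdf)            # standard normal CDF, applied elementwise

def kernel_cdf(x, data, h):
    """Kernel-smoothed CDF estimate: average of Gaussian CDFs centred at the data."""
    return _phi((x[:, None] - data[None, :]) / h).mean(axis=1)

def normalize(X, h=0.4):
    """Transform each variable via z = Phi^{-1}(Fhat(x)) so that its marginal
    distribution is approximately standard normal."""
    cols = []
    for j in range(X.shape[1]):
        p = np.clip(kernel_cdf(X[:, j], X[:, j], h), 1e-6, 1 - 1e-6)
        cols.append([_nd.inv_cdf(float(pi)) for pi in p])
    return np.array(cols).T

rng = np.random.default_rng(6)
X = rng.exponential(size=(200, 2))      # strongly skewed, clearly non-normal data
Z = normalize(X)                        # roughly N(0, 1) in each column
```

The per-variable CDF estimation for every observation is what makes this approach calculation-intensive, as the slide notes.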
Criticism of former simulation studies
• Carried out 20–30 years ago
• Outdated parameter selectors
• Restriction to uncorrelated normal distributions
• Fruitless estimation because of high dimensions
• No dimension reduction