Kernel Density Estimation

Kernel Density Estimation: Theory and Application in Discriminant Analysis
Thomas Ledl, Universität Wien

Contents:
• Introduction
• Theory
• Aspects of Application
• Simulation Study
• Summary

Introduction

25 observations: Which distribution?
[Figure: 25 observations on an axis from 0 to 4, with question marks for the unknown distribution]

Kernel density estimator model:
$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$, where $x_1, \dots, x_n$ are the observations
K(.) and h to choose

[Figure: kernel density estimates on an axis from 0 to 4, by kernel (triangular, Gaussian) and bandwidth ("large" h, "small" h)]
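The effect of the two choices, K(.) and h, can be sketched with a few lines of NumPy. This is a minimal illustration, not the code behind the slides: the 25 data points are a hypothetical stand-in drawn at random, and the bandwidths 0.2 and 1.0 are arbitrary picks for "small" and "large".

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def triangular_kernel(u):
    """Triangular kernel on [-1, 1]."""
    return np.clip(1.0 - np.abs(u), 0.0, None)

def kde(grid, data, h, kernel):
    """Evaluate (1 / (n h)) * sum_i K((x - x_i) / h) at every grid point."""
    u = (grid[:, None] - data[None, :]) / h
    return kernel(u).sum(axis=1) / (len(data) * h)

# Hypothetical stand-in for the 25 observations (the slide's data are not given).
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 4.0, size=25)
grid = np.linspace(-4.0, 8.0, 2401)

kernels = {"triangular": triangular_kernel, "gaussian": gaussian_kernel}
estimates = {
    (name, h): kde(grid, data, h, k)
    for name, k in kernels.items()
    for h in (0.2, 1.0)  # illustrative "small" and "large" bandwidths
}
```

Plotting the four entries of `estimates` against `grid` would typically show that the bandwidth changes the shape of the estimate far more than the kernel does.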

Question 1: Which choice of K(.) and h is the best for a descriptive purpose?

Classification:
[Figure: level plot, LDA (based on the assumption of a multivariate normal distribution)]
[Figure: level plot, KDE classifier]

Question 2: Performance of classification based on KDE in more than 2 dimensions?

Theory

Essential issues
• Optimization criteria
• Improvements of the standard model
• Resulting optimal choices of the model parameters K(.) and h

Optimization criteria
L_p-distances between two densities f(.) and g(.):
$d_p(f, g) = \left( \int |f(x) - g(x)|^p \, dx \right)^{1/p}$
[Figure: two densities f(.) and g(.)]
• p = 1: $\int |\hat{f}(x) - f(x)| \, dx$ = IAE, "integrated absolute error"
• p = 2: $\int (\hat{f}(x) - f(x))^2 \, dx$ = ISE, "integrated squared error" (the squared L_2-distance)
• p = ∞: minimization of the maximum vertical distance, $\sup_x |\hat{f}(x) - f(x)|$
Other ideas:
• Consideration of horizontal distances for a more intuitive fit (Marron and Tsybakov, 1995)
• Compare the number and position of modes
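The three vertical-distance criteria (IAE, ISE, maximum distance) can be approximated on a grid. The sketch below compares two normal densities chosen purely for illustration; the names `normal_pdf` and `distances` are hypothetical helpers, and the integrals are simple Riemann sums.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def distances(f_vals, g_vals, dx):
    """Grid approximations of IAE, ISE and the maximum vertical distance."""
    diff = np.abs(f_vals - g_vals)
    return {
        "IAE": diff.sum() * dx,        # integral of |f - g|
        "ISE": (diff**2).sum() * dx,   # integral of (f - g)^2
        "sup": diff.max(),             # largest vertical gap
    }

grid = np.linspace(-10.0, 10.0, 4001)
dx = grid[1] - grid[0]
f_vals = normal_pdf(grid)            # f: standard normal (illustrative)
g_vals = normal_pdf(grid, mu=0.5)    # g: same shape, shifted by 0.5 (illustrative)
result = distances(f_vals, g_vals, dx)
```

In practice `g_vals` would be a kernel estimate and `f_vals` the true density, which is only available in simulations.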

Overview of some minimization criteria
• L1-distance = IAE: difficult mathematical tractability
• L∞-distance = maximum difference: does not consider overall fit
• "Modern" criteria, which include a kind of measure of the horizontal distances: difficult mathematical tractability
• L2-distance = ISE, MISE, AMISE, ...: most commonly used

ISE, MISE, AMISE, ...
• ISE is a random variable
• MISE = E(ISE), the expectation of ISE
• AMISE = Taylor approximation of MISE, easier to calculate
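The relation between the three quantities can be made concrete in a small simulation. The sketch assumes a standard normal true density and a Gaussian kernel (both are choices made here, not taken from the slides), estimates MISE by averaging ISE over repeated samples, and compares it with the usual AMISE expression $R(K)/(nh) + h^4 \mu_2(K)^2 R(f'')/4$.

```python
import numpy as np

def normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def kde(grid, data, h):
    """Gaussian-kernel density estimate on a grid."""
    return normal_pdf((grid[:, None] - data[None, :]) / h).mean(axis=1) / h

def ise(data, h, grid):
    """Integrated squared error of one estimate against the true N(0, 1) density."""
    dx = grid[1] - grid[0]
    return ((kde(grid, data, h) - normal_pdf(grid)) ** 2).sum() * dx

rng = np.random.default_rng(1)
grid = np.linspace(-6.0, 6.0, 601)
n, h, reps = 500, 0.3, 100  # illustrative settings

# ISE differs from sample to sample: it is a random variable.
ise_values = np.array([ise(rng.standard_normal(n), h, grid) for _ in range(reps)])

# MISE = E(ISE), approximated here by a Monte Carlo average.
mise_estimate = ise_values.mean()

# AMISE for a Gaussian kernel (R(K) = 1 / (2 sqrt(pi)), mu_2(K) = 1)
# and a standard normal f (R(f'') = 3 / (8 sqrt(pi))).
r_kernel = 1.0 / (2.0 * np.sqrt(np.pi))
r_f2 = 3.0 / (8.0 * np.sqrt(np.pi))
amise = r_kernel / (n * h) + 0.25 * h**4 * r_f2
```

AMISE is only an approximation, so `amise` and `mise_estimate` should be of the same order but not equal.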


The AMISE-optimal bandwidth
$h_{AMISE} = \left( \frac{R(K)}{\mu_2(K)^2 \, R(f'') \, n} \right)^{1/5}$, with $R(g) = \int g(x)^2 \, dx$ and $\mu_2(K) = \int x^2 K(x) \, dx$
• R(K) and $\mu_2(K)$ are dependent on the kernel function K(.); the AMISE at this bandwidth is minimized by the "Epanechnikov kernel"
• R(f'') is dependent on the unknown density f(.)
How to proceed?
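When f is known, the AMISE-optimal bandwidth $h = (R(K) / (\mu_2(K)^2 R(f'') n))^{1/5}$ can be evaluated numerically. The sketch below assumes a standard normal f and compares the Gaussian and Epanechnikov kernels; the helper names and the integration grid are choices made for this illustration.

```python
import numpy as np

U_GRID = np.linspace(-8.0, 8.0, 16001)
DU = U_GRID[1] - U_GRID[0]

def integrate(values):
    """Riemann sum over U_GRID."""
    return values.sum() * DU

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def epanechnikov(u):
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def kernel_constants(kernel):
    """Return R(K) = int K^2 and mu_2(K) = int u^2 K."""
    k = kernel(U_GRID)
    return integrate(k**2), integrate(U_GRID**2 * k)

def h_amise(kernel, r_f2, n):
    """AMISE-optimal bandwidth for a known roughness R(f'')."""
    r_k, mu2 = kernel_constants(kernel)
    return (r_k / (mu2**2 * r_f2 * n)) ** 0.2

def amise_kernel_factor(kernel):
    """Kernel-dependent factor of the minimal AMISE, (R(K)^4 mu_2(K)^2)^(1/5)."""
    r_k, mu2 = kernel_constants(kernel)
    return (r_k**4 * mu2**2) ** 0.2

# R(f'') for the standard normal density, using f''(x) = (x^2 - 1) f(x).
r_f2 = integrate(((U_GRID**2 - 1.0) * gaussian(U_GRID)) ** 2)

n = 100  # illustrative sample size
h_gauss = h_amise(gaussian, r_f2, n)
h_epan = h_amise(epanechnikov, r_f2, n)
```

Comparing `amise_kernel_factor(epanechnikov)` with `amise_kernel_factor(gaussian)` shows the Epanechnikov kernel's advantage, which is small. In real applications `r_f2` is not available, which is the problem the data-driven selectors address.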

Data-driven bandwidth selection methods
Leave-one-out selectors
• Maximum likelihood cross-validation
• Least-squares cross-validation (Bowman, 1984)
Criteria based on substituting R(f'') in the AMISE formula
• "Normal rule" ("rule of thumb"; Silverman, 1986)
• Plug-in methods (Sheather and Jones, 1991; Park and Marron, 1990)
• Smoothed bootstrap

Least-squares cross-validation (LSCV)
• Undisputed selector in the 1980s
• Gives an unbiased estimator of the expected ISE, up to an additive constant that does not depend on h
• Suffers from more than one local minimizer; no agreement about which one to use
• Bad convergence rate for the resulting bandwidth h_opt
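A minimal LSCV sketch, assuming a Gaussian kernel so that $\int \hat{f}^2$ has a closed form: the criterion is $\int \hat{f}_h^2 \, dx - \frac{2}{n} \sum_i \hat{f}_{h,-i}(x_i)$, minimized here over a plain grid of bandwidths. The sample, the grid limits and the function names are illustrative assumptions.

```python
import numpy as np

def norm_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def lscv_score(data, h):
    """Least-squares cross-validation criterion for a Gaussian kernel."""
    n = len(data)
    diffs = data[:, None] - data[None, :]
    # int f_hat^2 dx: two Gaussian kernels convolve to N(0, 2 h^2).
    integral_sq = norm_pdf(diffs, np.sqrt(2.0) * h).sum() / n**2
    # Leave-one-out estimates f_hat_{-i}(x_i): drop the diagonal (i = j) terms.
    k = norm_pdf(diffs, h)
    loo = (k.sum(axis=1) - np.diag(k)) / (n - 1)
    return integral_sq - 2.0 * loo.mean()

rng = np.random.default_rng(2)
data = rng.standard_normal(200)          # hypothetical sample
h_grid = np.linspace(0.05, 1.5, 146)     # illustrative search range
scores = np.array([lscv_score(data, h) for h in h_grid])
h_lscv = h_grid[np.argmin(scores)]
```

Taking the global minimum over the grid is one convention among several; with multiple local minimizers, a different rule would give a different bandwidth.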


Normal rule ("rule of thumb")
• Assumes f(x) to be $N(\mu, \sigma^2)$
• Easiest selector
• Often oversmooths the function
The resulting bandwidth is given by (for a Gaussian kernel): $h = (4/3)^{1/5} \, \hat{\sigma} \, n^{-1/5} \approx 1.06 \, \hat{\sigma} \, n^{-1/5}$
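For a Gaussian kernel, substituting the roughness R(f'') of a normal density into the AMISE-optimal bandwidth gives $h = (4/3)^{1/5} \sigma n^{-1/5}$. The sketch below implements this with the sample standard deviation and shows, on a hypothetical bimodal sample, why the rule tends to oversmooth.

```python
import numpy as np

def normal_rule_bandwidth(data):
    """Normal-reference bandwidth for a Gaussian kernel."""
    n = len(data)
    sigma_hat = np.std(data, ddof=1)
    return (4.0 / 3.0) ** 0.2 * sigma_hat * n ** (-0.2)

rng = np.random.default_rng(3)

# One normal sample, and a mixture of two well-separated normals (illustrative).
unimodal = rng.normal(0.0, 1.0, size=200)
bimodal = np.concatenate([rng.normal(-3.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

h_unimodal = normal_rule_bandwidth(unimodal)
h_bimodal = normal_rule_bandwidth(bimodal)
```

Each mode of the mixture has the same spread as the unimodal sample, yet `h_bimodal` is much larger than `h_unimodal` because the overall standard deviation is inflated by the distance between the modes.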


Plug-in methods (Sheather and Jones, 1991; Park and Marron, 1990)
• Do not substitute R(f'') in the AMISE formula, but estimate it via $R(f^{(IV)})$, and $R(f^{(IV)})$ via $R(f^{(VI)})$, etc.
• Another parameter i to choose (the number of stages to go back); one stage is mostly sufficient
• Better rates of convergence
• Do not finally circumvent the problem of the unknown density, either
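The staging idea can be sketched as a one-stage direct plug-in rule with a Gaussian kernel. This is a common textbook form and an assumption here, not the Sheather and Jones solve-the-equation selector: the sixth-order functional is replaced by its value under a normal density, which yields a pilot bandwidth g for a kernel estimate of $R(f'') = \int f^{(4)} f$, which in turn is plugged into the AMISE-optimal bandwidth.

```python
import numpy as np

def phi(u):
    """Standard normal density."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def phi4(u):
    """Fourth derivative of the standard normal density."""
    return (u**4 - 6.0 * u**2 + 3.0) * phi(u)

def psi4_hat(data, g):
    """Kernel estimate of int f''''(x) f(x) dx = R(f'') with pilot bandwidth g."""
    n = len(data)
    u = (data[:, None] - data[None, :]) / g
    return phi4(u).sum() / (n**2 * g**5)

def one_stage_plugin_bandwidth(data):
    n = len(data)
    sigma_hat = np.std(data, ddof=1)
    # Stage 0: normal-reference value of int f^(6) f (this is where f is "assumed").
    psi6_normal = -15.0 / (16.0 * np.sqrt(np.pi) * sigma_hat**7)
    # Pilot bandwidth for estimating R(f'') (Gaussian kernel, mu_2 = 1).
    g = (-2.0 * phi4(0.0) / (psi6_normal * n)) ** (1.0 / 7.0)
    r_f2_hat = psi4_hat(data, g)
    # Plug the estimate into the AMISE-optimal bandwidth.
    r_kernel = 1.0 / (2.0 * np.sqrt(np.pi))
    return (r_kernel / (r_f2_hat * n)) ** 0.2

rng = np.random.default_rng(7)
data = rng.standard_normal(1000)  # hypothetical sample
h_plugin = one_stage_plugin_bandwidth(data)
h_normal = (4.0 / 3.0) ** 0.2 * np.std(data, ddof=1) * len(data) ** (-0.2)
```

For normal data the two bandwidths should nearly agree; the normal assumption has only been pushed back one stage, not removed.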

The multivariate case
h → H, the bandwidth matrix

Issues of generalization in d dimensions
• d² entries of H (d(d+1)/2 distinct ones if H is symmetric) instead of one bandwidth parameter
• Unstable estimates
• Bandwidth selectors are essentially straightforward to generalize
• For plug-in methods it is "too difficult" to give succinct expressions for d > 2 dimensions
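A sketch of a multivariate estimate with a full bandwidth matrix, under one common convention (an assumption here): each observation contributes a normal density with covariance H. The choice $H = n^{-2/(d+4)} \hat{\Sigma}$ is a simple normal-reference-style pick for illustration, not a selector discussed on the slides.

```python
import numpy as np

def mv_gaussian_kde(points, data, H):
    """Average of N(x_i, H) densities evaluated at each row of `points`."""
    d = data.shape[1]
    H_inv = np.linalg.inv(H)
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(H))
    diff = points[:, None, :] - data[None, :, :]            # shape (m, n, d)
    maha = np.einsum("mnd,de,mne->mn", diff, H_inv, diff)   # squared Mahalanobis distances
    return norm_const * np.exp(-0.5 * maha).mean(axis=1)

rng = np.random.default_rng(4)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])                    # hypothetical correlated data
data = rng.multivariate_normal(np.zeros(2), cov, size=200)
n, d = data.shape

# Full bandwidth matrix: d^2 entries, proportional to the sample covariance.
H = n ** (-2.0 / (d + 4)) * np.cov(data, rowvar=False)

axis = np.linspace(-6.0, 6.0, 81)
xx, yy = np.meshgrid(axis, axis)
points = np.column_stack([xx.ravel(), yy.ravel()])
density = mv_gaussian_kde(points, data, H)
```

A diagonal H would smooth along the coordinate axes only; the off-diagonal entries let the kernel follow the orientation of the data.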

Aspects of Application

Essential issues
• Curse of dimensionality
• Connection between goodness-of-fit and optimal classification
• Two methods for discriminatory purposes

The "curse of dimensionality"
• The data "disappears" into the distribution tails in high dimensions
• In high dimensions d, a good fit in the tails is desired!
• Much data is necessary to maintain a constant estimation error in high dimensions
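Both effects can be illustrated for a standard normal distribution in d dimensions. The sketch computes the probability mass lying where the density is below 1% of its maximum (the 1% threshold is an arbitrary choice), and the sample size that would match the convergence rate $n^{-4/(d+4)}$ of the optimal AMISE achieved with 100 observations in one dimension, ignoring all constants.

```python
import numpy as np
from scipy.stats import chi2

dims = np.array([1, 2, 5, 10, 20])

# For N(0, I_d), density >= 1% of its mode  <=>  ||x||^2 <= 2 ln(100),
# and ||x||^2 follows a chi-square distribution with d degrees of freedom.
threshold = 2.0 * np.log(100.0)
tail_mass = 1.0 - chi2.cdf(threshold, df=dims)

# Sample size giving the same rate n^(-4/(d+4)) as n_ref observations in d = 1.
# Constants are ignored, so this is an order-of-magnitude illustration only.
n_ref = 100
n_needed = n_ref ** ((dims + 4) / 5.0)
```

`tail_mass` grows quickly with d, which is why a classifier in high dimensions depends on how well the tails are estimated.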

Essential issues
AMISE-optimal parameter choice
• L2-optimal
• Worse fit in the tails
• Calculation intensive for large n
Optimal classification (in high dimensions)
• L1-optimal (misclassification rate)
• Estimation of tails important
• Many observations required for a reasonable fit

Method 1:
• Reduction of the data onto a subspace which allows a somewhat accurate estimation but does not destroy too much information ("trade-off")
• Use the multivariate kernel density concept to estimate the class densities
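A minimal sketch of Method 1 under two assumptions that are not in the slides: the subspace is found by principal components, and the class densities use `scipy.stats.gaussian_kde` with its default bandwidth. A point is assigned to the class with the largest prior times estimated density. The six-dimensional data are synthetic.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fit_pca(X, k):
    """Return the mean and the first k principal directions (as columns)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def fit_kde_classifier(X, y, k):
    mean, W = fit_pca(X, k)
    Z = (X - mean) @ W                                  # reduced data, shape (n, k)
    classes = np.unique(y)
    kdes = {c: gaussian_kde(Z[y == c].T) for c in classes}   # one density per class
    priors = {c: np.mean(y == c) for c in classes}
    return mean, W, classes, kdes, priors

def predict(model, X):
    mean, W, classes, kdes, priors = model
    Z = ((X - mean) @ W).T                              # gaussian_kde expects (k, m)
    scores = np.vstack([priors[c] * kdes[c](Z) for c in classes])
    return classes[np.argmax(scores, axis=0)]

rng = np.random.default_rng(5)

def make_data(n_per_class):
    """Two synthetic 6-dimensional classes that differ in the first two coordinates."""
    shift = np.array([3.0, 3.0, 0.0, 0.0, 0.0, 0.0])
    X0 = rng.standard_normal((n_per_class, 6))
    X1 = rng.standard_normal((n_per_class, 6)) + shift
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)

X_train, y_train = make_data(150)
X_test, y_test = make_data(150)
model = fit_kde_classifier(X_train, y_train, k=2)
accuracy = np.mean(predict(model, X_test) == y_test)
```

The choice of k is the trade-off named above: a smaller k gives more stable density estimates but may discard directions that separate the classes.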

Method 2:
• Use the univariate concept to "normalize" the data nonparametrically
• Use the classical methods like LDA and QDA for classification
• Drawback: calculation intensive
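One possible reading of Method 2, sketched under explicit assumptions: each variable is mapped through its kernel-smoothed distribution function and then through the standard normal quantile function, using all training data pooled over the classes, and a plain LDA is fitted to the transformed data. The slides do not specify these details, and the lognormal data are synthetic.

```python
import numpy as np
from scipy.stats import norm

def kde_cdf(x, sample, h):
    """Distribution function of a Gaussian-kernel density estimate."""
    return norm.cdf((x[:, None] - sample[None, :]) / h).mean(axis=1)

def fit_normalizer(X):
    """Store each column with a normal-rule bandwidth (pooled over classes)."""
    n = X.shape[0]
    return [(X[:, j].copy(), 1.06 * np.std(X[:, j], ddof=1) * n ** (-0.2))
            for j in range(X.shape[1])]

def normalize(X, normalizer):
    """Map each variable to approximate normality: z = Phi^{-1}(F_hat(x))."""
    cols = []
    for j, (sample, h) in enumerate(normalizer):
        u = np.clip(kde_cdf(X[:, j], sample, h), 1e-6, 1.0 - 1e-6)
        cols.append(norm.ppf(u))
    return np.column_stack(cols)

def fit_lda(Z, y):
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    centered = Z - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(Z) - len(classes))    # pooled covariance
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), priors

def predict_lda(model, Z):
    classes, means, cov_inv, priors = model
    # Linear discriminant scores, one column per class.
    scores = (Z @ cov_inv @ means.T
              - 0.5 * np.sum(means @ cov_inv * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]

rng = np.random.default_rng(6)

def make_data(n_per_class):
    """Two synthetic classes with skewed (lognormal) variables."""
    X0 = np.exp(rng.normal(0.0, 0.5, size=(n_per_class, 2)))
    X1 = np.exp(rng.normal(1.0, 0.5, size=(n_per_class, 2)))
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)

X_train, y_train = make_data(200)
X_test, y_test = make_data(150)
normalizer = fit_normalizer(X_train)
lda = fit_lda(normalize(X_train, normalizer), y_train)
accuracy = np.mean(predict_lda(lda, normalize(X_test, normalizer)) == y_test)
```

Every call to `kde_cdf` compares each new point with every training point, which reflects the drawback noted above.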

Simulation Study

Criticism of former simulation studies
• Carried out 20-30 years ago
• Outdated parameter selectors
• Restriction to uncorrelated normals
• Fruitless estimation because of high dimensions
• No dimension reduction
