
Nearest Neighbor Classifiers


Presentation Transcript


  1. Nearest Neighbor Classifiers
     • other names:
       • instance-based learning
       • case-based learning (CBL)
       • non-parametric learning
       • model-free learning

  2. 1-NN
     • save all training data
     • to classify a test example:
       • compute the distance to each training example (Euclidean distance metric)
       • report the class of the nearest training example
     • for binary attributes, use Hamming distance
     • for nominal attributes, use equality (0 if equal, else 1) or VDM (Value-Difference Metric; Stanfill and Waltz, 1986): squared differences of conditional probabilities, summed over classes
     • Result: often surprisingly good accuracy, comparable with decision trees and neural nets
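A minimal sketch of the 1-NN rule described above, using the Euclidean distance metric (Python/NumPy; the function name predict_1nn and the argument names are illustrative, not from the slides):

```python
import numpy as np

def predict_1nn(X_train, y_train, x):
    """Classify x with the label of its single nearest training example."""
    # Euclidean distance from x to every stored training example
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # report the class of the nearest one
    return y_train[np.argmin(dists)]
```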

  3. k-NN
     • 1-NN is sensitive to noise
     • instead, take a majority vote over the k closest neighbors
     • to optimize k, use a validation set
     • distance-weighting: weight each neighbor's vote by inverse distance
       • with weighting, can use all training examples
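A sketch of the k-NN variant with majority voting and optional distance-weighted voting (assumes NumPy arrays for the training data; the small constant guarding against zero distance is an implementation detail, not from the slides):

```python
import numpy as np
from collections import Counter

def predict_knn(X_train, y_train, x, k=5, weighted=False):
    """Majority vote over the k closest neighbors; with weighted=True each
    vote counts 1/distance, so k may even cover all training examples."""
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    if not weighted:
        # unweighted majority vote among the k nearest neighbors
        return Counter(y_train[nearest]).most_common(1)[0][0]
    votes = {}
    for i in nearest:
        # closer neighbors get proportionally larger votes
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (dists[i] + 1e-12)
    return max(votes, key=votes.get)
```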

  4. strengths of k-NN
     • simple, accurate
     • Theorem: in the limit (large N), the error of 1-NN is at most twice the error of the Bayes-optimal classifier (Cover & Hart, 1967)
     weaknesses of k-NN
     • memory needed to store examples
     • classification speed (indexing can help)
     • no comprehensibility
     • (noise, curse of dimensionality, lack of adequate training examples)
     basis for generalization
     • bias: similarity bias

  5. NTGrowth (Aha and Kibler)
     • during training, save only those examples on which mistakes are made
     • also throw out examples that appear noisy
     • reduces memory requirements, increases accuracy
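The "save only the mistakes" idea can be sketched as follows; this covers only the growth step (the additional pass that discards apparently noisy examples is omitted), and the function and variable names are illustrative:

```python
import numpy as np

def grow_store(X, y):
    """Keep a training example only if the current store misclassifies it (1-NN)."""
    store_X, store_y = [X[0]], [y[0]]              # seed the store with the first example
    for xi, yi in zip(X[1:], y[1:]):
        d = np.sqrt(((np.array(store_X) - xi) ** 2).sum(axis=1))
        if store_y[int(np.argmin(d))] != yi:       # mistake -> remember this example
            store_X.append(xi)
            store_y.append(yi)
    return np.array(store_X), np.array(store_y)
```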

  6. Scaling of attributes
     • for fairness, don't want attributes with large values to dominate the distance
     • pre-whiten the data:
       • for continuous values, replace each value with its z-score, z = (x − m)/s, where m is the attribute's mean and s its standard deviation
       • binary and nominal attributes are already on a 0–1 scale
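A sketch of the z-score scaling step (statistics are computed from the training set and re-used for test examples; the small epsilon guarding against constant attributes is an implementation detail, not from the slides):

```python
import numpy as np

def zscore_scale(X_train, X_test):
    """Replace continuous attributes with z-scores so that large-valued
    attributes cannot dominate the Euclidean distance."""
    m = X_train.mean(axis=0)                  # per-attribute mean
    s = X_train.std(axis=0) + 1e-12           # per-attribute std (epsilon for constant columns)
    return (X_train - m) / s, (X_test - m) / s
```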

  7. Feature Weighting
     • weighted Euclidean distance metric
     • want to weight features by "relevance", estimated by e.g.:
       • conditional probability
       • negEntropy
       • chi-squared
     • Mahalanobis metric
       • uses the inverse of the covariance matrix: d(x, y) = (x − y)ᵀ S⁻¹ (x − y)
       • captures skewing of the data distribution
       • con: class-independent
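Sketches of the two distance variants named above, assuming the relevance weights w and the covariance matrix S have already been estimated elsewhere:

```python
import numpy as np

def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance; w holds one non-negative relevance
    weight per feature (e.g. from negEntropy or chi-squared)."""
    return np.sqrt(np.sum(w * (x - y) ** 2))

def mahalanobis_sq(x, y, S):
    """Squared Mahalanobis distance (x - y)^T S^-1 (x - y), where S is the
    covariance matrix of the data; note it is class-independent."""
    d = x - y
    return float(d @ np.linalg.inv(S) @ d)
```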

  8. Feature Selection
     • curse of dimensionality: many attributes often lead to lower accuracy
     • PCA (principal component analysis)
       • based on manipulation of the covariance matrix
       • chooses new orthogonal dimensions as linear combinations of the original attributes, in order of most variance explained
     • filter methods: try to estimate relevance
       • negEntropy; RELIEF (hits vs. misses among neighbors)
     • wrapper methods: use accuracy on training data to pick the best features
       • SFS: stepwise-forward selection
       • SBE: stepwise-backward elimination
       • DIET: optimize the weights of one feature at a time by searching a grid
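A sketch of the wrapper idea, here stepwise-forward selection (SFS); accuracy() stands for any evaluation function, e.g. k-NN accuracy on training or validation data, and is a placeholder rather than anything defined on the slide:

```python
def sfs(features, accuracy):
    """Stepwise-forward selection: greedily add the single feature that most
    improves accuracy(subset), stopping when no addition helps."""
    selected, best_score = [], float("-inf")
    improved = True
    while improved:
        improved = False
        for f in features:
            if f in selected:
                continue
            score = accuracy(selected + [f])   # evaluate candidate subset
            if score > best_score:
                best_score, best_feature, improved = score, f, True
        if improved:
            selected.append(best_feature)       # keep the best new feature
    return selected
```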
