Nearest Neighbor Classifiers
• other names:
  • instance-based learning
  • case-based learning (CBL)
  • non-parametric learning
  • model-free learning
1-NN
• save all training data
• to classify a test example:
  • compute the distance to each training example (Euclidean distance metric)
  • report the class of the nearest training example (sketch below)
• for binary attributes, use Hamming distance
• for nominal attributes, use equality (0 if equal, else 1) or VDM (Value-Difference Metric; Stanfill and Waltz, 1986) – squared differences of conditional probabilities, summed over classes
• Result: often surprisingly good accuracy, comparable with decision trees & neural nets
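A minimal 1-NN sketch of the procedure above, in Python. The training format (a list of (feature_vector, label) pairs) and the toy data are assumptions for illustration, not from the slides.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def classify_1nn(train, query):
    """Return the class of the single nearest training example.

    train: list of (feature_vector, label) pairs
    query: feature vector to classify
    """
    nearest = min(train, key=lambda ex: euclidean(ex[0], query))
    return nearest[1]

# hypothetical toy data
train = [([1.0, 2.0], "pos"), ([4.0, 5.0], "neg"), ([0.5, 1.5], "pos")]
print(classify_1nn(train, [1.2, 1.8]))   # nearest example is ([1.0, 2.0], "pos")
```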
k-NN
• 1-NN is sensitive to noise
• instead, take a majority vote over the k closest neighbors
• optimizing k: use a validation set
• distance-weighting: weight each neighbor's vote by its distance (sketch below)
  • with distance-weighting, can use all training examples
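A sketch of k-NN with optional distance weighting, assuming the same (feature_vector, label) training format as the 1-NN example above; the inverse-distance weighting scheme is one common choice, not prescribed by the slides.

```python
import math
from collections import defaultdict

def classify_knn(train, query, k=3, weighted=False):
    """Majority vote over the k nearest neighbors.

    If weighted, each neighbor votes with weight 1/(distance + eps),
    so nearer neighbors count more.
    """
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        d = math.dist(features, query)
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)
```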
strengths of k-NN
• simple, accurate
• Theorem (Cover & Hart, 1967): in the limit (large N), the error of 1-NN is at most twice the error of the Bayes-optimal classifier
weaknesses of k-NN
• memory needed to store examples
• classification speed (indexing can help)
• no comprehensibility
• noise, curse of dimensionality, lack of adequate training examples
basis for generalization
• bias: similarity bias
NTGrowth (Aha and Kibler)
• during training, save only those examples on which mistakes are made (sketch below)
• also throw out examples that appear noisy
• reduces memory requirements, increases accuracy
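A rough sketch of the mistake-driven storage idea only; the actual NTGrowth algorithm also tracks per-example performance statistics to discard apparently noisy examples, which is omitted here.

```python
import math

def _nearest_label(store, query):
    """Label of the stored example nearest to query (1-NN lookup)."""
    return min(store, key=lambda ex: math.dist(ex[0], query))[1]

def grow_store(training_examples):
    """Keep a training example only if the examples stored so far misclassify it."""
    store = []
    for features, label in training_examples:
        if not store or _nearest_label(store, features) != label:
            store.append((features, label))
    return store
```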
Scaling of attributes
• for fairness, don't want attributes with large values to dominate the distance
• pre-whiten the data:
  • for continuous attributes, replace each value with its z-score, z = (x − m)/s, where m is the attribute's mean and s its standard deviation (sketch below)
  • binary and nominal attributes are already on a 0–1 scale
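A sketch of the z-score scaling step, assuming the data is a list of equal-length numeric feature vectors with at least two rows.

```python
import statistics

def zscore_columns(data):
    """Replace each continuous column with (x - mean) / stdev."""
    columns = list(zip(*data))
    means = [statistics.mean(col) for col in columns]
    # guard against constant columns (stdev of 0) by dividing by 1 instead
    stdevs = [statistics.stdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)]
            for row in data]
```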
Feature Weighting
• weighted Euclidean distance metric (sketch below)
• want to weight features by “relevance”:
  • conditional probability
  • negEntropy
  • chi-squared
• Mahalanobis metric
  • uses the inverse of the covariance matrix: dxy = (x − y)ᵀ S⁻¹ (x − y)
  • captures skewing of the data distribution
  • con: class-independent
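A sketch of a feature-weighted Euclidean distance and the Mahalanobis form from the slide. The weight vector and the covariance matrix S are assumed to be supplied (e.g., estimated from the training data); note the slide writes the squared Mahalanobis form, and the usual distance is its square root.

```python
import numpy as np

def weighted_euclidean(x, y, w):
    """sqrt( sum_i w_i * (x_i - y_i)^2 ), with per-feature weights w."""
    x, y, w = np.asarray(x), np.asarray(y), np.asarray(w)
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))

def mahalanobis_sq(x, y, S):
    """Squared form from the slide: (x - y)^T S^{-1} (x - y), S = covariance matrix."""
    diff = np.asarray(x) - np.asarray(y)
    return float(diff @ np.linalg.inv(S) @ diff)
```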
Feature Selection
• curse of dimensionality – many attributes often lead to lower accuracy
• PCA – principal component analysis
  • based on manipulation of the covariance matrix
  • choose new orthogonal dimensions as linear combinations of the original attributes, in order of most variance explained
• filter methods: try to estimate relevance
  • negEntropy; RELIEF: hits vs. misses of neighbors
• wrapper methods: use accuracy on the training data to pick the best features
  • SFS: stepwise-forward selection (sketch below)
  • SBE: stepwise-backward elimination
  • DIET: optimize the weight of one feature at a time by searching over a grid
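A sketch of stepwise-forward selection (SFS) as a wrapper method: greedily add the single feature that most improves accuracy, stopping when no candidate helps. The evaluate(feature_subset) callback is hypothetical; it would train a k-NN classifier on those features and return its accuracy.

```python
def stepwise_forward_selection(all_features, evaluate):
    """Greedy forward wrapper search over feature subsets."""
    selected, best_acc = [], 0.0
    while True:
        candidates = [f for f in all_features if f not in selected]
        if not candidates:
            break
        # accuracy obtained by adding each remaining feature to the current set
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        acc, best_f = max(scored, key=lambda t: t[0])
        if acc <= best_acc:        # no candidate improves accuracy: stop
            break
        selected.append(best_f)
        best_acc = acc
    return selected
```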