The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method

The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method C.H Chou, B.H Kuo & F Chang Institute of Information Science, Academia Sinica, Taipei, Taiwan ICPR 2006

Abstract • we propose a new data reduction algorithm GCNN( Generalized Condensed Nearest Neighbor ) . • It is called the GCNN algorithm because it has weak criterion employed than CNN. • GCNN, moreover, can yield significantly better accuracy than other instance-based data reduction methods. • We demonstrate the last claim through experiments on five datasets, some of which contain a very large number of samples.

Introduction • Most data reduction schemes are based on certain prototype learning methods which can be divided into two types. 1. Instance-based learning (IBL) algorithms - CNN, GCNN 2. Clustering-based learning (CBL) algorithms - k-means clustering algorithm - fuzzy c-means algorithm

Advantage of GCNN • It incorporates CNN as a special case and can outperform CNN. • Under certain conditions, GCNN is consistent. • One of the above conditions requires that any two sets of data with different labels have a positive separation. • GCNN creates prototypes for all labels simultaneously, in contrast to SVM, which creates support vectors for one pair of labels at a time.

positive distance VS margin

CNN Algorithm • Our goal is to extract a subset Un from Xn such that if u is the nearest member of Un to xi, then l(u) = yi, where l(u)is the label of u. • Members of Un are called prototypes set. • u is called prototype. • Samples that match in label with their nearest prototypes are said to be absorbed. • Two labeled entities are homogeneous if they have the same label, and heterogeneous otherwise.

CNN Algorithm • For CNN, a sample x is absorbed if • (1) • where p and q are prototypes: p is the nearest homogeneous prototype to x, and q is the nearest heterogeneous prototype to x.

GCNN Algorithm • For GCNN, a sample x is absorbed if • (2) • We say that a sample is weakly absorbed if it satisfies (1), and strongly absorbed if it satisfies (2). Note that (1) corresponds to the case when ρ=0 in (2) ρδn

GCNN Algorithm • S1 Initiation: For each label y, randomly select a y-sample as a new y-prototype. • S2 Absorption Check: Check whether all samples have been strongly absorbed. If so, terminate the process; otherwise, proceed to the next step. • S3 Prototype Augmentation: For each y, if there are any unabsorbed y-samples, randomly select one as a new y-prototype; otherwise, no new prototype is added to label y. Proceed to step S2.

KNN

CNN

GCNN

GCNN VS. CNN

Datasets

Experimental Results

The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method

The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method

Presentation Transcript

K-nearest neighbor methods

Nearest Neighbor Classifiers

Reverse Nearest Neighbor Aggregates

Nearest-Neighbor Classifiers

Nearest Neighbor

Nearest neighbor matching

Nearest-Neighbor Classifiers

Pairwise Nearest Neighbor Method Revisited Parittainen yhdistelymenetelmä uudistettuna

Classification Nearest Neighbor

Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects

The Nearest-Neighbor Classifier

Nearest Neighbor

K nearest neighbor

Efficient Method for Maximizing Bichromatic Reverse Nearest Neighbor

Alhazen and the Nearest Neighbor Rule

Exact Nearest Neighbor Algorithms

K-Nearest Neighbor

K-Nearest Neighbor Learning

A vector quantization method for nearest neighbor classifier design

Classification Nearest Neighbor

Learning: Nearest Neighbor

Nearest Neighbor Classifier