Identifying Competence-Critical Instances for Instance-Based Learners

Identifying Competence-Critical Instances for Instance-Based Learners 2001. 5. 9 Presenter: Kyu-Baek Hwang

Abstract • The basic nearest neighbor classifier with a large dataset • Classification accuracy and response time • Review of past work tackling these problems • No consistent method • Insight into the problem characteristics • Iterative case filtering (ICF) algorithm

Introduction • Harmful and superfluous instances are stored. • Selectively store instances (or delete stored instances) • The data miner has to gain an insight into the structure of the classes in the instance space. • The experimental comparison of RT3 and ICF • Neither algorithm performs better in all cases.

Defining the Problem • Two practical issues that arise in this area • Instance removal (retain only the critical instances) • Different approaches according to the type of the classification problem • The same (or higher) accuracy with less storage • Which instances should be deleted?

Four Cases Where NNC Fails • Noisy instance • Close to the interclass border • Border instances are critical in general. • Small region defining the class • Small k values cope with this kind of problem. • Unsolvable problem

Instance Space Structure • Two categories of instance space structure • Homogeneous region (locality) • Non-homogeneous region (no locality)

Which Instances Are Critical? • Prototypes • For non-homogeneous regions • Instances with high utility • Needs classification feedback • Instances which lie on borders are almost always critical.

Review • Competence enhancement • By removing noisy or corrupt instances • Competence preservation • By removing superfluous instances • Hybrid approach • Many modern approaches

Competence Enhancement • Stochastic noise • Wilson Editing • All instances which are incorrectly classified by their nearest neighbors are assumed to be noisy instances. • Smoothing effect • Empirically tested • Noisy instances and genuine exceptions
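The Wilson Editing rule above can be sketched in a few lines. This is a minimal illustration, not the original implementation: the function names `knn_label` and `wilson_edit`, the choice of Euclidean distance, the value k = 3, and the toy data are all assumptions made for the example.

```python
from collections import Counter
from math import dist

def knn_label(query, points, labels, k, skip=None):
    """Majority label among the k points nearest to `query`, ignoring index `skip`."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: dist(query, points[i]),
    )
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def wilson_edit(points, labels, k=3):
    """Indices kept after one pass: drop every instance its k neighbours misclassify."""
    return [
        i for i in range(len(points))
        if knn_label(points[i], points, labels, k, skip=i) == labels[i]
    ]

# Toy data (assumed): instance 3 carries label "b" but sits inside the "a" cluster.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.05, 0.05),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
labels = ["a", "a", "a", "b", "b", "b", "b"]
kept = wilson_edit(points, labels, k=3)
```

All decisions are made against the full original set, so the result does not depend on the order in which instances are visited. Note that a genuine exception would be removed just like the mislabeled instance here, which is the "noisy instances and genuine exceptions" caveat on the slide.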

Competence Preservation • Condensed nearest neighbor (CNN) • Look for cases for which removal does not lead to additional misclassification • Chang’s algorithm (Korean) • Merging two instances into one synthetic instance (the prototype) • Footprint deletion policy • Local-set of a case c • The set of cases contained in the largest hypersphere centered on c such that only cases in the same class as c are contained in the hypersphere.
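As a rough illustration of CNN, the sketch below uses the commonly described incremental formulation (grow a store until every training case is classified correctly by its nearest stored case), which is an assumption here since the slide only states the goal. The names `nearest_label` and `condensed_nn`, the seed choice (index 0), and the 1-D toy data are hypothetical.

```python
from math import dist

def nearest_label(query, store, points, labels):
    """Label of the stored case closest to `query` (1-NN over the store)."""
    j = min(store, key=lambda i: dist(query, points[i]))
    return labels[j]

def condensed_nn(points, labels):
    """Return indices of a subset that classifies every training case correctly."""
    store = [0]  # assumed seed: start from the first case
    changed = True
    while changed:
        changed = False
        for i in range(len(points)):
            if i in store:
                continue
            # A case the current store gets wrong must be kept.
            if nearest_label(points[i], store, points, labels) != labels[i]:
                store.append(i)
                changed = True
    return store

points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
labels = ["a", "a", "a", "b", "b"]
store = condensed_nn(points, labels)
```

The subset is consistent with the training data but not necessarily minimal, and it depends on the order in which cases are scanned.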

Footprint Deletion Policy • For a case-base CB = {c1, c2, …, cn} • Coverage(c) = {c’ ∈ CB: c’ ∈ Local-set(c)} • Reachable(c) = {c’ ∈ CB: c ∈ Local-set(c’)} • Pivotal group • With an empty reachable set • Delete the instance with large local-set
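Both sets on this slide are built from local sets, so a small sketch of the local-set computation may help. It assumes Euclidean distance and reads "largest hypersphere containing only same-class cases" as "everything strictly closer than the nearest case of another class"; the function name `local_set` and the toy data are hypothetical.

```python
from math import dist

def local_set(i, points, labels):
    """Indices strictly closer to case i than its nearest enemy (i itself included)."""
    enemy = min(
        (dist(points[i], points[j]) for j in range(len(points)) if labels[j] != labels[i]),
        default=float("inf"),  # single-class data: the hypersphere is unbounded
    )
    return {i} | {j for j in range(len(points)) if dist(points[i], points[j]) < enemy}

points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
labels = ["a", "a", "a", "b", "b"]
sets = [local_set(i, points, labels) for i in range(len(points))]

# Cases with the largest local sets lie deepest inside their class region.
by_size = sorted(range(len(points)), key=lambda i: len(sets[i]), reverse=True)
```

In this example the two cases next to the class boundary (indices 2 and 3) have local sets containing only themselves, while cases further from the boundary have larger ones.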

Hybrid Approaches (1/2) • IB2 (on-line) • If a new case to be added can already be classified correctly on the basis of the current case-base, the case is discarded. • IB3 • IB2 with time delay • The order of presentation is crucial for IB2 and IB3. • RT1 • k nearest neighbor • Associates of the case p are the cases that have p among their k nearest neighbors. • The instance which has many associates is tested and removed.
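The IB2 rule and its sensitivity to presentation order can be illustrated with a short sketch. It assumes a 1-nearest-neighbor check with Euclidean distance; the function name `ib2` and the toy stream are hypothetical.

```python
from math import dist

def ib2(stream):
    """Process (point, label) pairs one at a time; store only misclassified cases."""
    case_base = []
    for point, label in stream:
        if case_base:
            _, predicted = min(case_base, key=lambda c: dist(point, c[0]))
        else:
            predicted = None  # empty case-base: nothing can classify the case yet
        if predicted != label:
            case_base.append((point, label))
    return case_base

data = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "b"), ((4.0,), "b")]
forward = ib2(data)
backward = ib2(reversed(data))
```

Presenting the same five cases in reverse order yields a different case-base, which is the order dependence noted on the slide.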

Hybrid Approaches (2/2) • RT2 is identical to RT1 and additionally, • Cases furthest from their nearest enemy are removed first. • Removed associates still guide the deletion process. • RT3 is identical to RT2 and additionally, • Wilson’s noise filtering step is executed first. • RT algorithms are analogous to the footprint deletion policy.
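Two quantities drive the RT family: the associates of a case and its distance to the nearest enemy. The sketch below computes only these two quantities and is not a full RT1, RT2, or RT3 implementation; the function names, Euclidean distance, k = 2, and the toy data are assumptions.

```python
from math import dist

def k_nearest(i, points, k):
    """Indices of the k cases closest to case i (excluding i itself)."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: dist(points[i], points[j]))[:k]

def associates(points, k):
    """assoc[p] = set of cases that have p among their k nearest neighbors."""
    assoc = [set() for _ in points]
    for i in range(len(points)):
        for p in k_nearest(i, points, k):
            assoc[p].add(i)
    return assoc

def removal_order(points, labels):
    """Candidate order for RT2: cases furthest from their nearest enemy first."""
    def enemy_distance(i):
        return min(
            (dist(points[i], points[j]) for j in range(len(points)) if labels[j] != labels[i]),
            default=float("inf"),
        )
    return sorted(range(len(points)), key=enemy_distance, reverse=True)

points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
labels = ["a", "a", "a", "b", "b"]
assoc = associates(points, k=2)
order = removal_order(points, labels)
```

Unlike the neighbor list, which always has k entries, the associate set of a case can be any size, as the example shows for the case at index 2.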

An Iterative Case Filtering Algorithm • Coverage set and reachable set • RTn algorithm • Associate set of fixed size • Remove cases which have a reachable set size greater than the coverage set size. • Intuitively, this approach removes the cases that are far from the border. • A noisy case will have a singleton reachable set and a singleton coverage set. • This property protects the noisy case from being removed. • Wilson Editing

ICF Algorithm
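A minimal sketch of the procedure, pieced together from the bullets on the previous slide rather than taken from the author's own listing. It assumes Euclidean distance and k = 3 for the Wilson Editing step, and all function names are hypothetical. It also assumes that the reachable set of a case is its own local set and that the coverage set collects the cases whose local sets contain it. That assignment is the one under which the deletion rule removes cases far from the border; with the two sets assigned the other way round, the same rule would remove border cases instead, so the assignment should be checked against the original paper.

```python
from collections import Counter
from math import dist

def wilson_filter(cases, k=3):
    """Noise filter: drop cases misclassified by the majority of their k neighbors."""
    kept = []
    for i, (p, label) in enumerate(cases):
        others = [c for j, c in enumerate(cases) if j != i]
        nearest = sorted(others, key=lambda c: dist(p, c[0]))[:k]
        vote = Counter(l for _, l in nearest).most_common(1)
        if not vote or vote[0][0] == label:
            kept.append((p, label))
    return kept

def local_sets(cases):
    """For each case, the indices strictly closer than its nearest enemy."""
    sets = []
    for i, (p, label) in enumerate(cases):
        enemy = min((dist(p, q) for q, l in cases if l != label), default=float("inf"))
        sets.append({i} | {j for j, (q, _) in enumerate(cases) if dist(p, q) < enemy})
    return sets

def icf(cases, k=3):
    """Return (retained cases, fraction removed at each iteration)."""
    cases = wilson_filter(cases, k)
    profile = []
    while True:
        local = local_sets(cases)
        # Assumed reading: reachable(c) is c's local set ...
        reachable = [len(s) for s in local]
        # ... and coverage(c) is the set of cases whose local set contains c.
        coverage = [sum(1 for s in local if i in s) for i in range(len(cases))]
        keep = [c for i, c in enumerate(cases) if reachable[i] <= coverage[i]]
        if len(keep) == len(cases):  # no case flagged: stop iterating
            return cases, profile
        profile.append(1 - len(keep) / len(cases))
        cases = keep

cases = [((float(x),), "a") for x in (0, 1, 2, 3)] + \
        [((float(x),), "b") for x in (5, 6, 7, 8)]
kept, profile = icf(cases, k=3)
```

On this toy 1-D data the loop stops after one removal pass, keeping the two cases on each side of the class boundary. The `profile` list records the fraction removed per iteration, which is the quantity plotted as a reduction profile.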

How the ICF Algorithm Proceeds

Experiments • Experiments on 30 datasets from UCI repository • Maximum number of iterations: 17 • switzerland database • In general, 3 iterations are required.

Reduction Profiles • The percentage of cases removed after each iteration • switzerland database: 17 iterations, 2 – 13% (complicated) • zoo database: 2 iterations, 37% (simple structure)

Comparative Evaluation • (1) Early approaches • CNN, RNN, SNN, Chang, Wilson Editing, repeated Wilson Editing, and all k-NN • (2) Recent additions • IB2, IB3, TIBLE, and IBL-MDL • (3) State of the art • RT3 and ICF

RT3 and ICF

Conclusions • The structure of the instance space is important. • ICF and RT3 behave in a very similar way. • Their intrinsic properties are similar. • 80% of cases are removed with little degradation of accuracy. • The reduction profile provides some insight into the properties of the problem.
