CSC 196k Semester Project: Instance Based Learning

Weka Assignment 2 • Glynis Hawley

Agenda • Background: Instance-based Learning • Project • Requirements • Data • Progress • Conclusions • References

Background: Instance-based Learning • Learning/classification based on information stored in a “set” of examples • No rules or decision trees • A “new” instance is classified based on its similarity to one (or more) stored example(s) • e.g., nearest neighbor
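The nearest-neighbor idea above can be sketched in a few lines of Java. This is a minimal standalone illustration, not Weka code: the names `NearestNeighbor`, `Example`, and `classify` are hypothetical, features are assumed to be numeric and already on comparable scales, and ties go to the earliest stored example.

```java
import java.util.List;

/** Minimal 1-nearest-neighbor classifier (illustrative sketch, not Weka's API). */
public class NearestNeighbor {

    /** A stored example: a numeric feature vector and its class label. */
    public static final class Example {
        final double[] features;
        final int label;

        public Example(double[] features, int label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Squared Euclidean distance; skipping the square root does not change which example is nearest. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the label of the stored example closest to the query; ties go to the earliest example. */
    public static int classify(List<Example> stored, double[] query) {
        if (stored.isEmpty()) {
            throw new IllegalArgumentException("no stored examples");
        }
        Example best = stored.get(0);
        double bestDist = squaredDistance(best.features, query);
        for (Example e : stored) {
            double d = squaredDistance(e.features, query);
            if (d < bestDist) {
                best = e;
                bestDist = d;
            }
        }
        return best.label;
    }

    public static void main(String[] args) {
        List<Example> stored = List.of(
                new Example(new double[] {0.0, 0.0}, 0),
                new Example(new double[] {1.0, 1.0}, 1));
        System.out.println(classify(stored, new double[] {0.9, 0.8})); // prints 1
    }
}
```

Training here is just storing the examples; all of the work happens at classification time, which is where the computational cost of instance-based learning comes from.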

IBL Algorithm research by David W. Aha • Two papers helpful in understanding this assignment: • Instance-based Learning Algorithms • David W. Aha, Dennis Kibler, Marc K. Albert • 1991 • Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms • David W. Aha • 1992 • Three algorithms: IB1, IB2, IB3

IB1: Instance-Based Learner, version 1 • Similar to the nearest-neighbor algorithm • Differences: • Normalizes all attribute values to the range [0, 1] • Handles missing attribute values • Training: Stores all instances from the training set • Classification: Searches all stored instances for the nearest neighbor • High computational and storage costs
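The two IB1 differences (normalization and missing values) can be sketched as a distance function. This is a standalone sketch under stated assumptions: attributes are numeric, a missing value is encoded as `Double.NaN`, ranges are taken from the training data up front (a simplification; an incremental learner would update them as instances arrive), and a missing value is treated as maximally different from the value present, with "both missing" counted as a difference of 1. That missing-value convention follows [2] as understood here and should be checked against the paper. Symbolic attributes, which IB1 compares by match/mismatch, are omitted. `Ib1Distance` is a hypothetical name, not Weka's IB1 class.

```java
import java.util.Arrays;

/** IB1-style distance over numeric attributes (hypothetical class, not Weka's IB1). */
public class Ib1Distance {
    private final double[] min;
    private final double[] max;

    /** Records each attribute's observed range in the training data; NaN marks a missing value. */
    public Ib1Distance(double[][] training) {
        int n = training[0].length;
        min = new double[n];
        max = new double[n];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : training) {
            for (int i = 0; i < n; i++) {
                if (!Double.isNaN(row[i])) {
                    min[i] = Math.min(min[i], row[i]);
                    max[i] = Math.max(max[i], row[i]);
                }
            }
        }
    }

    /** Scales a value into [0, 1]; constant or never-observed attributes map to 0. */
    double normalize(double v, int i) {
        double range = max[i] - min[i];
        if (!(range > 0)) {
            return 0.0;
        }
        double z = (v - min[i]) / range;
        return Math.min(1.0, Math.max(0.0, z)); // clamp values outside the training range
    }

    /** Euclidean distance on normalized values, treating missing values as maximally different. */
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            boolean aMissing = Double.isNaN(a[i]);
            boolean bMissing = Double.isNaN(b[i]);
            double diff;
            if (aMissing && bMissing) {
                diff = 1.0;                      // both missing: largest possible difference
            } else if (aMissing || bMissing) {
                double v = normalize(aMissing ? b[i] : a[i], i);
                diff = Math.max(v, 1.0 - v);     // distance to the farther end of [0, 1]
            } else {
                diff = normalize(a[i], i) - normalize(b[i], i);
            }
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Ib1Distance d = new Ib1Distance(new double[][] {{0, 10}, {10, 20}});
        System.out.println(d.distance(new double[] {Double.NaN, 10}, new double[] {2.5, 10})); // prints 0.75
    }
}
```

Normalizing first keeps an attribute measured in large units from dominating the distance, which is the point of the [0, 1] scaling on the slide.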

IB2: Instance-Based Learner, version 2 • Attempts to reduce storage requirements and computational complexity • Saves only misclassified instances • Algorithm: Stored instances = {}; for each instance in the training set: tentatively classify the instance based on the nearest stored instance; if the predicted class differs from the true class, add the instance to the stored set • Tends to accumulate noisy instances
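The IB2 loop above translates almost line for line into Java. In this sketch (hypothetical names, not the Weka `buildClassifier()`/`updateClassifier()` code), the nearest neighbor uses plain squared Euclidean distance on features assumed to be pre-normalized, rather than IB1's full distance with missing-value handling.

```java
import java.util.ArrayList;
import java.util.List;

/** IB2 training loop over numeric vectors (illustrative sketch, not Weka code). */
public class Ib2 {

    /** A training instance: numeric features and a class label. */
    public static final class Example {
        final double[] features;
        final int label;

        public Example(double[] features, int label) {
            this.features = features;
            this.label = label;
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Nearest stored example, or null when nothing has been stored yet. */
    static Example nearest(List<Example> stored, double[] x) {
        Example best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Example e : stored) {
            double d = squaredDistance(e.features, x);
            if (d < bestDist) {
                best = e;
                bestDist = d;
            }
        }
        return best;
    }

    /** Keeps a training instance only if the instances stored so far misclassify it. */
    public static List<Example> train(List<Example> training) {
        List<Example> stored = new ArrayList<>();
        for (Example e : training) {
            Example nn = nearest(stored, e.features);
            // With an empty store there is no prediction, so the first instance is always kept.
            if (nn == null || nn.label != e.label) {
                stored.add(e);
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        List<Example> training = List.of(
                new Example(new double[] {0.0, 0.0}, 0),
                new Example(new double[] {0.1, 0.0}, 0),
                new Example(new double[] {1.0, 1.0}, 1),
                new Example(new double[] {0.9, 1.0}, 1),
                new Example(new double[] {0.2, 0.1}, 0));
        System.out.println(train(training).size()); // prints 2
    }
}
```

The same loop also shows why IB2 collects noise: a mislabeled instance is, almost by definition, misclassified by its neighbors, so it is always stored.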

IB3: Instance-Based Learner, version 3 • Tracks the classification performance of each stored exemplar • Uses only those that are “good enough” • Performance exceeds some upper threshold • Discards those that are “not good enough” • Performance falls below some lower threshold • Exemplars “in between” • Performance statistics updated whenever an exemplar is the nearest neighbor to a “new” instance • Better performance and lower storage requirements than IB1 and IB2
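The “upper” and “lower” thresholds in IB3 can be read as significance tests: an exemplar's classification accuracy is compared with the observed frequency of its class, using a confidence interval for each proportion. The sketch below assumes the proportion interval and comparison rule described in [2], as understood here; verify them against the paper before relying on this. The z values are left as parameters, the values in `main` are illustrative only, and `Ib3Significance` is a hypothetical name.

```java
/** IB3-style acceptance and dropping tests (hypothetical helper, not Weka code). */
public class Ib3Significance {

    /**
     * Confidence interval [lower, upper] for a proportion with `successes` out of `n` trials,
     * at z standard deviations. With n == 0 nothing is known, so the whole range [0, 1] is returned.
     */
    static double[] interval(double successes, double n, double z) {
        if (n == 0) {
            return new double[] {0.0, 1.0};
        }
        double p = successes / n;
        double z2 = z * z;
        double center = p + z2 / (2 * n);
        double spread = z * Math.sqrt(p * (1 - p) / n + z2 / (4 * n * n));
        double denom = 1 + z2 / n;
        return new double[] {(center - spread) / denom, (center + spread) / denom};
    }

    /** Acceptable: the accuracy interval lies entirely above the class-frequency interval. */
    public static boolean acceptable(int correct, int attempts, int classCount, int seen, double z) {
        return interval(correct, attempts, z)[0] > interval(classCount, seen, z)[1];
    }

    /** Drop: the accuracy interval lies entirely below the class-frequency interval. */
    public static boolean shouldDrop(int correct, int attempts, int classCount, int seen, double z) {
        return interval(correct, attempts, z)[1] < interval(classCount, seen, z)[0];
    }

    public static void main(String[] args) {
        // 18 of 20 correct, for a class that makes up 30 of the 300 instances seen so far.
        System.out.println(acceptable(18, 20, 30, 300, 1.645)); // prints true
        // 1 of 20 correct, for a class that makes up 150 of the 300 instances seen so far.
        System.out.println(shouldDrop(1, 20, 150, 300, 1.0));   // prints true
    }
}
```

An exemplar for which neither test succeeds is one of the “in between” exemplars: it stays stored while its counts keep being updated. Passing a separate z to each test lets acceptance be made stricter than dropping.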

Aha’s Results • Results are averaged over 50 trials [1:274], [2:57]

The Weka IBL Project • Implement IB2 and IB3 • Compare their performance with that of IB1 and C4.5 (Weka version is called J48) • Data • Iris data: for initial testing of IB2 • LED data • Glass data

LED Dataset • Synthetic dataset created with led-creator.c [3] • 8 attributes • 7 segments of the display: 0 or 1 • Class: digits 0 through 9 • Generator inputs • Number of instances to create • Random seed • Percent noise per attribute • 10% noise means each segment bit has a 10% chance of being flipped
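The noise model described above can be sketched as a small generator. This is not a port of led-creator.c. The segment order and the digit patterns below are assumptions made for illustration, and `java.util.Random` is a different random number generator from the one led-creator.c uses, so the same seed will not reproduce that program's files. The layout (7 segment bits followed by the class digit) matches the 8 attributes listed on the slide.

```java
import java.util.Random;

/** LED-style data generator (illustrative sketch; not a port of led-creator.c). */
public class LedSketch {

    // Segment order assumed here: top, upper-left, upper-right, middle, lower-left, lower-right, bottom.
    public static final int[][] DIGITS = {
        {1, 1, 1, 0, 1, 1, 1}, // 0
        {0, 0, 1, 0, 0, 1, 0}, // 1
        {1, 0, 1, 1, 1, 0, 1}, // 2
        {1, 0, 1, 1, 0, 1, 1}, // 3
        {0, 1, 1, 1, 0, 1, 0}, // 4
        {1, 1, 0, 1, 0, 1, 1}, // 5
        {1, 1, 0, 1, 1, 1, 1}, // 6
        {1, 0, 1, 0, 0, 1, 0}, // 7
        {1, 1, 1, 1, 1, 1, 1}, // 8
        {1, 1, 1, 1, 0, 1, 1}, // 9
    };

    /** One instance: 7 segment bits, each flipped with probability `noise`, then the class digit. */
    static int[] instance(Random rng, double noise) {
        int digit = rng.nextInt(10);
        int[] row = new int[8];
        for (int s = 0; s < 7; s++) {
            int bit = DIGITS[digit][s];
            row[s] = rng.nextDouble() < noise ? 1 - bit : bit;
        }
        row[7] = digit;
        return row;
    }

    /** `count` instances from a fixed seed, so the same arguments always give the same data. */
    public static int[][] generate(int count, long seed, double noise) {
        Random rng = new Random(seed);
        int[][] data = new int[count][];
        for (int i = 0; i < count; i++) {
            data[i] = instance(rng, noise);
        }
        return data;
    }

    public static void main(String[] args) {
        int[][] data = generate(700, 1L, 0.10); // one 700-instance set with 10% noise
        int flipped = 0;
        for (int[] row : data) {
            for (int s = 0; s < 7; s++) {
                if (row[s] != DIGITS[row[7]][s]) {
                    flipped++;
                }
            }
        }
        System.out.printf("flipped %d of %d bits%n", flipped, 700 * 7);
    }
}
```

With 10% noise, roughly one bit in ten is flipped, so many instances no longer match any digit pattern exactly. This is why even a perfect learner cannot reach 100% accuracy on this data.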

Glass Identification Dataset • 214 instances • 163 Window glass (building windows and vehicle windows) • 87 float processed • 70 building windows • 17 vehicle windows • 76 non-float processed • 76 building windows • 0 vehicle windows • 51 Non-window glass • 13 containers • 9 tableware • 29 headlamps

Progress Report - Accomplished • Implemented IB2 • Modification of IB1 class methods • buildClassifier() • updateClassifier() • Preliminary testing with iris data • Compared accuracy of IB1, IB2, and C4.5 on LED data • 10 sets of 700 instances each, with 10% noise • Training set = first 200 instances of each set • Testing set = last 500 instances of each set
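The LED evaluation protocol above (first 200 instances of each set for training, the remaining 500 for testing, accuracy averaged over the sets) can be sketched as follows. `Learner` is a hypothetical interface standing in for IB1, IB2, or J48. It is not Weka's `Classifier` API. The class label is assumed to be the last column of each row.

```java
import java.util.Arrays;

/** Train/test split and averaging for the LED comparison (illustrative sketch). */
public class SplitEvaluation {

    /** Hypothetical hook: trains on `train` and returns one predicted class per row of `test`. */
    public interface Learner {
        int[] trainAndPredict(int[][] train, int[][] test);
    }

    static final int TRAIN_SIZE = 200; // first 200 instances of each set are training data

    /** Accuracy on the rows after the first TRAIN_SIZE; the class is assumed to be the last column. */
    public static double accuracyOnSet(Learner learner, int[][] set) {
        int[][] train = Arrays.copyOfRange(set, 0, TRAIN_SIZE);
        int[][] test = Arrays.copyOfRange(set, TRAIN_SIZE, set.length);
        int[] predicted = learner.trainAndPredict(train, test);
        int correct = 0;
        for (int i = 0; i < test.length; i++) {
            if (predicted[i] == test[i][test[i].length - 1]) {
                correct++;
            }
        }
        return (double) correct / test.length;
    }

    /** Mean accuracy over all sets (for example, the 10 LED sets). */
    public static double meanAccuracy(Learner learner, int[][][] sets) {
        double total = 0.0;
        for (int[][] set : sets) {
            total += accuracyOnSet(learner, set);
        }
        return total / sets.length;
    }

    public static void main(String[] args) {
        // Toy data: 700 rows whose only column is the class, alternating 0, 1, 0, 1, ...
        int[][] set = new int[700][];
        for (int i = 0; i < set.length; i++) {
            set[i] = new int[] {i % 2};
        }
        Learner alwaysZero = (train, test) -> new int[test.length]; // baseline: always predicts class 0
        System.out.println(meanAccuracy(alwaysZero, new int[][][] {set})); // prints 0.5
    }
}
```

Because every learner sees the same splits, differences in mean accuracy reflect the learners rather than which instances happened to be used for training.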

Compare David Aha’s results [2:52] (over 50 trials): • IB1: 70.5 ± 0.4% • IB2: 62.5 ± 0.6%

Progress Report - To Do • Implement IB3 • More involved than IB2 • Even more difficult when you don’t know Java • Test the accuracy of IB3 on the LED data and compare it with that of IB1, IB2, and C4.5 • Test the accuracy of IB1, IB2, IB3, and C4.5 on the glass data

Conclusions • Thus far, the IB1 and IB2 results are similar to David Aha’s results. • Weka assignments (except perhaps #1) • Are somewhat vague. • Require some research to determine what the actual project requirements should be. • Are valuable in building an understanding of the algorithms and their design.

References [1] Aha, David W. 1992. Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies 36(2):267-287. [2] Aha, David W., Dennis Kibler and Marc K. Albert. 1991. Instance-based learning algorithms. Machine Learning 6:37-66. [3] http://ftp.ics.uci.edu/pub/machine-learning-databases/led-display-creator/led-creator.c
