Dive into the key differences between lazy and eager learning approaches in machine learning, focusing on instance-based learning. Explore practical issues such as similarity measures, normalization, and attribute weighting methods. Understand the nuances of kNN algorithms, instance weighting, and the impact of attribute selection on classification accuracy.
CS 6243 Machine Learning: Instance-based Learning
Lazy vs. Eager Learning
• Eager learning (e.g., decision tree learning): given a training set, constructs a classification model before receiving new (e.g., test) data to classify
• Lazy learning (e.g., instance-based learning): simply stores the training data (with at most minor processing) and waits until it is given a test tuple
• Lazy learners spend less time in training but more time in prediction
Nearest neighbor classifier - basic idea
• For each test case h:
  • Find the k training instances that are closest to h
  • Return the most frequent class label among those k neighbors (see the sketch below)
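A minimal sketch of this idea in Python (not from the original slides), assuming numeric features stored in NumPy arrays and Euclidean distance:

```python
# Minimal k-NN classifier sketch (assumes numeric, already-normalized features).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, h, k=3):
    """Return the most frequent class label among the k training
    instances closest (Euclidean distance) to test case h."""
    dists = np.linalg.norm(X_train - h, axis=1)   # distance to every training instance
    nearest = np.argsort(dists)[:k]               # indices of the k closest instances
    votes = Counter(y_train[nearest])             # count class labels among the neighbors
    return votes.most_common(1)[0][0]

# Toy usage
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # -> "A"
```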
Practical issues
• Similarity / distance function
• Number of neighbors
• Instance weighting
• Attribute weighting
• Algorithms / data structures to improve efficiency
• Explicit concept generalization
Similarity / distance measures
• Euclidean distance
• City-block (Manhattan) distance
• Dot product / cosine similarity
  • Good for high-dimensional sparse feature vectors
  • Popular in document classification / information retrieval
• Pearson correlation coefficient
  • Measures linear dependency
  • Popular in biology
• Nominal attributes: distance is 1 if the values differ, 0 if they are equal
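Illustrative NumPy sketches of the measures listed above (function names are my own, not from the slides); nominal attributes are compared value-by-value:

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):                 # city-block distance
    return np.sum(np.abs(a - b))

def cosine_sim(a, b):                # good for sparse, high-dimensional vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pearson(a, b):                   # measures linear dependency
    return np.corrcoef(a, b)[0, 1]

def nominal_dist(a, b):              # 1 if the values differ, 0 if equal
    return float(a != b)
```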
Normalization and other issues
• Attributes measured on different scales need to be normalized, e.g. by min-max scaling:
  a_i = (v_i - min v_i) / (max v_i - min v_i), where v_i is the actual value of attribute i
• Row normalization / column normalization
• Common policy for missing values: assume they are maximally distant (given normalized attributes)
witten&eibe
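A small sketch of column-wise min-max normalization, as described above (an assumption of this example: constant columns are left unscaled to avoid division by zero):

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each attribute (column) of X to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    rng = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (X - lo) / rng
```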
Number of neighbors
• 1-NN is sensitive to noisy instances
• In general, the larger the number of training instances, the larger the value of k can be
• k can be determined by minimizing the estimated classification error (using cross-validation); see the sketch below:
  • Search over k = 1, 2, 3, ..., k_max; choose the search size k_max based on compute constraints
  • Estimate the average classification error for each k
  • Pick the k that minimizes the estimated error
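A sketch of this search using scikit-learn's cross-validation utilities (assumes scikit-learn is available; the iris data and k_max = 15 are only placeholders):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
k_max = 15                                   # search size, set by compute budget
errors = {}
for k in range(1, k_max + 1):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    errors[k] = 1.0 - acc.mean()             # estimated classification error for this k
best_k = min(errors, key=errors.get)         # pick k with lowest estimated error
print("best k:", best_k)
```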
Instance weighting
• We might want to weight nearer neighbors more heavily
• Each nearest neighbor casts its vote with a weight
• The final prediction is the class with the highest sum of weights
• In this case we may use all instances (no need to choose k): Shepard's method
• Can also be used for numerical prediction (e.g., a weighted average of the neighbors' values)
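A sketch of distance-weighted voting over all instances; the 1/d^2 weight is one common choice in the spirit of Shepard's method, not necessarily the exact scheme the slides have in mind:

```python
import numpy as np
from collections import defaultdict

def weighted_predict(X_train, y_train, h, eps=1e-9):
    """Every training instance votes with weight 1/d^2, so no k is needed."""
    dists = np.linalg.norm(X_train - h, axis=1)
    weights = 1.0 / (dists ** 2 + eps)       # nearer neighbors weigh more
    scores = defaultdict(float)
    for label, w in zip(y_train, weights):
        scores[label] += w                   # sum of weights per class
    return max(scores, key=scores.get)       # class with the highest weight sum
```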
Attribute weighting
• Simple strategy:
  • Calculate the correlation between attribute values and class labels
  • More relevant attributes get higher weights (see the sketch below)
• More advanced strategy:
  • Iterative updating (IBk); see the Ch6 slides
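One possible reading of the simple strategy, sketched in NumPy (the use of absolute Pearson correlation against integer-coded labels is my assumption, not a detail from the slides):

```python
import numpy as np

def correlation_weights(X, y):
    """Weight each attribute by |correlation| between its values and the class labels."""
    y_num = np.unique(y, return_inverse=True)[1]            # code labels as integers
    w = np.array([abs(np.corrcoef(X[:, j], y_num)[0, 1])    # |correlation| per attribute
                  for j in range(X.shape[1])])
    return np.nan_to_num(w)                                 # constant columns -> weight 0

def weighted_distance(a, b, w):
    """Euclidean distance with per-attribute weights."""
    return np.sqrt(np.sum(w * (a - b) ** 2))
```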
Other issues
• Algorithms / data structures to improve efficiency
  • Data structures for efficiently finding nearest neighbors: kD-tree, ball tree (see the sketch below)
    • Do not affect classification results (Ch4 slides)
  • Algorithms to select prototypes
    • May affect classification results (IBk; Ch6 slides)
• Concept generalization
  • Should we do it or not? (Ch6 slides)
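A sketch of using a ball tree for fast neighbor search, via scikit-learn (assumed available); the tree only changes how neighbors are found, not which neighbors are returned:

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 8))               # synthetic training data
tree = BallTree(X_train)                        # build the tree once
dist, ind = tree.query(rng.random((1, 8)), k=5) # k nearest neighbors of a query point
print(ind[0])                                   # indices of the 5 closest training instances
```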
Discussion of kNN
• Pros:
  • Often very accurate
  • Easy to implement
  • Fast to train
  • Can form arbitrary decision boundaries
• Cons:
  • Classification is slow (remedy: ball tree, prototype selection)
  • Assumes all attributes are equally important (remedy: attribute selection or weighting, but still subject to the curse of dimensionality)
  • No explicit knowledge discovery
witten&eibe
Sec. 14.6 Decision boundary
[Figure: decision boundary separating '+' and '_' instances in the x-y plane]