CS 6243 Machine Learning

abeni

Presentation Transcript


  1. CS 6243 Machine Learning: Instance-based learning

  2. Lazy vs. Eager Learning • Eager learning (e.g., decision tree learning): given a training set, constructs a classification model before receiving new (e.g., test) data to classify • Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple • Lazy: less time in training but more time in predicting

  3. Nearest neighbor classifier: basic idea • For each test case h • Find the k training instances that are closest to h • Return the most frequent class label among those k instances
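
The basic idea above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: it assumes numeric feature vectors, uses Euclidean distance, and the function name `knn_classify` is hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, h, k=3):
    """Return the majority class among the k training instances closest to h.

    train: list of (feature_vector, class_label) pairs; distance is Euclidean.
    """
    # Lazy learning: nothing is precomputed, all the work happens at prediction time.
    neighbors = sorted(train, key=lambda inst: math.dist(inst[0], h))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common keeps first-seen order on ties, so ties go to the label met first in distance order.
    return votes.most_common(1)[0][0]


train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
print(knn_classify(train, (1.1, 1.0)))  # A
print(knn_classify(train, (4.8, 5.1)))  # B
```

For the second query the two B points are nearest and the third neighbor is an A point, so B wins the vote 2 to 1.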

  4. Practical issues • Similarity / distance function • Number of neighbors • Instance weighting • Attribute weighting • Algorithms / data structures to improve efficiency • Explicit concept generalization

  5. Similarity / distance measure • Euclidean distance • City-block (Manhattan) distance • Dot product / cosine similarity • Good for high-dimensional sparse feature vectors • Popular in document classification / information retrieval • Pearson correlation coefficient • Measures linear dependency • Popular in biology • Nominal attributes: distance is set to 1 if the values are different, 0 if they are equal
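
A sketch of the measures listed above as plain Python functions (the names are hypothetical). Note that cosine and Pearson are similarities, where larger means closer, so a kNN classifier would take the k most similar instances, or use 1 - similarity as a distance; that conversion is a common convention, assumed here rather than taken from the slides.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(a, b):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def nominal_distance(a, b):
    """Sum over nominal attributes: 0 where values are equal, 1 where they differ."""
    return sum(0 if x == y else 1 for x, y in zip(a, b))


print(euclidean((0, 0), (3, 4)))   # 5.0
print(manhattan((0, 0), (3, 4)))   # 7
print(nominal_distance(("red", "small"), ("red", "large")))  # 1
```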

  6. Normalization and other issues • Different attributes are measured on different scales → they need to be normalized: $a_i = \frac{v_i - \min v_i}{\max v_i - \min v_i}$ or $a_i = \frac{v_i - \operatorname{Avg}(v_i)}{\operatorname{StDev}(v_i)}$, where $v_i$ is the actual value of attribute $i$ • Row normalization / column normalization • Common policy for missing values: assumed to be maximally distant (given normalized attributes) (Witten & Eibe)
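
A minimal sketch of min-max (column) normalization together with the "maximally distant" policy for missing values. It assumes numeric attributes with `None` marking a missing value; the specific rule used (both missing gives 1; one missing gives the larger of v and 1 - v for the known normalized value v) is one common reading of that policy, and the function names are hypothetical.

```python
import math


def min_max_normalize(column):
    """Rescale a numeric column to [0, 1] with a_i = (v_i - min) / (max - min).

    None marks a missing value and is passed through unchanged.
    Assumes at least one known value in the column.
    """
    known = [v for v in column if v is not None]
    lo, hi = min(known), max(known)
    span = hi - lo
    if span == 0:
        # Constant attribute: map every known value to 0 so it adds nothing to distances.
        return [None if v is None else 0.0 for v in column]
    return [None if v is None else (v - lo) / span for v in column]


def attribute_difference(a, b):
    """Difference of two normalized values, treating missing values as maximally distant."""
    if a is None and b is None:
        return 1.0
    if a is None or b is None:
        v = b if a is None else a
        # The missing value could lie at 0 or 1; assume whichever is farther from v.
        return max(v, 1.0 - v)
    return abs(a - b)


def distance(x, y):
    """Euclidean distance over normalized rows that may contain None."""
    return math.sqrt(sum(attribute_difference(a, b) ** 2 for a, b in zip(x, y)))


print(min_max_normalize([10, 20, None, 30]))  # [0.0, 0.5, None, 1.0]
print(distance((0.0, None), (1.0, 0.25)))     # 1.25
```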

  7. Number of neighbors • 1-NN is sensitive to noisy instances • In general, the larger the number of training instances, the larger the value of k • Can be determined by minimizing the estimated classification error (using cross-validation) • Search over K = 1, 2, 3, …, Kmax, choosing Kmax based on compute constraints • Estimate the average classification error for each K • Pick the K that minimizes the classification error
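
The search over K can be sketched as follows. This uses leave-one-out cross-validation as one concrete choice of cross-validation scheme (an assumption; k-fold works the same way), and the helper names are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, h, k):
    """Majority class among the k training instances nearest to h (Euclidean)."""
    neighbors = sorted(train, key=lambda inst: math.dist(inst[0], h))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def loo_error(data, k):
    """Leave-one-out estimate of the classification error for a given k."""
    mistakes = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out instance i
        if knn_classify(rest, x, k) != y:
            mistakes += 1
    return mistakes / len(data)


def choose_k(data, k_max):
    """Search K = 1..k_max and return (best k, {k: estimated error})."""
    errors = {k: loo_error(data, k) for k in range(1, k_max + 1)}
    # min returns the first minimal key in insertion order, i.e. the smallest k on ties.
    return min(errors, key=errors.get), errors


data = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
        ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
best_k, errors = choose_k(data, 5)
print(best_k)  # 1
```

With only three instances per class, k = 5 always outvotes the held-out instance's own class, so its estimated error is 1.0; this is why Kmax should stay small relative to the class sizes.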

  8. Instance weighting • We might want to weight nearer neighbors more heavily • Each nearest neighbor casts its vote with a weight • The final prediction is the class with the highest sum of weights • In this case we may use all instances (no need to choose k) • Shepard’s method (inverse-distance weighting) • Can also be used for numerical prediction
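
A sketch of distance-weighted prediction in the spirit of Shepard's method: every training instance contributes with weight 1/d^p. The exponent p = 2 is a common choice but an assumption here, as are the function names; an exact match (d = 0) is returned directly because its weight would be infinite.

```python
import math
from collections import defaultdict


def weighted_vote(train, h, power=2):
    """Classify h using all training instances, each voting with weight 1/d**power."""
    scores = defaultdict(float)
    for x, label in train:
        d = math.dist(x, h)
        if d == 0:
            return label  # exact match: infinite weight
        scores[label] += 1.0 / d ** power
    # The class with the highest sum of weights wins.
    return max(scores, key=scores.get)


def weighted_average(train, h, power=2):
    """Numerical prediction: inverse-distance-weighted mean of the targets."""
    num = den = 0.0
    for x, target in train:
        d = math.dist(x, h)
        if d == 0:
            return target
        w = 1.0 / d ** power
        num += w * target
        den += w
    return num / den


train = [((0.0,), "A"), ((10.0,), "B"), ((11.0,), "B")]
print(weighted_vote(train, (1.0,)))  # A (an unweighted vote over all instances would say B)
print(weighted_average([((0.0,), 0.0), ((2.0,), 10.0)], (1.0,)))  # 5.0
```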

  9. Attribute weighting • Simple strategy: • Calculate correlation between attribute values and class labels • More relevant attributes have higher weights • More advanced strategy: • Iterative updating (IBk) • Slides for Ch6
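
The simple strategy can be sketched as: weight each attribute by the absolute Pearson correlation between its values and the class, then use those weights inside the distance. This assumes numeric attributes and a two-class problem (the class is encoded as 0/1); the function names are hypothetical, and the iterative IBk-style updating is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant attribute or class: no linear relation to measure
    return cov / (sx * sy)


def attribute_weights(X, labels, positive):
    """Weight each attribute by |correlation| with a 0/1 encoding of the class."""
    y = [1.0 if lab == positive else 0.0 for lab in labels]
    return [abs(pearson([row[i] for row in X], y)) for i in range(len(X[0]))]


def weighted_euclidean(a, b, weights):
    """Euclidean distance in which more relevant attributes count more."""
    return math.sqrt(sum(w * (x - z) ** 2 for w, x, z in zip(weights, a, b)))


X = [(0.0, 5.0), (1.0, 3.0), (0.0, 3.0), (1.0, 5.0)]
labels = ["n", "p", "n", "p"]
w = attribute_weights(X, labels, positive="p")
print(w)  # [1.0, 0.0]: attribute 0 predicts the class, attribute 1 does not
```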

  10. Other issues • Algorithms / data structures to improve efficiency • Data structures that enable efficient nearest-neighbor search: kD tree, ball tree • Do not affect classification results • Ch4 slides • Algorithms to select prototypes • May affect classification results • IBk, Ch6 slides • Concept generalization • Should we do it or not? • Ch6 slides
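
A minimal kD tree for exact 1-NN search, to show why such structures do not change classification results: the search returns the same nearest distance as a brute-force scan, it just prunes subtrees that cannot contain a closer point. This is an illustrative sketch (median splits, axes cycled by depth), not an optimized implementation; ball trees are not shown.

```python
import math
import random


class KDNode:
    """One stored training instance plus the axis it splits on."""

    def __init__(self, point, label, axis, left, right):
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right


def build_kdtree(items, depth=0):
    """Build a kD tree from (point, label) pairs, splitting at the median.

    The splitting axis cycles through the dimensions as depth increases.
    """
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return KDNode(point, label, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, q, best=None):
    """Return (distance, point, label) for the stored point nearest to q."""
    if node is None:
        return best
    d = math.dist(node.point, q)
    if best is None or d < best[0]:
        best = (d, node.point, node.label)
    diff = q[node.axis] - node.point[node.axis]
    # Descend first into the side of the splitting plane that contains q.
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    # The far side can only hold a closer point if the plane is nearer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, q, best)
    return best


rng = random.Random(0)
data = [((rng.random(), rng.random()), i) for i in range(200)]
tree = build_kdtree(data)
q = (0.3, 0.7)
print(nearest(tree, q)[0] == min(math.dist(p, q) for p, _ in data))  # True
```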

  11. Discussion of kNN • Pros: • Often very accurate • Easy to implement • Fast to train • Can represent arbitrary decision boundaries • Cons: • Classification is slow (remedy: ball tree, prototype selection) • Assumes all attributes are equally important (remedy: attribute selection or weighting, though the curse of dimensionality remains) • No explicit knowledge discovery (Witten & Eibe)

  12. Sec. 14.6: Decision boundary [figure: decision boundary between two classes (+ and _) in the x-y plane]
