
CS 9633 Machine Learning k-nearest neighbor



Presentation Transcript


  1. CS 9633 Machine Learning: k-nearest neighbor
  Chapter 8: Instance-based learning
  Adapted from notes by Tom Mitchell, http://www-2.cs.cmu.edu/~tom/mlbook-chapter-slides.html
  Computer Science Department CS 9633 KDD

  2. A classification of learning algorithms
  • Eager learning algorithms
    • Neural networks
    • Decision trees
    • Bayesian classifiers
  • Lazy learning algorithms
    • k-nearest neighbor
    • Case-based reasoning

  3. General Idea of Instance-Based Learning
  • Learning: store all the training instances
  • Performance: when a new query instance is encountered,
    • retrieve a set of similar instances from memory
    • use them to classify the new query

  4. Pros and Cons of Instance-Based Learning
  • Pros
    • Can construct a different approximation to the target function for each distinct query instance to be classified
    • Can use more complex, symbolic representations
  • Cons
    • Cost of classification can be high
    • Uses all attributes (does not learn which are most important)

  5. k-nearest neighbor (k-NN) learning
  • Most basic type of instance-based learning
  • Assumes all instances are points in n-dimensional space
  • A distance measure is needed to determine the “closeness” of instances
  • Classify an instance by finding its nearest neighbors and picking the most popular class among them

  6. Important Decisions
  • Distance measure
  • Value of k (usually odd)
  • Voting mechanism
  • Memory indexing

  7. Euclidean Distance
  • Typically used for real-valued attributes
  • An instance x (often called a feature vector) is described as <a1(x), a2(x), …, an(x)>, where ar(x) is the value of the r-th attribute of x
  • Distance between two instances xi and xj:
    d(xi, xj) = sqrt( Σ r=1..n ( ar(xi) − ar(xj) )² )
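The distance above can be sketched in a few lines of Python (a minimal version; instances are assumed to be tuples of numeric attribute values):

```python
import math

def euclidean_distance(xi, xj):
    """d(xi, xj) = sqrt of the sum over attributes r of (ar(xi) - ar(xj))^2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```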

  8. Discrete-Valued Target Function
  Training algorithm:
  • For each training example <x, f(x)>, add the example to the list training_examples
  Classification algorithm:
  • Given a query instance xq to be classified:
    • Let x1 … xk be the k training examples nearest to xq
    • Return f̂(xq) = argmax over v in V of Σ i=1..k δ(v, f(xi)), where δ(a, b) = 1 if a = b and 0 otherwise
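The training step is just storage, and classification is the argmax (majority vote) above. A minimal sketch, assuming (instance, label) pairs and a caller-supplied distance function (the toy data below is hypothetical):

```python
from collections import Counter

def knn_classify(xq, training_examples, distance, k=3):
    """Return the most common label among the k training examples nearest xq.

    training_examples is a list of (instance, f(instance)) pairs;
    distance is any distance function over instances.
    """
    nearest = sorted(training_examples, key=lambda ex: distance(ex[0], xq))[:k]
    # argmax over classes v of sum_i delta(v, f(xi)), via a vote count
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# hypothetical 1-D toy data
examples = [((1.0,), "yes"), ((1.5,), "yes"),
            ((8.0,), "no"), ((8.5,), "no"), ((9.0,), "no")]
d1 = lambda a, b: abs(a[0] - b[0])
print(knn_classify((1.2,), examples, d1))  # "yes"
```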

  9. Continuous-Valued Target Function
  • The algorithm computes the mean value of the k nearest training examples rather than the most common value
  • Replace the final line of the previous algorithm with
    f̂(xq) = ( Σ i=1..k f(xi) ) / k
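The same neighbor search with the vote replaced by a mean gives k-NN regression. A minimal sketch (the sample points, drawn from f(x) = 2x, are hypothetical):

```python
def knn_predict(xq, training_examples, distance, k=3):
    """Return the mean target value f(x) over the k training examples nearest xq."""
    nearest = sorted(training_examples, key=lambda ex: distance(ex[0], xq))[:k]
    return sum(fx for _, fx in nearest) / k

# hypothetical samples of f(x) = 2x
points = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0), ((10.0,), 20.0)]
dist = lambda a, b: abs(a[0] - b[0])
print(knn_predict((2.0,), points, dist, k=3))  # 4.0 (mean of 4.0, 2.0, 6.0)
```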

  10. Training dataset

  11. k-NN
  • k = 3
  • Distance: score an attribute 0 for a match and 1 for a mismatch; the distance is the sum of the scores over all attributes
  • Voting scheme: proportionate voting in case of ties
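A per-attribute matching distance of this kind is usually implemented as a mismatch count (Hamming distance). A minimal sketch, assuming instances are tuples of discrete attribute values (the weather-style values are hypothetical):

```python
def hamming_distance(xi, xj):
    """Count attributes on which two instances disagree (0 per match, 1 per mismatch)."""
    return sum(1 for a, b in zip(xi, xj) if a != b)

print(hamming_distance(("sunny", "hot", "high"), ("sunny", "mild", "high")))  # 1
```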

  12. Test Set

  13. Algorithm Questions
  • What is the space complexity of the model?
  • What is the time complexity of learning the model?
  • What is the time complexity of classifying an instance?
