Chapter 8 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
Nearest Neighbor Approaches • Based on the concept of similarity • Memory-Based Reasoning (MBR) – results are based on analogous situations in the past • Collaborative Filtering – results use preferences in addition to analogous situations from the past
Memory-Based Reasoning (MBR) • Our ability to reason from experience depends on our ability to recognize appropriate examples from the past… • Traffic patterns/routes • Movies • Food • We identify similar example(s) and apply what we know/learned to current situation • These similar examples in MBR are referred to as neighbors
MBR Applications • Fraud detection • Customer response prediction • Medical treatments • Classifying responses – MBR can process free-text responses and assign codes
MBR Strengths • Ability to use data “as is” – utilizes both a distance function and a combination function between data records to help determine how “neighborly” they are • Ability to adapt – adding new data makes it possible for MBR to learn new things • Trivially an incremental learner • Good results without lengthy training
MBR Example – Rents in Tuxedo, NY • Identify nearest neighbors based on descriptive variables – population & median home prices (not geography in this example) • The rent range midpoints of the 2 nearest neighbors are $1,000 and $1,250, so the estimated Tuxedo rent is their average, $1,125; a 2nd method yields a rent of $977 • The actual midpoint rent in Tuxedo turns out to be $1,250 by one method and $907 by another
MBR Challenges • Choosing appropriate historical data for use in training • Choosing the most efficient way to represent the training data • Choosing the distance function, combination function, and the number of neighbors
Instance-Based Classifiers • Store the training records • Use training records to predict the class label of unseen cases
Instance-Based Classifiers • Examples: • Rote-learner • Memorizes the entire training data and classifies a record only if its attributes exactly match one of the training examples • No generalization • Nearest neighbor • Uses the k “closest” points (nearest neighbors) to perform classification
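The rote-learner described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the class name RoteLearner, its method names, and the toy records are all hypothetical.

```python
class RoteLearner:
    """Memorizes training records; classifies only on an exact attribute match."""

    def __init__(self):
        self.memory = {}

    def fit(self, records):
        # Store every (features, label) pair verbatim.
        for features, label in records:
            self.memory[tuple(features)] = label

    def predict(self, features):
        # No generalization: returns None unless the record was seen exactly.
        return self.memory.get(tuple(features))


# Hypothetical toy records.
learner = RoteLearner()
learner.fit([((1, 0, 1), "yes"), ((0, 0, 1), "no")])
seen = learner.predict((1, 0, 1))      # exact match in memory
unseen = learner.predict((1, 1, 1))    # never stored, so no answer
```

The second query shows why a nearest neighbor classifier is the more useful variant: it still answers when no stored record matches exactly.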
Nearest Neighbor Classifiers • Basic idea: • If it walks like a duck and quacks like a duck, then it’s probably a duck • [Figure: compute the distance from the test record to the training records, then choose k of the “nearest” records]
Nearest-Neighbor Classifiers • Requires three things • The set of stored records • Distance metric to compute distance between records • The value of k, the number of nearest neighbors to retrieve • To classify an unknown record: • Compute distance to other training records • Identify k nearest neighbors • Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking majority vote)
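The three steps above (compute distances, identify the k nearest neighbors, take a majority vote) can be sketched as follows. This is an illustrative sketch assuming numeric features and Euclidean distance; the function names and the toy training set are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    `training` is a list of (features, label) pairs.
    """
    # 1. Compute the distance from the query to every stored record.
    by_distance = sorted(training, key=lambda rec: euclidean(rec[0], query))
    # 2. Keep the k nearest neighbors.
    neighbors = by_distance[:k]
    # 3. Majority vote over the neighbors' class labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features per record.
training = [
    ((1.0, 1.0), "duck"),
    ((1.2, 0.8), "duck"),
    ((0.9, 1.1), "duck"),
    ((5.0, 5.0), "goose"),
    ((5.2, 4.9), "goose"),
]
predicted = knn_classify(training, (1.1, 1.0), k=3)
```

Sorting every stored record for each query is the simplest approach and also shows why classification is the expensive step for this kind of classifier.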
Definition of Nearest Neighbor • The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
1 nearest-neighbor • Voronoi Diagram • Can you figure out what this is? • Is 1-NN more expressive than a Decision Tree?
Nearest Neighbor Classification • Compute distance between two points: • Euclidean distance • Determine the class from nearest neighbor list • Take the majority vote of class labels among the k-nearest neighbors • Weigh the vote according to distance • Weight factor, w = 1/d^2
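Distance-weighted voting, where each neighbor's vote counts with weight 1 divided by its squared distance, might look like the sketch below. The handling of a zero distance is an assumption added to avoid division by zero, and the neighbor list is hypothetical.

```python
from collections import defaultdict


def weighted_vote(neighbors):
    """Pick the label with the largest total weight, where w = 1 / d**2.

    `neighbors` is a list of (distance, label) pairs for the k nearest records.
    """
    totals = defaultdict(float)
    for d, label in neighbors:
        if d == 0:
            # Assumption: an exact match decides the class outright.
            return label
        totals[label] += 1.0 / d ** 2
    return max(totals, key=totals.get)


# Hypothetical neighbors: one close "A" record and two distant "B" records.
neighbors = [(0.5, "A"), (2.0, "B"), (2.5, "B")]
winner = weighted_vote(neighbors)
```

With these numbers a plain majority vote would choose "B" (two votes to one), while the weighted vote favors the much closer "A" record (weight 4.0 against a combined 0.41).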
Nearest Neighbor Classification… • Choosing the value of k: • If k is too small, sensitive to noise points • If k is too large, neighborhood may include points from other classes
Nearest Neighbor Classification… • Scaling issues • Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes • Example: • height of a person may vary from 1.5m to 1.8m • weight of a person may vary from 90lb to 300lb • income of a person may vary from $10K to $1M
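One common way to scale attributes is min-max scaling to the range [0, 1]. The sketch below uses the attribute ranges from the example above; the choice of min-max scaling, the attribute names, and the two sample people are assumptions made for illustration.

```python
import math


def min_max_scale(value, lo, hi):
    """Map a value from the range [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)


# (min, max) per attribute, taken from the ranges in the example.
ranges = {
    "height_m": (1.5, 1.8),
    "weight_lb": (90, 300),
    "income_usd": (10_000, 1_000_000),
}


def scale_record(record):
    return {name: min_max_scale(record[name], *ranges[name]) for name in ranges}


def distance(p, q):
    return math.sqrt(sum((p[name] - q[name]) ** 2 for name in ranges))


# Two hypothetical people: opposite extremes of height and weight,
# but nearly identical incomes.
a = {"height_m": 1.5, "weight_lb": 90, "income_usd": 50_000}
b = {"height_m": 1.8, "weight_lb": 300, "income_usd": 51_000}

raw_dist = distance(a, b)                                   # dominated by income
scaled_dist = distance(scale_record(a), scale_record(b))    # height and weight now count
```

On the raw values the $1,000 income gap swamps everything else; after scaling, the distance is driven by the height and weight differences instead.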
Nearest Neighbor Classification… • Problem with Euclidean measure: • High dimensional data • Curse of dimensionality • Can produce counter-intuitive results if space is very sparse • 1 1 1 1 1 1 1 1 1 1 1 0 vs 0 1 1 1 1 1 1 1 1 1 1 1: d = 1.4142 • 1 0 0 0 0 0 0 0 0 0 0 0 vs 0 0 0 0 0 0 0 0 0 0 0 1: d = 1.4142 • One solution: ignore features where both feature values are 0 • We discussed this when talking about Data for Chapter 2
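The two vector pairs above can be checked directly. The sketch below computes the Euclidean distance for each pair and then a similarity that ignores features where both values are 0. The Jaccard coefficient is used here as one measure with that property; the slides do not name a specific measure, so that choice is an assumption.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard(a, b):
    """Shared 1s divided by the positions where at least one vector has a 1.

    Positions where both vectors are 0 are ignored entirely.
    """
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0


# The two pairs of 12-dimensional binary vectors from the example.
dense_a = [1] * 11 + [0]
dense_b = [0] + [1] * 11
sparse_a = [1] + [0] * 11
sparse_b = [0] * 11 + [1]

d_dense = euclidean(dense_a, dense_b)     # about 1.4142
d_sparse = euclidean(sparse_a, sparse_b)  # about 1.4142, same as the dense pair
s_dense = jaccard(dense_a, dense_b)       # 10 shared 1s out of 12 positions
s_sparse = jaccard(sparse_a, sparse_b)    # no shared 1s
```

Euclidean distance rates both pairs as equally far apart, even though the first pair agrees on ten features and the second pair shares none; the similarity that skips 0-0 matches separates them clearly.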
Nearest Neighbor Classification… • k-NN classifiers are lazy learners • They do not build models explicitly • Unlike eager learners such as decision tree induction and rule-based systems • Classifying unknown records is relatively expensive
Remarks on Lazy vs. Eager Learning • Instance-based learning: lazy evaluation • Decision-tree and rule-based classification: eager evaluation • Key differences • A lazy method may consider the query instance xq when deciding how to generalize beyond the training data D • An eager method cannot, since it has already chosen its global approximation by the time it sees the query • Efficiency: lazy methods spend less time training but more time predicting • Accuracy • A lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function • An eager method must commit to a single hypothesis that covers the entire instance space
Collaborative Filtering • Lots of human examples of this: • Best teachers • Best courses • Best restaurants (ambiance, service, food, price) • Recommend a dentist, mechanic, PC repair, blank CDs/DVDs, wines, B&Bs, etc… • CF is a variant of MBR particularly well suited to personalized recommendations
Collaborative Filtering • Starts with a history of people’s personal preferences • Uses a distance function – people who like the same things are “close” • Uses “votes” which are weighted by distances, so close neighbor votes count more • Basically, judgments of a peer group are important
Collaborative Filtering • Knowing that lots of people liked something is not sufficient… • Who liked it is also important • A friend whose past recommendations were good (or bad) • A high-profile person's opinion seems to carry influence • Collaborative Filtering automates this everyday word-of-mouth activity
Preparing for Collaborative Filtering • Building customer profile – ask new customer to rate selection of things • Comparing this new profile to other customers using some measure of similarity • Using some combination of the ratings from similar customers to predict what the new customer would select for items he/she has NOT yet rated
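The comparison step above needs some measure of similarity between profiles. One simple possibility, sketched here, is the mean absolute difference in ratings over the items both customers have rated. The measure itself, the function name, and all profiles and ratings below are hypothetical; other distance functions would serve equally well.

```python
def profile_distance(p, q):
    """Mean absolute rating difference over items both customers rated.

    Returns None when the two profiles share no rated items.
    """
    shared = set(p) & set(q)
    if not shared:
        return None
    return sum(abs(p[item] - q[item]) for item in shared) / len(shared)


# Hypothetical profiles: item name -> rating.
new_customer = {"Movie A": 4, "Movie B": -2}
others = {
    "cust1": {"Movie A": 5, "Movie B": -1, "Movie C": 3},
    "cust2": {"Movie A": -4, "Movie B": 4, "Movie C": -5},
}

distances = {name: profile_distance(new_customer, prof) for name, prof in others.items()}
# Every profile here shares items with the new customer, so no distance is None.
closest = min(distances, key=distances.get)
```

The closest customers found this way are the ones whose ratings are then combined to predict a rating for an item the new customer has not rated, such as "Movie C" here.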
Collaborative Filtering Example • What rating would Nathaniel give to Planet of the Apes? • Simon, distance 2, rated it -1 • Amelia, distance 4, rated it -4 • Using an average weighted by inverse distance (weights 1/2 = 0.5 and 1/4 = 0.25), it is predicted that he would rate it -2 • = (0.5*-1 + 0.25*-4) / (0.5 + 0.25) • Nathaniel can certainly enter his own rating after seeing the movie, which could be close to or far from the prediction
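The prediction above can be reproduced with a short inverse-distance weighted average. The function name is hypothetical; the distances and ratings are the ones given for Simon and Amelia.

```python
def predict_rating(neighbor_ratings):
    """Inverse-distance weighted average of neighbors' ratings.

    `neighbor_ratings` is a list of (distance, rating) pairs; distances must be > 0.
    """
    weights = [1.0 / d for d, _ in neighbor_ratings]
    weighted_sum = sum(w * r for w, (_, r) in zip(weights, neighbor_ratings))
    return weighted_sum / sum(weights)


# Simon: distance 2, rating -1.  Amelia: distance 4, rating -4.
prediction = predict_rating([(2, -1), (4, -4)])
# (0.5 * -1 + 0.25 * -4) / (0.5 + 0.25) = -1.5 / 0.75 = -2.0
```

Simon's rating counts twice as much as Amelia's because he is half as far from Nathaniel.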