Chapter 8 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
Data Mining Techniques So Far… • Chapter 5 – Statistics • Chapter 6 – Decision Trees • Chapter 7 – Neural Networks
Nearest Neighbor Approaches • Based on the concept of similarity • Memory-Based Reasoning (MBR) – results are based on analogous situations in the past • Collaborative Filtering – results use preferences in addition to analogous situations from the past
Memory-Based Reasoning (MBR) • Our ability to reason from experience depends on our ability to recognize appropriate examples from the past… • Traffic patterns/routes • Movies • Food • We identify similar example(s) and apply what we know/learned to the current situation • In MBR, these similar examples are referred to as neighbors
MBR Applications • Fraud detection • Customer response prediction • Medical treatments • Classifying responses – MBR can process free-text responses and assign codes
MBR Strengths • Ability to use data “as is” – a distance function measures how “neighborly” two data records are, and a combination function merges the neighbors’ results into an answer • Ability to adapt – adding new data makes it possible for MBR to learn new things • Good results without lengthy training
MBR Example – Rents in Tuxedo, NY • Find the nearest neighbors using descriptive variables – population & median home price (not geography in this example) • The rent-range midpoints of the 2 nearest neighbors are $1,000 & $1,250, so one method estimates Tuxedo rent at their average, $1,125; a 2nd method yields a rent of $977 • The actual midpoint rent in Tuxedo turns out to be $1,250 by one method and $907 by another
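The first estimate on this slide amounts to a very simple combination function: take the rent midpoints of the two nearest neighbors and average them. A minimal Python sketch of that arithmetic follows; the function name `average_of_neighbors` is illustrative and does not come from the chapter, and the slide does not say how the second estimate ($977) is computed, so it is not reproduced here.

```python
def average_of_neighbors(values):
    """Combine neighbor values with a plain (unweighted) mean."""
    if not values:
        raise ValueError("need at least one neighbor value")
    return sum(values) / len(values)


# Rent-range midpoints of Tuxedo's two nearest neighbors, as given on the slide.
neighbor_midpoints = [1000, 1250]

estimate = average_of_neighbors(neighbor_midpoints)
print(estimate)  # 1125.0
```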
MBR Challenges • Choosing appropriate historical data for use in training • Choosing the most efficient way to represent the training data • Choosing the distance function, combination function, and the number of neighbors
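The three choices in the last bullet (distance function, combination function, number of neighbors) map directly onto the parameters of a nearest-neighbor lookup. The sketch below is one possible arrangement, not the chapter's implementation: it assumes numeric features that are already scaled, uses Euclidean distance and a plain mean as defaults, and runs on made-up training records that exist only for illustration.

```python
import math


def euclidean(a, b):
    """Distance function: straight-line distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(targets):
    """Combination function: unweighted average of the neighbors' values."""
    return sum(targets) / len(targets)


def mbr_predict(query, records, k, distance=euclidean, combine=mean):
    """Predict a value for `query` from its k nearest records.

    records: list of (features, target) pairs, the "memory" of past cases.
    """
    if k < 1 or k > len(records):
        raise ValueError("k must be between 1 and the number of records")
    # Sort the stored cases by distance to the query and keep the k closest.
    neighbors = sorted(records, key=lambda rec: distance(query, rec[0]))[:k]
    return combine([target for _, target in neighbors])


# Hypothetical training records (made up): (scaled features), known value.
training = [
    ((0.0, 0.0), 10.0),
    ((1.0, 0.0), 20.0),
    ((0.0, 1.0), 30.0),
    ((5.0, 5.0), 100.0),
]

print(mbr_predict((0.1, 0.1), training, k=1))  # 10.0
print(mbr_predict((0.1, 0.1), training, k=3))  # 20.0
```

Changing `k`, `distance`, or `combine` changes the answer without any retraining, which is why these choices matter so much in practice.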
Memory-Based Reasoning Exercise • Work in teams of 3 or 4 • Time Limit = 10 minutes • Discuss a couple of ways in which MBR could be utilized and hence useful to an organization (enterprise, govt agency, etc.) • Teams present ideas
Collaborative Filtering • Lots of human examples of this: • Best teachers • Best courses • Best restaurants (ambiance, service, food, price) • Recommend a dentist, mechanic, PC repair, blank CDs/DVDs, wines, B&Bs, etc… • CF is a variant of MBR particularly well suited to personalized recommendations
Collaborative Filtering • Starts with a history of people’s personal preferences • Uses a distance function – people who like the same things are “close” • Uses “votes” which are weighted by distances, so close neighbor votes count more • Basically, judgments of a peer group are important
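One way to make "people who like the same things are close" concrete is to compare two people's ratings on the items they have both rated. The sketch below assumes Euclidean distance over the shared items; the slides do not specify which distance measure is used, and the people and ratings here are made up for illustration.

```python
import math


def rating_distance(ratings_a, ratings_b):
    """Euclidean distance over the items both people rated.

    Returns None when the two people have no rated items in common,
    since there is then no basis for comparison.
    """
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return None
    return math.sqrt(sum((ratings_a[i] - ratings_b[i]) ** 2 for i in shared))


# Hypothetical rating profiles (made up), on a -5..5 scale.
person_a = {"Movie A": 4, "Movie B": -2, "Movie C": 1}
person_b = {"Movie A": 4, "Movie B": 1}
person_c = {"Movie A": -3, "Movie B": 2, "Movie C": 5}

print(rating_distance(person_a, person_b))  # 3.0
print(rating_distance(person_a, person_c))  # 9.0
```

Here `person_b` is the closer neighbor of `person_a`, so `person_b`'s votes would count for more.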
Collaborative Filtering • Knowing that lots of people liked something is not sufficient… • Who liked it is also important • A friend whose past recommendations were good (or bad) • A high-profile person whose opinion seems to carry influence • Collaborative Filtering automates this everyday word-of-mouth activity
Preparing Recommendations for Collaborative Filtering • Building customer profile – ask new customer to rate selection of things • Comparing this new profile to other customers using some measure of similarity • Using some combination of the ratings from similar customers to predict what the new customer would select for items he/she has NOT yet rated
Collaborative Filtering Example • What rating would Nathaniel give to Planet of the Apes? • Simon, distance 2, rated it -1 • Amelia, distance 4, rated it -4 • Using a weighted average with weights inversely proportional to distance (1/2 = 0.5 for Simon, 1/4 = 0.25 for Amelia), the predicted rating is -2 • = (0.5 × -1 + 0.25 × -4) / (0.5 + 0.25) = -1.5 / 0.75 = -2 • Nathaniel can of course enter his own rating after seeing the movie, which could be close to or far from the prediction
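The prediction on this slide can be checked with a few lines of Python. This is a minimal sketch of inverse-distance weighting using only the two neighbors given above; the function name `predict_rating` is illustrative, and the sketch assumes every distance is greater than zero.

```python
def predict_rating(neighbors):
    """Weighted average of neighbor ratings, with weight = 1 / distance.

    neighbors: list of (distance, rating) pairs. Distances must be > 0;
    a distance of 0 would raise ZeroDivisionError.
    """
    weights = [1.0 / distance for distance, _ in neighbors]
    weighted_sum = sum(w * rating for w, (_, rating) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# (distance from Nathaniel, rating of Planet of the Apes), from the slide.
simon = (2, -1)
amelia = (4, -4)

print(predict_rating([simon, amelia]))  # -2.0
```

Because Simon is closer, his -1 pulls the result toward his rating: the plain average of the two ratings would be -2.5.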
Collaborative Filtering Exercise • Work in teams of 3 or 4 • Time Limit = 10 minutes • Discuss a couple of ways in which Collaborative Filtering could be utilized and hence useful to an organization (enterprise, govt agency, etc.) • Teams present ideas