Evaluating Similarity Measures: A Large-Scale Study in the orkut Social Network

Evaluating Similarity Measures: A Large-Scale Study in the orkut Social Network • Ellen Spertus • spertus@google.com

Recommender systems • What are they? • Example: Amazon

Controversial recommenders • “What to do when your TiVo thinks you’re gay”, Wall Street Journal, Nov. 26, 2002 http://tinyurl.com/2qyepg

Controversial recommenders • Wal-Mart DVD recommendations http://tinyurl.com/2gp2hm

Google’s mission • To organize the world's information and make it universally accessible and useful.

communities

Community recommender • Goal: Per-community ranked recommendations • How to determine?

Community recommender • Goal: Per-community ranked recommendations • How to determine? • Implicit collaborative filtering • Look for common membership between pairs of communities

Terminology • Consider each community to be a set of members • B: base community (e.g., “Pizza”) • R: related community (e.g., “Cheese”) • Similarity measure • Based on overlap |B∩R|

Example: Pizza

Terminology • Consider each community to be a set of members • B: base community (e.g., “Wine”) • R: related community (e.g., “Linux”) • Similarity measure • Based on overlap |B∩R| • Also depends on |B| and |R| • Possibly asymmetric
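To make the terminology concrete, here is a minimal Python sketch that treats each community as a set of member IDs. The member IDs are made up for illustration, and the two fractions are not a measure from the slides; they are only one simple way to see why anything that depends on |B| and |R| can come out asymmetric.

```python
# Hypothetical member IDs; only the community names come from the slide.
wine = {1, 2, 3, 4}                    # B: base community
linux = {3, 4, 5, 6, 7, 8, 9, 10}      # R: related community

overlap = len(wine & linux)            # |B ∩ R|

# Share of each community's members who also belong to the other one.
share_of_wine_in_linux = overlap / len(wine)    # |B ∩ R| / |B|
share_of_linux_in_wine = overlap / len(linux)   # |B ∩ R| / |R|

print(overlap, share_of_wine_in_linux, share_of_linux_in_wine)
```

The overlap is the same in both directions, but it is a larger share of the smaller community, so the two communities need not recommend each other equally strongly.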

Example of asymmetry

Similarity measures • L1 normalization • L2 normalization • Pointwise mutual information • Positive correlations • Positive and negative correlations • Salton tf-idf • Log-odds

L1 normalization • Vector notation • Set notation
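The formulas on this slide did not survive in the transcript. The sketch below assumes the usual L1-normalized dot product of binary membership vectors, which in set notation reduces to |B∩R| / (|B|·|R|). The function name and the sample sets are hypothetical.

```python
def l1_similarity(base: set, related: set) -> float:
    """Assumed definition: (b . r) / (||b||_1 * ||r||_1) for 0/1 membership
    vectors, i.e. |B ∩ R| / (|B| * |R|) in set notation."""
    if not base or not related:
        return 0.0  # avoid dividing by zero for empty communities
    return len(base & related) / (len(base) * len(related))


# Hypothetical member IDs.
pizza = {1, 2, 3, 4}
cheese = {3, 4, 5, 6, 7}
print(l1_similarity(pizza, cheese))
```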

L2 normalization • Vector notation • Set notation
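The formulas on this slide are also missing from the transcript. Assuming the usual L2-normalized dot product (cosine similarity) of binary membership vectors, the set form is |B∩R| / sqrt(|B|·|R|). The function name and sample sets below are hypothetical.

```python
import math


def l2_similarity(base: set, related: set) -> float:
    """Assumed definition: (b . r) / (||b||_2 * ||r||_2) for 0/1 membership
    vectors, i.e. |B ∩ R| / sqrt(|B| * |R|) in set notation."""
    if not base or not related:
        return 0.0  # avoid dividing by zero for empty communities
    return len(base & related) / math.sqrt(len(base) * len(related))


# Hypothetical member IDs.
pizza = {1, 2, 3, 4}
cheese = {3, 4, 5, 6}
print(l2_similarity(pizza, cheese))
```

Under this definition the score is symmetric in B and R and equals 1 only when the two communities have identical membership.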

Mutual information: positive correlation • Formally, • Informally, how well membership in the base community predicts membership in the related community
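The formal definition is missing from the transcript. One plausible reading, assumed here, is the single "both present" term of mutual information: P(r,b) · lg(P(r,b) / (P(r)·P(b))), with probabilities estimated from membership counts. The function name, parameter names, and counts are hypothetical.

```python
import math


def mi_positive(n_base: int, n_related: int, n_both: int, n_users: int) -> float:
    """Assumed form: P(r,b) * log2(P(r,b) / (P(r) * P(b)))."""
    p_b = n_base / n_users        # P(b): chance a user is in the base community
    p_r = n_related / n_users     # P(r): chance a user is in the related community
    p_rb = n_both / n_users       # P(r,b): chance a user is in both
    if p_rb == 0:
        return 0.0                # treat 0 * log(0) as 0
    return p_rb * math.log2(p_rb / (p_r * p_b))


# Hypothetical counts: 100 users, |B| = 10, |R| = 20, |B ∩ R| = 5.
print(mi_positive(10, 20, 5, 100))
```

The score is positive when the two communities share more members than independence would predict and close to zero when membership in B tells you nothing about membership in R.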

Mutual information: positive and negative correlation
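No formula survives for this slide. The sketch below assumes the variant adds a second term for users who belong to neither community, P(r̄,b̄) · lg(P(r̄,b̄) / (P(r̄)·P(b̄))), to the "both present" term. All names and counts are hypothetical.

```python
import math


def _mi_term(p_joint: float, p_x: float, p_y: float) -> float:
    """One mutual-information term, with 0 * log(0) treated as 0."""
    if p_joint == 0:
        return 0.0
    return p_joint * math.log2(p_joint / (p_x * p_y))


def mi_positive_negative(n_base: int, n_related: int, n_both: int, n_users: int) -> float:
    """Assumed form: the 'in both' term plus the 'in neither' term."""
    n_neither = n_users - n_base - n_related + n_both
    p_b, p_r = n_base / n_users, n_related / n_users
    both = _mi_term(n_both / n_users, p_r, p_b)
    neither = _mi_term(n_neither / n_users, 1 - p_r, 1 - p_b)
    return both + neither


# Hypothetical counts: 100 users, |B| = 10, |R| = 20, |B ∩ R| = 5.
print(mi_positive_negative(10, 20, 5, 100))
```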

Salton tf-idf
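The slide gives only the name of this measure. A common way to adapt tf-idf to community membership, assumed here, is to weight the conditional probability P(r|b) by an inverse-frequency factor −lg P(r), so that very large related communities are discounted. Names and counts are hypothetical.

```python
import math


def idf_similarity(n_base: int, n_related: int, n_both: int, n_users: int) -> float:
    """Assumed form: P(r|b) * (-log2 P(r))."""
    if n_base == 0 or n_related == 0:
        return 0.0
    p_r_given_b = n_both / n_base      # share of B's members who are in R
    p_r = n_related / n_users          # overall popularity of R
    return p_r_given_b * -math.log2(p_r)


# Same overlap with B, but the second related community is four times larger.
niche = idf_similarity(n_base=10, n_related=20, n_both=5, n_users=100)
popular = idf_similarity(n_base=10, n_related=80, n_both=5, n_users=100)
print(niche, popular)
```

With equal overlap, the smaller related community scores higher, which is the intended effect of the inverse-frequency weight.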

LogOdds0 • Formally, • Informally, how much likelier a member of B is to belong to R than a non-member of B is.

LogOdds0 • Formally, • Informally, how much likelier a member of B is to belong to R than a non-member of B is. • This yielded the same rankings as L1.
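The informal description suggests LogOdds0 = lg(P(r|b) / P(r|b̄)). Assuming that form, and assuming L1 is |B∩R| / (|B|·|R|), the following sketch checks the "same rankings" remark on made-up counts. For a fixed base community, LogOdds0 grows with |B∩R| / (|R| − |B∩R|), which orders candidates the same way as |B∩R| / |R|. All counts are hypothetical.

```python
import math

N_USERS = 10_000    # hypothetical total number of users
BASE_SIZE = 100     # hypothetical |B|

# Related community -> (|R|, |B ∩ R|); counts are invented for illustration.
candidates = {"Cheese": (500, 50), "Wine": (300, 20), "Linux": (2000, 60)}


def l1(related_size: int, overlap: int) -> float:
    return overlap / (BASE_SIZE * related_size)


def log_odds0(related_size: int, overlap: int) -> float:
    p_r_given_b = overlap / BASE_SIZE
    # Members of R outside B, divided by all users outside B.
    p_r_given_not_b = (related_size - overlap) / (N_USERS - BASE_SIZE)
    return math.log2(p_r_given_b / p_r_given_not_b)


rank_l1 = sorted(candidates, key=lambda c: l1(*candidates[c]), reverse=True)
rank_log_odds0 = sorted(candidates, key=lambda c: log_odds0(*candidates[c]), reverse=True)
print(rank_l1, rank_log_odds0)
```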

LogOdds

Predictions? • Were there significant differences among the measures? • Top-ranked recommendations • User preference • Which measure was “best”? • Was there a partial or total ordering of measures?

Recommendations for “I love wine” (2400)

Experiment • Precomputed top 12 recommendations for each base community for each similarity measure • When a user views a community page • Hash the community and user ID to • Select an ordered pair of measures to • Interleave, filtering out duplicates • Track clicks of new users
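The assignment and interleaving steps above can be sketched as follows. The slides do not say which hash function was used or exactly how the two lists were merged, so SHA-256, the alternating merge, and the `limit` parameter are assumptions; the measure names are taken from the results slide.

```python
import hashlib
from itertools import permutations, zip_longest

MEASURES = ["L1", "L2", "MI1", "MI2", "IDF", "LogOdds"]
ORDERED_PAIRS = list(permutations(MEASURES, 2))   # every ordered pair of measures


def pick_pair(community_id: int, user_id: int) -> tuple:
    """Deterministically map (community, user) to an ordered pair of measures."""
    digest = hashlib.sha256(f"{community_id}:{user_id}".encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(ORDERED_PAIRS)
    return ORDERED_PAIRS[index]


def interleave(first: list, second: list, limit: int) -> list:
    """Alternate between two ranked lists, skipping items already taken."""
    merged, seen = [], set()
    for a, b in zip_longest(first, second):
        for item in (a, b):
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:limit]


print(pick_pair(community_id=2400, user_id=42))
print(interleave(["a", "b", "c"], ["b", "d", "a"], limit=12))
```

Hashing rather than drawing a random pair means the same user sees the same pair of measures each time they revisit a community page.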

Click interpretation

Overall click rate (July 1-18) Total recommendation pages generated: 4,106,050

Overall click rate (July 1-18)

Analysis • For each pair of similarity measures Ma and Mb and each click C, either: • Ma recommended C more highly than Mb • Ma and Mb recommended C equally • Mb recommended C more highly than Ma
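The three-way classification of each click can be sketched like this. How the study treated a clicked community that appears in only one of the two lists is not stated; the sketch assumes it counts as ranked below every listed item. Function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def compare_click(clicked: str, ranking_a: list, ranking_b: list) -> str:
    """Return 'a', 'b', or 'tie' for whichever measure ranked the click higher."""
    def position(ranking: list) -> float:
        # Assumption: an item missing from a list ranks below everything in it.
        return ranking.index(clicked) if clicked in ranking else math.inf

    pos_a, pos_b = position(ranking_a), position(ranking_b)
    if pos_a < pos_b:
        return "a"
    if pos_b < pos_a:
        return "b"
    return "tie"


# Hypothetical rankings from two measures and a few hypothetical clicks.
ranking_a = ["Cheese", "Wine", "Linux"]
ranking_b = ["Wine", "Cheese", "Beer"]
tally = Counter(compare_click(c, ranking_a, ranking_b)
                for c in ["Cheese", "Wine", "Linux", "Beer"])
print(tally)
```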

Results • Clicks leading to joins: L2 » MI1 » MI2 » IDF › L1 » LogOdds • All clicks: L2 » L1 » MI1 » MI2 › IDF » LogOdds

Positional effects • Original experiment • Ordered recommendations by rank • Second experiment • Generated recommendations using L2 • Pseudo-randomly ordered recommendations, tracking clicks by placement • Tracked 1.3M clicks between September 22 and October 21
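A minimal sketch of the second experiment's bookkeeping: shuffle the recommendations with a seeded generator, then count clicks by the slot in which the clicked item was displayed. The seeding scheme, function names, and sample data are assumptions, not details from the study.

```python
import random
from collections import Counter

clicks_by_slot = Counter()   # display position -> number of clicks


def shuffled_recommendations(recommendations: list, seed: int) -> list:
    """Return a pseudo-random ordering that is reproducible for a given seed."""
    rng = random.Random(seed)
    shuffled = list(recommendations)   # copy so the ranked list is untouched
    rng.shuffle(shuffled)
    return shuffled


def record_click(displayed: list, clicked: str) -> None:
    """Credit the click to the slot where the clicked item was shown."""
    clicks_by_slot[displayed.index(clicked)] += 1


ranked = ["Cheese", "Wine", "Linux", "Beer"]        # hypothetical L2 output
displayed = shuffled_recommendations(ranked, seed=7)
record_click(displayed, "Wine")
print(displayed, dict(clicks_by_slot))
```

If placement had no effect, clicks would be spread roughly evenly across slots, which is what the significance figures on the following slides test.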

Results: single row (n=28,108) Namorado Para o Bulldog

Results: single row (n=28,108) p=.12 (not significant)

Results: two rows (n=24,459)

Results: two rows (n=24,459) p < .001

Results: 3 rows (n=1,226,659)

Results: 3 rows (n=1,226,659) p < .001

Users’ reactions • Hundreds of requests per day to add recommendations • Angry requests from community creators • General • Specific

Amusing recommendations C++

Amusing recommendations C++ What’s she trying to say? For every time a woman has confused you…

Amusing recommendations Chocolate

Amusing recommendations Chocolate PMS

Allowing community owners to set recommendations
