Modeling User Interactions in Web Search and Social Media
Eugene Agichtein
Intelligent Information Access Lab, Emory University
Intelligent Information Access Lab (http://ir.mathcs.emory.edu/)
• Research areas:
  • Information retrieval & extraction, text mining, and information integration
  • User behavior modeling, social networks and interactions, social media
• People: Walter Askew (EC '09), Qi Guo (2nd-year Ph.D.), Yandong Liu (2nd-year Ph.D.), Alvin Grissom (2nd-year M.S.), Ryan Kelly (Emory '10), Abulimiti Aji (1st-year Ph.D.); and colleagues at Yahoo! Research, Microsoft Research, Emory Libraries, Psychology, Emory School of Medicine, Neuroscience, and Georgia Tech College of Computing
User Interactions: The 3rd Dimension of the Web
• Amount exceeds web content and structure
• Published content: 4 GB/day; social media: 10 GB/day; page views: 100 GB/day [Andrew Tomkins, Yahoo! Search, 2007]
Finding Information Online
• Search + Browse
• Orienteering
• Ask
• [Diagram: offline vs. online information seeking]
Talk Outline
• Web search interactions
  • Click modeling
  • Browsing
• Social media
  • Content quality
  • User satisfaction
  • Ranking and filtering
Interpreting User Interactions
• Clickthrough and subsequent browsing behavior of individual users is influenced by many factors:
  • Relevance of a result to a query
  • Visual appearance and layout
  • Result presentation order
  • Context, history, etc.
• General idea:
  • Aggregate interactions across all users and queries
  • Compute "expected" behavior for any query/page
  • Recover the relevance signal for a given query
Case Study: Clickthrough
• [Plot: clickthrough frequency by result position, for all queries in sample]
• Model: Clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d)
Clickthrough for Queries with Known Position of Top Relevant Result
• Higher clickthrough at the top non-relevant than at the top relevant document
• [Plot: relative clickthrough for queries with the known relevant result in position 1 and in position 3, respectively]
Model Deviation from "Expected" Behavior
• Relevance component: deviation from "expected" clickthrough: relevance(q, d) = observed(q, d, p) − expected(p), as sketched below
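To make the deviation model concrete, here is a minimal Python sketch (the log format, field names, and aggregation are assumptions for illustration; the actual model operates over many behavior features, not clicks alone):

```python
from collections import defaultdict

# click_log: iterable of (query, doc, position, clicked) tuples -- a
# hypothetical log format, with clicked as 0 or 1.
def relevance_deviation(click_log):
    # Estimate the position prior: expected clickthrough rate at each
    # rank, aggregated over all queries and documents.
    pos_clicks, pos_views = defaultdict(int), defaultdict(int)
    for query, doc, pos, clicked in click_log:
        pos_views[pos] += 1
        pos_clicks[pos] += clicked
    expected = {p: pos_clicks[p] / pos_views[p] for p in pos_views}

    # Observed clickthrough rate for each (query, doc) pair at its position.
    obs_clicks, obs_views, doc_pos = defaultdict(int), defaultdict(int), {}
    for query, doc, pos, clicked in click_log:
        obs_views[(query, doc)] += 1
        obs_clicks[(query, doc)] += clicked
        doc_pos[(query, doc)] = pos

    # Relevance signal = observed CTR minus the position prior.
    return {qd: obs_clicks[qd] / obs_views[qd] - expected[doc_pos[qd]]
            for qd in obs_views}
```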
Predicting Result Preferences
• Task: predict pairwise preferences, i.e., a user will prefer Result A > Result B
• Models for preference prediction:
  • Current search engine ranking
  • Clickthrough
  • Full user behavior model
Predicting Result Preferences: Joachims et al., SIGIR 2005
• SA+N: "Skip Above" and "Skip Next"
• Adapted from Joachims et al. [SIGIR '05]; motivated by gaze tracking
• Example (results ranked 1–8; clicks on results 2 and 4):
  • Skip Above: 4 > (1, 3), 2 > 1
  • Skip Next: 4 > 5, 2 > 3
Our Extension: Use Click Distribution
• CD: distributional model, extends SA+N
• A clickthrough counts only if its frequency is more than ε above the expected clickthrough
• Example (results ranked 1–8; clicks on results 2 and 4): the click on result 2 is likely "by chance", so we derive 4 > (1, 2, 3, 5), but not 2 > (1, 3); see the sketch below
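A sketch of how SA+N and the CD filter could generate preference pairs (the ε threshold, the CTR inputs, and the thresholding details are assumptions; the exact formulation in the paper may differ):

```python
def sa_n_preferences(clicked_positions, num_results):
    # "Skip Above": a clicked result is preferred over every unclicked
    # result ranked above it; "Skip Next": also over the result just below.
    clicked = set(clicked_positions)
    prefs = []
    for c in sorted(clicked):
        prefs += [(c, i) for i in range(1, c) if i not in clicked]  # skip above
        if c + 1 <= num_results and c + 1 not in clicked:           # skip next
            prefs.append((c, c + 1))
    return prefs  # list of (preferred, less-preferred) positions

def cd_preferences(clicked_ctr, expected_ctr, num_results, eps=0.05):
    # Distributional model: trust a click only if its observed frequency
    # exceeds the position's expected frequency by more than eps.
    trusted = [p for p, ctr in clicked_ctr.items()
               if ctr - expected_ctr[p] > eps]
    return sa_n_preferences(trusted, num_results)
```

With clicks on positions 2 and 4 where only position 4's click frequency clears the threshold, `cd_preferences` yields 4 > (1, 2, 3, 5) and suppresses the 2 > 1 and 2 > 3 pairs, matching the example above.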
Problem: Users Click Based on Result Summaries ("Captions"/"Snippets")
• Effect of caption features on clickthrough inversions: C. Clarke, E. Agichtein, S. Dumais, R. White, SIGIR 2007
Summary
• Clickthrough inversions are a powerful tool for assessing the influence of caption features
• Relatively simple caption features can significantly influence user behavior
• Accounting for summary bias can help predict relevance from clickthrough more accurately
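As an illustration, a minimal sketch of detecting clickthrough inversions from aggregated clicks (the counting convention here is an assumption, not the paper's exact definition):

```python
def clickthrough_inversions(clicks_by_position):
    # clicks_by_position[p] = total clicks observed at rank p for one query.
    # An inversion is a pair (hi, lo) with hi ranked above lo but clicked
    # less: evidence that lo's caption drew users despite its lower rank.
    positions = sorted(clicks_by_position)
    return [(hi, lo)
            for i, hi in enumerate(positions)
            for lo in positions[i + 1:]
            if clicks_by_position[lo] > clicks_by_position[hi]]

# Example: rank 3 out-clicks ranks 1 and 2 -> inversions (1, 3) and (2, 3).
print(clickthrough_inversions({1: 40, 2: 25, 3: 55}))
```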
User Behavior Model
• Full set of interaction features: presentation, clickthrough, browsing
• Train the model with explicit judgments:
  • Input: behavior feature vector for each query-page pair in the rated results
  • Use RankNet (Burges et al. [ICML 2005]) to learn model weights
  • Output: a neural net that assigns a "relevance" score to a behavior feature vector
RankNet for User Behavior
• RankNet: general, scalable, robust neural net training algorithm and implementation
• Optimized for ranking: predicts an ordering of items, not a score for each
• Trains on pairs (where the first item should be ranked higher than or equal to the second)
• Extremely efficient
• Uses a cross-entropy cost (probabilistic model)
• Uses gradient descent to set weights
• Restarts to escape local minima
RankNet [Burges et al. 2005]
• For query results 1 and 2, present the pair of feature vectors and labels, with label(1) > label(2)
• [Diagram, built up across slides: Feature Vector 1 → NN output 1; Feature Vector 2 → NN output 2]
• The error is a function of both outputs (we desire output 1 > output 2)
• Update feature weights with modified back-propagation; cost function: f(o1 − o2), details in the Burges et al. paper; a sketch follows
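A minimal sketch of a RankNet-style pairwise update (a single linear layer stands in for the neural net, and the learning rate and feature values are illustrative assumptions):

```python
import numpy as np

def ranknet_update(w, x1, x2, lr=0.01):
    # One pairwise step on a pair where x1 should outrank x2.
    # Pairwise cross-entropy cost: C = log(1 + exp(-(o1 - o2))),
    # where o1, o2 are the model's scores for the two items.
    o1, o2 = w @ x1, w @ x2
    # dC/d(o1 - o2) = -1 / (1 + exp(o1 - o2)); back-propagate through
    # both outputs (here, a single linear layer for readability).
    grad = -1.0 / (1.0 + np.exp(o1 - o2))
    w -= lr * grad * (x1 - x2)  # pushes o1 above o2
    return w

# Illustrative behavior-feature vectors for a preferred / non-preferred result.
w = np.zeros(3)
x_preferred, x_other = np.array([0.9, 0.4, 0.7]), np.array([0.2, 0.5, 0.1])
for _ in range(100):
    w = ranknet_update(w, x_preferred, x_other)
```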
Predicting with RankNet
• Present an individual feature vector to the trained net and read off its score
• [Diagram: Feature Vector → NN output]
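At serving time this is just a forward pass over one feature vector; continuing the linear stand-in above:

```python
# Score one result's behavior feature vector with the trained weights above;
# results are then ordered by descending score.
x_new = np.array([0.6, 0.3, 0.2])  # illustrative feature values
score = float(w @ x_new)
```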
Example Results: Predicting User Preferences
• Accuracy: Baseline < SA+N < CD << UserBehavior
• Rich user behavior features yield a dramatic improvement
How to Use Behavior Models for Ranking?
• Use interactions from previous instances of a query
  • General-purpose (not personalized)
  • Only for queries with past user interactions
• Models:
  • Rerank, clickthrough only: reorder results by number of clicks
  • Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences
  • Integrate directly into ranker: incorporate user interactions as features for the ranker
Enhance Ranker Features with User Behavior Features
• For a given query, merge the original feature set with user behavior features, when available
• User behavior features are computed from previous interactions with the same query
• Train RankNet [Burges et al., ICML '05] on the enhanced feature set
Feature Merging: Details (a sketch follows)
• Value scaling: binning vs. log-linear vs. linear (e.g., normalize to μ=0, σ=1)
• Missing values: use 0? (but what does 0 mean for features normalized to μ=0?)
• Runtime: significant plumbing problems
• [Example: query "SIGIR", with illustrative results and feature values]
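A sketch of one way to implement the merging and scaling choices above (the zero-fill policy for missing behavior features and the per-feature linear scaling are the assumptions being illustrated):

```python
import numpy as np

def merge_features(content_feats, behavior_feats, behavior_dim):
    # content_feats: per-result ranker feature vectors (always present).
    # behavior_feats: per-result behavior vectors from past interactions
    # with this query, or None when the query has no interaction history.
    merged = []
    for content, behavior in zip(content_feats, behavior_feats):
        if behavior is None:
            behavior = np.zeros(behavior_dim)  # missing-value policy: fill 0
        merged.append(np.concatenate([content, behavior]))
    X = np.array(merged)
    # Linear scaling to mean 0, std 1 per feature (one of the options above).
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
```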
Evaluation Metrics (minimal implementations below)
• Precision at K: fraction of relevant results in the top K
• NDCG at K: normalized discounted cumulative gain; weights top-ranked results most heavily
• MAP: mean average precision; average precision for each query is the mean of the precision-at-K values computed after each relevant document is retrieved
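For reference, minimal implementations of these metrics (standard formulations with the common log2 discount; the exact gain and discount choices used in the evaluation may differ):

```python
import numpy as np

def precision_at_k(rels, k):
    # rels: binary relevance of results in ranked order.
    return sum(rels[:k]) / k

def ndcg_at_k(gains, k):
    # gains: graded relevance in ranked order; log2 discount by position.
    def dcg(g):
        return sum(x / np.log2(i + 2) for i, x in enumerate(g[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def average_precision(rels):
    # Mean of precision@K taken at each rank K holding a relevant document;
    # MAP is this value averaged over all queries.
    hits = [precision_at_k(rels, i + 1) for i, r in enumerate(rels) if r]
    return sum(hits) / max(1, sum(rels))
```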
Content + User Behavior: NDCG
• [Plot: NDCG at K for each method]
• BM25 < Rerank-CT < Rerank-All < +All
Which Queries Benefit Most?
• Most gains are for queries with poor original ranking
Result Summary
• Incorporating user behavior into web search ranking dramatically improves relevance
• Providing rich user interaction features directly to the ranker is the most effective strategy
• Large improvements shown for up to 50% of test queries
Some Goals of Mining Social Media
• Find relevant, high-quality content
• Use millions of interactions to:
  • Understand complex information needs
  • Model subjective information seeking
  • Understand cultural dynamics
[Example question on Yahoo! Answers: http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO]
Lifecycle of a Question in CQA
• [Flow diagram] User: choose a category → compose the question → open the question
• Answers arrive; the asker examines them (rated + or −)
• Answer found? Yes: close the question, choose the best answer, give ratings
• No: the question is closed by the system, and the best answer is chosen by voters