Topics and Transitions: Investigation of User Search Behavior

Xuehua Shen, Susan Dumais, Eric Horvitz

What’s next for the user?

Outline • Problem • Automatic Topic Tagging • Predictive models • Evaluation • Experiments and analysis • Conclusion and future directions

Problem • Opportunity: Personalizing search • Focus: What topics do users explore? • How similar are users to each other, to special groups, and to the population at large? • Data, data, data… • MSN search engine log • Query & clickthrough • 87,449,277 rows, 36,895,634 URLs (5% sample from MSN logs, 05/29-06/29) • Create predictive models of the topics of queries and URLs visited

Automatic Topic Tagging • ODP (Open Directory Project) manually categorizes URLs • MSN extended these methods with heuristics to cover more URLs • We developed a tool to automatically tag every URL in the log • 15 top-level categories: Arts, Business, Computers, Games, Health, Home, Kids_and_Teens, News, Recreation, Reference, Science, Shopping, Society, Sports, Adult

Multiple tagging • Average: 1.38 tags per URL • [Figure: a snippet]

Predictive Model: User Perspective • Individual model: Use only an individual’s clickthrough to build a model for that user’s predictions • Group model: Group similar users to build a model for each group’s predictions (e.g., group users with the same ‘max topic’ in their clickthrough) • Population model: Use clickthrough data from all users to build one model for all users’ predictions
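The three levels of aggregation might be sketched as follows. This is only an illustration, assuming the clickthrough log has been reduced to `(user_id, topic)` pairs; the function names and the toy data are hypothetical, and grouping by most-clicked topic follows the ‘max topic’ example on the slide.

```python
from collections import Counter, defaultdict

def topic_distribution(topics):
    """Normalize topic counts into a probability distribution."""
    counts = Counter(topics)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def build_models(clicks):
    """clicks: list of (user_id, topic) pairs (assumed log format)."""
    by_user = defaultdict(list)
    for user, topic in clicks:
        by_user[user].append(topic)

    # Individual model: one distribution per user.
    individual = {u: topic_distribution(ts) for u, ts in by_user.items()}

    # Group model: pool users who share the same most-clicked topic.
    by_group = defaultdict(list)
    for ts in by_user.values():
        max_topic = Counter(ts).most_common(1)[0][0]
        by_group[max_topic].extend(ts)
    group = {g: topic_distribution(ts) for g, ts in by_group.items()}

    # Population model: one distribution over all clicks.
    population = topic_distribution([t for _, t in clicks])
    return individual, group, population

# Toy data, invented for illustration only.
clicks = [
    ("u1", "Sports"), ("u1", "Sports"), ("u1", "News"),
    ("u2", "Sports"), ("u2", "Arts"), ("u2", "Sports"),
    ("u3", "Health"),
]
individual, group, population = build_models(clicks)
```

Here `u1` and `u2` fall into the same group because both click mostly on Sports, while `u3` forms a group of its own.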

Predictive Model: Considering Time Dependence • Marginal model • Base probability for topics • Markov model • Probability of moving from one topic to another • Time-interval-specific Markov model • User search behavior has two different patterns
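The difference between the marginal and the Markov model can be made concrete with a small sketch. It assumes a user's visits have already been reduced to an ordered list of topic labels and that the Markov model is first-order; the function names and the sample sequence are hypothetical.

```python
from collections import Counter, defaultdict

def marginal_model(sequence):
    """Base probability of each topic, ignoring order."""
    counts = Counter(sequence)
    total = len(sequence)
    return {t: c / total for t, c in counts.items()}

def markov_model(sequence):
    """First-order transition probabilities P(next topic | current topic)."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        transitions[current][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in transitions.items()
    }

# Invented topic sequence for one user.
visits = ["News", "Sports", "Sports", "News", "Sports", "Shopping"]
marginal = marginal_model(visits)
markov = markov_model(visits)
```

In this toy sequence the marginal model gives Sports a probability of 0.5 at every step, whereas the Markov model predicts Sports with probability 1.0 whenever the previous topic was News.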

Evaluation Metrics • KL (Kullback-Leibler) Divergence • Likelihood • Top K: Match the real top K topics against the predicted top K’ topics
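A minimal sketch of the three metrics follows. The slide does not give exact definitions, so several choices here are assumptions: KL divergence is taken from the actual to the predicted distribution, a small `eps` floor stands in for zero predicted probabilities, likelihood is computed in log form, and the top-K match is scored as the fraction of the actual top K topics found among the predicted top K’ topics.

```python
import math

def kl_divergence(actual, predicted, eps=1e-9):
    """D_KL(actual || predicted); eps guards against zero predictions."""
    return sum(
        p * math.log(p / max(predicted.get(t, 0.0), eps))
        for t, p in actual.items() if p > 0
    )

def log_likelihood(observed_topics, predicted, eps=1e-9):
    """Log-probability of the observed topics under the predicted distribution."""
    return sum(math.log(max(predicted.get(t, 0.0), eps)) for t in observed_topics)

def top_k_match(actual, predicted, k, k_prime):
    """Fraction of the actual top-k topics that appear in the predicted top-k'."""
    top_actual = sorted(actual, key=actual.get, reverse=True)[:k]
    top_pred = set(sorted(predicted, key=predicted.get, reverse=True)[:k_prime])
    return sum(t in top_pred for t in top_actual) / len(top_actual)

# Invented distributions for illustration.
actual = {"Sports": 0.5, "News": 0.3, "Arts": 0.2}
predicted = {"Sports": 0.5, "Arts": 0.3, "News": 0.2}

kl = kl_divergence(actual, predicted)
ll = log_likelihood(["Sports", "News"], predicted)
match_2_2 = top_k_match(actual, predicted, k=2, k_prime=2)
match_1_2 = top_k_match(actual, predicted, k=1, k_prime=2)
```

Lower KL divergence and higher likelihood both indicate a better prediction; the top-K score rewards getting the most frequent topics right even when the exact probabilities are off.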

Experiment • 5 weeks of data (05/22-06/29) • Build models based on different subsets of the total data • Predict for a “holdout set”: data from the other weeks

Results from Basic Experiment • Marginal model: Individual model has the best performance • Markov model: Consistently better than the corresponding marginal model • Markov model: Individual model does not have the best performance. Why?

Results: Training Data Size • Greater amounts of training data → Markov (same for Marginal) models improve • But: Individual Markov model still can’t beat the Population Markov model

Results: Smoothing • Using the population Markov model to smooth helps the individual Markov model • But: the smoothed individual Markov model still can’t outperform the population model
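The slide does not say which smoothing method was used. One common choice, assumed here purely for illustration, is linear interpolation between the individual and population transition tables; the weight `lam`, the function name, and the toy tables are all hypothetical.

```python
def smooth_markov(individual, population, lam=0.7):
    """Interpolate individual and population transition probabilities.

    Both arguments map current_topic -> {next_topic: probability}.
    The population table is assumed to cover every state the user can be in.
    """
    smoothed = {}
    for cur, pop_row in population.items():
        ind_row = individual.get(cur)
        if ind_row is None:
            # The user was never observed in this state: back off fully.
            smoothed[cur] = dict(pop_row)
            continue
        topics = set(pop_row) | set(ind_row)
        smoothed[cur] = {
            t: lam * ind_row.get(t, 0.0) + (1 - lam) * pop_row.get(t, 0.0)
            for t in topics
        }
    return smoothed

# Invented transition tables for illustration.
individual = {"News": {"Sports": 1.0}}
population = {
    "News": {"Sports": 0.5, "Arts": 0.5},
    "Arts": {"News": 1.0},
}
smoothed = smooth_markov(individual, population)
```

Smoothing gives the sparse individual table nonzero probability for transitions the user has not yet made, such as News to Arts in this example.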

Results: Time Decay Effect • The older the training data relative to the test period, the lower the prediction accuracy

Results: Time-Interval-Specific Markov Model • Markov models capture short-term access patterns better
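One way to read "time-interval-specific" is to estimate separate transition tables depending on how much time passed between two consecutive visits. The sketch below takes that reading as an assumption; the 300-second threshold, the two bucket names, and the event format are hypothetical and do not come from the slides.

```python
from collections import Counter, defaultdict

def interval_markov(events, threshold_seconds=300):
    """events: list of (timestamp_seconds, topic), sorted by time.

    Returns one transition table for short gaps and one for long gaps.
    """
    counts = {"short": defaultdict(Counter), "long": defaultdict(Counter)}
    for (t0, cur), (t1, nxt) in zip(events, events[1:]):
        bucket = "short" if t1 - t0 <= threshold_seconds else "long"
        counts[bucket][cur][nxt] += 1
    return {
        bucket: {
            cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for cur, row in table.items()
        }
        for bucket, table in counts.items()
    }

# Invented event stream for illustration.
events = [
    (0, "News"), (60, "News"), (120, "Sports"),
    (4000, "Shopping"), (4030, "Shopping"),
]
models = interval_markov(events)
```

With this split, transitions made within a few minutes are modeled separately from transitions that follow a long pause.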

Conclusion • Use ODP categorization to tag URLs visited by users • Construct marginal and Markov models using tagged URLs • Explore performance of marginal and Markov models to predict transitions among topics • Set of results relating topic transition behaviors of population, groups, and specific users

Directions • Study of reliability, failure modes of automated tagging process (use of expert human taggers) • Combination of query and clickthrough topics • Formulating and studying different groups of people • Topic-centric evaluation • Application of results in personalization of search experience • Interpretation of topics associated with queries • Ranking of results • Designs for client UI

Acknowledgement • Susan and Eric for great mentoring and discussion • Johnson and Muru for development support • Haoyong for MSN Search Engine development environment
