two talks in one! Evaluating Novelty and Diversity. Charles Clarke, School of Computer Science, University of Waterloo
Goals for Evaluation Measures • meaningful • tractable • reusable
Evaluation Framework We examine a framework for evaluation. Specific measures covered by the framework include: • Clarke et al. (SIGIR '08) • Agrawal et al. (WSDM '09) • Clarke et al. (ICTIR '09)
Talk #1: Evaluating Diversity. Charles Clarke, School of Computer Science, University of Waterloo
Query: “windows” • Microsoft Windows • When will Windows 7 be released? • What’s the Windows update URL? • I want to download Windows Live Essentials • House windows • Where can I buy replacement windows? • What brands are available? • Aluminum or vinyl? • Windows Restaurant, Las Vegas
Nuggets • Nugget = any binary property of a document • Provides address of a Pella dealer. • Discusses history of the Windows OS. • Is the Windows update page. • (factual, topical and navigational) • Problem: potentially thousands per query.
Evaluation • Model user information needs using nuggets. Different users will be interested in different combinations of nuggets. • Express judgments in terms of nuggets. Judgments may be automatic or manual. Judgments are binary: Does this document contain this nugget? • Nuggets link users and documents (see the sketch below).
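A minimal sketch of this judging model; the documents, nugget labels, and judgments are invented for illustration:

```python
# Binary nugget judgments: J(d, i) = 1 iff document d is judged to
# contain nugget i. All identifiers here are invented examples.

judgments = {
    ("doc1", "pella-dealer-address"): 1,
    ("doc1", "windows-os-history"):   1,
    ("doc2", "windows-update-page"):  1,
}

def J(doc: str, nugget: str) -> int:
    """Does this document contain this nugget? (binary judgment)"""
    return judgments.get((doc, nugget), 0)

print(J("doc1", "pella-dealer-address"))  # 1
print(J("doc2", "windows-os-history"))    # 0
```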
Interdependencies Problem: Complex interdependencies between nuggets. Three possible simplifying assumptions: • User interested in nugget A will always be interested in nugget B. • User interested in nugget A will never be interested in nugget B. • Nuggets A and B are independent.
Possible Assumption #1 If a user interested in nugget A will always be interested in nugget B, then A and B can be treated as the same nugget.
Possible Assumption #2 A user interested in nugget A will never be interested in nugget B (and vice versa), so a user's interest in nugget A determines their lack of interest in nugget B. Nugget A and nugget B may then be viewed as representing different interpretations of the query.
Query: “windows” • Microsoft Windows • When will Windows 7 be released? • What’s the Windows update URL? • I want to download Windows Live Essentials • House windows • Where can I buy replacement windows? • What brands are available? • Aluminum or vinyl? • Windows Restaurant, Las Vegas
Query Interpretations • Assume M interpretations. • Compute any effectiveness measure Sj with respect to each interpretation j. • Compute the weighted average Σj pj Sj (where pj is the probability of interpretation j), as sketched below. • Agrawal et al. (WSDM '09)
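A concrete sketch of the intent-aware weighted average; the interpretations, probabilities, and per-interpretation scores are invented for illustration:

```python
# Intent-aware averaging (after Agrawal et al., WSDM '09): compute any
# effectiveness measure per interpretation, then weight by the probability
# of each interpretation. All numbers are invented for illustration.

# (label, p_j = probability of interpretation j, S_j = score under j)
interpretations = [
    ("Microsoft Windows",         0.70, 0.80),
    ("house windows",             0.25, 0.40),
    ("Windows Restaurant, Vegas", 0.05, 0.10),
]

score = sum(p * s for _, p, s in interpretations)
print(f"intent-aware score = {score:.3f}")  # 0.665
```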
Possible Assumption #3 A user’s interest in nugget A is independent of their interest in nugget B. The probability that the user is interested in nugget A is a constant (pA). The probability that the user is interested in nugget B is a constant (pB).
Query: “windows” • Microsoft Windows • When will Windows 7 be released? • What’s the Windows update URL? • I want to download Windows Live Essentials • House windows • Where can I buy replacement windows? • What brands are available? • Aluminum or vinyl? • Windows Restaurant, Las Vegas
Relevance framework A document is relevant if it contains any relevant information. With N nuggets: P(relevant | d) = 1 − ∏i (1 − P(user wants nugget i) · P(nugget i in d)).
Relevance • Assume a constant user probability: P(user wants nugget i) = γ for all i. • Assume a constant document probability: P(nugget i in d) = α · J(d, i). • J(d, i) = 1 iff document d is judged to contain nugget i. • Then P(relevant | d) = 1 − (1 − αγ)^n(d), where n(d) = Σi J(d, i): count the nuggets.
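A minimal sketch of this calculation, with illustrative constants for the user-interest probability (gamma) and judging confidence (alpha):

```python
# Probability of relevance under the independence assumption: with constant
# probabilities alpha and gamma, the product over nuggets collapses to a
# function of the nugget count alone. Constants are invented for illustration.

GAMMA = 0.5  # P(user is interested in any given nugget)
ALPHA = 0.5  # P(a positively judged nugget is actually present)

def prob_relevant(nugget_count: int) -> float:
    """P(relevant | d) = 1 - (1 - alpha*gamma)^n, n = nuggets judged in d."""
    return 1.0 - (1.0 - ALPHA * GAMMA) ** nugget_count

for n in range(5):
    print(n, round(prob_relevant(n), 3))  # 0.0, 0.25, 0.438, 0.578, 0.684
```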
Probability of Relevance Estimated probability of relevance replaces binary relevance in standard evaluation measures, including nDCG, MAP, and rank-biased precision. Assumptions #2 and #3 can then be combined. Other estimation methods are possible.
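For example, a hypothetical sketch of rank-biased precision computed over estimated probabilities of relevance instead of binary judgments (the per-rank probabilities are invented):

```python
# Rank-biased precision with the estimated probability of relevance
# substituted for a binary judgment at each rank.
# RBP = (1 - p) * sum_k rel_k * p^(k-1), with persistence parameter p.

def rbp(prob_rel: list[float], persistence: float = 0.8) -> float:
    """Compute RBP, treating each entry as a probability of relevance."""
    return (1.0 - persistence) * sum(
        r * persistence ** k for k, r in enumerate(prob_rel)
    )

# Estimated probabilities of relevance for a ranked list of five documents.
print(round(rbp([0.75, 0.5, 0.9, 0.0, 0.25]), 3))
```

The same substitution applies to nDCG (probability as gain) and MAP.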
Research Issues (talk #1) • Identifying nuggets automatically • Clustering • Co-clicks • Query refinement • Automatic judging • Patterns • Classification • How many nuggets are enough? • Estimating probability of relevance
Conclusions (talk #1) • Evaluating diversity requires us to model and represent the diversity. • Nuggets represent one possible solution. • Simple user model; simple assumptions; simple judging.
Questions? Talk #1: Evaluating Diversity. Charles Clarke, School of Computer Science, University of Waterloo
Intermission The TREC 2009 Web Track • traditional ad hoc task • novelty and diversity task • ClueWeb09 dataset (one billion pages) • explore effectiveness measures • http://plg.uwaterloo.ca/~trecweb
Intermission: Free sample topic
<topic number=0>
  <query> physical therapist </query>
  <description>
    The user requires information regarding the profession and the services it provides.
  </description>
  <subtopic number=1> What does a physical therapist do? </subtopic>
  <subtopic number=2> Where can I find a physical therapist? </subtopic>
  <subtopic number=3> How much does physical therapy cost per hour? </subtopic>
…
Talk #2: Evaluating Novelty. Charles Clarke, School of Computer Science, University of Waterloo
Novelty • Novelty depends on diversity. • Previous talk considered probability of relevance in isolation (e.g., for the top-ranked document). • In this talk we will examine how user context impacts the probability of relevance.
Simplest context model • Ranked list • User scans result 1, 2, 3, 4, 5, … in order. • Novelty of result k considered in light of the first k-1 results.
Relevance Assuming constant probabilities, the gain contributed by the document at rank k becomes G(k) = Σi J(dk, i) · α · (1 − α)^r(i, k−1), where r(i, k−1) = Σj<k J(dj, i) counts how many of the first k−1 results already contain nugget i. Repeated information contributes geometrically less (see the sketch below).
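A sketch of this gain computation, the core of the α-nDCG measure of Clarke et al. (SIGIR '08); documents are reduced to their judged nugget sets, and the data is invented for illustration:

```python
# Novelty-biased gain: each nugget's contribution at rank k decays by a
# factor of (1 - alpha) for every earlier document that already contained
# it. Nugget labels and the ranking are invented for illustration.

from collections import Counter

ALPHA = 0.5  # probability that a judged nugget truly interests the user

def novelty_gains(ranking: list[set[str]]) -> list[float]:
    """Gain at rank k: sum over nuggets in d_k of alpha * (1 - alpha)^seen."""
    seen: Counter = Counter()  # how often each nugget has appeared so far
    gains = []
    for nuggets in ranking:
        gains.append(sum(ALPHA * (1 - ALPHA) ** seen[n] for n in nuggets))
        seen.update(nuggets)
    return gains

ranking = [{"os-history", "update-url"}, {"update-url"}, {"replacement-windows"}]
print([round(g, 3) for g in novelty_gains(ranking)])  # [1.0, 0.25, 0.5]
```

Normalizing the discounted cumulative sum of these gains by that of an ideal ordering yields α-nDCG.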
Research issues (talk #2) • Better user models • Prior browsing context, local context, etc. • Evaluating impact of result presentation methods • Better captions • Query suggestions • Instant answers (stock quotes, weather, product prices, definitions)
Conclusions (talk #2) • Modeling and representing diversity allows us to consider novelty. • User models should be simple enough to be tractable. • User models should be complex enough to be meaningful.
Questions? Talk #2: Evaluating Novelty. Charles Clarke, School of Computer Science, University of Waterloo