two talks in one! Evaluating Novelty and Diversity. Charles Clarke, School of Computer Science, University of Waterloo
Goals for Evaluation Measures • meaningful • tractable • reusable
Evaluation Framework We examine a framework for evaluation. Specific measures covered by the framework include: • Clarke et al. (SIGIR '08) • Agrawal et al. (WSDM '09) • Clarke et al. (ICTIR '09)
Talk #1: Evaluating Diversity. Charles Clarke, School of Computer Science, University of Waterloo
Query: “windows” • Microsoft Windows • When will Windows 7 be released? • What’s the Windows update URL? • I want to download Windows Live Essentials • House windows • Where can I buy replacement windows? • What brands are available? • Aluminum or vinyl? • Windows Restaurant, Las Vegas
Nuggets • Nugget = any binary property of a document • Provides address of a Pella dealer. • Discusses history of the Windows OS. • Is the Windows update page. • (factual, topical and navigational) • Problem: potentially thousands per query.
Evaluation • Model user information needs using nuggets. Different users will be interested in different combinations of nuggets. • Express judgments in terms of nuggets. Judgments may be automatic or manual. Judgments are binary: Does this document contain this nugget? • Nuggets link users and documents (see the sketch below).
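A minimal sketch of this judging model; the documents, nugget labels, and judgments are invented for illustration:

```python
# Binary nugget judgments: J(d, i) = 1 iff document d is judged to
# contain nugget i. All identifiers here are invented examples.

judgments = {
    ("doc1", "pella-dealer-address"): 1,
    ("doc1", "windows-os-history"):   1,
    ("doc2", "windows-update-page"):  1,
}

def J(doc: str, nugget: str) -> int:
    """Does this document contain this nugget? (binary judgment)"""
    return judgments.get((doc, nugget), 0)

print(J("doc1", "pella-dealer-address"))  # 1
print(J("doc2", "windows-os-history"))    # 0
```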
Interdependencies Problem: Complex interdependencies between nuggets. Three possible simplifying assumptions: • User interested in nugget A will always be interested in nugget B. • User interested in nugget A will never be interested in nugget B. • Nuggets A and B are independent.
Possible Assumption #1 If a user interested in nugget A will always be interested in nugget B, then A and B can be treated as the same nugget.
Possible Assumption #2 A user interested in nugget A will never be interested in nugget B (and vice versa), so a user's interest in nugget A determines their lack of interest in nugget B. Nugget A and nugget B may then be viewed as representing different interpretations of the query.
Query: “windows” • Microsoft Windows • When will Windows 7 be released? • What’s the Windows update URL? • I want to download Windows Live Essentials • House windows • Where can I buy replacement windows? • What brands are available? • Aluminum or vinyl? • Windows Restaurant, Las Vegas
Query Interpretations • Assume M interpretations. • Compute any effectiveness measure Sj with respect to each interpretation j. • Compute the weighted average Σj pj Sj (where pj is the probability of interpretation j), as sketched below. • Agrawal et al. (WSDM '09)
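A concrete sketch of the intent-aware weighted average; the interpretations, probabilities, and per-interpretation scores are invented for illustration:

```python
# Intent-aware averaging (after Agrawal et al., WSDM '09): compute any
# effectiveness measure per interpretation, then weight by the probability
# of each interpretation. All numbers are invented for illustration.

# (label, p_j = probability of interpretation j, S_j = score under j)
interpretations = [
    ("Microsoft Windows",         0.70, 0.80),
    ("house windows",             0.25, 0.40),
    ("Windows Restaurant, Vegas", 0.05, 0.10),
]

score = sum(p * s for _, p, s in interpretations)
print(f"intent-aware score = {score:.3f}")  # 0.665
```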
Possible Assumption #3 A user’s interest in nugget A is independent of their interest in nugget B. The probability that the user is interested in nugget A is a constant (pA). The probability that the user is interested in nugget B is a constant (pB).
Query: “windows” • Microsoft Windows • When will Windows 7 be released? • What’s the Windows update URL? • I want to download Windows Live Essentials • House windows • Where can I buy replacement windows? • What brands are available? • Aluminum or vinyl? • Windows Restaurant, Las Vegas
Relevance framework A document is relevant if it contains any relevant information. With N nuggets: P(relevant | d) = 1 − ∏i (1 − P(user wants nugget i) · P(nugget i in d)).
Relevance • Assume a constant user probability: P(user wants nugget i) = γ for all i. • Assume a constant document probability: P(nugget i in d) = α · J(d, i). • J(d, i) = 1 iff document d is judged to contain nugget i. • Then P(relevant | d) = 1 − (1 − αγ)^n(d), where n(d) = Σi J(d, i): count the nuggets.
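A minimal sketch of this calculation, with illustrative constants for the user-interest probability (gamma) and judging confidence (alpha):

```python
# Probability of relevance under the independence assumption: with constant
# probabilities alpha and gamma, the product over nuggets collapses to a
# function of the nugget count alone. Constants are invented for illustration.

GAMMA = 0.5  # P(user is interested in any given nugget)
ALPHA = 0.5  # P(a positively judged nugget is actually present)

def prob_relevant(nugget_count: int) -> float:
    """P(relevant | d) = 1 - (1 - alpha*gamma)^n, n = nuggets judged in d."""
    return 1.0 - (1.0 - ALPHA * GAMMA) ** nugget_count

for n in range(5):
    print(n, round(prob_relevant(n), 3))  # 0.0, 0.25, 0.438, 0.578, 0.684
```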
Probability of Relevance Estimated probability of relevance replaces binary relevance in standard evaluation measures, including nDCG, MAP, and rank-biased precision. Assumptions #2 and #3 can then be combined. Other estimation methods are possible.
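For example, a hypothetical sketch of rank-biased precision computed over estimated probabilities of relevance instead of binary judgments (the per-rank probabilities are invented):

```python
# Rank-biased precision with the estimated probability of relevance
# substituted for a binary judgment at each rank.
# RBP = (1 - p) * sum_k rel_k * p^(k-1), with persistence parameter p.

def rbp(prob_rel: list[float], persistence: float = 0.8) -> float:
    """Compute RBP, treating each entry as a probability of relevance."""
    return (1.0 - persistence) * sum(
        r * persistence ** k for k, r in enumerate(prob_rel)
    )

# Estimated probabilities of relevance for a ranked list of five documents.
print(round(rbp([0.75, 0.5, 0.9, 0.0, 0.25]), 3))
```

The same substitution applies to nDCG (probability as gain) and MAP.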
Research Issues (talk #1) • Identifying nuggets automatically • Clustering • Co-clicks • Query refinement • Automatic judging • Patterns • Classification • How many nuggets are enough? • Estimating probability of relevance
Conclusions (talk #1) • Evaluating diversity requires us to model and represent the diversity. • Nuggets represent one possible solution. • Simple user model; simple assumptions; simple judging.
Questions? Talk #1: Evaluating Diversity. Charles Clarke, School of Computer Science, University of Waterloo
Intermission The TREC 2009 Web Track • traditional ad hoc task • novelty and diversity task • ClueWeb09 dataset (one billion pages) • explore effectiveness measures • http://plg.uwaterloo.ca/~trecweb
Intermission: Free sample topic
<topic number=0>
  <query> physical therapist </query>
  <description>
    The user requires information regarding the profession and the services it provides.
  </description>
  <subtopic number=1> What does a physical therapist do? </subtopic>
  <subtopic number=2> Where can I find a physical therapist? </subtopic>
  <subtopic number=3> How much does physical therapy cost per hour? </subtopic>
…
Talk #2: Evaluating Novelty. Charles Clarke, School of Computer Science, University of Waterloo
Novelty • Novelty depends on diversity. • Previous talk considered probability of relevance in isolation (e.g., for the top-ranked document). • In this talk we will examine how user context impacts the probability of relevance.
Simplest context model • Ranked list • User scans result 1, 2, 3, 4, 5, … in order. • Novelty of result k considered in light of the first k-1 results.
Relevance Assuming constant probabilities, the gain contributed by the document at rank k becomes G(k) = Σi J(dk, i) · α · (1 − α)^r(i, k−1), where r(i, k−1) = Σj<k J(dj, i) counts how many of the first k−1 results already contain nugget i. Repeated information contributes geometrically less (see the sketch below).
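A sketch of this gain computation, the core of the α-nDCG measure of Clarke et al. (SIGIR '08); documents are reduced to their judged nugget sets, and the data is invented for illustration:

```python
# Novelty-biased gain: each nugget's contribution at rank k decays by a
# factor of (1 - alpha) for every earlier document that already contained
# it. Nugget labels and the ranking are invented for illustration.

from collections import Counter

ALPHA = 0.5  # probability that a judged nugget truly interests the user

def novelty_gains(ranking: list[set[str]]) -> list[float]:
    """Gain at rank k: sum over nuggets in d_k of alpha * (1 - alpha)^seen."""
    seen: Counter = Counter()  # how often each nugget has appeared so far
    gains = []
    for nuggets in ranking:
        gains.append(sum(ALPHA * (1 - ALPHA) ** seen[n] for n in nuggets))
        seen.update(nuggets)
    return gains

ranking = [{"os-history", "update-url"}, {"update-url"}, {"replacement-windows"}]
print([round(g, 3) for g in novelty_gains(ranking)])  # [1.0, 0.25, 0.5]
```

Normalizing the discounted cumulative sum of these gains by that of an ideal ordering yields α-nDCG.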
Research issues (talk #2) • Better user models • Prior browsing context, local context, etc. • Evaluating impact of result presentation methods • Better captions • Query suggestions • Instant answers (stock quotes, weather, product prices, definitions)
Conclusions (talk #2) • Modeling and representing diversity allows us to consider novelty. • User models should be simple enough to be tractable. • User models should be complex enough to be meaningful.
Questions? Talk #2: Evaluating Novelty. Charles Clarke, School of Computer Science, University of Waterloo