
Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments



Presentation Transcript


  1. Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments SIGIR'09, July 2009

  2. Summary • Motivation • Overview • Related Work • Methodology • Pilot Study • Analysis and Findings • Conclusions

  3. Motivation • With the advance of digital technology, interest in and use of digital items such as digitized books, audio, and video have grown. • These digital items present new challenges for the construction of test collections, specifically for collecting the relevance assessments used to tune system performance. This is due to: • The length and cohesion of the digital item • The dispersion of topics within it • Proposal => Develop a method for the collective gathering of relevance assessments, using a social game model to encourage participants' engagement.

  4. Overview • Test collections consist of: • A corpus of documents • A set of search topics • And relevance assessments collected from human judges

  Document (TREC):
  <doc>
  <docno> WSJ88046-0090 </docno>
  <hl> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </hl>
  <author> Janet Guyon </author>
  <dateline> New York </dateline>
  <text> American Telephone & Telegraph Co. introduced the first of a new.... </text>
  </doc>

  Topic (TREC):
  <top>
  <num> Number: 168
  <title> Topic: Financing AMTRAK
  <desc> Description: A document will address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK).
  <narr> Narrative: A relevant document must provide information on the government's responsibility to make AMTRAK an economically viable entity. It could also discuss..
  </top>
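The TREC-style topic format shown above can be read with a small parser. The sketch below is illustrative only: `parse_trec_topic` is a hypothetical helper (not part of any TREC tooling), and the regex assumes each field opens with a tag like `<title>` and runs until the next tag, as in the example.

```python
import re

def parse_trec_topic(raw: str) -> dict:
    """Extract the fields of a TREC-style <top> block (illustrative sketch)."""
    fields = {}
    for tag, label in [("num", "number"), ("title", "title"),
                       ("desc", "description"), ("narr", "narrative")]:
        # Capture everything after <tag> up to the next opening tag or </top>.
        m = re.search(rf"<{tag}>(.*?)(?=<\w+>|</top>)", raw, re.DOTALL)
        if m:
            fields[label] = m.group(1).strip()
    return fields
```

Applied to topic 168 above, this yields a dictionary with the number, title, description, and narrative as separate strings.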

  5. Overview • Test Collection Construction (in TREC): • A set of documents and a set of topics are given to the TREC participants. • Each participant runs the topics against the documents using their retrieval system. • A ranked list of the top k documents per topic is returned to TREC. • TREC forms pools (selects the top k documents) from the participants' submissions, which are judged by the relevance assessors. • Each submission is then evaluated using the resulting relevance judgments, and the evaluation results are returned to the participants.
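The pooling step above can be sketched as a simple union of top-k cutoffs. This is a minimal illustration for a single topic (`form_pool` is a hypothetical name; actual TREC pooling involves per-topic, per-run bookkeeping beyond this sketch):

```python
from itertools import chain

def form_pool(runs: dict, k: int) -> set:
    """Depth-k pooling for one topic: the union of the top-k documents
    from each participant's ranked run; only pooled documents get judged."""
    return set(chain.from_iterable(run[:k] for run in runs.values()))

# Two hypothetical system runs for one topic, ranked best-first.
runs = {"sysA": ["d1", "d2", "d3"], "sysB": ["d2", "d4", "d5"]}
pool = form_pool(runs, k=2)  # {"d1", "d2", "d4"}
```

Documents outside the pool are conventionally treated as non-relevant during evaluation.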

  6. Related work • Gathering relevance judgments: • Single judge – usually the topic author assesses the relevance of documents to the given topic. • Multiple judges – assessments are collected from multiple judges and are typically converted to a single score per document. • In Web search, judgments are collected from a representative sample of the user population. User logs are also often mined for indicators of user satisfaction with the retrieved documents.
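Converting multiple judges' assessments into a single score per document, as described above, is often done by majority vote. A minimal sketch, assuming binary labels and a tie-breaking rule toward relevant (both assumptions, not from the paper):

```python
from collections import Counter

def aggregate_judgments(labels: list) -> int:
    """Collapse several judges' binary labels (0 = non-relevant,
    1 = relevant) for one document into a single score by majority
    vote; ties default to relevant (an illustrative choice)."""
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0
```

For example, three judges voting [1, 0, 1] yield a final label of 1.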

  7. Related work • In their approach, the authors extend the use of multiple assessors per topic by: • Facilitating the review and re-assessment of relevance judgments • Enabling communication between judges • Providing an enriched collection of relevance labels that incorporates different user profiles and user needs. This also enables the preservation and promotion of diversity of opinions.

  8. Related Work

  9. Methodology • The Collective Relevance Assessment (CRA) method involves three phases: • Preparation of data and setting CRA objectives

  10. Methodology • Design of the game

  11. Methodology • Relevance Assessment System

  12. Pilot Study • Two rounds: the first lasted 2 weeks, the second lasted 4 weeks • Data: • INEX 2008 Track (50,000 digitized books, 17 million scanned pages, 70 TREC-style topics) • Participants: • 17 participants • Collected data: • Highlighted document regions • Binary relevance level per page • Notes and comments • Relevance degree assigned to a book

  13. Analysis and Findings • Properties of the methodology: • Feasibility – engagement level comparable to that of INEX 2003 • Completeness and Exhaustiveness – 17.6% maximum completeness level • Semantic Unit and Cohesion – relevant information forms a minor theme of the book; relevant content is dispersed • Browsing and Relevance Decision – assessors require contextual information to make a decision • Influence of incentive structures • Exploring vs. Reviewing • Assessment Strategies • Quality of the collected data: • Assessor agreement – the level of agreement is higher than in TREC and INEX • Annotations
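Assessor agreement of the kind reported above is commonly measured with Cohen's kappa, which corrects raw agreement for chance. The slide does not name its agreement measure, so this is an illustrative sketch for two assessors with binary judgments on the same documents:

```python
def cohens_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two assessors' binary judgments (0/1) over the
    same documents: observed agreement corrected for chance agreement."""
    assert len(a) == len(b) and a
    n = len(a)
    # Observed agreement: fraction of documents labeled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each assessor's marginal label rates.
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which makes scores comparable across collections with different relevance rates.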

  14. Conclusions • The CRA method successfully extended traditional methods and introduced new concepts for gathering relevance assessments. • It encourages personalized and diverse perspectives on the topics. • It promotes the collection of rich contextual data that can assist with interpreting relevance assessments and their use for system optimization.
