Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments (SIGIR '09, July 2009)
Summary • Motivation • Overview • Related Work • Methodology • Pilot Study • Analysis and Findings • Conclusions
Motivation • With the advance of technology, digital items such as digital books, audio, and video have attracted growing interest and use. • These digital items pose new challenges for the construction of test collections, specifically for collecting the relevance assessments used to tune system performance. This is due to: • The length and cohesion of the digital item • The dispersion of topics within it • Proposal => Develop a method for the collective gathering of relevance assessments, using a social game model to stimulate participants' engagement.
Overview • Test collections consist of: • A corpus of documents • A set of search topics • Relevance assessments collected from human judges

Document (TREC):
<doc>
<docno> WSJ88046-0090 </docno>
<hl> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </hl>
<author> Janet Guyon </author>
<dateline> New York </dateline>
<text> American Telephone & Telegraph Co. introduced the first of a new... </text>
</doc>

Topic (TREC):
<top>
<num> Number: 168
<title> Topic: Financing AMTRAK
<desc> Description: A document will address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK).
<narr> Narrative: A relevant document must provide information on the government's responsibility to make AMTRAK an economically viable entity. It could also discuss...
</top>
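To make the format above concrete, here is a minimal sketch that extracts the tagged fields of a TREC-style document, assuming the simple, non-nested markup shown in the example; parse_trec_doc is a hypothetical helper, not part of any TREC tooling (real collections may need a proper SGML parser).

```python
import re

def parse_trec_doc(text):
    """Extract the tagged fields of one TREC-style <doc> element."""
    fields = {}
    for tag in ("docno", "hl", "author", "dateline", "text"):
        # Non-greedy match between <tag> and </tag>; DOTALL lets
        # the <text> body span multiple lines.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if match:
            fields[tag] = match.group(1).strip()
    return fields

doc = """<doc>
<docno> WSJ88046-0090 </docno>
<hl> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </hl>
<author> Janet Guyon </author>
<dateline> New York </dateline>
<text> American Telephone & Telegraph Co. introduced the first of a new... </text>
</doc>"""

print(parse_trec_doc(doc)["docno"])  # WSJ88046-0090
```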
Overview • Test Collection Construction (in TREC): • A set of documents and a set of topics are given to the TREC participants. • Each participant runs the topics against the documents using their retrieval system. • A ranked list of the top k documents per topic is returned to TREC. • TREC forms pools (the top k documents per topic) from the participants' submissions, which are judged by the relevance assessors. • Each submission is then evaluated using the resulting relevance judgments, and the evaluation results are returned to the participants.
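A minimal sketch of the pooling step described above: each participant's ranked run contributes its top k documents per topic, and the union forms the pool sent to assessors. Function and variable names are illustrative, not TREC's actual tooling.

```python
def build_pools(runs, k=100):
    """Form judging pools from participants' ranked runs.

    runs: mapping of participant id -> {topic id -> ranked list of doc ids}
    Returns {topic id -> set of doc ids to be judged}.
    """
    pools = {}
    for ranking_by_topic in runs.values():
        for topic, ranked_docs in ranking_by_topic.items():
            # Union of each run's top-k documents for this topic.
            pools.setdefault(topic, set()).update(ranked_docs[:k])
    return pools

runs = {
    "sysA": {"168": ["d3", "d1", "d7"]},
    "sysB": {"168": ["d1", "d9", "d2"]},
}
print(build_pools(runs, k=2))  # {'168': {'d3', 'd1', 'd9'}}
```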
Related work • Gathering relevance judgments: • Single judge: usually the topic author assesses the relevance of documents to the given topic. • Multiple judges: assessments are collected from multiple judges and are typically converted to a single score per document. • In Web search, judgments are collected from a representative sample of the user population; user logs are also often mined for indicators of user satisfaction with the retrieved documents.
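The multiple-judges case typically collapses several labels into one score per document; a simple majority vote over binary labels is one common choice, sketched below. This is illustrative only; the paper does not prescribe a specific aggregation rule.

```python
from collections import Counter

def majority_label(labels):
    """Collapse multiple binary relevance labels (0/1) into a single
    judgment by majority vote. Ties fall back to 'relevant' here,
    an arbitrary choice made for this sketch."""
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0

print(majority_label([1, 0, 1]))  # 1
print(majority_label([0, 0, 1]))  # 0
```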
Related work • In their approach, the authors extended the use of multiple assessors per topic by: • Facilitating the review and re-assessment of relevance judgments • Enabling communication between judges • Providing an enriched collection of relevance labels that incorporates different user profiles and user needs. This also enables the preservation and promotion of diverse opinions.
Methodology • The Collective Relevance Assessment (CRA) method involves three phases: • Preparation of data and setting of CRA objectives • Design of the game • Relevance assessment system
Pilot Study • Two rounds: the first lasted 2 weeks, the second lasted 4 weeks • Data: • INEX 2008 Book Track (50,000 digitized books, 17 million scanned pages, 70 TREC-style topics) • Participants: • 17 participants • Collected data (a record sketch follows this list): • Highlighted document regions • Binary relevance level per page • Notes and comments • Relevance degree assigned to a book
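One way to picture the collected data is as a per-assessor, per-book record holding the four data types listed above. The sketch below is an assumed shape with illustrative field names, not the study's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BookAssessment:
    """One assessor's record for a single book, mirroring the data
    types collected in the pilot (field names are illustrative)."""
    assessor_id: str
    book_id: str
    highlighted_regions: List[Tuple[int, str]] = field(default_factory=list)  # (page, excerpt)
    page_relevance: dict = field(default_factory=dict)  # page number -> 0/1
    notes: List[str] = field(default_factory=list)
    book_relevance_degree: int = 0  # graded relevance for the whole book

a = BookAssessment("judge17", "book042")
a.page_relevance[12] = 1
a.notes.append("Chapter 3 covers the topic directly.")
```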
Analysis and Findings • Properties of the methodology: • Feasibility: engagement level comparable to INEX 2003 • Completeness and exhaustiveness: 17.6% maximum completeness level • Semantic unit and cohesion: relevant information forms a minor theme of the book, and relevant content is dispersed • Browsing and relevance decisions: assessors require contextual information to make a decision • Influence of incentive structures • Exploring vs. reviewing • Assessment strategies • Quality of the collected data: • Assessor agreement: the level of agreement is higher compared with TREC and INEX (a simple agreement sketch follows this list) • Annotations
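To make the agreement finding concrete, the sketch below computes Cohen's kappa over two assessors' aligned binary page judgments. Kappa is a standard chance-corrected agreement measure; it is not necessarily the exact statistic reported in the paper.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two aligned lists of binary judgments."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each judge's marginal label rates.
    p_a1 = sum(a) / n
    p_b1 = sum(b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

judge1 = [1, 1, 0, 1, 0, 0]
judge2 = [1, 0, 0, 1, 0, 1]
print(round(cohens_kappa(judge1, judge2), 2))  # 0.33
```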
Conclusions • The CRA method successfully expanded traditional methods and introduced new concepts for gathering relevance assessments. • It encourages personalized and diverse perspectives on the topics. • It promotes the collection of rich contextual data that can assist with interpreting relevance assessments and their use for system optimization.