Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments (SIGIR '09, July 2009)
Summary • Motivation • Overview • Related Work • Methodology • Pilot Study • Analysis and Findings • Conclusions
Motivation • With the advance of technology, digital items such as digital books, audio, and video have attracted growing interest and use. • These digital items pose new challenges for the construction of test collections, specifically for collecting the relevance assessments used to tune system performance. This is due to: • The length and cohesion of the digital item • The dispersion of topics within it • Proposal => Develop a method for the collective gathering of relevance assessments, using a social game model to stimulate participants' engagement.
Overview • Test collections consist of: • A corpus of documents • A set of search topics • Relevance assessments collected from human judges

Document (TREC):
<doc>
<docno> WSJ88046-0090 </docno>
<hl> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </hl>
<author> Janet Guyon </author>
<dateline> New York </dateline>
<text> American Telephone & Telegraph Co. introduced the first of a new... </text>
</doc>

Topic (TREC):
<top>
<num> Number: 168
<title> Topic: Financing AMTRAK
<desc> Description: A document will address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK).
<narr> Narrative: A relevant document must provide information on the government's responsibility to make AMTRAK an economically viable entity. It could also discuss...
</top>
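To make the format above concrete, here is a minimal sketch that extracts the tagged fields of a TREC-style document, assuming the simple, non-nested markup shown in the example; parse_trec_doc is a hypothetical helper, not part of any TREC tooling (real collections may need a proper SGML parser).

```python
import re

def parse_trec_doc(text):
    """Extract the tagged fields of one TREC-style <doc> element."""
    fields = {}
    for tag in ("docno", "hl", "author", "dateline", "text"):
        # Non-greedy match between <tag> and </tag>; DOTALL lets
        # the <text> body span multiple lines.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if match:
            fields[tag] = match.group(1).strip()
    return fields

doc = """<doc>
<docno> WSJ88046-0090 </docno>
<hl> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </hl>
<author> Janet Guyon </author>
<dateline> New York </dateline>
<text> American Telephone & Telegraph Co. introduced the first of a new... </text>
</doc>"""

print(parse_trec_doc(doc)["docno"])  # WSJ88046-0090
```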
Overview • Test Collection Construction (in TREC): • A set of documents and a set of topics are given to the TREC participants. • Each participant runs the topics against the documents using their retrieval system. • A ranked list of the top k documents per topic is returned to TREC. • TREC forms pools (the top k documents per topic) from the participants' submissions, which are judged by the relevance assessors. • Each submission is then evaluated using the resulting relevance judgments, and the evaluation results are returned to the participants.
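A minimal sketch of the pooling step described above: each participant's ranked run contributes its top k documents per topic, and the union forms the pool sent to assessors. Function and variable names are illustrative, not TREC's actual tooling.

```python
def build_pools(runs, k=100):
    """Form judging pools from participants' ranked runs.

    runs: mapping of participant id -> {topic id -> ranked list of doc ids}
    Returns {topic id -> set of doc ids to be judged}.
    """
    pools = {}
    for ranking_by_topic in runs.values():
        for topic, ranked_docs in ranking_by_topic.items():
            # Union of each run's top-k documents for this topic.
            pools.setdefault(topic, set()).update(ranked_docs[:k])
    return pools

runs = {
    "sysA": {"168": ["d3", "d1", "d7"]},
    "sysB": {"168": ["d1", "d9", "d2"]},
}
print(build_pools(runs, k=2))  # {'168': {'d3', 'd1', 'd9'}}
```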
Related work • Gathering relevance judgments: • Single judge: usually the topic author assesses the relevance of documents to the given topic. • Multiple judges: assessments are collected from multiple judges and are typically converted to a single score per document. • In Web search, judgments are collected from a representative sample of the user population; user logs are also often mined for indicators of user satisfaction with the retrieved documents.
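The multiple-judges case typically collapses several labels into one score per document; a simple majority vote over binary labels is one common choice, sketched below. This is illustrative only; the paper does not prescribe a specific aggregation rule.

```python
from collections import Counter

def majority_label(labels):
    """Collapse multiple binary relevance labels (0/1) into a single
    judgment by majority vote. Ties fall back to 'relevant' here,
    an arbitrary choice made for this sketch."""
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0

print(majority_label([1, 0, 1]))  # 1
print(majority_label([0, 0, 1]))  # 0
```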
Related work • In their approach, the authors extended the use of multiple assessors per topic by: • Facilitating the review and re-assessment of relevance judgments • Enabling communication between judges • Providing an enriched collection of relevance labels that incorporates different user profiles and user needs. This also enables the preservation and promotion of diverse opinions.
Methodology • The Collective Relevance Assessment (CRA) method involves three phases: • Preparation of data and setting of CRA objectives • Design of the game • Relevance assessment system
Pilot Study • Two rounds: the first lasted 2 weeks, the second lasted 4 weeks • Data: • INEX 2008 Book Track (50,000 digitized books, 17 million scanned pages, 70 TREC-style topics) • Participants: • 17 participants • Collected data (a record sketch follows this list): • Highlighted document regions • Binary relevance level per page • Notes and comments • Relevance degree assigned to a book
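One way to picture the collected data is as a per-assessor, per-book record holding the four data types listed above. The sketch below is an assumed shape with illustrative field names, not the study's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BookAssessment:
    """One assessor's record for a single book, mirroring the data
    types collected in the pilot (field names are illustrative)."""
    assessor_id: str
    book_id: str
    highlighted_regions: List[Tuple[int, str]] = field(default_factory=list)  # (page, excerpt)
    page_relevance: dict = field(default_factory=dict)  # page number -> 0/1
    notes: List[str] = field(default_factory=list)
    book_relevance_degree: int = 0  # graded relevance for the whole book

a = BookAssessment("judge17", "book042")
a.page_relevance[12] = 1
a.notes.append("Chapter 3 covers the topic directly.")
```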
Analysis and Findings • Properties of the methodology: • Feasibility: engagement level comparable to INEX 2003 • Completeness and exhaustiveness: 17.6% maximum completeness level • Semantic unit and cohesion: relevant information forms a minor theme of the book, and relevant content is dispersed • Browsing and relevance decisions: assessors require contextual information to make a decision • Influence of incentive structures • Exploring vs. reviewing • Assessment strategies • Quality of the collected data: • Assessor agreement: the level of agreement is higher compared with TREC and INEX (a simple agreement sketch follows this list) • Annotations
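To make the agreement finding concrete, the sketch below computes Cohen's kappa over two assessors' aligned binary page judgments. Kappa is a standard chance-corrected agreement measure; it is not necessarily the exact statistic reported in the paper.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two aligned lists of binary judgments."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each judge's marginal label rates.
    p_a1 = sum(a) / n
    p_b1 = sum(b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

judge1 = [1, 1, 0, 1, 0, 0]
judge2 = [1, 0, 0, 1, 0, 1]
print(round(cohens_kappa(judge1, judge2), 2))  # 0.33
```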
Conclusions • The CRA method successfully expanded traditional methods and introduced new concepts for gathering relevance assessments. • It encourages personalized and diverse perspectives on the topics. • It promotes the collection of rich contextual data that can assist with interpreting relevance assessments and their use for system optimization.