Robert's Drawers (and other variations on GRE shared tasks) Gatt, Belz, Reiter, Viethen


Available resources
• TUNA Corpus (Gatt et al.; ca. 2500 refs)
  • one-shot references
  • balanced
  • 2500 refs to furniture or people
• Robert's drawers (Viethen and Dale; ca. 140 refs)
  • one-shot references
  • not yet balanced
• GREC (“GRE in Context”) (Belz and Varges)
  • 2000 introductory passages from Wikipedia
  • 1000 annotated, rest in progress
  • annotated for reference to the main subject (“topic”)
  • different NP types: subjects, objects, possessives
• COCONUT (Jordan)
  • goes beyond just identification
• (possibly another corpus of newspaper texts)

Short-term additions to resources
• Add comprehension data:
  • Carry out experiments to get people to identify referents and pair results with corpus descriptions. Data include:
    • reaction time
    • error rate
    • self-paced reading for GREC-type corpora

Long-term additions to resources
• Eye-tracking data
• Situated reference in virtual environments (Koller et al., this Workshop)
• In progress: small multimodal corpus (Bangerter, van der Sluis, Gatt)

Task definition
• Task structure:
  • provide a data source
  • have a small set of clearly defined tasks but ALSO:
  • have an open category
• Evaluation:
  • default metric
  • call for proposals for evaluation metrics
  • correlate metrics with human judgments/performance
• Scope for variation:
  • Task: content determination, realisation, lexical choice
  • Type of reference: full definite, anaphoric, singular/plural
  • Goal: model production or enhance comprehension
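One way to read "correlate metrics with human judgments/performance" is as a per-item correlation between an automatic score and a human measure such as identification time. The sketch below is an assumption, not something the slides specify: it uses Pearson's r, and the function name `pearson` and the toy values in the usage lines are purely illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    xs: automatic metric scores, one per referring expression (assumed).
    ys: a human measure for the same items, e.g. reaction time (assumed).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Illustrative toy values only, not data from any corpus.
metric_scores = [0.2, 0.5, 0.8, 1.0]
reaction_times_ms = [1900.0, 1500.0, 1200.0, 1100.0]
r = pearson(metric_scores, reaction_times_ms)
```

A strongly negative r here would suggest that descriptions scoring higher on the metric are identified faster; a rank correlation could be swapped in if the relationship is not expected to be linear.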

(Sub-)communities
• GRE people (the usual suspects)
• CoNLL/EMNLP community
• Psycholinguists:
  • advice/expertise
  • computational psycholinguistic modelling

Aims
• “Community” aims:
  • Have fun!
  • Get people working together, consolidate the community
  • Broaden the community
• Broader aims:
  • Have a test-bed to see if NLG STECs actually work
  • GRE is probably the best initial candidate
• Scientific aims:
  • Hothouse effect
  • Evaluation:
    • Use different methods
    • Evaluate the methods

Execution: Logistics
• Dry run to pilot the idea
  • Possibly at UCNLG (September)
  • Shared competitive task: Content Determination
    • singular definites, furniture
    • Production evaluation, using TUNA
  • Include a call for evaluation metrics
  • Also include open track
• Main event (larger scale & wider scope)
  • Co-located with INLG?
  • Several shared tasks + open category
  • Evaluation:
    • Production: match between algorithm & human
    • Comprehension: ease of identification, etc.
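The "match between algorithm & human" in a content determination task can be scored as the overlap between the attributes an algorithm selects and those a human used for the same referent. The sketch below assumes the Dice coefficient as that overlap measure and assumes descriptions are represented as sets of attribute names; the slides do not fix a default metric, and `dice` and `mean_dice` are hypothetical helper names.

```python
def dice(system_attrs, human_attrs):
    """Dice coefficient between two attribute sets (1.0 = identical)."""
    system_attrs, human_attrs = set(system_attrs), set(human_attrs)
    if not system_attrs and not human_attrs:
        return 1.0  # two empty descriptions are treated as a perfect match
    overlap = len(system_attrs & human_attrs)
    return 2 * overlap / (len(system_attrs) + len(human_attrs))


def mean_dice(pairs):
    """Average Dice score over (system, human) description pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no description pairs to score")
    return sum(dice(s, h) for s, h in pairs) / len(pairs)


# Illustrative example: the attribute names are made up for this sketch.
system = {"type", "colour"}
human = {"type", "colour", "size"}
score = dice(system, human)  # 2 * 2 / (2 + 3) = 0.8
```

Dice rewards shared attributes and penalises both omissions and additions symmetrically; it says nothing about how easily a hearer identifies the referent, which is what the comprehension evaluation is meant to cover.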

Evaluation: £££
• Sources of expense:
  • Human evaluations
  • Adding comprehension data to the corpora
  • Organisational costs (web site, etc.)
• Who's paying?
  • Community effort
  • Aberdeen platform grant
  • Brighton Prodigy project funds
  • No special funding (yet)
