Robert's Drawers (and other variations on GRE shared tasks)
Gatt, Belz, Reiter, Viethen
Available resources
• TUNA Corpus (Gatt et al.; ca. 2500 refs)
  • one-shot references
  • balanced
  • 2500 refs to furniture or people
• Robert's Drawers (Viethen and Dale; ca. 140 refs)
  • one-shot references
  • not yet balanced
• GREC (“GRE in Context”) (Belz and Varges)
  • 2000 introductory passages from Wikipedia
  • 1000 annotated, rest in progress
  • annotated for reference to the main subject (“topic”)
  • different NP types: subjects, objects, possessives
• COCONUT (Jordan)
  • goes beyond just identification
• (possibly another corpus of newspaper texts)
Short-term additions to resources
• Add comprehension data:
  • Carry out experiments in which people identify referents, and pair the results with the corpus descriptions. Data include:
    • reaction time
    • error rate
    • self-paced reading for GREC-type corpora
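Pairing comprehension results with corpus descriptions could be as simple as keying every experimental trial to the description it tested and summarising per description. The sketch below assumes a hypothetical record format (`Trial`, with a `description_id` linking back to the corpus) and a hypothetical `summarise` helper; neither comes from an existing corpus release.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    description_id: str     # hypothetical ID linking the trial to a corpus description
    reaction_time_ms: float  # time taken to identify the referent
    correct: bool            # did the participant pick the intended referent?


def summarise(trials):
    """Group trials by description; return {id: (mean RT in ms, error rate)}."""
    by_desc = {}
    for t in trials:
        by_desc.setdefault(t.description_id, []).append(t)
    summary = {}
    for desc_id, ts in by_desc.items():
        rt = mean(t.reaction_time_ms for t in ts)
        err = sum(not t.correct for t in ts) / len(ts)
        summary[desc_id] = (rt, err)
    return summary


# Hypothetical trials for two corpus descriptions.
trials = [
    Trial("d1", 1200.0, True),
    Trial("d1", 1400.0, False),
    Trial("d2", 900.0, True),
]
print(summarise(trials))  # {'d1': (1300.0, 0.5), 'd2': (900.0, 0.0)}
```

Keeping one record per trial, rather than storing only aggregates, would leave room to add other measures (e.g. self-paced reading times) later.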
Long-term additions to resources
• Eye-tracking data
• Situated reference in virtual environments (Koller et al., this Workshop)
• In progress: small multimodal corpus (Bangerter, van der Sluis, Gatt)
Task definition
• Task structure:
  • provide a data source
  • have a small set of clearly defined tasks, but ALSO:
  • have an open category
• Evaluation:
  • default metric
  • call for proposals for evaluation metrics
  • correlate metrics with human judgments/performance
• Scope for variation:
  • Task: content determination, realisation, lexical choice
  • Type of reference: full definite, anaphoric, singular/plural
  • Goal: model production or enhance comprehension
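Correlating a candidate metric with human judgments or performance can, in the simplest case, mean computing Pearson's r between per-system metric scores and per-system human results. The sketch below assumes Pearson correlation (a rank correlation such as Spearman's would be an equally reasonable choice), and the scores are invented for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical data: one entry per participating system.
metric_scores = [0.62, 0.71, 0.55, 0.80]   # default automatic metric
human_accuracy = [0.70, 0.78, 0.66, 0.85]  # e.g. identification accuracy in a comprehension test
print(round(pearson(metric_scores, human_accuracy), 3))
```

With only a handful of participating systems, any such correlation would be based on very few points and should be read cautiously.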
(Sub-)communities
• GRE people (the usual suspects)
• CoNLL/EMNLP community
• Psycholinguists:
  • advice/expertise
  • computational psycholinguistic modelling
Aims
• “Community” aims:
  • Have fun!
  • Get people working together, consolidate the community
  • Broaden the community
• Broader aims:
  • Have a test-bed to see whether NLG shared-task evaluation campaigns (STECs) actually work
  • GRE is probably the best initial candidate
• Scientific aims:
  • Hothouse effect
  • Evaluation:
    • Use different methods
    • Evaluate the methods
Execution: Logistics
• Dry run to pilot the idea
  • Possibly at UCNLG (September)
  • Shared competitive task: Content Determination
    • singular definites, furniture
    • Production evaluation, using TUNA
  • Include a call for evaluation metrics
  • Also include open track
• Main event (larger scale & wider scope)
  • Co-located with INLG?
  • Several shared tasks + open category
  • Evaluation:
    • Production: match between algorithm & human
    • Comprehension: ease of identification, etc.
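For the content determination task, "match between algorithm & human" is left open on the slides. One option is the Dice coefficient between the attribute set chosen by an algorithm and the attribute set in the human description for the same referent. The sketch assumes descriptions are represented as sets of (attribute, value) pairs; `dice`, `mean_dice`, and the example attributes are hypothetical.

```python
def dice(system_attrs, human_attrs):
    """Dice coefficient 2|A & B| / (|A| + |B|) between two attribute sets."""
    a, b = set(system_attrs), set(human_attrs)
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def mean_dice(pairs):
    """Average Dice over (system, human) description pairs."""
    if not pairs:
        raise ValueError("no description pairs given")
    return sum(dice(s, h) for s, h in pairs) / len(pairs)


# Hypothetical singular definite descriptions of a piece of furniture.
system = {("type", "chair"), ("colour", "red")}
human = {("type", "chair"), ("colour", "red"), ("size", "large")}
print(dice(system, human))  # 0.8
```

Dice only rewards overlap in content. It says nothing about whether the description identifies the referent, which is why a separate comprehension evaluation would still be needed.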
Evaluation: £££
• Sources of expense:
  • Human evaluations
  • Adding comprehension data to the corpora
  • Organisational costs (web site, etc.)
• Who's paying?
  • Community effort
  • Aberdeen platform grant
  • Brighton Prodigy project funds
  • No special funding (yet)