Collaborative Annotation of the AMI Meeting Corpus

Jean Carletta, University of Edinburgh

AMI Partners

NXT Major Development Sites

AMI's aim
• aim: to develop technologies for browsing meetings and for assisting people during meetings
• interdisciplinary: signal processing, language engineering, theoretical linguistics, human-computer interfaces, organizational psychology, ...

Why annotation?
• For basic scientific understanding, e.g.:
  • How do people choose a next speaker?
  • What is the relationship between speech and gesture during deixis?
• For machine learning
  • Hand-code, e.g., statement vs. question
  • Identify features for each, such as word sequences and prosody
  • Use the data to fit a statistical classifier that codes new data automatically
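The machine-learning steps above can be sketched in a few lines. This is a minimal illustration only, not the AMI project's classifier: the training utterances are invented, the features are word-based (prosody is left out), and a small naive Bayes model stands in for whatever statistical classifier a project would actually fit.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-coded data: (utterance, label). Not taken from the AMI corpus.
TRAIN = [
    ("do we have a budget", "question"),
    ("what is the deadline", "question"),
    ("can you hear me", "question"),
    ("we have a budget", "statement"),
    ("the deadline is friday", "statement"),
    ("i can hear you", "statement"),
]

def features(utterance):
    """Word features: each word, plus a marker for the utterance-initial word."""
    words = utterance.lower().split()
    return words + ["FIRST=" + words[0]]

def train(data):
    """Count labels and per-label feature occurrences."""
    label_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for text, label in data:
        label_counts[label] += 1
        for f in features(text):
            feat_counts[label][f] += 1
            vocab.add(f)
    return label_counts, feat_counts, vocab

def classify(model, utterance):
    """Return the label with the highest add-one-smoothed naive Bayes score."""
    label_counts, feat_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(feat_counts[label].values()) + len(vocab)
        for f in features(utterance):
            score += math.log((feat_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify(model, "what is the budget"))
print(classify(model, "we have a deadline"))
```

The utterance-initial word is included because, in this toy data, questions tend to open with an auxiliary or wh-word; real feature sets would be chosen from the hand-coded corpus itself.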

AMI Meeting Rooms
4 close- and 2 wide-view cameras, 4 head-set and 8 array microphones, presentation screen capture, whiteboard capture, pen devices, plus extra site-dependent devices
Sites: TNO, Edinburgh, IDIAP

IS1004d, 3:07 - 4:11

Corpus Overview
• 100 hrs of well-recorded meetings
• orthographically transcribed with word timings by forced alignment
• ASR output
• heavily annotated by hand for communicative behaviours
• Creative Commons Share-Alike licensing, with demo DVD

Hand Annotations
• transcription with word-level timings from forced alignment (100%)
• timestamping against signal (10-30%)
  • head gestures; hand gestures for addressing and interactions with objects; location in room; gaze; emotion?
• discourse structure (70%)
  • dialogue acts (some w/ addressing), named entities, topic segments, linked extractive and abstractive summaries

Costs in person-hrs/hr

Core Problems
• How do we represent all of these kinds of annotation on the same base data, including both structural relationships and timing?
• How do we allow for multiple (human and machine) annotations of the same property, so that we can compare them?

NITE XML Toolkit
• Mature toolkit for handling annotations with temporal ordering and full structural relations
• Data storage format designed to support distributed corpus development
• Libraries for data handling, query, and writing graphical user interfaces
• End-user annotation tools for common tasks
• Command-line utilities for analysis and feature extraction
• Open source

NXT corpus design
• data model is a multi-rooted tree with arbitrary graph structure over the top
• each node has one set of children but can have multiple parents
• annotations often map naturally to a tree
• corpus design decides where the trees intersect
• NXT can represent arbitrary graphs, but the more the data has this character, the less useful the query language is
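A rough sketch of this data model, for illustration only. The `Node` class and its methods are hypothetical and are not NXT's API; the point is that two annotation trees can intersect at shared word nodes, each of which then has more than one parent.

```python
class Node:
    """Hypothetical annotation node: one ordered child list, any number of parents."""

    def __init__(self, kind, label=None):
        self.kind = kind
        self.label = label
        self.children = []
        self.parents = []

    def add_child(self, child):
        self.children.append(child)
        child.parents.append(self)

    def leaves(self):
        """Return the leaf nodes under this node, in order."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def dominates(ancestor, node):
    """True if `node` is reachable from `ancestor` by following child links."""
    return any(c is node or dominates(c, node) for c in ancestor.children)


# Shared base layer: the words of one (invented) utterance.
words = [Node("w", t) for t in ["put", "it", "on", "the", "table"]]

# Tree 1: a named-entity-like span over the last two words.
entity = Node("entity", "furniture")
for w in words[3:]:
    entity.add_child(w)

# Tree 2: a dialogue act over the whole utterance.
act = Node("dialogue-act", "instruct")
for w in words:
    act.add_child(w)

# The two trees intersect at the word layer: "table" has two parents.
print([p.kind for p in words[4].parents])
print([w.label for w in entity.leaves()])
```

Deciding that both layers point at words, rather than at each other, is the kind of corpus-design choice the slide refers to.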

Stand-off XML
extract from Bdb001.A.speech-quality.xml:
  <speechquality nite:id="Bdb001.emphasis.16" type="emphasis">
    <nite:child href="Bdb001.A.words.xml#id(Bdb001.w.1,342)..id(Bdb001.w.1,344)" />
  </speechquality>
extract from Bdb001.A.words.xml:
  <w nite:id="Bdb001.w.1,342" starttime="356.39" endtime="" c="W">time</w>
  <w nite:id="Bdb001.w.1,343" starttime="" endtime="" c="HYPH">-</w>
  <w nite:id="Bdb001.w.1,344" starttime="" endtime="356.59" c="W">line</w>
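To show how such a stand-off pointer might be followed, here is a minimal sketch that resolves the `href` range above to its words using Python's standard library. It does not use NXT itself. The wrapping `nite:root` elements, the namespace URI, and the `resolve` helper are assumptions added so that the two extracts parse as standalone documents.

```python
import re
import xml.etree.ElementTree as ET

NITE = "http://nite.sourceforge.net/"  # assumed namespace URI for the nite: prefix

# The two extracts, wrapped in an assumed root element so they are well-formed.
WORDS_XML = """<nite:root xmlns:nite="http://nite.sourceforge.net/">
<w nite:id="Bdb001.w.1,342" starttime="356.39" endtime="" c="W">time</w>
<w nite:id="Bdb001.w.1,343" starttime="" endtime="" c="HYPH">-</w>
<w nite:id="Bdb001.w.1,344" starttime="" endtime="356.59" c="W">line</w>
</nite:root>"""

QUALITY_XML = """<nite:root xmlns:nite="http://nite.sourceforge.net/">
<speechquality nite:id="Bdb001.emphasis.16" type="emphasis">
<nite:child href="Bdb001.A.words.xml#id(Bdb001.w.1,342)..id(Bdb001.w.1,344)" />
</speechquality>
</nite:root>"""

# file#id(START) optionally followed by ..id(END)
HREF_RE = re.compile(
    r"^(?P<file>[^#]+)#id\((?P<start>[^)]+)\)(?:\.\.id\((?P<end>[^)]+)\))?$"
)

def resolve(href, documents):
    """Return the elements from START to END (inclusive) in document order."""
    m = HREF_RE.match(href)
    if m is None:
        raise ValueError("unrecognised href: " + href)
    root = documents[m["file"]]
    start, end = m["start"], m["end"] or m["start"]
    selected, inside = [], False
    for el in root:
        el_id = el.get(f"{{{NITE}}}id")
        if el_id == start:
            inside = True
        if inside:
            selected.append(el)
        if inside and el_id == end:
            break
    return selected

documents = {"Bdb001.A.words.xml": ET.fromstring(WORDS_XML)}
quality = ET.fromstring(QUALITY_XML)
child = quality.find(f"speechquality/{{{NITE}}}child")
span = resolve(child.get("href"), documents)

text = "".join(w.text for w in span)
start_time = float(span[0].get("starttime"))
end_time = float(span[-1].get("endtime"))
print(text, start_time, end_time)
```

Because only the outer edges of "time-line" carry timings in the extract, the sketch takes the span's start time from its first word and its end time from its last.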

Metadata file
Like a set of DTDs for the XML files, plus:
• connections between the files
• list of "observations" (coded dialogues/group discussions/texts)
• catalog for finding signals and data on disk

Simple example query
  ($w word)($r reference): ($w@POS = "NN") && ($r ^ $w)
Returns a list of 2-tuples of words and referring expressions where the word's part of speech is NN and the word is in the referring expression.
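The meaning of this query can be mimicked in plain Python over toy data. This is only an illustration of the semantics described above, not how NXT evaluates queries; the records are invented, and the `^` test is simplified to direct containment of a word in a referring expression.

```python
from itertools import product

# Invented toy data: words with a POS attribute, and referring expressions
# that list the ids of the words they contain.
words = [
    {"id": "w1", "text": "the", "POS": "DT"},
    {"id": "w2", "text": "remote", "POS": "NN"},
    {"id": "w3", "text": "works", "POS": "VBZ"},
    {"id": "w4", "text": "battery", "POS": "NN"},
]
references = [
    {"id": "r1", "children": ["w1", "w2"]},
]

def dominates(reference, word):
    """Simplified stand-in for `$r ^ $w`: the word is a child of the reference."""
    return word["id"] in reference["children"]

def query(words, references):
    """Try every (word, reference) pair and keep those passing both tests."""
    return [
        (w, r)
        for w, r in product(words, references)
        if w["POS"] == "NN" and dominates(r, w)
    ]

matches = query(words, references)
for w, r in matches:
    print(w["text"], r["id"])
```

Here "battery" is a noun but lies outside any referring expression, so only the pair for "remote" is returned.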

General features of the language
• Match variable by no type, single type, or disjunctive type
• Attribute and content tests for existence, ordering, equality, match to regexp
• The usual boolean combinators
• Quantifiers forall and exists
• Filtering by passing results to another query to create a result tree (not list)

Uses for queries
• Exploring the data in a browser
• Basic frequency counts
• Verifying data quality
• Indexing complexes for further use
• Finding things for screen rendering in GUI

Only configuration needed to:
• search/index data in NXT format
• display data in a standardized (ugly) way
• set up annotation tools for some common tasks
  • dialogue act
  • named entity
  • time-stamped labelling

[named entity demo]

Programming tailored interfaces
• development time is 1.5 days to 2 weeks, depending on
  • how clear the spec is
  • complexity of the interface and whether our "transcription view" middleware fits
  • familiarity with Swing

Named entity coder

Summary
• NXT provides infrastructure for collaborative annotation that
  • is distributed
  • provides structural relationships
  • provides timing w.r.t. signals
  • works for large-scale projects
• NXT's best current demonstration is in the AMI Meeting Corpus
