ATLAS Demystified: A Practical Introduction
Christophe Laprun, Jonathan Fiscus, John Garofolo, Sylvain Pajot
National Institute of Standards and Technology
May 31, 2002
Annotation Frameworks and Tools, LREC 2002
Overview
• ATLAS = Architecture and Tools for Linguistic Analysis Systems
• Goal: To make the ATLAS-describable Universe == the Linguistic Annotation Universe
• The ATLAS-describable Universe is the one generated by the ATLAS ontology: “An annotation is the fundamental act of associating some content to a region in a signal”
• We need to examine more annotation tasks
Brief History
• Started with Bird and Liberman’s Annotation Graphs (AGs)
• ATLAS working group formed to explore the AG concept
  • LDC, MITRE and NIST
  • Introduced at LREC 2000
• Since LREC 2000:
  • LDC pursued an Annotation Graph implementation
    • To satisfy immediate annotation needs for speech and text
    • Developed AGTK
    • Optimized for annotation of linear signals
  • NIST pursued a generalized ATLAS model
    • 2001 - Multidimensional signals
    • 2002 - Type support, explicit support for hierarchies
Motivation for Generalization
• A long-term solution was needed
• Linguistic research is rapidly moving beyond linear signals
  • Multi-modal complex signals with varying dimensionality
  • NIST Meeting Room data includes speech, video, and whiteboard interaction
  • Automatic Content Extraction (ACE) program includes extraction from speech, text and image data
  • Gesture annotation ideally involves 3-dimensional space over time
  • …
Additional Needs Addressed During Generalization
• Type definition support
  • Define the content, structure and relationships between annotations
  • Dual use: provides corpus design definition to framework and users
• Hierarchical dependencies abound
  • Sentences are composed of words, which are composed of phones; likewise co-reference, parse trees, etc.
  • AGs do not explicitly express dependencies
• Ubiquitous annotation validation
  • Happens at every stage of data manipulation: creation, modification and filtering
  • Syntax checking is only the first step
What We Have Accomplished
• The core ATLAS annotation ontology
• Type definition infrastructure
• Developer framework
The Core ATLAS Ontology (Simple Speech Use Case)
Task: Annotate sentences which are composed of words
[Diagram: a Sentence annotation has Word annotations as Children (“She”, “had”). Each Annotation has Content and a Region; the Interval Region is bounded by two Anchors, and each Anchor holds an Offset into the Signal (audio).]
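The speech use case can be sketched as a small object model. This is only an illustration in Python, not the jATLAS API (which is Java): the class names follow the diagram labels, while the field names, the `span()` and `text()` helpers, and the time offsets are assumptions made up for the example. The idea that a sentence derives its region and content from its word children is also an assumption about how the parent-child link is meant to be used.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Signal:
    """A piece of source data, e.g. an audio recording."""
    name: str


@dataclass(frozen=True)
class Anchor:
    """A point in a signal; here the offset is a time in seconds."""
    signal: Signal
    offset: float


@dataclass(frozen=True)
class IntervalRegion:
    """A region bounded by a start anchor and an end anchor."""
    start: Anchor
    end: Anchor


@dataclass
class Annotation:
    """Associates content with a region; may own child annotations."""
    type_name: str
    region: Optional[IntervalRegion] = None
    content: Optional[str] = None
    children: List["Annotation"] = field(default_factory=list)

    def span(self) -> IntervalRegion:
        """Own region if set, otherwise the extent covered by the children."""
        if self.region is not None:
            return self.region
        spans = [child.span() for child in self.children]
        return IntervalRegion(
            min((s.start for s in spans), key=lambda a: a.offset),
            max((s.end for s in spans), key=lambda a: a.offset),
        )

    def text(self) -> str:
        """Own content if set, otherwise the children's content joined."""
        if self.content is not None:
            return self.content
        return " ".join(child.text() for child in self.children)


audio = Signal("audio")


def word(text: str, start: float, end: float) -> Annotation:
    """Build a word annotation over [start, end] in the audio signal."""
    region = IntervalRegion(Anchor(audio, start), Anchor(audio, end))
    return Annotation("word", region=region, content=text)


# Offsets are invented for illustration only.
sentence = Annotation(
    "sentence",
    children=[word("She", 0.00, 0.21), word("had", 0.21, 0.45)],
)
```

In this sketch the sentence stores neither a region nor content of its own; both are computed from the words, which keeps parent and children consistent when a word boundary is edited.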
The Core ATLAS Ontology (Simple Gesture Use Case)
[Diagram: a Forearm Annotation has a Gesture Region composed of an Interval (two Frame Anchors) and a 3DSegment (two XYZ Anchors).]
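The gesture use case shows a region that combines a time extent with a spatial extent. A minimal Python sketch of that composition follows; it is not jATLAS code, and the field names, helper methods, frame numbers and coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameAnchor:
    """A point in time, expressed as a video frame index."""
    frame: int


@dataclass(frozen=True)
class XYZAnchor:
    """A point in 3-dimensional space."""
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Interval:
    """Temporal extent between two frame anchors."""
    start: FrameAnchor
    end: FrameAnchor

    def frame_count(self) -> int:
        return self.end.frame - self.start.frame


@dataclass(frozen=True)
class Segment3D:
    """Spatial extent between two XYZ anchors."""
    start: XYZAnchor
    end: XYZAnchor

    def length(self) -> float:
        a, b = self.start, self.end
        return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


@dataclass(frozen=True)
class GestureRegion:
    """A region made of a temporal part and a spatial part."""
    when: Interval
    where: Segment3D


@dataclass(frozen=True)
class Annotation:
    type_name: str
    region: GestureRegion


# Frame numbers and coordinates are invented for illustration only.
forearm = Annotation(
    "forearm",
    GestureRegion(
        when=Interval(FrameAnchor(120), FrameAnchor(150)),
        where=Segment3D(XYZAnchor(0.0, 0.0, 0.0), XYZAnchor(3.0, 4.0, 0.0)),
    ),
)
```

The annotation class is the same shape as in a speech task; only the region and anchor types change, which is the point of generalizing beyond linear signals.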
Type Definition Infrastructure
• Meta Annotation Infrastructure for ATLAS (MAIA)
• Provides mechanism for the definition and use of annotations at the semantic level
  • Specifies content, structure and relationships between annotations
  • Sufficiently expressive for validation
• Users declare their types via XML
  • No coding required
• Framework generates and uses type constructs from the definition dynamically
  • Validation occurs automatically
Type Definition Excerpt
<AnnotationType name='sentence'>
  <AllowedChildren containedType='word'>
    <DefinesRegionAs/>
    <DefinesContentAs/>
  </AllowedChildren>
</AnnotationType>
<AnnotationType name='word'>
  <RegionType ref='interval'/>
  <ContentType ref='wordContent'/>
</AnnotationType>
<ContentType name='wordContent'>
  <ParameterType ref='string' role='text'/>
</ContentType>
[Diagram: Sentence Annot. → Children → Word Annot. → WordContent, Interval Region]
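To illustrate how a declaration like the excerpt above can drive validation without user code, here is a rough Python sketch that loads the type definitions and checks an annotation tree against them. It is not MAIA: the `<Types>` wrapper element, the dictionary layout used for annotations, and the `load_types` and `validate` functions are assumptions for the example, and the semantics of `DefinesRegionAs` and `DefinesContentAs` are not modelled.

```python
import xml.etree.ElementTree as ET

# The <Types> root is a hypothetical wrapper so the excerpt parses as one document.
TYPE_DEFS = """
<Types>
  <AnnotationType name='sentence'>
    <AllowedChildren containedType='word'>
      <DefinesRegionAs/>
      <DefinesContentAs/>
    </AllowedChildren>
  </AnnotationType>
  <AnnotationType name='word'>
    <RegionType ref='interval'/>
    <ContentType ref='wordContent'/>
  </AnnotationType>
  <ContentType name='wordContent'>
    <ParameterType ref='string' role='text'/>
  </ContentType>
</Types>
"""


def load_types(xml_text):
    """Turn the XML declarations into plain dictionaries."""
    root = ET.fromstring(xml_text)
    annotation_types = {}
    for node in root.findall("AnnotationType"):
        region = node.find("RegionType")
        content = node.find("ContentType")
        annotation_types[node.get("name")] = {
            "children": {c.get("containedType")
                         for c in node.findall("AllowedChildren")},
            "region": region.get("ref") if region is not None else None,
            "content": content.get("ref") if content is not None else None,
        }
    # Top-level ContentType elements define parameters as role -> type.
    content_types = {
        node.get("name"): {p.get("role"): p.get("ref")
                           for p in node.findall("ParameterType")}
        for node in root.findall("ContentType")
    }
    return annotation_types, content_types


def validate(annotation, annotation_types, content_types):
    """Return a list of error strings; an empty list means valid."""
    spec = annotation_types.get(annotation["type"])
    if spec is None:
        return [f"unknown annotation type: {annotation['type']}"]
    errors = []
    for child in annotation.get("children", []):
        if child["type"] not in spec["children"]:
            errors.append(f"{annotation['type']} may not contain {child['type']}")
        errors.extend(validate(child, annotation_types, content_types))
    if spec["region"] and annotation.get("region_type") != spec["region"]:
        errors.append(f"{annotation['type']} needs a {spec['region']} region")
    if spec["content"]:
        content = annotation.get("content", {})
        for role, ref in content_types.get(spec["content"], {}).items():
            if ref == "string" and not isinstance(content.get(role), str):
                errors.append(f"{annotation['type']} content needs a string '{role}'")
    return errors


annotation_types, content_types = load_types(TYPE_DEFS)

good = {"type": "sentence", "children": [
    {"type": "word", "region_type": "interval", "content": {"text": "She"}},
    {"type": "word", "region_type": "interval", "content": {"text": "had"}},
]}
bad = {"type": "sentence", "children": [{"type": "phone"}]}

good_errors = validate(good, annotation_types, content_types)
bad_errors = validate(bad, annotation_types, content_types)
```

Because the checks are driven entirely by the parsed declarations, adding a new annotation type in this sketch means editing the XML only, which mirrors the "no coding required" goal.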
Developer Framework
jATLAS: a Java implementation
• Core suite of objects:
  • Implements ATLAS’ generic annotation ontology
  • Defines an Application Programming Interface (API)
• Low-level services:
  • Data import/export, management utilities
• Defines a Service Provider Interface (SPI) to allow advanced framework extensions
  • Additional persistence forms
• Automatic validation services via MAIA
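The API/SPI split can be pictured as follows: applications call the framework, and extension authors plug in providers for additional persistence forms. The sketch below is a generic Python illustration of that pattern, not the jATLAS SPI; `PersistenceProvider`, `Framework`, and the JSON provider are hypothetical names, and JSON merely stands in for a real persistence form.

```python
import json
from abc import ABC, abstractmethod


class PersistenceProvider(ABC):
    """Hypothetical SPI: what an extension must implement."""

    format_name: str

    @abstractmethod
    def dumps(self, annotations: list) -> str:
        """Serialize annotations to text."""

    @abstractmethod
    def loads(self, text: str) -> list:
        """Parse text back into annotations."""


class Framework:
    """Hypothetical API: what applications call."""

    def __init__(self):
        self._providers = {}

    def register(self, provider: PersistenceProvider) -> None:
        self._providers[provider.format_name] = provider

    def export(self, annotations: list, format_name: str) -> str:
        return self._providers[format_name].dumps(annotations)

    def load(self, text: str, format_name: str) -> list:
        return self._providers[format_name].loads(text)


class JsonProvider(PersistenceProvider):
    """Stand-in persistence form used only for this example."""

    format_name = "json"

    def dumps(self, annotations: list) -> str:
        return json.dumps(annotations)

    def loads(self, text: str) -> list:
        return json.loads(text)


framework = Framework()
framework.register(JsonProvider())

annotations = [{"type": "word", "content": {"text": "She"}}]
stored = framework.export(annotations, "json")
restored = framework.load(stored, "json")
```

Application code depends only on `Framework`, so a new persistence form can be added by registering another provider without changing any caller.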
ATLAS Status
• Stable ontology
• Basic typing services via MAIA
• Developer framework: jATLAS in Beta version
  • Has dramatically reduced development times for NIST prototype applications
• Persistence format: ATLAS Interchange Format (AIF)
  • ACE format import
  • AG format import partially supported
• Active development
• Public domain source code, freely available
ATLAS Future Work
• MAIA extensions
  • Type inheritance
  • Increased structural validation
  • Content-based validation
• Framework extensions
  • Currently developing a GUI component framework
• Tool development
  • Annotation and evaluation tools at NIST
  • Collaboration with other sites
  • Contributed tools repository
More information?
• http://www.nist.gov/speech/atlas
• We welcome feedback, comments and suggestions