Explore the current state of the art in meeting metadata and annotations, their importance, examples, existing standards, limitations, and future needs for meeting archives.
Standards for representing meeting metadata and annotations in meeting databases
Hervé Bourlard
What is the current state of the art in this area?
a) What are metadata and annotations? What are some examples for meetings?
• Both are supplementary information added to an initial collection of data, such as multimodal meeting recordings, to enrich its informational content
• Annotations are typically time-related "semantic" information added to time-dependent data
• Metadata is typically static information about a unit of recording, such as a meeting
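The distinction above can be sketched as a small data model. This is only an illustration: the class names, field names, layer names, and labels are hypothetical and do not come from any of the standards discussed here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MeetingMetadata:
    """Static information about one unit of recording (a whole meeting)."""
    meeting_id: str
    date: str            # ISO 8601 date string (assumed convention)
    participants: tuple  # speaker identifiers


@dataclass(frozen=True)
class Annotation:
    """Time-related semantic information attached to a span of the recording."""
    start: float         # seconds from the start of the recording
    end: float
    layer: str           # hypothetical layer name, e.g. "dialogue_act"
    label: str
    speaker: str = ""


@dataclass
class MeetingRecord:
    metadata: MeetingMetadata
    annotations: list = field(default_factory=list)

    def at(self, t):
        """Return the annotations whose span covers time t (start inclusive)."""
        return [a for a in self.annotations if a.start <= t < a.end]


record = MeetingRecord(
    MeetingMetadata("demo-001", "2006-05-03", ("A", "B")),
    [
        Annotation(0.0, 2.5, "dialogue_act", "statement", "A"),
        Annotation(2.5, 4.0, "dialogue_act", "question", "B"),
        Annotation(0.0, 4.0, "topic", "opening"),
    ],
)

# Labels active three seconds into the recording, across all layers.
labels_at_3s = [a.label for a in record.at(3.0)]
```

The metadata object describes the meeting as a whole and never changes with time, while each annotation carries its own start and end, which is what makes time-based queries such as `at()` possible.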
What is the current state of the art in this area?
b) What are some examples of metadata/annotations for non-meeting resources?
• MS Word can recognize ("annotate" or "tag") various information types using Smart Tags
• Google adds (using computation) some simple metadata to each page: title, language, dates
• Open Document Format (OASIS Consortium) vs Microsoft's Office Open XML
• Open Archives Initiative: Dublin Core Metadata Set for library resources
What is the current state of the art in this area?
c) Why are metadata/annotations important for meetings? What operations do they enable?
• Enhance the interactive experience of the participants
• Enhance meeting search capabilities (e.g. by keywords or named entities)
• Monitor the evolution of a meeting: coaching, training or moderating
• Standards for representing metadata and annotations ensure interoperability between resources used for training and testing, and between systems
• Plug-and-play resources can be shared, establishing common ground for benchmarking
What is the current state of the art in this area?
d) What should a standard for databases of meeting data/metadata/annotations contain? Why is it important to normalize these elements?
• Distinguish the content of metadata and annotation elements from their physical form
• Content: the authorized vocabulary and its rules of application to the data
• Form: the concrete representation, often based on XML tags
• Normalisation is crucial for form: annotated resources or metadata can be used without having to write lengthy conversion scripts
• Normalisation is crucial for content: it ensures that the content of metadata or annotations remains compatible from one system to another
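A minimal sketch of the content/form split, assuming a made-up dialogue-act vocabulary and made-up XML element and attribute names (`annotations`, `segment`, `start`, `end`, `speaker`, `label`). None of these are taken from an existing standard. The vocabulary check stands in for the "content" side, the XML serialization for the "form" side.

```python
import xml.etree.ElementTree as ET

# Hypothetical authorized vocabulary: the "content" side of a standard.
DIALOGUE_ACTS = {"statement", "question", "backchannel"}


def to_xml(segments):
    """Serialize (start, end, speaker, label) tuples; reject unknown labels."""
    root = ET.Element("annotations", {"layer": "dialogue_act"})
    for start, end, speaker, label in segments:
        if label not in DIALOGUE_ACTS:
            raise ValueError(f"label not in vocabulary: {label!r}")
        ET.SubElement(root, "segment", {
            "start": f"{start:.2f}",
            "end": f"{end:.2f}",
            "speaker": speaker,
            "label": label,
        })
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML form back into (start, end, speaker, label) tuples."""
    root = ET.fromstring(text)
    return [
        (float(s.get("start")), float(s.get("end")),
         s.get("speaker"), s.get("label"))
        for s in root.iter("segment")
    ]


segments = [(0.0, 2.5, "A", "statement"), (2.5, 4.0, "B", "question")]
xml_text = to_xml(segments)
round_trip = from_xml(xml_text)
```

If two systems agree on both the vocabulary and the tag layout, data written by one can be read by the other with no conversion script, which is the interoperability argument made above.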
Enriched meeting records
• Multiple layers/tiers
• Hierarchical
• Support for time-aligned and general content
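One possible way to picture such a record is a tree of nodes, each belonging to a tier, where timing is optional so that general (non-time-aligned) content can sit next to time-aligned content. The `Node` class, tier names, and labels below are hypothetical and are not drawn from Annotation Graphs, NITE XML, or any other toolkit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    tier: str                      # hypothetical tier name, e.g. "word"
    label: str
    start: Optional[float] = None  # None for content without time alignment
    end: Optional[float] = None
    children: list = field(default_factory=list)

    def span(self):
        """Return (start, end), derived from children if not set; None if untimed."""
        if self.start is not None and self.end is not None:
            return (self.start, self.end)
        spans = [s for s in (c.span() for c in self.children) if s is not None]
        if not spans:
            return None
        return (min(s[0] for s in spans), max(s[1] for s in spans))

    def on_tier(self, tier):
        """Collect all nodes of the given tier in this subtree, depth-first."""
        found = [self] if self.tier == tier else []
        for child in self.children:
            found.extend(child.on_tier(tier))
        return found


words = [Node("word", "hello", 0.0, 0.4), Node("word", "everyone", 0.4, 1.1)]
utterance = Node("utterance", "greeting", children=words)
agenda = Node("document", "agenda")  # general content with no timing
meeting = Node("meeting", "demo", children=[utterance, agenda])

meeting_span = meeting.span()
word_labels = [n.label for n in meeting.on_tier("word")]
```

Higher tiers inherit their time span from the tiers below them, so only the lowest time-aligned layer needs explicit timestamps in this sketch.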
What are the existing standards for metadata/annotations of multimodal data, and what are their limitations?
• Examples: Annotation Graphs, NITE XML toolkit (AMI Meeting Corpus), W3C Multimodal Interaction Activity (with EMMA, Extensible Multi-modal Annotation Language), the ISO TC37/SC4 Initiative, the Dublin Core Metadata Set
• Limitations: not widely distributed, and do not apply directly to multimodal meeting data
What are the needs of meeting archives in the future?
• Agree on metadata and annotations and set priorities
• Agree on their representations: content (i.e. normalized vocabularies) and form (XML)
• Prepare sample data, evaluate normalisation proposals, produce normative documents
• Incorporate metadata/annotation formats into meeting assistance tools
• Enhance robustness and coverage of automatic meeting processing tools