
Human-Assisted Machine Annotation


Presentation Transcript


  1. Human-Assisted Machine Annotation
  Sergei Nirenburg, Marjorie McShane, Stephen Beale
  Institute for Language and Information Technologies, University of Maryland, Baltimore County

  2. What is tagged? (an illustrative annotation record covering several of these layers is sketched after the list)
  • Text segmentation, punctuation, special characters
  • Numbers, dates, named entities
  • Morphosyntactic features
  • Parts of speech
  • Syntactic dependencies (full syntactic parses produced)
  • Ontologically-grounded lexical semantics (subsumes word sense selection, all word types)
  • Extra-ontological (parametric) lexical semantics
  • Ontologically-grounded compositional semantics (semantic dependencies, using case roles and about 300 other ontological-semantic relations)
  • Time
  • Space
  • Aspect
  • Modality (including speaker attitudes)
  • Causality
  • Textual co-reference
  • Real-world reference
  • Rhetorical relations
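To make the layers above concrete, here is a minimal, hypothetical sketch of what a multi-layer annotation record for one sentence might look like. The field names, the example sentence, and the concept labels are illustrative assumptions only; the slides do not show the actual TMR format used by OntoSem/DEKADE.

```python
# Hypothetical sketch of a multi-layer annotation record for a single sentence.
# Field names and concept labels are illustrative assumptions, not the TMR format.

annotation = {
    "text": "IBM acquired the company in 2002.",
    "tokens": ["IBM", "acquired", "the", "company", "in", "2002", "."],
    "pos": ["PROPN", "VERB", "DET", "NOUN", "ADP", "NUM", "PUNCT"],
    "named_entities": [{"span": (0, 1), "type": "ORGANIZATION"}],
    "dates": [{"span": (5, 6), "value": "2002"}],
    "syntactic_dependencies": [
        {"head": 1, "dependent": 0, "relation": "subject"},
        {"head": 1, "dependent": 3, "relation": "object"},
    ],
    # Ontologically-grounded lexical semantics: content words mapped to concepts.
    "word_senses": {0: "ORGANIZATION", 1: "ACQUIRE", 3: "CORPORATION"},
    # Compositional semantics: case roles linking concept instances.
    "semantic_dependencies": [
        {"event": "ACQUIRE-1", "role": "AGENT", "filler": "ORGANIZATION-1"},
        {"event": "ACQUIRE-1", "role": "THEME", "filler": "CORPORATION-1"},
        {"event": "ACQUIRE-1", "role": "TIME", "filler": "2002"},
    ],
    "modality": [{"scope": "ACQUIRE-1", "type": "epistemic", "value": 1.0}],
    "coreference": [],
}

print(annotation["semantic_dependencies"][0])
```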

  3. Preprocessing editor, syntax browser/editor, and TMR browser/editor windows in DEKADE: tools for interactive editing and post-editing (a schematic post-editing loop is sketched below).
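As a rough illustration of how such an environment supports human-assisted annotation, here is a minimal sketch of a propose-then-correct loop: the automatic analyzer proposes an analysis for each layer, and an annotator reviews it in turn. The function names and layer labels are hypothetical stand-ins, not the DEKADE interface.

```python
# Minimal sketch of a human-assisted ("propose, then post-edit") annotation loop.
# analyze() and review_layer() are hypothetical stand-ins, not the DEKADE API.

LAYERS = ["preprocessing", "syntax", "tmr"]

def analyze(sentence: str) -> dict:
    """Stand-in for the automatic analyzer: propose an analysis for each layer."""
    return {layer: {"sentence": sentence, "proposal": f"<{layer} analysis>"}
            for layer in LAYERS}

def review_layer(layer: str, proposal: dict) -> dict:
    """Stand-in for the human post-editing step (an interactive editor in practice)."""
    # Here the proposal is simply accepted; a real annotator would inspect and correct it.
    return {**proposal, "status": "accepted"}

def annotate(sentence: str) -> dict:
    """Run the analyzer, then have each layer reviewed in order."""
    proposed = analyze(sentence)
    return {layer: review_layer(layer, proposed[layer]) for layer in LAYERS}

if __name__ == "__main__":
    print(annotate("The delegation arrived in Geneva on Tuesday."))
```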

  4. Domains and genres: currently, general news, economics news, travel and meetings, and medical texts. New domains require resource augmentation (enhancement of the ontology and the lexicon).
  • Amount done: the project is just starting; about 2,500 words have been annotated so far.
  • Speed (no resource augmentation): 100K words in one year with 2 annotators and 50% of a systems-support person.
  • Speed (with resource augmentation): add one knowledge engineer.
  • Speed (with resource and analyzer improvements): add one software engineer.
  • If resource augmentation is undertaken, we estimate that the speed of annotation will double in the second year and will increase by a further 50% in the third year (a back-of-the-envelope projection is sketched below).
  • Interannotator agreement: 100%, because (if the amount and rate of work are as above) each stage of annotation will be done by a single annotator.
  • The automatic component of annotation will always be consistent, which bodes well for the overall consistency of annotation using the HAMA method.
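A quick sanity check of the throughput estimates, under the assumption (our reading of the slide) that the 50% third-year increase applies to the doubled second-year rate:

```python
# Back-of-the-envelope projection of annotation throughput from the estimates above.
# Assumption: baseline of 100K words/year (2 annotators, no resource augmentation),
# the rate doubles in year 2, and the year-3 rate is the year-2 rate plus 50%.

baseline = 100_000            # words annotated in year 1
year2 = baseline * 2          # rate doubles in year 2 -> 200,000 words
year3 = year2 * 3 // 2        # further 50% increase in year 3 -> 300,000 words

print(f"Year 1: {baseline:,} words")
print(f"Year 2: {year2:,} words")
print(f"Year 3: {year3:,} words")
print(f"Three-year total: {baseline + year2 + year3:,} words")
```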

  5. Possible Applications. “Gold standard” TMRs, the annotations produced by the HAMA method with OntoSem and DEKADE, can be used as a training corpus for machine learning research or as interlingual representations for MT purposes. But the OntoSem/DEKADE environment can be used in many more ways. TMRs constitute structured, ontologically-grounded knowledge directly usable by automatic reasoning systems. Production of “gold standard” TMRs using the HAMA method leads to the augmentation of the ontology and the lexicon, thus facilitating performance improvements in the automatic component of annotation work. This means that the TMRs, and the automatic part of the process of their production, promise to improve the quality of question answering, information extraction, summarization, and other advanced NLP and AI applications.
