Columbia, CRL/NMSU, ISI/USC, LTI/CMU, MITRE, UMIACS/UMD
• David Farwell, Stephen Helmreich: Computing Research Laboratory / New Mexico State University
• Lori Levin, Teruko Mitamura: Language Technologies Institute / Carnegie Mellon University
• Bonnie Dorr, Rebecca Green: Institute for Advanced Computer Studies / University of Maryland
• Eduard Hovy: Information Sciences Institute / University of Southern California
• Keith Miller, Florence Reeder: MITRE Corporation
• Owen Rambow, Nizar Habash: Columbia University
What we annotate
• multiple comparable bilingual text corpora
• parallel text corpora
• multiple translations of texts
• Genre: newspaper texts / DARPA corpus
Goals
• common representation (interlingua)
• common methodology and tools
• observe and catalogue different surface realizations of the same meaning across and within languages
Annotation Process
• Text is syntactically parsed (Connexor / IL0)
• Reviewed and corrected (TrEd)
• Annotation to IL1 (Tiamat)
• Content words annotated for sense (Omega)
• Arguments annotated for thematic role (LCS)
• 2 English translations of 6 articles
• Arabic, French, Hindi, Japanese, Korean, Spanish
• 12 annotators, 2 at each site
• Total: 144 annotated texts to IL1 level
Results: Agreement & Time
• Tools (Tiamat)
• Manuals (IL0 for 7 languages, IL1)
• Inter-annotator agreement: kappa = 0.83 (mK), 0.66 (wn), 0.59 (theta-roles)
• Annotation time: 4 hours/annotator/text, 250 words/text, 2 annotators/text = approx. 2 person years for 100K words at IL1
• Next step: merge IL1 representations and develop transformation algorithms to produce IL2
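The agreement and time figures above can be illustrated with a minimal Python sketch. Two things here are assumptions, not taken from the slides: the slides do not say which kappa variant was used, so the sketch uses plain two-annotator Cohen's kappa; and the 1,600 working hours per person-year is an illustrative value chosen only to show how the "approx. 2 person years" estimate could arise. All function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the slides do not name the kappa variant they report;
    this is the standard two-annotator form.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def annotation_hours(total_words, words_per_text=250,
                     annotators_per_text=2, hours_per_text=4):
    """Total annotator hours, using the per-text figures from the slide."""
    texts = total_words / words_per_text
    return texts * annotators_per_text * hours_per_text


def person_years(hours, hours_per_year=1600):
    """Convert hours to person-years; hours_per_year is an assumption."""
    return hours / hours_per_year


hours = annotation_hours(100_000)  # 400 texts x 2 annotators x 4 hours
years = person_years(hours)
```

With the slide's figures, 100K words come to 400 texts and 3,200 annotator hours; whether that rounds to two person years depends on the hours-per-year figure assumed.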