CMU’s TTO3 Tasks
• Analyze annotation manuals and annotated samples from all TTO3 teams
  • Status: Formal UML models have been constructed and merged together into a single model for all of the annotated samples
  • ToDo: Complete final review / revision of the model
• Construct formal annotation type system
  • Status: Use of UML for modeling allows direct mapping to UIMA type system description
  • ToDo: Complete development of UIMA type system for final annotation model
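To illustrate why a UML class model maps directly onto a UIMA type system description, here is a minimal Python sketch that turns a small class model (name, parent, attributes) into type system descriptor XML. The element names follow the Apache UIMA type system descriptor format; the type names (`edu.cmu.example.Event`, `edu.cmu.example.Claim`), the `polarity` attribute, and the dictionary layout are hypothetical and not taken from the TTO3 model.

```python
import xml.etree.ElementTree as ET

UIMA_NS = "http://uima.apache.org/resourceSpecifier"

def build_type_system(name, classes):
    """Render a list of UML-like class dicts as a UIMA type system descriptor string."""
    root = ET.Element("typeSystemDescription", {"xmlns": UIMA_NS})
    ET.SubElement(root, "name").text = name
    types = ET.SubElement(root, "types")
    for cls in classes:
        type_el = ET.SubElement(types, "typeDescription")
        ET.SubElement(type_el, "name").text = cls["name"]
        ET.SubElement(type_el, "description").text = cls.get("doc", "")
        # A UML generalization becomes the supertype; default to the base annotation type.
        ET.SubElement(type_el, "supertypeName").text = cls.get("parent", "uima.tcas.Annotation")
        features = ET.SubElement(type_el, "features")
        # Each UML attribute becomes a feature with a range type.
        for feat_name, range_type in cls.get("attributes", {}).items():
            feat_el = ET.SubElement(features, "featureDescription")
            ET.SubElement(feat_el, "name").text = feat_name
            ET.SubElement(feat_el, "description").text = ""
            ET.SubElement(feat_el, "rangeTypeName").text = range_type
    return ET.tostring(root, encoding="unicode")

# Hypothetical two-class model, for illustration only.
example_classes = [
    {"name": "edu.cmu.example.Event", "attributes": {"polarity": "uima.cas.String"}},
    {"name": "edu.cmu.example.Claim", "parent": "edu.cmu.example.Event"},
]
descriptor_xml = build_type_system("ExampleTypeSystem", example_classes)
```

Printing `descriptor_xml` gives a descriptor with one `typeDescription` per class, which is the sense in which the mapping is direct: classes become types, generalizations become supertypes, and attributes become features.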
CMU’s TTO3 Tasks [2]
• Propose end-to-end processing architecture for run-time annotation
  • Status: Architecture defined; some components have been prototyped
  • ToDo: Complete implementation of components, component descriptors, component wrappers
• Implement automatic annotation
  • Status: Preliminary discussion at CMU and with Columbia regarding possible annotations
  • ToDo: Finalize target type system, annotate sample data, train annotation classifier using MinorThird
• Evaluation of automatic annotation
  • ToDo: Measure the precision and recall of the trained annotator(s) on held out data, in order to assess progress vs. the TTO3 goal of high precision annotation
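The evaluation step can be sketched as follows: a minimal Python example that computes exact-match precision and recall between gold and predicted span annotations on held-out documents. The span representation, the exact-match criterion, and the labels `Person` and `Location` are assumptions made for illustration; the slides do not specify how matches are scored.

```python
from typing import Set, Tuple

# A span annotation: (document id, start offset, end offset, type label).
Span = Tuple[str, int, int, str]

def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over sets of span annotations."""
    true_positives = len(gold & predicted)
    # Precision: fraction of predicted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical held-out data, for illustration only.
gold = {
    ("doc1", 0, 5, "Person"),
    ("doc1", 10, 18, "Location"),
    ("doc2", 3, 9, "Person"),
}
predicted = {
    ("doc1", 0, 5, "Person"),    # correct
    ("doc2", 3, 9, "Location"),  # right span, wrong label: counts as an error
}
precision, recall = precision_recall(gold, predicted)
```

Since the stated goal is high-precision annotation, precision is the figure to watch first; recall shows how much coverage is given up to reach it.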
Upcoming Milestones
• September 2008
  • Complete final review / revision of the unified annotation model
  • Finalize target type system(s) for CMU/Columbia learning experiment
• October 2008
  • Complete UIMA type system for target type system(s)
  • Complete implementation of components, component descriptors, component wrappers
• November 2008
  • Annotate sample data
  • Train annotation classifier
  • Measure the performance of the trained annotator(s) on held out data
Research Highlights
• Unified annotation model (UML)
  • Next slide: original v0.1; color-coding indicates the team that produced each annotation type
  • Following slide: CMU additions since May; color-coding indicates results of overlap analysis (recommended merge points) – white types are presumed OK
• Interoperability Architecture
  • A third slide showing the current annotation architecture we intend to use for our automatic annotation experiment
Architecture for Training Automatic Annotators
[Diagram. Components: XCAS Collection Reader, Minor 3rd Annotator, XCAS CAS Consumer, MinorThird Learner, Database Manager, XCAS DB. Artifacts: Minor 3rd Labels Files, Minor 3rd Model Files, training data.]
• Multiple annotation models are trained, one per annotation
• The same generic wrapper is used for any trained model
• Annotated documents are stored in native UIMA format for fast (re-)annotation
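The "one model per annotation, one generic wrapper" idea can be sketched in Python as below. This is an illustration of the design only, not the MinorThird or UIMA API: the `GenericAnnotator` class, the `find_word` toy models, and the labels `Agent` and `Tool` are all hypothetical stand-ins for trained models loaded from model files.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]               # (start, end) character offsets
Model = Callable[[str], List[Span]]  # stand-in for a trained model loaded from a model file

class GenericAnnotator:
    """Wraps any trained model; only the label and the model differ per annotation type."""

    def __init__(self, label: str, model: Model) -> None:
        self.label = label
        self.model = model

    def annotate(self, text: str) -> List[Tuple[int, int, str]]:
        # Attach this wrapper's label to every span the model proposes.
        return [(start, end, self.label) for start, end in self.model(text)]

def find_word(word: str) -> Model:
    """Toy 'model' that marks every occurrence of a fixed word (illustration only)."""
    def model(text: str) -> List[Span]:
        spans, i = [], text.find(word)
        while i != -1:
            spans.append((i, i + len(word)))
            i = text.find(word, i + 1)
        return spans
    return model

# One model per annotation type, all served by the same wrapper class.
models: Dict[str, Model] = {"Agent": find_word("CMU"), "Tool": find_word("UIMA")}
annotators = [GenericAnnotator(label, model) for label, model in models.items()]

text = "CMU maps UML to UIMA."
annotations = sorted(a for annotator in annotators for a in annotator.annotate(text))
```

Adding a new annotation type then means training and registering one more model; the wrapper code stays unchanged.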