Knowtator. A knowledge-based text annotation tool. Philip Ogren ( Philip.Ogren@uchsc.edu ) Larry Hunter, PhD ( Larry.Hunter@uchsc.edu ). Center for Computational Pharmacology University of Colorado Health Sciences Center Aurora, CO. Availability: bionlp.sourceforge.net/Knowtator
Availability: bionlp.sourceforge.net/Knowtator
Source code will be available under MPL soon. Comments and suggestions welcome!
This work was supported by NIH grant R01-LM008111.
Knowtator is:
• A general-purpose text annotation tool
• A Protégé plugin
Synopsis
• Knowtator facilitates the manual creation of training and evaluation corpora for a variety of biomedical language processing tasks.
• Knowtator's key strength is the ability to define an annotation schema using a Protégé knowledge base.
Features
• Stand-off annotation
  • Original text is not modified.
• Inter-annotator agreement metrics
• A simple API allows annotation of any arbitrary text source.
• Annotation filters
  • All annotations are assigned an annotator and (optionally) one or more annotation sets.
  • Annotations of many types, from multiple annotators and annotation sets, can clutter the user interface.
  • Filters restrict the view to selected annotations.
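The stand-off and filtering ideas above can be sketched in a few lines of Python. This is an illustration only and does not use Knowtator's API: the `Annotation` class, the helper functions, the sample sentence, and the annotator and set names are all hypothetical, and the agreement measure (exact matches divided by the union of both annotators' spans) is one simple choice, not necessarily the metric Knowtator reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int                      # offset of the first character of the span
    end: int                        # offset just past the last character
    concept: str                    # schema class the span is annotated with
    annotator: str                  # every annotation has an annotator
    sets: frozenset = frozenset()   # optional annotation sets


def annotate(text, phrase, concept, annotator, sets=()):
    """Build a stand-off annotation for the first occurrence of phrase."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), concept, annotator, frozenset(sets))


def covered_text(text, ann):
    """Recover the annotated span from the unmodified source text."""
    return text[ann.start:ann.end]


def filter_annotations(annotations, annotator=None, annotation_set=None):
    """Keep only annotations matching the given annotator and/or set."""
    return [
        ann for ann in annotations
        if (annotator is None or ann.annotator == annotator)
        and (annotation_set is None or annotation_set in ann.sets)
    ]


def exact_match_agreement(first, second):
    """Share of distinct (span, concept) pairs that both annotators produced."""
    keys_first = {(a.start, a.end, a.concept) for a in first}
    keys_second = {(a.start, a.end, a.concept) for a in second}
    union = keys_first | keys_second
    return len(keys_first & keys_second) / len(union) if union else 1.0


# Hypothetical sentence; the text itself is never edited.
text = "Endocytosis moves the receptor from the cell surface to the endosome."
annotations = [
    annotate(text, "Endocytosis", "endocytosis", "annotator A", sets=["pilot"]),
    annotate(text, "cell surface", "cellular component", "annotator A"),
    annotate(text, "Endocytosis", "endocytosis", "annotator B", sets=["pilot"]),
    annotate(text, "endosome", "cellular component", "annotator B"),
]

from_a = filter_annotations(annotations, annotator="annotator A")
from_b = filter_annotations(annotations, annotator="annotator B")
pilot = filter_annotations(annotations, annotation_set="pilot")

print([covered_text(text, a) for a in from_a])
print(len(pilot))
print(exact_match_agreement(from_a, from_b))
```

Because annotations hold only offsets, any number of annotators can mark up the same text, and a filter is just a predicate over the annotation list.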
Knowtator annotation schemas are defined by a Protégé knowledge base.
Biological and linguistic concepts can be modeled in Protégé.
Entities in an annotation schema are defined by Protégé class definitions. Protégé slots and constraints on those slots can be used to relate annotations in meaningful ways.
[Figure: class definition for endocytosis]
Example: endocytosis annotation
• Annotations of endocytosis relate to annotations of cellular component and molecule via the slot definitions of the endocytosis class definition.
• Six slots of endocytosis
  • location: filled by cellular component annotations
    • origin: subslot of location
    • destination: subslot of location
  • transport participants: filled by molecule annotations
    • transported entities: subslot of transport participants
    • transporters: subslot of transport participants
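The six endocytosis slots and their subslot relations can be mirrored in a small Python sketch. It does not use Protégé; the `Slot` class and the `fillers_for` helper are hypothetical names introduced here, and the filler values are illustrative only. The point it shows is that a query against a parent slot (location) also returns fillers of its subslots (origin, destination).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    name: str
    filler_type: str                  # class whose annotations may fill the slot
    parent: Optional["Slot"] = None   # set when this slot is a subslot

    def is_a(self, other):
        """True if this slot is `other` or a (transitive) subslot of it."""
        slot = self
        while slot is not None:
            if slot == other:
                return True
            slot = slot.parent
        return False


location = Slot("location", "cellular component")
origin = Slot("origin", "cellular component", parent=location)
destination = Slot("destination", "cellular component", parent=location)
participants = Slot("transport participants", "molecule")
transported = Slot("transported entities", "molecule", parent=participants)
transporters = Slot("transporters", "molecule", parent=participants)

ENDOCYTOSIS_SLOTS = [location, origin, destination,
                     participants, transported, transporters]


def fillers_for(slot_values, slot, schema=ENDOCYTOSIS_SLOTS):
    """Collect fillers of `slot` and of every subslot of `slot`."""
    found = []
    for candidate in schema:
        if candidate.is_a(slot):
            found.extend(slot_values.get(candidate.name, []))
    return found


# Illustrative slot values for one endocytosis annotation.
values = {
    "origin": ["endosome"],
    "destination": ["cell surface"],
    "transport participants": ["thromboxane A2 receptor"],
}

print(fillers_for(values, location))
print(fillers_for(values, origin))
print(fillers_for(values, participants))
```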
Knowtator data model
The goal of Knowtator is to create mappings between concepts represented in a knowledge base and texts that talk about those concepts.
The Knowtator data model has three parts:
1. Ontology/knowledge base of concepts and relationships (Protégé frames)
2. Mentions of concepts and assertions about relationships between concepts found in text
3. A mapping between the target text and members of 1 and 2 (annotations)
[Figure: the three layers of the data model (I. Ontology/KB, II. Mentions/Assertions, III. Annotations), illustrated with the example "Endocytosis of molecule with thromboxane A2 receptor from endosome to cell surface"]
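A minimal Python sketch of the three-part data model follows. The class names `Concept`, `Mention`, and `Annotation`, the helper functions, and the sample sentence are assumptions made for illustration; they are not Knowtator's actual classes. Concepts stand in for the ontology, mentions record which concept is talked about and how mentions relate through slots, and annotations tie each mention to a span of text.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Part 1: a class from the ontology/knowledge base."""
    name: str


@dataclass
class Mention:
    """Part 2: a mention of a concept; slot fillers assert relationships."""
    concept: Concept
    slot_fillers: dict = field(default_factory=dict)   # slot name -> [Mention]


@dataclass
class Annotation:
    """Part 3: maps a span of the target text to a mention."""
    mention: Mention
    start: int
    end: int


def annotate(text, phrase, mention):
    """Anchor a mention to the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Annotation(mention, start, start + len(phrase))


def spans_for(mention, annotations, text):
    """Return the text spans that are mapped to the given mention."""
    return [text[a.start:a.end] for a in annotations if a.mention is mention]


# Part 1: concepts
endocytosis = Concept("endocytosis")
cellular_component = Concept("cellular component")

# Part 2: mentions and the relationships asserted between them
endosome_mention = Mention(cellular_component)
surface_mention = Mention(cellular_component)
event_mention = Mention(endocytosis, {
    "origin": [endosome_mention],
    "destination": [surface_mention],
})

# Part 3: annotations over a hypothetical sentence
text = "Endocytosis carried the receptor from the endosome to the cell surface."
annotations = [
    annotate(text, "Endocytosis", event_mention),
    annotate(text, "endosome", endosome_mention),
    annotate(text, "cell surface", surface_mention),
]

# Walk from the event mention to its origin, then back to the text.
origin_mention = event_mention.slot_fillers["origin"][0]
print(origin_mention.concept.name, spans_for(origin_mention, annotations, text))
```

Keeping mentions separate from annotations means a relationship is asserted once, between mentions, while the text spans that evidence it can be changed independently.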
To do:
• Report on annotation efforts
• Mechanism for semi-automated annotation
• Import/export scripts for other annotation formats (e.g. ATLAS)