

Knowtator: A Protégé plug-in for annotated corpus construction. Philip V. Ogren. Division of Biomedical Informatics, Mayo Clinic College of Medicine, Rochester, Minnesota, USA.


Introduction

A general-purpose text annotation tool called Knowtator is presented. Knowtator facilitates the manual creation of annotated corpora that can be used for evaluating or training a variety of natural language processing systems. Building on the strengths of the widely used Protégé knowledge representation system, Knowtator has been developed as a Protégé plug-in that leverages Protégé’s knowledge representation capabilities to specify annotation schemas. Knowtator’s unique advantage over other annotation tools is the ease with which complex annotation schemas (e.g. schemas which have constrained relationships between annotation types) can be defined and put to use.

Example

The following outlines an example of how Knowtator can be used to annotate problem statements, outcomes, and interventions that are found in clinical notes. The annotation schema shown in Knowtator is based on the International Classification for Nursing Practice (ICNP), a controlled vocabulary and data model created specifically for coding in this domain.

Annotation Schema Creation: The Protégé knowledge-base editor can be used to create new class (Figure 1), instance, slot (Figures 2 and 3), and facet frames for defining the annotation schema.

Annotation of Text: Once an annotation schema has been created, it can be used immediately for text annotation. Figure 4 shows some text that is going to be annotated. On the left is the subsumption hierarchy of the available annotation types. A single annotation has been created for the span of text ‘pain’ and is annotated to the class Process.

The slots of the class definitions in the annotation schema define what properties an annotation can have. Figure 5 shows an example of a simple slot that holds the value of an identifier from a controlled vocabulary for an annotation of the class Process. Figure 6 shows an example of a complex slot that relates an annotation of type Problem Statement to an annotation of type Body Structure via the location slot.

Features

• Merges annotations from multiple annotators
• Computes a variety of inter-annotator agreement metrics along with detailed error analysis data
• Consensus set creation mode for consolidating differences between two or more annotators
• Pluggable architecture for handling different text sources
• Stand-off annotation (i.e. the annotated text is not modified)
• XML import/export
• Scalable: can run on a standalone laptop or with a database backend (or both)
• Mozilla Public License version 1.1
• Filters provide fine-grained control over display, annotation export, consensus set creation, and inter-annotator agreement
• Display of annotations is highly configurable with respect to the text shown and highlight color

Synopsis

Knowtator is a general-purpose text annotation tool. Knowtator is a Protégé plug-in. A key strength of Knowtator is its ability to relate annotations to each other via the slot definitions of the corresponding annotated classes. In the ICNP example above, the slot location of the class Problem Statement relates to the Body Structure annotation for the text extent ‘parascapular thoracic’. The constraints on the slot ensure that the relationships between annotations are consistent. Protégé is capable of representing much more sophisticated and complex conceptual models which can be used, in turn, by Knowtator for text annotation. Also, because Protégé is often used to create conceptual models of domains relating to biomedical disciplines, Knowtator is especially well suited for capturing named entities and their relations for those domains.

Knowtator is open source and available at: bionlp.sourceforge.net/Knowtator

Acknowledgements

• Larry Hunter, PhD (1)
• Zhiyong Lu (1)
• Kevin Cohen (1)
• Mike Bada (1)
• Andrew Dolbey (1)
• Christopher G. Chute, MD DrPH (2)
• Guergana Savova, PhD (2)
• Serguei Pakhomov, PhD (2)
• Marcelline R. Harris, PhD (2)

(1) University of Colorado Health Sciences Center, Aurora, CO.
(2) Mayo Clinic College of Medicine, Rochester, MN.

Figures

Figure 1: The creation of a subclass of Statement in progress using the Protégé class editor is shown.

Figure 2: The class definition for Problem Statement is shown with its slots and the constraints on those slots (e.g. an action of a Problem Statement must be of type Action).

Figure 3: The only slot of the class Artifact is a simple attribute that accepts a string value corresponding to an identifier for a term in the ICNP controlled vocabulary.

Figure 4: The text ‘pain’ was highlighted with a mouse, the class Process was selected and an annotation was created.

Figure 5: The annotation corresponding to the text ‘pain’ has a slot that relates this annotation to a specific identifier in the ICNP terminology. A dialog that allows the entry of a string value for the identifier is shown.

Figure 6: An annotation corresponding to the class Problem Statement has been created. There is no span associated with the annotation. However, Problem Statement has several slots (shown in Figure 2) that correspond to other annotations in the text. The annotation for the span of text ‘parascapular thoracic’ with the class Body Structure becomes the value of the location slot of the Problem Statement annotation.
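
The slot mechanism described in the Example section can be illustrated with a small sketch. This is not Knowtator's or Protégé's API (Knowtator is a Protégé plug-in, and none of its code is shown on the poster). It is a minimal Python model, under stated assumptions, of stand-off annotations whose slots are checked against a schema. The names SCHEMA, Annotation, set_slot and covered_text are hypothetical, the sample sentence is invented around the two spans mentioned in the text, and the identifier value is a placeholder rather than a real ICNP code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical schema: for each annotated class, the slots it allows and the
# type each slot value must have. A class name means "an annotation of that
# class"; str means "a plain string". Only the Problem Statement location
# constraint and the Process identifier slot come from the poster's example.
SCHEMA = {
    "Problem Statement": {"location": "Body Structure"},
    "Process": {"identifier": str},
    "Body Structure": {},
}


@dataclass
class Annotation:
    annotated_class: str
    # Stand-off span: character offsets into the text, or None when the
    # annotation has no span of its own.
    span: Optional[Tuple[int, int]] = None
    slots: Dict[str, object] = field(default_factory=dict)

    def set_slot(self, name, value):
        """Store a slot value only if the schema allows it."""
        allowed = SCHEMA[self.annotated_class]
        if name not in allowed:
            raise ValueError(f"{self.annotated_class} has no slot {name!r}")
        expected = allowed[name]
        if expected is str:
            ok = isinstance(value, str)
        else:
            ok = isinstance(value, Annotation) and value.annotated_class == expected
        if not ok:
            raise TypeError(f"slot {name!r} of {self.annotated_class} rejects this value")
        self.slots[name] = value


def covered_text(text, annotation):
    """Return the text covered by an annotation's span."""
    start, end = annotation.span
    return text[start:end]


def span_of(text, phrase):
    """Locate a phrase and return its (start, end) character offsets."""
    start = text.index(phrase)
    return (start, start + len(phrase))


# Invented sentence containing the two spans mentioned in the example.
text = "Patient reports pain in the parascapular thoracic region."

pain = Annotation("Process", span_of(text, "pain"))
pain.set_slot("identifier", "PLACEHOLDER-ID")  # placeholder, not a real ICNP code

body = Annotation("Body Structure", span_of(text, "parascapular thoracic"))

problem = Annotation("Problem Statement")  # no span, as in Figure 6
problem.set_slot("location", body)
```

Because the annotations hold only offsets, the source text is never modified, which is the stand-off property listed under Features. Calling problem.set_slot("location", pain) would raise a TypeError in this sketch, mirroring the idea that slot constraints keep relationships between annotations consistent.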
