UIMA Introduction SHARPn Summit June 11, 2012
Outline • UIMA Terminology (not just TLAs) • Parts of a UIMA pipeline • Running a pipeline • Viewing annotations interactively
UIMA Terminology • CAS, XCAS, JCAS, View • Analysis Engine (AE) / Annotator • XML output: XCAS, XMI • Type System, JCasGen • CAS Visual Debugger (CVD) • CPE (Collection Processing Engine)
UIMA • Framework: defining data types; passing data from one component to another • Tooling: viewing results; debugging; editing XML visually
Data Through a Pipeline • Type System: defines the data types passed along • CAS (Common Analysis Structure): container for the data passed along; created by UIMA from the Type System
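The relationship between the Type System and the CAS can be sketched in plain Java. This is a simplified model only, not the UIMA API: `SketchCas` and `SketchAnnotation` are hypothetical names, and the real CAS is far richer. The sketch assumes a "type system" is just a set of declared type names, and that the container holds the document text plus typed spans over it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; none of these are UIMA classes.
final class SketchAnnotation {
    final String type; // name of a type declared in the "type system"
    final int begin;   // start offset into the document text (inclusive)
    final int end;     // end offset into the document text (exclusive)

    SketchAnnotation(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

final class SketchCas {
    private final Set<String> declaredTypes; // plays the role of the Type System
    private final String documentText;
    private final List<SketchAnnotation> annotations = new ArrayList<>();

    // The container is built from the type system, mirroring "created from the Type System".
    SketchCas(Set<String> declaredTypes, String documentText) {
        this.declaredTypes = declaredTypes;
        this.documentText = documentText;
    }

    // Only types declared up front may be stored in the container.
    void add(String type, int begin, int end) {
        if (!declaredTypes.contains(type)) {
            throw new IllegalArgumentException("Type not in type system: " + type);
        }
        if (begin < 0 || begin > end || end > documentText.length()) {
            throw new IllegalArgumentException("Span out of range: " + begin + "-" + end);
        }
        annotations.add(new SketchAnnotation(type, begin, end));
    }

    // Text covered by an annotation's span.
    String coveredText(SketchAnnotation a) {
        return documentText.substring(a.begin, a.end);
    }

    List<SketchAnnotation> annotations() {
        return annotations;
    }
}
```

Every component in a pipeline reads from and writes to the same container, so the declared types act as the contract between components.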
Parts of a UIMA Pipeline • Collection Reader: reads the input documents • Analysis Engine(s) / Annotator(s): process each document • CAS Consumer: outputs the data
Tying a Pipeline Together • CPE descriptor (Collection Processing Engine): Collection Reader, Analysis Engine(s), CAS Consumer • Aggregate analysis engine: multiple Analysis Engines and their order
Pipeline Example (UIMA term: example) • Collection Reader: read files from a dir • Analysis Engine: sentence detector • Analysis Engine: tokenizer annotator • Analysis Engine: Part of Speech tagger • CAS Consumer: output tokens to DB
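The flow in this example can be modeled in a few lines of plain Java. This is a rough sketch under stated assumptions, not UIMA code: `SketchDoc`, `SketchEngine`, and `PipelineSketch` are hypothetical names, the reader is replaced by an in-memory list of strings, the consumer collects tokens into a list instead of a database, and the Part of Speech tagger is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are UIMA classes.
final class SketchDoc {
    final String text;
    final List<int[]> sentences = new ArrayList<>(); // {begin, end} offsets
    final List<int[]> tokens = new ArrayList<>();    // {begin, end} offsets

    SketchDoc(String text) {
        this.text = text;
    }
}

// Plays the role of an Analysis Engine: reads the document and adds annotations.
interface SketchEngine {
    void process(SketchDoc doc);
}

final class PipelineSketch {
    // Sentence detector: closes a sentence after '.', '?', or '!'.
    static final SketchEngine SENTENCE_DETECTOR = doc -> {
        int start = 0;
        for (int i = 0; i < doc.text.length(); i++) {
            char c = doc.text.charAt(i);
            if (c == '.' || c == '?' || c == '!') {
                addTrimmed(doc, start, i + 1);
                start = i + 1;
            }
        }
        addTrimmed(doc, start, doc.text.length()); // trailing text without a terminator
    };

    // Tokenizer: splits each previously detected sentence on whitespace.
    static final SketchEngine TOKENIZER = doc -> {
        for (int[] s : doc.sentences) {
            int tokStart = -1;
            for (int i = s[0]; i <= s[1]; i++) {
                boolean boundary = i == s[1] || Character.isWhitespace(doc.text.charAt(i));
                if (!boundary && tokStart < 0) {
                    tokStart = i;
                } else if (boundary && tokStart >= 0) {
                    doc.tokens.add(new int[] {tokStart, i});
                    tokStart = -1;
                }
            }
        }
    };

    // Stores a sentence span after trimming surrounding whitespace; skips empty spans.
    private static void addTrimmed(SketchDoc doc, int begin, int end) {
        while (begin < end && Character.isWhitespace(doc.text.charAt(begin))) begin++;
        while (end > begin && Character.isWhitespace(doc.text.charAt(end - 1))) end--;
        if (begin < end) doc.sentences.add(new int[] {begin, end});
    }

    // Reader role: iterate the inputs. Engines run in the given order.
    // Consumer role: copy each token's text into the returned list.
    static List<String> run(List<String> inputs, List<SketchEngine> engines) {
        List<String> sink = new ArrayList<>();
        for (String text : inputs) {
            SketchDoc doc = new SketchDoc(text);
            for (SketchEngine engine : engines) engine.process(doc);
            for (int[] t : doc.tokens) sink.add(doc.text.substring(t[0], t[1]));
        }
        return sink;
    }
}
```

Order matters in this sketch: the tokenizer depends on sentences already being present, so running it before the sentence detector yields no tokens. That dependency is why an aggregate has to fix the order of its engines.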
UIMA plugin for Eclipse • Provides visual editors for descriptors • Mini GUI for selecting options, rather than editing XML directly • An “Update site” exists for installing the plugin: http://www.apache.org/dist/incubator/uima/eclipse-update-site
UIMA Tooling Options • Tools: CPE Configurator, CVD (CAS Visual Debugger) • Options: command line scripts/.bat files, or run within Eclipse
Running a Pipeline - CPE • cTAKES provides a script and a bat file, runctakesCPE • Choose a CPE descriptor, such as test_plaintext.xml from cTAKESdesc/cdpdesc/collection_processing_engine
Viewing Annotations - CVD • Viewing annotations using the CVD • Load the Type System • Load the XCAS or XMI
Annotation Viewers • UIMA tools: CVD (CAS Visual Debugger), Annotation viewer • Viewing XML output: any XML viewer, any text editor
Questions? http://uima.apache.org/
Options to Run a Pipeline • CPE GUI • CVD GUI • Single Aggregate Analysis Engine • No Collection Reader • Instantiate a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method • uimaFIT: removes dependency on XML
Creating a New Annotator • Within Eclipse • Create Java project • Right click -> Add UIMA Nature • Add UIMA jars to .classpath (Build Path) • Create Analysis Engine (AE) descriptor • Add types to AE descriptor, or optionally create separate Type System descriptor • Write code!
Running an AE in CVD • Using CVD to run an Analysis Engine • No Collection Reader • Single Analysis Engine (can be an aggregate) • No CAS Consumer • Load an Analysis Engine • Paste/type in text to process, e.g. “Family history of hyperlipidemia.”
Modifying a parameter UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.
Links • Getting started with UIMA http://uima.apache.org/doc-uima-annotator.html • UIMA Update site for use in Eclipse http://www.apache.org/dist/incubator/uima/eclipse-update-site
Email address masanz.james@mayo.edu