ClearTK: A Framework for Statistical Biomedical Natural Language Processing

Philip Ogren and Philipp Wetzler, Department of Computer Science, University of Colorado at Boulder. ClearTK is a software package that facilitates statistical biomedical natural language processing.

Presentation Transcript


  1. ClearTK: A Framework for Statistical Biomedical Natural Language Processing
     Philip Ogren, Philipp Wetzler
     Department of Computer Science, University of Colorado at Boulder

  2. Introduction
     • ClearTK is a software package that:
        • facilitates statistical biomedical natural language processing
        • is written in Java for UIMA
        • provides an extensible feature extraction library
        • interfaces with popular machine learning libraries:
           • Maximum Entropy (OpenNLP)
           • Support Vector Machines (LIBSVM)
           • Conditional Random Fields (Mallet)
           • others, e.g. Naïve Bayes (Weka)
     • Available free for academic research (contact philip@ogren.info)
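To make the feature extraction and library interface bullets concrete, here is one way pluggable extractors can feed a learner. It is a minimal sketch in plain Java (16+, for records), not ClearTK's API: Feature, TokenFeatureExtractor, WordExtractor, ShapeExtractor, FeatureDemo and LibSvmEncoder are hypothetical names. The encoder assumes LIBSVM's sparse text format, one instance per line: a numeric label followed by index:value pairs in ascending index order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A named feature, e.g. ("word", "plasminogen") or ("hasDigit", true). Hypothetical type.
record Feature(String name, Object value) {}

// Hypothetical extension point: each extractor turns one token into zero or more features.
interface TokenFeatureExtractor {
    List<Feature> extract(String token);
}

// Toy extractor: the lowercased word itself.
class WordExtractor implements TokenFeatureExtractor {
    public List<Feature> extract(String token) {
        return List.of(new Feature("word", token.toLowerCase()));
    }
}

// Toy extractor: simple orthographic cues common in biomedical names.
class ShapeExtractor implements TokenFeatureExtractor {
    public List<Feature> extract(String token) {
        List<Feature> out = new ArrayList<>();
        if (token.chars().anyMatch(Character::isDigit)) out.add(new Feature("hasDigit", true));
        if (token.contains("-")) out.add(new Feature("hasHyphen", true));
        return out;
    }
}

class FeatureDemo {
    // Run every registered extractor over one token and pool the results.
    static List<Feature> extractAll(List<TokenFeatureExtractor> extractors, String token) {
        List<Feature> all = new ArrayList<>();
        for (TokenFeatureExtractor e : extractors) all.addAll(e.extract(token));
        return all;
    }
}

// Maps symbolic features to LIBSVM-style sparse lines ("label index:value ...").
class LibSvmEncoder {
    private final Map<String, Integer> index = new HashMap<>();

    // Each distinct name=value pair becomes one binary dimension, numbered from 1.
    private int indexOf(Feature f) {
        return index.computeIfAbsent(f.name() + "=" + f.value(), k -> index.size() + 1);
    }

    String encode(int label, List<Feature> features) {
        TreeMap<Integer, Integer> sparse = new TreeMap<>(); // keeps indices in ascending order
        for (Feature f : features) sparse.put(indexOf(f), 1);
        StringBuilder sb = new StringBuilder(Integer.toString(label));
        sparse.forEach((i, v) -> sb.append(' ').append(i).append(':').append(v));
        return sb.toString();
    }
}
```

New extractors are added by implementing one interface, and only the encoder knows about a particular learner; supporting another library would mean another encoder rather than changes to the extractors.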

  3. UIMA 101
     [Diagram: text → collection reader → Common Analysis Structure (CAS) → analysis engines → consumers]
     • ClearTK provides a way to create analysis engines that use statistical models to classify text.
     • The structure of the CAS is defined by a type system determined by the development team.
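The data flow in the UIMA diagram can be sketched in plain Java: a reader produces one CAS per document, each analysis engine adds annotations to it, and a consumer receives the finished CAS. SimpleCas, Annotation, CollectionReader, AnalysisEngine, Consumer, Pipeline, WhitespaceTokenizer and ListReader are hypothetical, simplified stand-ins, not the UIMA framework's classes; real UIMA components extend framework base classes and are configured through descriptors.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a CAS: the document text plus the annotations added to it.
class SimpleCas {
    final String text;
    final List<Annotation> annotations = new ArrayList<>();
    SimpleCas(String text) { this.text = text; }
}

// An annotation marks a character span with a type name from the team-defined type system.
record Annotation(String type, int begin, int end) {}

// The three component roles from the diagram, reduced to minimal interfaces.
interface CollectionReader extends Iterator<SimpleCas> {}
interface AnalysisEngine { void process(SimpleCas cas); }
interface Consumer { void consume(SimpleCas cas); }

class Pipeline {
    static void run(CollectionReader reader, List<AnalysisEngine> engines, Consumer consumer) {
        while (reader.hasNext()) {
            SimpleCas cas = reader.next();
            for (AnalysisEngine engine : engines) engine.process(cas); // engines run in order
            consumer.consume(cas);
        }
    }
}

// Example engine: a whitespace tokenizer that adds "Token" annotations.
class WhitespaceTokenizer implements AnalysisEngine {
    public void process(SimpleCas cas) {
        int i = 0, n = cas.text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(cas.text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(cas.text.charAt(i))) i++;
            if (i > start) cas.annotations.add(new Annotation("Token", start, i));
        }
    }
}

// Example reader over an in-memory list of documents.
class ListReader implements CollectionReader {
    private final Iterator<String> docs;
    ListReader(List<String> docs) { this.docs = docs.iterator(); }
    public boolean hasNext() { return docs.hasNext(); }
    public SimpleCas next() { return new SimpleCas(docs.next()); }
}
```

A statistical engine built with ClearTK would sit in the engines list alongside components like the tokenizer, reading their annotations from the CAS and adding its own.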

  4. Statistical Biomedical Natural Language Processing 101
     • Frame the NLP task as a classification task, e.g. for named entity recognition, classify each token as “B” (beginning of an entity), “I” (inside an entity), or “O” (outside any entity).
     • Training
        • Manually annotate a corpus
        • Extract features from text *
        • Write out training data *
        • Train a model
     • Run time
        • Extract features from unseen text *
        • Classify features with the trained model *
        • Create annotations *
     * ClearTK facilitates these tasks.
     Example: "The concentration of alpha 2-macroglobulin, alpha 1-antitrypsin, plasminogen, C3-complement, fibrinogen degradation products (FDP) and fibrinolytic activity..."
     Tags: O O O B I B I B B B I I O O O
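The B/I/O framing amounts to a reversible mapping between entity spans and per-token labels. A minimal Java sketch, assuming mentions are given as token offsets with an exclusive end; Span and Bio are hypothetical helpers, not ClearTK classes. During training, encode turns manual annotations into classification targets; at run time, decode turns predicted labels back into mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// An entity mention covering tokens [start, end), end exclusive. Hypothetical type.
record Span(int start, int end) {}

class Bio {
    // Encode entity spans as one B/I/O label per token.
    static List<String> encode(int numTokens, List<Span> spans) {
        String[] labels = new String[numTokens];
        Arrays.fill(labels, "O");
        for (Span s : spans) {
            labels[s.start()] = "B"; // first token of a mention
            for (int i = s.start() + 1; i < s.end(); i++) labels[i] = "I"; // remaining tokens
        }
        return Arrays.asList(labels);
    }

    // Decode labels back into spans; an "I" with no open mention starts a new one.
    static List<Span> decode(List<String> labels) {
        List<Span> spans = new ArrayList<>();
        int start = -1; // -1 means no mention is currently open
        for (int i = 0; i < labels.size(); i++) {
            String l = labels.get(i);
            if (l.equals("B") || (l.equals("I") && start < 0)) {
                if (start >= 0) spans.add(new Span(start, i)); // close the previous mention
                start = i;
            } else if (l.equals("O") && start >= 0) {
                spans.add(new Span(start, i));
                start = -1;
            }
        }
        if (start >= 0) spans.add(new Span(start, labels.size()));
        return spans;
    }
}
```

The separate B tag is what keeps two consecutive mentions distinct: "B B" decodes to two one-token mentions, while "B I" decodes to a single two-token mention.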

  5. ClearTK Analysis Engine
     [Diagram: UIMA CAS input annotations → find foci of analysis → extract features → feature set → write training data (training) or classify (run time) → interpret result / create annotations → UIMA CAS output annotations]
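The analysis engine diagram describes one component that runs in two modes: during training it pairs each focus's features with a gold label and writes them out, and at run time it sends the same features to a classifier. A minimal Java sketch of that split, with hypothetical names (Instance, Classifier, TokenLabelingEngine, WordLookupClassifier). Each token is treated as a focus of analysis, and the word-lookup classifier is a deliberately trivial stand-in for a trained statistical model, not something ClearTK provides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One training instance: the features of a focus plus its gold label.
record Instance(List<String> features, String label) {}

// Hypothetical classifier interface; a real system would wrap a learner such as LIBSVM.
interface Classifier { String classify(List<String> features); }

class TokenLabelingEngine {
    private final List<Instance> trainingData = new ArrayList<>(); // filled in training mode
    private final Classifier classifier;                           // null in training mode

    TokenLabelingEngine(Classifier classifier) { this.classifier = classifier; }

    // Features for one focus: the token itself and its left neighbour.
    static List<String> extractFeatures(List<String> tokens, int i) {
        List<String> f = new ArrayList<>();
        f.add("word=" + tokens.get(i).toLowerCase());
        f.add("prev=" + (i > 0 ? tokens.get(i - 1).toLowerCase() : "<s>"));
        return f;
    }

    // Training mode: record each focus's features with its gold label.
    void train(List<String> tokens, List<String> goldLabels) {
        for (int i = 0; i < tokens.size(); i++)
            trainingData.add(new Instance(extractFeatures(tokens, i), goldLabels.get(i)));
    }

    // Run-time mode: classify each focus; the caller turns labels into annotations.
    List<String> label(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) out.add(classifier.classify(extractFeatures(tokens, i)));
        return out;
    }

    List<Instance> trainingData() { return trainingData; }
}

// Toy "model": the last label seen for each word feature, defaulting to "O".
class WordLookupClassifier implements Classifier {
    private final Map<String, String> byWord = new HashMap<>();

    WordLookupClassifier(List<Instance> data) {
        for (Instance inst : data) byWord.put(inst.features().get(0), inst.label());
    }

    public String classify(List<String> features) {
        return byWord.getOrDefault(features.get(0), "O");
    }
}
```

Because both modes call the same extractFeatures method, the features seen at run time are guaranteed to match those written out for training.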

  6. ClearTK Summary
     • Provides a framework that simplifies feature extraction and interfacing with a wide variety of machine learning libraries.
     • Is not dependent on any specific type system.
     • Provides sophisticated feature extractors.
     • Provides infrastructure supporting the core library (e.g. collection readers, analysis engines, consumers).
     • Well documented and unit tested.

More Related