Open Health Natural Language Processing Consortium (OHNLP)
Mayo Clinic: Guergana Savova, Ph.D.; James Masanz (clinicalnlp@mayo.edu)
IBM Watson Research: Anni Coden, Ph.D.; Michael Tanenblatt (mednlp@us.ibm.com)
Overview • OHNLP? Oh, NLP? • Demo of a clinical OHNLP system (cTAKES) • Demo of a medical OHNLP system (MedKAT) with extensions to pathology (/P) • How can I adapt the system to my data? • Lively discussion: how can I get involved, OHNLP future steps…
Open Health Natural Language Processing Consortium • www.ohnlp.org (part of caBIG Vocabulary Knowledge Center web presence) • Goal • Foster an open-source collaborative community around clinical NLP that can deliver best-of-breed annotators, leverage the dynamic features of UIMA flow-control, and establish the infrastructure for clinical NLP. • Two open source releases as part of OHNLP • Mayo’s pipeline for processing clinical notes (cTAKES) • IBM’s pipeline for processing medical notes (MedKAT) and pathology reports (MedKAT/P)
Other non-OHNLP clinical NLP Systems • Proprietary • MedLEE (Columbia University) • Topaz (University of Pittsburgh) • Vanderbilt University • caTIES (University of Pittsburgh) • MPLUS/Onyx (University of Utah) • VA Hospital system • Open Source • i2b2 HITEx (Health Information Text Extraction)
Clinical example: clinical Text Analysis and Knowledge Extraction System (cTAKES) Presenters: Guergana Savova, James Masanz
Overview • cTAKES • Developed at Mayo Clinic • Goals: • Phenotype extraction • Generic – to be used for a variety of retrievals and use cases • Expandable – at the information model level and methods • Modular • Cutting edge technologies – best methods combining existing practices and novel research with rapid technology transfer • Best software practices (80M+ notes) • Commitment to both R and D in R&D
cTAKES: Components • Clinical narrative as a sublanguage • Core components • Sentence boundary detection (OpenNLP technology) • Tokenization (rule-based) • Morphologic normalization (NLM’s LVG) • POS tagging (OpenNLP technology) • Shallow parsing (OpenNLP technology) • Named Entity Recognition • Dictionary mapping (lookup algorithm) • Machine learning (MAWUI) • Negation and context identification (NegEx)
Output Example: Disorder Object • “No evidence of unstable angina.” • Disorder • Text: unstable angina • Associated code: SNOMED 4557003 • Named entity type: disease/disorder • Status: current • Negation: true
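The negation attribute in the example above can be illustrated with a minimal NegEx-style sketch. This is not the cTAKES implementation: the trigger list, the `is_negated` function, and the default window of five tokens are assumptions made for illustration, and the real NegEx lexicon is much larger and also handles scope terminators.

```python
import re

# Hypothetical, deliberately tiny trigger list (assumption, not the NegEx lexicon).
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def is_negated(sentence, entity, window=5):
    """Return True if a trigger occurs within `window` tokens before the entity."""
    tokens = tokenize(sentence)
    ent = tokenize(entity)
    for start in range(len(tokens) - len(ent) + 1):
        if tokens[start:start + len(ent)] == ent:
            # Only look at the tokens immediately to the left of the entity.
            left_text = " ".join(tokens[max(0, start - window):start])
            return any(
                re.search(r"\b" + re.escape(trigger) + r"\b", left_text)
                for trigger in NEGATION_TRIGGERS
            )
    return False


# Build a record shaped like the slide's Disorder object.
disorder = {
    "text": "unstable angina",
    "code": "SNOMED 4557003",
    "negation": is_negated("No evidence of unstable angina.", "unstable angina"),
}
```

The `window` parameter plays the role of a negation window: a trigger further than `window` tokens to the left of the entity is ignored.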
Methods • Preliminary results: • Savova, Guergana; Kipper-Schuler, Karin; Buntrock, James and Chute, Christopher. 2008. UIMA-based clinical information extraction system. LREC 2008: Towards enhanced interoperability for large HLT systems: UIMA for NLP. • Manuscript with detailed system description and evaluation under review (JAMIA)
Medical example: Medical Knowledge Analysis System (MedKAT and MedKAT/P) Presenters: Anni Coden, Michael Tanenblatt
Overview • MedKAT and MedKAT/P • Developed at IBM • Goal: • Identification of concepts and their attributes based on a standard or proprietary terminology/ontology • /P adaptation to pathology reports – relation extraction • Modular, Generic, Expandable • Terminology, Conceptual Model • Easy adaptation to specific corpus and conventions • Integration into institutional system • Ongoing commitment to Research and Development
Core Components • Document structure • Syntactic tools (tokenization ... shallow parsing) • Concept identification • Negation • Relationship extraction
Demos • Query by Model / Cancer • Detailed view of annotations in Document Analyzer • http://domino.research.ibm.com/comm/research_projects.nsf/pages/medicalinformatics.index.html
Adaptation Presenters: Anni Coden, Michael Tanenblatt
Adaptation • Sentence breaks • Text case • Part of speech tags • Shallow parser • Dictionary lookup • Document structure
Sentence Breaks • Some solutions: • Use annotator to re-break sentences • Retrain tagger
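As one illustration of the "re-break sentences" option, the sketch below re-joins sentences that an upstream detector split after an abbreviation. It assumes that over-splitting at abbreviations is the problem being fixed; the abbreviation set and the `rebreak` function are hypothetical, and a real annotator would work on UIMA sentence annotations rather than plain strings.

```python
# Hypothetical abbreviation list (assumption); a site would build its own.
ABBREVIATIONS = {"dr.", "pt.", "vs.", "approx."}


def rebreak(sentences):
    """Merge a sentence into the previous one if that one ends in an abbreviation."""
    merged = []
    for sent in sentences:
        if not sent.strip():
            continue  # drop empty fragments
        if merged and merged[-1].split()[-1].lower() in ABBREVIATIONS:
            merged[-1] = merged[-1] + " " + sent
        else:
            merged.append(sent)
    return merged


fixed = rebreak(["Seen by Dr.", "Smith today.", "No complaints."])
```

Post-processing like this leaves the sentence detector untouched, which is the main trade-off against retraining it on local data.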
Case/Part of Speech Tags • Some solutions: • Retrain tagger • Use UIMA annotator to create a “true case” view
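A "true case" view can be pictured as a second copy of the text with conventional capitalization, on which a tagger trained on mixed-case text may behave better. The sketch below assumes the input is an all-caps note; the acronym set and the `true_case` function are hypothetical. It keeps the string length unchanged so that character offsets found on the view still point at the same spans in the original text.

```python
import re

# Hypothetical acronyms that should stay upper case (assumption).
ACRONYMS = {"MRI", "CT", "ECG"}


def true_case(text):
    """Lowercase all-caps words, keep known acronyms, capitalize sentence starts."""

    def fix_word(match):
        word = match.group(0)
        if word in ACRONYMS or not word.isupper():
            return word
        return word.lower()

    lowered = re.sub(r"[A-Za-z]+", fix_word, text)
    # Capitalize the first letter of the text and of each following sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )


original = "PATIENT HAD MRI OF THE KNEE. NO FRACTURE SEEN."
view = true_case(original)
```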
Part of Speech Tags • Some solutions: • Retrain tagger • Use dictionary lookup to modify incorrect tags • Create rule-based annotator to modify incorrect tags
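The dictionary-lookup option can be sketched as a post-processing step over the tagger's output. The override table and the `correct_tags` function below are hypothetical; the tags follow the Penn Treebank convention, and the choice of which words to override is an assumption that would come from error analysis on local data.

```python
# Hypothetical overrides for words a general-domain tagger tends to mis-tag (assumption).
TAG_OVERRIDES = {"discharge": "NN", "left": "JJ"}


def correct_tags(tagged_tokens, overrides=TAG_OVERRIDES):
    """Replace the tag of any token whose lowercase form is in the override table."""
    return [(token, overrides.get(token.lower(), tag)) for token, tag in tagged_tokens]


tagged = [("Left", "VBD"), ("knee", "NN"), ("pain", "NN")]
corrected = correct_tags(tagged)
```

A context-free override like this is blunt: it forces the same tag for every occurrence of a word, so it suits words that are almost always used one way in the target corpus.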
Dictionary Lookup • Dictionary entries can be added, changed, deleted • Dictionary entry attributes can be added, changed, deleted • Search parameters can be modified • Post processing filters • Tokenization of text and dictionary should be the same
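The last point, using the same tokenization for text and dictionary, can be made concrete with a small longest-match lookup. This is a sketch under stated assumptions, not the MedKAT or cTAKES lookup algorithm: `tokenize`, `build_index`, `lookup`, and the `max_span` search parameter are hypothetical names, and the only dictionary entry is the SNOMED code from the Disorder example.

```python
import re


def tokenize(text):
    """Single tokenizer shared by the dictionary and the document text."""
    return tuple(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(entries):
    """Key each dictionary term by its token tuple."""
    return {tokenize(term): code for term, code in entries.items()}


def lookup(text, index, max_span=4):
    """Greedy longest-match lookup over spans of at most `max_span` tokens."""
    tokens = tokenize(text)
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_span, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            if span in index:
                hits.append((" ".join(span), index[span]))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return hits


# The term is written with a hyphen on purpose: the shared tokenizer
# normalizes it the same way as the document text.
index = build_index({"Unstable-Angina": "4557003"})
hits = lookup("No evidence of unstable angina.", index)
```

If the dictionary were tokenized differently from the text (for example, keeping the hyphen), the entry would never match, which is why the slide asks for identical tokenization on both sides.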
Document Structure • Plain text or XML (e.g., CDA) • Processes specific document section types (e.g., diagnosis) • Detection of formatting (e.g., bullets) • Detection of relations between sections • Making implicit conventions explicit (e.g., meaning of title)
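For plain-text input, section and bullet detection might look like the sketch below. It assumes a hypothetical convention in which a section starts with a heading on its own line ending in a colon and list items start with "- "; real notes vary by institution, and XML input such as CDA would be parsed structurally instead.

```python
import re

# Assumed heading convention: a capitalized phrase alone on a line, ending in ":".
SECTION_HEADER = re.compile(r"^([A-Z][A-Za-z ]+):[ \t]*$", re.MULTILINE)


def split_sections(text):
    """Map each lowercased section heading to the text that follows it."""
    sections = {}
    matches = list(SECTION_HEADER.finditer(text))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).strip().lower()] = text[match.end():end].strip()
    return sections


def bullet_items(section_text):
    """Return the items of lines formatted as '- item'."""
    return [line[2:].strip() for line in section_text.splitlines() if line.startswith("- ")]


note = "HISTORY:\nChest pain for two days.\nDIAGNOSIS:\n- Unstable angina\n- Hypertension\n"
sections = split_sections(note)
diagnoses = bullet_items(sections["diagnosis"])
```

Once sections are explicit, downstream annotators can be restricted to specific section types, such as running concept identification only on the diagnosis section.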
Discussion: Future of OHNLP.ORG • Provided seed annotators and tools • Goal: growing community • Annotators, tools • Methodologies • Gold standards • Common type system for plug-and-play • What are the hurdles?
MedKAT • Dictionary adaptation • Concept identification parameters • Document structure detection
cTAKES • Negation window • Lookup window • Dictionary modifications