
Recent Efforts in Clinical NLP: Clinical Text Analysis and Knowledge Extraction System (cTAKES)

Guergana K. Savova, PhD, Children’s Hospital Boston and Harvard Medical School


Presentation Transcript


  1. Recent Efforts in Clinical NLP: Clinical Text Analysis and Knowledge Extraction System (cTAKES) • Guergana K. Savova, PhD • Children’s Hospital Boston and Harvard Medical School

  2. Acknowledgements • Software developers and contributors at different times (in no specific order): James Masanz, Mayo Clinic; Patrick Duffy, Mayo Clinic; Philip Ogren, University of Colorado; Sean Murphy, Mayo Clinic; Vinod Kaggal, Mayo Clinic; Jiaping Zheng, Children’s Hospital Boston; Pei Chen, Children’s Hospital Boston; Jinho Choi, University of Colorado • Investigators (in no specific order): Christopher Chute, MD, DrPH, Mayo Clinic; James Buntrock, MS, Mayo Clinic; Guergana Savova, PhD, Children’s Hospital Boston

  3. Overview • Background • Clinical Text Analysis and Knowledge Extraction System (cTAKES) • cTAKES for developers • Download and install of cTAKES • How to build the dictionary • cTAKES: graphical user interface

  4. Definitions • Information Extraction (IE): extracting existing facts from unstructured or loosely structured text into a structured form • Information Retrieval (IR): finding documents relevant to a user query • Named Entity Recognition (NER): discovery of groups of textual mentions that belong to a certain semantic class • Natural Language Processing (NLP): computational methods for text processing based on linguistically sound principles • Clinical NLP: NLP for the clinical narrative • Biomedical NLP: NLP for the clinical narrative and biomedical literature

  5. Problem Space • Structured information: relational databases; easy to extract information from them • Semi-structured information: loosely formatted XML, CSV tables; not challenging to extract information • Unstructured information: scholarly literature, clinical notes, research reports, webpages; a real challenge to extract the information • The majority of information is unstructured!

  6. Overarching Goal • Open-source, general-purpose clinical NLP toolkit • Phenotype extraction from unstructured data • Library of modules • Cohesive with other initiatives • Cutting edge methodologies • Best software development practices • Our principles • Open source • Scalable and robust • Modular and expandable • Based on existing standards and conventions • Scalable, adaptable methodologies through open collaboration in the open-source development

  7. Processing Clinical Notes • A 43-year-old woman was diagnosed with type 2 diabetes mellitus by her family physician 3 months before this presentation. Her initial blood glucose was 340 mg/dL. Glyburide 2.5 mg once daily was prescribed. Since then, self-monitoring of blood glucose (SMBG) showed blood glucose levels of 250-270 mg/dL. She was referred to an endocrinologist for further evaluation. On examination, she was normotensive and not acutely ill. Her body mass index (BMI) was 18.7 kg/m2 following a recent 10 lb weight loss. Her thyroid was symmetrically enlarged and ankle reflexes absent. Her blood glucose was 272 mg/dL, and her hemoglobin A1c (HbA1c) was 10.3%. A lipid profile showed a total cholesterol of 261 mg/dL, triglyceride level of 321 mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid function was normal. Urinalysis showed trace ketones. She adhered to a regular exercise program and vitamin regimen, smoked 2 packs of cigarettes daily for the past 25 years, and limited her alcohol intake to 1 drink daily. Her mother's brother was diabetic.

  8. Clinical Element Model (http://intermountainhealthcare.org/cem/Pages/home.aspx) • CEM instances for the clinical note on slide 7: • Disorder CEM: text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated • Medication CEM: text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg • Tobacco Use CEM: text: smoking; code: 365981007; subject: patient; relative temporal context: 25 years; negation indicator: not negated • Disorder CEM: text: diabetes mellitus; code: 73211009; subject: family member; relative temporal context: (blank); negation indicator: not negated
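
The CEM instances on this slide can be read as typed records with a fixed set of attributes. Below is a minimal Python sketch under that reading. The class name `ClinicalElement` and its field names are hypothetical; they simply mirror the labels on the slide and are not taken from the CEM specification or from cTAKES.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class ClinicalElement:
    """Hypothetical record mirroring the attribute labels shown on the slide."""
    cem_type: str                                    # "Disorder", "Medication", "Tobacco Use"
    text: str
    code: str
    subject: str
    negated: bool = False
    relative_temporal_context: Optional[str] = None
    frequency: Optional[str] = None                  # shown for Medication only
    strength: Optional[str] = None                   # shown for Medication only


# The four instances listed on the slide.
elements = [
    ClinicalElement("Disorder", "diabetes mellitus", "73211009", "patient",
                    relative_temporal_context="3 months ago"),
    ClinicalElement("Medication", "Glyburide", "315989", "patient",
                    frequency="once daily", strength="2.5 mg"),
    ClinicalElement("Tobacco Use", "smoking", "365981007", "patient",
                    relative_temporal_context="25 years"),
    ClinicalElement("Disorder", "diabetes mellitus", "73211009", "family member"),
]

# Once the note is structured, "which disorders does the patient have?" is a filter.
patient_disorders = [e.text for e in elements
                     if e.cem_type == "Disorder"
                     and e.subject == "patient"
                     and not e.negated]
```

Keeping `subject` as its own field is what separates the patient's diabetes from the family member's, even though both mentions carry the same text and code.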

  9. Comparative Effectiveness • Compare the effectiveness of different treatment strategies (e.g., modifying target levels for glucose, lipid, or blood pressure) in reducing cardiovascular complications in newly diagnosed adolescents and adults with type 2 diabetes. • Compare the effectiveness of traditional behavioral interventions versus economic incentives in motivating behavior changes (e.g., weight loss, smoking cessation, avoiding alcohol and substance abuse) in children and adults. • Disorder CEM: text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated • Medication CEM: text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg • Tobacco Use CEM: text: smoking; code: 365981007; subject: patient; relative temporal context: 25 years; negation indicator: not negated • Disorder CEM: text: diabetes mellitus; code: 73211009; subject: family member; relative temporal context: (blank); negation indicator: not negated

  10. Meaningful Use • Maintain problem list • Maintain active med list • Record smoking status • Provide clinical summaries for each office visit • Generate patient lists for specific conditions • Submit syndromic surveillance data • Disorder CEM: text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated • Medication CEM: text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg • Tobacco Use CEM: text: smoking; code: 365981007; subject: patient; relative temporal context: 25 years; negation indicator: not negated • Disorder CEM: text: diabetes mellitus; code: 73211009; subject: family member; relative temporal context: (blank); negation indicator: not negated

  11. Clinical Practice • Provide problem list and meds from the visit • Disorder CEM: text: diabetes mellitus; code: 73211009; subject: patient; relative temporal context: 3 months ago; negation indicator: not negated • Medication CEM: text: Glyburide; code: 315989; subject: patient; frequency: once daily; negation indicator: not negated; strength: 2.5 mg

  12. Applications • Meaningful use of the EMR • Comparative effectiveness • Clinical investigation • Patient cohort identification • Phenotype extraction • Epidemiology • Clinical practice • and many more…. • With deep semantic processing, the sky is the limit for applications

  13. Partnerships • NCBC-funded initiatives • Integrating Data for Analysis, Anonymization and Sharing (iDASH) • Ontology Development and Information Extraction (ODIE) • Veterans Administration • Strategic Health Advanced Research Projects (SHARP) • SHARP 3: SMaRT app (http://www.smartplatforms.org/) • SHARP 4: www.sharpn.org • R01s • Shared annotated lexical resource • Temporal relation discovery for the clinical domain • Multi-source integrated platform for answering clinical questions • eMERGE, PGRN (Pharmacogenomics Research Network) • Linguistic Data Consortium and Penn Treebank • MITRE Corporation

  14. Integrating cTAKES within i2b2 • i2b2: "…a scalable informatics framework that will enable clinical researchers to use existing clinical data for discovery research and, when combined with IRB-approved genomic data, facilitate the design of targeted therapies for individual patients with diseases having genetic origins." • Querying encrypted clinical notes stored in the i2b2 database • Processing the result notes through cTAKES • Persisting extracted concepts into the i2b2 database • Thus, the concepts are now searchable by the researcher • Enabling the training and running of classifiers directly from the i2b2 workbench • https://www.i2b2.org/events/slides/i2b2_AMIA_Tutorial_20100310.pdf
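
The query, process, persist, search loop on this slide can be sketched with an in-memory SQLite database. This is only an illustration under loud assumptions: the `notes` and `concepts` tables are hypothetical and do not reflect the i2b2 schema, decryption is omitted, and `extract_concepts` is a toy stand-in for a cTAKES call.

```python
import sqlite3

# Toy stand-in for cTAKES output: term -> code (codes taken from the slides).
TERM_CODES = {"diabetes mellitus": "73211009", "glyburide": "315989"}


def extract_concepts(note_text):
    """Return (term, code) pairs whose term occurs in the note (toy lookup)."""
    lowered = note_text.lower()
    return [(term, code) for term, code in sorted(TERM_CODES.items())
            if term in lowered]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (note_id INTEGER PRIMARY KEY, patient_id INTEGER, body TEXT)")
conn.execute("CREATE TABLE concepts (note_id INTEGER, patient_id INTEGER, term TEXT, code TEXT)")
conn.executemany("INSERT INTO notes VALUES (?, ?, ?)", [
    (1, 100, "Diagnosed with type 2 diabetes mellitus. Glyburide 2.5 mg once daily."),
    (2, 200, "Thyroid function was normal."),
])

# 1. Query the notes, 2. process each one, 3. persist the extracted concepts.
notes = conn.execute("SELECT note_id, patient_id, body FROM notes").fetchall()
for note_id, patient_id, body in notes:
    rows = [(note_id, patient_id, term, code)
            for term, code in extract_concepts(body)]
    conn.executemany("INSERT INTO concepts VALUES (?, ?, ?, ?)", rows)
conn.commit()

# 4. The concepts are now searchable, e.g. to identify a cohort by code.
cohort = [row[0] for row in conn.execute(
    "SELECT DISTINCT patient_id FROM concepts WHERE code = ?", ("73211009",))]
concept_count = conn.execute("SELECT COUNT(*) FROM concepts").fetchone()[0]
```

The point of persisting concepts rather than re-running extraction per query is that cohort searches become ordinary indexed lookups.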

  15. clinical Text Analysis and Knowledge Extraction System (cTAKES)

  16. cTAKES Adoption • May, 2011: • 2306 downloads* • eMERGE (SGH, NW) • PGRN (HMS, NW) • Extensions: Yale (YATEX), MITRE * Source: http://sourceforge.net/project/stats/?group_id=255545&ugn=ohnlp&type=&mode=alltime

  17. cTAKES Technical Details • Open source • Apache v2.0 license • http://sourceforge.net/projects/ohnlp/ • Java 1.5 • Dependency on UMLS which requires a UMLS license (free) • Framework • IBM’s Unstructured Information Management Architecture (UIMA) open source framework, Apache project • Methods • Natural Language Processing methods (NLP) • Based on standards and conventions to foster interoperability • Application • High-throughput system

  18. cTAKES: Components • Sentence boundary detection (OpenNLP technology) • Tokenization (rule-based) • Morphologic normalization (NLM’s LVG) • POS tagging (OpenNLP technology) • Shallow parsing (OpenNLP technology) • Named Entity Recognition • Dictionary mapping (lookup algorithm) • Machine learning (MAWUI) • types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications • Negation and context identification (NegEx) • Dependency parser • Drug Profile module • Smoking status classifier • CEM normalization module (soon to be released)
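
To make the first stages of the component list concrete, here is a toy Python sketch of sentence boundary detection, rule-based tokenization, and dictionary-lookup NER. It is not cTAKES code: cTAKES uses OpenNLP models and a UMLS-backed dictionary, whereas the regular expressions, the two-entry `DICTIONARY`, and all function names below are assumptions made for illustration.

```python
import re

# Toy dictionary: normalized token tuples -> (semantic type, code).
DICTIONARY = {
    ("diabetes", "mellitus"): ("disease/disorder", "73211009"),
    ("glyburide",): ("medication", "315989"),
}
MAX_LEN = max(len(key) for key in DICTIONARY)


def split_sentences(text):
    """Naive boundary detection: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Rule-based tokenization: alphanumeric runs, allowing inner '.' or '-'."""
    return re.findall(r"[A-Za-z0-9]+(?:[.\-][A-Za-z0-9]+)*", sentence)


def lookup(tokens):
    """Greedy longest-match dictionary lookup over lowercased tokens."""
    norm = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(norm):
        for n in range(min(MAX_LEN, len(norm) - i), 0, -1):
            entry = DICTIONARY.get(tuple(norm[i:i + n]))
            if entry:
                found.append((" ".join(tokens[i:i + n]), *entry))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return found


text = ("A 43-year-old woman was diagnosed with type 2 diabetes mellitus. "
        "Glyburide 2.5 mg once daily was prescribed.")
sentences = split_sentences(text)
mentions = [m for s in sentences for m in lookup(tokenize(s))]
```

Trying the longest window first means a multi-word term such as "diabetes mellitus" is preferred over any shorter entry that starts at the same token.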

  19. Output Example: Drug Object • “Tamoxifen 20 mg po daily started on March 1, 2005.” • Drug • Text: Tamoxifen • Associated code: C0351245 • Strength: 20 mg • Start date: March 1, 2005 • End date: null • Dosage: 1.0 • Frequency: 1.0 • Frequency unit: daily • Duration: null • Route: Enteral Oral • Form: null • Status: current • Change Status: no change • Certainty: null
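
As a rough illustration of how a sentence like the one above maps onto drug attributes, the following Python sketch parses it with a single regular expression. The pattern, the `ROUTES` and `FREQUENCIES` tables, and the function `parse_drug_mention` are all hypothetical and handle only this sentence shape; the actual Drug Profile module is considerably more general.

```python
import re
from datetime import datetime

ROUTES = {"po": "Enteral Oral"}          # abbreviation -> route label from the slide
FREQUENCIES = {"daily": (1.0, "daily")}  # word -> (frequency, frequency unit)

PATTERN = re.compile(
    r"(?P<drug>[A-Z][a-z]+)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s*mg)\s+"
    r"(?P<route>po)\s+"
    r"(?P<freq>daily)"
    r"(?:\s+started on\s+(?P<start>[A-Z][a-z]+ \d{1,2}, \d{4}))?"
)


def parse_drug_mention(sentence):
    """Return a dict of drug attributes, or None if the toy pattern does not match."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    frequency, unit = FREQUENCIES[m.group("freq")]
    start = m.group("start")
    return {
        "text": m.group("drug"),
        "strength": m.group("strength"),
        "route": ROUTES[m.group("route")],
        "frequency": frequency,
        "frequency_unit": unit,
        # Month names are parsed with the default (English) locale.
        "start_date": datetime.strptime(start, "%B %d, %Y").date() if start else None,
        "end_date": None,
    }


drug = parse_drug_mention("Tamoxifen 20 mg po daily started on March 1, 2005.")
```

Attributes that the sentence does not mention, such as the end date, stay `None`, which matches the `null` values in the slide's output.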

  20. Output Example: Disorder Object • “No evidence of cholangiocarcinoma.” • Disorder • Text: cholangiocarcinoma • Associated code: SNOMED 70179006 • Certainty: 1 • Context: current • Relatedness to patient: true • Status: negated
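
The "negated" status in this example comes from negation detection in the style of NegEx, which looks for trigger phrases near a mention. A simplified Python sketch of that idea follows. The short trigger list, the five-token window, and the "affirmed" label are assumptions for illustration; NegEx itself uses a much larger trigger set plus scope-terminating terms.

```python
import re

# A few pre-negation triggers in the spirit of NegEx (illustrative subset).
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without"]
WINDOW = 5  # how many tokens before the mention are searched for a trigger


def _tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence, mention):
    """True if a trigger phrase occurs within WINDOW tokens before the mention."""
    tokens, target = _tokens(sentence), _tokens(mention)
    # Locate the first occurrence of the mention as a token subsequence.
    starts = [i for i in range(len(tokens) - len(target) + 1)
              if tokens[i:i + len(target)] == target]
    if not starts:
        return False
    start = starts[0]
    scope = tokens[max(0, start - WINDOW):start]
    for trigger in NEGATION_TRIGGERS:
        trig = trigger.split()
        for i in range(len(scope) - len(trig) + 1):
            if scope[i:i + len(trig)] == trig:
                return True
    return False


sentence = "No evidence of cholangiocarcinoma."
disorder = {
    "text": "cholangiocarcinoma",
    "code": "SNOMED 70179006",
    "status": "negated" if is_negated(sentence, "cholangiocarcinoma") else "affirmed",
}
```

Limiting the search to a fixed window keeps a trigger early in a long sentence from negating an unrelated mention later on.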

  21. cTAKES for developers • Download and install of cTAKES • Building the dictionary • Jiaping Zheng, Children’s Hospital Boston

  22. Introduction See separate pdf for the slides

  23. Graphical User Interface (GUI) to cTAKES: a Prototype • Pei J. Chen, Children’s Hospital Boston

  24. cTAKES as a Service • Objectives • Demo cTAKES prototype web application • Empower end users to leverage cTAKES • Gather feedback for future cTAKES GUI • Potential system integrations with other applications (e.g., i2b2, ARC, Web Annotator) • Developed within i2b2 to integrate cTAKES in the i2b2 NLP cell

  25. Live Demo cTAKES Web Application: a Prototype http://chipweb2.chip.org/cTakes_webservice_trunk/index.html

  26. Single clinical note

  27. Technologies • Middleware Web Services: Java, Apache CXF, JSON • Front-End Web GUI: ExtJS, JavaScript • Back-End cTAKES: Java, UIMA

  28. Deployment Considerations • Deployment Model • Security • Performance • Licensing (UMLS, Apache, GPL v.3)

  29. Thoughts?
