
Using ontologies to make sense of unstructured medical data

Using ontologies to make sense of unstructured medical data. Nigam Shah, MBBS, PhD nigam@stanford.edu. NCBO: Key activities. We create and maintain a library of biomedical ontologies. We build tools and Web services to enable the use of ontologies and their derivatives.


Presentation Transcript


  1. Using ontologies to make sense of unstructured medical data • Nigam Shah, MBBS, PhD • nigam@stanford.edu

  2. NCBO: Key activities • We create and maintain a library of biomedical ontologies. • We build tools and Web services to enable the use of ontologies and their derivatives. • We collaborate with scientific communities that develop and use ontologies.

  3. Ontology Services: Download • Traverse • Search • Comment; Views; Mapping Services: Create • Download • Upload; Widgets: Tree-view • Auto-complete • Graph-view; Annotation: Term recognition; Data Access: Fetch “data” annotated with a given term. http://rest.bioontology.org http://bioportal.bioontology.org

  4. Annotation service • Process textual metadata to automatically tag text with as many ontology terms as possible. • 90 million calls, ~700 GB of data

  5. Resource index • Won 1st prize at the 2010 Semantic Web Challenge @ ISWC • PubMed Abstracts • Adverse Events (AERS) • GEO • … • Clinical Trials • Drug Bank

  6. Creating Lexicons • Sentences in clinical notes (1 … m) • Frequency counter • Terms (1 … n), each with syntactic types and frequency
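
The frequency-counting step on this slide could look roughly like the sketch below. It is only an illustration, assuming a lexicon of candidate terms and case-insensitive whole-word matching; the sentences, the lexicon, and the function name `count_term_frequencies` are hypothetical and do not come from the talk.

```python
import re
from collections import Counter


def count_term_frequencies(sentences, lexicon):
    """Count how many sentences mention each lexicon term.

    Matching is case-insensitive and anchored on word boundaries
    (a simplifying assumption, not the method used in the talk).
    """
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        for term in lexicon:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            if re.search(pattern, text):
                counts[term] += 1
    return counts


# Hypothetical clinical-note sentences and lexicon terms.
sentences = [
    "Patient reports abdominal pain and nausea.",
    "No abdominal pain today.",
    "Nausea resolved after treatment.",
]
lexicon = ["abdominal pain", "nausea", "fever"]

freq = count_term_frequencies(sentences, lexicon)
for term, n in freq.most_common():
    print(term, n)
```

Terms that never occur (here, "fever") get a count of zero, which is the kind of signal a lexicon-cleaning step could use to prune entries.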

  7. Annotation Analytics • Analyzing tagged data for hypothesis generation in bioinformatics

  8. Generic GO-based analysis routine • Get annotations for each gene in a study set. • Count the occurrences of each annotation term in the study set. • Count the occurrences of that term in some reference set (whole genome?). • Compute a p-value for how surprising the overlap is. (Diagram labels: Genome, Study set, Reference set)
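
The routine on this slide can be sketched in a few lines. The slide does not say which test produces the p-value; the sketch below assumes a one-sided hypergeometric test, which is one common choice. The gene names, term names, and function names are hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when drawing n genes without replacement from N,
    of which K carry the annotation term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(study_set, reference_set, annotations):
    """Return {term: p-value} for each term annotating a study gene.

    Assumes study_set is a subset of reference_set and that
    annotations maps each gene to a set of terms.
    """
    ref_counts = Counter(t for g in reference_set for t in annotations.get(g, ()))
    study_counts = Counter(t for g in study_set for t in annotations.get(g, ()))
    N, n = len(reference_set), len(study_set)
    return {t: hypergeom_tail(N, ref_counts[t], n, k) for t, k in study_counts.items()}


# Hypothetical toy data: ten reference genes, four of them in the study set.
reference = {f"g{i}" for i in range(1, 11)}
study = {"g1", "g2", "g4", "g5"}
annotations = {
    "g1": {"termA"},
    "g2": {"termA"},
    "g3": {"termA"},
    "g4": {"termB"},
}

pvalues = enrichment(study, reference, annotations)
for term in sorted(pvalues):
    print(term, round(pvalues[term], 4))
```

A real analysis would also correct for testing many terms at once; that step is left out here to keep the sketch short.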

  9. Annotation Analytics Landscape • SNOMED-CT • ? • NCIT • ICD-9 • MeSH • … • Drugs, Chemicals • Cell Type • Human Disease • Gene Ontology • Health Indicator Warehouse datasets • Drug Sets • Grant Sets • Patient Sets • Gene Sets • Paper Sets

  10. Open questions • Can we use something other than the GO? • Lack of annotations: even today, roughly 20% of genes lack any GO annotation. • Annotation bias: annotations with certain ontology terms are not independent of one another. • Lack of a systematic mechanism to define a level of abstraction.

  11. Profiling a set of aging genes • 261 age-related genes • Genome • Disease Ontology • ~30% of genome

  12. Using ontologies other than GO • GO: ERCC6: nucleoplasm; PARP1: protein N-terminus binding • Disease: ERCC6: <disease term?>; PARP1: <disease term?>

  13. Enrichment Analysis with the DO • GO annotations (http://www.geneontology.org/GO.downloads.annotations.shtml): ERCC6 GO:0005654 PMID:16107709; ERCC6 GO:0008094 PMID:16107709; PARP1 GO:0047485 PMID:16107709; ERCC6 GO:0005730 PMID:16107709; PARP1 GO:0003950 PMID:16107709 • PMID:16107709 (www.ncbi.nlm.nih.gov/pubmed/16107709): {ERCC6, PARP1} • NCBO Annotator (http://bioportal.bioontology.org): {ERCC6, PARP1} → {Cockayne syndrome, DNA damage}
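
One reading of this slide is that disease terms recognized in a paper are carried over to the genes whose GO annotations cite that paper. The sketch below illustrates that join using the rows shown on the slide. It assumes the annotator output is already available as a dictionary rather than calling the NCBO Annotator, and the function name `genes_to_disease_terms` is hypothetical.

```python
from collections import defaultdict

# (gene, GO term, PMID) rows as listed on the slide.
go_rows = [
    ("ERCC6", "GO:0005654", "PMID:16107709"),
    ("ERCC6", "GO:0008094", "PMID:16107709"),
    ("PARP1", "GO:0047485", "PMID:16107709"),
    ("ERCC6", "GO:0005730", "PMID:16107709"),
    ("PARP1", "GO:0003950", "PMID:16107709"),
]

# Disease terms recognized in the cited paper, as listed on the slide.
# In practice these would come from an annotation service (assumption).
paper_disease_terms = {
    "PMID:16107709": {"Cockayne syndrome", "DNA damage"},
}


def genes_to_disease_terms(rows, paper_terms):
    """Carry each paper's disease terms over to the genes that cite it."""
    genes_by_paper = defaultdict(set)
    for gene, _go_term, pmid in rows:
        genes_by_paper[pmid].add(gene)

    gene_terms = defaultdict(set)
    for pmid, genes in genes_by_paper.items():
        for gene in genes:
            gene_terms[gene] |= paper_terms.get(pmid, set())
    return dict(gene_terms)


gene_disease = genes_to_disease_terms(go_rows, paper_disease_terms)
for gene in sorted(gene_disease):
    print(gene, sorted(gene_disease[gene]))
```

The resulting gene-to-disease-term map has the same shape as a GO annotation table, so it could feed the same kind of enrichment routine.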

  14. Annotation Analytics on EMR data • Analysis of tagged data from electronic health records

  15. Profiling patient sets • ICD9 789.00 (Abdominal pain, unspecified site) • 86k patient reports • Patient records processed from the U. Pittsburgh NLP Repository with IRB approval.

  16. Annotation (Clinical Text)

  17. Generation of tagged data • Annotation workflow: text of clinical note • term recognition tool (NCBO Annotator) • terms recognized • negation detection (NegEx rules and patterns) • terms form a temporal series of tags • cohort of interest • further analysis • Creating clean lexicons from BioPortal (knowledge graph): terms 1 … n with frequency and syntactic types; diseases, procedures, drugs

  18. Detecting the Vioxx Risk Signal • RA patients (14,079) • Vioxx patients (1,560) • MI patients (1,827) • Vioxx and MI (339) • ROR = 2.058, CI = [1.804, 2.349]; χ² p-value < 10^-7 • ROR = 1.524, CI = [0.872, 2.666]; χ² p-value = 0.06816 • p-value < 1.3 × 10^-24
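
The first reporting odds ratio (ROR) on this slide can be checked from the patient counts. The sketch below assumes that the 14,079 RA patients are the background population, that 339 is the number of patients with both Vioxx and MI, and that the interval is a 95% confidence interval computed on the log scale (z = 1.96). None of these assumptions is stated on the slide, and the function name is hypothetical.

```python
from math import exp, log, sqrt


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR and confidence interval from a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    z = 1.96 gives a 95% interval (assumed level).
    """
    ror = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(ROR)
    lower = exp(log(ror) - z * se)
    upper = exp(log(ror) + z * se)
    return ror, lower, upper


# Counts from the slide; the 2x2 layout is an assumed reading of them.
ra_patients, vioxx, mi, vioxx_and_mi = 14079, 1560, 1827, 339

a = vioxx_and_mi              # Vioxx and MI
b = vioxx - vioxx_and_mi      # Vioxx, no MI
c = mi - vioxx_and_mi         # MI, no Vioxx
d = ra_patients - a - b - c   # neither

ror, lower, upper = reporting_odds_ratio(a, b, c, d)
print(f"ROR = {ror:.3f}, CI = [{lower:.3f}, {upper:.3f}]")
```

Under this reading the computed values agree, to within rounding, with the ROR of 2.058 and CI of [1.804, 2.349] given on the slide. The slide does not give the counts behind the second ROR (1.524), so it is not reproduced here.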

  19. Detecting Adverse Events

  20. Detecting Adverse Events

  21. Detecting Adverse Events

  22. Detecting Off-label use

  23. Annotation Analytics Landscape • What questions can we ask? • SNOMED-CT • NCIT • ICD-9 • MeSH • … • Drugs, Chemicals • Cell Type • Aging • EMRs • Human Disease • Gene Ontology • Health Indicator Warehouse datasets • Drug Sets • Gene Sets • Paper Sets • Grant Sets • Patient Sets

  24. Associations and outcomes Enrichment What questions can we ask? Off-label Indications Side effects

  25. Acknowledgements • Paea LePendu • Yi Liu • Srinivasan Iyer • Steve Racunas • Anna Bauer-Mehren • Clement Jonquet • Rong Xu • Mark Musen • NIH – NCBO funding • Mayo Team • Hongfang Liu • Stephen Wu • Sylvia Holland • Alex Skrenchuk

  26. Mining Annotations of Grants, Publications • Publications from Medline • Only “Journal articles” • Grants from 1972 to 2007 • 30 funding agencies

  27. Sponsorship and Allocation

  28. Who funds what
