1 / 43

Olivier Bodenreider , M.D., Ph.D. Thomas C. Rindflesch , Ph.D.

Olivier Bodenreider , M.D., Ph.D. Thomas C. Rindflesch , Ph.D. NCICB Operations meeting April 13, 2007. Advanced Library Services Developing a Biomedical Knowledge Repository to Support Advanced Information Management Applications. Context.

Download Presentation

Olivier Bodenreider , M.D., Ph.D. Thomas C. Rindflesch , Ph.D.

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Olivier Bodenreider, M.D., Ph.D.Thomas C. Rindflesch, Ph.D. NCICB Operations meeting April 13, 2007 Advanced Library Services Developing a Biomedical Knowledge Repositoryto Support Advanced Information Management Applications

  2. Context • Provide biomedical informationto health care professionals and consumers • Exploit NLM resources • Maintain NLM’s cutting edge • Proposal overview • Advanced Library Services • Biomedical Knowledge Repository • Pilot projects

  3. Why additional services? • Biomedical information is growing at an increasingly faster pace • High-throughput approach to knowledge processing • Information retrieval is the starting point, not the end of the journey for the researcher • Towards “computable” knowledge • Integration between literature and other resources is insufficient • Adequate for navigation purposes • Insufficient for knowledge processing

  4. What additional services? • Refined information retrieval • Indexing on relations in addition to concepts • Find articles asserting that IL-13 inhibits COX-2 • Multi-document summarization • Extract and visualize facts from the literature • Summarize the top 300 papers on panic disorder • Question answering • Clinical and biological questions • What drugs interact with imipramine? • Knowledge discovery • Reasoning with facts from heterogeneous resources • From MEDLINE and UMLS together

  5. Normalized and integrated knowledge • Normalized knowledge • Common format • Common identification mechanism • Integrated knowledge • Single repository • Seamless environment • Phenotype and genotype information together Biomedical Knowledge Repository

  6. Sources of knowledge • Biomedical literature • Predications extracted from MEDLINE abstracts and full-text publicly available articles using text mining techniques • Other corpora (e.g., ClinicalTrials.gov) • Terminological knowledge • UMLS • Structured knowledge bases • NCBI resources (e.g., Entrez Gene) • Functional annotations from model organism databases • … • Contributed knowledge • The repository is open to collaborators outside NLM

  7. Facts Assertions Relations Semantic predications RDF triples relationship concept1 concept2 treats Imipramine Panic Disorder has_associated_disease APP Alzheimer disease Formalism Triples

  8. Annotated knowledge • Provenance information • Source (e.g., PMID) • Extraction mechanism • Timestamp • Frequency information • Redundancy • Collaborative annotation • “Was this information useful?” • Context of use/usefulness

  9. Semantic Web perspective • Common format for knowledge • Resource Description Format (RDF) • Common identification scheme • Unified Resource Identifier (URI) • Standard tools • RDF browsers • RDF “reasoners” • High level of interest for biomedicine in the SW community • Health Care and Life Sciences Interest Group

  10. Sourceselection (PubMed, annotations) Biomedical Literature MEDLINE CT.gov DocumentSummarization UMLS Terminological Knowledge QuestionAnswering Entrez Gene KnowledgeDiscovery StructuredKnowl. Bases GO InformationRetrieval ContributedKnowledge Advanced Library Services Summary BiomedicalKnowledgeRepository

  11. Sourceselection (PubMed, annotations) Source selection (PubMed) Biomedical Literature Biomedical Literature MEDLINE MEDLINE CT.gov CT.gov DocumentSummarization DocumentSummarization UMLS Terminological Knowledge QuestionAnswering BiomedicalKnowledgeRepository Entrez Gene Entrez Gene KnowledgeDiscovery StructuredKnowl. Bases StructuredKnowl. Bases GO InformationRetrieval ContributedKnowledge Advanced Library Services Pilot projects SemRep BiomedicalKnowledgeRepository XSLT Populating the repository Exploiting the repository

  12. Pilot #1 Populating and exploiting theBiomedical Knowledge Repository Converting Entrez Gene into RDF With Satya Sahoo (U. Georgia) and Kelly Zeng (LHC)

  13. Names has_name Overview XSLT Stylesheet XML (file) RDF (file) RDF (Oracle) JAPX Jena 124 element tags 2M genes 106 properties 410M triples

  14. APP (GeneID: 351) has_protein_name amyloid beta A4 protein …

  15. RDF triple Gene property eg:has_protein_reference_name_E APP (geneid-351) amyloid beta A4 protein subject predicate object

  16. Parkinson disease MAPT Pick disease PARK1 Parkinson disease Parkinson disease TBP Spinocerebellar ataxia MAPT Pick disease PARK1 Parkinson disease TBP Spinocerebellar ataxia RDF graph Connecting several genes MAPT Parkinson disease MAPT Pick disease PARK1 Parkinson disease TBP Parkinson disease TBP Spinocerebellar ataxia has_associated_disease

  17. Neurodegenerative diseases isa Alzheimer disease Parkinson disease APP Alzheimer disease has_associated_disease PARK1 Parkinson disease Future work • Transform additional resources into RDF • UMLS Metathesaurus • Other NCBI databases • Drug knowledge bases • … • Integrate resources • Query across resources

  18. glycosyltransferase GO:0016757 isa GO:0008194 GO:0016758 acetylglucosaminyl-transferase GO:0008375 has_molecular_function acetylglucosaminyl-transferase GO:0008375 EG:9215 LARGE Muscular dystrophy, congenital, type 1D MIM:608840 has_associated_phenotype From glycosyltransferaseto congenital muscular dystrophy

  19. Pilot #2 Populating and exploiting theBiomedical Knowledge Repository Semantic Medline:Multi-document summarizationand visualization With Marcelo Fiszman, M.D., Ph.D.and Halil Kilicoglu, M.S.

  20. Sourceselection (PubMed, annotations) Source selection (PubMed) Biomedical Literature Biomedical Literature MEDLINE MEDLINE CT.gov CT.gov DocumentSummarization DocumentSummarization UMLS Terminological Knowledge QuestionAnswering BiomedicalKnowledgeRepository Entrez Gene KnowledgeDiscovery StructuredKnowl. Bases GO InformationRetrieval ContributedKnowledge Advanced Library Services Pilot projects SemRep BiomedicalKnowledgeRepository Populating the repository Exploiting the repository

  21. retrieval summarization 500 citations Network of relations Information retrieval Semantic Medline Managing retrieval results breast cancer

  22. Managing retrieval results breast cancer

  23. Guiding principles • Visualization • Overview first • Details on demand • Integration of knowledge content • Automated management of knowledge from text • Seamless application interfaces [Shneiderman 1996] [BoSC, April, 2006]

  24. Seamless integration of technologies • Information retrieval • PubMed - MEDLINE • Essie - ClinicalTrials.gov • Natural language processing: SemRep • Represent content of text with semantic predications • Abstraction summarization • Informative: Overview of most salient information • Visualization • Indicative: Links to source text and additional information

  25. Semantic Medline Overview Text SemanticPredications SalientSemanticPredications InformativeGraph PubMedEssie SemRep Summarize Visualize Query MEDLINEClinicalTrials.gov UMLS StructuredBiomedicalData

  26. Text “breast cancer” PubMed Query MEDLINEClinicalTrials.gov Document selection SemanticPredications SalientSemanticPredications InformativeGraph SemRep Summarize Visualize UMLS StructuredBiomedicalData

  27. … aromatase inhibitor provides mortality benefit in early breast carcinoma… …determined the spectrum and frequency of ATM missense variants in 443 breast cancer patients… MEDLINE citations Text SemanticPredications SalientSemanticPredications InformativeGraph PubMedEssie SemRep Summarize Visualize Query MEDLINEClinicalTrials.gov UMLS StructuredBiomedicalData

  28. Text SemanticPredications SemRep UMLS Semantic intepretation SalientSemanticPredications InformativeGraph PubMedEssie Summarize Visualize Query MEDLINEClinicalTrials.gov StructuredBiomedicalData

  29. treats Aromatase Inhibitors Breast Carcinoma associated_with ATM gene Breast Carcinoma Semantic interpretation ... aromatase inhibitor provides mortality benefit in early breast carcinoma … … determined the spectrum and frequency of ATM missense variants in 443 breast cancer patients …

  30. treats Tamoxifen Breast Carcinoma treats Aromatase Inhibitors Breast Carcinoma associated_with ATM gene Breast Carcinoma … … … treats Tamoxifen Patients process_of Breast Carcinoma Individual Semantic predications Text SemanticPredications SalientSemanticPredications InformativeGraph PubMedEssie SemRep Summarize Visualize Query MEDLINEClinicalTrials.gov UMLS StructuredBiomedicalData

  31. SemanticPredications SalientSemanticPredications Summarize Summarization Text InformativeGraph PubMedEssie SemRep Visualize Query MEDLINEClinicalTrials.gov UMLS StructuredBiomedicalData

  32. Specify a topic Retain predications on the topic Eliminate uninformative predications Retain most frequent predications All Predications SalientPredications Reduction Abstraction summarization

  33. treats Tamoxifen Breast Carcinoma treats Aromatase Inhibitors Breast Carcinoma associated_with ATM gene Breast Carcinoma … … … treats Tamoxifen Patients process_of Breast Carcinoma Individual Salient semantic predications Text SemanticPredications SalientSemanticPredications InformativeGraph PubMedEssie SemRep Summarize Visualize Query MEDLINEClinicalTrials.gov UMLS StructuredBiomedicalData

  34. SalientSemanticPredications InformativeGraph Visualize Visualization Text SemanticPredications PubMedEssie SemRep Summarize Query MEDLINEClinicalTrials.gov UMLS StructuredBiomedicalData

  35. Text InformativeGraph Aromatase Inhibitors Tamoxifen treats treats Breast Carcinoma associated_with ATM gene UMLS StructuredBiomedicalData Informative graph SemanticPredications SalientSemanticPredications PubMedEssie SemRep Summarize Visualize Query MEDLINEClinicalTrials.gov

  36. Semantic Medline Live

  37. Related research Visualizing relations • Maps of linked concepts among document • Literature network of co-occurring genes • Associative concept space for discovery • Genomic information across structured and textual databases [Fuller et al. 2004] [Jensen et al. 2001] [van der Eijk et al. 2004] [Tao et al. 2005]

  38. Future work • Process all of MEDLINE/PubMed • With SemRep • Incrementally integrate structured knowledge sources • Entrez databases • UMLS • Genetics Home Reference • Implementation • Efficiency • Large amount of data

  39. Summary • Deliver health information • Biomedical Knowledge Repository • Advanced Library Services • Exploit • Current Library resources • Advanced information technology • Support timely translation • Of biomedical research • Into improvements in patient careand public health

  40. Acknowledgments • Caroline Ahlers • Mariana Dimitrov • Marcelo Fiszman • Halil Kilicoglu • François-Michel Lang • Lee Peters • Anna Ripple • Graciela Rosemblat • Satya Sahoo • Kelly Zeng

  41. References • Bodenreider O, Rindflesch TC. Advanced library services: Developing a biomedical knowledge repository to support advanced information management applications. Technical report. Bethesda, Maryland: Lister Hill National Center for Biomedical Communications, National Library of Medicine; September 14, 2006.http://lhncbc.nlm.nih.gov/lhc/docs/reports/2006/tr2006001.pdf

  42. AdvancedLibraryServices Lister Hill National Centerfor Biomedical CommunicationsBethesda, Maryland - USA Olivier Bodenreider olivier@nlm.nih.govThomas C. Rindflesch tcr@nlm.nih.gov

More Related