
Eric Neumann Clinical Semantic Group W3C HCLS chair, MIT Fellow



Presentation Transcript


  1. Tutorial: Semantic Web Applications in Clinical Data Management • Eric Neumann, Clinical Semantic Group; W3C HCLS chair, MIT Fellow

  2. Tutorial Overview • Bench-to-Bedside Vision • Information Challenges • Semantic Web: What is it? • RDF: Recombinant Data (Aggregation) • OWL: Vocabularies (NCI, SNOMED) • Rules • Translational Medicine Needs • Clinical Data Standards: CDISC • Re-Using Clinical Knowledge • Retrospective DBs: JANUS • Open Knowledge Benefits: Tox Commons

  3. Bench-to-Bedside • Connecting pre-clinical and clinical studies • Translational Medicine • Patient Stratification & Personalized medicine (not the same) • Knowledge and Data Integration • Better Disease Understanding • Next Generation Therapies, New Applications • More Predictive (earlier) Safety Signals

  4. from Innovation or Stagnation, FDA Report March 2004

  5. Tox/Efficacy ADME Optim New Regulatory Issues Confronting Pharmaceuticals from Innovation or Stagnation, FDA Report March 2004

  6. Translational Medicine • Enable physicians to more effectively translate relevant findings and hypotheses into therapies for human health • Support the blending of huge volumes of clinical research and phenotypic data with genomic research data • Apply that knowledge to patients and finally make individualized, preventative medicine a reality for diseases that have a genetic basis

  7. Drug Discovery & Development Knowledge Qualified Targets Molecular Mechanisms Lead Generation Toxicity & Safety Lead Optimization Pharmacogenomics Biomarkers Clinical Trials Launch

  8. Biomedical Research Clinical Practice Ecosystem: Goal State Merging Biomed Research, Clinical Trials and Clinical Practice

  9. HCChoices HCLS Ecosystem Insurers Grants HMO,PPO Biomed Research Publications and Public Databases BKB Large Studies Gov/Funding Risks & Benefits Disease Areas Drug R&D EHR Mol Path Res Clin Res Chem Manuf Drug Programs Clin POC Surveillance BiomarkerTox HCP Public Preclin Marketing VA System R&D Gov/Regulatory CROs Clin Safety JANUS SafetyCommons

  10. Information Challenges • No common way to bring data and docs together • HTML links carry no meaning with them • Today’s integration approaches prevent data re-use • No global way to annotate our experiments and experiences • Most annotations cannot be found by context • No “sci-blog” for data interpretation • Enterprise information access and discoverability are weak • Making timely discoveries! • Why we all like Google • Cutting and pasting between docs promotes fact mutation and loss of provenance • Address business operations and tracking, and reduce static data copying

  11. A web of information • Courtesy of R. Stevens

  12. Distributed Nature of Biomedical Knowledge Patents Tox HCS Silos of Data… Biomarkers Targets Libraries Assays DrugRegistry Diseases Genotypes ClinicalTrials

  13. The Big Picture In Drug R&D Hard to understand from just a few isolated Points of View

  14. What if Scientists could put it together for themselves?

  15. Complete view tells a very different Story

  16. Clinical Papers Disease Subjects Genotype EnrollmentCriteria Dosing Observations Audit Trail Tox Signals Statistics Trials Ontology Whose Schema?

  17. Why Searching à la Google is not enough • Google’s ability to rank and graph without using semantics is comparable to a Drug R&D project that looks for associations but makes no attempt to find or represent mechanisms of action

  18. What is the Semantic Web?

  19. The Layer Cake

  20. The Current Web • What the computer sees: “Dumb” links • No semantics - <a href> treated just like <bold> • Minimal machine-processable information

  21. The Semantic Web • Machine-processable semantic information • Semantic context published – making the data more informative to both humans and machines

  22. Needed to realize the SW vision • A standard way of identifying things • A standard way of describing things • A standard way of linking things • Standard vocabularies for talking about things

  23. The Semantic Web: Basic Standards for Describing Things • Richer structure for basic resources (XML) • Describe Data by Semantics and Not Syntax: RDF • Define Semantics using RDFS or OWL • Reference and Relate All Resources using URIs • SPARQL is super model of SQL • Rules for higher level reasoning

  24. The Technologies: RDF • Resource Description Framework (RDF) • W3C standard for making statements or hypotheses about data and concepts • Descriptive statements are expressed as triples: (Subject, Verb, Object), also labeled (Subject, Property, Object) • Example: <Compound HB-2182> <binds_to> <Target P38_alpha>
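The triple on this slide can be sketched in plain Python as a three-element tuple. This is only an illustration of the data model, not an RDF library: plain strings stand in for URIs, and the `match` helper is a hypothetical name introduced here, with `None` acting as a wildcard.

```python
from typing import List, Optional, Set, Tuple

# A triple is (subject, predicate, object); strings stand in for URIs.
Triple = Tuple[str, str, str]

triples: Set[Triple] = {
    ("Compound HB-2182", "binds_to", "Target P38_alpha"),
}

def match(store: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None matches anything."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Everything that is stated with the binds_to property.
found = match(triples, p="binds_to")
```

Keeping every statement in the same three-part shape is what lets one generic pattern matcher serve any kind of data.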

  25. Facts as triples • PARK1 (subject) has_associated_disease (predicate) Parkinson disease (object)

  26. From triples to a graph • All edges are has_associated_disease • MAPT → Parkinson disease • MAPT → Pick disease • PARK1 → Parkinson disease • TBP → Parkinson disease • TBP → Spinocerebellar ataxia
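A minimal sketch of how the triples on this slide become a graph: nodes that appear in more than one triple become shared connection points. The index name `genes_by_disease` is an assumption made for illustration.

```python
from collections import defaultdict

# The five has_associated_disease triples listed on the slide.
triples = [
    ("MAPT", "has_associated_disease", "Parkinson disease"),
    ("MAPT", "has_associated_disease", "Pick disease"),
    ("PARK1", "has_associated_disease", "Parkinson disease"),
    ("TBP", "has_associated_disease", "Parkinson disease"),
    ("TBP", "has_associated_disease", "Spinocerebellar ataxia"),
]

# Index edges by their object node so that shared nodes become visible.
genes_by_disease = defaultdict(set)
for gene, predicate, disease in triples:
    genes_by_disease[disease].add(gene)

# All genes that meet at the shared "Parkinson disease" node.
parkinson_genes = sorted(genes_by_disease["Parkinson disease"])
```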

  27. Connecting graphs • Integrate graphs from multiple resources • Query across resources • [Diagram: Alzheimer disease and Parkinson disease are linked by isa to Neurodegenerative diseases; APP has_associated_disease Alzheimer disease; PARK1 has_associated_disease Parkinson disease]
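The two bullets on this slide, integrating graphs and querying across them, can be sketched as a set union followed by a two-step lookup. The direction of the isa edges (disease isa Neurodegenerative diseases) is read off the flattened diagram and is an assumption, as is the helper name `genes_for_class`.

```python
# Graph from a disease classification (one resource).
ontology = {
    ("Alzheimer disease", "isa", "Neurodegenerative diseases"),
    ("Parkinson disease", "isa", "Neurodegenerative diseases"),
}

# Graph from a gene-disease resource (another resource).
associations = {
    ("APP", "has_associated_disease", "Alzheimer disease"),
    ("PARK1", "has_associated_disease", "Parkinson disease"),
}

# Both graphs name their nodes identically, so merging is a set union.
merged = ontology | associations

def genes_for_class(graph, disease_class):
    """Genes whose associated disease 'isa' member of the given class."""
    members = {s for s, p, o in graph if p == "isa" and o == disease_class}
    return sorted(
        s for s, p, o in graph
        if p == "has_associated_disease" and o in members
    )

neuro_genes = genes_for_class(merged, "Neurodegenerative diseases")
```

Neither source graph can answer the question alone; the answer only exists in the merged graph.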

  28. The URI - global identification URI serves as a universal and uniform identifier for all web based resources.

  29. A Family of Identifiers • URI = Uniform Resource Identifier • URL = Uniform Resource Locator • URN = Uniform Resource Name • LSID = Life Science Identifier • http://www.w3.org/Addressing/ • [Diagram: URI, URL, URN, LSID]

  30. Uniform Resource Locator • A type of resource identifier • Identifies the location of a resource (or part thereof) • Specifies a protocol to access the resource, e.g., http, ftp, mailto • E.g., http://www.nlm.nih.gov/

  31. Uniform Resource Name • A type of resource identifier • Identifies the name of a resource • Location independent • Defines a namespace • E.g., urn:isbn:0-262-02591-4 and urn:umls:C0001403

  32. Life Science Identifier • A type of resource identifier • A type of URN • For biological entities • Specific properties: Versioned, Resolvable, Immutable • E.g., urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434, where ncbi.nlm.nih.gov is the DNS name, pubmed is the namespace, and 12571434 is the unique ID • http://lsid.sourceforge.net/
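The LSID example on this slide can be split into its labeled parts with a few lines of Python. This is a minimal sketch that assumes the colon-separated layout `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing revision; the `LSID` type and `parse_lsid` function are hypothetical names, not part of any LSID toolkit.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str                  # the "DNS name" label on the slide
    namespace: str
    object_id: str                  # the "unique ID" label on the slide
    revision: Optional[str] = None  # optional version component

def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components, or raise ValueError."""
    parts = text.split(":")
    prefix = [p.lower() for p in parts[:2]]
    if prefix != ["urn", "lsid"] or len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])

lsid = parse_lsid("urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434")
```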

  33. RDF Examples
      …as RDF-XML
      <cdisc:Subject rdf:about="http://clinic.com/study/T2271/subject/4183542663506">
        <nci:sex_code rdf:resource="nci#Female" />
        <cdisc:treatment rdf:resource="http://clinic.com/study/T2271/subject/4183542663506/observation/O2241" />
        <cdisc:vitalSigns rdf:resource="http://clinic.com/study/T2271/subject/4183542663506/observation/O6561" />
        <cdisc:adverseEvent rdf:resource="http://clinic.com/study/T2271/subject/4183542663506/observation/O6622" />
      </cdisc:Subject>
      …as N3
      <http://clinic.com/study/T2271/subject/4183542663506> a cdisc:Subject ;
        nci:sex_code nci:Female ;
        cdisc:treatment <http://clinic.com/study/T2271/subject/4183542663506/observation/O2241> ;
        cdisc:vitalSigns <http://clinic.com/study/T2271/subject/4183542663506/observation/O6561> ;
        cdisc:adverseEvent <http://clinic.com/study/T2271/subject/4183542663506/observation/O6622> .

  34. Semantic Data Integration: Incremental Roadmap • Data assets remain as they are! They do not need to be modified • The wrapper abstracts out details related to location, access and data structure • Integration happens at the information level • Highly configurable and incremental process • Ability to specify declarative rules and mappings for further hypothesis generation

  35. RDBM => RDF <hasDisease> <interactsWith> <canCause> <URI> <URI> {primary keys} {primary keys} <URI> <URI> <URI> <URI> Virtualized RDF

  36. Semantic Data Integration: Bridging Clinical and Genomic Information • Rule/Semantics-based Integration: • Match nodes with the same IDs • Create new links: IF a patient’s structured test result indicates a disease, THEN add a “suffers_from” link to that disease • [Diagram labels: sources EMR Data, LIMS Data; nodes Patient (id = URI1), Person (id = URI2), FamilyHistory (id = URI3), MolecularDiagnosticTestResult (id = URI4), MYH7 missense Ser532Pro (id = URI5), Dilated Cardiomyopathy (id = URI6); edges name, has_structured_test_result, related_to, associated_relative, has_family_history, identifies_mutation, indicates_disease, problem, type, degree, evidence1, evidence2; values “Mr. X”, “Sudden Death”, “Paternal”, 1, 90%, 95%]

  37. Semantic Data Integration: Bridging Clinical and Genomic Information • RDF graphs provide a semantics-rich substrate for decision support • Can be exploited by SWRL rules • [Diagram labels: nodes Patient (id = URI1), Person (id = URI2), FamilyHistory (id = URI3), StructuredTestResult (id = URI4), MYH7 missense Ser532Pro (id = URI5), Dilated Cardiomyopathy (id = URI6); edges name, has_structured_test_result, related_to, associated_relative, has_family_history, identifies_mutation, indicates_disease, has_gene, suffers_from, problem, type, degree, evidence; values “Mr. X”, “Sudden Death”, “Paternal”, 1, 90%]
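The IF/THEN rule from the two slides above can be sketched as a small forward-chaining step over a set of triples. The node identifiers are the ones shown in the diagrams, but the exact edges between them are read off the flattened diagram and are an assumption; `apply_suffers_from_rule` is a hypothetical helper name, and this is plain Python rather than SWRL.

```python
# A fragment of the merged EMR + LIMS graph (edges assumed from the diagram).
graph = {
    ("URI1", "has_structured_test_result", "URI4"),
    ("URI4", "indicates_disease", "URI6"),
    ("URI1", "has_family_history", "URI3"),
}

def apply_suffers_from_rule(triples):
    """IF a patient has_structured_test_result T and T indicates_disease D,
    THEN add (patient, suffers_from, D). Returns a new set of triples."""
    # Map each test result to the diseases it indicates.
    indicated = {}
    for s, p, o in triples:
        if p == "indicates_disease":
            indicated.setdefault(s, set()).add(o)
    # Follow patient -> test result -> disease and emit the new link.
    inferred = {
        (patient, "suffers_from", disease)
        for patient, p, result in triples
        if p == "has_structured_test_result"
        for disease in indicated.get(result, ())
    }
    return set(triples) | inferred

enriched = apply_suffers_from_rule(graph)
```

The input graph is left untouched, so the inferred link can be stored or discarded separately from the source data.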

  38. Semantic Data Integration and Visualization: Drug Discovery • Drug Discovery Dashboard: http://www.w3.org/2005/04/swls/BioDash • [Diagram labels: Topic: GSK3beta Topic; Disease: DiabetesT2; Alt Dis: Alzheimers; Target: GSK3beta; Cmpd: SB44121; CE: DBP; Team: GSK3 Team; Person: John; Path: WNT; Related Set]

  39. Semantic Data Integration: Bridging Chemistry and Molecular Biology • Semantic Lenses: different views of the same data • BioPax Components • Target Model • urn:lsid:uniprot.org:uniprot:P49841 • Apply Correspondence Rule: if ?target.xref.lsid == ?bpx:prot.xref.lsid then ?target.correspondsTo.?bpx:prot
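The correspondence rule on this slide links a target to a BioPax protein when both carry the same cross-reference LSID. A minimal sketch under stated assumptions: the record layout, the `bpx:` identifiers, and the second LSID are hypothetical, and pairing the GSK3beta target name with the slide's LSID is also an assumption made for the example.

```python
targets = [
    {"name": "GSK3beta", "xref_lsid": "urn:lsid:uniprot.org:uniprot:P49841"},
]

biopax_proteins = [
    {"id": "bpx:protein1", "xref_lsid": "urn:lsid:uniprot.org:uniprot:P49841"},
    # Hypothetical non-matching entry, for contrast.
    {"id": "bpx:protein2", "xref_lsid": "urn:lsid:example.org:demo:X00000"},
]

def correspondences(targets, proteins):
    """Emit (target, correspondsTo, protein) wherever the xref LSIDs agree."""
    by_lsid = {}
    for prot in proteins:
        by_lsid.setdefault(prot["xref_lsid"], []).append(prot["id"])
    return [
        (t["name"], "correspondsTo", pid)
        for t in targets
        for pid in by_lsid.get(t["xref_lsid"], [])
    ]

links = correspondences(targets, biopax_proteins)
```

The two records are never declared identical; they are only linked because they share a reference.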

  40. Semantic Data Integration: Bridging Chemistry and Molecular Biology • Lenses can aggregate, accentuate, or even analyze new result sets • Behind the lens, the data can be persistently stored as RDF-OWL • Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references

  41. Semantic Data Integration: Pathway Polymorphisms • Non-synonymous polymorphisms from dbSNP • Merge directly onto pathway graph • Identify targets with lowest chance of genetic variance • Predict parts of pathways with highest functional variability • Map genetic influence to potential pathway elements • Select mechanisms of action that are minimally impacted by polymorphisms

  42. Scenario: Biomarker Qualification • Semantics which Define… • Biomarker Roles • Disease • Toxicity • Efficacy • Molecular and cytological markers • Tissue-specific • High content screening derived information • Different sets associated with different predictive tools • Statistical discrimination based on selected samples • Predictive power • Alternative cluster prediction algorithms • Support qualifications from multiple studies (comparisons) • Causal mechanisms • Pathways • Population variation

  43. Semantic Data Integration: Advantages • RDF: Graph based data model • More expressive than the tree based XML Schema Model • RDF: Reification • Same piece of information can be given different values of belief by different clinical genomic researchers • Potential for “Schema-less” Data Integration • Hypothesis driven approach to defining mapping rules • Can define mapping rules on the fly • Incremental approach for Data Integration • Ability to introduce new data sources into the mix incrementally at low cost • Use of Ontology to disallow meaningless mapping rules? • E.g., mapping a gene to a protein…

  44. Semantic Data Integration: “Schema-free” data integration • Low cost approach for data integration • No need for maintenance of costly schema mappings • Ability to “merge” RDF graphs based on simple declarative rules that specify: • Equality of URIs • Connecting nodes of same type • Connecting two nodes associated by a “path” • Disadvantage: Potential for specifying spurious, nonsensical rules
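The "Equality of URIs" rule on this slide can be sketched as a merge that first rewrites equivalent identifiers to one canonical form and then takes the union. All identifiers below (`emr:patient/1`, `lims:subject/77`, `lims:result/4`) and the `merge` helper are hypothetical, chosen only to illustrate the idea.

```python
# Two independently produced graphs; no shared schema is required.
emr = {("emr:patient/1", "name", "Mr. X")}
lims = {("lims:subject/77", "has_structured_test_result", "lims:result/4")}

# Declarative equality rule: these two identifiers denote the same patient.
same_as = {"lims:subject/77": "emr:patient/1"}

def merge(graphs, same_as):
    """Union the graphs after mapping each node to its canonical identifier."""
    def canon(node):
        # Nodes without an equality rule (including literals) pass through.
        return same_as.get(node, node)
    return {(canon(s), p, canon(o)) for g in graphs for s, p, o in g}

merged = merge([emr, lims], same_as)
```

A wrong entry in `same_as` would silently fuse unrelated nodes, which is the spurious-rule risk the slide lists as a disadvantage.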

  45. Semantic Data Integration: Use of Reification • Level of accuracy of test result • Sensitivity and specificity of lab result • Level of confidence in genotyping or gene sequencing • Probabilistic relationships • Likelihood that a particular test result or condition is indicative of a disease or other medical condition • Level of trust in a resource • Results from one lab may be trusted more than results from another • Results from well known health sites (NLM) may be trusted more than others • Belief attribution • Scientific hypotheses may be attributed to appropriate researchers
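Reification means making statements about a statement, for example who asserts it and with what confidence. A minimal sketch, assuming the 90% and 95% evidence values from the earlier diagram attach to the indicates_disease link; the `Belief` type and `sources_above` helper are hypothetical names, and this is plain Python rather than the RDF reification vocabulary.

```python
from typing import List, NamedTuple, Tuple

class Belief(NamedTuple):
    statement: Tuple[str, str, str]  # the triple being described
    attributed_to: str               # who or what asserts it
    confidence: float                # level of belief, from 0.0 to 1.0

claim = ("URI4", "indicates_disease", "URI6")

# The same claim carries different belief values from different sources.
beliefs = [
    Belief(claim, "evidence1", 0.90),
    Belief(claim, "evidence2", 0.95),
]

def sources_above(beliefs: List[Belief],
                  statement: Tuple[str, str, str],
                  threshold: float) -> List[str]:
    """Sources that assert the statement with at least the given confidence."""
    return sorted(
        b.attributed_to for b in beliefs
        if b.statement == statement and b.confidence >= threshold
    )

trusted = sources_above(beliefs, claim, 0.92)
```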

  46. The Available Data Space Separate RDF documents are merged automatically into one aggregate graph.

  47. Recombination in Molecular Genetics works due to proper alignment of genetic regions, thereby preventing gene loss, mangling, or duplication.

  48. Recombinant Data Graphs can be filtered and pivoted, without losing meaning

  49. Recombinant Data • Mash-ups that don’t lose perspective • Dynamic mixing of data • Provide Different Views for Different Roles and Functions • Dashboards • Direct output of a SPARQL query

  50. Key Functionality offered by Semantic Web • Ubiquity • Same identifiers for anything from anywhere • Discoverability • Global search on any entity • Interoperability • => “Recombinant Data” is Application Independence
