Information Management for the Life Sciences
M. Scott Marshall, Marco Roos
Adaptive Information Disclosure, University of Amsterdam
A few loose ends
• Practice: OWL modeling of a statement
• COI demo: Bridging CDISC and HL7 with query federation
• Terminology and SKOS
• Demonstration
• Toward Query Federation: putting it all together
Towards RDF/OWL (1)
ALL instances of PeptideHormone are instances of Peptide that has_role SOME instance of HormoneActivity.
Source: Alan Ruttenberg
Towards RDF/OWL (2)
ALL instances of PeptideHormone are instances of Peptide that has_role SOME instance of HormoneActivity.
Source: Alan Ruttenberg
Towards RDF/OWL (3): Instances
Source: Alan Ruttenberg
Towards RDF/OWL (4): URIs
chebi:25905 = <http://purl.org/obo/owl/CHEBI#CHEBI_25905>
Source: Alan Ruttenberg
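The compact form chebi:25905 is just a namespace prefix plus a local name. A minimal Python sketch of expanding and re-compacting such names, assuming a hand-written prefix table (only the chebi namespace comes from the slide; a real tool would read its prefixes from the data or the query):

```python
from typing import Dict

# Hypothetical prefix table; only the chebi entry is taken from the slide.
PREFIXES: Dict[str, str] = {
    "chebi": "http://purl.org/obo/owl/CHEBI#CHEBI_",
}


def expand_curie(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand a compact name such as 'chebi:25905' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact_uri(uri: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Shorten a full URI if a known namespace matches, else write it as <uri>."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


print(expand_curie("chebi:25905"))  # http://purl.org/obo/owl/CHEBI#CHEBI_25905
```

Keeping the prefix table in one place means the same mapping serves both directions, which is what RDF serializations and SPARQL PREFIX declarations rely on.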
Towards OWL (5): triples
chebi:25905 rdfs:subClassOf chebi:16670 .
chebi:25905 rdfs:subClassOf _:1 .
_:1 owl:onProperty ro:hasRole .
_:1 owl:someValuesFrom go:GO_00179 .
…
Source: Alan Ruttenberg
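The statement from the previous slides can be turned into these triples mechanically: the SOME part becomes an anonymous restriction identified by a blank node. A minimal Python sketch, assuming terms are plain strings and blank nodes get hypothetical counter-based labels (_:b1, _:b2, ...); it also emits the rdf:type owl:Restriction triple that the OWL 2 mapping to RDF uses for such restrictions:

```python
from itertools import count
from typing import List, Tuple

Triple = Tuple[str, str, str]

_blank_ids = count(1)  # hypothetical labeling scheme for fresh blank nodes


def new_blank_node() -> str:
    """Return a fresh blank-node label such as '_:b1'."""
    return f"_:b{next(_blank_ids)}"


def subclass_of_some(cls: str, parent: str, prop: str, filler: str) -> List[Triple]:
    """Triples for: every `cls` is a `parent` that `prop` SOME `filler`."""
    restriction = new_blank_node()  # the anonymous existential restriction
    return [
        (cls, "rdfs:subClassOf", parent),
        (cls, "rdfs:subClassOf", restriction),
        (restriction, "rdf:type", "owl:Restriction"),
        (restriction, "owl:onProperty", prop),
        (restriction, "owl:someValuesFrom", filler),
    ]


triples = subclass_of_some("chebi:25905", "chebi:16670", "ro:hasRole", "go:GO_00179")
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```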
SPARQLing: put ?variables where you are looking for matches
Data:
chebi:25905 rdfs:subClassOf chebi:16670 .
chebi:25905 rdfs:subClassOf _:1 .
_:1 owl:onProperty ro:hasRole .
_:1 owl:someValuesFrom go:GO_00179 .
Query:
select ?moleculeClass where {
  ?moleculeClass rdfs:subClassOf chebi:16670 .
  ?moleculeClass rdfs:subClassOf ?res .
  ?res owl:onProperty ro:hasRole .
  ?res owl:someValuesFrom go:GO_00179 .
}
Result: ?moleculeClass = chebi:25905
Source: Alan Ruttenberg
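Answering this query amounts to finding every assignment of the ?variables that makes all four patterns match triples in the data (a basic graph pattern join). The toy Python matcher below illustrates that idea on the slide's triples; it is a sketch, not a SPARQL engine (no OPTIONAL, FILTER, or datatypes), and it treats every term as a plain string:

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# The data triples from the slide (blank node written as _:1).
DATA: List[Triple] = [
    ("chebi:25905", "rdfs:subClassOf", "chebi:16670"),
    ("chebi:25905", "rdfs:subClassOf", "_:1"),
    ("_:1", "owl:onProperty", "ro:hasRole"),
    ("_:1", "owl:someValuesFrom", "go:GO_00179"),
]

# The WHERE clause: terms starting with '?' are variables.
PATTERN: List[Triple] = [
    ("?moleculeClass", "rdfs:subClassOf", "chebi:16670"),
    ("?moleculeClass", "rdfs:subClassOf", "?res"),
    ("?res", "owl:onProperty", "ro:hasRole"),
    ("?res", "owl:someValuesFrom", "go:GO_00179"),
]


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable matches anything but must agree with earlier bindings.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # a constant must match exactly
    return extended


def solve(patterns: List[Triple], data: List[Triple],
          binding: Optional[Binding] = None) -> List[Binding]:
    """Return every binding that satisfies all patterns (a nested-loop join)."""
    binding = binding if binding is not None else {}
    if not patterns:
        return [binding]
    solutions: List[Binding] = []
    for triple in data:
        extended = match_triple(patterns[0], triple, binding)
        if extended is not None:
            solutions.extend(solve(patterns[1:], data, extended))
    return solutions


# Project the ?moleculeClass column, as the SELECT clause does.
print([s["?moleculeClass"] for s in solve(PATTERN, DATA)])  # ['chebi:25905']
```

Note that ?res binds to the blank node _:1 just like any other term, which is why the query can reach into the anonymous restriction.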
Current Task Forces
• BioRDF: integrated neuroscience knowledge base
  • Kei Cheung (Yale University)
• Clinical Observations Interoperability: patient recruitment in trials
  • Vipul Kashyap (Cigna Healthcare)
• Linking Open Drug Data: aggregation of Web-based drug data
  • Chris Bizer (Free University Berlin)
• Pharma Ontology: high-level patient-centric ontology
  • Christi Denney (Eli Lilly)
• Scientific Discourse: building communities through networking
  • Tim Clark (Harvard University)
• Terminology: Semantic Web representation of existing resources
  • John Madden (Duke University)
Background of the HCLS IG
• Originally chartered in 2005
  • Chairs: Eric Neumann and Tonya Hongsermeier
• Re-chartered in 2008
  • Chairs: Scott Marshall and Susie Stephens
  • Team contact: Eric Prud’hommeaux
• Broad industry participation
  • Over 100 members
  • Mailing list of over 600
• Background information
  • http://www.w3.org/2001/sw/hcls/
  • http://esw.w3.org/topic/HCLSIG
COI Task Force
• Task Lead: Vipul Kashyap
• Participants: Eric Prud’hommeaux, Helen Chen, Jyotishman Pathak, Rachel Richesson, Holger Stenzhorn
COI: Bridging Bench to Bedside
• How can existing Electronic Health Record (EHR) formats be reused for patient recruitment?
• Quasi-standard formats for clinical data:
  • HL7/RIM/DCM: healthcare delivery systems
  • CDISC/SDTM: clinical trial systems
• How can we map across these formats?
• Can we ask questions in one format when the data is represented in another format?
Source: Holger Stenzhorn
COI: Use Case
• Pharmaceutical companies pay a lot to test drugs.
• Pharmaceutical companies express protocols in CDISC.
[precipitous gap]
• Hospitals exchange information in HL7/RIM.
• Hospitals have relational databases.
Source: Eric Prud’hommeaux
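One step toward querying hospital data as RDF is exposing relational rows as triples. The sketch below is loosely modeled on the W3C Direct Mapping idea (one subject per row, one predicate per column); the table, columns, and base URI are hypothetical, and it is not necessarily the mapping used in the COI demo:

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

BASE = "http://hospital.example/db/"  # hypothetical base URI


def direct_map(table: str, pk: str, rows: List[Dict[str, Optional[str]]],
               base: str = BASE) -> List[Triple]:
    """Turn rows of one table into (subject, predicate, object) triples.

    Each row becomes a subject URI built from the table name and primary key;
    each non-NULL column becomes a predicate whose object is the cell value.
    """
    triples: List[Triple] = []
    for row in rows:
        subject = f"<{base}{table}/{pk}={row[pk]}>"
        triples.append((subject, "rdf:type", f"<{base}{table}>"))
        for column, value in row.items():
            if value is None:  # NULL cells produce no triple
                continue
            triples.append((subject, f"<{base}{table}#{column}>", f'"{value}"'))
    return triples


# Hypothetical hospital table of medication orders.
rows = [
    {"ID": "1", "patient": "P1", "classCode": "6809"},
    {"ID": "2", "patient": "P2", "classCode": None},
]
for triple in direct_map("Medication", "ID", rows):
    print(*triple, ".")
```

Once rows look like triples, the gap between the CDISC and HL7 sides becomes a question of mapping one set of predicates onto another rather than of reconciling storage formats.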
Inclusion Criteria
• Type 2 diabetes on diet and exercise therapy, or monotherapy with metformin, insulin secretagogue, or alpha-glucosidase inhibitors, or a low-dose combination of these at 50% maximal dose. Dosing is stable for 8 weeks prior to randomization.
• …
?patient takes metformin .
Source: Holger Stenzhorn
Exclusion Criteria
• Use of warfarin (Coumadin), clopidogrel (Plavix) or other anticoagulants.
• …
?patient doesNotTake anticoagulant .
Source: Holger Stenzhorn
Criteria in SPARQL
?medication1 sdtm:subject ?patient ;
             spl:activeIngredient ?ingredient1 .
?ingredient1 spl:classCode 6809 .   # metformin
OPTIONAL {
  ?medication2 sdtm:subject ?patient ;
               spl:activeIngredient ?ingredient2 .
  ?ingredient2 spl:classCode 11289 .   # anticoagulant
}
FILTER (!BOUND(?medication2))
Source: Holger Stenzhorn
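The OPTIONAL plus FILTER (!BOUND(...)) pair is SPARQL 1.0's idiom for negation: the OPTIONAL block binds ?medication2 only if an anticoagulant is found for the patient, and the filter keeps just the patients for whom it stayed unbound. A plain-Python sketch of the same logic, over hypothetical records that collapse the medication → ingredient → class-code chain into a single field:

```python
from typing import List, Optional, Set, Tuple

METFORMIN = "6809"       # class code the query uses for metformin
ANTICOAGULANT = "11289"  # class code the query uses for the anticoagulant

# Hypothetical records: (medication, patient, class code of its active ingredient).
MEDICATIONS: List[Tuple[str, str, str]] = [
    ("med1", "patientA", METFORMIN),
    ("med2", "patientB", METFORMIN),
    ("med3", "patientB", ANTICOAGULANT),
    ("med4", "patientC", ANTICOAGULANT),
]


def eligible_patients(medications: List[Tuple[str, str, str]]) -> Set[str]:
    """Patients taking metformin and no anticoagulant."""
    # Required part of the pattern: ?patient has a metformin medication.
    on_metformin = {patient for _, patient, code in medications if code == METFORMIN}
    eligible: Set[str] = set()
    for patient in on_metformin:
        # OPTIONAL part: try to bind ?medication2 to an anticoagulant for this patient.
        medication2: Optional[str] = next(
            (med for med, who, code in medications
             if who == patient and code == ANTICOAGULANT),
            None,
        )
        # FILTER (!BOUND(?medication2)): keep the patient only if nothing was bound.
        if medication2 is None:
            eligible.add(patient)
    return eligible


print(sorted(eligible_patients(MEDICATIONS)))  # ['patientA']
```

In SPARQL 1.1 the same exclusion can be written more directly with FILTER NOT EXISTS { ... } or MINUS { ... }.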
Terminology Task Force
• Task Lead: John Madden
• Participants: Chimezie Ogbuji, M. Scott Marshall, Helen Chen, Holger Stenzhorn, Mary Kennedy, Xiashu Wang, Rob Frost, Jonathan Borden, Guoqian Jiang
Features: the “bridge” to meaning
[Figure: Data (Literature, Image(s), Microarray, Sensor Array) → Features (Keyword Vectors, Image Features, Gene Expression Profile, Detected Features) → Concepts (Ontology)]
Terminology: Overview
• Goal is to identify use cases and methods for extracting Semantic Web representations from existing, standard medical record terminologies, e.g., UMLS
• Methods should be reproducible and, to the extent possible, not lossy
• Identify and document issues along the way related to identification schemes and the expressiveness of the relevant languages
• Initial effort will start with SNOMED-CT and the UMLS Semantic Network and focus on a particular sub-domain (e.g., pharmacological classification)
Source: John Madden
Medical terminologies: today
• Moderate number of large, evolved terminologies
• Adapted for specific business-process contexts
• Each separately, centrally curated
• Typically hierarchical, various expressivities
• Uncommon to mix vocabularies
Outpatient billing: CPT
Inpatient billing: ICD
Laboratory results: LOINC
Clinical findings: SNOMED
Journal indexing: MEDLARS
Pharmacy: MEDRA
Process: HL7
Clinical trials: CDISC
Others...
Source: John Madden
SKOS & the 80/20 principle: map “down”
• Minimal assumptions about expressiveness of source terminology
• No assumed formal semantics (no model theory)
• Treat it as a knowledge “map”
• Extract 80% of the utility without risk of falsifying intent
Source: John Madden
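Mapping “down” to SKOS means recording a source hierarchy with skos:broader rather than rdfs:subClassOf, so no class-membership inferences are implied. A minimal Python sketch that emits Turtle for a three-term fragment; the ex: namespace, codes, and labels are hypothetical and only illustrate the shape of the output:

```python
from typing import Dict, List, Optional, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/terminology#"  # hypothetical namespace

# Hypothetical source fragment: code -> (preferred label, parent code or None).
SOURCE_TERMS: Dict[str, Tuple[str, Optional[str]]] = {
    "T1": ("Drug", None),
    "T2": ("Anticoagulant", "T1"),
    "T3": ("Warfarin", "T2"),
}


def to_skos(terms: Dict[str, Tuple[str, Optional[str]]],
            scheme: str = "ex:Scheme") -> List[str]:
    """Emit Turtle lines: one skos:Concept per code, parent links as skos:broader."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix ex: <{EX}> .",
        f"{scheme} a skos:ConceptScheme .",
    ]
    for code, (label, parent) in terms.items():
        # Escape backslashes and quotes so the label is a valid Turtle string.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"ex:{code} a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{escaped}"@en ;')
        if parent is None:
            lines.append(f"    skos:topConceptOf {scheme} ;")
        else:
            # skos:broader only says the parent is broader; no subclass semantics.
            lines.append(f"    skos:broader ex:{parent} ;")
        lines.append(f"    skos:inScheme {scheme} .")
    return lines


print("\n".join(to_skos(SOURCE_TERMS)))
```

Because nothing here claims that every instance of the narrower concept is an instance of the broader one, the extraction cannot falsify the source's intent, which is the point of the 80/20 approach.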
The AIDA toolbox for knowledge extraction and knowledge management in a Virtual Laboratory for e-Science
Putting it all together
• Choose valid terms for use in the SPARQL query by browsing/searching the knowledge base.
• Create a single SPARQL endpoint for a federation of knowledge bases (SWObjects).
• Apply the bridging technique to bridge MeSH terms and terms in the HCLS Knowledge Base.
• Use terms from the Terminology Server in Scientific Discourse.
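The federation and bridging steps can be illustrated with an in-memory toy: several knowledge bases are merged into one virtual graph, and a skos:exactMatch bridge lets a query phrased with a KB term also find data indexed with a MeSH term. Everything here (the ex:, kb:, and mesh: terms and the ex:mentions predicate) is hypothetical, and a real federation layer would typically dispatch sub-queries to the underlying endpoints rather than copy all triples into memory:

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-ins for separate knowledge bases behind one endpoint.
HCLS_KB: List[Triple] = [("ex:article1", "ex:mentions", "kb:term1")]
LITERATURE_KB: List[Triple] = [("ex:article2", "ex:mentions", "mesh:term1")]
# Hypothetical bridge stating that the MeSH term and the KB term correspond.
BRIDGE: List[Triple] = [("mesh:term1", "skos:exactMatch", "kb:term1")]


def federated_mentions(sources: Iterable[List[Triple]], term: str) -> Set[str]:
    """Find subjects that mention `term` or a term bridged to it by skos:exactMatch."""
    merged = [t for source in sources for t in source]  # one virtual graph
    equivalents = {term}
    for s, p, o in merged:  # one hop of the symmetric exactMatch relation
        if p == "skos:exactMatch":
            if o == term:
                equivalents.add(s)
            elif s == term:
                equivalents.add(o)
    return {s for s, p, o in merged if p == "ex:mentions" and o in equivalents}


print(sorted(federated_mentions([HCLS_KB, LITERATURE_KB, BRIDGE], "kb:term1")))
# prints ['ex:article1', 'ex:article2']
```

Without the bridge triple only ex:article1 is found, which shows why the bridging step matters as much as the single endpoint.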
Task Force Resources to federate
• BioRDF: knowledge base, aTags (stored in KB)
• Clinical Observations Interoperability: drug ontology
• Linking Open Drug Data: LOD data
• Pharma Ontology: ontology
• Scientific Discourse: SWAN ontology, SWAN SKOS, myExperiment ontology
• Terminology: SNOMED-CT, MeSH, UMLS
Someday, we should be able to find this as evidence for a fact in a Knowledge Base
Getting Involved
• Benefits of getting involved include:
  • Early access to use cases and best practice
  • Influence on standard recommendations
  • Cost-effective exploration of new technology through collaboration
  • Networking with others working on the Semantic Web
• Get involved: email the chairs and team contact
  • team-hcls-chairs@w3.org
• Participate in the next F2F (the last one was here):
  • http://esw.w3.org/topic/HCLSIG/Meetings/2009-04-30_F2F
A Few Announcements
• Still unofficial but almost set: Semantic Web Applications and Tools for the Life Sciences Workshop (SWAT4LS) in Amsterdam, 2009 (tentative date: Nov 20)
• Possibly a W3C Semantic Web Health Care and Life Sciences Interest Group (HCLSIG) F2F in the fall in Amsterdam
• Shared Names (http://sharednames.org) workshop likely in the fall, location unknown
• Protégé Conference in Amsterdam, June 23-26