130 likes | 265 Views
Advancing translational research with the Semantic Web. Ruttenberg , Clark, Bug, Samwald , Bodenreider , Chen, Doherty, Forsberg, Gao , Kashyap , Kinoshita, Luciano, Marshall, Ogbuji , Rees, Stephens, Wong, Wu, Zaccagnini , Hongsermeier , Neumann, Herman, Cheung. Author Background.
E N D
Advancing translational research with the Semantic Web Ruttenberg, Clark, Bug, Samwald, Bodenreider, Chen, Doherty, Forsberg, Gao, Kashyap, Kinoshita, Luciano, Marshall, Ogbuji, Rees, Stephens, Wong, Wu, Zaccagnini, Hongsermeier, Neumann, Herman, Cheung
Author Background • Many authors with diverse backgrounds: • Senior scientist at pharmaceutical company (Ruttenberg) • Director of Bioinformatics at Harvard Univ. (Clark) • Neuroscientist and informaticist at Drexel Univ. (Bug) • Developer for Science Commons at MIT (Samwald) • Scientist at National Library of Medicine (Bodenreider) • Principal investigator at healthcare company (Chen) • Developer of ‘Semantic Wb’ iPhone app (Doherty) • Informatics scientist at pharmaceutical company (Forsberg) • Clinical instructor in neurology (Gao) • Senior Medical Informatician at healthcare company (Kashyap) • Alzheimer researcher (Kinoshita)
Author Background (Cont’d) • Lecturer on Genetics at Harvard Univ. (Luciano) • Co-chair of HCLSIG (we’ll get to that later) (Marshall) • Computer engineer at Case Western Univ. (Ogbuji) • Lead software engineer at pharmaceutical company (Rees) • Director of Bioinformatics at Johnson & Johnson (Stephens) • Alzheimer researcher (Wong) • Alzheimer researcher (Wu) • VP of healthcare technology company (Zaccagnini) • Principle Informatician at healthcare company (Hongsermeier) • Senior Director of scientific data management co. (Neumann) • Semantic Web Activity Lead at W3C (Herman) • Associate Prof. of Computer Science & Genetics (Cheung)
Definitions • Translational Research – the movement of discoveries in basic research (the Bench) to application at the clinical level (the Bedside) • Evidence-based medicine → solutions for public health problems • Information ecosystem – scientific literature, experimental data, summaries of knowledge of gene products, diseases, and compounds, and informal scientific discourse and commentary in a variety of forums • The Semantic Web – an extension of the current Web that enables navigation and meaningful use of digital resources by automatic processes • Based on common formats supporting aggregation and integration of data from diverse sources
Problems with the existing Information Ecosystem • Lack of uniformly structured data • Often not interoperable; data can’t be used in combination with data in other formats • “Data silos” concept – disconnected databases not used in conjunction with one another • Data synthesis is necessary to cure and prevent diseases • Example: Parkinson’s Disease, Huntington’s Disease, and Alzheimer’s Disease share clinical, neural, cellular, and molecular traits—but specialists in each disease are often unaware of important literature on the other diseases which could be related
Criteria for Solutions • Web-based • Queries across experimental data from different communities of interest • Practicality • Linking and discovery of data interpretations need to be supported • Tied to the scientific publication process and culture • Tied to both current and evolving processes
The Semantic Web • Triples – consist of subject, predicate, and object • Subject: human TP53 gene • Page describing the gene • Predicate: hasGeneProduct • Page describing the relationship between them with a human-readable definition • Object: human TP53 MRNA • Page describing the object • Uniform Resource Identifier (URI) – string of characters used to identify a name or a resource • Web URLs (Universal Resource Locators) are URIs • Networked computers are able to follow these, locate identical (related) data references, and integrate knowledge
Semantic Web Technologies • Uniform naming of elements of discourse by URIs • Resource Description Framework (RDF) – gives formal specification of the syntax and semantics of statements (triples) • Shared standards and technologies around these methods of organization • SPARQL – standard query language for retrieving RDF-format answers • Gleaning Resource Descriptions from Dialects of Language (GRDDL) – tool used to turn XML data into RDF • Shared practices in using these standards and technologies
Applications to Biomedical Research • Faster movement of innovation from research laboratory to clinic or hospital • Utilizes the World Wide Web, the most successful information dissemination and sharing apparatus in existence • URIs are global in scope – less effort will be spent developing services to map gene identifiers to information about the gene • Common schema languages simplify management and comprehension of complicated and rapidly-evolving relationships among data • The subject, predicate, object structure itself can guide users towards correct use of the data • Flexible, extendable, decentralized databases allow room for future expansion of scientific knowledge • Ability to extend others’ work and build upon their data • Ability to check for consistency, classify, and inference • Less errors in diagnosis/treatment (clinical) and data analysis (research)
How did you conceptualize the medical examples? • How does HCLSIG hope to address informaticians’ potential lack of specialized medical knowledge?
How did you conceptualize the database syntax? • Again, I’m no expert—does anyone care to elaborate about the parts of (SQL) syntax such as: • Will HCLSIG be able to utilize the resources of non-informatics medical practitioners?