Introduction to The Semantic Web Rick Bradshaw M.S. Sr. Data Architect Office of the Associate VP Health Sciences IT
Overview • Introduce the Semantic Web • Interactive study of ClinicalTrials.gov semantic web style • Take a closer look at RDF • Run example SPARQL queries • Introduce federation • Run example SPARQL queries against federated data
Semantic Web Definition • The Semantic Web facilitates applying machine-readable semantic data/metadata to resources that are distributed across the web/internet • Often associated with specific technologies • RDF – Resource Description Framework • RDFS – RDF Schema • OWL – Web Ontology Language • Web 3.0 (?) http://en.wikipedia.org/wiki/Semantic_Web
Machine-readable • A computer can read and “understand” data • Ask specific questions and get specific answers • Aggregate specific data, perform calculations, organize/order returned data • Can Google read and “understand” web data?
Example • Specific Question • How many Spinal Muscular Atrophy trials have been conducted at the University of Utah and when were they conducted? • Specific Answer = ? • Google’s Answer • “spinal muscular atrophy trial university of utah” • 14,500 pages • Top hit is very relevant in content • Is it “computable”?
HTML
<h2>Enrolling/Ongoing: </h2>
<p>Clinical and Genetic Studies in Spinal Muscular Atrophy</p>
<p>Metabolic Dysfunction in SMA: impact of nutritional management</p>
<p>Prospective Study of Bone Abnormalities in SMA</p>
<p>STOP SMA: Phenylbutyrate trial in pre-symptomatic infants with SMA</p>
<p> <span> <span>Pilot newborn screening project for identification and prospective followup of infants with spinal muscular atrophy</span> </span> </p>
<p> <span> <span>Atalauren extension study in patients with Duchenne Muscular Dystrophy</span> </span>
…
ClinicalTrials.gov RDF/XML • Semantic Web Data for Clinical Trials • (1) http://static.linkedct.org/ • (2) http://static.linkedct.org/page/trials/NCT00661453
Triples Triples Triples • Triple Statement – <s><p><o> • Subject (s) – the resource • Predicate (p) – the relationship • Often called the “property” in OWL • Object (o) – object of the relationship • Example • (s) trial:NCT00661453 • (p) linkedct:brief_title • (o) “CARNIVAL Type I: Valproic Acid and Carnitine in Infants With SMA Type I ”
Abbreviations • For ease of readability • trial:NCT00661453 • “trial:” - abbreviation for namespace “http://static.linkedct.org/resource/trials/” • “linkedct:” - abbreviation for namespace “http://static.linkedct.org/resource/linkedct/”
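The abbreviations above can be expanded mechanically. A minimal Python sketch, assuming only the two namespaces listed on this slide; `expand` is a hypothetical helper written for illustration, not part of any RDF library:

```python
# Prefix-to-namespace table taken from the Abbreviations slide.
PREFIXES = {
    "trial": "http://static.linkedct.org/resource/trials/",
    "linkedct": "http://static.linkedct.org/resource/linkedct/",
}


def expand(abbreviated: str) -> str:
    """Expand a name such as 'trial:NCT00661453' into a full URI.

    Hypothetical helper for illustration only.
    """
    prefix, _, local = abbreviated.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


print(expand("trial:NCT00661453"))
# http://static.linkedct.org/resource/trials/NCT00661453
```

The abbreviated form is only a convenience for readers; the full URI is what identifies the resource.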
Triple Notations • There are many • Turtle • RDF/XML • OWL • OBO
Triples Text
Subject            | Predicate          | Object
trial:NCT00661453  | rdf:type           | ct:trials
trial:NCT00661453  | ct:brief_title     | “CARNIVAL…”
trial:NCT00661453  | ct:start_date      | “April 2008”
trial:NCT00661453  | ct:condition       | cond:12347
cond:12347         | rdf:type           | ct:condition
cond:12347         | ct:condition_name  | “Spinal Muscular…”
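A table of triples like this one can be queried by pattern matching, which is roughly the idea behind a SPARQL basic graph pattern. The following is a sketch in plain Python rather than real SPARQL: `match` and `start_dates_for_condition` are hypothetical helpers, `None` plays the role of a query variable, and the condition identifier is assumed to be cond:12347 throughout, as on the graph slide.

```python
# The six triples from the slide as (subject, predicate, object) tuples.
# Literal values are truncated exactly as on the slide.
# Assumption: the condition is identified as cond:12347 in every row.
TRIPLES = [
    ("trial:NCT00661453", "rdf:type", "ct:trials"),
    ("trial:NCT00661453", "ct:brief_title", "CARNIVAL…"),
    ("trial:NCT00661453", "ct:start_date", "April 2008"),
    ("trial:NCT00661453", "ct:condition", "cond:12347"),
    ("cond:12347", "rdf:type", "ct:condition"),
    ("cond:12347", "ct:condition_name", "Spinal Muscular…"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def start_dates_for_condition(triples, name_prefix):
    """Join three patterns: condition name -> condition -> trial -> start date."""
    results = []
    for cond, _, name in match(triples, p="ct:condition_name"):
        if not name.startswith(name_prefix):
            continue
        for trial, _, _ in match(triples, p="ct:condition", o=cond):
            for _, _, date in match(triples, s=trial, p="ct:start_date"):
                results.append((trial, date))
    return results


print(start_dates_for_condition(TRIPLES, "Spinal Muscular"))
# [('trial:NCT00661453', 'April 2008')]
```

Unlike the HTML page, this data answers a "which trials and when" question directly, because each fact is addressable by subject and predicate.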
Triple Graph
trial:NCT00661453 --rdf:type--> ct:trial
trial:NCT00661453 --ct:brief_title--> “CARNIVAL Type I: Valproic Acid and Carnitine in Infants With Spinal Muscular Atrophy (SMA) Type I”
trial:NCT00661453 --ct:start_date--> “April 2008”
trial:NCT00661453 --ct:condition--> cond:12347
cond:12347 --ct:condition_name--> “Spinal Muscular Atrophy Type I”
cond:12347 --rdf:type--> ct:condition
RDF XML • (see file under #2)
<rdf:RDF…>
  <rdf:Description rdf:about="http://static.linkedct.org/resource/trials/NCT00481013">
    <linkedct:brief_title>Valproic Acid in Ambulant Adults With Spinal Muscular Atrophy</linkedct:brief_title>
  …
</rdf:RDF>
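To illustrate how RDF/XML maps back to triples, here is a rough sketch using Python's standard `xml.etree.ElementTree`. It is not a full RDF/XML parser: it handles only `rdf:Description` elements with plain literal properties. The document string is a completed stand-in for the elided fragment on the slide, and its namespace declarations are assumptions (the standard RDF namespace plus the linkedct namespace from the Abbreviations slide). `literal_triples` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Well-formed stand-in for the elided slide fragment.
# Assumption: these xmlns declarations; the slide elides the real ones.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:linkedct="http://static.linkedct.org/resource/linkedct/">
  <rdf:Description rdf:about="http://static.linkedct.org/resource/trials/NCT00481013">
    <linkedct:brief_title>Valproic Acid in Ambulant Adults With Spinal Muscular Atrophy</linkedct:brief_title>
  </rdf:Description>
</rdf:RDF>"""


def literal_triples(xml_text):
    """Yield (subject, predicate, literal) for simple literal-valued properties."""
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as '{namespace}local'; rejoin into one URI.
            namespace, local = prop.tag[1:].split("}")
            yield (subject, namespace + local, (prop.text or "").strip())


for triple in literal_triples(DOC):
    print(triple)
```

Each child element of `rdf:Description` becomes one triple: the `rdf:about` value is the subject, the element name is the predicate, and the element text is the object.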
Observations • RDF is a standard supporting consistent data representation • Rules about standards apply • Use existing standards whenever possible
Popular RDF Standards • Friend of a friend • alias=foaf • describe people and links • Dublin Core • alias=dc • “metadata” standard • Simple Knowledge Organization System • alias=skos • terminology, thesauri, …
Data Federation • Combine data from more than one data source • Heterogeneous data • Not all data sources use the same standards • ds1.firstName • ds2.first_name • ds3.person_name • Homogeneous data • All data sources use the same standards • ds1.firstName • ds2.firstName • ds3.firstName
Property Alignment Assertions • ds1:firstName owl:equivalentProperty foaf:firstName • ds2:first_name owl:equivalentProperty foaf:firstName
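One way to see what these alignment assertions buy is to apply them to data from both sources. The sketch below simply rewrites each source-specific predicate to its FOAF equivalent. That is a simplification: a real OWL reasoner would add the equivalent triples rather than replace the originals. The person identifiers and names are made up for illustration, and `align_properties` is a hypothetical helper.

```python
OWL_EQ_PROP = "owl:equivalentProperty"

# Alignment assertions from the slide.
ALIGNMENTS = [
    ("ds1:firstName", OWL_EQ_PROP, "foaf:firstName"),
    ("ds2:first_name", OWL_EQ_PROP, "foaf:firstName"),
]

# Hypothetical instance data: identifiers and names are invented.
DATA = [
    ("ds1:person/1", "ds1:firstName", "Ada"),
    ("ds2:person/9", "ds2:first_name", "Grace"),
]


def align_properties(data, alignments):
    """Rewrite each predicate to its aligned equivalent, if one is asserted."""
    mapping = {s: o for s, p, o in alignments if p == OWL_EQ_PROP}
    return [(s, mapping.get(p, p), o) for s, p, o in data]


federated = align_properties(DATA, ALIGNMENTS)
first_names = sorted(o for _, p, o in federated if p == "foaf:firstName")
print(first_names)
# ['Ada', 'Grace']
```

After alignment, a single query on `foaf:firstName` covers both sources, which is the practical effect of turning heterogeneous data into homogeneous data.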
Class Alignment Assertions • ds1:Person owl:equivalentClass foaf:Person • ds2:HumanBeing owl:equivalentClass foaf:Person
Rule-based Assertions • Use rules to evaluate complicated “if-then” scenarios and assert results • SWRL – Semantic Web Rule Language • JRL - Jena Rule Language
Reasoning • Compute assertions • Adds new triple statements to the triple graph • Implications • Data of interest must be read from all data sources to compute assertions • When data sources are large this can take a long time and adequate computational resources are required
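The idea of computing assertions can be sketched as naive forward chaining: apply rules to the graph, add whatever new triples result, and repeat until nothing changes. This is an illustrative toy, not how Jena or any particular reasoner is implemented. It covers only two rules, and the individuals `ds1:p1` and `ds2:p9` are hypothetical.

```python
def infer(triples):
    """Naive forward chaining, repeated until no new triples appear.

    Rule 1: (C1 owl:equivalentClass C2), (x rdf:type C1)  =>  (x rdf:type C2)
    Rule 2: (P1 owl:equivalentProperty P2), (x P1 y)      =>  (x P2 y)
    Both equivalences are symmetric, so each rule runs in both directions.
    """
    graph = set(triples)
    while True:
        new = set()
        for a, p, b in graph:
            if p == "owl:equivalentClass":
                for x, q, c in graph:
                    if q == "rdf:type" and c == a:
                        new.add((x, "rdf:type", b))
                    if q == "rdf:type" and c == b:
                        new.add((x, "rdf:type", a))
            elif p == "owl:equivalentProperty":
                for x, q, y in graph:
                    if q == a:
                        new.add((x, b, y))
                    if q == b:
                        new.add((x, a, y))
        if new <= graph:  # fixpoint reached: nothing new was derived
            return graph
        graph |= new


ASSERTED = [
    ("ds1:Person", "owl:equivalentClass", "foaf:Person"),
    ("ds2:HumanBeing", "owl:equivalentClass", "foaf:Person"),
    ("ds1:p1", "rdf:type", "ds1:Person"),      # hypothetical individual
    ("ds2:p9", "rdf:type", "ds2:HumanBeing"),  # hypothetical individual
]

inferred = infer(ASSERTED)
people = sorted(s for s, p, o in inferred if p == "rdf:type" and o == "foaf:Person")
print(people)
# ['ds1:p1', 'ds2:p9']
```

Every pass scans the whole graph against the whole graph, which hints at why reasoning over large data sources takes time: the work grows much faster than the number of triples.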
Use Case • Combine clinical trial data with patient data • SMA trial data from clinicaltrials.gov (linkedct.org) with patient demographics for 5 different trials
Resources • W3 Schools • http://www.w3schools.com/semweb/default.asp • W3C Web Sites • http://www.w3.org/standards/semanticweb/ • http://www.w3.org/RDF/ • http://www.w3.org/standards/techs/owl#w3c_all • Safari Books • http://proquest.safaribooksonline.com • Semantic Web Programming • Semantic Web for the Working Ontologist
Resources • Jena Java API • Protégé • D2R
Entity Relationship Diagram
TRIAL: TRIAL_ID, BRIEF_TITLE, CONDITION_ID, START_DATE
CONDITION: CONDITION_ID, CONDITION_NAME
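The two tables in this diagram correspond closely to the triple example earlier in the deck. As a rough sketch of a relational-to-RDF mapping (the general kind of job a tool such as D2R performs, though this is not its API), each row becomes a subject, each column a predicate, and the foreign key a link to another resource. The row values and the `rows_to_triples` helper are assumptions for illustration.

```python
# Hypothetical rows shaped like the TRIAL and CONDITION tables.
# Values reuse the earlier triple example, truncated as on the slides.
TRIAL_ROWS = [
    {"TRIAL_ID": "NCT00661453", "BRIEF_TITLE": "CARNIVAL…",
     "CONDITION_ID": "12347", "START_DATE": "April 2008"},
]
CONDITION_ROWS = [
    {"CONDITION_ID": "12347", "CONDITION_NAME": "Spinal Muscular…"},
]


def rows_to_triples(trial_rows, condition_rows):
    """Map table rows to (subject, predicate, object) triples."""
    triples = []
    for row in trial_rows:
        subject = "trial:" + row["TRIAL_ID"]
        triples.append((subject, "rdf:type", "ct:trials"))
        triples.append((subject, "ct:brief_title", row["BRIEF_TITLE"]))
        triples.append((subject, "ct:start_date", row["START_DATE"]))
        # The foreign key becomes a link to another resource, not a literal.
        triples.append((subject, "ct:condition", "cond:" + row["CONDITION_ID"]))
    for row in condition_rows:
        subject = "cond:" + row["CONDITION_ID"]
        triples.append((subject, "rdf:type", "ct:condition"))
        triples.append((subject, "ct:condition_name", row["CONDITION_NAME"]))
    return triples


for triple in rows_to_triples(TRIAL_ROWS, CONDITION_ROWS):
    print(triple)
```

One TRIAL row and one CONDITION row yield six triples, the same count as the triple table shown earlier.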