The Linked Clinical Data Project
Jyotishman Pathak, PhD
Rick Kiefer
SemTIG, November 4, 2011
Purpose
• The Linked Clinical Data (LCD) project aims to investigate emerging Semantic Web technologies for developing an ontology-driven framework for high-throughput phenotyping using Electronic Medical Records (EMRs) to analyze multi-factorial phenotypes.
• Investigate ontology-based techniques.
• Develop a framework for publishing and integrating.
• Propose and validate semantic reasoning techniques to support rapid cohort identification.
LCD Architecture
[Architecture diagram. Labels shown: Med Index, NRAF, MRIS, MICS, Health Quest, MCLSS Endpoint, Virtual Server, MCLSS Databases, Web Server, Linked Open Drug Data Endpoints, Linked Data API, Request Selector, Response Formatter, Virtuoso, SQL, SPARQL, RDF View, Viewer, Thick Client Application, Thin Client Application, Mobile Client Application.]
Project – Automated SNPedia
• SNPedia contains a wealth of data, but the information in the wiki is manually curated. The focus of this project is to automate the results using patient data.
• Using MCLSS, identify patients with specific conditions.
• Join with OMIM to determine the genetic locus associated with those conditions.
• Join with dbSNP to identify potentially associated SNPs.
• Each of the joins will be done using a single federated SPARQL query.
• Results will then be compared to data in SNPedia.
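The chain of joins on this slide (condition to gene to SNP, then a comparison against curated data) can be illustrated with a small in-memory sketch. This is not the project's implementation, which uses a federated SPARQL query. The dictionaries, identifiers, and the `disease_to_snps` helper below are invented placeholders standing in for MCLSS, OMIM, dbSNP, and SNPedia.

```python
# Toy stand-ins for the three sources; every value here is a made-up placeholder.
patient_conditions = [("patient-1", "condition-A"), ("patient-2", "condition-B")]
condition_to_gene = {"condition-A": ["GENE1"], "condition-B": ["GENE2", "GENE3"]}
gene_to_snps = {"GENE1": ["rs0000001"], "GENE2": ["rs0000002"], "GENE3": []}


def disease_to_snps(conditions, cond_gene, gene_snp):
    """Join condition -> gene -> SNP and return sorted, de-duplicated rows."""
    rows = set()
    for _patient, condition in conditions:
        for gene in cond_gene.get(condition, []):      # OMIM-style join
            for rsid in gene_snp.get(gene, []):        # dbSNP-style join
                rows.add((condition, gene, rsid))
    return sorted(rows)


candidates = disease_to_snps(patient_conditions, condition_to_gene, gene_to_snps)

# Hypothetical curated associations, standing in for what SNPedia already lists.
curated = {("condition-A", "GENE1", "rs0000001")}
not_yet_curated = [row for row in candidates if row not in curated]
```

Rows left in `not_yet_curated` would be the candidate associations to review. A gene with no SNP entries, such as `GENE3` here, simply contributes nothing to the result.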
Disease to SNP architecture
[Diagram. Labels shown: Patient, Request, SPARQL Query, RDF View Mapping, Disease (MCLSS, SNOMED/ICD9), Gene (OMIM), SNP (dbSNP), Results, Endpoints, Databases.]
dbSNP/OMIM federated query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX omim: <http://bio2rdf.org/omim_resource:>
PREFIX dbsnp: <http://edison.mayo.edu:8890/schemas/dbsnp2/>

SELECT DISTINCT ?rsID ?geneSymbol ?alleleName
WHERE {
  # OMIM: allelic variants whose title mentions diabetes
  SERVICE <http://omim.bio2rdf.org/sparql> {
    SELECT ?geneSymbol ?alleleName
    WHERE {
      ?alleleVariant rdf:type omim:AllelicVariant ;
                     <http://purl.org/dc/terms/title> ?alleleName ;
                     omim:symbol ?geneSymbol .
      FILTER(regex(str(?alleleName), "Diabetes", "i"))
    }
  }
  # dbSNP: ?geneSymbol must be projected here, otherwise the two
  # result sets share no variable and the join becomes a cross product
  SERVICE <http://edison.mayo.edu:8890/sparql> {
    SELECT ?rsID ?geneSymbol
    WHERE {
      ?s dbsnp:symbol ?geneSymbol ;
         dbsnp:rsid ?rsID .
    }
  }
}
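A query like this is normally submitted to the endpoint that should coordinate the federation, using the SPARQL Protocol over HTTP. The following is a minimal sketch of that step, assuming the endpoint accepts GET requests with a `query` parameter and can return SPARQL JSON results. The helper names `build_sparql_request` and `run_select` are hypothetical, and nothing here is taken from the LCD code base.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_select(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings.

    Assumes the endpoint is reachable and answers in SPARQL JSON format.
    """
    with urlopen(build_sparql_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


# Example request object; http://example.org/sparql is a placeholder endpoint.
example_request = build_sparql_request(
    "http://example.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
)
```

Keeping request construction separate from the network call makes it possible to check the encoded URL without touching an endpoint, which helps when endpoints have unreliable uptime.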
Process – Creating dbSNP endpoint
• No endpoint could be found, so one had to be created.
• Download the dbSNP database from a Sybase dump.
• Use Perl to filter the tables in order to isolate the desired data and rewrite it into tab-delimited form.
• Create tables in MySQL and import the files.
• Use Virtuoso to link to the tables.
• Create RDF views by mapping the table columns to the desired endpoint subjects.
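To make the last two steps concrete, here is a rough sketch of what mapping tab-delimited rows to RDF looks like. It is only an illustration: Virtuoso RDF views are declared inside the server and produce triples at query time, whereas this sketch writes N-Triples directly. The predicate names `rsid` and `symbol` come from the `dbsnp:` prefix used in the federated query. The column names, the `SUBJECT_BASE` IRI, and the sample row are assumptions.

```python
import csv
import io

DBSNP = "http://edison.mayo.edu:8890/schemas/dbsnp2/"  # prefix from the federated query
SUBJECT_BASE = "http://example.org/dbsnp/snp/"         # hypothetical subject namespace


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(row):
    """Map one table row (a dict with 'rsid' and 'symbol') to N-Triples lines."""
    subject = f"<{SUBJECT_BASE}{row['rsid']}>"
    return [
        f'{subject} <{DBSNP}rsid> "{escape_literal(row["rsid"])}" .',
        f'{subject} <{DBSNP}symbol> "{escape_literal(row["symbol"])}" .',
    ]


def tsv_to_triples(text):
    """Read tab-delimited text with a header line and map every row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [triple for row in reader for triple in row_to_triples(row)]


# Made-up sample in the tab-delimited form produced by the filtering step.
sample = "rsid\tsymbol\nrs0000001\tGENE1\n"
triples = tsv_to_triples(sample)
```

Each table row becomes one subject, and each mapped column becomes one predicate. This is the same shape the `?s dbsnp:symbol ?geneSymbol; dbsnp:rsid ?rsID` pattern in the query expects.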
Hurdles
• Endpoints
  • Difficult to find
  • Unreliable uptime
  • Unknown age of data
  • Schema documentation
• Environment
  • Linux: could not find an ODBC driver for Virtuoso
  • Virtuoso Bridge did not work with DB2
  • Virtual server: no admin permissions
  • Windows 2008 server: bug in WebDAV access
Hurdles
• Virtuoso
  • Did not support federated queries until March
  • March release has bugs
    • Unable to run SPARQL queries against non-local endpoints
    • Federated queries of mixed location crash the server
  • Beta fix release has performance issues
  • Documentation: outdated and poor navigation
Next steps
• MCLSS
  • Identify small MCLSS views
  • Federated query with SIDER and RxNorm
  • Use TMO/etc for RDBMS -> RDF mapping
• dbSNP RDF view
  • Standardized RDBMS -> RDF mapping
  • Visual graph for dbSNP/OMIM
• SNPedia
  • Alter Bob’s Perl script to download data
  • Upload into MySQL for comparisons
Questions?
Thank you!
Bob Freimuth – Perl scripts to filter and transform the dbSNP database, as well as invaluable sharing of genomic knowledge and advice.
Website: http://informatics.mayo.edu/LCD/index.php/Main_Page