Improving Discovery in Biology through Linked Data. Helena F. Deus. We live in a world of data. Data, data everywhere. Sequences. Microarrays. Electrophoresis. Crystallography. In vitro experiments.
Data, data everywhere
• Sequences
• Microarrays
• Electrophoresis
• Crystallography
• In vitro experiments
Sources: http://www.lbl.gov/publicinfo/newscenter/pr/2008/PBD-microarray.html; http://www.biologyreference.com/Dn-Ep/Electrophoresis.html; http://biology.kenyon.edu/courses/biol114/Chap08/Chapter_08a.html
Ingredients for Linked Data
• Use the Resource Description Framework (RDF) to create relationships between named things
• Discover new links by reusing ontologies and vocabularies
• Name things and concepts using URIs (Uniform Resource Identifiers)
[Diagram labels: label, EGFR, http://uniprot.org/EGFR, genomicLocation, 7p12.1, sameAs, http://geneontology.org/EGFR, westernBlot, rdfs:subClassOf, image, rdf:type]
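The ingredients above can be sketched in plain Python by storing each statement as a (subject, predicate, object) triple. This is a minimal illustration, not an RDF library: the two URIs and the values come from the slide, while the `ex:` prefix, the `objects` helper, and the way the diagram labels are wired together are assumptions.

```python
# A toy triple store: each RDF statement is a (subject, predicate, object) tuple.
# The URIs and values come from the slide; the "ex:" names and the exact wiring
# of the diagram are assumptions made for illustration.
EGFR_UNIPROT = "http://uniprot.org/EGFR"
EGFR_GO = "http://geneontology.org/EGFR"

triples = {
    (EGFR_UNIPROT, "rdfs:label", "EGFR"),
    (EGFR_UNIPROT, "ex:genomicLocation", "7p12.1"),
    (EGFR_UNIPROT, "owl:sameAs", EGFR_GO),
    ("ex:westernBlot", "rdfs:subClassOf", "ex:image"),
}

def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects(EGFR_UNIPROT, "ex:genomicLocation"))
```

Because every name is a URI or a prefixed term, triples from different sources can be pooled into the same set without renaming anything.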
Ingredients for Linked Data
• SPARQL, the query language of the Web of Data
Example triple: http://uniprot.org/EGFR :overExpressedIn Alzheimer’s
SPARQL triple patterns:
?Gene :overExpressedIn ?Disease
?Gene :hasFunction ?GOterm
?Pathway :hasParticipant ?Gene
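To make the idea of matching triple patterns concrete, here is a small pure-Python sketch of how a query engine could bind variables such as `?Gene` across several patterns. It is not a SPARQL implementation; the `match` function and all of the data (`ex:GeneA`, `ex:DiseaseX`, and so on) are hypothetical, and only the three predicate names come from the slide.

```python
# Toy basic-graph-pattern matcher: variables start with "?", as in SPARQL.
# The data below is hypothetical; only the predicate names come from the slide.
triples = [
    ("ex:GeneA", ":overExpressedIn", "ex:DiseaseX"),
    ("ex:GeneA", ":hasFunction", "ex:GOterm1"),
    ("ex:Pathway1", ":hasParticipant", "ex:GeneA"),
    ("ex:GeneB", ":hasFunction", "ex:GOterm2"),
]

def match(patterns, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the value it was first bound to.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # All three positions matched: continue with the remaining patterns.
            yield from match(rest, candidate)

query = [
    ("?Gene", ":overExpressedIn", "?Disease"),
    ("?Gene", ":hasFunction", "?GOterm"),
    ("?Pathway", ":hasParticipant", "?Gene"),
]
results = list(match(query))
print(results)
```

The shared variable `?Gene` is what joins the three patterns: `ex:GeneB` has a function but no disease or pathway, so it does not appear in the results.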
Integrate Biological Data - the easy way
[Diagram: an NCBI graph and a Reactome graph joined by sameAs]
NCBI: nih:EGFR, nci:has_description (epidermal growth factor receptor), nih:sequence (CCCCGGCGCAGCGCGGCCGCAGCAGCCTCCGCCCCCCGCACGGTGTGAGCGCCCGACGCGGCCGAGGCGG …), nih:organism (Homo sapiens), nih:interacts, nih:EGF
Reactome: rea:EGFR, rea:keyword (rea:Membrane, rea:Receptor, rea:Transferase)
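The integration step can be sketched as follows: two sources describe EGFR under different identifiers, and a single sameAs link lets their statements be merged under one name. The property names and values are taken from the slide's labels, but how they attach to each node is an assumption, and the `integrate` helper is hypothetical.

```python
# Two toy sources describing EGFR under different identifiers. The property
# values are taken from the slide's labels; how they attach to each node is
# an assumption, because the slide shows the diagram only as loose labels.
ncbi = [
    ("nih:EGFR", "nci:has_description", "epidermal growth factor receptor"),
    ("nih:EGFR", "nih:organism", "Homo sapiens"),
    ("nih:EGFR", "nih:interacts", "nih:EGF"),
]
reactome = [
    ("rea:EGFR", "rea:keyword", "rea:Membrane"),
    ("rea:EGFR", "rea:keyword", "rea:Receptor"),
    ("rea:EGFR", "rea:keyword", "rea:Transferase"),
]
same_as = [("nih:EGFR", "rea:EGFR")]  # (canonical name, alias)

def integrate(sources, links):
    """Merge triples, rewriting linked identifiers to one canonical name."""
    canonical = {alias: target for target, alias in links}
    merged = set()
    for source in sources:
        for s, p, o in source:
            merged.add((canonical.get(s, s), p, canonical.get(o, o)))
    return merged

merged = integrate([ncbi, reactome], same_as)
egfr_facts = sorted((p, o) for s, p, o in merged if s == "nih:EGFR")
print(egfr_facts)
```

This sketch resolves only direct sameAs links; chains of equivalences (A sameAs B, B sameAs C) would need a transitive closure or a union-find structure.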
The Linked Data Cloud “Life sciences will drive adoption of the Semantic Web, just as high-energy physics drove the early Web.” - Sir Tim Berners-Lee, 2005
Building a Knowledge Continuum Knowledge Top-down approaches Formal Logical Models to be validated by reality Knowledge re-engineering bottleneck Linked Data Cloud Bottom-up approaches Knowledge Generation, data-driven Data
Biological Knowledge Continuum Metabolomics Knowledge Continuum Protein 3D structure Microarrays Proteomics Transcriptomics Genomics Electrophoresis Sequencing
Mapping genes to their functional roles Src: Science Jan 2010: Vol. 327 no. 5964 pp. 425-431
Querying the UCSC Genome Browser
• Look up annotation for all genes with functions similar to protein P04637
SQL:
select uniProt.gene.val, go.association.term_id, go.term.name
from uniProt.gene, go.gene_product, go.association, go.term
where uniProt.gene.acc = 'P04637'
and go.gene_product.symbol = uniProt.gene.val
and go.gene_product.id = go.association.gene_product_id
and go.association.term_id = go.term.id
SPARQL (diagram labels): uniprot:P04637, ?gene, :product, go:term, ?goterm
Ack: Nigam Shah & Eric Prud’hommeaux
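To see what the four-table join in the SQL does, it can be run against small in-memory stand-ins using Python's built-in `sqlite3` module. The table and column names follow the query on the slide; the column types, the gene symbols, the second accession, and every row are invented for illustration and say nothing about the real UCSC schema or annotation data.

```python
import sqlite3

# In-memory stand-ins for the "uniProt" and "go" databases named in the query.
# Table and column names follow the SQL on the slide; every row is invented
# for illustration and does not reflect real annotation data.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS uniProt")
conn.execute("ATTACH DATABASE ':memory:' AS go")
conn.executescript("""
    CREATE TABLE uniProt.gene (acc TEXT, val TEXT);
    CREATE TABLE go.gene_product (id INTEGER, symbol TEXT);
    CREATE TABLE go.association (gene_product_id INTEGER, term_id INTEGER);
    CREATE TABLE go.term (id INTEGER, name TEXT);
    INSERT INTO uniProt.gene VALUES ('P04637', 'GENE1'), ('Q00000', 'GENE2');
    INSERT INTO go.gene_product VALUES (1, 'GENE1'), (2, 'GENE2');
    INSERT INTO go.association VALUES (1, 10), (1, 11), (2, 12);
    INSERT INTO go.term VALUES (10, 'toy term A'), (11, 'toy term B'), (12, 'toy term C');
""")

# The query from the slide, with an ORDER BY added so the output is stable.
rows = conn.execute("""
    SELECT uniProt.gene.val, go.association.term_id, go.term.name
    FROM uniProt.gene, go.gene_product, go.association, go.term
    WHERE uniProt.gene.acc = 'P04637'
      AND go.gene_product.symbol = uniProt.gene.val
      AND go.gene_product.id = go.association.gene_product_id
      AND go.association.term_id = go.term.id
    ORDER BY go.association.term_id
""").fetchall()
print(rows)
```

Each join condition in the WHERE clause has to be spelled out by someone who knows both schemas, which is the burden the slide contrasts with following named links in a graph.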
How about Experimental Results?
~20 000 genes → ~100 interesting genes/proteins → ~10 interesting pathways → ~5 proteins testable in the lab
[Diagram labels: Linked Data, High-throughput technologies, Literature, Browse databases, Computational statistics, Hypothesis Generation]
“I like to call it low-input, high-throughput, no-output biology.”
From genes to discovery Drugbank ClinicalTrial OMIM MDM2 EGFR PTEN KIT PDGFRA NME4 ARL6IP6 NOTCH1 unknown MTHFD2
Linking genes to diseases to drugs Sources: Marc Vidal; Albert-Laszlo Barabasi; Michael Cusick; Proceedings of the National Academy of Sciences
Linked data to follow MRSA spread UK MRSA Portugal MRSA
Can we model Systems Biology? Src: Nature Reviews 2010:11; 414-426 Ras CPLA2 RAF MEK ERK
Start using Linked Data NOW!! http://sindice.com HELENA.DEUS@DERI.ORG http://www.w3.org/wiki/HCLSIG/LODD/Data
Who are we talking to? • At NUIG: • Professor Cathal Seoighe • Professor Frank Barry