
Linked Data


skyler-lane

Presentation Transcript


  1. Linked Data (PRISM, Oct 2010)

  2. Data
     • Data is captured within systems, e.g.:
         • Web pages
         • Databases
         • Documents
     • But it’s difficult to make use of that data:
         • Access is difficult: there are so many access methods and data types
         • The meaning is unknown unless you talk to the DBA or the author
         • Machines have difficulty understanding what the data represents
         • Integration is difficult: warehouses can’t hold everything and relate everything to everything else

  3. Data Integration Problem
     • Involved in every project
     • Creates overhead
     • Many data sources
     • Ad hoc solutions
     • Maintenance issues
     • Each effort is focused on an individual project
     • No overall direction
     • Pulling ourselves apart

  4. What is Linked Data?
     • Linked Data is webified data accessible via HTTP
     • It’s just the next evolution of the Web: the Semantic Web
     • RDF is the fundamental data model
     • It uses URIs, and so allows things and concepts to be linked
     • It allows separate systems, designed independently, to be joined at the edges later
     • It allows interoperability to be added where it is cost-effective
     • It allows any data to be expressed in a mixture of vocabularies
     • It lowers the barrier to data access
     • It dramatically increases data usability
     • It allows machines to extract and interpret the data
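
The core ideas on this slide (RDF triples, URIs as shared identifiers, and independently designed datasets joined "at the edges") can be illustrated with a small sketch. It assumes plain Python tuples stand in for RDF triples rather than a real RDF toolkit. Every example.org URI is hypothetical, including the drug and compound names; only the owl:sameAs and rdfs:label URIs are real W3C terms. The two datasets use different vocabularies and are merged by set union. A single owl:sameAs triple then lets a lookup on one dataset's URI also return facts from the other dataset.

```python
# Minimal sketch: RDF-style triples as Python tuples (not a real RDF library).
# All example.org URIs are hypothetical; owl:sameAs and rdfs:label are real W3C terms.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Dataset A: designed independently, with its own (hypothetical) vocabulary.
dataset_a = {
    ("http://a.example.org/drug/42", RDFS_LABEL, "Drug 42"),
    ("http://a.example.org/drug/42", "http://a.example.org/vocab#treats",
     "http://a.example.org/disease/7"),
}

# Dataset B: designed independently, with a different (hypothetical) vocabulary.
dataset_b = {
    ("http://b.example.org/compound/X1", "http://b.example.org/terms/sideEffect",
     "http://b.example.org/effect/nausea"),
}

# The "edge" that joins them later: one triple saying both URIs name the same thing.
links = {
    ("http://a.example.org/drug/42", OWL_SAME_AS, "http://b.example.org/compound/X1"),
}


def merge(*graphs):
    """Merging graphs that share URIs (and have no blank nodes) is a set union."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def same_as_closure(graph, uri):
    """Return uri plus every URI reachable via owl:sameAs, in either direction."""
    found, frontier = {uri}, [uri]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p != OWL_SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in found:
                    found.add(b)
                    frontier.append(b)
    return found


def describe(graph, uri):
    """Collect (predicate, object) pairs about uri and all of its sameAs aliases."""
    aliases = same_as_closure(graph, uri)
    return sorted((p, o) for s, p, o in graph if s in aliases and p != OWL_SAME_AS)


graph = merge(dataset_a, dataset_b, links)
for predicate, obj in describe(graph, "http://a.example.org/drug/42"):
    print(predicate, obj)
```

Neither dataset had to change its vocabulary. The interoperability was added afterwards with one linking triple, which is the "joined at the edges" point made on the slide.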

  5. Open Linked Data Cloud
     • Connected and linked data with context
     • Created by a community
     • A valuable resource that will only grow!
     • Something we can learn from!
     • Significant scientific content
     • Significant linking hubs appearing

  6. It’s still growing

  7. Linked Data Example
     • DBpedia: an RDF-ized version of Wikipedia
     • Now the world of knowledge is queryable!
     • Question: all soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats, and who were born in a country with more than 10 million inhabitants

         PREFIX foaf: <http://xmlns.com/foaf/0.1/>
         PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
         PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
         PREFIX dbpedia2: <http://dbpedia.org/property/>
         SELECT DISTINCT ?player WHERE {
           ?s foaf:page ?player .
           ?s rdf:type <http://dbpedia.org/ontology/SoccerPlayer> .
           ?s dbpedia2:position ?position .
           ?s <http://dbpedia.org/property/clubs> ?club .
           ?club <http://dbpedia.org/ontology/capacity> ?cap .
           ?s <http://dbpedia.org/ontology/birthPlace> ?place .
           ?place ?population ?pop .
           OPTIONAL { ?s <http://dbpedia.org/ontology/number> ?tricot . }
           # accept any of the population properties used in DBpedia
           FILTER (?population IN (<http://dbpedia.org/property/populationEstimate>, <http://dbpedia.org/property/populationCensus>, <http://dbpedia.org/property/statPop>))
           FILTER (xsd:integer(?pop) > 10000000)
           # the question asks for MORE than 40,000 seats
           FILTER (xsd:integer(?cap) > 40000)
           FILTER (?position = "Goalkeeper"@en || ?position = <http://dbpedia.org/resource/Goalkeeper_%28association_football%29> || ?position = <http://dbpedia.org/resource/Goalkeeper_%28football%29>)
         }
         LIMIT 1000
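
A SPARQL engine handles a query like this one roughly as follows. Each triple pattern extends the variable bindings found so far, which amounts to a join, and the FILTER conditions then discard bindings. The sketch below is a simplified in-memory version of that basic-graph-pattern evaluation, not how DBpedia's endpoint is actually implemented. The toy graph, the ex: resources and the shortened dbo:/dbp: names are hypothetical. The sketch also drops foaf:page, the OPTIONAL clause and the choice among population properties.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, object]
Binding = Dict[str, object]


def is_var(term: object) -> bool:
    """In this sketch, a pattern term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None if it cannot."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def evaluate_bgp(graph: Iterable[Triple], patterns: List[Triple], filters=()) -> List[Binding]:
    """Join the triple patterns one at a time, then keep bindings passing every filter."""
    triples = list(graph)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return [b for b in bindings if all(f(b) for f in filters)]


# Hypothetical toy data shaped like the DBpedia query (made-up resources).
graph = {
    ("ex:keeperA", "rdf:type", "dbo:SoccerPlayer"),
    ("ex:keeperA", "dbp:position", "Goalkeeper"),
    ("ex:keeperA", "dbp:clubs", "ex:bigClub"),
    ("ex:keeperA", "dbo:birthPlace", "ex:bigCountry"),
    ("ex:keeperB", "rdf:type", "dbo:SoccerPlayer"),
    ("ex:keeperB", "dbp:position", "Goalkeeper"),
    ("ex:keeperB", "dbp:clubs", "ex:smallClub"),
    ("ex:keeperB", "dbo:birthPlace", "ex:bigCountry"),
    ("ex:bigClub", "dbo:capacity", 60000),
    ("ex:smallClub", "dbo:capacity", 15000),
    ("ex:bigCountry", "dbp:populationEstimate", 50000000),
}
patterns = [
    ("?s", "rdf:type", "dbo:SoccerPlayer"),
    ("?s", "dbp:position", "Goalkeeper"),
    ("?s", "dbp:clubs", "?club"),
    ("?club", "dbo:capacity", "?cap"),
    ("?s", "dbo:birthPlace", "?place"),
    ("?place", "dbp:populationEstimate", "?pop"),
]
filters = [
    lambda b: b["?pop"] > 10_000_000,  # born in a country with > 10 million inhabitants
    lambda b: b["?cap"] > 40_000,      # club stadium with > 40,000 seats
]
players = sorted({b["?s"] for b in evaluate_bgp(graph, patterns, filters)})
print(players)  # ['ex:keeperA']
```

The question asks for stadiums with more than 40,000 seats, so the capacity filter must be `> 40000`. With `< 40000`, the same toy graph would return ex:keeperB instead.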

  8. Linked Data Initiatives
     • Bio2RDF (http://quebec.bio2rdf.org/)
     • The Bio2RDF project is a tool to convert bioinformatics data and knowledge bases to RDF format. It is a kind of generalized RDFizer for bioinformatics applications, and a place for the Semantic Web life science community to develop and grow.
     • Typical queries:
         • What is the pathway network of mouse metabolism?
         • What are the genes involved in a KEGG pathway?

         SELECT DISTINCT ?label1 ?sameAs5 ?xobject4 WHERE {
           ?Pathway1 <http://www.w3.org/2000/01/rdf-schema#label> ?label1 .
           ?Pathway1 <http://bio2rdf.org/kegg#xrelation> ?xrelation2 .
           ?xrelation2 <http://bio2rdf.org/kegg#xentry1> ?xentry3 .
           ?xentry3 <http://bio2rdf.org/kegg#xobject> ?xobject4 .
           ?xobject4 <http://www.w3.org/2002/07/owl#sameAs> ?sameAs5 .
           # restrict results to a single mouse (KEGG organism code mmu) pathway
           FILTER (?Pathway1 = <http://bio2rdf.org/path:mmu00010>)
         }
     • http://sourceforge.net/apps/mediawiki/bio2rdf/index.php?title=Demo_queries
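
The Bio2RDF query walks a chain of KEGG properties from one pathway: first xrelation, then xentry1, then xobject. It then crosses to other datasets via owl:sameAs. Below is a minimal sketch of that traversal, assuming an in-memory index rather than a real endpoint. The predicate URIs come from the query above, while the ex: nodes and the triples themselves are hypothetical.

```python
from collections import defaultdict

KEGG = "http://bio2rdf.org/kegg#"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def build_index(triples):
    """Index objects by (subject, predicate) so each hop is a dictionary lookup."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index


def follow_chain(index, start, predicates):
    """Return the distinct nodes reachable from start by following predicates in order
    (comparable to a SPARQL 1.1 sequence path p1/p2/.../pn under SELECT DISTINCT)."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for node in frontier for o in index.get((node, predicate), ())}
    return frontier


# Hypothetical toy triples shaped like the Bio2RDF query; the ex: node names are made up.
triples = [
    ("ex:pathway1", KEGG + "xrelation", "ex:rel1"),
    ("ex:pathway1", KEGG + "xrelation", "ex:rel2"),
    ("ex:rel1", KEGG + "xentry1", "ex:entryA"),
    ("ex:rel2", KEGG + "xentry1", "ex:entryB"),
    ("ex:entryA", KEGG + "xobject", "ex:gene1"),
    ("ex:entryB", KEGG + "xobject", "ex:gene2"),
    ("ex:gene1", OWL_SAME_AS, "ex:otherdb-gene1"),
]

index = build_index(triples)
chain = [KEGG + "xrelation", KEGG + "xentry1", KEGG + "xobject"]
objects = follow_chain(index, "ex:pathway1", chain)
linked = follow_chain(index, "ex:pathway1", chain + [OWL_SAME_AS])
print(sorted(objects))  # ['ex:gene1', 'ex:gene2']
print(sorted(linked))   # ['ex:otherdb-gene1']
```

The query above has no OPTIONAL clause. An object like ex:gene2, which has no owl:sameAs link, would therefore not appear in its results at all.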

  9. Linked Data Initiatives
     • Linking Open Drug Data (LODD) (HCLSIG, W3C) (http://esw.w3.org/HCLSIG/LODD)
     • Focuses on linking the various sources of drug data together to answer interesting scientific and business questions
     • Typical queries:
         • What other drugs are available for this disease?
         • What side effects are there for this drug, especially those not on the label?
         • Is my patient a good candidate for a particular drug?
     • http://esw.w3.org/HCLSIG/LODD/Business
     • http://www4.wiwiss.fu-berlin.de/lodd/topquestions/index.php

  10. Linked Data Initiatives
     • Chem2Bio2RDF (http://chem2bio2rdf.wikispaces.com/)
     • Addresses the challenges of systems chemical biology
     • Typical queries:
         • Find all the pathways that contain multiple targets, at least two of which are targeted by compounds associated with a given side effect (e.g. hepatonecrosis)
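
The side-effect question on this slide is essentially a join followed by a count per pathway. The sketch below shows that logic over hypothetical toy facts: compound-to-side-effect, compound-to-target and pathway-to-target pairs. The identifiers and the function name are made up for illustration and do not reflect the project's actual schema.

```python
from collections import defaultdict

# Hypothetical toy facts; every identifier is made up for illustration.
compound_side_effects = {("cpd1", "hepatonecrosis"), ("cpd2", "hepatonecrosis"), ("cpd3", "rash")}
compound_targets = {("cpd1", "T1"), ("cpd2", "T2"), ("cpd3", "T3"), ("cpd3", "T4")}
pathway_targets = {
    ("pathA", "T1"), ("pathA", "T2"), ("pathA", "T5"),
    ("pathB", "T1"), ("pathB", "T3"),
    ("pathC", "T3"), ("pathC", "T4"),
}


def pathways_hit_by_side_effect(side_effect, side_effects, targets, pathways, min_hits=2):
    """Pathways where at least min_hits member targets are targeted by compounds
    associated with side_effect (with min_hits >= 2 the pathway necessarily
    contains multiple targets)."""
    compounds = {c for c, e in side_effects if e == side_effect}
    hit_targets = {t for c, t in targets if c in compounds}
    targets_by_pathway = defaultdict(set)
    for pathway, target in pathways:
        targets_by_pathway[pathway].add(target)
    return sorted(
        pathway
        for pathway, members in targets_by_pathway.items()
        if len(members & hit_targets) >= min_hits
    )


print(pathways_hit_by_side_effect(
    "hepatonecrosis", compound_side_effects, compound_targets, pathway_targets))  # ['pathA']
```

In SPARQL 1.1, the same shape would typically be expressed with GROUP BY ?pathway and HAVING (COUNT(DISTINCT ?target) >= 2).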

  11. Linked Data: Thoughts & Problems
     • The array of linked data available is hugely valuable to every pharmaceutical company with an interest in translational medicine, or simply looking to enhance its own datasets.
     • To be of practical use, these data sources must be:
         • Efficiently hosted
         • Regularly updated
         • High quality (both the data and the mappings across data sources)
     • In practice:
         • Data sources are not provided by the data providers themselves
         • Data sources become stale
         • Data sources are mapped by computer scientists
         • Quality isn’t always as high as it could be
     • In addition, many of the interface tools for these data sources are not robust:
         • Hosting servers are inadequate
         • Access points (endpoints) are often down
         • URIs often change

  12. Discussion Topics
     • Do we believe that LOD is valuable to pharma?
     • Do we believe that pharma should tailor LOD to our needs?
         • Quality, quantity and links within the data
         • Infrastructure
     • If so, by what mechanism?
         • An organization to support, coordinate and direct existing LOD groups
             • Feasibility?
         • A pharma consortium creates and hosts its own high-quality interlinked data
         • A third-party organization: academic, non-profit, or commercial?
     • Alternatives
         • We each repeat the same exercises to create similar linked datasets
         • Costs, time, resources
