Linked Data PRISM Oct 2010
Data
• Data is captured within systems, e.g.
  • Web pages
  • Databases
  • Documents
• But it’s difficult to make use of data
  • Access is difficult
    • So many ways and types
  • Meaning is unknown
    • Unless you chat to the DBA or author
    • Machines have difficulty understanding what the data represents
  • Integration is difficult
    • Warehouses can’t hold everything and relate everything to everything else
Data Integration Problem
• Involved in every project
• Creates overhead
• Many data sources
• Ad hoc solutions
• Maintenance issues
• Individually project focused
• No direction
• Pulling ourselves apart
What is Linked Data?
• Linked data is webified data accessible via HTTP
• It’s just the next evolution of the web
  • The Semantic Web
• RDF is the fundamental data model
  • It uses URIs and so allows linking of things and concepts
  • It allows separate systems designed independently to be later joined at the edges
  • It allows interoperability to be added where cost-effective
  • It allows any data to be expressed in a mixture of vocabularies
• Lowers the barrier for data access
• Dramatically increases data usability
• Allows machines to extract and interpret the data
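The "joined at the edges" point can be illustrated with a minimal sketch in plain Python, with no RDF library. It models two independently built datasets as sets of (subject, predicate, object) triples that happen to use the same URI for the same drug. All `example.org` URIs and the `describe` helper are hypothetical, made up for illustration only.

```python
# Each dataset is a set of (subject, predicate, object) triples.
# The example.org URIs below are hypothetical placeholders.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DRUG = "http://example.org/drug/aspirin"
DISEASE = "http://example.org/disease/headache"

# Dataset 1: built by a chemistry group.
chemistry_db = {
    (DRUG, RDFS_LABEL, "Aspirin"),
    (DRUG, "http://example.org/chem#formula", "C9H8O4"),
}

# Dataset 2: built separately by a clinical group, using its own vocabulary.
clinical_db = {
    (DRUG, "http://example.org/clin#treats", DISEASE),
    (DISEASE, RDFS_LABEL, "Headache"),
}

# Because both sides identify the drug by the same URI,
# integration is just the union of the two triple sets.
merged = chemistry_db | clinical_db


def describe(graph, subject):
    """Collect every predicate/object pair known about one subject."""
    facts = {}
    for s, p, o in graph:
        if s == subject:
            facts.setdefault(p, []).append(o)
    return {p: sorted(objects) for p, objects in facts.items()}


facts = describe(merged, DRUG)
for predicate, objects in sorted(facts.items()):
    print(predicate, objects)
```

Neither dataset had to know about the other's schema; the shared URI is the only point of agreement, and the merged description mixes predicates from both vocabularies.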
Linked Open Data (LOD) Cloud
• Connected and linked data with context
• Created by a community
• A valuable resource that will only grow!
• Something we can learn from!
• Significant scientific content
• Significant linking hubs appearing
Linked Data Example
• DBpedia
  • RDF-ized version of Wikipedia
  • Now the world of knowledge is queryable!
• Question: all soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants

    SELECT DISTINCT ?player {
      ?s foaf:page ?player .
      ?s rdf:type <http://dbpedia.org/ontology/SoccerPlayer> .
      ?s dbpedia2:position ?position .
      ?s <http://dbpedia.org/property/clubs> ?club .
      ?club <http://dbpedia.org/ontology/capacity> ?cap .
      ?s <http://dbpedia.org/ontology/birthPlace> ?place .
      ?place ?population ?pop .
      OPTIONAL { ?s <http://dbpedia.org/ontology/number> ?tricot . }
      FILTER (?population IN (<http://dbpedia.org/property/populationEstimate>,
                              <http://dbpedia.org/property/populationCensus>,
                              <http://dbpedia.org/property/statPop>))
      FILTER (xsd:int(?pop) > 10000000) .
      FILTER (xsd:int(?cap) > 40000) .
      FILTER (?position = "Goalkeeper"@en
              || ?position = <http://dbpedia.org/resource/Goalkeeper_%28association_football%29>
              || ?position = <http://dbpedia.org/resource/Goalkeeper_%28football%29>)
    } LIMIT 1000
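A query like this is sent to an endpoint over plain HTTP. The sketch below, using only the Python standard library, builds such a request following the SPARQL Protocol convention of passing the query text in a `query` parameter. The endpoint URL `http://dbpedia.org/sparql`, the `build_sparql_request` helper and the shortened query are assumptions for illustration; none of them appear in the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the public DBpedia SPARQL endpoint lives at this address.
ENDPOINT = "http://dbpedia.org/sparql"

# A deliberately small query, not the full goalkeeper query from the slide.
QUERY = """
SELECT DISTINCT ?player WHERE {
  ?player a <http://dbpedia.org/ontology/SoccerPlayer> .
} LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build an HTTP GET request carrying the query in the 'query' parameter."""
    params = urlencode({"query": query})
    return Request(
        f"{endpoint}?{params}",
        # Ask for the standard SPARQL JSON results format.
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# To actually run it (needs network access and a live endpoint):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request, timeout=30) as response:
#       rows = json.load(response)["results"]["bindings"]
```

The request is built but not sent, so the sketch works offline; the commented lines show how the response would be read if the endpoint is reachable.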
Linked Data Initiatives
• Bio2RDF (http://quebec.bio2rdf.org/)
  • The Bio2RDF project is a tool to convert bioinformatics data and knowledge bases to RDF format. It is a kind of generalized rdfizer for bioinformatics applications, and it is a place for the semantic web life science community to develop and grow.
• Typical Queries
  • What is the pathway network of mouse metabolism?
  • What are the genes involved in a KEGG pathway?

    SELECT DISTINCT ?label1 ?sameAs5 ?xobject4 WHERE {
      ?Pathway1 <http://www.w3.org/2000/01/rdf-schema#label> ?label1 .
      ?Pathway1 <http://bio2rdf.org/kegg#xrelation> ?xrelation2 .
      ?xrelation2 <http://bio2rdf.org/kegg#xentry1> ?xentry3 .
      ?xentry3 <http://bio2rdf.org/kegg#xobject> ?xobject4 .
      ?xobject4 <http://www.w3.org/2002/07/owl#sameAs> ?sameAs5 .
      FILTER (?Pathway1 = <http://bio2rdf.org/path:mmu00010>)
    }

• http://sourceforge.net/apps/mediawiki/bio2rdf/index.php?title=Demo_queries
Linked Data Initiatives
• Linking Open Drug Data (LODD) (HCLSIG, W3C) (http://esw.w3.org/HCLSIG/LODD)
  • Focus on linking the various sources of drug data together to answer interesting scientific and business questions
• Typical Queries
  • What other drugs are available for this disease?
  • What side effects are there for this drug, especially those not on the label?
  • Is my patient a good candidate for a particular drug?
• http://esw.w3.org/HCLSIG/LODD/Business
• http://www4.wiwiss.fu-berlin.de/lodd/topquestions/index.php
Linked Data Initiatives
• Chem2Bio2RDF (http://chem2bio2rdf.wikispaces.com/)
  • Addresses the challenges of systems chemical biology
• Typical Queries
  • Find all the pathways that contain multiple targets, at least two of which are targeted by compounds that are associated with a given side effect (e.g. hepatonecrosis)
Linked Data: Thoughts & Problems
• The array of linked data available is hugely valuable to every pharmaceutical company with an interest in translational medicine or simply looking to enhance its own datasets.
• To be of practical use these data sources must be
  • Efficiently hosted
  • Regularly updated
  • High quality
    • Both the data and the mappings across data sources
• In practice
  • Data sources are not provided by the data providers themselves
    • Data sources become stale
  • Computer scientists map the data sources
    • Quality isn’t always as high as it could be
• In addition, many of the interface tools for these data sources are not robust
  • Hosting servers are inadequate
  • Access points (endpoints) are often down
  • URIs often change
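One way a consumer might cope with endpoints that are often down is to probe them with a few retries before relying on them. The following is only a sketch under stated assumptions: `check_endpoints`, `http_probe` and the retry parameters are hypothetical names chosen here, and the probe is passed in as a function so the logic can be exercised without network access.

```python
import time
from urllib.request import urlopen


def check_endpoints(endpoints, probe, retries=2, delay=0.0):
    """Return {url: True/False}, trying each endpoint up to retries + 1 times.

    probe(url) must raise OSError (or a subclass) when the endpoint is unreachable.
    """
    status = {}
    for url in endpoints:
        reachable = False
        for attempt in range(retries + 1):
            try:
                probe(url)
                reachable = True
                break
            except OSError:
                # Wait before the next attempt, but not after the last one.
                if attempt < retries:
                    time.sleep(delay)
        status[url] = reachable
    return status


def http_probe(url, timeout=10):
    """Real probe: open the URL and read one byte.

    urllib's URLError and socket timeouts are both OSError subclasses,
    so check_endpoints treats them as 'endpoint down'.
    """
    with urlopen(url, timeout=timeout) as response:
        response.read(1)
```

A call such as `check_endpoints(["http://dbpedia.org/sparql"], http_probe, delay=1.0)` would then report which endpoints answered; the DBpedia address is again an assumption, not taken from the slides.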
Discussion Topics
• Do we believe that LOD is valuable to pharma?
• Do we believe that pharma should tailor LOD to our needs?
  • Quality, quantity and links within data
  • Infrastructure
• If so, by what mechanism?
  • Organization to support, coordinate and direct existing LOD groups
    • Feasibility?
  • Pharma consortium creates and hosts its own high quality interlinked data
  • Third-party organization
    • Academic, non-profit, or commercial?
• Alternatives
  • We each repeat the same exercises to create similar linked data sets
    • Costs, time, resources