Bluffers Guide to The Semantic Web. Data wants to be free. Frank van Harmelen, CS Department, Vrije Universiteit Amsterdam. Semantics as your saviour? Outline: The general idea: a Web of Data. What must be done to realise this. How far away is this. Next steps, do’s, don’ts.
Bluffers Guide to The Semantic Web. Data wants to be free. Frank van Harmelen, CS Department, Vrije Universiteit Amsterdam
Outline • The general idea: a Web of Data • What must be done to realise this • How far away is this • Next steps, do’s, don’ts
The Scientist’s Problem (Everybody’s Problem): Too much unintegrated data: • from a variety of incompatible sources • no standard naming convention • each with a custom browsing and querying mechanism (no common interface) • and poor interaction with other data sources
What are the Data Sources? • Flat Files • URLs • Proprietary Databases • Public Databases • Spreadsheets • Emails • Maps • …
In which disciplines? • Archeology • Chemistry • Genomics, proteomics, ... (bio/life-sciences) • Communication science • Social history • Linguistics • Bio-diversity • Environmental sciences (climate studies) • .... • libraries (KB), archives (sound&vision) • Geo? Callouts: a new database each month; one dataset per site; historical data; laymen data; international data (for the first time)
Outline • The general idea: a Web of Data • What must be done to realise this • How far away is this • Next steps, do’s, don’ts
The Current Web of text and pictures: linked web-pages, written by people, written for people, used only by people... (for example: a web page in English about Frank, another web page about Frank, a page about the Vrije Universiteit, a page about LarKC, a page about Stefano). Many of these pages already come from data that is usable by computers! But we can’t link the data.... The Future Web of Data: linked data, usable by computers! useful for people!
Which Semantic Web? • Version 1: “Enrichment of the current Web” • recipe: Annotate and classify web-content • enable better search & browse, ...
Which Semantic Web? • Version 2: "Semantic Web as Web of Data" (TBL) • recipe: expose databases on the web, use RDF, integrate • meta-data from: • expressing DB schema semantics in machine interpretable ways • enable integration and unexpected re-use
Outline • The general idea: a Web of Data • What must be done to realise this • How far away is this • Next steps, do’s, don’ts
machine accessible meaning (What it’s like to be a machine) [Figure: a record marked up with the tags <treatment>, <name>, <symptoms>, <drug>, <disease>, <drugadministration>, annotated with META-DATA relations labelled “alleviates” and “IS-A”]
What is meta-data? • it's just data • it's data describing other data • it's meant for machine consumption [Figure labels: name, symptoms, disease, drug administration]
Required are: • a standard syntax • so meta-data can be recognised as such • one or more shared vocabularies • so data producers and data consumers all speak the same language • lots of resources with meta-data attached • mechanisms for attribution and trust
1. A standard syntax Semantic Web data model: RDF things & relations between things
RDF Triples in Geo: <rdf:RDF> <geo:Point> <geo:lat>55.701</geo:lat> <geo:long>12.552</geo:long> </geo:Point> </rdf:RDF> As a graph: an unnamed node of type geo:Point with two arcs, geo:lat → 55.701 and geo:long → 12.552. Remember: RDF = simple model for data
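To make the "things & relations between things" reading of the geo:Point snippet concrete, here is a minimal sketch that turns it into subject/predicate/object triples using only the Python standard library. The slide omits the namespace declarations; the two URIs below are the standard RDF namespace and the W3C Basic Geo vocabulary, added here as an assumption about what `rdf:` and `geo:` stand for. The helper `to_triples` is hypothetical and handles only this flat shape, not full RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # assumed meaning of the geo: prefix

# The slide's snippet, with the namespace declarations it leaves out.
RDF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:geo="{GEO_NS}">
  <geo:Point>
    <geo:lat>55.701</geo:lat>
    <geo:long>12.552</geo:long>
  </geo:Point>
</rdf:RDF>
"""


def to_triples(rdf_xml):
    """Turn flat node elements with literal-valued properties into triples.

    Hypothetical helper: covers only the simple shape used on the slide.
    """
    triples = []
    root = ET.fromstring(rdf_xml)
    for i, node in enumerate(root):
        subject = f"_:b{i}"  # the node has no name, so give it a blank-node label
        # ElementTree spells qualified names as "{namespace}local"
        class_uri = node.tag[1:].replace("}", "")
        triples.append((subject, RDF_NS + "type", class_uri))
        for prop in node:
            prop_uri = prop.tag[1:].replace("}", "")
            triples.append((subject, prop_uri, prop.text.strip()))
    return triples


triples = to_triples(RDF_XML)
for triple in triples:
    print(triple)
```

Each printed tuple is one arc in the graph: one typing the unnamed node as a Point, and one each for its latitude and longitude.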
RDF Schema: vocabulary for data types • Classes + subclass hierarchy • rivers are waterways • Properties + subproperty hierarchy • father-of implies parent-of • Domain of properties • X capital-of Y ⇒ X has-type city • Range of properties • X capital-of Y ⇒ Y has-type country. Simple standardised inferences
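The four kinds of "simple standardised inferences" listed for RDF Schema can be sketched as a small fixpoint loop over triples. This is an illustration under simplifying assumptions, not a conforming RDFS reasoner: the plain-string names, the `SCHEMA` dictionary layout and the sample individuals (Rhine, Jan, Piet) are all made up for the example.

```python
def rdfs_closure(triples, schema):
    """Apply subclass, subproperty, domain and range rules until nothing new appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            # subproperty: father-of implies parent-of
            for q in schema["subPropertyOf"].get(p, ()):
                new.add((s, q, o))
            # domain: X capital-of Y  =>  X has-type city
            if p in schema["domain"]:
                new.add((s, "type", schema["domain"][p]))
            # range: X capital-of Y  =>  Y has-type country
            if p in schema["range"]:
                new.add((o, "type", schema["range"][p]))
            # subclass: rivers are waterways
            if p == "type":
                for c in schema["subClassOf"].get(o, ()):
                    new.add((s, "type", c))
        if not new <= facts:
            facts |= new
            changed = True
    return facts


# Hypothetical schema and data mirroring the slide's examples.
SCHEMA = {
    "subClassOf": {"river": ["waterway"]},
    "subPropertyOf": {"father-of": ["parent-of"]},
    "domain": {"capital-of": "city"},
    "range": {"capital-of": "country"},
}
DATA = [
    ("Rhine", "type", "river"),
    ("Jan", "father-of", "Piet"),
    ("Amsterdam", "capital-of", "Netherlands"),
]

inferred = rdfs_closure(DATA, SCHEMA) - set(DATA)
for fact in sorted(inferred):
    print(fact)
```

The loop repeats because one inferred fact can trigger another, for example a type derived from a domain rule that then has superclasses of its own.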
OWL: richer vocabulary for data types • Things RDF Schema cannot express: • Description Logic SHOIN(D) • equality, disjunction, negation • min/max number restrictions • inverse, symmetric, transitive properties • and much more… Complex standardised inferences. Example: Every country has precisely one capital. Inference: TheHague ≠ A’dam & A’dam = capital ⇒ TheHague ≠ capital. Use: integrity checks after data-merging
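The "integrity checks after data-merging" idea can be illustrated with the capital example. The sketch below covers only the "at most one capital" half of the constraint and only flags pairs of cities that are explicitly known to be different, which loosely mirrors how a number restriction plus an inequality statement yields a contradiction. The property name `has-capital`, the two sources and the function `check_one_capital` are hypothetical; this is not how an OWL reasoner is implemented.

```python
def check_one_capital(triples, different):
    """Return countries whose merged data lists two capitals known to be distinct."""
    capitals = {}
    for country, prop, city in triples:
        if prop == "has-capital":
            capitals.setdefault(country, set()).add(city)

    conflicts = {}
    for country, cities in capitals.items():
        # Only pairs declared different count as a clash; without such a
        # declaration the two names could still denote the same city.
        clashing = sorted(
            (a, b)
            for a in cities
            for b in cities
            if a < b and frozenset((a, b)) in different
        )
        if clashing:
            conflicts[country] = clashing
    return conflicts


# Two hypothetical sources that disagree after merging.
source_a = [("Netherlands", "has-capital", "Amsterdam")]
source_b = [("Netherlands", "has-capital", "TheHague")]
different = {frozenset(("Amsterdam", "TheHague"))}

conflicts = check_one_capital(source_a + source_b, different)
print(conflicts)
```

Each source is consistent on its own; the violation only becomes visible once the data sets are merged, which is exactly when such a check is useful.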
Web of Data: anybody can say anything about anything • All identifiers are URLs (= on the Web) • Allows total decoupling of • data • vocabulary • meta-data, each with different owners & locations [Figure: a data item x, a vocabulary term T, and the meta-data statement [<x> IsOfType <T>] linking them; example tag: <prince>]
2. Shared vocabularies: BioMed • MeSH • Medical Subject Headings, National Library of Medicine • 22.000 descriptions • EMTREE • Commercial Elsevier, Drugs and diseases • 45.000 terms, 190.000 synonyms • UMLS • Integrates 100 different vocabularies • SNOMED • 200.000 concepts, College of American Pathologists • Gene Ontology • 15.000 terms in molecular biology • NCBI Cancer Ontology • 17.000 classes (about 1M definitions) • Geo?
Outline • The general idea: a Web of Data • What must be done to realise this • How far away is this • Next steps, do’s, don’ts
How far away is this? • Stable data formats & standardised inferences • Lots of shared vocabularies (+ ways to convert them) • Lots of data sources (+ ways to convert them) • Lots of tools • convert, construct, edit (data, vocabularies) • store, search, query, reason • interlink • visualise • ...
How far away is this? Not very far away! A rapidly growing Linked Open Data cloud, already many billions of facts & rules: • every book sold by Amazon • any CD ever recorded (almost) • life-science databases • basic facts on every country on the planet • hierarchical dictionaries (UK, FR, NL) • common sense rules & facts (100.000’s) • scientific bibliographies • names of artists & art works (10.000’s) • Geographic names (millions) • Encyclopedia. It gets bigger every month
Example use-case: bbc.co.uk/music/artists • Content is BBC + LOD • Use an ontology as basis for the site • Serve data back out as RDF • “The Web is becoming our content management platform”
Outline • The general idea: a Web of Data • What must be done to realise this • How far away is this • Next steps, do’s, don’ts
Next steps • learn / get access to some basic technology • hunt for shared vocabularies • try to avoid building them • wrap legacy data sources • your own • from others • link wrapped sources • publish linked data on the web • make noise • reconstruct some old results • produce new results • get famous. Can you get famous by sharing data? A little semantics goes a long way: • in-use systems in communication science, KB, Beeld & Geluid, Europeana • papers in oncology, in communication science • dedicated conferences in chemistry, earth-sciences, life-sciences, humanities • funding opportunities in humanities, social sciences, life sciences
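As a rough illustration of the "wrap legacy data sources" step, the sketch below converts a small CSV table into N-Triples lines, reusing an existing shared vocabulary (the W3C Basic Geo terms) rather than inventing new property names. Everything specific here is an assumption made for the example: the `http://example.org/` base URI, the column names, the sample rows and the function `wrap_csv`. Coordinates are emitted as plain string literals for brevity.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace for minted identifiers
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # shared vocabulary, reused not rebuilt

# Made-up legacy data standing in for a spreadsheet or flat file.
LEGACY_CSV = """site,lat,long
site-1,55.701,12.552
site-2,52.334,4.866
"""


def wrap_csv(text):
    """Emit one N-Triples line per (row, coordinate column)."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        # Mint a URL for each row so others can link to it.
        subject = f"<{BASE}site/{row['site']}>"
        lines.append(f'{subject} <{GEO}lat> "{row["lat"]}" .')
        lines.append(f'{subject} <{GEO}long> "{row["long"]}" .')
    return lines


ntriples = wrap_csv(LEGACY_CSV)
print("\n".join(ntriples))
```

Because every row gets its own URL and the properties come from a shared vocabulary, the output can be linked with other wrapped sources and published as-is.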
Frank.van.Harmelen@cs.vu.nl http://www.cs.vu.nl/~frankh/popularising.html Questions & discussion