February 16, 2012 Linked Data Tutorial Tomáš Knap, Jindřich Mynarz, Martin Nečaský, Jakub Stárka (Partially based on slides of Chris Bizer [9])
Motivational Scenario Basic data Public contracts Employees Departments Budget Expenses WWW page of the institution Business Register ÚFIS Buyer's Profile ISVZUS gov.cz • Data Consumer: Show me the suppliers of public contracts for the Ministry of Finance (MF) in the Liberec region. Show me the data on Google Maps on my iPhone. For every public contract, I am also looking for the aggregate of all payments made by MF, a link to its budget, and the responsible person. • Where can I get the data about public contracts, responsible persons, expenses, and budget of MF? • How should I aggregate and link the data? • How can I view the data on a map?
Current Common Practice Basic data Public contracts Employees Departments Budget Expenses WWW page of the institution Business Register ÚFIS Buyer's Profile ISVZUS gov.cz 3 – Expenses? 2 – MF public contracts + employees 1 – MF public contracts? Not discovered by the consumer? Information integration is very time-consuming, tedious, and inefficient!
Linked Data • A set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web • using Semantic Web technologies and standards • The Semantic Web is the goal; Linked Data provides the means to reach it
Linked Data Principles • Use URIs as names for things • Use HTTP URIs so that people can look up those names • When someone looks up a URI, provide useful RDF information • Include RDF statements that link to other URIs so that they can discover related things [Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006]
Architecture of the Classic Web • Single global information space • Small set of simple standards: • HTTP URI • globally unique ID • retrieval mechanism • HTML as document format • Hyperlinks to connect everything • Applications work on top of the complete information space
Web 2.0 APIs and Mashups • No single global dataspace • Shortcomings: • APIs have proprietary interfaces • No hyperlinks between data items within different APIs • Mashups are based on a fixed set of data sources • Web APIs slice the Web into walled gardens!
Linked Data • Extend the Web with a single global dataspace • By using RDF to publish structured data on the Web • By setting links between data items within different data sources • Physically distributed, but behaves like a single dataspace
RDF Data Model • Flexible graph-based data model [2] • HTTP URIs take the role of global primary keys. • pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri • dbpedia:Berlin = http://dbpedia.org/resource/Berlin
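To make the graph model concrete, here is a minimal Python sketch, assuming triples are stored as plain (subject, predicate, object) tuples of strings. The `expand` helper, the `foaf:` prefix entry, and the two example triples are illustrative assumptions, not part of any RDF library or of the original slide.

```python
# Minimal in-memory RDF graph: a set of (subject, predicate, object) triples.
# The prefix table is an illustrative assumption; full URIs act as global keys.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbpedia:Berlin' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local

# pd:cygri from the slide, written out as its full URI.
CYGRI = "http://richard.cyganiak.de/foaf.rdf#cygri"

# Example triples (assumed for illustration).
graph = {
    (CYGRI, expand("foaf:name"), "Richard Cyganiak"),
    (CYGRI, expand("foaf:based_near"), expand("dbpedia:Berlin")),
}

def objects(g, subject, predicate):
    """Return all objects for a given subject and predicate."""
    return {o for (s, p, o) in g if s == subject and p == predicate}

print(objects(graph, CYGRI, expand("foaf:based_near")))
```

Because the object of the second triple is itself a full HTTP URI, it can be the subject of triples published by a different source (here DBpedia), which is what lets independently published graphs merge.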
Resolving URIs over the Web • The HTTP protocol brings together identification and retrieval
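A common way to combine identification and retrieval is HTTP content negotiation with a 303 See Other redirect: the URI of a thing redirects either to an RDF document or to an HTML page, depending on the client's Accept header. The function below is a hypothetical, simplified server-side sketch of that rule, assuming the /resource/, /data/, and /page/ path convention used by DBpedia; it ignores q-values and wildcards and is not DBpedia's actual implementation.

```python
# Hypothetical sketch of 303 content negotiation for Linked Data URIs.
RDF_TYPES = ("text/turtle", "application/rdf+xml")

def negotiate(resource_uri: str, accept: str) -> tuple[int, str]:
    """Return (status, Location) for a GET on a /resource/ URI.

    Simplification: only checks whether an RDF media type appears in the
    Accept header; q-values and wildcards are ignored.
    """
    if "/resource/" not in resource_uri:
        raise ValueError("not a resource URI")
    wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
    target = "/data/" if wants_rdf else "/page/"
    # The thing itself is not a document, so answer with a 303 redirect.
    return 303, resource_uri.replace("/resource/", target, 1)

print(negotiate("http://dbpedia.org/resource/Berlin", "text/html"))
```

Redirecting instead of answering directly keeps the URI of the thing (Berlin) distinct from the URIs of the documents that describe it.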
Pubby – Linked Data Frontend http://dbpedia.org/page/Český_Krumlov
Properties of the Web of Linked Data • Global, distributed dataspace built on a simple set of standards • RDF, URIs, HTTP • Entities are connected by links • creating a global data graph that spans data sources • enabling the discovery of new data sources • Data coexistence • Everyone can publish data to the Web of Linked Data • Everyone can express their personal view on things
W3C Linking Open Data Project • Grassroots community effort to • Publish existing openly licensed datasets as Linked Data on the Web • Interlink things between different data sources
Linked Data Cloud 2011 http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.pdf http://thedatahub.org/
More Statistics http://stats.lod2.eu/stats
Uptake in the Governmental Domain • The EU is publishing Linked Data • Eurostat • http://estatwrap.ontologycentral.com/ • National efforts • The UK Government is releasing public data • http://data.gov.uk/ • Lots of initiatives in Great Britain • Budget in Germany • http://bund.offenerhaushalt.de/ • Open Data in Catalonia • http://opendata.gencat.cat/en/dades-obertes.html
Data.gov.uk http://data.gov.uk/organogram/cabinet-office
Linked Data Applications – Linked Data Browsers
Search Engines - Sig.ma http://sig.ma
Mashups – Public Contracts On the Map http://gd.projekty.ms.mff.cuni.cz:2021/new/map.html
Mashups – Crime, Transport, Education http://apps.seme4.com/see-uk/
Other Applications • Browsers: • Disco Hyperdata Browser • http://www4.wiwiss.fu-berlin.de/rdf_browser/ • OpenLink RDF Browser • http://ode.openlinksw.com/ • Search Engines • Falcons • http://ws.nju.edu.cn/falcons/ • Watson • http://watson.kmi.open.ac.uk/WatsonWUI/ • Mashups
Linked Data Applications - Summary Linked Data Mashups Search Engines Linked Data Browsers
Publishing Tasks – Bizer 38 • 1. Make data available as RDF via HTTP • Requires ways to serialize RDF data model • 2. Set RDF links pointing at other data sources • 3. Make your data self-descriptive
RDF/XML • W3C Recommendation, 2004 [2]
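To show what the RDF/XML syntax looks like in practice, the sketch below uses Python's standard `xml.etree.ElementTree` to serialize the same contact data as the Turtle example that follows. Writing the document by hand like this is an illustration only; a real publisher would normally use an RDF library's serializer.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("contact", CONTACT)

def contact_rdfxml() -> str:
    """Serialize the Eric Miller contact example as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # A typed node element <contact:Person rdf:about="..."> encodes rdf:type.
    person = ET.SubElement(
        root, f"{{{CONTACT}}}Person",
        {f"{{{RDF}}}about": "http://www.w3.org/People/EM/contact#me"},
    )
    # Literal-valued properties become child elements with text content.
    ET.SubElement(person, f"{{{CONTACT}}}fullName").text = "Eric Miller"
    # URI-valued properties use rdf:resource.
    ET.SubElement(person, f"{{{CONTACT}}}mailbox",
                  {f"{{{RDF}}}resource": "mailto:em@w3.org"})
    ET.SubElement(person, f"{{{CONTACT}}}personalTitle").text = "Dr."
    return ET.tostring(root, encoding="unicode")

print(contact_rdfxml())
```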
Turtle Syntax
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dataModel: <http://www.w3.org/2000/10/swap/pim/contact#> .
@prefix myContact: <http://www.w3.org/People/EM/contact#> .

myContact:me rdf:type dataModel:Person ;
    dataModel:fullName "Eric Miller" ;
    dataModel:mailbox <mailto:em@w3.org> ;
    dataModel:personalTitle "Dr." .
• W3C Team Submission, 2011 [4]
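In Turtle, `;` repeats the current subject and `.` ends the group of statements about it. The minimal sketch below groups triples by subject in exactly that way. It assumes every term is already written as valid Turtle (a prefixed name, an `<IRI>`, or a quoted literal) and performs no escaping or validation.

```python
from collections import defaultdict

def to_turtle(prefixes: dict[str, str], triples: list[tuple[str, str, str]]) -> str:
    """Serialize pre-formatted terms as Turtle, grouping triples by subject."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items()]
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append(f"{p} {o}")
    for subject, pairs in by_subject.items():
        lines.append("")
        # ' ;' separates predicate-object pairs of one subject, ' .' ends the group.
        lines.append(f"{subject} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines)

doc = to_turtle(
    {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
     "dataModel": "http://www.w3.org/2000/10/swap/pim/contact#",
     "myContact": "http://www.w3.org/People/EM/contact#"},
    [("myContact:me", "rdf:type", "dataModel:Person"),
     ("myContact:me", "dataModel:fullName", '"Eric Miller"'),
     ("myContact:me", "dataModel:mailbox", "<mailto:em@w3.org>"),
     ("myContact:me", "dataModel:personalTitle", '"Dr."')],
)
print(doc)
```

Ending each predicate-object pair with `.` instead of `;` would close the statement and leave the next line without a subject, which a Turtle parser rejects.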
RDFa • A way to embed RDF directly in XHTML pages • Provides new attributes to carry the additional markup • W3C Recommendation, 2008 [5] • HTML (unlike XHTML) is not formally extensible, but most RDFa parsers recognize RDFa attributes in any version of HTML
RDFa • Provides new attributes for the additional markup and reuses existing ones • new: about, resource, … • reused: href, src, … • Can be used with any supported element; preferred: • span, div (in the body) • a (linking element) • meta, link (in the head)
RDFa Example • XHTML page http://example.com/alice/posts/42 • Original XHTML code: All content on this site is licensed under <a href="http://cc.org/licenses/by/3.0/">a Creative Commons License</a>. • XHTML + RDFa: All content on this site is licensed under <a rel="cc:license" href="http://cc.org/licenses/by/3.0/">a Creative Commons License</a>. • RDF triple distilled from XHTML + RDFa: <http://example.com/alice/posts/42> cc:license <http://cc.org/licenses/by/3.0/> .
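The distillation step can be sketched with Python's standard `html.parser`. This is a toy, assuming only the `rel` + `href` pattern from the example: the page URI is the subject, `rel` gives the predicate, and `href` gives the object. The `cc:` prefix is left unexpanded, and a real RDFa processor also handles `about`, `property`, `typeof`, prefix declarations, and chaining.

```python
from html.parser import HTMLParser

class RelHrefExtractor(HTMLParser):
    """Toy RDFa extractor: handles only elements carrying both rel and href."""

    def __init__(self, base_uri: str):
        super().__init__()
        self.base_uri = base_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            # With no 'about' attribute in scope, the subject is the document itself.
            for predicate in rel.split():
                self.triples.append((self.base_uri, predicate, href))

page = ('All content on this site is licensed under '
        '<a rel="cc:license" href="http://cc.org/licenses/by/3.0/">'
        'a Creative Commons License</a>.')
extractor = RelHrefExtractor("http://example.com/alice/posts/42")
extractor.feed(page)
for s, p, o in extractor.triples:
    print(f"<{s}> {p} <{o}> .")
```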
RDF store + Linked Data Interface • Virtuoso + Pubby
D2R Server • A way to publish data in relational databases as Linked Data • Requests from the Web are rewritten into SQL queries via a mapping between the database schema and RDF terms • on-the-fly translation • eliminates the need to replicate the data into a dedicated RDF triple store
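The on-the-fly idea can be sketched in a few lines with `sqlite3`. Everything here is a hypothetical assumption: the `contract` table, the `http://example.org/contract/{id}` URI pattern, the hard-coded column-to-property map, and the sample row. It illustrates the principle only and is not D2RQ's mapping language or API.

```python
import re
import sqlite3

# Hypothetical mapping: URI pattern -> SQL query, plus column-to-property map.
URI_PATTERN = re.compile(r"^http://example\.org/contract/(\d+)$")
COLUMN_TO_PROPERTY = {
    "title": "http://purl.org/dc/terms/title",
    "award_date": "http://purl.org/procurement/public-contracts#awardDate",
}

def describe(conn: sqlite3.Connection, uri: str) -> list[tuple[str, str, str]]:
    """Translate a request for a contract URI into SQL and return triples."""
    match = URI_PATTERN.match(uri)
    if match is None:
        return []
    row = conn.execute(
        "SELECT title, award_date FROM contract WHERE id = ?",
        (int(match.group(1)),),
    ).fetchone()
    if row is None:
        return []
    # Triples are built from the current row; no RDF copy of the data is kept.
    return [(uri, COLUMN_TO_PROPERTY[col], value)
            for col, value in zip(("title", "award_date"), row)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contract (id INTEGER PRIMARY KEY, title TEXT, award_date TEXT)")
conn.execute("INSERT INTO contract VALUES (5161, 'Office supplies', '2011-11-11')")  # invented sample row
for triple in describe(conn, "http://example.org/contract/5161"):
    print(triple)
```

Because each request runs a fresh query, updates to the relational database are visible immediately without re-exporting anything.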
Publishing Tasks 1. Make data available as RDF via HTTP 2. Set RDF links pointing at other data sources 3. Make your data self-descriptive
2. Set RDF links <http://dbpedia.org/resource/Berlin> owl:sameAs <http://sws.geonames.org/2950159> . • There are tools to help you generate links • Silk [6]
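Link discovery tools such as Silk compare entities from two data sources using similarity metrics and emit `owl:sameAs` links when the similarity exceeds a threshold. The sketch below mimics that idea using Python's `difflib` on labels only. The sample labels, the `http://example.org/place/` namespace, and the 0.8 threshold are illustrative assumptions, and this is not Silk's link specification language.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] on lower-cased labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def generate_links(source, target, threshold=0.8):
    """Emit owl:sameAs links to the best-matching target label above a threshold.

    source, target: dicts mapping URI -> label.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links

dbpedia = {
    "http://dbpedia.org/resource/Berlin": "Berlin",
    "http://dbpedia.org/resource/Liberec": "Liberec",
}
places = {  # hypothetical second data source
    "http://example.org/place/1": "Berlin",
    "http://example.org/place/2": "Brno",
}
for s, p, o in generate_links(dbpedia, places):
    print(f"<{s}> <{p}> <{o}> .")
```

Label matching alone produces false positives for common names; real link specifications usually combine several properties, such as names together with coordinates.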
3. Make your data self-descriptive • Increase the usefulness of your data and ease data integration • Aspects of self-descriptiveness • 1. Reuse terms from common vocabularies • 2. Enable clients to retrieve the schema • 3. Publish schema mappings for proprietary terms • 4. Metadata • Provide provenance metadata • Provide licensing metadata • Provide dataset-level metadata using VoID
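As an illustration of point 4, the minimal sketch below emits a VoID dataset description in Turtle that carries licensing, provenance, and access metadata. All URIs and the triple count are placeholders, and the chosen properties (`dcterms:license`, `dcterms:source`, `void:sparqlEndpoint`, `void:dataDump`, `void:triples`) are one reasonable selection, not a required set. No escaping is done on the title.

```python
def void_description(dataset: str, title: str, license_uri: str, source: str,
                     endpoint: str, dump: str, triples: int) -> str:
    """Return a Turtle VoID description of a dataset."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        # Licensing metadata: the terms under which the data may be reused.
        f"    dcterms:license <{license_uri}> ;",
        # Provenance metadata: where the data was derived from.
        f"    dcterms:source <{source}> ;",
        # Access metadata: how clients can query or download the data.
        f"    void:sparqlEndpoint <{endpoint}> ;",
        f"    void:dataDump <{dump}> ;",
        f"    void:triples {triples} .",
    ]
    return "\n".join(lines)

description = void_description(  # placeholder values throughout
    dataset="http://example.org/dataset/contracts",
    title="Public contracts (example)",
    license_uri="http://creativecommons.org/licenses/by/3.0/",
    source="http://example.org/source",
    endpoint="http://example.org/sparql",
    dump="http://example.org/dump.ttl",
    triples=1000,
)
print(description)
```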
About Vocabularies • We need a way to define the meaning of the classes and properties we use • Vocabularies, e.g., the Public Contracts Ontology
Public Contracts Ontology http://purl.org/procurement/public-contracts#
RDFS • RDFS = RDF Schema • W3C Recommendation • http://www.w3.org/TR/rdf-schema/ • Vocabulary for RDF • Definition of classes • is:Student rdf:type rdfs:Class • Definition of properties • is:name rdf:type rdf:Property • Domains and ranges of properties • is:name rdfs:domain is:Student • is:name rdfs:range xsd:string
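In RDFS, domain and range declarations drive inference rather than validation. If `is:name rdfs:domain is:Student` holds and some resource has an `is:name`, a reasoner concludes that the resource is an `is:Student`. The minimal sketch below applies the domain and range entailment rules (rdfs2 and rdfs3) once, using the `is:` names from the slide plus a hypothetical `is:alice` instance. As a simplifying assumption, literals are written with surrounding double quotes and are not given a range type.

```python
RDF_TYPE = "rdf:type"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

def infer_types(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply the RDFS domain (rdfs2) and range (rdfs3) rules once.

    Returns only the newly inferred triples. Literals (quoted strings) are
    skipped by the range rule in this sketch.
    """
    domains = {(s, o) for s, p, o in triples if p == DOMAIN}
    ranges = {(s, o) for s, p, o in triples if p == RANGE}
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
        for prop, cls in ranges:
            if p == prop and not o.startswith('"'):
                inferred.add((o, RDF_TYPE, cls))
    return inferred - triples

data = {
    ("is:Student", "rdf:type", "rdfs:Class"),
    ("is:name", "rdf:type", "rdf:Property"),
    ("is:name", "rdfs:domain", "is:Student"),
    ("is:name", "rdfs:range", "xsd:string"),
    ("is:alice", "is:name", '"Alice"'),  # hypothetical instance
}
print(infer_types(data))  # {('is:alice', 'rdf:type', 'is:Student')}
```

Note that nothing here rejects data: a resource with an `is:name` is simply classified as a student, which is the open-world reading RDFS uses.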
OWL • OWL = Web Ontology Language • W3C recommendation • http://www.w3.org/TR/owl2-overview/ • Ontologies • More complex constructs • Class or property equivalences • Cardinality restrictions • …
3.1 Reuse Terms from Common Vocabularies • Common vocabularies • Friend-of-a-Friend for describing people and their social network • SIOC for describing forums and blogs • SKOS for representing topic taxonomies • Organization Ontology for describing the structure of organizations • GoodRelations provides terms for describing products and business entities • Music Ontology for describing artists, albums, and performances • Review Vocabulary provides terms for representing reviews • Common sources of identifiers (URIs) for real-world objects • LinkedGeoData and GeoNames: locations • GeneID and UniProt: life science identifiers • DBpedia: a wide range of things
3.2 Enable Clients to Retrieve the Schema • Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions. • If we discover this triple in the data: <http://opendata.cz/data/p6/contract/ocz_art_5161> <http://purl.org/procurement/public-contracts#awardDate> "2011-11-11"^^<http://www.w3.org/2001/XMLSchema#date> . • we resolve the property URI and get its RDFS or OWL definition
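For a hash URI such as `...public-contracts#awardDate`, the fragment is never sent to the server, so resolving the term actually fetches the whole vocabulary document. The minimal sketch below uses `urllib.parse.urldefrag` and `urllib.request.Request` to prepare (but not send) that request. The Accept header value is an assumption about which RDF formats the vocabulary server offers.

```python
from urllib.parse import urldefrag
from urllib.request import Request

def schema_request(term_uri: str) -> Request:
    """Build an HTTP request for the document that defines a vocabulary term."""
    # The fragment (e.g. '#awardDate') is client-side only, so strip it.
    document_uri, _fragment = urldefrag(term_uri)
    return Request(document_uri,
                   headers={"Accept": "text/turtle, application/rdf+xml"})

req = schema_request("http://purl.org/procurement/public-contracts#awardDate")
print(req.full_url)
# urllib.request.urlopen(req) would fetch the definition (requires network access);
# the client then looks for statements whose subject is the full term URI.
```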
3.3 Publish Schema Mappings pc:Tender a owl:Class ; rdfs:subClassOf gr:Offering . pc:AwardCriterion a owl:Class ; owl:equivalentClass loted:AwardCriteria . • Simple mappings: • rdfs:subClassOf, rdfs:subPropertyOf • owl:equivalentClass, owl:equivalentProperty • Complex mappings – R2R [7]
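Published mappings let a consumer that only understands the target vocabulary (e.g., GoodRelations) still use the data. The minimal sketch below shows how a client might apply the two mappings from the slide to derive additional `rdf:type` statements. Prefixed names are kept as plain strings, and the `pc:tender1` instance is hypothetical.

```python
def apply_class_mappings(triples, mappings):
    """Add rdf:type triples implied by class mappings.

    owl:equivalentClass is followed in both directions, rdfs:subClassOf only
    upward; the loop repeats until no new triples appear.
    """
    superclasses = {}
    for s, p, o in mappings:
        if p in ("rdfs:subClassOf", "owl:equivalentClass"):
            superclasses.setdefault(s, set()).add(o)
        if p == "owl:equivalentClass":
            superclasses.setdefault(o, set()).add(s)
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            if p == "rdf:type":
                for sup in superclasses.get(o, ()):
                    if (s, "rdf:type", sup) not in result:
                        result.add((s, "rdf:type", sup))
                        changed = True
    return result

mappings = [
    ("pc:Tender", "rdfs:subClassOf", "gr:Offering"),
    ("pc:AwardCriterion", "owl:equivalentClass", "loted:AwardCriteria"),
]
data = {("pc:tender1", "rdf:type", "pc:Tender")}  # hypothetical instance
print(sorted(apply_class_mappings(data, mappings)))
# [('pc:tender1', 'rdf:type', 'gr:Offering'), ('pc:tender1', 'rdf:type', 'pc:Tender')]
```

Mappings that go beyond class and property renaming, such as splitting or concatenating values, need a richer language like R2R.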