Towards a Social Notion of Provenance on the Web
Andreas Harth, Axel Polleres, Stefan Decker
Principles of Provenance Workshop, Edinburgh, November 2007
Outline
• Provenance on the Web
• RDF Data Integration
• Query and Retrieval
• Social Notion of Provenance
Provenance on the RDF Web
• Source: http://dbpedia.org/docs/downloads/articles_externallinks_en.nt
  dbr:Talbot_Rice_Gallery dbp:reference http://www.ed.ac.uk/#org
• Source: http://lsdis.cs.uga.edu/proje.../sem_web_subset.rdf
  dblp:Kingston_John opus:affiliation http://www.ed.ac.uk/#org
• Prefixes
  opus: <http://lsdis.cs.uga.edu/projects/semdis/opus#>
  dbr: <http://dbpedia.org/resource/>
  dbp: <http://dbpedia.org/property/>
  dblp: <http://www.informatik.uni-trier.de/~ley/db/indices/>
Integrated Graph
• dbr:Talbot_Rice_Gallery dbp:reference http://www.ed.ac.uk/#org
  (from http://dbpedia.org/docs/downloads/articles_externallinks_en.nt)
• dblp:Kingston_John opus:affiliation http://www.ed.ac.uk/#org
  (from http://lsdis.cs.uga.edu/proje.../sem_web_subset.rdf)
Integration Pipeline
• HTTP lookup
  Date: Sun, 18 Nov 2007 21:29:04 GMT
  Server: Apache/1.3.37
  Cache-Control: max-age=86400
  Expires: Mon, 19 Nov 2007 21:29:04 GMT
  Content-Type: text/html; charset=iso-8859-1
• URI normalisation
  http://www.ed.ac.uk -> http://www.ed.ac.uk/
• Entity consolidation
  http://www.ed.ac.uk/#org = http://dbpedia.org/resource/University_of_Edinburgh = http://lsdis.cs.uga.edu/...Edinburgh
• Class and property hierarchy reasoning
  opus:University rdfs:subClassOf foaf:Organisation
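The normalisation and consolidation steps of the pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `normalise_uri` and `Consolidator` are hypothetical names, normalisation is limited to lower-casing scheme and host and adding the missing root path, and equivalent URIs are merged with a plain union-find.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_uri(uri: str) -> str:
    """Lower-case scheme and host, and replace an empty path with '/'."""
    parts = urlsplit(uri)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path,
                       parts.query, parts.fragment))


class Consolidator:
    """Union-find over URIs asserted to denote the same entity (assumed design)."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        """Return the canonical URI of the equivalence class containing uri."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            # Path halving: point uri at its grandparent while walking up.
            self.parent[uri] = self.parent[self.parent[uri]]
            uri = self.parent[uri]
        return uri

    def same_as(self, a, b):
        """Merge the equivalence classes of a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Arbitrary but deterministic choice: keep the smaller URI string.
            keep, drop = sorted((root_a, root_b))
            self.parent[drop] = keep
```

After `same_as("http://www.ed.ac.uk/#org", "http://dbpedia.org/resource/University_of_Edinburgh")`, both URIs resolve to one canonical identifier via `find`, so statements from both sources attach to the same node.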
Basic Retrieval
• Keyword query: “University of Edinburgh”
• Consolidated entity: http://www.ed.ac.uk/#org = http://dbpedia.org/resource/University_of_Edinburgh = http://lsdis.cs.uga.edu/...Edinburgh
• Result: http://www.ed.ac.uk/#org
  • rdf:type
    • opus:University
    • dbpedia:University
    • foaf:Organisation
  • dc:title
    • University of Edinburgh
  • foaf:based_near
    • dbpedia:Edinburgh
• Sources
  • http://lsdis.cs.uga.edu/proje.../sem_web_subset.rdf
  • http://dbpedia.org/docs/downloads/articles_externallinks_en.nt
• Subclass reasoning using data from sources A and B
  • http://lsdis.cs.uga.edu/proje.../sem_web_subset.rdf
  • http://dbpedia.org/docs/downloads/articles_externallinks_en.nt
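A derived statement such as the foaf:Organisation type depends on every source that contributed to the inference. The sketch below shows one way to carry that information along during subclass reasoning. It is an assumption-laden illustration: `infer_types` is a hypothetical helper, statements are plain `(subject, predicate, object, source)` tuples, and the source labels `source_A` and `source_B` in the usage note are placeholders.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(quads):
    """Derive rdf:type statements via rdfs:subClassOf, tracking sources.

    quads: list of (subject, predicate, object, source) tuples.
    Returns a dict mapping each derived (subject, rdf:type, superclass)
    triple to the frozenset of sources used to derive it.
    """
    # Direct superclass edges, each remembered together with its source.
    supers = {}
    for s, p, o, src in quads:
        if p == SUBCLASS_OF:
            supers.setdefault(s, []).append((o, src))

    derived = {}
    for s, p, o, src in quads:
        if p != RDF_TYPE:
            continue
        # Walk up the class hierarchy, accumulating the sources used so far.
        stack = [(o, frozenset([src]))]
        seen = {o}
        while stack:
            cls, used = stack.pop()
            for parent, parent_src in supers.get(cls, []):
                if parent in seen:
                    continue
                seen.add(parent)
                sources = used | {parent_src}
                derived[(s, RDF_TYPE, parent)] = sources
                stack.append((parent, sources))
    return derived
```

Given a type statement for http://www.ed.ac.uk/#org from `source_A` and the axiom opus:University rdfs:subClassOf foaf:Organisation from `source_B`, the derived foaf:Organisation type is annotated with both sources. For simplicity, a triple derivable in several ways keeps only the last derivation found.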
URIs and HTTP
• URI: resource identifier used to point at things
• HTTP: retrieves the file associated with a URI, including low-level file information (content type, last change date, server)
HTTP Metadata per Data Source
• http://www.ed.ac.uk/index.rdf
  • last-modified: Sun, 18 Nov 2007 21:29:04 GMT
  • last_access: Sun, 18 Nov 2007 21:30:12 GMT
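A per-source metadata record of this kind can be assembled from the response headers of a lookup. The following is a minimal sketch, assuming the headers are already available as a dictionary. `describe_source` and the record keys are hypothetical names chosen to mirror the slide; only `email.utils.parsedate_to_datetime` from the Python standard library is used to parse the HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def describe_source(uri, headers, accessed=None):
    """Build a provenance record for one fetched document.

    headers: mapping of HTTP response header names to values.
    accessed: time of the lookup; defaults to the current UTC time.
    """
    accessed = accessed or datetime.now(timezone.utc)
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    record = {"source": uri, "last_access": accessed}
    if "last-modified" in lowered:
        record["last-modified"] = parsedate_to_datetime(lowered["last-modified"])
    for name in ("content-type", "server"):
        if name in lowered:
            record[name] = lowered[name]
    return record
```

Storing the parsed timestamps rather than the raw header strings makes it possible to compare `last-modified` with `last_access` later, for example to decide whether a cached copy is stale.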
Query with Provenance
• Return all universities in Edinburgh (and report sources)
  SELECT ?s ?c1 ?c2
  WHERE {
    GRAPH ?c1 { ?s rdf:type opus:University . }
    GRAPH ?c2 { ?s foaf:based_near opus:Edinburgh . }
  }
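To make the role of the two GRAPH variables concrete, the same join can be written over plain `(subject, predicate, object, source)` tuples. This is only a sketch of the query's semantics, not a SPARQL engine; `universities_in_edinburgh` is a hypothetical helper and the source names in the usage note are placeholders.

```python
def universities_in_edinburgh(quads):
    """Return (subject, source_of_type, source_of_location) rows.

    quads: list of (subject, predicate, object, source) tuples.
    Mirrors the two GRAPH patterns: each pattern binds its own source,
    and the rows are joined on the shared subject.
    """
    typed = [(s, g) for s, p, o, g in quads
             if p == "rdf:type" and o == "opus:University"]
    located = [(s, g) for s, p, o, g in quads
               if p == "foaf:based_near" and o == "opus:Edinburgh"]
    return [(s1, g1, g2)
            for s1, g1 in typed
            for s2, g2 in located
            if s1 == s2]
```

If the type statement comes from `source_A` and the location statement from `source_B`, the single result row reports both, which is what `?c1` and `?c2` carry in the query.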
Query with Data from Selected Sources
• Return Edinburgh universities from DBpedia
  SELECT ?s
  FROM <http://dbpedia.org/docs/downloads/articles_en.nt>
  WHERE {
    ?s rdf:type dbpedia:University .
    ?s foaf:based_near dbpedia:Edinburgh .
  }
Query with Data from Selected People
• Return all information about the University of Edinburgh from affiliated people
• Requires querying both the logical level (data) and the physical level (data sources)
Describing a Data Source
• http://www.ed.ac.uk/index.rdf
  • HTTP
    • last-modified: Sun, 18 Nov 2007 21:29:04 GMT
    • last_access: Sun, 18 Nov 2007 21:30:12 GMT
  • domainname: dns:ed.ac.uk
  • foaf:maker: http://www.ed.ac.uk/#Joe_W_Master
• dns:ed.ac.uk
  • DNS
    • registered_to: The University of Edinburgh, Network Services Division, The King’s Buildings
• http://www.ed.ac.uk/#Joe_W_Master
  • opus:affiliation: http://www.ed.ac.uk/#org
• http://www.ed.ac.uk/~joe/card.rdf
Return all information about the University of Edinburgh from affiliated people
  CONSTRUCT { <http://www.ed.ac.uk/#org> ?p ?o . }
  WHERE {
    GRAPH ?c { <http://www.ed.ac.uk/#org> ?p ?o . }
    ?c foaf:maker ?person .
    ?person opus:affiliation <http://www.ed.ac.uk/#org> .
  }
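The selection logic behind this query can be sketched over simple tuples: first find the people affiliated with the organisation, then the sources they made, then keep only statements from those sources. The sketch assumes the FOAF reading in which foaf:maker points from a document to the agent that made it (as drawn on the "Describing a Data Source" slide). `statements_from_affiliated_people` is a hypothetical helper, and the split into source-level `metadata` triples and data-level `quads` is an assumption made for illustration.

```python
ORG = "http://www.ed.ac.uk/#org"


def statements_from_affiliated_people(quads, metadata, org=ORG):
    """Return (org, p, o) triples found in sources made by affiliated people.

    quads: list of (subject, predicate, object, source) data statements.
    metadata: list of (subject, predicate, object) statements that
              describe sources and people.
    """
    # People who state an affiliation with the organisation.
    affiliated = {s for s, p, o in metadata
                  if p == "opus:affiliation" and o == org}
    # Sources whose maker is one of those people.
    trusted = {s for s, p, o in metadata
               if p == "foaf:maker" and o in affiliated}
    # Statements about the organisation that come from a trusted source.
    return {(s, p, o) for s, p, o, source in quads
            if s == org and source in trusted}
```

Statements about the organisation that originate from any other source, for example a link-spam page, are dropped because no affiliated person is recorded as its maker.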
Conclusion
• Provenance tracking on the Internet using information from infrastructure (DNS), protocol (HTTP), data (RDF)
• Possible to select sources for processing, querying, and displaying
  • e.g. to exclude link spam
  • e.g. to exclude malicious sources
  • e.g. to include information from your social network
• Desirable to lift the provenance model from sources to people, organisations and programs