Provenance of scientific information as experienced in DRIVER
6th e-Infrastructure Concertation Event
Lyon, 24th November 2008
Wolfram Horstmann
Bielefeld University / DRIVER
Notions of Provenance
• Where do data objects* originate from?
• Scientific Work -- examples
  • Instrumentation techniques
  • Manufacturers of hardware and software
  • Methodologies
  • Processes, e.g. gene sequencing
• Technical/Local -- examples
  • (Web) identifiers
  • Database, repository name
* Primary data, documents, metadata …
Why Provenance?
• Quoting / citing / referencing as a global scientific principle
• "Reproducible research"
• Giving credit to authors / creators in distributed environments
• Original location / context has to be known
• Experienced in grid environments [1]
Provenance & Interoperability
• Re-Use / Sharing: “Addressing/Accessing”
  • Common view, common use
  • Unidirectional: No change of data objects!
• Federation: “Discovering in Context”
  • Remote representation of distributed DOs
• Aggregation: “Contextualizing”
  • Add unchanged object in a context
• Processing/Annotation: “Changing”
  • Uni- vs. bidirectional: change of DOs and remote representation vs. back-storage (e.g. CVS)
Digital Object Collections [figure not recoverable: collections nested in one another, indicated by a chain of ⊃ symbols]
Digital Object Repositories [figure not recoverable: several repositories combined into one, indicated by a row of + signs ending in =]
Basic Provenance Settings
• Indicate Production Situation
  • Metadata
  • Author, Instrumentation etc.
• Remote Representation
  • Indicate place of origin in remote systems
  • Metadata as digital objects / first-order citizens
• Allow lineage representation
  • Credits in remote environments / versioning
Orders of Provenance
• 1st order: Metadata
  • Provenance attached to data
  • Minimal "knowledge" required in application
  • Allow remote handling of data objects
  • Require metadata infrastructure
  • Metadata introduce 2 objects: requires linkage
• 2nd order: context / compounds
  • Express multiple relations between objects
  • May introduce semantic model
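The two orders can be illustrated with a minimal sketch. All class names, field names, relation names, and URIs below are hypothetical and not taken from DRIVER; the point is only that 1st order provenance introduces a second object (the metadata record) that must be linked to the data object, while 2nd order provenance records several named relations between objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataObject:
    """A primary data object, addressed by a web identifier."""
    uri: str


@dataclass(frozen=True)
class MetadataRecord:
    """1st order provenance: a separate object describing a data object."""
    uri: str
    describes: str          # URI of the data object (the required linkage)
    author: str
    instrumentation: str


@dataclass
class Context:
    """2nd order provenance: named relations (subject, relation, object)."""
    relations: set = field(default_factory=set)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.relations.add((subject, relation, obj))

    def related(self, subject: str, relation: str) -> list:
        # All objects reachable from `subject` via `relation`, in sorted order.
        return sorted(o for s, r, o in self.relations
                      if s == subject and r == relation)


# Hypothetical example objects.
data = DataObject(uri="http://example.org/data/42")
meta = MetadataRecord(
    uri="http://example.org/meta/42",
    describes=data.uri,
    author="A. Author",
    instrumentation="sequencer X",
)

ctx = Context()
ctx.relate(meta.uri, "describes", data.uri)
ctx.relate(data.uri, "isPartOf", "http://example.org/collection/7")
ctx.relate(data.uri, "isDerivedFrom", "http://example.org/data/41")
```

An application that only reads `MetadataRecord` needs minimal knowledge; one that queries `Context` needs to agree on the relation names, which is where a semantic model may come in.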
Provenance in DRIVER #1
• Simple Objects: OAI-PMH [2]
• 1st order provenance
  • Metadata: minimum OAI-DC
• 2nd order provenance
  • DRIVER explicit identifiers for repositories
  • OAI-PMH: inline representation ("about")
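The inline "about" representation can be read with a few lines of standard-library Python. This is a sketch only: the XML sample is invented (the repository URL, identifier, and dates are placeholders), and it assumes the `provenance` / `originDescription` container layout described in the OAI-PMH provenance guidelines [2]. It does not show how DRIVER itself processes these records.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"

# Invented sample of an <about> container as a harvester might re-expose it.
ABOUT_XML = """
<about>
  <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
    <originDescription harvestDate="2008-11-01T10:00:00Z" altered="false">
      <baseURL>http://repository.example.org/oai</baseURL>
      <identifier>oai:repository.example.org:1234</identifier>
      <datestamp>2008-10-15</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </provenance>
</about>
"""


def q(tag):
    """Qualify a tag name with the provenance namespace."""
    return f"{{{PROV_NS}}}{tag}"


def origin_chain(about_xml):
    """Return origin descriptions, nearest harvest first.

    originDescription elements may be nested when a record has been
    harvested more than once, so the loop walks inward.
    """
    root = ET.fromstring(about_xml)
    chain = []
    node = root.find(f"{q('provenance')}/{q('originDescription')}")
    while node is not None:
        chain.append({
            "baseURL": node.findtext(q("baseURL")),
            "identifier": node.findtext(q("identifier")),
            "harvestDate": node.get("harvestDate"),
            "altered": node.get("altered") == "true",
        })
        node = node.find(q("originDescription"))
    return chain


chain = origin_chain(ABOUT_XML)
```

The last entry of `chain` is the earliest known origin, which is the place a federator or aggregator would link back to.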
Provenance in DRIVER #2
• "Enhanced Publications"
  • Research project in DRIVER-II
  • Representation of data / document packages
  • Use of OAI-ORE
Provenance in OAI-ORE
• OAI-ORE: Object Re-Use and Exchange [3]
• Uses Resource Maps, a kind of Named Graph [4]
• Uses "lineage" to represent explicit provenance
• Future: explicit provenance model [6]?
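A rough sketch of how lineage can be followed, using plain tuples instead of an RDF library. It assumes the ORE vocabulary terms `proxyFor`, `proxyIn`, and `lineage` from the ORE terms namespace; all resource, proxy, and aggregation URIs are invented, and the helper names are hypothetical.

```python
ORE = "http://www.openarchives.org/ore/terms/"

# Invented example: one paper aggregated first in A, then re-used in B.
PAPER = "http://example.org/paper.pdf"
AGG_A = "http://agg-a.example.org/aggregation"
AGG_B = "http://agg-b.example.org/aggregation"
PROXY_A = "http://agg-a.example.org/proxy/paper"
PROXY_B = "http://agg-b.example.org/proxy/paper"

TRIPLES = {
    (AGG_A, ORE + "aggregates", PAPER),
    (AGG_B, ORE + "aggregates", PAPER),
    (PROXY_A, ORE + "proxyFor", PAPER),
    (PROXY_A, ORE + "proxyIn", AGG_A),
    (PROXY_B, ORE + "proxyFor", PAPER),
    (PROXY_B, ORE + "proxyIn", AGG_B),
    # The proxy in B states that the paper was found via the proxy in A.
    (PROXY_B, ORE + "lineage", PROXY_A),
}


def first_object(subject, predicate):
    """Return one object for (subject, predicate), or None if absent."""
    matches = sorted(o for s, p, o in TRIPLES
                     if s == subject and p == predicate)
    return matches[0] if matches else None


def lineage_aggregations(proxy):
    """Follow lineage links and list the aggregations visited, newest first."""
    found, seen = [], set()
    while proxy is not None and proxy not in seen:  # guard against cycles
        seen.add(proxy)
        aggregation = first_object(proxy, ORE + "proxyIn")
        if aggregation is not None:
            found.append(aggregation)
        proxy = first_object(proxy, ORE + "lineage")
    return found


trail = lineage_aggregations(PROXY_B)
```

Here `trail` lists aggregation B followed by aggregation A, i.e. the context in which the resource is used now and the context in which it was discovered.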
Summary
• Provenance essential for …
  • Indicating origin in distributed data spaces
  • Accessing / Addressing
  • Federation / Aggregation
  • Processing / Annotation
  • Document and data citation / trace-back
• 1st order: describing data > metadata
• 2nd order: describing context > semantic data
Lessons learnt in DRIVER
• Use web-enabled identification (URI/UDDI etc.)
  • "Dark" databases don't interoperate
• 1st order provenance at place of origin
  • Requires metadata to describe origin
  • Enables a metadata infrastructure
  • Introduces linkage problem
• 2nd order provenance in contexts
  • Requires data provider identification in federators / aggregators in order to link back
  • May require semantic model for context
  • Would benefit from a semantic infrastructure
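As an illustration of the link-back requirement, the sketch below shows an aggregator that stores a provider identifier with every record and uses it to build an OAI-PMH `GetRecord` request against the original repository. The provider registry, the identifier format, and the record fields are assumptions made for this example; they are not DRIVER's actual identifier scheme.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Hypothetical registry: provider identifier -> OAI-PMH base URL.
PROVIDERS = {
    "repo-a": "http://repo-a.example.org/oai",
}


@dataclass(frozen=True)
class AggregatedRecord:
    """A record as held by a federator / aggregator."""
    provider_id: str      # identifies the data provider (needed to link back)
    oai_identifier: str   # identifier of the record at its place of origin
    title: str


def link_back(record, metadata_prefix="oai_dc"):
    """Build a GetRecord URL pointing at the record's place of origin."""
    base_url = PROVIDERS[record.provider_id]
    query = urlencode({
        "verb": "GetRecord",
        "identifier": record.oai_identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


record = AggregatedRecord(
    provider_id="repo-a",
    oai_identifier="oai:repo-a.example.org:1234",
    title="An example publication",
)
url = link_back(record)
```

Without `provider_id` the aggregator could still display the record, but it would have no reliable way to cite or revisit the original location.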
Resources
[1] On provenance in the eScience / grid environment
  • http://www.sigmod.org/sigmod/record/issues/0509/p31-special-sw-section-5.pdf
  • In GLITE: http://www.cesnet.cz/doc/techzpravy/2007/glite-job-provenance/
  • http://twiki.ipaw.info/bin/view/Challenge
[2] On provenance in OAI-PMH
  • http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm
[3] On provenance in OAI-ORE (referred to as ore:lineage)
  • http://www.openarchives.org/ore/meetings/Soton/ore_beyond_basics.pdf (general)
  • http://www.openarchives.org/ore/1.0/vocabulary (definition)
[4] Named Graphs, Provenance and Trust (Carroll et al.)
  • http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/carroll-ISWC2004.pdf
[5] W3C: On provenance in RDF
  • http://www.w3.org/2001/12/attributions/
[6] Open Provenance Model
  • http://eprints.ecs.soton.ac.uk/14979/1/opm.pdf
[7] DRIVER: Digital Repository Infrastructure for European Research
  • http://www.driver-community.eu