
Provenance of scientific information as experienced in DRIVER

Provenance of scientific information as experienced in DRIVER. 6th e-Infrastructure Concertation Event, Lyon, 24th November 2008. Wolfram Horstmann, Bielefeld University / DRIVER.


Presentation Transcript


  1. Provenance of scientific information as experienced in DRIVER • 6th e-Infrastructure Concertation Event, Lyon, 24th November 2008 • Wolfram Horstmann, Bielefeld University / DRIVER

  2. Notions of Provenance • Where do data objects* originate from? • Scientific Work -- examples • Instrumentation techniques • Manufacturers of hardware and software • Methodologies • Processes, e.g. gene sequencing • Technical/Local -- examples • (Web) identifiers • Database, repository name * Primary data, documents, metadata …

  3. Why Provenance? • Quoting / Citing / Referencing as a global scientific principle • „Reproducible research“ • Giving credit to authors / creators in distributed environments • Original location / context has to be known • Experienced in Grid environments [1]

  4. Provenance & Interoperability • Re-Use / Sharing: “Addressing/Accessing” • Common view, common use • Unidirectional: No change of data objects! • Federation: “Discovering in Context” • Remote representation of distributed DOs • Aggregation: “Contextualizing” • Add unchanged object in a context • Processing/Annotation: “Changing” • Uni- vs. Bidirectional: Change of DOs and remote representation vs. back-storage (e.g. CVS)

  5. Scenarios in DRIVER

  6. Digital Scientific Data

  7. Digital Object Collections

  8. Digital Object Repositories

  9. Digital Information Space

  10. Conventional Web Data

  11. „Simple“ Applications

  12. Metadata Infrastructure

  13. Basic Provenance Settings • Indicate Production Situation • Metadata • Author, Instrumentation etc. • Remote Representation • Indicate place of origin in remote systems • Metadata as digital objects / first order citizens • Allow lineage representation • Credits in remote environments / versioning

  14. Orders of Provenance • 1st order: Metadata • Provenance attached to data • Minimal „knowledge“ required in application • Allow remote handling of data objects • Require metadata infrastructure • Metadata introduce 2 objects: requires linkage • 2nd order: context / compounds • Express multiple relations between objects • May introduce semantic model
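The two orders of provenance on slide 14 can be pictured as a small data model. This is only a sketch under stated assumptions: the class names, field names, and example.org URIs below are hypothetical and do not come from DRIVER. It shows why first-order provenance introduces a second object (the metadata record) that needs an explicit link to the data object, and how second-order provenance adds relations between objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """1st order: a separate object that describes a data object."""
    uri: str
    describes: str  # URI of the data object; this is the required linkage
    author: str
    origin: str     # place of origin, e.g. a repository identifier


@dataclass(frozen=True)
class Relation:
    """2nd order: one relation between two objects in a context."""
    subject: str
    predicate: str
    obj: str


def first_order(data_uri, records):
    """Return the metadata records linked to the given data object."""
    return [r for r in records if r.describes == data_uri]


def second_order(uri, relations):
    """Return every relation in which the given object takes part."""
    return [r for r in relations if uri in (r.subject, r.obj)]


# Hypothetical example data.
DATA = "http://repository.example.org/data/42"
RECORDS = [
    MetadataRecord("http://repository.example.org/md/42", DATA,
                   "A. Author", "repository.example.org"),
    MetadataRecord("http://repository.example.org/md/43",
                   "http://repository.example.org/data/43",
                   "B. Author", "repository.example.org"),
]
RELATIONS = [
    Relation("http://repository.example.org/doc/7", "cites", DATA),
    Relation(DATA, "derivedFrom", "http://repository.example.org/data/41"),
]

if __name__ == "__main__":
    print(first_order(DATA, RECORDS))
    print(second_order(DATA, RELATIONS))
```

If the `describes` link is lost, the metadata record can no longer be matched to its data object, which is the linkage problem the slide points to.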

  15. Provenance in DRIVER #1 • Simple Objects: OAI-PMH [2] • 1st order provenance • Metadata: minimum OAI-DC • 2nd order provenance • DRIVER explicit identifiers for repositories • OAI-PMH: inline representation („about“)
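The inline „about“ representation named on slide 15 might be read along the following lines. This is a minimal sketch, assuming the provenance container from the OAI-PMH provenance guidelines [2]; the sample record, its example.org URLs, and the helper name `origin_chain` are invented for illustration and are not DRIVER code.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "prov": "http://www.openarchives.org/OAI/2.0/provenance",
}

# Hypothetical record as an aggregator might re-expose it.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:aggregator.example.org:123</identifier>
    <datestamp>2008-11-20</datestamp>
  </header>
  <about>
    <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
      <originDescription harvestDate="2008-11-19T10:00:00Z" altered="false">
        <baseURL>http://repository.example.org/oai</baseURL>
        <identifier>oai:repository.example.org:abc</identifier>
        <datestamp>2008-10-01</datestamp>
        <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      </originDescription>
    </provenance>
  </about>
</record>
"""


def origin_chain(record_xml):
    """Return the origin descriptions of a record, most recent harvest first."""
    root = ET.fromstring(record_xml)
    node = root.find("oai:about/prov:provenance/prov:originDescription", NS)
    chain = []
    while node is not None:
        chain.append({
            "base_url": node.findtext("prov:baseURL", namespaces=NS),
            "identifier": node.findtext("prov:identifier", namespaces=NS),
            "datestamp": node.findtext("prov:datestamp", namespaces=NS),
            "harvest_date": node.get("harvestDate"),
            "altered": node.get("altered") == "true",
        })
        # An originDescription may nest an earlier one (re-harvested records).
        node = node.find("prov:originDescription", NS)
    return chain


if __name__ == "__main__":
    for step in origin_chain(SAMPLE_RECORD):
        print(step)
```

The `base_url` and `identifier` of the last entry in the chain are what a federator or aggregator would need in order to link back to the place of origin.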

  16. Semantic/Compound Data

  17. „Semantic“ Applications

  18. Provenance in DRIVER #2 • „Enhanced Publications“ • Research project in DRIVER-II • Representation of data/document packages • Use of OAI-ORE

  19. Provenance in OAI-ORE • OAI-ORE: Object Reuse and Exchange [3] • Uses Resource Maps (based on Named Graphs [4]) • Uses „lineage“ to represent explicit provenance • Future: explicit provenance model [6]?
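How „lineage“ can carry provenance in OAI-ORE may be easier to see with a few triples. The sketch below assumes the ORE vocabulary terms `ore:describes`, `ore:aggregates`, `ore:proxyFor`, `ore:proxyIn`, and `ore:lineage` as defined in the vocabulary document listed under Resources; all example.org URIs and the helper names are hypothetical. Plain tuples are used instead of an RDF library to keep the example self-contained.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DESCRIBES = ORE + "describes"
AGGREGATES = ORE + "aggregates"
PROXY_FOR = ORE + "proxyFor"
PROXY_IN = ORE + "proxyIn"
LINEAGE = ORE + "lineage"

# Hypothetical URIs: one resource aggregated first at a repository (A1),
# then re-used in an aggregator's enhanced publication (A2).
R = "http://repository.example.org/data/42"
A1 = "http://repository.example.org/agg/1"
A2 = "http://aggregator.example.org/agg/2"
P1 = "http://repository.example.org/proxy/1"
P2 = "http://aggregator.example.org/proxy/2"

TRIPLES = {
    ("http://aggregator.example.org/rem/2", DESCRIBES, A2),
    (A1, AGGREGATES, R),
    (A2, AGGREGATES, R),
    (P1, PROXY_FOR, R), (P1, PROXY_IN, A1),
    (P2, PROXY_FOR, R), (P2, PROXY_IN, A2),
    (P2, LINEAGE, P1),  # R in A2 was discovered in A1
}


def objects(triples, subject, predicate):
    """Return all objects of (subject, predicate, ?) in sorted order."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def source_aggregations(triples, proxy):
    """Follow ore:lineage from a proxy back to the aggregations of origin."""
    found, seen, frontier = [], {proxy}, [proxy]
    while frontier:
        current = frontier.pop()
        for earlier in objects(triples, current, LINEAGE):
            if earlier in seen:  # guard against cyclic lineage statements
                continue
            seen.add(earlier)
            found.extend(objects(triples, earlier, PROXY_IN))
            frontier.append(earlier)
    return found


if __name__ == "__main__":
    print(source_aggregations(TRIPLES, P2))
```

Because lineage is stated between proxies rather than between the resources themselves, the same resource can have a different history in each aggregation that contains it.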

  20. Summary • Provenance essential for … • Indicating origin in distributed data spaces • Accessing / Addressing • Federation / Aggregation • Processing / Annotation • Document and data citation / trace-back • 1st order: describing data > metadata • 2nd order: describing context > semantic data

  21. Lessons learnt in DRIVER • Use web-enabled identification (URI/UDDI etc.) • „Dark“ databases don't interoperate • 1st order provenance at place of origin • Requires metadata to describe origin • Enables a metadata infrastructure • Introduces linkage problem • 2nd order provenance in contexts • Requires data provider identification in federators / aggregators in order to link back • May require semantic model for context • Would benefit from a semantic infrastructure

  22. Resources • [1] On provenance in the eScience / grid environment: http://www.sigmod.org/sigmod/record/issues/0509/p31-special-sw-section-5.pdf ; in gLite: http://www.cesnet.cz/doc/techzpravy/2007/glite-job-provenance/ ; http://twiki.ipaw.info/bin/view/Challenge • [2] On provenance in OAI-PMH: http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm • [3] On provenance in OAI-ORE (referred to as ore:lineage): http://www.openarchives.org/ore/meetings/Soton/ore_beyond_basics.pdf (general) ; http://www.openarchives.org/ore/1.0/vocabulary (definition) • [4] Named Graphs, Provenance and Trust (Carroll et al.): http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/carroll-ISWC2004.pdf • [5] W3C: On provenance in RDF: http://www.w3.org/2001/12/attributions/ • [6] Open Provenance Model: http://eprints.ecs.soton.ac.uk/14979/1/opm.pdf • [7] DRIVER: Digital Repository Infrastructure for European Research: http://www.driver-community.eu
