6th e-Infrastructure Concertation, Lyon, 24 Nov 2008
“provenance”
DATA TRACK
Chair: Krystyna Marek
Rapporteur: Wolfram Horstmann
Motivation
• Last two meetings were on standards
• It was proposed to have a more focussed discussion
• Focus on practice and interoperability rather than standards
• Select an arbitrary but important topic
Notions of Provenance
• Where do data objects* originate from?
• Scientific work: examples
  • Instrumentation techniques
  • Manufacturers of hardware and software
  • Methodologies
  • Processes, e.g. gene sequencing
• Technical/local: examples
  • (Web) identifiers
  • Database, repository name
* Primary data, documents, metadata …
Why Provenance?
• Quoting / citing / referencing as a global scientific principle
• “Reproducible research”
• Giving credit to authors / creators in distributed environments
• Original location / context has to be known
• Experienced in Grid environments [1]
Provenance & Interoperability
• Re-use / sharing: “Addressing/Accessing”
  • Common view, common use
  • Unidirectional: no change of data objects!
• Federation: “Discovering in Context”
  • Remote representation of distributed data objects (DOs)
• Aggregation: “Contextualizing”
  • Add unchanged object in a context
• Processing/Annotation: “Changing”
  • Uni- vs. bidirectional: change of DOs and remote representation vs. back-storage (e.g. CVS)
IVOA
• Astronomy area: Repositories use OAI-PMH to provide general
• Provenance as a kind of metadata
• “Observation data model”
• History of data (process “lineage”)
  • Processing
  • Configuration: telescope, camera
  • Ambient conditions: temperature etc.
• Versioning is included (also algorithms etc.)
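
The lineage idea on this slide can be illustrated with a small data structure. This is only a sketch: the class names `ProcessingStep` and `Observation`, their fields, and the sample values are hypothetical and are not taken from the IVOA observation data model itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessingStep:
    """One step in the history of a data object (hypothetical structure)."""
    name: str                       # what was done, e.g. "calibration"
    algorithm_version: str          # versioning also covers algorithms
    configuration: Dict[str, str] = field(default_factory=dict)  # telescope, camera
    ambient: Dict[str, float] = field(default_factory=dict)      # temperature etc.


@dataclass
class Observation:
    """A data object that carries its own lineage."""
    identifier: str
    version: int = 1
    lineage: List[ProcessingStep] = field(default_factory=list)

    def derive(self, step: ProcessingStep) -> "Observation":
        # Return a new version with the step appended; the original is not modified.
        return Observation(self.identifier, self.version + 1, self.lineage + [step])


# Illustrative values only.
raw = Observation("obs-001")
step = ProcessingStep(
    name="calibration",
    algorithm_version="1.2",
    configuration={"telescope": "T1", "camera": "C1"},
    ambient={"temperature_c": 4.5},
)
calibrated = raw.derive(step)
```

Each derived version keeps the full list of earlier steps, so the lineage can be read back from any version without consulting the earlier ones.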
MetaFor
• Data from numerical models
• Descriptive information from model
• Models are often transformed
• Database / Registry for models in distributed repositories
D4Science
• Framework for
• More than simple import framework
• Graphs representing provenance information
• Thematic: fishing site / statistic /
DRIVER
• Focus on document repositories
• Some 100 …
• Simple provenance
  • OAI-PMH
• Further (2nd order) provenance
  • OAI-PMH (“about”): repository identifiers
• Enhanced Publications >> OAI-ORE
  • Semantic model (named graphs) representing packages of documents and data objects
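
As an illustration of second-order provenance carried in an OAI-PMH “about” container, the sketch below reads the origin description of a re-harvested record. It assumes the provenance container described in the OAI implementation guidelines; the sample XML is simplified (the `about` element is shown without its OAI-PMH namespace), and the repository URL, identifier, and the function name `origin_descriptions` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI provenance container (assumed from the OAI guidelines).
PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"

# Simplified, invented sample of an "about" container on a re-harvested record.
SAMPLE_ABOUT = """
<about>
  <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
    <originDescription harvestDate="2008-11-20T10:00:00Z" altered="false">
      <baseURL>http://repository.example.org/oai</baseURL>
      <identifier>oai:repository.example.org:1234</identifier>
      <datestamp>2008-10-01</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </provenance>
</about>
"""


def origin_descriptions(about_xml):
    """Return one dict per originDescription, outermost (most recent hop) first."""
    root = ET.fromstring(about_xml)
    ns = {"p": PROV_NS}
    records = []
    # iter() walks the tree in document order, so nested descriptions follow their parent.
    for od in root.iter("{%s}originDescription" % PROV_NS):
        records.append({
            "harvest_date": od.get("harvestDate"),
            "altered": od.get("altered") == "true",
            "base_url": od.findtext("p:baseURL", namespaces=ns),
            "identifier": od.findtext("p:identifier", namespaces=ns),
            "datestamp": od.findtext("p:datestamp", namespaces=ns),
        })
    return records


records = origin_descriptions(SAMPLE_ABOUT)
```

The repository identifier and base URL recovered this way are what an aggregator would need in order to point back to the original location of a document.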
Solutions
• Provenance
  • Registries for curator, publisher etc.
  • Resolving over registry
• Diversity of approaches
  • CIDOC-CRM, OPM, EuroStats,
  • Languages: RDF / OAI-ORE
Differentiations
• Expertise from Data-Centers as opposed to Data-Providers
• Infrastructures should provide functions to add provenance information (but do not)
  • e.g. EGEE provides an additional module for recording provenance data
Hot topics
• Propagating provenance: versioning
• Disambiguation / deduplication
  • different identical objects
• Who provides the data?
• Each processing step should provide at least some metadata
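
Two of these topics can be sketched in a few lines: every processing step appends a minimal metadata entry, and identical objects held in different places are detected by comparing content hashes. This is one possible approach, not one proposed in the session; the function names, field names, and sample locations are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def content_id(payload: bytes) -> str:
    """Identify an object by its content, independent of where it is stored."""
    return hashlib.sha256(payload).hexdigest()


def record_step(history, step_name, provider, payload):
    """Return a new history with a minimal metadata entry for this step appended."""
    entry = {
        "step": step_name,
        "provider": provider,  # answers "who provides the data?"
        "content_id": content_id(payload),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return history + [entry]


def deduplicate(objects):
    """Group (location, payload) pairs by content hash to find identical copies."""
    groups = {}
    for location, payload in objects:
        groups.setdefault(content_id(payload), []).append(location)
    return groups


# Illustrative values only.
history = record_step([], "harvest", "repo-a", b"data")
history = record_step(history, "normalise", "aggregator", b"data")
groups = deduplicate([
    ("repo-a/1", b"data"),
    ("repo-b/9", b"data"),
    ("repo-c/2", b"other"),
])
```

Byte-identical hashing only catches exact copies; objects that differ in encoding or formatting but are "the same" for a scientist would need a looser comparison.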
Recommendations for Infrastructure
• Standards for provenance: non-existing?
• Each processing step should provide at least some metadata
• Look deeper into specific implementations in subject communities
• Technical point-to-point organisation
  • Bilateral
• Programming a meeting
  • 24/25th ESA: earth science meeting?