Enhancing Scientific Data Management with Digital Provenance Metadata

Explore how Digital Provenance Metadata can enhance scientific data management by ensuring reliable acquisition, processing, and use/reuse of data, including capturing experimental setups, processing parameters, and linking data for future research.


Presentation Transcript


  1. CRM Digital: A Digital Provenance Ontology • TPDL 2011 • Martin Doerr, Center for Cultural Informatics, Institute of Computer Science, Foundation for Research and Technology - Hellas • Berlin, Germany, September 25, 2011

  2. Outline • Requirements • Competitors • CRM and Provenance • Data example • About Provenance-based reasoning • Conclusions

  3. Digital Provenance Metadata Requirements • Scientific data are empirical or synthetic. • Scientific data cannot be understood without knowledge about the meaning of the data and the ways and circumstances of their creation. • We use metadata to assess: • meaning (view, experimental setup, instrument settings), • relevance (depicted things, their status, their conditions), • quality (calibration, tolerances, errors, “artifacts”), • possibilities of improvement and reprocessing. • Metadata must cover the whole life-cycle: from generation to use, permanent storage and reuse.

  4. Requirements • Acquisition: Reliable registration of the process and context conditions • The experimental setup and environment (geometry, light sources, tools, obstacles, sources of noise/reflections etc.) • Capture device type and identity (individual behavior!) • Hierarchical model: Inherit metadata common to a series of “shots” • The identity of the measured or depicted object • import identifiers, metadata • identity of location – GPS data import?

  5. Requirements • Processing: Reliable registration of parameters • Workflow logs, reliable identification of outputs with inputs • input files (URIs!) • output files (URIs!), formats, warning and error reports. • Software identifiers and parameters, manual adjustments! • process types for reasoning • Reliable linking with captured data • Use and Reuse: parts, wholes and annotation: • Composition of final products, information packages (SIP, DIP, AIP) • composition of aggregates, selection of versions or parts for permanent storage, reuse or transfer between labs, to and from Digital Libraries. • Migration to other formats (compatibility and obsolescence) • Authenticity, rights
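
The processing requirements above can be sketched as a simple log record per executed step. This is only an illustration in Python: the `ProcessingRecord` fields, the `urn:example:` identifiers and the tool name are hypothetical and are not taken from CRMdig or from any 3D-COFORM tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessingRecord:
    """One processing step, logged as executed (hypothetical structure)."""
    process_type: str                 # kind of process, usable for reasoning
    software_id: str                  # identifier of the software that ran
    parameters: Dict[str, str]        # parameters passed to the software
    input_uris: List[str]             # URIs of the input files
    output_uris: List[str]            # URIs of the output files
    manual_adjustments: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def outputs_derived_from(log: List[ProcessingRecord], input_uri: str) -> List[str]:
    """Return every output URI produced by a step that consumed input_uri."""
    return [out
            for record in log
            if input_uri in record.input_uris
            for out in record.output_uris]


# Illustrative log with a single step; all values are made up.
log = [
    ProcessingRecord(
        process_type="mesh_cleaning",
        software_id="example-tool/1.0",
        parameters={"smoothing": "0.2"},
        input_uris=["urn:example:scan-001"],
        output_uris=["urn:example:mesh-001"],
        manual_adjustments=["removed stray vertices by hand"],
    ),
]
```

Keeping inputs and outputs as URIs in the same record is what lets a later query identify outputs with their inputs without consulting the software itself.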

  6. Competitors • There is no standard format for provenance data. • Too many application-oriented, partial, overspecialized solutions. • Several stand-alone models, overgeneralizations. • No integrated ontology of activity context. • Competitors: • “Open Provenance Model”, “Provenance Vocabulary”, “Provenir”, “PREMIS” • No notion of acquisition (measurement, observation) or place • Confuse the agentive role with the substance of actors, machines, software, context • No notion of temporal indeterminacy • W3C Provenance WG • Precondition: no use of a larger reference ontology => a dogmatic reinvention of the wheel… “antimodularity of ontologies?”

  7. Competitors “Provenance Vocabulary”

  8. Competitors: “Provenir”

  9. CRM and Provenance • The Idea: • First conceived by Stephen Stead for CHI, San Francisco 2007 • Scientific data and metadata are historical records! • Scientific observation and machine-supported processing are initiated and controlled by human activity, and carried out on its behalf, in physical space-time, not in cyber-space! • Things, data, people, times and places are causally related by events. • Other relations are either deductions from events or found by observation events. • CRM Digital: Specialize the CIDOC CRM (ISO 21127)! • Will allow for rich integrated reasoning (Christ – Ascension – Ivory panel) • Innovations: • The Digital Measurement Event transfers from the physical to the digital world. • Machines “act” due to human initiative and responsibility. Humans use machines. No non-human actors!

  10. 3D Model Creation as Meetings • [Space-time diagram, axes t and S, with the locations Museum and IT-Lab. Labels: museum object, operator, scanner, scan-data, 1st Computer, 2nd Computer, mesh-data, 3D model; coherence volume of acquisition, coherence volume of mesh-creation, coherence volume of rendering.]

  11. CRM Digital 2.5 • http://www.ics.forth.gr/isl/rdfs/3D-COFORM_CRMdig_v2.5.rdfs

  12. CRM Digital 2.5: Digital Events • http://www.ics.forth.gr/isl/rdfs/3D-COFORM_CRMdig_v2.5.rdfs • [Class hierarchy diagram. Classes shown: E7 Activity, E65 Creation, E11 Modification, E16 Measurement, D7 Digital Machine Event, D10 Software Execution, D12 Data Transfer Event, D11 Digital Measurement Event, D2 Digitization Process, D3 Formal Derivation, D27 Calibration Process.]

  13. CRM Digital 2.5: Digital Things • [Class hierarchy diagram. Classes shown: E70 Thing, E22 Man-Made Object, E73 Information Object, E84 Information Carrier, D1 Digital Object, E54 Dimension, D13 Digital Information Carrier, D8 Digital Device, D9 Data Object, D14 Software, D35 Area.]

  14. CRM Digital 2.5: Digitization • Digitization = feature transfer from physical to digital • [Class diagram. Classes: E16 Measurement, E65 Creation, E11 Modification, E24 Physical Man-Made Thing, E1 CRM Entity, E18 Physical Thing, E28 Conceptual Object, E54 Dimension, D11 Digital Measurement Event, D2 Digitization Process, D1 Digital Object, D9 Data Object, D13 Digital Information Carrier. Properties: P31 has modified (was modified by), P39 measured (was measured by), P40 observed dimension (was observed in), P94 has created (was created by), L1 digitized (was digitized by), L19 stores (is stored on), L20 has created (was created by).]

  15. CRM Digital 2.5: Software Execution • Formal Derivation = feature transfer from digital to digital • [Class diagram. Classes: D7 Digital Machine Event, D10 Software Execution, D3 Formal Derivation, D1 Digital Object, D8 Digital Device, D13 Digital Information Carrier, E55 Type. Properties: L10 had input (was input of), L11 had output (was output of), L12 happened on device (was device for), L2 used as source (was source for), L13 used parameters (parameters for), L18 has modified (was modified by), L21 used as derivation source (was derivation source for), L22 created derivative (was derivative created by), P2 has type (is type of).]
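
As a rough illustration of how a Formal Derivation could be recorded, the sketch below writes it as plain subject-property-object triples in Python and derives the inverse statements from the labels printed on the slide. The `urn:example:` identifiers and the helper names are assumptions made for this example; a real system would use an RDF library and the CRMdig schema URIs.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Forward property label -> inverse label, as given on the slide.
INVERSE_LABEL = {
    "L2 used as source": "was source for",
    "L21 used as derivation source": "was derivation source for",
    "L22 created derivative": "was derivative created by",
}


def formal_derivation(event: str, software: str,
                      source: str, derivative: str) -> List[Triple]:
    """Describe one D3 Formal Derivation event as three triples."""
    return [
        (event, "L2 used as source", software),
        (event, "L21 used as derivation source", source),
        (event, "L22 created derivative", derivative),
    ]


def inverse_statements(triples: List[Triple]) -> List[Triple]:
    """Swap subject and object and use the inverse label of each property."""
    return [(obj, INVERSE_LABEL[prop], subj)
            for subj, prop, obj in triples
            if prop in INVERSE_LABEL]


# Hypothetical identifiers, for illustration only.
triples = formal_derivation(
    event="urn:example:derivation-1",
    software="urn:example:software",
    source="urn:example:input.zip",
    derivative="urn:example:output.v3d",
)
```

The inverse statements are what allow a query to start from a data object and walk back to the event that produced it.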

  16. CRM Digital 2.5: Data Transfer Event • Unreliable transfer • [Class diagram. Classes: D7 Digital Machine Event, D12 Data Transfer Event, D1 Digital Object, D8 Digital Device, D13 Digital Information Carrier. Properties: L10 had input (was input of), L11 had output (was output of), L12 happened on device (was device for), L14 transferred (was transferred by), L15 has sender (was sender for), L16 has receiver (was receiver for), L18 has modified (was modified by).]

  17. Applications • European IP CASPAR • European Space Agency: satellite data • IRCAM: Digital media performances • FORTH: Art Object Digitization • FORTH/Metaware: Integrating Digital Rights with the Provenance model. • European IP 3D-COFORM • 3D model acquisition by camera, manual or by camera array. Up to 20,000 files per object. • 3D model acquisition by laser scan. • Mesh processing, rendering • Synthetic models and scene compositions. • Provenance-based reasoning • Scalable repositories, representative amounts of data.

  18. 3D Acquisition Example: 3D Reconstruction from Photographs – The Gipsmuseum Campaign • Sven Havemann, CGV, TU Graz, June 30, 2009 • Worst case for metadata capture: a complex manual process

  19. Acquisition Workflow Hierarchy • D2 Digitization Process instantiation example • [Tree diagram: Data Acquisition Event DAE1 has part Object Acquisition Event OAE1, Calibration Event CE1 and Digital Documentation Event DDE1; Object Acquisition Event OAE1 has part Calibration Event CE2, Sequence Event SE1 and Digital Documentation Event DDE2; Sequence Event SE1 has part Calibration Event CE3 and a series of Capturing Events (CapE1).]

  20. Modelling the Acquisition Process (AP) • Register: • Who, when, where. • equipment identifiers, equipment models, firmware • Setup geometry and conditions. • Assumptions: worst case, a completely manual process! • Set of objects captured under common conditions. • Each object captured by a sequence of “shots” • Metadata are stored by “historical order” (like workflow logs) • step-by-step as executed, not as planned! • concatenated by referring to identifiers of previously existing or created entities and initialized events. • = robust against exceptions in the planned workflow • Avoid redundancy of information • Hold common information as high as possible in a hierarchy of nested activities
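
The idea of holding common information as high as possible in a hierarchy of nested activities can be sketched as follows. This is a minimal Python illustration, not the 3D-COFORM implementation: the `Event` class and all metadata keys and values are hypothetical, and only the event labels follow the workflow hierarchy slide.

```python
from typing import Dict, Optional


class Event:
    """A nested activity that stores only the metadata specific to itself."""

    def __init__(self, label: str,
                 metadata: Optional[Dict[str, str]] = None,
                 parent: Optional["Event"] = None) -> None:
        self.label = label
        self.metadata = dict(metadata or {})
        self.parent = parent

    def effective_metadata(self) -> Dict[str, str]:
        """Merge metadata from all super-events; closer events override."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}


# Illustrative values only: the operator and device are stated once, at the top.
dae1 = Event("DAE1", {"operator": "operator-1", "device": "scanner-1"})
oae1 = Event("OAE1", {"object": "object-1"}, parent=dae1)
se1 = Event("SE1", {"lighting": "setup-A"}, parent=oae1)
cap1 = Event("CapE1", {"file": "shot-001"}, parent=se1)
```

Each capturing event then carries a single entry of its own, while `cap1.effective_metadata()` still reports the operator, device, object and lighting recorded on its super-events.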

  21. 3D Acquisition • Example: • The Kazafani Boat • Found in 1963, during a salvage excavation in the now Turkish occupied part of Cyprus (inaccessible and destroyed site). • Tomb from the 12th century B.C. • Unique object, handmade pottery • 40x20.5x23 cm – canoe boat shape • Permanently exhibited at the Nicosia Museum • Workflow: • 3D scanning – NextEngine • 3D model creation – Meshlab • Rapid prototyping • Testing glue, stabilizers, colours • Print final replica • Colour final replica

  22. Data Acquisition Event - Schema • Persons (“operators”) • Person: uuid:aeac5200-0138-11e0-a976-0800200c9a66 (E21 Person) • P131 is identified by: D21 Person Name • L51 has first name: Martin (Literal E62 String) • L52 has last name: Doerr (Literal E62 String) • P107 is current or former member: http://www.ics.forth.gr/ (E40 Legal Body) • L62 in the role of: http://www.3d-coform.eu/RoleType/researcher (E55 Type)

  23. Data Acquisition Event - Schema Legal Bodies & Places • Legal Body: http://starc.cyi.ac.cy/ (E40 Legal Body) • L4 has preferred label: STARC-The Cyprus Institute, Nicosia, Cyprus(Literal E62 String) • no address • P74 has current or former residence: http://www.geonames.org/146268/(E53 Place) • L4 has preferred label: Nicosia(Literal E62 String) • P3 has note:Cyprus (Literal E62 String) • exact address and the city where it is located • P74 has current or former residence: uuid: dbae7cd0-e371-11e0-9572-0800200c9a66(E53 Place) • L4 has preferred label: 15 Kypranoros Street(Literal E62 String) • P89 falls within:http://www.geonames.org/146268/(E53 Place) • P3 has note:15 Kypranoros Street, Nicosia 1061, Cyprus (Literal E62 String) • just the address without details for city • P74 has current or former residence: uuid: dbae7cd0-e371-11e0-9572-0800200c9a66(E53 Place) • L4 has preferred label: 15 Kypranoros Street(Literal E62 String) • P3 has note:15 Kypranoros Street, Nicosia 1061, Cyprus (Literal E62 String)

  24. Data Acquisition Event • Data Acquisition Event: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c30a (D2 Digitization Process) • L4 has preferred label: 2010 Laser scanning in Arch. Museum of Nicosia (Literal E62 String) • P2 has type: http://www.3d-coform.eu/EventType/laser_scanning (E55 Type) • P2 has type: http://www.3d-coform.eu/EventType/data_acquisition (E55 Type) • P3 has note: “evening sun shines through the west window” (Literal E62 String) • SUPER-EVENTS: • P9 forms part of: uuid:07f05f40-b415-11de-9d48-0002a5d5c30c (E7 Activity) (Project) • WHEN: • L31 has starting date-time: 2010-05-28T08:00:00Z (xs:dateTime E61 Time Primitive) • L32 has ending date-time: 2010-06-02T18:00:00Z (xs:dateTime E61 Time Primitive) • WHERE: • P7 took place at: http://starc.cyi.ac.cy/#Place/ArchMuseumNicosia/ConservationLab (E53 Place) • WHO: • L29 has responsible organisation: http://starc.cyi.ac.cy/ (E40 Legal Body) • L30 has operator: uuid:aeac5200-0138-11e0-a976-0800200c9a66 (E21 Person)
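
A small consistency check on the WHEN block, sketched in Python: it parses the two xs:dateTime values shown for this event and refuses a time span that ends before it starts. The function names are assumptions for this example, and only the `...Z` (UTC) form used on the slide is handled.

```python
from datetime import datetime, timedelta, timezone

# The slide uses UTC timestamps with a trailing "Z".
XS_DATETIME_UTC = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(text: str) -> datetime:
    """Parse a timestamp such as 2010-05-28T08:00:00Z into an aware datetime."""
    return datetime.strptime(text, XS_DATETIME_UTC).replace(tzinfo=timezone.utc)


def event_duration(start: str, end: str) -> timedelta:
    """Return end - start, rejecting spans whose end precedes their start."""
    begin, finish = parse_utc(start), parse_utc(end)
    if finish < begin:
        raise ValueError("ending date-time precedes starting date-time")
    return finish - begin


# L31 / L32 values of the Data Acquisition Event above.
duration = event_duration("2010-05-28T08:00:00Z", "2010-06-02T18:00:00Z")
```

A check of this kind is cheap to run when metadata are entered manually, which is the worst case the presentation assumes.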

  25. Data Acquisition Event • Data Acquisition Event: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c30a (D2 Digitization Process) • WITH WHAT (camera): • L12 happened on device: http://www.nextengine.com/.../E4035623490 (D8 Digital Device) • L59 has serial number:E4035623490 (Literal E62 String) • L4 has preferred label:“Next Engine Desktop 3D scanner” (Literal E62 String)(=Model) • P2 has type: http://www.3d-coform.eu/DeviceType/laser_scanner (E55 Type) • L33 has maker:http://www.nextengine.com/ (E39 Actor) • P3 has note:Next Engine Desktop 3D scanner, Multi stripe laser (Literal E62 String) • L23 used software or firmware:http://www.nextengine.com/.../Scan_Studio (D14 Software) • WITH WHAT (additional devices): • P16 used specific object: http://b2b.sony.com/.../SONVPLFE4035623490 (E22 Man Made Object) • L59 has serial number:32526158 (Literal E62 String) • L4 has preferred label:SONY PLFE 40 Projector (Literal E62 String)(= Model) • P2 has type: http://www.getty.edu/research/tools/vocabularies/aat/300022665 (E55 Type) • L33 has maker:http://www.nikon.com/ (E39 Actor) • P16 used specific object: http://www.cgv.tugraz.at/structure_slide_T45a(E22 Man Made Object)

  26. Object Acquisition Event • Object Acquisition Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30b (D2 Digitization Process) • L4 has preferred label: 2010 Laser scanning of Kazafani Boat 249.377 in Archaeological Museum of Nicosia (Literal E62 String) • P2 has type: http://www.3d-coform.eu/EventType/object_acquisition (E55 Type) • L10 had input: uuid:3d066a90-9cb1-11e0-aa82-0800200c9a66 (D9 Data Object) (calibration file) • L10 had input: uuid:46963500-9cce-11e0-aa82-0800200c9a66 (D9 Data Object) (configuration file) • SUPER-EVENTS: • P9 forms part of: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c30a (D2 Digitization Process) (Data Acquisition Event)

  27. Object Acquisition Event • Object Acquisition Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30b (D2 Digitization Process) • WHAT (acquired object): • L1 digitized: uuid:e4761f00-0ce7-11e0-81e0-0800200c9a66 (E22 Man-Made Object) • P1 is identified by: http://www.mcw.gov.cy/mcw/DA/DA.nsf/Objects/249.377 (E42 Identifier) (all “known” URIs) • L4 has preferred label: Kazafani Boat, vase, 249.377 (Literal E62 String) • L53 is not uniquely identified by: Kazafani Boat (Literal E62 String) • L53 is not uniquely identified by: Bronze Age model of a boat (Literal E62 String) • L55 has inventory no: 249.377 (Literal E62 String) • P2 has type: http://www.getty.edu/research/tools/vocabularies/aat/300132254 (E55 Type) (vase) • P3 has note: “Deep hollow hull with in-curving flat-topped gunwale ….” (Literal E62 String) • P50 has current keeper: uuid:6f2972e6-ad9e-4a72-930d-263f01e75d8c (E40 Legal Body) (Archaeological Museum of Nicosia)

  28. Calibration Event • Calibration Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30c (D2 Digitization Process) • P2 has type: http://www.3d-coform.eu/EventType/calibration (E55 Type) • L1 digitized: http://cg.cs.uni-bonn.de/#Calibration/Bariumsulfate/Block10 (E18 Physical Thing) • L4 has preferred label: block of bariumsulfate (10x10x1cm) (Literal E62 String) • P2 has type: http://www.3d-coform.eu/InformationObjectType/MultiviewdomeCalibrationData (E55 Type) (color chart, ruler, greyscale) • WHEN: • L31 has starting date-time: 2009-07-02T16:04:34Z (xs:dateTime E61 Time Primitive) • L32 has ending date-time: 2009-07-02T16:04:34Z (xs:dateTime E61 Time Primitive) • OUTPUT: • L20 has created: uuid:07f05f40-b415-11de-9d48-0002a5d5c31c (D9 Data Object)

  29. Capturing Event • Capturing Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30n (D2 Digitization Process) • L4 has preferred label: Capture 1_0 for Boat (Literal E62 String) • P2 has type: http://www.3d-coform.eu/EventType/capture (E55 Type) • SUPER-EVENTS: • P9 forms part of: uuid:07f05f40-b415-11de-9d48-0002a5d5c30b (D2 Digitization Process) (Object Acquisition Event) • WHEN: • L31 has starting date-time: 2009-07-02T16:07:54Z (xs:dateTime E61 Time Primitive) • L32 has ending date-time: 2009-07-02T16:07:54Z (xs:dateTime E61 Time Primitive)

  30. Capturing Event • Capturing Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30n (D2 Digitization Process) (cont’d) • OUTPUT: • L20 has created: uuid:07f05f40-b415-11de-9d48-0002a5d5c31g (D9 Data Object) (image file, zip file ...) • L4 has preferred label: 1_0.ply (Literal E62 String) • P2 has type: http://www.3d-coform.eu/ObjectType/mesh (E55 Type) • P2 has type: http://www.3d-coform.eu/MimeType/ply (E55 Type) • P43 has dimension: (E54 Dimension) • P2 has type: http://www.3d-coform.eu/DimensionType/mesh_vertices (E55 Type) • P90 has value: 394302 (xs:integer E60 Number) • P91 has unit: http://www.3d-coform.eu/UnitType/vertices (E58 Measurement Unit) • P43 has dimension: (E54 Dimension) • P2 has type: http://www.3d-coform.eu/DimensionType/mesh_faces (E55 Type) • P90 has value: 782543 (xs:integer E60 Number) • P91 has unit: http://www.3d-coform.eu/UnitType/faces (E58 Measurement Unit)
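
The two P43 dimensions of the output mesh can be pictured as small typed records. The Python sketch below reuses the type URIs, values and unit URIs from the slide; the `Dimension` class and the `dimension_value` helper are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

BASE = "http://www.3d-coform.eu/"


@dataclass(frozen=True)
class Dimension:
    type_uri: str   # P2 has type
    value: int      # P90 has value
    unit_uri: str   # P91 has unit


# Dimensions of the data object labelled 1_0.ply, as listed on the slide.
mesh_dimensions = [
    Dimension(BASE + "DimensionType/mesh_vertices", 394302, BASE + "UnitType/vertices"),
    Dimension(BASE + "DimensionType/mesh_faces", 782543, BASE + "UnitType/faces"),
]


def dimension_value(dimensions: List[Dimension], type_suffix: str) -> Optional[int]:
    """Return the value of the first dimension whose type URI ends with type_suffix."""
    for dimension in dimensions:
        if dimension.type_uri.endswith(type_suffix):
            return dimension.value
    return None
```

Recording the type and unit alongside the number is what keeps a value such as 394302 interpretable once it is separated from the file it describes.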

  31. ARC 3D component • [Diagram: images go into the ARC 3D web service component, which returns the used images plus a depth map for each used image and metadata.] • Perform the 3D reconstruction of an artefact from images retrieved from the RI • For an input sequence of images, ARC 3D produces a calibration matrix and a depth map for each image identified as usable for the reconstruction. • This output data is then ingested into the RI so that it can be retrieved and loaded into MeshLab to perform the final reconstruction step (integration of the depth maps).

  32. ARC 3D Process Event • Process Event: uuid:2f7d22db-1d89-11e0-ac64-0800200c9a66 (D3 Formal Derivation) • L4 has preferred label: Processing of Ivory Panel raw data with Arc3D (Literal E62 String) • P2 has type: http://www.3d-coform.eu/EventType/process_event (E55 Type) • P2 has type: http://www.3d-coform.eu/EventType/modeling_process (E55 Type) • WHO: • L29 has responsible organisation: http://www.vam.ac.uk/ (E40 Legal Body) • L30 has operator: uuid:2f7d22d3-1d89-11e0-ac64-0800200c9a66 (E21 Person) • WHEN: • L31 has starting date-time: 2010-08-07T08:00:00Z (xs:dateTime E61 Time Primitive) • L32 has ending date-time: 2010-12-02T10:00:00Z (xs:dateTime E61 Time Primitive) • WHERE: • P7 took place at: http://www.vam.ac.uk/#Place/ConservationLaboratory (D23 Room)

  33. ARC 3D Process Event • Process Event: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c50c (D3 Formal Derivation) (cont’d) • WITH WHAT (Software): • L2 used as source: http://www.esat.kuleuven.be/psi/visics/ARC3D (D14 Software) • L4 has preferred label: ARC3D (Literal E62 String) • P2 has type: http://www.esat.kuleuven.be/psi/visics/ARC3D/Version_1.0.0 (E55 Type) • P2 has type: http://www.3d-coform.eu/SoftwareType/processing_software (E55 Type) • L33 has maker: http://www.esat.kuleuven.be/psi/visics (E39 Actor) • L4 has preferred label: KULeuven PSIVISICS (Literal E62 String)

  34. ARC 3D Process Event • Process Event: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c50c (D3 Formal Derivation) (cont’d) • WHAT (Input): • L21 used as derivation source: uuid:2f7d22d2-1d89-11e0-ac64-0800200c9a66 (D9 Data Object) • L4 has preferred label: A.15-1955-dome-out.zip (Literal E62 String) • WHAT (Derivative output): • L22 created derivative: uuid:2f7d22dc-1d89-11e0-ac64-0800200c9a66 (D9 Data Object) • L4 has preferred label: Arc3D-A.15-1955_dmy.v3d (Literal E62 String) • P2 has type: http://www.3d-coform.eu/#ObjectType/mesh (E55 Type) • P2 has type: http://www.3d-coform.eu/#mimetype/v3d (E55 Type) (calibration files, depth map files, CUN file and respective images)

  35. Reasoning: A Coherent Semantic Net • [Network diagram. Raw data objects are each linked to acquisition metadata and subevents (who, when, where, what, using what); processing metadata (who, when, how, what, using what params) connect them to models, params and meshes; the net also includes software and algorithms, devices and device models, object metadata (features, history) and a 2nd acquisition, all tied together by who, when and where.]

  36. Good Reasons for Reasoning (3D-COFORM) • The integrated semantic network of provenance metadata supports data consistency, interpretation, reuse and preservation. • Management: • garbage collection of all reproducible intermediate results (classify software!) • export packages: collect all acquisition data and parameters used for one model. • Preservation: • monitor the obsolescence of all processing tools and format viewers necessary to interpret or reprocess certain data. • Property propagation to subevents and derivatives (economy & consistency): • For instance, the question “Which object does my mesh represent?” results in long query paths: “Jesus Christ” forms part of “Ascension” is carried by “Ivory Panel” was digitized by “MiniDomeEvent2011934” has created “A.15-1955model v1.zip” used as derivation source …
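
The long query path for “which object does my mesh represent?” can be sketched as a backward walk over provenance triples. This Python example is a simplified illustration under stated assumptions: the property names follow the concatenated-metadata legend (digitized, has_created, used_as_derivation_source, created_derivative), while `ProcEvent1` and `mesh.ply` are hypothetical placeholders for the elided part of the path.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("MiniDomeEvent2011934", "digitized", "Ivory Panel"),
    ("MiniDomeEvent2011934", "has_created", "A.15-1955model v1.zip"),
    # Hypothetical processing step standing in for the elided part of the path.
    ("ProcEvent1", "used_as_derivation_source", "A.15-1955model v1.zip"),
    ("ProcEvent1", "created_derivative", "mesh.ply"),
]


def objects_of(triples: List[Triple], subject: str, prop: str) -> List[str]:
    """All objects o such that (subject, prop, o) is stated."""
    return [o for s, p, o in triples if s == subject and p == prop]


def represented_objects(triples: List[Triple], data_object: str) -> Set[str]:
    """Walk back through derivations to the digitized physical objects."""
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [data_object]
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        for event, prop, obj in triples:
            if obj != item:
                continue
            if prop == "has_created":
                # Reached a digitization event: collect what it digitized.
                found.update(objects_of(triples, event, "digitized"))
            elif prop == "created_derivative":
                # Continue from the sources of the derivation event.
                stack.extend(objects_of(triples, event, "used_as_derivation_source"))
    return found
```

Calling `represented_objects(TRIPLES, "mesh.ply")` walks from the mesh through the processing event to the digitization event and returns the digitized object; property propagation would let such an answer be stored once rather than recomputed for every derivative.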

  37. 3D-COFORM: Concatenated Metadata • [Graph diagram. Legend: Digitization_Process, Formal_Derivation, Sub-events, Data_Object, Man_Made_Object. Edge labels: digitized, has_created, forms_part_of, used_as_derivation_source, created_derivative. Data objects shown: 2009CA5306_0.tif, 2009CR4851_0.tif, 2009CA5307v Coloured.ply, A.15-1955-dome-out.zip, Arc3D-A.15-1955_dmy.v3d, A.15-1955 corner scans.zip, A.15-1955 Degree scans.zip, A.15-1955 Master.zip, A15-1955 Retouched.zip, A.15-1955 NXTENG whole model v1.zip, A.15-1955_nxtng_5_degrees_complete_bjbrown.ply. Metadata records shown: 1IvoryPanel_ObjAcqEvent.rdf, 2IvoryPanel_DocEvent.rdf, 2009CA5306_0.rdf, 2009CR4851_0.rdf, A.15-1955-dome-out.rdf, 3IvoPan_LegacyData.rdf, 3IvoryPanel_NE_ObjAcqEvent.rdf, 4IvoryPanel_NE_DetSeqEvent.rdf, 4Ivory_Arc3DProcEvent.rdf, 5Ivory_MeshLabProcEvent.rdf, 6Ivory_NE_MeshLabProcEvent.rdf, 7Ivory_NE_MeshLabProcEvent.rdf, 8Ivory_NE_MeshLabProcEvent.rdf, 9Ivory_NE_MeshLabProcEvent.rdf, A.15-1955 corner scans.rdf, A.15-1955 NXTENG whole model v1.rdf.]

  38. Conclusions • CRMdig provides a good high-level model for empirical provenance of digital data, open for further specialization and integrated with arbitrary context representations. • CRM-CRMdig outperforms competitors in expressive power and integration potential. • Future work: Theory of property propagation to subevents and derivatives • Needed: Theories of feature conservation by kinds of derivation processes. • Links: http://www.ics.forth.gr/isl/rdfs/3D-COFORM_CRMdig_v2.5.rdfs