Explore how Digital Provenance Metadata can enhance scientific data management by ensuring reliable acquisition, processing, and use/reuse of data, including capturing experimental setups, processing parameters, and linking data for future research.
CRM Digital: A Digital Provenance Ontology • TPDL 2011 • Martin Doerr • Center for Cultural Informatics, Institute of Computer Science, Foundation for Research and Technology - Hellas • Berlin, Germany • September 25, 2011
Outline • Requirements • Competitors • CRM and Provenance • Data example • About Provenance-based reasoning • Conclusions
Digital Provenance Metadata Requirements • Scientific data are empirical or synthetic. • Scientific data cannot be understood without knowledge about the meaning of the data and the ways and circumstances of their creation. • We use metadata to assess • meaning (view, experimental setup, instrument settings), • relevance (depicted things, their status, their conditions), • quality (calibration, tolerances, errors, “artifacts”), • possibilities of improvement and reprocessing. • From generation to use, permanent storage, reuse (life-cycle)
Requirements • Acquisition: Reliable registration of the process and context conditions • The experimental setup and environment (geometry, light sources, tools, obstacles, sources of noise/reflections etc.) • Capture device type, identity (individual behavior!) • Hierarchical model: Inherit metadata common to series of “shots” • The identity of the measured or depicted object • import identifiers, metadata • identity of location – GPS data import?
Requirements • Processing: Reliable registration of parameters • Workflow logs, reliable identification of outputs with inputs • input files (URIs!) • output files (URIs!), formats, warning and error reports. • S/W identifiers and parameters, manual adjustments! • process types for reasoning • Reliable linking with captured data • Use and Reuse: parts, wholes and annotation: • Composition of final products, information packages (SIP, DIP, AIP) • composition of aggregates, selection of versions or parts for permanent storage, reuse or transfer between labs, to and from Digital Libraries. • Migration to other formats (compatibility and obsoletion) • Authenticity, rights
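The processing requirements above (inputs and outputs identified by URIs, software identifiers and parameters, manual adjustments, warnings) can be illustrated with a minimal log record. This is only a sketch: the class name `ProcessingStep`, its field names and all `urn:example:` identifiers are hypothetical and do not come from CRMdig or 3D-COFORM.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProcessingStep:
    """One executed processing step, recorded as it happened (hypothetical schema)."""
    software: str                  # identifier of the software that ran
    parameters: dict               # parameters the software was run with
    inputs: list                   # URIs of the input files
    outputs: list                  # URIs of the output files
    warnings: list = field(default_factory=list)            # warning and error reports
    manual_adjustments: list = field(default_factory=list)  # free-text notes by the operator


# Example record; every identifier here is invented for illustration.
step = ProcessingStep(
    software="urn:example:software/mesh-tool/1.0",
    parameters={"smoothing_iterations": 3},
    inputs=["urn:example:data/scan-001.ply"],
    outputs=["urn:example:data/scan-001-smoothed.ply"],
    manual_adjustments=["removed stray vertices by hand"],
)

# Serialise the step as one line of a workflow log.
log_line = json.dumps(asdict(step), sort_keys=True)
print(log_line)
```

Because each record names its inputs and outputs explicitly, later records can be linked to earlier ones by matching an output URI of one step to an input URI of the next.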
Competitors • There is no standard format for provenance data. • Too many application-oriented, partial, overspecialized solutions. • Several stand-alone models, overgeneralizations • No integrated ontology of activity context • Competitors: • “Open Provenance Model”, “Provenance Vocabulary”, “Provenir”, “PREMIS” • no notion of acquisition (measurement, observation), place • Confuse agentive role with substance of actors, machines, S/W, context • No notion of temporal indeterminacy • W3C Provenance WG • precondition: No use of a larger reference ontology => a dogmatic reinvention of the wheel… “antimodularity of ontologies?”
CRM and Provenance • The Idea: • First conceived by Stephen Stead for CHI, San Francisco 2007 • Scientific data and metadata are historical records! • Scientific observation and machine-supported processing are initiated by, carried out on behalf of, and controlled by human activity in physical space-time, not in cyber-space! • Things, data, people, times and places are causally related by events. • Other relations are either deductions from events or found by observation events. • CRM Digital: Specialize the CIDOC CRM (ISO 21127)! • Will allow for rich integrated reasoning (Christ – Ascension – Ivory panel) • Innovations: • The Digital Measurement Event transfers from physical to digital world. • Machines “act” due to human initiative and responsibility. Humans use machines. No non-human actors!
3D Model Creation as Meetings • [Figure: space-time diagram (axes S and t). Labels: museum object, operator, scanner, scan-data, 1st Computer, mesh-data, 2nd Computer, 3D model; coherence volume of acquisition, coherence volume of mesh-creation, coherence volume of rendering; locations: Museum, It-Lab.]
CRM Digital 2.5 • http://www.ics.forth.gr/isl/rdfs/3D-COFORM_CRMdig_v2.5.rdfs
CRM Digital 2.5: Digital Events • [Figure: class hierarchy. CIDOC CRM classes: E7 Activity, E65 Creation, E11 Modification, E16 Measurement. CRMdig classes: D7 Digital Machine Event, D10 Software Execution, D12 Data Transfer Event, D11 Digital Measurement Event, D2 Digitization Process, D3 Formal Derivation, D27 Calibration Process.]
CRM Digital 2.5: Digital Things • [Figure: class hierarchy. CIDOC CRM classes: E70 Thing, E22 Man-Made Object, E73 Information Object, E84 Information Carrier, E54 Dimension. CRMdig classes: D1 Digital Object, D13 Digital Information Carrier, D8 Digital Device, D9 Data Object, D14 Software, D35 Area.]
CRM Digital 2.5: Digitization • Digitization = feature transfer from physical to digital • [Figure: property diagram. Classes: E1 CRM Entity, E11 Modification, E16 Measurement, E65 Creation, E18 Physical Thing, E24 Physical Man-Made Thing, E28 Conceptual Object, E54 Dimension, D1 Digital Object, D2 Digitization Process, D9 Data Object, D11 Digital Measurement Event, D13 Digital Information Carrier. Properties: P31 has modified (was modified by), P39 measured (was measured by), P40 observed dimension (was observed in), P94 has created (was created by), L1 digitized (was digitized by), L19 stores (is stored on), L20 has created (was created by).]
CRM Digital 2.5: Software Execution • Formal Derivation = feature transfer from digital to digital • [Figure: property diagram. Classes: D1 Digital Object, D3 Formal Derivation, D7 Digital Machine Event, D8 Digital Device, D10 Software Execution, D13 Digital Information Carrier, E55 Type. Properties: L10 had input (was input of), L11 had output (was output of), L12 happened on device (was device for), L2 used as source (was source for), L13 used parameters (parameters for), L18 has modified (was modified by), L21 used as derivation source (was derivation source for), L22 created derivative (was derivative created by), P2 has type (is type of).]
CRM Digital 2.5: Data Transfer Event • Unreliable transfer • [Figure: property diagram. Classes: D1 Digital Object, D7 Digital Machine Event, D8 Digital Device, D12 Data Transfer Event, D13 Digital Information Carrier. Properties: L10 had input (was input of), L11 had output (was output of), L12 happened on device (was device for), L14 transferred (was transferred by), L15 has sender (was sender for), L16 has receiver (was receiver for), L18 has modified (was modified by).]
Applications • European IP CASPAR • European Space Agency: satellite data • IRCAM: Digital media performances • FORTH: Art Object Digitization • FORTH/ Metaware: Integrating Digital Rights with Provenance model. • European IP 3D-COFORM • 3D model acquisition by camera, manual or by camera array. Up to 20.000 files per object. • 3D model acquisition by laser scan. • Mesh processing, rendering • Synthetic models and scene compositions. • Provenance-based reasoning • Scalable repositories, representative amounts of data.
3D Acquisition Example: 3D Reconstruction from Photographs – The Gipsmuseum Campaign • Sven Havemann, CGV, TU Graz • June 30, 2009 • worst case for metadata capture: a complex manual process
Acquisition Workflow Hierarchy • D2 Digitization Process instantiation example • [Figure: tree of events linked by “has part”.] • Data Acquisition Event DAE1 has part: Object Acquisition Event OAE1, Calibration Event CE1, Digital Documentation Event DDE1 • Object Acquisition Event OAE1 has part: Calibration Event CE2, Sequence Event SE1, Digital Documentation Event DDE2 • Sequence Event SE1 has part: Calibration Event CE3, Capturing Event CapE1 (several capturing events)
Modelling the Acquisition Process (AP) • Register: • Who, when, where. • equipment identifiers, equipment models, firmware • Setup geometry and conditions. • Assumptions: worst case, a completely manual process! • Set of objects captured under common conditions. • Each object captured by a sequence of “shots” • Metadata are stored by “historical order” (like workflow logs) • step-by-step as executed, not as planned! • concatenated by referring to identifiers of previously existing or created entities and initialized events. • = robust against exceptions in the planned workflow • Avoid redundancy of information • Hold common information as high as possible in a hierarchy of nested activities
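The last point, holding common information as high as possible in a hierarchy of nested activities, can be sketched as a lookup that climbs from a single shot to its super-events. The class `Event`, the method `lookup` and the metadata key names are hypothetical; the values are taken from the data example in this deck.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """One node in a hierarchy of nested acquisition activities (hypothetical model)."""
    label: str
    parent: Optional["Event"] = None
    metadata: dict = field(default_factory=dict)

    def lookup(self, key):
        """Return the value stored on this event or on the nearest super-event."""
        node = self
        while node is not None:
            if key in node.metadata:
                return node.metadata[key]
            node = node.parent
        return None


# Information common to the whole campaign is stored once, at the top.
dae = Event("Data Acquisition Event",
            metadata={"operator": "Martin Doerr", "device_serial": "E4035623490"})
# Information common to all shots of one object sits one level lower.
oae = Event("Object Acquisition Event", parent=dae,
            metadata={"object": "Kazafani Boat 249.377"})
# A single shot only records what is specific to it.
cap = Event("Capturing Event", parent=oae, metadata={"output": "1_0.ply"})

print(cap.lookup("output"))         # stored on the shot itself
print(cap.lookup("object"))         # inherited from the object acquisition
print(cap.lookup("device_serial"))  # inherited from the data acquisition
```

Nothing is copied downwards, so a correction made on the Data Acquisition Event is seen by every capturing event beneath it.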
3D Acquisition • Example: • The Kazafani Boat • Found in 1963, during a salvage excavation in the now Turkish-occupied part of Cyprus (inaccessible and destroyed site). • Tomb from the 12th century B.C. • Unique object, handmade pottery • 40x20.5x23 cm – canoe boat shape • Permanently exhibited at the Nicosia Museum • Workflow: • 3D scanning – NextEngine • 3D model creation – Meshlab • Rapid prototyping • Testing glue, stabilizers, colours • Print final replica • Colour final replica
Data Acquisition Event - Schema Persons (“operators”) • Person: uuid:aeac5200-0138-11e0-a976-0800200c9a66(E21 Person) P131 is identified by : D21 Person Name • L51 has first name:Martin (Literal E62 String) • L52 has last name:Doerr(Literal E62 String) • P107 is current or former member: http://www.ics.forth.gr/ (E40 Legal Body) • L62 in the role of:http://www.3d-coform.eu/RoleType/researcher (E55 Type)
Data Acquisition Event - Schema Legal Bodies & Places • Legal Body: http://starc.cyi.ac.cy/ (E40 Legal Body) • L4 has preferred label: STARC-The Cyprus Institute, Nicosia, Cyprus(Literal E62 String) • no address • P74 has current or former residence: http://www.geonames.org/146268/(E53 Place) • L4 has preferred label: Nicosia(Literal E62 String) • P3 has note:Cyprus (Literal E62 String) • exact address and the city where it is located • P74 has current or former residence: uuid: dbae7cd0-e371-11e0-9572-0800200c9a66(E53 Place) • L4 has preferred label: 15 Kypranoros Street(Literal E62 String) • P89 falls within:http://www.geonames.org/146268/(E53 Place) • P3 has note:15 Kypranoros Street, Nicosia 1061, Cyprus (Literal E62 String) • just the address without details for city • P74 has current or former residence: uuid: dbae7cd0-e371-11e0-9572-0800200c9a66(E53 Place) • L4 has preferred label: 15 Kypranoros Street(Literal E62 String) • P3 has note:15 Kypranoros Street, Nicosia 1061, Cyprus (Literal E62 String)
Data Acquisition Event • Data Acquisition Event: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c30a (D2 Digitization Process) • L4 has preferred label: 2010 Laser scanning in Arch. Museum of Nicosia (Literal E62 String) • P2 has type:http://www.3d-coform.eu/EventType/laser_scanning (E55 Type) • P2 has type:http://www.3d-coform.eu/EventType/data_acquisition (E55 Type) • P3 has note: “evening sun shines through the west window” (Literal E62 String) • SUPER-EVENTS: • P9 forms part of: uuid:07f05f40-b415-11de-9d48-0002a5d5c30c (E7 Activity) (Project) • WHEN: • L31 has starting date-time: 2010-05-28T08:00:00Z(xs:dateTime E61 Time Primitive) • L32 has ending date-time: 2010-06-02T18:00:00Z (xs:dateTime E61 Time Primitive) WHERE: • P7 took place at: http://starc.cyi.ac.cy/#Place/ArchMuseumNicosia/ConservationLab (E53 Place) WHO: • L29 has responsible organisation:http://starc.cyi.ac.cy/ (E40 Legal Body) • L30 has operator: uuid:aeac5200-0138-11e0-a976-0800200c9a66 (E21 Person)
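As a rough illustration, the record above can be held as subject-predicate-object triples and queried for its who, when and where. The sketch uses plain Python tuples rather than an RDF library, and the helper names `objects` and `parse_time` are hypothetical. Property labels and values are copied from the slide, except `rdf:type`, which stands in for the class given in parentheses after the identifier.

```python
from datetime import datetime

DAE = "uuid:354c91e0-b3fa-11de-98c6-0002a5d5c30a"

# The Data Acquisition Event as (subject, predicate, object) triples.
triples = [
    (DAE, "rdf:type", "D2 Digitization Process"),
    (DAE, "P2 has type", "http://www.3d-coform.eu/EventType/laser_scanning"),
    (DAE, "P2 has type", "http://www.3d-coform.eu/EventType/data_acquisition"),
    (DAE, "L31 has starting date-time", "2010-05-28T08:00:00Z"),
    (DAE, "L32 has ending date-time", "2010-06-02T18:00:00Z"),
    (DAE, "P7 took place at",
     "http://starc.cyi.ac.cy/#Place/ArchMuseumNicosia/ConservationLab"),
    (DAE, "L29 has responsible organisation", "http://starc.cyi.ac.cy/"),
    (DAE, "L30 has operator", "uuid:aeac5200-0138-11e0-a976-0800200c9a66"),
]


def objects(subject, predicate):
    """Return all values of one property of one subject."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def parse_time(value):
    """Parse an xs:dateTime literal in the UTC form used on the slide."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


start = parse_time(objects(DAE, "L31 has starting date-time")[0])
end = parse_time(objects(DAE, "L32 has ending date-time")[0])
duration = end - start

print("who:  ", objects(DAE, "L30 has operator"))
print("where:", objects(DAE, "P7 took place at"))
print("how long:", duration)  # 5 days, 10:00:00
```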
Data Acquisition Event • Data Acquisition Event: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c30a (D2 Digitization Process) • WITH WHAT (camera): • L12 happened on device: http://www.nextengine.com/.../E4035623490 (D8 Digital Device) • L59 has serial number:E4035623490 (Literal E62 String) • L4 has preferred label:“Next Engine Desktop 3D scanner” (Literal E62 String)(=Model) • P2 has type: http://www.3d-coform.eu/DeviceType/laser_scanner (E55 Type) • L33 has maker:http://www.nextengine.com/ (E39 Actor) • P3 has note:Next Engine Desktop 3D scanner, Multi stripe laser (Literal E62 String) • L23 used software or firmware:http://www.nextengine.com/.../Scan_Studio (D14 Software) • WITH WHAT (additional devices): • P16 used specific object: http://b2b.sony.com/.../SONVPLFE4035623490 (E22 Man Made Object) • L59 has serial number:32526158 (Literal E62 String) • L4 has preferred label:SONY PLFE 40 Projector (Literal E62 String)(= Model) • P2 has type: http://www.getty.edu/research/tools/vocabularies/aat/300022665 (E55 Type) • L33 has maker:http://www.nikon.com/ (E39 Actor) • P16 used specific object: http://www.cgv.tugraz.at/structure_slide_T45a(E22 Man Made Object)
Object Acquisition Event • Object Acquisition Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30b (D2 Digitization Process) • L4 has preferred label: 2010 Laser scanning of Kazafani Boat 249.377 in • Archaeological Museum of Nicosia(Literal E62 String) • P2 has type:http://www.3d-coform.eu/EventType/object_acquisition (E55 Type) • L10 had input:uuid:3d066a90-9cb1-11e0-aa82-0800200c9a66 (D9 Data Object) (calibration file) • L10 had input:uuid:46963500-9cce-11e0-aa82-0800200c9a66 (D9 Data Object) (configuration file) SUPER-EVENTS: • P9 forms part of: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c30a(D2 Digitization Process) (Data Acquisition Event)
Object Acquisition Event • Object Acquisition Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30b (D2 Digitization Process) • WHAT (acquired object): L1 digitized: uuid:e4761f00-0ce7-11e0-81e0-0800200c9a66(E22 Man-Made Object) P1 is identified by: http://www.mcw.gov.cy/mcw/DA/DA.nsf/Objects/249.377(E42 Identifier) (all “known” URIs) L4 has preferred label: Kazafani Boat, vase, 249.377(Literal E62 String) L53 is not uniquely identified by: Kazafani Boat(Literal E62 String) L53 is not uniquely identified by: Bronze Age model of a boat (Literal E62 String) L55 has inventory no: 249.377 (Literal E62 String) P2 has type: http://www.getty.edu/research/tools/vocabularies/aat/300132254(E55 Type)(vase) P3 has note: “Deep hollow hull with in-curving flat-topped gunwale ….” (Literal E62 String) P50 has current keeper: uuid:6f2972e6-ad9e-4a72-930d-263f01e75d8c(E40 Legal Body) (Archaeological Museum of Nicosia)
Calibration Event • Calibration Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30c(D2 Digitization Process) • P2 has type: http://www.3d-coform.eu/EventType/calibration(E55 Type) • L1 digitized: http://cg.cs.uni-bonn.de/#Calibration/Bariumsulfate/Block10(E18 Physical Thing) • L4 has preferred label:block of bariumsulfate (10x10x1cm) (Literal E62 String) • P2 has type: http://www.3d-coform.eu/InformationObjectType/MultiviewdomeCalibrationData • (E55 Type) (color chart, ruler, greyscale) • WHEN: • L31 has starting date-time: 2009-07-02T16:04:34Z(xs:dateTime E61 Time Primitive) • L32 has ending date-time: 2009-07-02T16:04:34Z (xs:dateTime E61 Time Primitive) OUTPUT: • L20 has created: uuid:07f05f40-b415-11de-9d48-0002a5d5c31c(D9 Data Object)
Capturing Event • Capturing Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30n (D2 Digitization Process) • L4 has preferred label:Capture 1_0 for Boat(Literal E62 String) • P2 has type: http://www.3d-coform.eu/EventType/capture(E55 Type) • SUPER-EVENTS: • P9 forms part of: uuid:07f05f40-b415-11de-9d48-0002a5d5c30b (D2 Digitization Process) • (Object Acquisition Event) • WHEN: • L31 has starting date-time: 2009-07-02T16:07:54Z(xs:dateTime E61 Time Primitive) • L32 has ending date-time: 2009-07-02T16:07:54Z (xs:dateTime E61 Time Primitive)
Capturing Event • Capturing Event: uuid:07f05f40-b415-11de-9d48-0002a5d5c30n (D2 Digitization Process) (cont’d) OUTPUT: L20 has created: uuid:07f05f40-b415-11de-9d48-0002a5d5c31g(D9 Data Object) (image file, zip file ...) L4 has preferred label: 1_0.ply (Literal E62 String) P2 has type:http://www.3d-coform.eu/ObjectType/mesh (E55 Type) P2 has type:http://www.3d-coform.eu/MimeType/ply (E55 Type) P43 has dimension: (E54 Dimension) P2 has type:http://www.3d-coform.eu/DimensionType/mesh_vertices (E55 Type) P90 has value:394302 (xs:integer E60 Number) P91 has unit: http://www.3d-coform.eu/UnitType/vertices (E58 Measurement Unit) P43 has dimension: (E54 Dimension) P2 has type:http://www.3d-coform.eu/DimensionType/mesh_faces (E55 Type) P90 has value:782543 (xs:integer E60 Number) P91 has unit: http://www.3d-coform.eu/UnitType/faces (E58 Measurement Unit)
ARC 3D web service component • [Figure: the ARC 3D component takes images as input and returns the used images plus a depth map for each used image and metadata.] • Perform the 3D reconstruction of an artefact from images retrieved from the RI • For an input sequence of images, ARC 3D produces a calibration matrix and a depth map for each image identified as usable for the reconstruction. • This output data is then ingested into the RI so that it can be retrieved and loaded into MeshLab to perform the final reconstruction step (integration of the depth maps).
ARC 3D Process Event • Process Event: uuid:2f7d22db-1d89-11e0-ac64-0800200c9a66 (D3 Formal Derivation) • L4 has preferred label:Processing of Ivory Panel raw data with Arc3D(Literal E62 String) • P2 has type: http://www.3d-coform.eu/EventType/process_event(E55 Type) • P2 has type: http://www.3d-coform.eu/EventType/modeling_process(E55 Type) WHO: • L29 has responsible organisation:http://www.vam.ac.uk/(E40 Legal Body) • L30 has operator: uuid:2f7d22d3-1d89-11e0-ac64-0800200c9a66 (E21 Person) • WHEN: • L31 has starting date-time: 2010-08-07T08:00:00Z(xs:dateTime E61 Time Primitive) • L32 has ending date-time: 2010-12-02T10:00:00Z(xs:dateTime E61 Time Primitive) WHERE: • P7 took place at: http://www.vam.ac.uk/#Place/ConservationLaboratory(D23 Room)
ARC 3D Process Event • Process Event: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c50c (D3 Formal Derivation) (cont’d) • WITH WHAT (Software): • L2 used as source: http://www.esat.kuleuven.be/psi/visics/ARC3D (D14 Software) • L4 has preferred label: ARC3D (Literal E62 String) • P2 has type: http://www.esat.kuleuven.be/psi/visics/ARC3D/Version_1.0.0 (E55 Type) • P2 has type: http://www.3d-coform.eu/SoftwareType/processing_software (E55 Type) • L33 has maker: http://www.esat.kuleuven.be/psi/visics (E39 Actor) • L4 has preferred label: KULeuven PSIVISICS (Literal E62 String)
ARC 3D Process Event Process Event: uuid:354c91e0-b3fa-11de-98c6-0002a5d5c50c (D3 Formal Derivation) (cont’d) WHAT (Input): • L21 used as derivation source:uuid:2f7d22d2-1d89-11e0-ac64-0800200c9a66 (D9 Data Object) • L4 has preferred label: A.15-1955-dome-out.zip(Literal E62 String) WHAT (Derivative output): L22 created derivative: uuid:2f7d22dc-1d89-11e0-ac64-0800200c9a66 (D9 Data Object) L4 has preferred label: Arc3D-A.15-1955_dmy.v3d(Literal E62 String) P2 has type: http://www.3d-coform.eu/#ObjectType/mesh(E55 Type) P2 has type: http://www.3d-coform.eu/#mimetype/v3d(E55 Type) (calibration files, depth map files, CUN file and respective images)
Reasoning: A Coherent Semantic Net • [Figure: semantic network. Node groups: object metadata (features, history); acquisition metadata (who, when, where, what, using what) with subevents and raw data objects; a 2nd acquisition; devices, device models; processing metadata (who, when, how, what, using what params); software, algorithms; params; meshes; models. All are connected through who, when and where.]
Good Reasons for Reasoning (3D-COFORM) • The integrated semantic network of provenance metadata supports data consistency, interpretation, reuse and preservation • Management: • garbage collection of all reproducible intermediate results (classify software!) • export packages: collect all acquisition data and parameters used for one model. • Preservation: • monitor obsoletion of all processing tools and format viewers necessary to interpret or reprocess certain data. • Property propagation to subevents and derivatives (economy & consistency): • For instance, a question such as “Which object does my mesh represent?” results in a long query path: “Jesus Christ” forms part of “Ascension”, is carried by “Ivory Panel”, was digitized by “MiniDomeEvent2011934”, has created “A.15-1955model v1.zip”, used as derivation source: …
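Two of the reasoning tasks above can be sketched over a small set of triples: tracing a mesh back to the physical object it represents, and listing intermediate results that are candidates for garbage collection. This is an illustration under stated assumptions, not the 3D-COFORM implementation. “Ivory Panel” and “MiniDomeEvent2011934” come from the slide; the file names, the derivation event names and the function names are hypothetical. The sketch also assumes every derivation is reproducible, which in practice depends on classifying the software.

```python
# Provenance as (subject, property, object) triples; property labels follow CRMdig.
triples = [
    ("MiniDomeEvent2011934", "L1 digitized", "Ivory Panel"),
    ("MiniDomeEvent2011934", "L20 has created", "raw_scan.zip"),       # hypothetical file
    ("Derivation1", "L21 used as derivation source", "raw_scan.zip"),  # hypothetical event
    ("Derivation1", "L22 created derivative", "depth_maps.v3d"),
    ("Derivation2", "L21 used as derivation source", "depth_maps.v3d"),
    ("Derivation2", "L22 created derivative", "final_mesh.ply"),
]

CREATES = ("L20 has created", "L22 created derivative")


def producer(data_object):
    """Return the event that created the given data object, if any."""
    for s, p, o in triples:
        if o == data_object and p in CREATES:
            return s
    return None


def represented_things(data_object, seen=None):
    """Follow the derivation chain back to the digitized physical thing(s)."""
    seen = set() if seen is None else seen
    if data_object in seen:  # guard against cycles in faulty metadata
        return set()
    seen.add(data_object)
    event = producer(data_object)
    if event is None:
        return set()
    digitized = {o for s, p, o in triples if s == event and p == "L1 digitized"}
    if digitized:
        return digitized
    found = set()
    for s, p, o in triples:
        if s == event and p == "L21 used as derivation source":
            found |= represented_things(o, seen)
    return found


def reproducible_intermediates():
    """Derivatives that were themselves used as a source of a later derivation."""
    derived = {o for s, p, o in triples if p == "L22 created derivative"}
    reused = {o for s, p, o in triples if p == "L21 used as derivation source"}
    return derived & reused


print(represented_things("final_mesh.ply"))  # {'Ivory Panel'}
print(reproducible_intermediates())          # {'depth_maps.v3d'}
```

The raw scan is not reported as an intermediate because it was created by a digitization, not by a derivation, and so cannot be regenerated from other data.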
3D-COFORM: Concatenated Metadata • [Figure: graph of concatenated metadata records for the ivory panel A.15-1955. Legend: Digitization_Process, Formal_Derivation, Sub-events, Data_Object, Man_Made_Object. Edge labels: digitized, has_created, created_derivative, used_as_derivation_source, forms_part_of. Nodes are RDF event records (e.g. 1IvoryPanel_ObjAcqEvent.rdf, 2IvoryPanel_DocEvent.rdf, 3IvoryPanel_NE_ObjAcqEvent.rdf, 4Ivory_Arc3DProcEvent.rdf, 5Ivory_MeshLabProcEvent.rdf) and data files (e.g. 2009CA5306_0.tif, A.15-1955-dome-out.zip, Arc3D-A.15-1955_dmy.v3d, A.15-1955 NXTENG whole model v1.zip, 2009CA5307v Coloured.ply).]
Conclusions • CRMdig provides a good high-level model for empirical provenance of digital data, open for further specialization and integrated with arbitrary context representations. • CRM/CRMdig outperforms competitors in expressive power and integration potential. • Future work: Theory of property propagation to subevents and derivatives • Needed: Theories of feature conservation by kinds of derivation processes. • Links: http://www.ics.forth.gr/isl/rdfs/3D-COFORM_CRMdig_v2.5.rdfs