An Ontological Approach to Digital Preservation Metadata
Martin Doerr
Center for Cultural Informatics, Institute of Computer Science, Foundation for Research and Technology - Hellas
Prague, Czechia, May 23, 2009
Digital Preservation Metadata
• Cultural and scientific data cannot be understood without knowledge of the meaning of the data and of the ways and circumstances of their creation.
• We use metadata to assess:
  • encoding (formats, tools, instruments used),
  • meaning (context of creation, experimental setups, background knowledge, etc.),
  • relevance (described things, their status, their conditions),
  • quality (credibility, authenticity, calibration, tolerances, possible errors),
  • possibilities of improvement and reprocessing.
• Metadata are needed throughout the life cycle: from generation to use, permanent storage, and reuse.
• No standards yet!
Digital Preservation Metadata
• Required: reliable, interoperable registration of the creation and modification processes and their contextual conditions over time, i.e. “provenance metadata”.
• Solution: a common core ontology that explains the meaning of the various data structures describing highly specialized processes.
• Idea:
  • Metadata and scientific data are historical records!
  • Tool-mediated creation and machine-supported processing are initiated on behalf of, and controlled by, human activity.
  • Things, data, people, times and places are causally related by events.
  • Other relations are either deduced from events or established by observation events.
• The CIDOC CRM (ISO 21127) can be used as the core model!
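The event-centric idea can be illustrated with a minimal sketch. It assumes a hypothetical `Event` record (not part of the CRM or of any library) that lists the actors and things present at an event together with its time and place. Relations such as "where and when was this object" or "who handled it" are then deduced from the events rather than stored as separate facts. The example data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event record: when, where, and who/what was present."""
    event_id: str
    kind: str                  # e.g. "Creation", "Modification"
    time: str                  # ISO 8601 date string; sorts chronologically
    place: str
    actors: frozenset = frozenset()
    things: frozenset = frozenset()


def whereabouts(events, thing):
    """Deduce (time, place, kind) for every event the thing was present at."""
    return sorted((e.time, e.place, e.kind) for e in events if thing in e.things)


def handlers(events, thing):
    """Deduce the set of actors who took part in any event involving the thing."""
    return {a for e in events if thing in e.things for a in e.actors}


# Hypothetical example data (not from the slides).
events = [
    Event("ev1", "Creation", "2009-05-01", "Lab A",
          frozenset({"Alice"}), frozenset({"photo-1"})),
    Event("ev2", "Modification", "2009-05-03", "Archive B",
          frozenset({"Bob"}), frozenset({"photo-1", "laptop-7"})),
]
print(whereabouts(events, "photo-1"))
print(sorted(handlers(events, "photo-1")))  # ['Alice', 'Bob']
```

Nothing here records "Alice handled photo-1" directly. The relation falls out of the shared event, which is the point of modelling events first.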
The CRM Digital: Extended Applications (Digital Provenance)
• Three applications so far:
  • For www.c-h-i.org: a completely CRM-based model of provenance (scientific workflow) metadata for generating RTI (Reflectance Transformation Imaging) images, which combine up to 2000 individual shots.
  • For the European Integrated Project CASPAR on Digital Preservation: the OAIS PDI (Preservation Description Information) type “Provenance Information” and authenticity could be made explicit as queries to the CRM.
  • For the European IP 3D-COFORM: digital provenance of 3D models.
• We have added 10 classes and some properties as specializations of the CRM, covering:
  • the relation of human action and machine action,
  • digitization as both a measurement and the creation of an information object,
  • formal derivation: feature preservation between input and output.
The CRM Digital: Digital Events
[Class diagram: the digital event classes C7 Digital Machine Event, C11 Digital Measurement Event, C10 Software Execution, C12 Data Transfer Event, C2 Digitization Process and C3 Formal Derivation, placed under the CRM classes E5 Event, E7 Activity, E16 Measurement, E65 Creation and E11 Modification.]
The CRM Digital: Digital Things
[Class diagram: the classes C1 Digital Object, C8 Digital Device, C13 Digital Information Carrier and C9 Data Object, placed under the CRM classes E70 Thing, E73 Information Object, E22 Man-Made Object, E54 Dimension and E84 Information Carrier.]
The CRM Digital: Human Creation by Machine Events
[Diagram: C7 Digital Machine Event with S10 had input (was input of) and S11 had output (was output of), both to C1 Digital Object, and S12 happened on device (was device for) to C8 Digital Device. These are shown against the CRM properties P16 used specific object (was used for), P94 has created (was created by), P9 consists of (forms part of) and P8 took place on or within (witnessed), with two links marked "deduction". Classes shown: E5 Event, E7 Activity, E65 Creation, E70 Thing, E28 Conceptual Object, E22 Man-Made Object, E73 Information Object, E19 Physical Object, E4 Period.]
The CRM Digital: Software Execution
[Diagram: C7 Digital Machine Event with S10 had input (was input of) and S11 had output (was output of), both to C1 Digital Object, and S12 happened on device (was device for) to C8 Digital Device; shown together with C10 Software Execution and C3 Formal Derivation and the properties S2 used as source (was source for) and S13 used parameters (parameters for), which point to C1 Digital Object.]
The CRM Digital: Digital Measurement (Activity View)
[Diagram: C11 Digital Measurement Event in relation to C7 Digital Machine Event, E16 Measurement, E65 Creation, E11 Modification and E7 Activity. Properties shown: P125 used object of type (was type of object used in) and S15 measured thing of type (was type of thing measured by), both to E55 Type; P40 observed dimension (was observed in) to E54 Dimension; S20 has created (was created by) to C9 Data Object.]
The CRM Digital: Digitization = Feature Transfer from Physical to Digital
[Diagram: C2 Digitization Process, shown with C11 Digital Measurement Event, E11 Modification and E65 Creation. Properties shown: S1 digitized (was digitized by), S15 measured thing of type (was type of thing measured by), P31 has modified (was modified by), S18 has modified (was modified by), P94 has created (was created by), S20 has created (was created by), S19 stores (is stored on), P128 carries (is carried by). Classes involved: E18 Physical Thing, E24 Physical Man-Made Thing, E70 Thing, C13 Digital Information Carrier, E28 Conceptual Object, E73 Information Object, C1 Digital Object, C9 Data Object.]
The CRM Digital: Unreliable Transfer
[Diagram: C12 Data Transfer Event, shown with C7 Digital Machine Event and E11 Modification. Properties shown: S10 had input (was input of), S11 had output (was output of) and S14 transferred (was transferred by), each to C1 Digital Object; P31 has modified (was modified by) to E84 Information Carrier; S12 happened on device (was device for), S15 has sender (was sender for) and S16 has receiver (was receiver for), each to C8 Digital Device.]
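Because a data transfer is unreliable, its output cannot simply be assumed identical to its input. A minimal sketch of how a transfer record could be checked, assuming a hypothetical `DataTransferEvent` record and a SHA-256 digest as the fixity check (neither is prescribed by CRMdig):

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTransferEvent:
    """Hypothetical record of one C12 Data Transfer Event.

    sender/receiver stand for S15 has sender / S16 has receiver (C8 Digital
    Device); sent/received hold the content reached via S10 had input and
    S11 had output.
    """
    sender: str
    receiver: str
    sent: bytes
    received: bytes


def fixity(data: bytes) -> str:
    # SHA-256 is an assumed fixity check; CRMdig does not prescribe one.
    return hashlib.sha256(data).hexdigest()


def content_preserved(t: DataTransferEvent) -> bool:
    """True if the output of the transfer has the same digest as its input."""
    return fixity(t.sent) == fixity(t.received)


ok = DataTransferEvent("scanner-1", "server-2", b"raw scan", b"raw scan")
bad = DataTransferEvent("scanner-1", "server-2", b"raw scan", b"raw sc4n")
print(content_preserved(ok), content_preserved(bad))  # True False
```

Keeping input and output as distinct digital objects is what makes such a check expressible: if the model identified them, a corrupted copy could not be recorded at all.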
Preservation Metadata: History of Physical Objects
[Diagram: a chain of three E10 Transfer of Custody events, linked by P30 transferred custody of (custody transferred through) to the object; all keepers are E39 Actors, and an E73 Information Object is also shown.
• “The custody passing to Theo's widow”: P28 custody surrendered by Theo van Gogh; P29 custody received by Johanna van Gogh-Bonger.
• “The custody passing to Johanna's son”: P28 custody surrendered by Johanna van Gogh-Bonger; P29 custody received by Vincent Willem van Gogh.
• “The custody passing to the van Gogh Foundation”: P28 custody surrendered by Vincent Willem van Gogh; P29 custody received by the Vincent van Gogh Foundation.
The object P50 has current keeper (is current keeper of) the Vincent van Gogh Foundation.]
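The custody chain supports two simple deductions: the chain is continuous if every receiver (P29) is the next surrenderer (P28), and the current keeper (P50) is the receiver of the latest transfer. A minimal sketch, assuming a hypothetical `TransferOfCustody` tuple and transfers listed in chronological order; the three transfers are read from the diagram labels.

```python
from typing import List, NamedTuple, Tuple


class TransferOfCustody(NamedTuple):
    """One E10 Transfer of Custody, reduced to its two keepers."""
    label: str
    surrendered_by: str  # P28 custody surrendered by
    received_by: str     # P29 custody received by


def chain_gaps(transfers: List[TransferOfCustody]) -> List[Tuple[str, str]]:
    """Return consecutive transfers whose keepers do not connect.

    Assumes the transfers are listed in chronological order.
    """
    return [
        (prev.label, nxt.label)
        for prev, nxt in zip(transfers, transfers[1:])
        if prev.received_by != nxt.surrendered_by
    ]


def current_keeper(transfers: List[TransferOfCustody]) -> str:
    """Deduce P50 has current keeper: the receiver of the latest transfer.

    Requires at least one transfer.
    """
    return transfers[-1].received_by


chain = [
    TransferOfCustody("The custody passing to Theo's widow",
                      "Theo van Gogh", "Johanna van Gogh-Bonger"),
    TransferOfCustody("The custody passing to Johanna's son",
                      "Johanna van Gogh-Bonger", "Vincent Willem van Gogh"),
    TransferOfCustody("The custody passing to the van Gogh Foundation",
                      "Vincent Willem van Gogh", "Vincent van Gogh Foundation"),
]
print(chain_gaps(chain))      # []
print(current_keeper(chain))  # Vincent van Gogh Foundation
```

A non-empty result from `chain_gaps` flags a period for which custody is undocumented, which is the kind of incompleteness that weakens an authenticity argument.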
Preservation Metadata: Creation of Born-Digital Objects
[Diagram: the E65 Creation “The conception of ADT” P94 has created (was created by) the E28 Conceptual Object “ADT”. The creation P14 carried out by (performed) two E39 Actors, each with P14.1 in the role of an E55 Type: “the creator of ADT music” in the role “Composer” and “the creator of ADT libretto” in the role “Writer”. Through P131 is identified by (identifies), the actors bear the E82 Actor Appellations “Georges Aperghis” (music) and “Peter Szendy” (libretto).]
Preservation Metadata: Transformation of Digital Objects
[Diagram: two C3 Formal Derivation events in sequence.
• “JPG2PNG conversion”: S2 used as source (was source for) the C1 Digital Object Crete.jpg (P2 has type E55 Type “JPG”); P94 has created (was created by) the C1 Digital Object Crete.png (P2 has type E55 Type “PNG”); P32 used general technique (was technique of) “JPG2PNG”; P33 used specific technique (was used by) the E29 Design or Procedure “JPG2PNG Algorithm X”; S13 used parameters (parameters for) a C1 Digital Object with color depth = 24, resolution = 600, compression level = 5.
• “Reduce png resolution”: S2 used as source Crete.png; P94 has created the C1 Digital Object CreteSmall.png.
The diagram also shows “Adobe Photoshop CS2” (P2 has type E55 Type “Software”), attached via P16 used specific object (was used for).]
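Chained formal derivations let the provenance of a derived file be traced back to its original. A minimal sketch over the example, assuming a hypothetical encoding of the diagram as (subject, property, object) triples that keeps only S2 used as source and P94 has created; it is a plain graph walk, not a CRM reasoner or query language.

```python
from collections import defaultdict

# The slide's example as (subject, property, object) triples; the node
# labels follow the slide, the triple encoding is an assumption.
TRIPLES = [
    ("JPG2PNG conversion", "S2", "Crete.jpg"),          # S2 used as source
    ("JPG2PNG conversion", "P94", "Crete.png"),         # P94 has created
    ("Reduce png resolution", "S2", "Crete.png"),
    ("Reduce png resolution", "P94", "CreteSmall.png"),
]


def lineage(obj, triples):
    """Walk back from obj through the derivations that created it.

    Returns (derivation, source) pairs in depth-first discovery order.
    """
    sources = defaultdict(list)     # derivation -> its S2 sources
    created_by = defaultdict(list)  # object -> derivations that created it (P94)
    for s, p, o in triples:
        if p == "S2":
            sources[s].append(o)
        elif p == "P94":
            created_by[o].append(s)

    steps, seen, stack = [], set(), [obj]
    while stack:
        current = stack.pop()
        for derivation in created_by.get(current, []):
            for src in sources.get(derivation, []):
                if src not in seen:          # guard against cycles
                    seen.add(src)
                    steps.append((derivation, src))
                    stack.append(src)
    return steps


print(lineage("CreteSmall.png", TRIPLES))
```

For CreteSmall.png this yields the resolution reduction from Crete.png, then the JPG-to-PNG conversion from Crete.jpg.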
Preservation Metadata: Authenticity
• Authenticity can be defined on the object history. Given:
  • Man-Made Object O1 “was present at” Event E1 (typically creation or publication),
  • Man-Made Object O2 “was present at” Event E2 (typically ingestion or validation),
  • Information Object X1 “is carried by” O1 (historical carrier),
  • Information Object X2 “is carried by” O2 (current carrier),
  then O2 is “authentic” if O2 = O1, or X1 = X2.
• This requires reasoning on the completeness and security of the curation and carrier transfer chain and/or comparing multiple assumed current carriers.
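The authenticity condition can be written as a small predicate. This is a sketch under two loudly stated assumptions: object identity (O2 = O1) is approximated by equal identifiers, which in practice is what the custody-chain reasoning has to establish, and information identity (X1 = X2) is approximated by equal SHA-256 digests of the carried content. `Carrier` and `is_authentic` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Carrier:
    """A man-made object (O1 or O2) with the information it carries (X1 or X2)."""
    object_id: str
    content: bytes


def same_information(x1: bytes, x2: bytes) -> bool:
    # Assumption: identity of information objects is approximated by
    # equal SHA-256 digests of the carried content.
    return hashlib.sha256(x1).digest() == hashlib.sha256(x2).digest()


def is_authentic(historical: Carrier, current: Carrier) -> bool:
    """O2 is authentic if O2 = O1 or X1 = X2."""
    # Assumption: O2 = O1 is approximated by equal object identifiers.
    if current.object_id == historical.object_id:
        return True
    return same_information(historical.content, current.content)


o1 = Carrier("disk-1", b"report v1")              # present at creation (E1)
print(is_authentic(o1, Carrier("disk-9", b"report v1")))  # True: X1 = X2
print(is_authentic(o1, Carrier("disk-9", b"report v2")))  # False
```

The two branches mirror the two routes in the definition: the first trusts the carrier's continuity, the second compares the information itself, for instance across several assumed current carriers.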
The Open Provenance Model
• An annotated causality graph, defined as a record of a past (or current) execution.
• Three node types:
  • Artifact: an immutable piece of state, which may have a physical embodiment in a physical object or a digital representation in a computer system.
  • Process: an action or series of actions performed on or caused by artifacts, and resulting in new artifacts.
  • Agent: a contextual entity acting as a catalyst of a process, enabling, facilitating, controlling or affecting its execution.
• Nodes can be annotated with properties.
• Processes operate in one or more roles (R).
The Open Provenance Model
• Nodes are connected by edges, each pointing from effect to cause:
  • used(R): Process → Artifact
  • wasGeneratedBy(R): Artifact → Process
  • wasControlledBy(R): Process → Agent
  • wasTriggeredBy: Process → Process
  • wasDerivedFrom: Artifact → Artifact
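The edge table doubles as a set of type constraints on an OPM graph. A minimal sketch of checking them, assuming hypothetical `Kind`, `Edge` and `invalid_edges` names; this is an illustration, not the OPM reference implementation or any of its serializations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Kind(Enum):
    ARTIFACT = "A"
    PROCESS = "P"
    AGENT = "Ag"


# (effect kind, cause kind) for each OPM edge type; edges point effect -> cause.
EDGE_TYPES = {
    "used": (Kind.PROCESS, Kind.ARTIFACT),
    "wasGeneratedBy": (Kind.ARTIFACT, Kind.PROCESS),
    "wasControlledBy": (Kind.PROCESS, Kind.AGENT),
    "wasTriggeredBy": (Kind.PROCESS, Kind.PROCESS),
    "wasDerivedFrom": (Kind.ARTIFACT, Kind.ARTIFACT),
}
ROLE_EDGES = {"used", "wasGeneratedBy", "wasControlledBy"}  # the (R) edges


@dataclass(frozen=True)
class Edge:
    kind: str
    effect: str                  # node id the edge starts from
    cause: str                   # node id the edge points to
    role: Optional[str] = None


def invalid_edges(edges: List[Edge], nodes: Dict[str, Kind]) -> List[Edge]:
    """Return edges whose endpoint kinds or role usage do not fit the table."""
    bad = []
    for e in edges:
        expected = EDGE_TYPES.get(e.kind)  # None for an unknown edge type
        if expected != (nodes[e.effect], nodes[e.cause]):
            bad.append(e)
        elif e.role is not None and e.kind not in ROLE_EDGES:
            bad.append(e)
    return bad


nodes = {"scan": Kind.PROCESS, "raw": Kind.ARTIFACT, "operator": Kind.AGENT}
edges = [
    Edge("wasGeneratedBy", "raw", "scan", role="out"),
    Edge("wasControlledBy", "scan", "operator", role="operator"),
    Edge("used", "raw", "scan"),  # wrong direction: used goes Process -> Artifact
]
print(invalid_edges(edges, nodes))
```

Only the reversed `used` edge is reported; the node ids and role names in the example are invented.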
The Open Provenance Model
• Does not distinguish between material and immaterial objects.
• Does not explicitly model the concept of an event, a concept of prominent importance.
• Without the notion of an event, and of physical objects that are carriers (devices) of information, it is not possible, for example, to adequately describe the conditions under which a photograph was taken.
• The way OPM treats processes resembles events; however, the corresponding ontological structure of OPM is not rich enough.
• Provenance information recorded according to CRMdig can be mapped to an OPM-based view, but not the other way around.
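The one-way nature of the mapping can be illustrated with a deliberately partial sketch. It assumes a hypothetical mapping table from CRMdig/CRM properties to OPM edges (S10 had input to used, S11 had output to wasGeneratedBy with its direction reversed, P14 carried out by to wasControlledBy); these choices are illustrative, not an official alignment. Statements with no counterpart in the table, such as S12 happened on device, are dropped, so an OPM graph produced this way cannot be turned back into the full CRMdig record.

```python
# Hypothetical, partial mapping: CRM/CRMdig property -> (OPM edge, swap direction?)
MAPPING = {
    "S10": ("used", False),            # S10 had input: event -> object
    "S11": ("wasGeneratedBy", True),   # S11 had output: OPM points object -> process
    "P14": ("wasControlledBy", False), # P14 carried out by: activity -> actor
}


def to_opm(triples):
    """Map (subject, property, object) triples to OPM edges.

    Returns (opm_edges, dropped); dropped triples have no counterpart here.
    """
    opm, dropped = [], []
    for s, p, o in triples:
        if p in MAPPING:
            edge, swap = MAPPING[p]
            opm.append((o, edge, s) if swap else (s, edge, o))
        else:
            dropped.append((s, p, o))
    return opm, dropped


crmdig = [
    ("run-1", "S10", "raw.tif"),
    ("run-1", "S11", "master.png"),
    ("run-1", "P14", "operator"),
    ("run-1", "S12", "workstation-3"),  # S12 happened on device
]
opm, dropped = to_opm(crmdig)
print(opm)
print(dropped)  # the device link is lost
```

The example identifiers are invented. The lost device link is exactly the kind of carrier information whose absence the slide says prevents an adequate description of, for example, how a photograph was taken.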
The CIDOC CRM: Conclusions
• The CIDOC CRM and a suitable extension allow all provenance-related preservation metadata to be represented.
• Specific tools need additional models of their specific parameter sets, which do not affect the integration of, and reasoning on, the provenance chain.
• There is no competing generic model that consistently describes material and digital objects and their related history.
• The relationship between human and machine action still needs refinement: using OWL, we can avoid the ambiguity of multiple IsA.