Lifecycle Metadata for Digital Objects October 18, 2004 Transfer / Authenticity Metadata
Review of metadata seen • Creation metadata • Appraisal, records management, scheduling • Transfer / authenticity not really covered except in terms of the ingest process
Transferring paper records I • Metaphor for electronic process • Metadata generated throughout • Records Center Storage Approval Form • Agency approval signature • Description of materials • Initial steps are significant for: • Setting up for secure transfer • Defining required metadata to make sense of records in storage • Approval Number received for transmission • This step embeds schedule metadata
Transferring paper records II • This stage defines formatting for: • Wrapper • Materials inside • Pack and label correctly (agreed standard) • Use proper boxes • Label with identifiers (RM descriptors) • Pack in original order and approved arrangement • Number boxes in batch • Stack correctly • Transmittal Form for batch • “Digest” of contents (this step a “handshake”) • Generates metadata for the transfer itself • Access Codes received for boxes
The central problem: Security guaranteeing Authenticity • Guarding the object (authenticity, integrity) • Proving the identities of the people responsible for transferring the object (authentication, non-repudiation) • Transferring the object in a secure way
Completeness and the moment of “recordness” • Assertion that the object is complete (cf. UBC) • Assertion that it is an archivable object • Assertion that the asserter has the authority to create the record or archive it • All these assertions may be system-supplied in the digital environment: • user logins • user role ID • identity of the workstation on the network • Creator’s action in performing a save
What is transfer about? • First: it is a COPY • What is a digital copy? What qualifies? • Data compression issues • Data segmentation issues • Creating application vs file-management application • How can a digital copy be guaranteed accurate? Compare with original • Digital object as string of bits • Message digest of object as math on the bits • Ship the message digest with the object • Recalculate and compare at the other end
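The digest-and-compare steps on this slide can be sketched as follows. This is a minimal illustration, not a prescribed implementation: SHA-256 is an assumed choice of digest algorithm, and the function names `make_digest` and `verify_copy` are hypothetical.

```python
import hashlib


def make_digest(data: bytes) -> str:
    """Return a hex message digest computed over the object's bits."""
    # SHA-256 is an assumption for this sketch; any agreed digest works the same way.
    return hashlib.sha256(data).hexdigest()


def verify_copy(received: bytes, shipped_digest: str) -> bool:
    """Recalculate the digest at the receiving end and compare with the shipped one."""
    return make_digest(received) == shipped_digest


# Sender side: compute the digest and ship it with the object.
original = b"Minutes of the records board, 2004-10-18"
shipped_digest = make_digest(original)

# Receiver side: an exact copy verifies, an altered copy does not.
exact_copy = bytes(original)
altered_copy = original.replace(b"2004", b"2005")
print(verify_copy(exact_copy, shipped_digest))    # True
print(verify_copy(altered_copy, shipped_digest))  # False
```

The comparison only shows that the copy matches whatever the digest was computed on. If the digest travels unprotected alongside the object, both could be replaced together, which is why the later slides add signatures.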
Moving from user to repository • Using the public network securely • Sending from user to repository • Virtual Private Network (VPN) • Secure Sockets Layer (SSL) • “Secure drop-box” technology • Separate “hardened” server (between “DMZ”s) • Only A can deposit, only B can withdraw • Repository harvests objects from user’s drop-box
Proving the identity of the sender (Authentication I: Identity) • Asymmetric encryption • Private/public keys: reverse purposes • Private = used by one juridical person • Public = used by many persons • Digital signature • Calculate message digest • Use one of asymmetric key pair to transform • If recipient’s public key, only recipient can decode (using own private key) • If sender’s private key, only sender can have sent (proved by sender’s public key) • Use second of asymmetric key pair to decrypt • Check message digest against message
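The sign-and-verify sequence on this slide can be illustrated with a toy example. Everything here is an assumption made for illustration: the key pair uses tiny textbook RSA primes and offers no security, the digest is reduced modulo the key size only so that the toy key can transform it, and the names `sign` and `verify` are hypothetical. A real system would rely on a vetted cryptographic library.

```python
import hashlib

# Toy key pair from tiny textbook primes (hypothetical values, insecure by design).
P, Q = 61, 53
N = P * Q                                 # modulus shared by both keys (3233)
PHI = (P - 1) * (Q - 1)                   # 3120
PUBLIC_EXP = 17                           # public key: may be used by many persons
PRIVATE_EXP = pow(PUBLIC_EXP, -1, PHI)    # private key: held by one person (Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Message digest, reduced modulo N so the toy key can transform it."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender transforms the digest with the private key."""
    return pow(digest_as_int(message), PRIVATE_EXP, N)


def verify(message: bytes, signature: int) -> bool:
    """Recipient reverses the transform with the sender's public key,
    then checks the result against a freshly calculated digest."""
    return pow(signature, PUBLIC_EXP, N) == digest_as_int(message)


message = b"Transmittal form for batch 7"
signature = sign(message)
print(verify(message, signature))              # True

forged_signature = (signature + 1) % N         # any other value fails the check
print(verify(message, forged_signature))       # False
```

Only the holder of the private exponent can produce a value that the public exponent maps back to the digest, which is the sense in which "only sender can have sent".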
Proving the identity of the sender (Authentication II: Non-repudiation) • Certification (PKI, “XKI”) • Connecting keys with juridical persons: third-party certification authorities (CAs) • External or internal (PKI can be managed for internal business, e.g. a state) • Endurance over time: What does CA say? • System permissions and activity • Data collected from system/network operations logs • Necessity for collecting as archival!
Authenticity of the object (Authentication III: Integrity) • Object as open or secret: two issues • Must we disguise/encrypt the object? • Can we move it around in clear? • (Cryptographic) Message Digest (MD5) • Creates a single 32-hex-digit (128-bit) number: “one-way hash” • Number will change with the slightest change in the object on which it was calculated • Insecure for encryption • Encryption (Confidentiality) • Asymmetric (now dominant) • Symmetric (issues of exchanging keys)
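The claim that the digest changes with the slightest change in the object can be checked directly. The sketch below uses MD5 only because the slide names it; the sample byte strings are made up for illustration. Collisions have since been demonstrated for MD5, so a current system would likely choose a newer digest.

```python
import hashlib

original = b"Transfer batch 7, box 3 of 12"
altered = b"Transfer batch 7, box 3 of 13"    # a single character changed

digest_original = hashlib.md5(original).hexdigest()
digest_altered = hashlib.md5(altered).hexdigest()

print(len(digest_original))                   # 32 (hexadecimal digits, i.e. 128 bits)
print(digest_original == digest_altered)      # False

# Count how many of the 32 digit positions differ between the two digests.
differing = sum(a != b for a, b in zip(digest_original, digest_altered))
print(differing)
```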
Proving the identity of the receiver • How is this done in the paper/physical case? • Locations • Signatures • Other signs and proofs • How done in the digital case? • Digital signature • System permissions • Recorded as part of repository operations records
Documenting the actual transfer • Time-stamps on the copy • System logs of the underlying transmitting and receiving systems • Desktop Windows systems have system logs but they are still fairly primitive • Server logs can be extremely elaborate • Repository/digital library logs can be designed to any requirement
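As one possible shape for a repository log designed to requirement, the sketch below writes a time-stamped entry for a transfer event. The field names, identifiers, and the function `transfer_log_entry` are all hypothetical; the slides do not prescribe a log format.

```python
import json
from datetime import datetime, timezone


def transfer_log_entry(object_id: str, sender: str, receiver: str, event: str) -> str:
    """Build one time-stamped log line (JSON) describing a transfer event."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC time of the event
        "object_id": object_id,
        "sender": sender,
        "receiver": receiver,
        "event": event,                                       # e.g. "sent" or "received"
    }
    return json.dumps(entry, sort_keys=True)


line = transfer_log_entry(
    "batch-0007/box-03", "agency-workstation-12", "repository-ingest", "received"
)
print(line)
```

Keeping each entry as a self-describing line makes the log itself easier to retain as an archival record of the transfer.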
Verifying the transfer • Quality control: compare with paper process • Verifying the message digest • Checking the object against the wrapper • Use metadata to make sure you have all of what was sent and in the proper format • This is the most fundamental process carried out during ingest
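Checking the objects against the wrapper can be sketched as a comparison between a manifest of digests and what actually arrived. This is an assumed design for illustration: the manifest layout, the use of SHA-256, and the names `digest` and `verify_transfer` are hypothetical, and the format check mentioned on the slide is not covered.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex digest of one object (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(manifest: dict, received: dict) -> list:
    """Compare received objects with the wrapper's manifest of name -> digest.

    Returns a list of problems; an empty list means everything sent arrived intact.
    """
    problems = []
    for name, expected in manifest.items():
        if name not in received:
            problems.append(f"missing: {name}")
        elif digest(received[name]) != expected:
            problems.append(f"digest mismatch: {name}")
    for name in received:
        if name not in manifest:
            problems.append(f"unexpected: {name}")
    return problems


# Sender builds the manifest that travels in the wrapper.
sent = {"report.txt": b"annual report", "memo.txt": b"internal memo"}
manifest = {name: digest(data) for name, data in sent.items()}

# Receiver gets one altered object and one object that was never listed.
received = {
    "report.txt": b"annual report",
    "memo.txt": b"internal memo (edited)",
    "extra.txt": b"not in manifest",
}
print(verify_transfer(manifest, received))
# ['digest mismatch: memo.txt', 'unexpected: extra.txt']
```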
XML and digital signatures • XML wrapper for a set of objects permits individual or multiple objects to be signed: “subtree signing” • Objects can potentially be signed by different people in workflow • Thus a born-digital XML-wrapped object may already contain several digital signatures from different sources • May require verification and resigning as a single object by record-asserting entity before transfer
XML Signature
<Signature>
  <SignedInfo>
    <CanonicalizationMethod Algorithm="URI"/>
    <SignatureMethod Algorithm="URI"/>
    <Reference URI="URI">
      <Transforms><Transform Algorithm="URI"/></Transforms>
      <DigestMethod Algorithm="URI"/>
      <DigestValue>base64-encoded digest value here</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>base64-encoded signature value here</SignatureValue>
  <KeyInfo>info about key here</KeyInfo>
</Signature>
What is canonicalization? • Two XML documents may differ in their entity structure, attribute ordering, and character encoding, because the standard doesn’t care • But a valid XML document has a precise logical structure related to its DTD or schema, no matter how it looks or what order things are in • Canonicalization means processing the XML file to a single standard form (as defined by W3C): see http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Intro • What does this mean for “authenticity”?
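One way to see what canonicalization means for authenticity: two serializations of the same logical document give different digests as raw text but the same digest once canonicalized. The sketch below assumes Python 3.8 or later, whose standard library `canonicalize` function implements C14N 2.0 rather than the 2001 version linked on the slide; the sample documents are made up for illustration.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Same logical content; different attribute order, quoting, spacing, and empty-tag style.
doc_a = '<record id="r1" type="memo"/>'
doc_b = "<record   type='memo'  id='r1'></record>"


def sha256_hex(text: str) -> str:
    """Digest of the text as UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Digests over the raw text differ, although nothing meaningful has changed.
raw_same = sha256_hex(doc_a) == sha256_hex(doc_b)

# Digests over the canonical forms agree.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)
canon_same = sha256_hex(canon_a) == sha256_hex(canon_b)

print(raw_same)     # False
print(canon_same)   # True
print(canon_a)
```

A signature calculated over the canonical form therefore survives harmless re-serialization, while still detecting changes to the document's logical content.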