DRS 2 Metadata Migration

nalani
Presentation Transcript


  1. DRS 2 Metadata Migration June 25, 2013

  2. Agenda • Introduction • Preliminary results - content analysis • Metadata options • Next steps • Questions

  3. Introduction

  4. Reason for metadata migration • Different data model • File -> Object (a coherent set of content that is considered a single intellectual unit for purposes of description, use and/or management: for example a particular book, web harvest, serial or photograph.) • Different metadata schemas • Many locally-defined -> community-standard • Different packaging of metadata • Use of METS in some cases -> consistent use of METS

  5. Path to metadata migration We are here

  6. Key feedback points Process options Technical options

  7. Timing Next 3 months

  8. What does it involve? • Aggregate DRS1 files into objects • Different object types = content models • Generate an object descriptor per object

  9. Document example PDF file

  10. Document example New object (content model = DOCUMENT) PDF file

  11. Document example New object (content model = DOCUMENT) PDF file Descriptor file

  12. Still image example Archival master image file

  13. Still image example Archival master image file Production master image file

  14. Still image example Archival master image file Production master image file Deliverable image file

  15. Still image example New object (content model = STILL IMAGE) Archival master image file Production master image file Deliverable image file

  16. Still image example New object (content model = STILL IMAGE) Archival master image file Production master image file Deliverable image file Descriptor file

  17. Aggregate DRS1 files into objects • One content file per object • Color profile • Document • Google document container 1 • Google document container 2 • Google document container 3 • Opaque container • Text

  18. Aggregate DRS1 files into objects • Multiple content files per object • Audio • Web harvest • Biomedical image • PDS document • Target image • MOA2 • Still image
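The aggregation described on slides 17 and 18 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DRS migration code: the `DrsFile` record, its `group_key` field, the upper-case spelling of the content model names, and the partial `SINGLE_FILE_MODELS` set are all hypothetical, and the slides do not say how related files are actually matched.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical subset of the "one content file per object" models (slide 17).
SINGLE_FILE_MODELS = {"COLOR PROFILE", "DOCUMENT", "OPAQUE CONTAINER", "TEXT"}


@dataclass(frozen=True)
class DrsFile:
    file_id: str
    content_model: str
    group_key: str  # hypothetical: whatever ties related DRS1 files together


def aggregate(files):
    """Group DRS1 files into candidate DRS2 objects."""
    objects = defaultdict(list)
    for f in files:
        if f.content_model in SINGLE_FILE_MODELS:
            # One content file per object: every file becomes its own object.
            key = (f.content_model, f.file_id)
        else:
            # Multiple content files per object: related files share a key.
            key = (f.content_model, f.group_key)
        objects[key].append(f)
    return dict(objects)
```

With this sketch, two DOCUMENT files always yield two objects, while an archival master and a deliverable that share a `group_key` end up in the same STILL IMAGE object.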

  19. Generate object descriptors • METS format • Embedded schemas (PREMIS, MODS, MIX, etc.) • Metadata sources • DRS1 database • DRS1 METS files where they exist • Examining the content files • Catalog records?
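As a rough illustration of what a generated object descriptor might look like, the sketch below builds a bare METS skeleton with Python's standard library. It is an assumption-laden example: the `build_descriptor` function is hypothetical, recording the content model in the METS `TYPE` attribute is a guess rather than documented DRS2 practice, and the embedded PREMIS, MODS, and MIX sections are omitted entirely.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_descriptor(object_label, content_model, file_ids):
    """Return a minimal METS document listing the object's files."""
    mets = ET.Element(
        f"{{{METS_NS}}}mets",
        {"LABEL": object_label, "TYPE": content_model},  # TYPE use is assumed
    )
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div")
    for fid in file_ids:
        # Each file is declared once in fileSec and referenced from structMap.
        ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": fid})
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": fid})
    return ET.tostring(mets, encoding="unicode")
```

Calling `build_descriptor("Example report", "DOCUMENT", ["F1"])` returns a small XML string with one `file` entry and one matching `fptr`; a real descriptor would also carry administrative and descriptive metadata sections.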

  20. Preliminary results: Content analysis

  21. Preliminary content analysis • Conceptually “built” objects for 13 of 14 content models (~36 million of 44 million files) • All but still image • Order helps! PDS Document, MOA2, Still Image, Biomedical Image

  22. Preliminary content analysis • 1,091,670 objects from 36,190,120 files • ~33 files per object • Relatively few surprises but content analysis is not complete
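The "~33 files per object" figure follows directly from the two counts on the slide; a quick check of the arithmetic:

```python
files = 36_190_120     # files analyzed so far (slide 22)
objects = 1_091_670    # objects conceptually built from them (slide 22)

files_per_object = files / objects
print(f"{files_per_object:.2f} files per object")  # prints "33.15 files per object"
```

This is an average only: content models with one content file per object sit at exactly one file, so the multi-file models must average well above 33.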

  23. Content cleanup • MOA2 files (8,024) • Index maps (2,686) • Entity files (1) • Merged PDS descriptors (22,203)

  24. Content cleanup • Orphaned target images (5) and target description files (4) • Orphaned audio files (71)

  25. Metadata options

  26. DRS1: FILE INFO, e.g., billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass • DRS2 DESCRIPTOR: OBJECT INFO, e.g., billingCode, ownerCode, owner-suppliedName; FILE INFO, e.g., accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass

  27. DRS1: FILE INFO, e.g., billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass • DRS2 DESCRIPTOR: OBJECT INFO, e.g., billingCode, ownerCode, owner-suppliedName; FILE INFO, e.g., accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass

  28. DRS1: FILE INFO, e.g., billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass; METS: Object Label, MODS, PDS info, etc. • DRS2 DESCRIPTOR: OBJECT INFO: Object Label, Object-level MODS, billingCode, ownerCode, owner-suppliedName, caption unit name, view text; FILE INFO: accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass

  29. Objects • Owner supplied name is required • Need to generate during migration • Four cases • A METS file exists • New object will be built from a single content file • New object will be built from multiple content files • No OSN (potential case) • Proposal for most cases: • add prefix or suffix to METS or content file owner supplied name
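One way the prefix-or-suffix proposal could play out across the four cases is sketched below. Everything here is hypothetical: the `object_osn` function, the `_OBJECT` suffix, and especially the rule for choosing among several content files (first name in sorted order) are placeholders, since the slide leaves those decisions open.

```python
def object_osn(mets_osn, content_file_osns, suffix="_OBJECT"):
    """Derive an object-level owner supplied name (OSN) for a migrated object.

    mets_osn: OSN of the existing METS file, or None if there is none.
    content_file_osns: OSNs of the content files that make up the object.
    """
    if mets_osn:
        # Case 1: a METS file exists, so reuse its name.
        base = mets_osn
    elif content_file_osns:
        # Cases 2 and 3: one or several content files. Picking the first
        # name in sorted order is an arbitrary, hypothetical tie-breaker.
        base = sorted(content_file_osns)[0]
    else:
        # Case 4: no OSN available; leave it for separate handling.
        return None
    return base + suffix
```

For example, `object_osn(None, ["page2", "page1"])` yields `"page1_OBJECT"` under these placeholder rules, and an object with no names at all yields `None`.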

  30. Objects • Other required object elements • insertionDate • date of earliest file? • captionBehavior • for existing objects, set based on billing code • prospectively, set by depositor • viewText • available for all objects, not just PDS • default to off
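The defaults proposed on this slide could be filled in along the following lines. This is only a sketch of the options under discussion: `required_object_elements` is a hypothetical helper, the billing-code lookup table is invented for illustration, and using the earliest file date for insertionDate is the slide's open question rather than a settled rule.

```python
from datetime import date


def required_object_elements(file_dates, billing_code, caption_by_billing_code):
    """Fill in the required object elements for a migrated object."""
    return {
        # Open question on the slide: use the date of the earliest file.
        "insertionDate": min(file_dates),
        # Existing objects: caption behavior is looked up by billing code.
        "captionBehavior": caption_by_billing_code.get(billing_code),
        # viewText is available for all objects and defaults to off.
        "viewText": False,
    }
```

Note that `min` raises `ValueError` for an object with no files, which a real migration would need to treat as a cleanup case.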

  31. Objects • Descriptive metadata • Take MODS from existing METS as is or import new • From Aleph • From Finding Aid • If re-imported, update METS label or not? • Import from OLIVIA based on owner supplied name for the file?

  32. Objects from existing METS • Identifiers for Harvard metadata • Identify finding aid identifiers • Convert “Old HOLLIS” numbers • Aleph IDs: include check digit or not? • Convert to URIs or actionable URNs from plain IDs • Could DRS format such URIs for new DRS2 input?

  33. Objects from existing METS • PDS elements • PDF owner text becomes caption unit name • viewOcr function becomes viewText • goto function will be automatically determined by presence of structMap/div attributes • Caption behavior • for existing objects, set by billing code

  34. Files • Run automated processes to identify, validate and characterize file technical characteristics • Extract technical metadata

  35. Files • isFirstGenerationinDrs • Values: yes, no, unspecified • Should we supply “yes” for archival masters and/or top of derivation chain?
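If the answer to the question on this slide were "yes" on both counts, the assignment might look like the sketch below. The function name, the `ARCHIVAL_MASTER` role string, and the `has_source_in_drs` flag are hypothetical; only the three allowed values come from the slide.

```python
def is_first_generation(role, has_source_in_drs):
    """Return "yes", "no", or "unspecified" for a migrated file.

    role: the file's role, e.g. "ARCHIVAL_MASTER" (hypothetical spelling).
    has_source_in_drs: True if the file was derived from another DRS file,
        False if it sits at the top of its derivation chain, None if unknown.
    """
    if role == "ARCHIVAL_MASTER" or has_source_in_drs is False:
        return "yes"
    if has_source_in_drs is None:
        return "unspecified"
    return "no"
```

Under this option a deliverable derived from a master gets "no", and a file whose derivation history cannot be determined stays "unspecified" rather than being guessed.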

  36. Image Files • Converting from local scheme to MIX • Local field questions • Methodology • History • Source • Enhancements

  37. Text files • Converting from local scheme to textMD • Descriptor_type will be absorbed into different places in DRS2 • Extracted metadata can supply • markup_basis • markup_language for specific schemas • possibly other elements

  38. Audio files • Moving from local schema to AES57-2011: Audio object structures for preservation and restoration

  39. Versioned metadata • History will be tracked for key administrative elements: • Access flag • Admin flag (new) • Billing code • Owner code • What values to assign for required creation date and agent for migrated content?

  40. Next steps

  41. Next steps • Continue analysis and development of technical requirements • Build prototype • September check-in on progress • Create metadata migration plan • Open meeting to review plan

  42. Open for questions
