
More Better Metadata

Even v. More Better Metadata. SAA 2014 Panel: Metadata and Digital Preservation: How Much Do We Really Need? Andrea Goethals, Harvard Library. How much metadata do we really need? That depends on the quality of the metadata. Context of my remarks.


Presentation Transcript


  1. Even v More Better Metadata SAA 2014 Panel: Metadata and Digital Preservation: How Much Do We Really Need? Andrea Goethals, Harvard Library

  2. How much metadata do we really need? That depends on the quality of the metadata...

  3. Context of my remarks
     • Experience developing for and now managing Harvard Library’s Digital Repository Service (DRS), in production from 2000 to the present
     • ~47 million files
     • Recent multi-year overhaul of the repository to create the new DRS
     • Provided a chance to analyze metadata and rethink the approach

  4. Prior to the new DRS
     • Almost all metadata was user-contributed
     • Contributor expertise ranged from professional labs to curators, archivists and other staff
     • Very little validation of user-contributed metadata
     • Metadata elements had grown organically rather than systematically. For example...

  5. Some elements weren’t specific enough
     • File format one of: ICC, GIF, JPEG, TIFF, TDF, TEXT, PCD, AIFF, RealAudio, APP, WAV, WFR, JP2, JPF, ZIP, GZIP, PDF
     • Format variations and versions not recorded

  6. Some elements were too specific
     • Text abstract character repertoire one of: ‘US-ASCII’, ‘Unicode’
     • Text character map one of: ‘ISO_646.irv:1983’, ‘UTF-8’
     • These weren’t validated, so in reality the text could be in any character set but would be recorded as one of these regardless
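
The validation gap on this slide can be illustrated with a minimal Python sketch that checks whether a file's bytes actually decode under the character map recorded for it. The function name `check_declared_charset` and the mapping from the recorded values to Python codec names are assumptions made for illustration; they are not part of the DRS.

```python
# Hypothetical mapping from recorded character-map values to Python codec
# names. This table is an assumption for illustration only.
CODECS = {
    "ISO_646.irv:1983": "ascii",
    "UTF-8": "utf-8",
}


def check_declared_charset(data: bytes, declared: str) -> bool:
    """Return True if `data` decodes without error under the declared character map."""
    codec = CODECS.get(declared)
    if codec is None:
        return False  # unknown declaration, so it cannot be confirmed
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True
```

A clean decode is necessary but not sufficient: it shows the bytes are compatible with the declared character set, not that the text was authored in it.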

  7. Some generic elements only tracked for certain formats
     • For images only:
       – enhancements
       – history
       – methodology
       – producer
       – production software
       – system
     • And the above elements allowed free text, leading to a variety of interpretations over time

  8. Errors in relationship metadata
     • Missing relationships (e.g. referenced in the METS descriptor file but lacking explicit relationships)
     • Redundant relationships (files related more than once to the same files)
     • Illogical relationships (only discoverable because of redundant metadata). Examples:
       – Target images related to other target images
       – Non-target images described as target images
       – A METS descriptor file described as a scanned image
       – Objects merged into themselves
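
Two of the error classes above, redundant relationships and objects merged into themselves, lend themselves to a mechanical check. The following is a minimal sketch, assuming relationships can be represented as (source, kind, target) triples; the `Relationship` type, the relationship kind names and the function name are hypothetical and do not reflect the DRS data model.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Relationship(NamedTuple):
    source: str  # identifier of the file or object the relationship starts from
    kind: str    # hypothetical relationship type, e.g. "has_target"
    target: str  # identifier of the related file or object


def find_relationship_problems(
    rels: Iterable[Relationship],
) -> Dict[str, List[Relationship]]:
    """Report relationships recorded more than once and self-referential ones."""
    counts = Counter(rels)
    redundant = sorted(r for r, n in counts.items() if n > 1)
    self_referential = sorted(r for r in counts if r.source == r.target)
    return {"redundant": redundant, "self_referential": self_referential}
```

Missing relationships are not covered here, since detecting them requires comparing against another source such as the METS descriptor file.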

  9. Strategies in the new DRS for improving metadata
     • Pull descriptive metadata from catalogs at ingest or on request
     • Automated format ingest, validation & metadata extraction at ingest
     • Validation when files are ingested, added or removed, or relationship metadata is changed
     • FITS
     • Sync with catalogs, check and improve metadata on migration

  10. File Information Tool Set (FITS)
      • Identifies many file formats
      • Validates a few file formats
      • Extracts metadata from files
      • Aggregates metadata from many tools
      • Calculates basic file info (file size, MD5, etc.)
      • Outputs technical metadata
        – Community-standard metadata schemas
      • Identifies problem files
        – Conflicting tool opinions on format, metadata values
        – Unidentifiable file formats
        – Encrypted, rights metadata embedded in files
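
The "basic file info" item can be illustrated with a short Python sketch that computes a file's size and MD5 checksum. This is only an illustration of the idea using the standard library; FITS has its own implementation, and the function name `basic_file_info` is hypothetical.

```python
import hashlib
import os


def basic_file_info(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return the size in bytes and the MD5 hex digest of the file at `path`."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return {"size": os.path.getsize(path), "md5": md5.hexdigest()}
```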

  11. File Information Tool Set (FITS) [Diagram: any file is passed to each bundled tool (JHOVE, DROID, NLNZ ME, ExifTool, File utility, FFIdent, plus Tika, OIS Audio Information, ADL Tool, OIS File Information, OIS XML Metadata); a FITS wrapper + XSL converts each tool's output to FITS XML; a consolidator merges the results; an exporter outputs FITS XML or standard XML]

  12. FITS configured to get high quality metadata
      • Metadata normalization
        – ‘JPEG2000’ = ‘JPEG 2000’ = ‘JPEG 2000 image’
        – ‘inches’ = ‘2’ = ‘in.’
      • Plays to strengths of tools and downplays their weaknesses
        – Overall trust tool x over tool y
        – Don’t run tool x for format z
      • Format tree (hierarchy of related formats)
        – ‘OpenDocument’ is more specific than ‘Zip’
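
The normalization and format-tree ideas on this slide can be sketched in a few lines of Python. The lookup tables and function names below are assumptions for illustration, seeded only with the examples from the slide; they are not FITS's actual configuration or API, and the tool-preference rules are left out.

```python
from typing import Iterable, Optional

# Illustrative synonym table: variant spellings map to one canonical name.
FORMAT_SYNONYMS = {
    "JPEG2000": "JPEG 2000",
    "JPEG 2000 image": "JPEG 2000",
}

# Illustrative format tree stored as child -> parent (must not contain cycles).
FORMAT_PARENT = {
    "OpenDocument": "Zip",
}


def normalize_format(name: str) -> str:
    """Map a reported format name to its canonical spelling."""
    name = name.strip()
    return FORMAT_SYNONYMS.get(name, name)


def is_more_specific(a: str, b: str) -> bool:
    """Return True if format `a` is a descendant of format `b` in the tree."""
    parent = FORMAT_PARENT.get(a)
    while parent is not None:
        if parent == b:
            return True
        parent = FORMAT_PARENT.get(parent)
    return False


def resolve_format(reports: Iterable[str]) -> Optional[str]:
    """Return the single most specific format, or None if the reports conflict."""
    names = {normalize_format(r) for r in reports}
    for candidate in names:
        if all(candidate == other or is_more_specific(candidate, other)
               for other in names):
            return candidate
    return None  # no reports, or a genuine disagreement between tools
```

Under these assumptions, reports of ‘Zip’ and ‘OpenDocument’ resolve to ‘OpenDocument’ rather than being treated as a conflict, while reports such as ‘TIFF’ and ‘PDF’ return `None` and would need review.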

  13. Example of what we know about a file pre- and post-FITS adoption at ingest

  14. Additional strategies in the new DRS
      • Move away from overly restrictive metadata elements where needed. Examples:
        – Allow free text for format names
        – Any text character set
      • Add elements at the format-agnostic file level when they can apply to files in any format, e.g. producer or methodology
      • Flag suspicious metadata (and content) for later analysis

  15. Administrative flags
      • Help pinpoint incorrect metadata, problem content or where metadata tools need improvement
      • Some examples:
        – FAILED_METADATA_EXTRACTION
        – FORMAT_ID_CONFLICT
        – INCORRECT_METADATA
        – INHIBITOR
        – RIGHTS_METADATA
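
As a rough illustration of how such flags might be assigned automatically, here is a minimal Python sketch. The flag names come from the slide, but the `report` dictionary, its keys, and the rules mapping conditions to flags are assumptions; the slides do not describe how the DRS actually sets them.

```python
from enum import Enum
from typing import Set


class AdminFlag(Enum):
    FAILED_METADATA_EXTRACTION = "FAILED_METADATA_EXTRACTION"
    FORMAT_ID_CONFLICT = "FORMAT_ID_CONFLICT"
    INCORRECT_METADATA = "INCORRECT_METADATA"
    INHIBITOR = "INHIBITOR"
    RIGHTS_METADATA = "RIGHTS_METADATA"


def flags_for(report: dict) -> Set[AdminFlag]:
    """Derive flags from a hypothetical per-file summary dictionary.

    Assumed keys (all optional): "formats" (format names reported by tools),
    "extracted" (bool), "encrypted" (bool), "embedded_rights" (bool).
    """
    flags = set()
    if not report.get("extracted", True):
        flags.add(AdminFlag.FAILED_METADATA_EXTRACTION)
    if len(set(report.get("formats", []))) > 1:
        flags.add(AdminFlag.FORMAT_ID_CONFLICT)
    if report.get("encrypted", False):
        flags.add(AdminFlag.INHIBITOR)  # assumed meaning: encryption blocks access
    if report.get("embedded_rights", False):
        flags.add(AdminFlag.RIGHTS_METADATA)
    return flags
```

INCORRECT_METADATA is deliberately never set here, on the assumption that it requires human review or a comparison against a trusted source.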

  16. They said it better
      “It is quality rather than quantity that matters.” – Lucius Annaeus Seneca
      “Quality is not an act, it is a habit.” – Aristotle
      “Quality is never an accident. It is always the result of intelligent effort.” – John Ruskin

  17. Thank you!
