Leveraging METS for Managing Complex Digital Collections at NYU

METS Case Study: The NYU Digital Library Team
METS Opening Day, 27 October 2003
Leslie Myrick

Projects at NYU using METS
• EAD Finding Aid Project
• Tokyo Tribunal Proceedings
• Afghanistan Digital Library
• CRL Political Web Archiving Project
• DRAM *
• Hemispheric Institute *
• REPO History Sign Project *

Why METS? (1)
METS was formulated to serve as a:
• Submission Information Package (SIP)
• Archival Information Package (AIP)
• Dissemination Information Package (DIP)

Why METS? (2)
In other words, it's a …
• Transfer Syntax
• Archival Syntax
• Functional Syntax

METS and Complex Digital Objects
• Finding aid + images with multiple scans/versions
• Page turner for photo albums, documents, books
  – Edisto Album, Tokyo Tribunal brief, Afghanistan Digital Library
• Multimedia/Time-Based Media Navigators: Hemispheric Institute; SMIL Viewer
• Web Site Navigator
  – CRL Political Communications Web Archiving Project
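To make the page-turner idea concrete, here is a minimal sketch of the METS skeleton such a viewer could read: a fileSec listing one image per page and a physical structMap whose page divs point back at those files through fptr. It is written in Python rather than the Perl used at NYU, and the function name, the file-ID scheme, the TYPE/USE values and the example.org URLs are all hypothetical; a real record would also carry dmdSec and amdSec sections.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_tag(name):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def page_turner_mets(title, pages):
    """Build a minimal METS document for a paged object.

    pages: (label, href) pairs in reading order (hypothetical input format).
    """
    root = ET.Element(mets_tag("mets"), {"LABEL": title})
    file_grp = ET.SubElement(ET.SubElement(root, mets_tag("fileSec")),
                             mets_tag("fileGrp"), {"USE": "reference"})
    struct_map = ET.SubElement(root, mets_tag("structMap"), {"TYPE": "physical"})
    album = ET.SubElement(struct_map, mets_tag("div"), {"TYPE": "album", "LABEL": title})
    for n, (label, href) in enumerate(pages, start=1):
        file_id = f"IMG{n:03d}"
        f = ET.SubElement(file_grp, mets_tag("file"), {"ID": file_id, "MIMETYPE": "image/jpeg"})
        ET.SubElement(f, mets_tag("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # Each page div points at its image through an fptr
        page = ET.SubElement(album, mets_tag("div"),
                             {"TYPE": "page", "ORDER": str(n), "LABEL": label})
        ET.SubElement(page, mets_tag("fptr"), {"FILEID": file_id})
    return root


if __name__ == "__main__":
    doc = page_turner_mets("Edisto Album", [("Page 1", "http://example.org/p1.jpg"),
                                            ("Page 2", "http://example.org/p2.jpg")])
    print(ET.tostring(doc, encoding="unicode"))
```

Keeping the structure in the structMap and the files in the fileSec is what lets one viewer serve albums, briefs and books alike: only the div labels and types change.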

Using METS as a SIP
• Berol Collection Finding Aid, in negotiations with the RLG Cultural Materials Project
• METS will be bundled with objects; EAD

METS as a Functional Syntax
• METS was designed not only for transfer and archival management, but also for giving access to and navigating an object
• METS + XSLT can create dynamic interfaces with links to resources and their metadata
• METS can be dumped into Oracle, indexed, and searched using context-aware queries
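The "functional syntax" point is that an interface can be driven directly by the structMap. NYU did this with XSLT; Python's standard library has no XSLT processor, so the sketch below swaps in ElementTree to show the same traversal a page-turner stylesheet performs: resolve each page's fptr to an image URL and compute previous/next links. The sample record and the `page_sequence` name are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}

# Illustrative record: pages deliberately listed out of order
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="F2"><mets:FLocat LOCTYPE="URL" xlink:href="p2.jpg"/></mets:file>
    <mets:file ID="F1"><mets:FLocat LOCTYPE="URL" xlink:href="p1.jpg"/></mets:file>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap><mets:div TYPE="book">
    <mets:div TYPE="page" ORDER="2" LABEL="Page 2"><mets:fptr FILEID="F2"/></mets:div>
    <mets:div TYPE="page" ORDER="1" LABEL="Page 1"><mets:fptr FILEID="F1"/></mets:div>
  </mets:div></mets:structMap>
</mets:mets>"""


def page_sequence(mets_xml):
    """Return pages in ORDER with image URL and prev/next labels."""
    root = ET.fromstring(mets_xml)
    # Map each file ID to the URL in its FLocat
    hrefs = {
        f.get("ID"): f.find("mets:FLocat", NS).get(f"{{{NS['xlink']}}}href")
        for f in root.iterfind(".//mets:file", NS)
    }
    pages = sorted(root.iterfind(".//mets:div[@TYPE='page']", NS),
                   key=lambda d: int(d.get("ORDER")))
    seq = []
    for i, div in enumerate(pages):
        seq.append({
            "label": div.get("LABEL"),
            "image": hrefs[div.find("mets:fptr", NS).get("FILEID")],
            "prev": pages[i - 1].get("LABEL") if i > 0 else None,
            "next": pages[i + 1].get("LABEL") if i + 1 < len(pages) else None,
        })
    return seq
```

An XSLT version would do the same lookups with keys on file/@ID and sort on div/@ORDER, emitting HTML instead of dictionaries.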

METS Plays Well With Others
We have …
• EAD finding aids pointing to METS
• METS pointing to finding aids and MARCXML records
• METS pointing to and manipulating TEI
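A METS record "points to" an external finding aid or MARCXML record through a dmdSec/mdRef carrying an xlink:href. The sketch below lists those pointers; the sample record and its example.org URLs are hypothetical, and it assumes the common convention of declaring MARCXML as MDTYPE="MARC" with an XML MIME type.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}

# Hypothetical record referencing an EAD finding aid and a MARCXML record
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="DMD1">
    <mets:mdRef LOCTYPE="URL" MDTYPE="EAD" xlink:href="http://example.org/findingaid.xml"/>
  </mets:dmdSec>
  <mets:dmdSec ID="DMD2">
    <mets:mdRef LOCTYPE="URL" MDTYPE="MARC" MIMETYPE="text/xml"
                xlink:href="http://example.org/record.marcxml"/>
  </mets:dmdSec>
  <mets:structMap><mets:div/></mets:structMap>
</mets:mets>"""


def external_metadata(mets_xml):
    """Return (dmdSec ID, MDTYPE, href) for every mdRef in the record."""
    root = ET.fromstring(mets_xml)
    return [
        (sec.get("ID"), ref.get("MDTYPE"), ref.get(f"{{{NS['xlink']}}}href"))
        for sec in root.iterfind("mets:dmdSec", NS)
        for ref in sec.iterfind("mets:mdRef", NS)
    ]
```

The reverse direction (EAD pointing to METS) is typically a link element inside the finding aid's component, so the METS record itself does not need to change.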

METS and Extensions at NYU
• MODS and DC for descriptive metadata
• MIX for image technical metadata
• textMD for text technical metadata
• LC A/V Prototype + smptetechMD + AES
• Missing links: an overall preservation schema plugin (PREMIS); a rights metadata schema
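Extension schemas plug into METS by being wrapped inside an mdWrap/xmlData in the appropriate section (dmdSec for descriptive, techMD for technical). The sketch below shows one way to do that generically. The MDTYPE values for MODS and DC are standard; NISOIMG is the METS MDTYPE for the NISO image standard that MIX encodes; declaring textMD through OTHER/OTHERMDTYPE is an assumption (later METS versions also list TEXTMD directly). The MIX namespace shown is the MIX 2.0 one and is only used as example input.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# How each extension schema could be declared on mdWrap (see lead-in for assumptions)
MDTYPE_ATTRS = {
    "MODS": {"MDTYPE": "MODS"},
    "DC": {"MDTYPE": "DC"},
    "MIX": {"MDTYPE": "NISOIMG"},
    "textMD": {"MDTYPE": "OTHER", "OTHERMDTYPE": "TEXTMD"},
}


def wrap_md(section, md_id, schema, payload):
    """Return a METS metadata section (e.g. 'dmdSec', 'techMD') wrapping an XML payload."""
    sec = ET.Element(f"{{{METS_NS}}}{section}", {"ID": md_id})
    wrap = ET.SubElement(sec, f"{{{METS_NS}}}mdWrap", dict(MDTYPE_ATTRS[schema]))
    ET.SubElement(wrap, f"{{{METS_NS}}}xmlData").append(payload)
    return sec


if __name__ == "__main__":
    mix = ET.Element("{http://www.loc.gov/mix/v20}mix")  # empty MIX record, for illustration
    print(ET.tostring(wrap_md("techMD", "TMD001", "MIX", mix), encoding="unicode"))
```

Because the wrapper is schema-agnostic, a PREMIS or rights plugin could later be added as one more entry in the mapping.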

Ingredients (so far)
• Perl
• MySQL and some Oracle
• Tomcat
• Servlets and JSP
• Saxon and XT
• XSLT

Tools for Creation
• zeroDB database
  – Input via an interface as well as batch loading of metadata extracted by scripts, e.g. ImageMagick identify, arcscraper.pl
  – Outputs METS using Perl DBI

Tools for Dissemination
• Page-turners
• Multimedia Viewers
• Thumbnail Browsers

Typical METS Creation Workflow
• ImageMagick extraction of image metadata
• Database input (batch and manual entry) of descriptive and technical metadata
• Generation of METS using Perl DBI against MySQL
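The last two workflow steps (batch-load metadata into a database, then query it to emit METS) can be sketched as follows. This swaps in Python's DB-API with an in-memory SQLite database for Perl DBI against MySQL; the `files` table layout and the function names are hypothetical, and only a fileSec is generated.

```python
import sqlite3
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def load_files(conn, rows):
    """Batch-load extracted technical metadata into a hypothetical 'files' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(file_id TEXT PRIMARY KEY, href TEXT, mimetype TEXT, size INTEGER)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def mets_from_db(conn):
    """Query the table and emit a METS fileSec with one <file> per row."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(root, f"{{{METS_NS}}}fileSec")
    grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    query = "SELECT file_id, href, mimetype, size FROM files ORDER BY file_id"
    for file_id, href, mimetype, size in conn.execute(query):
        f = ET.SubElement(grp, f"{{{METS_NS}}}file",
                          {"ID": file_id, "MIMETYPE": mimetype, "SIZE": str(size)})
        ET.SubElement(f, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
    return root


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical rows; sizes are placeholders, not values from the slides
    load_files(conn, [("IMG001", "page1.jpg", "image/jpeg", 1000),
                      ("IMG002", "page2.jpg", "image/jpeg", 2000)])
    print(ET.tostring(mets_from_db(conn), encoding="unicode"))
```

Keeping the database as the editable source and regenerating METS from it means manual corrections made through the input interface flow into every new METS export.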

ImageMagick Verbose Dump
Image: taqw_001s.jpg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Geometry: 625x886
Class: DirectClass
Type: true color
Depth: 8 bits-per-pixel component
Colors: 33080
Profile-color: 552 bytes
Profile-iptc: 5636 bytes
Resolution: 100x100 pixels/inch
Filesize: 210kb
Interlace: None
Background Color: white
Border Color: #dfdfdf
Matte Color: grey74
Iterations: 0
Compression: JPEG
signature: 8c37d0b82374d8eaa6b4d6b062699a9b8d7d86f2ba1d4e320f2226181d062822
Tainted: False
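Output like the dump above is line-oriented "Key: value" text, so a script can turn it into fields for a MIX record with little effort. The sketch below is a Python stand-in for that extraction step. It assumes the older flat layout shown here (newer ImageMagick releases indent nested sections and print geometry with offsets such as 625x886+0+0, which the code tolerates). The output dictionary keys are our own names, not MIX element names.

```python
import re

# A few lines from the verbose dump above, used as sample input
SAMPLE = """Image: taqw_001s.jpg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Geometry: 625x886
Class: DirectClass
Depth: 8 bits-per-pixel component
Resolution: 100x100 pixels/inch"""


def parse_identify_verbose(text):
    """Turn 'Key: value' lines into a dict (the first ': ' splits key from value)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def image_technical_md(fields):
    """Pick out the values a MIX record would need (dictionary keys are hypothetical)."""
    # Drop any '+x+y' offset, then split WIDTHxHEIGHT
    width, height = fields["Geometry"].split("+")[0].split("x")
    return {
        "format": fields["Format"].split()[0],
        "width": int(width),
        "height": int(height),
        "bits_per_sample": int(re.match(r"\d+", fields["Depth"]).group()),
        "resolution": fields.get("Resolution"),
    }
```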

ImageMagick Non-Verbose Dump
• taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit 126mb 4.3u 0:06
• taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit 191kb 0.0u 0:01
• taqw-fr001t.jpg[2] JPEG 100x142 DirectClass 8-bit 9954b 0.0u 0:01
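The one-line form is compact enough to batch-process a whole directory of masters, derivatives and thumbnails. A minimal parsing sketch, assuming the field order shown in the three lines above (name with optional [scene] index, format, geometry, class, depth, file size); other ImageMagick versions format this line differently, so the regular expression is illustrative only.

```python
import re

IDENTIFY_LINE = re.compile(
    r"^(?P<name>\S+?)(?:\[(?P<scene>\d+)\])?\s+"  # file name, optional [n] index
    r"(?P<format>\S+)\s+"                         # e.g. TIFF, JPEG
    r"(?P<width>\d+)x(?P<height>\d+)\S*\s+"       # geometry, ignoring any +x+y offset
    r"(?P<image_class>\S+)\s+"                    # e.g. DirectClass
    r"(?P<depth>\d+)-bit\s+"                      # bits per sample
    r"(?P<filesize>\S+)"                          # size as printed, e.g. 126mb
)


def parse_identify_line(line):
    """Parse one non-verbose identify line into a dict of fields."""
    m = IDENTIFY_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized identify output: {line!r}")
    rec = m.groupdict()
    for key in ("width", "height", "depth"):
        rec[key] = int(rec[key])
    return rec
```

The file size is left as the printed string because its unit suffix varies (mb, kb, b).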

Extracting METS from a DB
• doWebArchive.cgi
  – MODS for homepage; DC for pages
  – MIX for images/technical
  – textMD for web page/technical
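The schema choices listed for doWebArchive.cgi amount to a small per-file decision rule. The helper below is a hypothetical Python restatement of that rule; the slide does not say what descriptive metadata non-page files receive, so returning None for them is an assumption.

```python
def schemas_for(url, mimetype, homepage_url):
    """Return (descriptive, technical) schema names for one harvested file.

    Hypothetical helper: MODS for the site homepage, DC for other web pages,
    MIX for images, textMD for web pages.
    """
    is_page = mimetype == "text/html"
    if url == homepage_url:
        descriptive = "MODS"
    elif is_page:
        descriptive = "DC"
    else:
        descriptive = None  # not stated on the slide; assumed none
    if mimetype.startswith("image/"):
        technical = "MIX"
    elif is_page:
        technical = "textMD"
    else:
        technical = None
    return descriptive, technical
```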

METS for Discovery
• Dump METS files into Oracle as CLOB
• Create Oracle Intermedia index
• XML-aware full-text search
• Example: CRL political web archiving project
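"XML-aware" here means a query can be restricted to a particular element, so a hit in a title is distinguished from a hit in an abstract. The sketch below illustrates that idea in plain Python over an in-memory set of METS records; it is not the Oracle interMedia mechanism, and the sample records, document IDs and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def sample_mets(title, abstract):
    """Abbreviated METS record (no structMap) with a wrapped MODS description."""
    return (
        '<mets xmlns="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3">'
        '<dmdSec ID="DMD1"><mdWrap MDTYPE="MODS"><xmlData><mods:mods>'
        f"<mods:titleInfo><mods:title>{title}</mods:title></mods:titleInfo>"
        f"<mods:abstract>{abstract}</mods:abstract>"
        "</mods:mods></xmlData></mdWrap></dmdSec></mets>"
    )


def search_within(docs, term, element):
    """Return IDs of documents whose MODS <element> contains term (case-insensitive)."""
    hits = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for el in root.iter(f"{{{MODS_NS}}}{element}"):
            if term.lower() in "".join(el.itertext()).lower():
                hits.append(doc_id)
                break  # one matching element is enough for this document
    return hits
```

A database-backed version would keep the same contract (term plus element context) while letting the index do the work.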

CRL Political Web Archive
• Collaboration between Stanford, Cornell, Texas, NYU, and the Internet Archive, under the aegis of CRL and Mellon
• Sub-Saharan Africa, Southeast Asia, Latin America, Western Europe
• Testbed: 400 URLs; websites from radical groups, NGOs
• Internet Archive .arc files

Internet Archive .arc files
• An .arc file is a 100 MB aggregate of harvested files, along with the HTTP headers and a crawler-generated header for each file
• Fine as a simple SIP, but basically unmanageable as an AIP or DIP
• At present, content is accessed by using byte offsets to grab it from the aggregate file
• Only searchable by URL (Wayback Machine)
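Byte-offset access works because each record in an ARC (version 1) file starts with a one-line header, "URL IP-address Archive-date Content-type Archive-length", followed by exactly Archive-length bytes of HTTP headers plus body. The sketch below reads one record given its offset. It ignores the file's leading version block and the fact that Internet Archive files are often gzip-compressed record by record; `arc_record_bytes` is a hypothetical helper for building test data.

```python
import io


def arc_record_bytes(url, ip, date, content_type, payload):
    """Build one ARC v1 record (hypothetical helper for test data)."""
    header = f"{url} {ip} {date} {content_type} {len(payload)}\n".encode("latin-1")
    return header + payload + b"\n"


def read_arc_record(f, offset):
    """Read the ARC v1 record that starts at byte `offset` of binary file object `f`."""
    f.seek(offset)
    fields = f.readline().decode("latin-1").split()
    if len(fields) != 5:
        raise ValueError(f"not an ARC v1 URL-record header: {fields!r}")
    url, ip, date, content_type, length = fields
    payload = f.read(int(length))  # HTTP headers + body
    http_headers, _, body = payload.partition(b"\r\n\r\n")
    return {"url": url, "ip": ip, "date": date, "content_type": content_type,
            "http_headers": http_headers, "body": body}


if __name__ == "__main__":
    body = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
    first = arc_record_bytes("http://example.org/", "192.0.2.1", "20031027120000", "text/html", body)
    second = arc_record_bytes("http://example.org/a.html", "192.0.2.1", "20031027120001", "text/html", body)
    arc = io.BytesIO(first + second)
    print(read_arc_record(arc, len(first))["url"])  # http://example.org/a.html
```

This is exactly why the aggregate is awkward as an AIP or DIP: without an external index of offsets, nothing inside the file tells you where a given object lives.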

Automated extraction of text-based metadata (e.g. web pages)
• arcscraper.pl
  – Descriptive and technical MD for each object
• datscraper.pl
  – Checksums, titles
  – Links from each object
• makeLinkTable.pl
  – Creates link-to-object relationships
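The datscraper.pl and makeLinkTable.pl steps (checksum, title and outgoing links per page, then a table of link relationships) can be approximated with the Python standard library. This is a sketch under our own assumptions, not a port of the Perl scripts: MD5 is assumed as the checksum, pages are decoded as UTF-8, relative links are resolved against the page URL, and the function names are hypothetical.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScraper(HTMLParser):
    """Collect the <title> text and the href targets of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(url, content):
    """Return checksum, title and outgoing links for one harvested page (bytes)."""
    parser = PageScraper(url)
    parser.feed(content.decode("utf-8", errors="replace"))
    parser.close()
    return {"url": url, "md5": hashlib.md5(content).hexdigest(),
            "title": parser.title.strip(), "links": parser.links}


def link_table(records):
    """Flatten scraped records into (source, target) rows."""
    return [(r["url"], target) for r in records for target in r["links"]]
```

The (source, target) rows are what a site navigator needs: joined back to the objects in the METS fileSec, they let a viewer follow links inside the archived site instead of out to the live web.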

Go to Videotape

The Future?
• Persistent identifiers
• Preservation metadata schema
• Java development
• Move from Oracle to Cheshire II
