Explore how the NYU Digital Library Team uses METS to manage diverse projects, from the EAD Finding Aid Project to the Tokyo Tribunal Proceedings, supporting the submission, archiving, and dissemination of information. The team uses METS to drive dynamic interfaces, navigation, and search, and combines it with a range of tools and databases for preservation and access.
METS Case Study: The NYU Digital Library Team • METS Opening Day, 27 October 2003 • Leslie Myrick
Projects at NYU using METS • EAD Finding Aid Project • Tokyo Tribunal Proceedings • Afghanistan Digital Library • CRL Political Web Archiving Project • DRAM * • Hemispheric Institute * • REPO History Sign Project *
Why METS? (1) METS was formulated to serve as a: • Submission Information Package (SIP) • Archival Information Package (AIP) • Dissemination Information Package (DIP)
Why METS? (2) In other words, it’s a … • Transfer Syntax • Archival Syntax • Functional Syntax
METS and Complex Digital Objects • Finding aid + images with multiple scans/versions • Page turner for photo albums, documents, books – Edisto Album, Tokyo Tribunal brief, Afghanistan Digital Library • Multimedia/Time-Based Media Navigators: Hemispheric Institute; SMIL Viewer • Web Site Navigator – CRL Political Communications Web Archiving Project
Using METS as a SIP • Berol Collection Finding Aid: in negotiations with RLG Cultural Materials Project • METS will be bundled with the objects and the EAD
METS as a Functional Syntax • METS is designed not only for transfer and archival management, but also for giving access to and navigating an object • METS + XSLT can create dynamic interfaces with links to resources and their metadata • METS can be loaded into Oracle, indexed, and searched using context-aware queries
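The navigation idea on this slide can be illustrated with a minimal sketch. It is not the team's XSLT: it uses Python's standard XML parser instead, and the sample METS document, the `TYPE="page"` convention, and the function name `page_links` are assumptions made for illustration. It walks the structMap, follows each fptr to its file entry, and returns an ordered list of page links of the kind a page-turner would need.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for METS and XLink.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical two-page object, invented for this example.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="reference">
      <mets:file ID="F1"><mets:FLocat LOCTYPE="URL" xlink:href="page001.jpg"/></mets:file>
      <mets:file ID="F2"><mets:FLocat LOCTYPE="URL" xlink:href="page002.jpg"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="album" LABEL="Sample album">
      <mets:div TYPE="page" ORDER="2" LABEL="Page 2"><mets:fptr FILEID="F2"/></mets:div>
      <mets:div TYPE="page" ORDER="1" LABEL="Page 1"><mets:fptr FILEID="F1"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def page_links(mets_xml):
    """Return (order, label, href) tuples for every page div, sorted by ORDER."""
    root = ET.fromstring(mets_xml)
    href_attr = "{%s}href" % NS["xlink"]

    # Map file IDs to their locations so fptr references can be resolved.
    files = {}
    for file_el in root.iterfind(".//mets:file", NS):
        locat = file_el.find("mets:FLocat", NS)
        if locat is not None:
            files[file_el.get("ID")] = locat.get(href_attr)

    pages = []
    for div in root.iterfind(".//mets:div[@TYPE='page']", NS):
        fptr = div.find("mets:fptr", NS)
        if fptr is None:
            continue  # a div with no file pointer has nothing to link to
        pages.append((int(div.get("ORDER")), div.get("LABEL"), files[fptr.get("FILEID")]))
    return sorted(pages)


if __name__ == "__main__":
    for order, label, href in page_links(SAMPLE_METS):
        print(order, label, href)
```

An XSLT stylesheet would express the same lookup with templates and keys; the point here is only that the structMap supplies the ordering and the fileSec supplies the link targets.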
METS Plays Well With Others We have … • EAD Finding Aids pointing to METS • METS pointing to Finding Aids and MARCXML records • METS pointing to and manipulating TEI
METS and Extensions at NYU • MODS and DC for descriptive metadata • MIX for image technical metadata • textMD for text technical metadata • LC A/V Prototype + smptetechMD + AES • Missing links: an overall preservation schema plugin (PREMIS); a rights metadata schema
Ingredients (so far) • Perl • MySQL and some Oracle • Tomcat • Servlets and JSP • Saxon and XT • XSLT
Tools for Creation • zeroDB database • Input via interface as well as batch loading of metadata extracted by scripts, e.g. ImageMagick identify, arcscraper.pl • Outputs METS using Perl DBI
Tools for Dissemination • Page-turners • Multimedia Viewers • Thumbnail Browsers
Typical METS Creation Workflow • ImageMagick extraction of image metadata • Database input (batch and manual entry) of descriptive and technical metadata • Generation of METS using Perl DBI against MySQL
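The last step of this workflow, generating METS from database rows, might look roughly like the sketch below. It substitutes Python and an in-memory SQLite database for the Perl DBI and MySQL named on the slide, and the `images` table, its columns, and the function names are hypothetical, not the zeroDB schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def demo_db():
    """Create an in-memory database with a hypothetical images table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (object_id TEXT, seq INTEGER, filename TEXT, mimetype TEXT)"
    )
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?, ?)",
        [
            ("album1", 1, "p001.tif", "image/tiff"),
            ("album1", 2, "p002.tif", "image/tiff"),
        ],
    )
    return conn


def build_mets(conn, object_id):
    """Build a small METS document (fileSec plus physical structMap) for one object."""
    root = ET.Element("{%s}mets" % METS, {"OBJID": object_id})
    file_sec = ET.SubElement(root, "{%s}fileSec" % METS)
    file_grp = ET.SubElement(file_sec, "{%s}fileGrp" % METS)
    struct_map = ET.SubElement(root, "{%s}structMap" % METS, {"TYPE": "physical"})
    top_div = ET.SubElement(struct_map, "{%s}div" % METS, {"TYPE": "object"})

    rows = conn.execute(
        "SELECT seq, filename, mimetype FROM images WHERE object_id = ? ORDER BY seq",
        (object_id,),
    )
    for seq, filename, mimetype in rows:
        file_id = "FILE%04d" % seq
        # One file entry per row, with its location as an xlink:href.
        file_el = ET.SubElement(
            file_grp, "{%s}file" % METS, {"ID": file_id, "MIMETYPE": mimetype}
        )
        ET.SubElement(
            file_el,
            "{%s}FLocat" % METS,
            {"LOCTYPE": "URL", "{%s}href" % XLINK: filename},
        )
        # One page div per row, pointing back at the file entry.
        page = ET.SubElement(
            top_div, "{%s}div" % METS, {"TYPE": "page", "ORDER": str(seq)}
        )
        ET.SubElement(page, "{%s}fptr" % METS, {"FILEID": file_id})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_mets(demo_db(), "album1"))
```

A production record would also carry the descriptive and technical metadata sections (dmdSec, amdSec) that the earlier workflow steps collect; they are omitted here to keep the sketch short.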
ImageMagick Verbose Dump
  Image: taqw_001s.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Geometry: 625x886
  Class: DirectClass
  Type: true color
  Depth: 8 bits-per-pixel component
  Colors: 33080
  Profile-color: 552 bytes
  Profile-iptc: 5636 bytes
  unknown: êëÿ
  Resolution: 100x100 pixels/inch
  Filesize: 210kb
  Interlace: None
  Background Color: white
  Border Color: #dfdfdf
  Matte Color: grey74
  Iterations: 0
  Compression: JPEG
  signature: 8c37d0b82374d8eaa6b4d6b062699a9b8d7d86f2ba1d4e320f2226181d062822
  Tainted: False
ImageMagick Non-Verbose Dump • taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit 126mb 4.3u 0:06 • taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit 191kb 0.0u 0:01 • taqw-fr001t.jpg[2] JPEG 100x142 DirectClass 8-bit 9954b 0.0u 0:01
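Lines like the ones on this slide are regular enough to be parsed into database fields by a script. The following is a minimal sketch, assuming exactly the column layout shown above (filename with optional scene index, format, geometry, class, depth, file size); the output of `identify` differs between ImageMagick versions, and the function and field names are invented for this example.

```python
import re

# Pattern for one line of non-verbose output as shown on the slide.
# Trailing timing columns (e.g. "0.0u 0:01") are ignored.
IDENTIFY_LINE = re.compile(
    r"^(?P<filename>\S+?)(?:\[(?P<scene>\d+)\])?\s+"
    r"(?P<format>\S+)\s+"
    r"(?P<width>\d+)x(?P<height>\d+)\s+"
    r"(?P<image_class>\S+)\s+"
    r"(?P<depth>\d+)-bit\s+"
    r"(?P<filesize>\S+)"
)


def parse_identify_line(line):
    """Return a dict of technical metadata, or None if the line does not match."""
    match = IDENTIFY_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Convert the numeric fields; the scene index is absent for single-image files.
    for key in ("width", "height", "depth"):
        record[key] = int(record[key])
    record["scene"] = int(record["scene"]) if record["scene"] is not None else None
    return record


if __name__ == "__main__":
    sample = "taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit 191kb 0.0u 0:01"
    print(parse_identify_line(sample))
```

Returning `None` for unrecognized lines lets a batch loader log and skip them rather than insert partial records.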
Extracting METS from a DB • doWebArchive.cgi • MODS for homepage; DC for pages • MIX for images/technical • textMD for web page/technical
METS for Discovery • Dump METS files into Oracle as CLOB • Create Oracle Intermedia index • XML-aware full-text search • Example: CRL political web archiving project
CRL Political Web Archive • Collaboration between Stanford, Cornell, Texas, NYU, IA under aegis of CRL, Mellon • Sub-Saharan Africa, South East Asia, Latin America, Western Europe • Testbed: 400 URLs; websites from radical groups, NGOs • Internet Archive .arc files
Internet Archive .arc files • Each .arc file is a 100 MB aggregate of harvested files, with the HTTP headers and a crawler-generated header for each file • Fine as a simple SIP, but basically unmanageable as an AIP or DIP • At present, content is accessed by using byte offsets to pull it out of the aggregate file • Only searchable by URL (Wayback Machine)
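Byte-offset access to an aggregate file can be sketched as follows. The record layout here (a one-line header holding the URL and payload length, followed by the payload) is a simplified stand-in invented for this example, not the actual .arc format, and the function names are hypothetical. It shows why such a file is only practical to read through an external index of offsets.

```python
import io


def append_record(buf, url, payload):
    """Append one record to the aggregate and return the offset where it starts."""
    offset = buf.seek(0, io.SEEK_END)
    header = ("%s %d\n" % (url, len(payload))).encode("ascii")
    buf.write(header)
    buf.write(payload)
    buf.write(b"\n")  # separator between records
    return offset


def read_record(buf, offset):
    """Seek to a known offset and return (url, payload) for the record there."""
    buf.seek(offset)
    header = buf.readline().decode("ascii").rstrip("\n")
    url, length = header.rsplit(" ", 1)
    return url, buf.read(int(length))


if __name__ == "__main__":
    aggregate = io.BytesIO()  # stands in for a file opened in binary mode
    index = {}
    for url, body in [
        ("http://example.org/", b"<html>home</html>"),
        ("http://example.org/a", b"<html>page a</html>"),
    ]:
        index[url] = append_record(aggregate, url, body)
    print(read_record(aggregate, index["http://example.org/a"]))
```

Without the `index` dictionary the only way to find a record is to scan the aggregate from the start, which illustrates the slide's point that the format is hard to manage as an AIP or DIP.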
Automated extraction of text-based metadata, e.g. from web pages • arcscraper.pl: descriptive and technical MD for each object • datscraper.pl: checksums, titles, links from each object • makeLinkTable.pl: creates link-to-object relationships
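The kind of extraction these scripts perform can be illustrated with a short sketch. It is not a port of arcscraper.pl, datscraper.pl, or makeLinkTable.pl, whose internals the slides do not show; the function names, the record fields, and the choice of SHA-256 as the checksum are assumptions. It pulls a title, a checksum, and outgoing links from one web page, then flattens the links into rows suitable for a link table.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin


class _PageScraper(HTMLParser):
    """Collect the <title> text and all <a href> targets from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title_parts = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def scrape_page(url, html_text):
    """Return a metadata record: URL, title, checksum, and outgoing links."""
    scraper = _PageScraper(url)
    scraper.feed(html_text)
    scraper.close()
    return {
        "url": url,
        "title": "".join(scraper.title_parts).strip(),
        "checksum": hashlib.sha256(html_text.encode("utf-8")).hexdigest(),
        "links": scraper.links,
    }


def link_rows(record):
    """Flatten one record into (source_url, target_url) rows for a link table."""
    return [(record["url"], target) for target in record["links"]]


if __name__ == "__main__":
    page = (
        "<html><head><title>Home</title></head>"
        '<body><a href="/about">About</a></body></html>'
    )
    record = scrape_page("http://example.org/index.html", page)
    print(record["title"], link_rows(record))
```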
The Future? • Persistent Identifiers • Preservation Metadata Schema • Java development • Move from Oracle to Cheshire II