METS Dissemination: Interfaces METS Opening Day 28 October, 2003 Leslie Myrick
NYU Collections using METS Interfaces • EAD Finding Aids • Tokyo Tribunal Proceedings • Afghanistan Digital Library * • CRL Web Archiving Project • DRAM • Hemispheric Institute • REPO History Sign Project
Ingredients • Tomcat Servlet Engine • XSLServlet or SaxonServlet • XT or Saxon Transformation Engine • MySQL Database for generation • Perl DBI and CGI for interface to DB
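One plausible reading of "MySQL Database for generation" is that METS documents are generated from database rows. The sketch below shows that idea for a fileSec only. It swaps in Python's built-in sqlite3 for MySQL and Perl DBI, and the `files(id, file_use, mimetype, href)` table is hypothetical, not the schema used in the project.

```python
import sqlite3
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def file_sec_from_db(conn):
    """Build a METS fileSec from a hypothetical files(id, file_use, mimetype, href) table."""
    file_sec = ET.Element(f"{{{METS}}}fileSec")
    groups = {}  # one mets:fileGrp per distinct USE value (e.g. master, service)
    rows = conn.execute(
        "SELECT id, file_use, mimetype, href FROM files ORDER BY file_use, id"
    )
    for file_id, file_use, mimetype, href in rows:
        if file_use not in groups:
            groups[file_use] = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE=file_use)
        f = ET.SubElement(groups[file_use], f"{{{METS}}}file", ID=file_id, MIMETYPE=mimetype)
        # FLocat points at the content file; LOCTYPE="URL" with an xlink:href.
        ET.SubElement(f, f"{{{METS}}}FLocat", {"LOCTYPE": "URL", f"{{{XLINK}}}href": href})
    return file_sec
```

Serializing the result with `ET.tostring(file_sec_from_db(conn), encoding="unicode")` gives a fragment that a fuller generator would wrap in `mets:mets` alongside the structMap and metadata sections.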
Why XSLT? • Relatively simple • Open-source, platform-neutral, standards-based • Official Recommendation of W3C • It is XML
Free XSLT Tools Abound • Editors: emacs, NoteTab + Xalan .bat • Servlet containers: Tomcat, Resin • Transformation engines: Xalan, Saxon, XT • Parsers: Xerces, Aelfred, XP/SAX, Crimson • Parsing APIs: DOM, SAX
METS as a Functional Syntax • METS is designed not only for transfer and archival management, but also for giving access to and navigating an object • METS + XSLT can create dynamic interfaces with links to resources and their metadata • METS can be dumped into Oracle, indexed, and searched using context-aware queries
How to Navigate a METS Document • ID, IDREF, IDREFS • Each IDREF must match an ID in the same document (an ID need not be referenced) • To point at more than one ID, use IDREFS (e.g. multiple ADMID values on METS:file) • Keys • More flexible; they make the document into a database
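To make the IDREFS case concrete, here is a minimal sketch that follows a `mets:file` element's ADMID list to the administrative metadata sections it names. It uses Python's ElementTree as a stand-in for the XSLT `id()` lookup, and the sample document is trimmed and not schema-valid.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Trimmed, not schema-valid sample: real techMD/rightsMD need mdWrap or mdRef content.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec ID="AMD1">
    <mets:techMD ID="TECH1"/>
    <mets:rightsMD ID="RIGHTS1"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FILE1" ADMID="TECH1 RIGHTS1" MIMETYPE="image/jpeg"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def resolve_admid(root):
    """Map each mets:file ID to the local names of the elements its ADMID IDREFS point to."""
    # Index every element carrying an ID attribute. A validating parser would reject
    # duplicate IDs; this dict would silently keep the last one instead.
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID") is not None}
    resolved = {}
    for f in root.iter(f"{{{METS_NS}}}file"):
        # IDREFS is a whitespace-separated list of ID values.
        refs = f.get("ADMID", "").split()
        missing = [r for r in refs if r not in by_id]
        if missing:
            raise ValueError(f"dangling IDREF(s) on {f.get('ID')}: {missing}")
        resolved[f.get("ID")] = [by_id[r].tag.split("}", 1)[1] for r in refs]
    return resolved
```

For the sample, `resolve_admid(ET.fromstring(SAMPLE))` maps `FILE1` to `["techMD", "rightsMD"]`; a reference to an ID that does not exist raises an error, which is the constraint a DTD or Schema enforces at validation time.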
ID, IDREF, IDREFS • Provide navigable relationships between files and their metadata in a complex schema such as METS • Must be declared in the Schema or DTD • Restrictive: an element can have only one ID; ID values must be unique across the whole document (e.g. an authorID and an artistID can't share a value)
Keys; the key() function • Creates an index • Defined in the stylesheet, not in the DTD/Schema • Flexible: many keys on one element, e.g. one for each attribute • Any number of elements can match a given value
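In XSLT a key is declared as, for example, `<xsl:key name="files-by-group" match="mets:file" use="@GROUPID"/>` and queried with `key('files-by-group', 'P1')`. The sketch below mimics that behaviour with a Python dictionary of lists, one index per attribute; it is an analogue for illustration, not how any XSLT processor implements keys, and the sample master/service file groups are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

METS_NS = "http://www.loc.gov/METS/"

SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="M1" GROUPID="P1" MIMETYPE="image/tiff"/>
      <mets:file ID="M2" GROUPID="P2" MIMETYPE="image/tiff"/>
    </mets:fileGrp>
    <mets:fileGrp USE="service">
      <mets:file ID="S1" GROUPID="P1" MIMETYPE="image/jpeg"/>
      <mets:file ID="S2" GROUPID="P2" MIMETYPE="image/jpeg"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def build_keys(root, match_tag, attrs):
    """Build one index per attribute, like declaring one xsl:key per attribute."""
    keys = {a: defaultdict(list) for a in attrs}
    for el in root.iter(match_tag):  # document order, as key() returns nodes
        for a in attrs:
            value = el.get(a)
            if value is not None:
                keys[a][value].append(el)
    return keys


def key_lookup(keys, name, value):
    """Rough analogue of key(name, value): every matching element, possibly none."""
    return keys[name].get(value, [])
```

With `k = build_keys(root, f"{{{METS_NS}}}file", ["GROUPID", "MIMETYPE"])`, looking up GROUPID `P1` returns both the master and service files for that page, which is the "any number of elements can match" property that plain ID/IDREF lacks.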
Uses for METS • From the humble Finding Aid … to …
METS and Finding Aids • Beyond the <dao> href pointer • Useful for managing complex image structure – e.g. multiple scans of multiple pages of letters • Holistic way to present descriptive metadata along with inline image (all in one package) • Also useful for presenting technical metadata that EAD does not yet accommodate
METS Pageturners • Create an HTML page or frameset with links to resources • Create navigable relationships between resources in a METS file • Create complex time-based media synchronizations
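The core of a page turner is a join: each `mets:div` in the structMap points through `mets:fptr/@FILEID` to a `mets:file`, whose `mets:FLocat/@xlink:href` is the link target. The sketch below performs that join in Python's ElementTree as a stand-in for the XSLT, and emits a flat HTML list of links; the two-page sample is hypothetical.

```python
import html
import xml.etree.ElementTree as ET

M = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
  xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="F1"><mets:FLocat LOCTYPE="URL" xlink:href="page1.jpg"/></mets:file>
    <mets:file ID="F2"><mets:FLocat LOCTYPE="URL" xlink:href="page2.jpg"/></mets:file>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="book" LABEL="Sample book">
      <mets:div TYPE="page" LABEL="Page 1"><mets:fptr FILEID="F1"/></mets:div>
      <mets:div TYPE="page" LABEL="Page 2"><mets:fptr FILEID="F2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def page_links(root):
    """Return an HTML list linking each structMap div's label to its file's URL."""
    # file ID -> href of its FLocat
    hrefs = {f.get("ID"): f.find(f"{M}FLocat").get(XLINK_HREF) for f in root.iter(f"{M}file")}
    items = []
    for div in root.iter(f"{M}div"):
        for fptr in div.findall(f"{M}fptr"):  # only direct fptr children of this div
            href = hrefs[fptr.get("FILEID")]
            label = div.get("LABEL", "")
            items.append(f'<li><a href="{html.escape(href)}">{html.escape(label)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

A frameset version would write the same links with a `target` attribute pointing at a content frame; the METS traversal stays the same.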
Sfquad.xml redux • Question: could XSLT mimic Java in rendering METS? • The answer at the time: no • Dynamic frame reloading was a special problem
N-YHS Edisto Album • Album of 77 images from the Civil War period • Logical structure: album – page – images • Two to four images per page • Presented with or without collapsible TOC
Tokyo Tribunal • Simple nested structure: JPEG page views of the Decision handed down by the Tokyo Tribunal • Collapsible TOC to unpack the logical structure of its various parts
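A collapsible TOC is a recursive walk over nested `mets:div` elements. The sketch below renders each div with children as an HTML `<details>` block and each leaf as a plain list item. It is written in Python's ElementTree as a stand-in for a recursive XSLT template; the `<details>` element is a choice made for this sketch, and the structMap labels are placeholders, not taken from either collection.

```python
import html
import xml.etree.ElementTree as ET

M = "{http://www.loc.gov/METS/}"

# Hypothetical logical structMap: placeholder labels only.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="logical">
    <mets:div LABEL="Document">
      <mets:div LABEL="Part 1">
        <mets:div LABEL="Page 1"/>
        <mets:div LABEL="Page 2"/>
      </mets:div>
      <mets:div LABEL="Part 2">
        <mets:div LABEL="Page 3"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def toc_item(div):
    """Render one mets:div and its descendants as a nested, collapsible list item."""
    label = html.escape(div.get("LABEL") or div.get("TYPE", ""))
    children = div.findall(f"{M}div")
    if not children:
        return f"<li>{label}</li>"  # leaf: e.g. a single page view
    inner = "".join(toc_item(c) for c in children)
    return f"<li><details><summary>{label}</summary><ul>{inner}</ul></details></li>"


def toc(root):
    """Render the top-level div of the first structMap as an HTML TOC."""
    top = root.find(f"{M}structMap/{M}div")
    return f"<ul>{toc_item(top)}</ul>"
```

Because the recursion follows whatever nesting the structMap has, the same function handles an album – page – image hierarchy or a document broken into parts.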
Afghanistan Digital Library • 40 books from 1871–1930 (400 eventually) • Simple structure: no chapters for the most part • METS Web viewer + PDF / CD version • Page images: TIFF at 600 dpi; service files at 98–100 dpi
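The gap between 600 dpi and 100 dpi is larger than it looks: pixel dimensions scale linearly with dpi, so pixel count scales with its square. The arithmetic sketch below assumes a hypothetical 8.5 × 11 inch page; the page size is illustrative, not taken from the collection.

```python
def inches_to_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a scan of a page of the given physical size."""
    return round(width_in * dpi), round(height_in * dpi)


def service_size(width_px, height_px, master_dpi=600, service_dpi=100):
    """Pixel dimensions of a service copy derived from a master scan of the same page."""
    # The physical page is fixed, so pixels scale by service_dpi / master_dpi.
    return (round(width_px * service_dpi / master_dpi),
            round(height_px * service_dpi / master_dpi))


master = inches_to_pixels(8.5, 11, 600)       # (5100, 6600)
service = service_size(*master)                # (850, 1100) at 100 dpi
service_98 = service_size(*master, service_dpi=98)  # (833, 1078) at 98 dpi
```

Here the master holds 5100 × 6600 = 33,660,000 pixels against 935,000 for the 100 dpi service file, a factor of (600/100)² = 36.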
CRL Political Web Archive • Collaboration between Stanford, Cornell, Texas, NYU, and IA under the aegis of CRL and Mellon • Sub-Saharan Africa, Southeast Asia, Latin America, Western Europe • Testbed: 400 URLs; websites of radical groups and NGOs • Internet Archive .arc files
Internet Archive .arc files • An .arc file is a 100 MB aggregate of harvested files, each stored with its HTTP headers and a crawler-generated header • Fine as a simple SIP, but basically unmanageable as an AIP or DIP • At present, content is accessed using byte offsets into the aggregate file • Only searchable by URL (Wayback Machine)
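Byte-offset access means seeking straight to a record and reading its header line, which ends with the length of the content that follows. The sketch below assumes an uncompressed ARC version 1 record whose header line is `URL IP-address archive-date content-type length`; real .arc files are often gzipped per record and begin with a `filedesc://` header record, neither of which this sketch handles.

```python
import io


def read_arc_record(f, offset):
    """Seek to a byte offset in an uncompressed ARC v1 file; return (header fields, payload)."""
    f.seek(offset)
    header = f.readline().decode("latin-1").rstrip("\n")
    # Split from the right so the four trailing fields are separated from the URL.
    url, ip, date, content_type, length = header.rsplit(" ", 4)
    # The declared length covers the stored HTTP response: headers plus body.
    payload = f.read(int(length))
    return {"url": url, "ip": ip, "date": date, "content_type": content_type}, payload


def make_record(url, date, content_type, body):
    """Build one hypothetical ARC v1 record for testing (IP from the documentation range)."""
    head = f"{url} 192.0.2.1 {date} {content_type} {len(body)}\n".encode("latin-1")
    return head + body + b"\n"
```

An index of (URL, offset) pairs is all this access method needs, which is also why it only supports lookup by URL.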
Can METS save .arc? • One solution: a METS file for each website contained in an .arc • At the collection level, an ur-METS file to manage the different versions of a website captured on different dates in different .arcs • Alternatively, a METS file for each .arc, delineating content by byte offset? Naah.
It’s the Structure, Silly • Ur-METS with <METS:mptr> to versions (cf. serials model) • Web-archiving access models to date have failed because they index at the page level only • Netarkivet.dk: NWA Document Format, one XML document for each page, indexed by FAST • Results: thousands of hits and no context.
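The ur-METS idea can be sketched as a structMap whose divs each hold a `mets:mptr` pointing at the per-version METS file for one capture date. The Python below builds such a document with ElementTree; the site label, dates, and file names are hypothetical, and a real ur-METS would also carry a header and descriptive metadata.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def ur_mets(site_label, versions):
    """Build an ur-METS string; versions is a list of (capture_date, href) pairs."""
    root = ET.Element(f"{{{METS}}}mets")
    smap = ET.SubElement(root, f"{{{METS}}}structMap")
    site = ET.SubElement(smap, f"{{{METS}}}div", TYPE="website", LABEL=site_label)
    # ISO-style date strings sort chronologically as plain strings.
    for date, href in sorted(versions):
        v = ET.SubElement(site, f"{{{METS}}}div", TYPE="version", LABEL=date)
        # mptr points at another METS document rather than at a content file.
        ET.SubElement(v, f"{{{METS}}}mptr", {"LOCTYPE": "URL", f"{{{XLINK}}}href": href})
    return ET.tostring(root, encoding="unicode")
```

Because each version is its own div, an interface can index and present a site as a whole, with its captures in order, rather than as isolated page-level hits.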