METS at UCB Themes in the Implementation of METS Rick Beaubien UC Berkeley Library
METS Themes • METS creation • Gathering metadata: structural, descriptive, administrative • Generating METS objects • METS Repositories • Providing search access to METS objects • Presenting METS objects
METS Themes (2) • METS sharing • Sharing METS objects between METS Repositories • Sharing METS objects across standards and communities.
Brief History: 3 eras • Paleozoic: pre-MOA2 (pre-1997) • 1994: California Heritage; preliminary DB • 1996: Honeyman Collection: Desc. MD elements defined • Mesozoic: MOA2 (1997-2001) • Consistent, non-central DB: struct., desc. & tech. md • MOA2 presentation tool • Cenozoic: METS (2001-2004) • Central database: struct., desc. & tech. md • Expanding, adapting MOA2 tools
METS Projects at UCB • Archival collections: Project-based, grant-funded initiatives • New projects • Data Rescue projects • Stored materials: Online access to tables of contents • Technical Reports: heir of Dienst
METS Creation at UCB • Centralized approach • Centralized database with associated metadata input and METS generation modules • Dispersed approach • Perl scripts—mostly ad hoc—extract necessary metadata from multiple sources
METS creation: Centralized Approach • Components: • GenDB: • Relational database • Web-based input modules • Batch load input modules • GenX • Generates METS objects from GenDB.
METS creation: GenDB Functionality • Metadata gathering tool • Facilitates input of structural & descriptive md (manual & batch) • Processing control tool • Guides the digitization process by vendors (image, transcription/structured text content) • Imports content file names and associated technical metadata coming out of digitization process
GenDB: Typical Flow [Diagram; recoverable labels: User, Structural Metadata + Descriptive Metadata, GenDB, Imaging/Transcription Work Orders, Vendor, Technical MD Spreadsheets, SQL Server Database, METS]
METS creation: Centralized: GenDB Database • SQL Server database (Relational database) • Structural, descriptive and admin md recorded in flat table structure: • GenDB element sets antedate METS, MODS, etc. • MOA2.DTD determined structural metadata. • Multiple standards influenced descriptive md element set. • Accommodated to MODS • MOA2.DTD determined image technical md element set • Accommodated to MIX
METS creation: Centralized: GenDB Web UI • Web based interface for manual input • Java Servlet Driven • Java Server Backend • Key Features • Configurable (by Project Managers) • Shields users from the complexities of METS and standards-specific vocabularies
METS creation: Centralized: GenDB Batch Interface • Components: • Batchload schema to which MD to be loaded must conform. • Batch processor (Java module) • Other components shared with Web Interface • When useful • Anytime input can be programmatically generated from existing sources
METS creation: Centralized: GenX: Generating METS • Components: • Java Program with graphical UI • Key Features • Shows list of Objects available for export • On demand, queries database to gather md pertaining to selected object(s) and packages it as METS with MODS, MIX extensions
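The packaging step described for GenX can be illustrated with a minimal sketch. This is not GenX itself (which is a Java program reading from GenDB). It is a hypothetical Python function, `build_mets`, that takes values a database query might return (an object ID, a title, and an ordered list of page images) and wraps them as a METS document with a MODS descriptive section, a fileSec, and a physical structMap. The function name, the ID scheme (`DMD1`, `FILE1`, ...), and the `USE="image"` value are assumptions chosen for illustration. MIX technical metadata is left out to keep the example short.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Return a namespace-qualified tag in ElementTree's {ns}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(obj_id, title, pages):
    """Package one object as METS (hypothetical helper, not GenX).

    pages is a list of (label, href) tuples in reading order.
    """
    mets = ET.Element(q(METS_NS, "mets"), {"OBJID": obj_id})

    # Descriptive metadata: a MODS record wrapped in dmdSec/mdWrap/xmlData.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = title

    # File inventory and physical structure are built in parallel so that
    # every page div points at exactly one file.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "image"})
    struct = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "physical"})
    root_div = ET.SubElement(
        struct, q(METS_NS, "div"), {"LABEL": title, "DMDID": "DMD1"}
    )
    for n, (label, href) in enumerate(pages, 1):
        file_id = f"FILE{n}"
        f = ET.SubElement(grp, q(METS_NS, "file"), {"ID": file_id})
        ET.SubElement(
            f,
            q(METS_NS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK_NS, "href"): href},
        )
        div = ET.SubElement(
            root_div, q(METS_NS, "div"), {"ORDER": str(n), "LABEL": label}
        )
        ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return mets


if __name__ == "__main__":
    doc = build_mets(
        "example-0001",
        "Sample album",
        [("Page 1", "http://example.org/p1.jpg"),
         ("Page 2", "http://example.org/p2.jpg")],
    )
    print(ET.tostring(doc, encoding="unicode"))
```

A real generator would also emit an amdSec carrying MIX technical metadata for each image and link it from each file through the ADMID attribute.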
METS creation: Centralized: Reality Check • Main GenDB limitations • Better at physical than logical structuring • No support yet for video/audio content • Redundant Keying of DescMD • No Collection level input
METS Creation: Dispersed Approach • When used • Any time requisite metadata and content files are already available and just need to be harvested and packaged • Legacy databases • Projects not requiring GenDB to control digitization process • Method: Ad hoc Perl scripts gather md & package as METS. • Why used: • Expedient. Lots of Perl programming expertise.
Stored Materials Project Flow [Diagram with numbered steps 1-6; recoverable labels: JPEG TOC scans, Perl Script 1, Perl Script 2, GLADIS Catalog, MARC Records, MARCtoMODS, MODS Records, METS Objects]
METS Creation: Future Trends • Trend toward centralization will continue; replace dispersed approach • Batch interface can handle most “dispersed” situations • Makes future maintenance easy • Helps ensure consistency in METS output
METS Creation: Common Issues • Immaturity/Lack of Extension schemas • Problems for expressing MD • Problems for gathering MD • METS & related schema status • METS stable • Descriptive Metadata: MODS, DC Simple, MARCXML • Technical MD: still immature, if available at all
METS Access • Main sub-themes • Discovery • Presentation of content & associated metadata
METS Access: Discovery: Search Support at UCB • No centralized search support for our METS/MOA2 repository • Current discovery mechanisms: • Online catalog links • Finding Aids, OAC supported searching • Project home pages and Finding Tools
METS Access: Discovery: Projected Support • Options considered: • Tamino/XML database • Abandoned • Too many limits on XML support • Still have to build search interface from scratch • Cheshire • Greenstone
METS Access: Discovery: Cheshire Option • What is it: • Developed by Ray Larson at U.C. Berkeley • “next-generation online catalog and full-text information retrieval system using advanced IR techniques” • Advantages: • Free • Indexes “hub documents” (like METS) and content files where they reside • Very sophisticated searching/ranking algorithms including Boolean • OAI interface
METS Access: Discovery: Cheshire Option (2) • Disadvantages: • Does not support Unicode yet • coming in version 3 • Limited collection management support: • Adding collections • Developing search interface • No object-level presentation support
METS Access: Discovery: Greenstone Option • What is it: • Developed by New Zealand Digital Library Project at University of Waikato • “suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM.” • Advantages • Free/Open Source • Next version will be METS-based • Strong collection management support
METS Access: Discovery: Greenstone Option (2) • Advantages (cont’d) • Unicode support now • Fairly sophisticated search support • Some presentation support • OAI support in progress • Disadvantages • Does not index objects where they reside • This limitation may apply to METS-based version as well
METS Access: Presentation: GenView • Java-based Software suite developed at UCB for MOA2/METS presentation • History • Originates in Making of America II (1997) • XSLT in infancy • Web Services non-existent • CORBA/RMI and servlet technology were “hot” • GenView originally supported MOA2 objects • GenView adapted to accommodate METS
GenView: Basic Architecture [Diagram; recoverable labels: Web Interface, Java Servlet, XSLT, RMI, Repository Manager (java), METS Java Object, METS Java Objects, METS XML Documents]
METS Access: Presentation: GenView Evaluated • Advantages: • It exists… • Presentation very efficient • Meets basic presentation needs well • Disadvantages • Geared towards image/native browser content • Limited configuration options • Complex; difficult to maintain
METS Access: Presentation: GenView In Context • XSLT-based approaches to METS presentation: • NYU: Native METS • Library of Congress: Transformed METS • University of Chicago: • Prebuilding HTML pages as part of an XSLT transformation to load METS objects into Greenstone.
Sharing METS Objects • Sharing METS objects between METS repositories • Plea for Profiles • Sharing METS objects across standards • METS and Learning objects standards
METS Sharing: METS to METS: METS as Transfer Syntax • METS, like MARC, can function as transfer syntax • Problem: METS offers much more leeway to implementation than MARC • Key areas of variations: • Structure of <fileSec> and <structMap> and relations between the two • Extension Schemas used & required elements • Attribute vocabularies • mets/@TYPE • fileGrp/@USE, file/@USE
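The areas of variation listed above are the kind of thing a submission profile would pin down. As a rough illustration, the sketch below checks a METS document against a toy profile: a set of allowed mets/@TYPE values, a set of allowed @USE values for fileGrp and file, and the requirement that every structMap fptr resolves to a file in the fileSec. The function name `check_profile` and the idea of expressing a profile as two Python sets are assumptions made for this example. Actual METS profiles are prose-and-XML documents, not code.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"


def check_profile(root, allowed_types, allowed_uses):
    """Return a list of human-readable profile violations (hypothetical check)."""
    problems = []

    # Attribute vocabulary: mets/@TYPE
    obj_type = root.get("TYPE")
    if obj_type not in allowed_types:
        problems.append(f"mets/@TYPE {obj_type!r} not in profile")

    # Attribute vocabulary: fileGrp/@USE is required by this toy profile.
    for grp in root.iter(METS + "fileGrp"):
        use = grp.get("USE")
        if use not in allowed_uses:
            problems.append(f"fileGrp/@USE {use!r} not in profile")

    # file/@USE is checked only where it is present.
    for f in root.iter(METS + "file"):
        use = f.get("USE")
        if use is not None and use not in allowed_uses:
            problems.append(f"file/@USE {use!r} not in profile")

    # Relation between structMap and fileSec: every fptr must resolve.
    file_ids = {f.get("ID") for f in root.iter(METS + "file")}
    for fptr in root.iter(METS + "fptr"):
        file_id = fptr.get("FILEID")
        if file_id not in file_ids:
            problems.append(f"fptr/@FILEID {file_id!r} has no matching file/@ID")

    return problems


if __name__ == "__main__":
    sample = """<mets xmlns="http://www.loc.gov/METS/" TYPE="book">
      <fileSec><fileGrp USE="thumbnail"><file ID="F1"/></fileGrp></fileSec>
      <structMap><div><fptr FILEID="F1"/><fptr FILEID="F9"/></div></structMap>
    </mets>"""
    for problem in check_profile(ET.fromstring(sample), {"book"}, {"image"}):
        print(problem)
```

Two repositories that agree on such vocabularies and linking rules can exchange objects without per-partner conversion code, which is the practical argument for profiles.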
METS Sharing: METS to METS: Sharing in UC System • Not a theoretical goal but a reality • All UC campus libraries participate in OAC/CDL • Moving towards profiles: • Common starting point: MOA2 • Working groups under auspices of OAC • Desired Result: Submission Profiles
METS Sharing: Across Standards: METS and other standards • METS originates in library world • especially suited to library needs • Focus/primary concerns of other communities somewhat different: • developing their own digital object standards • Does this matter and why?
METS Sharing: Across Standards: METS and IMS-CP • IMS Global Learning Consortium developing learning object standards: • IMS-CP: analogous to METS • Goal: enable production of learning objects that can be played in IMS standards-savvy tools • Importance of compatibility with METS: • Incorporating library resources (METS) into learning objects • Archiving learning objects in METS-based repositories
METS Sharing: Across Standards: UCB Library Efforts • METS/IMS-CP Cross Walk project • Headed by Raymond Yee of Interactive University at UCB • Results of effort thus far: • Analysis of key similarities and differences between two schemas • Preliminary x-walk • Published in Library Hi Tech
METS Sharing: Across Standards: UCB Library Efforts (2) • Summary of analysis: • Two schemas share many high level similarities: • Hierarchical “structMap” • “fileSec” for inventorying resources; referenced from “structMap” • Accommodation for MD defined by other schemas • Key difference: IMS-CP does not distinguish between presentation and content • Future: • Standards Merge? • Some provision for sharing across communities
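The high-level similarities summarized above suggest what the skeleton of a crosswalk might look like. The sketch below is not the published UCB crosswalk. It is a simplified, hypothetical mapping in which each METS file becomes an IMS-CP resource, each structMap becomes an organization, and each div becomes an item whose identifierref points at the first file the div references. The function name `mets_to_imscp`, the generated identifiers (`ORG1`, `ITEM1`, ...), and the fixed resource type `webcontent` are assumptions for illustration. Embedded metadata is ignored.

```python
import itertools
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
CP = "{http://www.imsglobal.org/xsd/imscp_v1p1}"


def mets_to_imscp(mets_root):
    """Map a METS tree to a bare IMS-CP manifest (simplified sketch)."""
    manifest = ET.Element(
        CP + "manifest", {"identifier": mets_root.get("OBJID", "MANIFEST1")}
    )
    orgs = ET.SubElement(manifest, CP + "organizations")
    resources = ET.SubElement(manifest, CP + "resources")

    # fileSec inventory -> resources inventory (file IDs are reused as-is).
    for f in mets_root.iter(METS + "file"):
        loc = f.find(METS + "FLocat")
        href = loc.get(XLINK_HREF, "") if loc is not None else ""
        res = ET.SubElement(
            resources,
            CP + "resource",
            {"identifier": f.get("ID", ""), "type": "webcontent", "href": href},
        )
        ET.SubElement(res, CP + "file", {"href": href})

    counter = itertools.count(1)

    def add_items(div, parent):
        # An item carries a single identifierref, so only the first fptr of
        # a div is kept; further fptrs are dropped in this sketch.
        attrs = {"identifier": f"ITEM{next(counter)}"}
        fptr = div.find(METS + "fptr")
        if fptr is not None:
            attrs["identifierref"] = fptr.get("FILEID", "")
        item = ET.SubElement(parent, CP + "item", attrs)
        ET.SubElement(item, CP + "title").text = div.get("LABEL", "")
        for child in div.findall(METS + "div"):
            add_items(child, item)

    # Hierarchical structMap -> hierarchical organization.
    for n, smap in enumerate(mets_root.findall(METS + "structMap"), 1):
        org = ET.SubElement(orgs, CP + "organization", {"identifier": f"ORG{n}"})
        for div in smap.findall(METS + "div"):
            add_items(div, org)

    return manifest
```

The one-fptr-per-item simplification shows where the two schemas stop lining up: a METS div can point at several files for the same content, while this sketch has to pick one.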
Links • California Heritage Collection. http://sunsite.berkeley.edu/CalHeritage/ • MOA2 Project. http://sunsite.berkeley.edu/moa2/ • GenDB Web Interface Demo. http://sunsite2.berkeley.edu/GenDB (Account: demoman; Password: demoman)
Links • MODS. http://www.loc.gov/mods • MIX. http://www.loc.gov/mix • Cheshire II. http://cheshire.lib.berkeley.edu/ • Greenstone http://www.greenstone.org/cgi-bin/library
Links • GenView demo. http://metsviewer.lib.berkeley.edu/metstest/BreenMETS.xml • NYU METS Page-Turner. http://dlib.nyu.edu/metstools/ • U. Chicago Chopin Early Editions (Greenstone-based collection). http://chopin.lib.uchicago.edu/
Links • IMS-CP. http://www.imsglobal.org/content/packaging/index.cfm • METS/Educational Technology Interoperability: http://iu.berkeley.edu/crosswalk/