This presentation explores themes in the implementation of METS at UC Berkeley: the centralized and dispersed approaches to METS creation, three historical eras, METS projects, GenDB functionality, GenX, and future trends. It contrasts a central database with associated metadata input and METS generation modules against Perl scripts that extract metadata from multiple sources. Topics include the workflow for structural and descriptive metadata, the batch interface, GenX for generating METS objects, and common issues in METS creation, such as the immaturity of extension schemas. It also covers METS access at UCB (discovery and presentation of content and associated metadata), noting the lack of centralized search support and describing current discovery mechanisms, and closes with sharing METS objects between repositories and across standards.
METS at UCB: Themes in the Implementation of METS. Rick Beaubien, UC Berkeley Library
METS Themes • METS creation • Gathering metadata: structural, descriptive, administrative • Generating METS objects • METS Repositories • Providing search access to METS objects • Presenting METS objects
METS Themes (2) • METS sharing • Sharing METS objects between METS Repositories • Sharing METS objects across standards and communities.
Brief History: 3 eras • Paleozoic: pre-MOA2 (pre-1997) • 1994: California Heritage; preliminary DB • 1996: Honeyman Collection: Desc. MD elements defined • Mesozoic: MOA2 (1997-2001) • Consistent, non-central DB: struct., desc. & tech. md • MOA2 presentation tool • Cenozoic: METS (2001-2004) • Central database: struct., desc. & tech. md • Expanding, adapting MOA2 tools
METS Projects at UCB • Archival collections: Project-based, grant-funded initiatives • New projects • Data Rescue projects • Stored materials: Online access to tables of contents • Technical Reports: heir of Dienst
METS Creation at UCB • Centralized approach • Centralized database with associated metadata input and METS generation modules • Dispersed approach • Perl scripts—mostly ad hoc—extract necessary metadata from multiple sources
METS Creation: Centralized Approach • Components: • GenDB: • Relational database • Web-based input modules • Batch load input modules • GenX • Generates METS objects from GenDB
METS Creation: GenDB Functionality • Metadata gathering tool • Facilitates input of structural & descriptive md (manual & batch) • Processing control tool • Guides the digitization process by vendors (image, transcription/structured text content) • Imports content file names and associated technical metadata coming out of the digitization process
GenDB: Typical Flow [Diagram: GenDB user (structural + descriptive metadata); imaging/transcription work orders to vendor; technical MD spreadsheets; SQL Server database; METS output]
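GenDB imports content file names and technical metadata coming out of the digitization process. A minimal sketch of that reconciliation step follows, in Python rather than GenDB's Java, assuming the vendor delivers its technical-MD spreadsheet as CSV with hypothetical columns `filename`, `width`, `height`, and `bits_per_sample`. It checks the rows against the file names a work order expected, so gaps and surprises surface before anything is loaded.

```python
import csv
import io

# Hypothetical columns for a vendor's technical-MD spreadsheet saved as CSV.
REQUIRED_COLUMNS = {"filename", "width", "height", "bits_per_sample"}


def import_tech_md(csv_text, expected_files):
    """Match vendor technical-MD rows against the file names a work order expects.

    Returns (records, missing, unexpected):
      records    -- file name -> dict of integer image properties
      missing    -- expected files with no spreadsheet row
      unexpected -- rows for files the work order did not request
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    absent = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if absent:
        raise ValueError(f"spreadsheet lacks columns: {sorted(absent)}")
    records = {}
    for row in reader:
        records[row["filename"].strip()] = {
            "width": int(row["width"]),
            "height": int(row["height"]),
            "bits_per_sample": int(row["bits_per_sample"]),
        }
    expected, found = set(expected_files), set(records)
    return records, sorted(expected - found), sorted(found - expected)
```

Reporting missing and unexpected files separately mirrors the two ways a vendor delivery can diverge from a work order.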
METS Creation (Centralized): GenDB Database • SQL Server database (relational) • Structural, descriptive and admin md recorded in a flat table structure: • GenDB element sets antedate METS, MODS, etc. • MOA2 DTD determined structural metadata • Multiple standards influenced the descriptive md element set • Accommodated to MODS • MOA2 DTD determined the image technical md element set • Accommodated to MIX
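The descriptive element set was accommodated to MODS. One way to picture that accommodation: each flat column maps onto a MODS wrapper element and leaf element. A sketch, assuming hypothetical column names (`title`, `creator`, `date`, `subject`); GenDB's real columns and mapping rules are not given here.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical flat descriptive columns and the MODS element pair each maps to
# (wrapper element, leaf element). Illustrative only, not GenDB's actual mapping.
FIELD_MAP = {
    "title": ("titleInfo", "title"),
    "creator": ("name", "namePart"),
    "date": ("originInfo", "dateIssued"),
    "subject": ("subject", "topic"),
}


def row_to_mods(row):
    """Build a MODS record from one flat-table row, skipping empty columns."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for column, (wrapper, leaf) in FIELD_MAP.items():
        value = row.get(column)
        if not value:
            continue
        outer = ET.SubElement(mods, f"{{{MODS_NS}}}{wrapper}")
        ET.SubElement(outer, f"{{{MODS_NS}}}{leaf}").text = value
    return mods
```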
METS Creation (Centralized): GenDB Web UI • Web-based interface for manual input • Java servlet driven • Java server backend • Key features • Configurable (by project managers) • Shields users from the complexities of METS and standards-specific vocabularies
METS Creation (Centralized): GenDB Batch Interface • Components: • Batchload schema to which the MD to be loaded must conform • Batch processor (Java module) • Other components shared with the Web interface • When useful • Any time input can be programmatically generated from existing sources
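The batchload schema's role, rejecting input that does not conform before it reaches the database, can be sketched as follows. In this assumed stand-in the schema is a Python dict of required fields and controlled vocabularies (`BATCH_SCHEMA`, with hypothetical fields such as `object_id` and `object_type`), not the actual XML schema GenDB uses.

```python
# Hypothetical stand-in for the batch-load schema: required fields plus
# controlled vocabularies. The real GenDB batchload schema is an XML schema.
BATCH_SCHEMA = {
    "required": ["object_id", "title", "page_count"],
    "vocab": {"object_type": {"text", "image", "mixed"}},
}


def validate_batch(records, schema=BATCH_SCHEMA):
    """Return a list of (record index, message) problems; empty means loadable."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        for field in schema["required"]:
            if rec.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        for field, allowed in schema["vocab"].items():
            if field in rec and rec[field] not in allowed:
                problems.append((i, f"bad {field}: {rec[field]!r}"))
        oid = rec.get("object_id")
        if oid in seen_ids:
            problems.append((i, f"duplicate object_id {oid}"))
        if oid:
            seen_ids.add(oid)
    return problems
```

Collecting every problem instead of stopping at the first lets whoever generated the batch fix it in one pass.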
METS Creation (Centralized): Generating METS with GenX • Components: • Java program with graphical UI • Key features • Shows list of objects available for export • On demand, queries the database to gather md pertaining to the selected object(s) and packages it as METS with MODS and MIX extensions
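The GenX step (query the database for one object's metadata, then package it as METS) can be sketched with SQLite and ElementTree. The `objects` and `pages` tables are hypothetical, and the output is a minimal METS document: a `dmdSec` wrapping MODS, a `fileSec` with one image per page, and a `structMap` whose page divs point at those files through `fptr`. MIX technical metadata is omitted for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    return f"{{{ns}}}{tag}"


def export_object(conn: sqlite3.Connection, object_id: str) -> ET.Element:
    """Gather one object's metadata from hypothetical tables and package it as METS."""
    row = conn.execute("SELECT title FROM objects WHERE object_id = ?", (object_id,)).fetchone()
    if row is None:
        raise KeyError(object_id)
    title = row[0]
    pages = conn.execute(
        "SELECT seq, label, filename FROM pages WHERE object_id = ? ORDER BY seq",
        (object_id,),
    ).fetchall()

    mets = ET.Element(q(METS, "mets"), {"OBJID": object_id, "TYPE": "text"})
    # Descriptive metadata: MODS wrapped inline in a dmdSec.
    dmd = ET.SubElement(mets, q(METS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), {"MDTYPE": "MODS"})
    mods = ET.SubElement(ET.SubElement(wrap, q(METS, "xmlData")), q(MODS, "mods"))
    ET.SubElement(ET.SubElement(mods, q(MODS, "titleInfo")), q(MODS, "title")).text = title
    # File inventory and structural map, built together so IDs stay in sync.
    grp = ET.SubElement(ET.SubElement(mets, q(METS, "fileSec")), q(METS, "fileGrp"), {"USE": "image"})
    smap = ET.SubElement(mets, q(METS, "structMap"))
    book = ET.SubElement(smap, q(METS, "div"), {"TYPE": "book", "DMDID": "DMD1", "LABEL": title})
    for seq, label, filename in pages:
        file_id = f"FILE{seq:04d}"
        f = ET.SubElement(grp, q(METS, "file"), {"ID": file_id, "MIMETYPE": "image/jpeg"})
        ET.SubElement(f, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): filename})
        page = ET.SubElement(book, q(METS, "div"), {"TYPE": "page", "ORDER": str(seq), "LABEL": label})
        ET.SubElement(page, q(METS, "fptr"), {"FILEID": file_id})
    return mets
```

`ET.tostring(export_object(conn, "obj1"), encoding="unicode")` serializes the result. Building the `fileSec` and `structMap` in the same loop keeps every `fptr/@FILEID` consistent with a `file/@ID`.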
METS Creation (Centralized): Reality Check • Main GenDB limitations • Better at physical than logical structuring • No support yet for video/audio content • Redundant keying of descriptive MD • No collection-level input
METS Creation: Dispersed Approach • When used • Any time the requisite metadata and content files are already available and just need to be harvested and packaged • Legacy databases • Projects not requiring GenDB to control the digitization process • Method: ad hoc Perl scripts gather md & package it as METS • Why used: • Expedient; lots of Perl programming expertise
Stored Materials Project Flow [Diagram, numbered steps: JPEG TOC scans; Perl Script 1; Perl Script 2; GLADIS catalog; MARC records; MARCtoMODS; MODS records; METS objects]
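One chore in a dispersed workflow like this one is matching scan files to catalog records before any METS can be packaged. A Python sketch (the project itself used Perl), assuming a hypothetical naming convention of `<record id>_<sequence>.jpg` for the TOC scans:

```python
import re
from collections import defaultdict

# Hypothetical naming convention for TOC scans: <record id>_<sequence>.jpg,
# e.g. "b1234567_002.jpg". Adjust the pattern to the real convention.
SCAN_NAME = re.compile(r"^(?P<rec>[a-z]\d+)_(?P<seq>\d+)\.jpe?g$", re.IGNORECASE)


def group_scans(filenames):
    """Group scan file names by record id, pages in numeric order.

    Names that do not follow the convention are returned separately for review.
    """
    groups = defaultdict(list)
    rejects = []
    for name in filenames:
        m = SCAN_NAME.match(name)
        if m is None:
            rejects.append(name)
            continue
        groups[m.group("rec")].append((int(m.group("seq")), name))
    ordered = {rec: [name for _, name in sorted(pages)] for rec, pages in groups.items()}
    return ordered, rejects
```

Sorting on the integer sequence rather than the file name keeps `_10` after `_2`.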
METS Creation: Future Trends • Trend toward centralization will continue, replacing the dispersed approach • Batch interface can handle most “dispersed” situations • Eases future maintenance • Helps ensure consistency in METS output
METS Creation: Common Issues • Immaturity or lack of extension schemas • Problems for expressing MD • Problems for gathering MD • METS & related schema status • METS: stable • Descriptive metadata: MODS, DC Simple, MARCXML • Technical MD: still immature, if available at all
METS Access • Main sub-themes • Discovery • Presentation of content & associated metadata
METS Access (Discovery): Search Support at UCB • No centralized search support for our METS/MOA2 repository • Current discovery mechanisms: • Online catalog links • Finding aids and OAC-supported searching • Project home pages and finding tools
METS Access (Discovery): Projected Support • Options considered: • Tamino XML database • Abandoned • Too many limits on XML support • Search interface would still have to be built from scratch • Cheshire • Greenstone
METS Access (Discovery): Cheshire Option • What is it: • Developed by Ray Larson at UC Berkeley • “next-generation online catalog and full-text information retrieval system using advanced IR techniques” • Advantages: • Free • Indexes “hub documents” (like METS) and content files where they reside • Very sophisticated searching/ranking algorithms, including Boolean • OAI interface
METS Access (Discovery): Cheshire Option (2) • Disadvantages: • Does not support Unicode yet • Coming in version 3 • Limited collection management support: • Adding collections • Developing search interface • No object-level presentation support
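Cheshire's own indexing and ranking are far more sophisticated, but the core idea of indexing METS “hub documents” for Boolean retrieval can be sketched with a plain inverted index. This assumes the indexed text is simply all MODS text found in each METS document; `hub_text` and `BooleanIndex` are hypothetical names, not Cheshire's API.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

MODS_TAG = "{http://www.loc.gov/mods/v3}mods"


def hub_text(mets_xml):
    """Collect all text inside MODS records embedded in a METS document."""
    root = ET.fromstring(mets_xml)
    return " ".join(t for mods in root.iter(MODS_TAG) for t in mods.itertext())


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class BooleanIndex:
    """Minimal inverted index with AND / OR queries over document ids (no ranking)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search_and(self, query):
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def search_or(self, query):
        result = set()
        for term in tokenize(query):
            result |= self.postings.get(term, set())
        return result
```

Using `.get` rather than indexing the `defaultdict` keeps queries for unknown terms from adding empty entries to the index.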
METS Access (Discovery): Greenstone Option • What is it: • Developed by the New Zealand Digital Library Project at the University of Waikato • “suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM.” • Advantages • Free/open source • Next version will be METS-based • Strong collection management support
METS Access (Discovery): Greenstone Option (2) • Advantages (cont’d) • Unicode support now • Fairly sophisticated search support • Some presentation support • OAI support in progress • Disadvantages • Does not index objects where they reside • This limitation may apply to the METS-based version as well
METS Access (Presentation): GenView • Java-based software suite developed at UCB for MOA2/METS presentation • History • Originates in Making of America II (1997) • XSLT in its infancy • Web Services non-existent • CORBA/RMI and servlet technology were “hot” • GenView originally supported MOA2 objects • GenView adapted to accommodate METS
GenView: Basic Architecture [Diagram components: Java servlet web interface; RMI; XSLT; Repository Manager (Java); METS XML documents; METS Java objects]
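The Repository Manager's job of turning METS XML documents into in-memory objects for presentation can be sketched in Python (GenView itself is Java, served over RMI). This version assumes page-level divs carry `TYPE="page"` and an `ORDER` attribute, and that display images sit in a `fileGrp` with `USE="image"`; both are local conventions that vary between implementations.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

NS = {"mets": "http://www.loc.gov/METS/"}
HREF = "{http://www.w3.org/1999/xlink}href"


@dataclass
class Page:
    order: int
    label: str
    hrefs: list = field(default_factory=list)


def load_pages(mets_xml, use="image"):
    """Build an ordered page list from a METS structMap and fileSec."""
    root = ET.fromstring(mets_xml)
    # Map file IDs in fileGrps with the requested USE to their FLocat hrefs.
    locations = {}
    for grp in root.iterfind("mets:fileSec//mets:fileGrp", NS):
        if grp.get("USE") != use:
            continue
        for f in grp.iterfind("mets:file", NS):
            loc = f.find("mets:FLocat", NS)
            if loc is not None:
                locations[f.get("ID")] = loc.get(HREF)
    # Resolve each page div's fptr elements against that map.
    pages = []
    for div in root.iterfind(".//mets:structMap//mets:div[@TYPE='page']", NS):
        hrefs = [locations[p.get("FILEID")] for p in div.iterfind("mets:fptr", NS)
                 if p.get("FILEID") in locations]
        pages.append(Page(int(div.get("ORDER", "0")), div.get("LABEL", ""), hrefs))
    pages.sort(key=lambda p: p.order)
    return pages
```

A page-turner view then needs only the ordered `Page` list, which is the kind of pre-digested object the architecture caches instead of re-parsing XML per request.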
METS Access (Presentation): GenView Evaluated • Advantages: • It exists… • Presentation very efficient • Meets basic presentation needs well • Disadvantages • Geared towards image/native browser content • Limited configuration options • Complex; difficult to maintain
METS Access (Presentation): GenView in Context • XSLT-based approaches to METS presentation: • NYU: native METS • Library of Congress: transformed METS • University of Chicago: • Prebuilds HTML pages as part of an XSLT transformation that loads METS objects into Greenstone
Sharing METS Objects • Sharing METS objects between METS repositories • Plea for Profiles • Sharing METS objects across standards • METS and Learning objects standards
METS Sharing (METS to METS): METS as Transfer Syntax • METS, like MARC, can function as a transfer syntax • Problem: METS gives implementers much more leeway than MARC • Key areas of variation: • Structure of <fileSec> and <structMap> and the relations between the two • Extension schemas used & required elements • Attribute vocabularies • mets/@TYPE • fileGrp/@USE, file/@USE
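A profile pins down exactly these areas of variation. The sketch below checks a METS document against a hypothetical profile (`PROFILE`) that fixes the `mets/@TYPE` and `fileGrp/@USE` vocabularies, and also verifies that every `fptr/@FILEID` in the `structMap` resolves to a `file` in the `fileSec`. The vocabulary values are invented for illustration; they do not come from any published METS profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
NS = {"mets": METS_NS}

# Hypothetical profile: vocabularies two repositories might agree on.
PROFILE = {
    "mets_types": {"text", "image", "archival"},
    "filegrp_uses": {"master", "reference", "thumbnail"},
}


def check_profile(mets_xml, profile=PROFILE):
    """Return a list of profile violations found in one METS document."""
    root = ET.fromstring(mets_xml)
    problems = []
    mets_type = root.get("TYPE")
    if mets_type not in profile["mets_types"]:
        problems.append(f"mets/@TYPE {mets_type!r} not in profile")
    file_ids = set()
    for grp in root.iter(f"{{{METS_NS}}}fileGrp"):
        use = grp.get("USE")
        if use not in profile["filegrp_uses"]:
            problems.append(f"fileGrp/@USE {use!r} not in profile")
        file_ids.update(f.get("ID") for f in grp.iterfind("mets:file", NS))
    # Structural integrity: every fptr must point at an inventoried file.
    for fptr in root.iter(f"{{{METS_NS}}}fptr"):
        if fptr.get("FILEID") not in file_ids:
            problems.append(f"fptr FILEID {fptr.get('FILEID')!r} has no matching file")
    return problems
```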
METS Sharing (METS to METS): Sharing in the UC System • Not a theoretical goal but a reality • All UC campus libraries participate in OAC/CDL • Moving towards profiles: • Common starting point: MOA2 • Working groups under the auspices of OAC • Desired result: submission profiles
METS Sharing (Across Standards): METS and Other Standards • METS originates in the library world • Especially suited to library needs • Other communities have somewhat different focuses and primary concerns: • Developing their own digital object standards • Does this matter, and why?
METS Sharing (Across Standards): METS and IMS-CP • IMS Global Learning Consortium developing learning object standards: • IMS-CP: analogous to METS • Goal: enable production of learning objects that can be played in IMS standards-savvy tools • Importance of compatibility with METS: • Incorporating library resources (METS) into learning objects • Archiving learning objects in METS-based repositories
METS Sharing (Across Standards): UCB Library Efforts • METS/IMS-CP Crosswalk project • Headed by Raymond Yee of Interactive University at UCB • Results of effort thus far: • Analysis of key similarities and differences between the two schemas • Preliminary crosswalk • Published in Library Hi Tech
METS Sharing (Across Standards): UCB Library Efforts (2) • Summary of analysis: • The two schemas share many high-level similarities: • Hierarchical “structMap” • “fileSec” for inventorying resources, referenced from “structMap” • Accommodation for MD defined by other schemas • Key difference: IMS-CP does not distinguish between presentation and content • Future: • Standards merge? • Some provision for sharing across communities
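The high-level parallels in this analysis (structMap and IMS-CP organizations, fileSec and resources) suggest how a crosswalk can work mechanically. A rough sketch, not the published UCB crosswalk: it assumes the IMS-CP 1.1 namespace, types every resource as `webcontent`, and uses generated item identifiers (`ITEM1`, `ITEM2`, and so on).

```python
import itertools
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
CP = "http://www.imsglobal.org/xsd/imscp_v1p1"  # assumed IMS-CP 1.1 target namespace
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
ET.register_namespace("imscp", CP)


def m(tag):
    return f"{{{METS}}}{tag}"


def cp(tag):
    return f"{{{CP}}}{tag}"


def mets_to_imscp(mets_xml):
    """Map METS files to IMS-CP resources and structMap divs to nested items."""
    root = ET.fromstring(mets_xml)
    manifest = ET.Element(cp("manifest"), {"identifier": root.get("OBJID", "MANIFEST")})
    orgs = ET.SubElement(manifest, cp("organizations"), {"default": "ORG1"})
    org = ET.SubElement(orgs, cp("organization"), {"identifier": "ORG1"})
    resources = ET.SubElement(manifest, cp("resources"))

    # fileSec -> resources: one webcontent resource per METS file.
    for f in root.iter(m("file")):
        res = ET.SubElement(resources, cp("resource"),
                            {"identifier": f.get("ID"), "type": "webcontent"})
        loc = f.find(m("FLocat"))
        if loc is not None and loc.get(XLINK_HREF):
            res.set("href", loc.get(XLINK_HREF))
            ET.SubElement(res, cp("file"), {"href": loc.get(XLINK_HREF)})

    # structMap -> organization. An IMS-CP item references at most one
    # resource, so only the div's first fptr is carried over.
    ids = itertools.count(1)

    def convert(div, parent):
        item = ET.SubElement(parent, cp("item"), {"identifier": f"ITEM{next(ids)}"})
        fptr = div.find(m("fptr"))
        if fptr is not None:
            item.set("identifierref", fptr.get("FILEID"))
        ET.SubElement(item, cp("title")).text = div.get("LABEL") or div.get("TYPE", "")
        for child in div.findall(m("div")):
            convert(child, item)

    smap = root.find(m("structMap"))
    if smap is not None:
        for div in smap.findall(m("div")):
            convert(div, org)
    return manifest
```

Because an IMS-CP item can reference only one resource, a div with several `fptr` elements loses information in this direction; a fuller crosswalk would need to handle that case.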
Links • California Heritage Collection. http://sunsite.berkeley.edu/CalHeritage/ • MOA2 Project. http://sunsite.berkeley.edu/moa2/ • GenDB Web Interface Demo. http://sunsite2.berkeley.edu/GenDB (Account: demoman; Password: demoman)
Links • MODS. http://www.loc.gov/mods • MIX. http://www.loc.gov/mix • Cheshire II. http://cheshire.lib.berkeley.edu/ • Greenstone. http://www.greenstone.org/cgi-bin/library
Links • GenView demo. http://metsviewer.lib.berkeley.edu/metstest/BreenMETS.xml • NYU METS Page-Turner. http://dlib.nyu.edu/metstools/ • U. Chicago Chopin Early Editions (Greenstone-based collection). http://chopin.lib.uchicago.edu/
Links • IMS-CP. http://www.imsglobal.org/content/packaging/index.cfm • METS/Educational Technology Interoperability: http://iu.berkeley.edu/crosswalk/