
METS at UCB

METS at UCB. Themes in the Implementation of METS Rick Beaubien UC Berkeley Library. METS Themes. METS creation Gathering metadata: structural, descriptive, administrative Generating METS objects METS Repositories Providing search access to METS objects Presenting METS objects.


Presentation Transcript


  1. METS at UCB: Themes in the Implementation of METS • Rick Beaubien, UC Berkeley Library

  2. METS Themes • METS creation • Gathering metadata: structural, descriptive, administrative • Generating METS objects • METS Repositories • Providing search access to METS objects • Presenting METS objects

  3. METS Themes (2) • METS sharing • Sharing METS objects between METS Repositories • Sharing METS objects across standards and communities.

  4. Brief History: 3 eras • Paleozoic: pre-MOA2 (pre-1997) • 1994: California Heritage; preliminary DB • 1996: Honeyman Collection: Desc. MD elements defined • Mesozoic: MOA2 (1997-2001) • Consistent, non-central DB: struct., desc. & tech. md • MOA2 presentation tool • Cenozoic: METS (2001-2004) • Central database: struct., desc. & tech. md • Expanding, adapting MOA2 tools

  5. METS Projects at UCB • Archival collections: Project-based, grant-funded initiatives • New projects • Data Rescue projects • Stored materials: Online access to tables of contents • Technical Reports: heir of Dienst

  6. METS Creation at UCB • Centralized approach • Centralized database with associated metadata input and METS generation modules • Dispersed approach • Perl scripts—mostly ad hoc—extract necessary metadata from multiple sources

  7. METS creation: Centralized Approach • Components: • GenDB: • Relational database • Web-based input modules • Batch load input modules • GenX • Generates METS objects from GenDB.

  8. METS creation: GenDB Functionality • Metadata gathering tool • Facilitates input of structural & descriptive md (manual & batch) • Processing control tool • Guides the digitization process by vendors (image, transcription/structured text content) • Imports content file names and associated technical metadata coming out of digitization process

  9. GenDB: Typical Flow • [Flow diagram. Labeled elements: GenDB User; Structural Metadata + Descriptive Metadata; Imaging/Transcription Work Orders; Vendor; Technical MD Spreadsheets; SQL Server Database; METS]

  10. METS creation: Centralized: GenDB Database • SQL Server database (Relational database) • Structural, descriptive and admin md recorded in flat table structure: • GenDB element sets antedate METS, MODS, etc. • MOA2.DTD determined structural metadata. • Multiple standards influenced descriptive md element set. • Accommodated to MODS • MOA2.DTD determined image technical md element set • Accommodated to MIX

  11. METS creation: Centralized: GenDB Web UI • Web-based interface for manual input • Java servlet driven • Java server backend • Key Features • Configurable (by Project Managers) • Shields users from the complexities of METS and standards-specific vocabularies

  12. METS creation: Centralized: GenDB Batch Interface • Components: • Batchload schema to which MD to be loaded must conform. • Batch processor (Java module) • Other components shared with Web Interface • When useful • Anytime input can be programmatically generated from existing sources

  13. METS creation: Centralized: GenX: Generating METS • Components: • Java program with graphical UI • Key Features • Shows list of objects available for export • On demand, queries the database to gather md pertaining to the selected object(s) and packages it as METS with MODS, MIX extensions
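
The export step described on slide 13 can be sketched roughly as follows. This is a minimal illustration in Python, not GenX itself (which the slides describe as a Java program): the table layout, column names, sample rows, and the functions `build_demo_db` and `export_mets` are all hypothetical stand-ins, and the MIX technical metadata is left out to keep the sketch short.

```python
import sqlite3
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Return a namespace-qualified tag name in ElementTree's {ns}tag form."""
    return f"{{{ns}}}{tag}"


def build_demo_db():
    """Create hypothetical flat tables standing in for GenDB (not its real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE objects (obj_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE pages (obj_id TEXT, seq INTEGER, label TEXT,
                            href TEXT, mimetype TEXT);
        INSERT INTO objects VALUES ('obj1', 'Sample diary');
        INSERT INTO pages VALUES
            ('obj1', 1, 'Page 1', 'http://example.org/obj1/0001.jpg', 'image/jpeg'),
            ('obj1', 2, 'Page 2', 'http://example.org/obj1/0002.jpg', 'image/jpeg');
    """)
    return conn


def export_mets(conn, obj_id):
    """Query the database for one object and package its metadata as METS."""
    row = conn.execute(
        "SELECT title FROM objects WHERE obj_id = ?", (obj_id,)
    ).fetchone()
    if row is None:
        raise KeyError(obj_id)
    title = row[0]

    mets = ET.Element(q(METS_NS, "mets"), {"OBJID": obj_id})

    # Descriptive metadata: a MODS record wrapped inside a dmdSec.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = title

    # File inventory, then the structural map that points into it.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "physical"})
    root_div = ET.SubElement(
        struct_map, q(METS_NS, "div"), {"LABEL": title, "DMDID": "DMD1"}
    )

    pages = conn.execute(
        "SELECT seq, label, href, mimetype FROM pages "
        "WHERE obj_id = ? ORDER BY seq",
        (obj_id,),
    )
    for seq, label, href, mimetype in pages:
        file_id = f"FILE{seq:04d}"
        file_el = ET.SubElement(
            file_grp, q(METS_NS, "file"), {"ID": file_id, "MIMETYPE": mimetype}
        )
        ET.SubElement(
            file_el,
            q(METS_NS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK_NS, "href"): href},
        )
        page_div = ET.SubElement(
            root_div,
            q(METS_NS, "div"),
            {"TYPE": "page", "ORDER": str(seq), "LABEL": label},
        )
        ET.SubElement(page_div, q(METS_NS, "fptr"), {"FILEID": file_id})

    return ET.tostring(mets, encoding="unicode")


if __name__ == "__main__":
    print(export_mets(build_demo_db(), "obj1"))
```

Keeping the flat tables as the single source and generating METS only on demand is what lets the input screens hide METS and MODS vocabulary from the people keying the metadata.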

  14. METS creation: Centralized: Reality Check • Main GenDB limitations • Better at physical than logical structuring • No support yet for video/audio content • Redundant keying of DescMD • No collection-level input

  15. METS Creation: Dispersed Approach • When used • Any time requisite metadata and content files are already available and just need to be harvested and packaged • Legacy databases • Projects not requiring GenDB to control digitization process • Method: Ad hoc Perl scripts gather md & package as METS. • Why used: • Expedient. Lots of Perl programming expertise.

  16. Stored Materials Project Flow • [Flow diagram with numbered steps 1 to 6. Labeled elements: JPEG TOC scans; Perl Script 1; Perl Script 2; GLADIS Catalog; MARC Records; MARCtoMODS; MODS Records; METS Objects]

  17. METS Creation: Future Trends • Trend toward centralization will continue; replace dispersed approach • Batch interface can handle most “dispersed” situations • Makes future maintenance easy • Helps ensure consistency in METS output

  18. METS Creation: Common Issues • Immaturity/Lack of Extension schemas • Problems for expressing MD • Problems for gathering MD • METS & related schema status • METS stable • Descriptive Metadata: MODS, DC Simple, MARCXML • Technical MD: still immature, if available at all

  19. METS Access • Main sub-themes • Discovery • Presentation of content & associated metadata

  20. METS Access: Discovery: Search Support at UCB • No centralized search support for our METS/MOA2 repository • Current discovery mechanisms: • Online catalog links • Finding Aids, OAC supported searching • Project home pages and Finding Tools

  21. METS Access: Discovery: Projected Support • Options considered: • Tamino/XML database • Abandoned • Too many limits on XML support • Still have to build search interface from scratch • Cheshire • Greenstone

  22. METS Access: Discovery: Cheshire Option • What is it: • Developed by Ray Larson at U.C. Berkeley • “next-generation online catalog and full-text information retrieval system using advanced IR techniques” • Advantages: • Free • Indexes “hub documents” (like METS) and content files where they reside • Very sophisticated searching/ranking algorithms including Boolean • OAI interface

  23. METS Access: Discovery: Cheshire Option (2) • Disadvantages: • Does not support Unicode yet • coming in version 3 • Limited collection management support: • Adding collections • Developing search interface • No object-level presentation support

  24. METS Access: Discovery: Greenstone Option • What is it: • Developed by New Zealand Digital Library Project at University of Waikato • “suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM.” • Advantages • Free/Open Source • Next version will be METS-based • Strong collection management support

  25. METS Access: Discovery: Greenstone Option (2) • Advantages (cont’d) • Unicode support now • Fairly sophisticated search support • Some presentation support • OAI support in progress • Disadvantages • Does not index objects where they reside • This limitation may apply to METS-based version as well

  26. METS Access: Presentation: GenView • Java-based software suite developed at UCB for MOA2/METS presentation • History • Originates in Making of America II (1997) • XSLT in infancy • Web Services non-existent • CORBA/RMI and servlet technology were “hot” • GenView originally supported MOA2 objects • GenView adapted to accommodate METS

  27. GenView: Basic Architecture • [Architecture diagram. Labeled elements: Web Interface; Java Servlet; XSLT; RMI; Repository Manager (Java); METS Java Objects; METS XML Documents]

  28. METS Access: Presentation: GenView Evaluated • Advantages: • It exists… • Presentation very efficient • Meets basic presentation needs well • Disadvantages • Geared towards image/native browser content • Limited configuration options • Complex; difficult to maintain

  29. METS Access: Presentation: GenView In Context • XSLT-based approaches to METS presentation: • NYU: Native METS • Library of Congress: Transformed METS • University of Chicago: • Prebuilding HTML pages as part of an XSLT transformation that loads METS objects into Greenstone.

  30. Sharing METS Objects • Sharing METS objects between METS repositories • Plea for Profiles • Sharing METS objects across standards • METS and Learning objects standards

  31. METS Sharing: METS to METS: METS as Transfer Syntax • METS, like MARC, can function as transfer syntax • Problem: METS offers much more leeway to implementation than MARC • Key areas of variations: • Structure of <fileSec> and <structMap> and relations between the two • Extension Schemas used & required elements • Attribute vocabularies • mets/@TYPE • fileGrp/@USE, file/@USE
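
The areas of variation listed on slide 31 are the kind of thing a receiving repository might check before accepting an object. The sketch below is one hedged illustration in Python: the function name `check_against_profile`, the sample document, and the vocabularies passed in are all invented for the example and do not come from any actual UCB or OAC profile.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

# Hypothetical incoming object: one fptr points at a file that is not inventoried.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" TYPE="photograph">
  <fileSec>
    <fileGrp USE="thumbnail">
      <file ID="F1"/>
    </fileGrp>
  </fileSec>
  <structMap>
    <div>
      <fptr FILEID="F1"/>
      <fptr FILEID="F2"/>
    </div>
  </structMap>
</mets>"""


def check_against_profile(mets_xml, allowed_types, allowed_uses):
    """Return a list of human-readable problems; an empty list means no issues found."""
    root = ET.fromstring(mets_xml)
    problems = []

    # Attribute vocabulary: mets/@TYPE.
    if root.get("TYPE") not in allowed_types:
        problems.append(f"mets/@TYPE {root.get('TYPE')!r} is not in the profile vocabulary")

    # Attribute vocabulary: fileGrp/@USE, collecting file IDs along the way.
    file_ids = set()
    for grp in root.iter(METS + "fileGrp"):
        if grp.get("USE") not in allowed_uses:
            problems.append(f"fileGrp/@USE {grp.get('USE')!r} is not in the profile vocabulary")
        for file_el in grp.findall(METS + "file"):
            file_ids.add(file_el.get("ID"))

    # Relation between structMap and fileSec: every fptr must resolve to a file.
    for fptr in root.iter(METS + "fptr"):
        if fptr.get("FILEID") not in file_ids:
            problems.append(f"fptr/@FILEID {fptr.get('FILEID')!r} does not match any file/@ID")

    return problems


if __name__ == "__main__":
    for problem in check_against_profile(SAMPLE, {"photograph"}, {"image/thumbnail"}):
        print(problem)
```

A shared profile would pin down exactly these vocabularies and linking rules, so that a check like this gives the same answer at every participating repository.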

  32. METS Sharing: METS to METS: Sharing in UC System • Not a theoretical goal but a reality • All UC campus libraries participate in OAC/CDL • Moving towards profiles: • Common starting point: MOA2 • Working groups under auspices of OAC • Desired Result: Submission Profiles

  33. METS Sharing: Across Standards: METS and other standards • METS originates in library world • especially suited to library needs • Focus/primary concerns of other communities somewhat different: • developing their own digital object standards • Does this matter and why?

  34. METS Sharing: Across Standards: METS and IMS-CP • IMS Global Learning Consortium developing learning object standards: • IMS-CP: analogous to METS • Goal: enable production of learning objects that can be played in IMS standards-savvy tools • Importance of compatibility with METS: • Incorporating library resources (METS) into learning objects • Archiving learning objects in METS-based repositories

  35. METS Sharing: Across Standards: UCB Library Efforts • METS/IMS-CP crosswalk project • Headed by Raymond Yee of Interactive University at UCB • Results of effort thus far: • Analysis of key similarities and differences between two schemas • Preliminary crosswalk • Published in Library Hi Tech

  36. METS Sharing: Across Standards: UCB Library Efforts (2) • Summary of analysis: • Two schemas share many high-level similarities: • Hierarchical “structMap” • “fileSec” for inventorying resources; referenced from “structMap” • Accommodation for MD defined by other schemas • Key difference: IMS-CP does not distinguish between presentation and content • Future: • Standards merge? • Some provision for sharing across communities
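
To make the high-level similarities on slide 36 concrete, here is a rough Python sketch that maps a METS file inventory and structMap onto an IMS-CP manifest. It is only an illustration of the shared shape, not the preliminary crosswalk from the project described above: the function `mets_to_imscp`, the sample object, the generated identifiers, and the choice to keep only the first fptr per div (an IMS-CP item carries a single `identifierref`) are all assumptions, as is the use of the IMS-CP 1.1 namespace.

```python
import itertools
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
IMSCP_NS = "http://www.imsglobal.org/xsd/imscp_v1p1"  # assumed IMS-CP 1.1 namespace
IMSCP = "{" + IMSCP_NS + "}"

ET.register_namespace("", IMSCP_NS)

# Hypothetical METS object used only for this illustration.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink" OBJID="obj1">
  <fileSec><fileGrp>
    <file ID="F1"><FLocat LOCTYPE="URL" xlink:href="page1.jpg"/></file>
    <file ID="F2"><FLocat LOCTYPE="URL" xlink:href="page2.jpg"/></file>
  </fileGrp></fileSec>
  <structMap>
    <div LABEL="Sample diary">
      <div LABEL="Page 1"><fptr FILEID="F1"/></div>
      <div LABEL="Page 2"><fptr FILEID="F2"/></div>
    </div>
  </structMap>
</mets>"""


def mets_to_imscp(mets_xml):
    """Map METS fileSec/structMap onto IMS-CP resources/organizations (sketch only)."""
    mets = ET.fromstring(mets_xml)
    manifest = ET.Element(
        IMSCP + "manifest", {"identifier": mets.get("OBJID", "MANIFEST1")}
    )
    organizations = ET.SubElement(manifest, IMSCP + "organizations")
    resources = ET.SubElement(manifest, IMSCP + "resources")

    # METS file inventory -> IMS-CP resources (files without an href are skipped).
    resource_ids = set()
    for file_el in mets.iter(METS + "file"):
        flocat = file_el.find(METS + "FLocat")
        href = flocat.get(XLINK_HREF) if flocat is not None else None
        if href is None or file_el.get("ID") is None:
            continue
        resource = ET.SubElement(
            resources,
            IMSCP + "resource",
            {"identifier": file_el.get("ID"), "type": "webcontent", "href": href},
        )
        ET.SubElement(resource, IMSCP + "file", {"href": href})
        resource_ids.add(file_el.get("ID"))

    counter = itertools.count(1)

    def add_items(div, parent):
        """Recursively copy nested METS divs as nested IMS-CP items."""
        for child in div.findall(METS + "div"):
            attrs = {"identifier": f"ITEM{next(counter)}"}
            fptr = child.find(METS + "fptr")  # only the first fptr is kept
            if fptr is not None and fptr.get("FILEID") in resource_ids:
                attrs["identifierref"] = fptr.get("FILEID")
            item = ET.SubElement(parent, IMSCP + "item", attrs)
            ET.SubElement(item, IMSCP + "title").text = child.get("LABEL", "")
            add_items(child, item)

    # Each METS structMap -> one IMS-CP organization.
    for index, struct_map in enumerate(mets.findall(METS + "structMap"), start=1):
        organization = ET.SubElement(
            organizations, IMSCP + "organization", {"identifier": f"ORG{index}"}
        )
        top_div = struct_map.find(METS + "div")
        if top_div is not None:
            ET.SubElement(organization, IMSCP + "title").text = top_div.get("LABEL", "")
            add_items(top_div, organization)

    return ET.tostring(manifest, encoding="unicode")


if __name__ == "__main__":
    print(mets_to_imscp(SAMPLE_METS))
```

The lossy step (dropping extra fptr elements and all fileGrp/@USE distinctions) is where a real crosswalk would need the most care, since those are the places the two schemas stop lining up.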

  37. Links • California Heritage Collection. http://sunsite.berkeley.edu/CalHeritage/ • MOA2 Project. http://sunsite.berkeley.edu/moa2/ • GenDB Web Interface Demo. http://sunsite2.berkeley.edu/GenDB (Account: demoman; Password: demoman)

  38. Links • MODS. http://www.loc.gov/mods • MIX. http://www.loc.gov/mix • Cheshire II. http://cheshire.lib.berkeley.edu/ • Greenstone http://www.greenstone.org/cgi-bin/library

  39. Links • GenView demo. http://metsviewer.lib.berkeley.edu/metstest/BreenMETS.xml • NYU METS Page-Turner. http://dlib.nyu.edu/metstools/ • U. Chicago Chopin Early Editions (Greenstone-based collection). http://chopin.lib.uchicago.edu/

  40. Links • IMS-CP. http://www.imsglobal.org/content/packaging/index.cfm • METS/Educational Technology Interoperability: http://iu.berkeley.edu/crosswalk/
