Metadata Interoperability and CONTENTdm. Midwest CONTENTdm Users Group April 30, 2008 IUPUI Indianapolis, IN Amy Jackson, amyjacks@uiuc.edu Myung-ja Han, mhan3@uiuc.edu University of Illinois at Urbana Champaign.
University of Illinois at Urbana Champaign • Producer/consumer of metadata in CONTENTdm • Currently use CONTENTdm to provide public access to 12 collections • Various projects harvest metadata from 11 CONTENTdm repositories around the nation
Metadata Interoperability and CONTENTdm • Metadata and CONTENTdm • Service provider • Data provider • Longitudinal analysis of harvested metadata • Qualitative results • Quantitative results • Service provider view (Amy) • Data provider view (MJ)
IMLS Digital Collections and Content • Project began December 2002 as an IMLS National Leadership Grant • Carole Palmer, Principal Investigator, 2007-2010 • Tim Cole, Principal Investigator, 2002-2007 • Amy Jackson, Project Coordinator • Collaboration between UIUC Library and Graduate School of Library and Information Science • http://imlsdcc.grainger.uiuc.edu/
IMLS Digital Collections and Content • Project Objectives: • Implement a collection registry of digital collections created or developed with funding from the IMLS NLG program • Use OAI-PMH to implement an item-level metadata repository for items contained in NLG collections • Carry out associated research related to: • Utility and usability of Registry & Repository • Current metadata practices of IMLS NLG grantees • Implications for interoperability (Framework of Guidance for Building Good Digital Collections)
Item-level repository • Harvesting 71 of 195 Collections (36%) • 37 Repositories (some multiple institutions) • 10 CONTENTdm repositories • 310,448 records • Item Records (self-identified types) • 86% images • 14% text
Item-level repository Number of harvested collections using each DC field
Item-level repository • Top item-level subjects: • Archaeology • Buildings • Photographers • Mountains • Men • Archaeological site • Insect • Bodies of water
OAI-PMH • All 37 repositories export metadata in simple Dublin Core • Five also export metadata in schemas other than simple or Qualified Dublin Core: • MARC21 • MODS • OLAC • ETDMS
Metadata harvesting • OAI-PMH • Harvested approach rather than federated approach • Data providers – create and expose metadata • Service providers – harvest and aggregate metadata • Based on HTTP and XML • Requires use of Dublin Core • Encourages and supports other formats
How OAI Works (Technically) • 6 distinct ‘verbs’ or requests • OAI requests are sent via HTTP • Responses are sent in valid XML • [Diagram: the Service Provider's OAI harvester, which aggregates metadata, sends an HTTP request (OAI verb) to the OAI data provider on the Data Provider's digital management system, which returns an HTTP response (valid XML) containing its metadata]
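To make the request/response cycle concrete, here is a minimal sketch using only the Python standard library: it builds an OAI-PMH GET request from a verb plus arguments and checks a response for an OAI-PMH `<error>` element. The function names `build_request` and `check_response` are hypothetical, and no network call is made; the CONTENTdm URL is the one used on the following slides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by every OAI-PMH 2.0 response
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Return an OAI-PMH GET request URL: ?verb=...&arg=value..."""
    query = {"verb": verb, **args}
    return f"{base_url}?{urlencode(query)}"


def check_response(xml_text: str) -> ET.Element:
    """Parse an OAI-PMH XML response and raise if it reports an <error>."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        message = (error.text or "").strip()
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {message}")
    return root


if __name__ == "__main__":
    print(build_request("http://images.library.uiuc.edu:8081/cgi-bin/oai.exe",
                        "ListRecords", metadataPrefix="oai_dc"))
```

Because `from` is a Python keyword, that argument has to be passed through a dict, e.g. `build_request(url, "ListRecords", metadataPrefix="oai_dc", **{"from": "2006-01-01"})`.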
OAI-PMH in CONTENTdm • Enable the oai.txt file • CONTENTdm base URL followed by /cgi-bin/oai.exe • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe • OAI “verbs” • ?verb=Identify • Returns general information about the archive and its policies (e.g., datestamp granularity) • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=Identify
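A sketch of the two steps on this slide, assuming only the URL pattern described above: derive the OAI endpoint from a CONTENTdm base URL, then pull the main facts out of an Identify response. The function names are hypothetical. The `granularity` value is worth reading first, because the `from` and `until` arguments a harvester sends should not be finer than it.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def contentdm_oai_endpoint(base_url: str) -> str:
    """CONTENTdm base URL followed by /cgi-bin/oai.exe."""
    return base_url.rstrip("/") + "/cgi-bin/oai.exe"


def parse_identify(xml_text: str) -> dict:
    """Return repository facts from an OAI-PMH Identify response."""
    identify = ET.fromstring(xml_text).find(f"{OAI}Identify")
    if identify is None:
        raise ValueError("not an Identify response")
    names = ("repositoryName", "baseURL", "protocolVersion",
             "earliestDatestamp", "deletedRecord", "granularity")
    # findtext returns None for any element the repository omitted
    return {name: identify.findtext(f"{OAI}{name}") for name in names}
```

The endpoint for the example server would be `contentdm_oai_endpoint("http://images.library.uiuc.edu:8081")`, and its Identify URL is that endpoint plus `?verb=Identify`.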
OAI-PMH verbs • Identify • ListMetadataFormats • ListSets • ListIdentifiers • ListRecords • GetRecord
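The six verbs differ in which arguments they accept. The sketch below encodes those rules as we read them in the OAI-PMH 2.0 specification (required arguments, optional arguments, and the exclusive `resumptionToken`) and returns the error code a data provider would be expected to send (`badVerb` or `badArgument`), or `None` for a well-formed request. It is a client-side sanity check with a hypothetical function name, not a replacement for the repository's own validation.

```python
from typing import Optional

# Per verb: (required, optional, exclusive). An exclusive argument must be
# the only argument sent besides the verb itself.
VERB_ARGS = {
    "Identify": (frozenset(), frozenset(), None),
    "ListMetadataFormats": (frozenset(), frozenset({"identifier"}), None),
    "ListSets": (frozenset(), frozenset(), "resumptionToken"),
    "ListIdentifiers": (frozenset({"metadataPrefix"}),
                        frozenset({"from", "until", "set"}), "resumptionToken"),
    "ListRecords": (frozenset({"metadataPrefix"}),
                    frozenset({"from", "until", "set"}), "resumptionToken"),
    "GetRecord": (frozenset({"identifier", "metadataPrefix"}), frozenset(), None),
}


def validate_request(verb: str, args: dict) -> Optional[str]:
    """Return the OAI-PMH error code a request would earn, or None."""
    if verb not in VERB_ARGS:
        return "badVerb"
    required, optional, exclusive = VERB_ARGS[verb]
    keys = set(args)
    if exclusive is not None and exclusive in keys:
        return None if keys == {exclusive} else "badArgument"
    if not required <= keys or not keys <= required | optional:
        return "badArgument"
    return None
```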
OAI-PMH • ListSets • Purpose • Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat) • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListSets
OAI-PMH • ListRecords • Purpose • Retrieves metadata records for multiple items • Parameters • from – start date • until – end date • set – set to harvest from • resumptionToken – flow control mechanism • metadataPrefix – metadata format • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc
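The ListRecords parameters work together during a harvest: the first request carries `metadataPrefix` (plus any of `from`, `until`, `set`), and each later request carries only the `resumptionToken` returned with the previous batch. A minimal sketch of that loop follows. It assumes the caller passes a `fetch(url) -> str` function (a hypothetical hook that keeps the example runnable offline) and treats the `noRecordsMatch` error as an empty result.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records(base_url: str, fetch: Callable[[str], str],
                 metadata_prefix: str = "oai_dc",
                 **selective: str) -> Iterator[ET.Element]:
    """Yield every <record> from ListRecords, following resumptionTokens.

    `selective` may hold the optional from/until/set arguments.
    """
    args = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, **selective}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(args)}"))
        error = root.find(f"{OAI}error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result, not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}")
        container = root.find(f"{OAI}ListRecords")
        if container is None:
            raise RuntimeError("response has neither <error> nor <ListRecords>")
        yield from container.findall(f"{OAI}record")
        # An absent or empty resumptionToken marks the last batch
        token = (container.findtext(f"{OAI}resumptionToken") or "").strip()
        if not token:
            return
        # resumptionToken is exclusive: resend only the verb and the token
        args = {"verb": "ListRecords", "resumptionToken": token}
```

Against a live repository, `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`; injecting it keeps the flow-control logic separate from the HTTP details.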
OAI-PMH • Barriers to sharing metadata through OAI-PMH • Technical Infrastructure • Metadata • Institution/Project • CONTENTdm • Compliant with OAI-PMH • Metadata is mapped to DC
Harvested Metadata • How has use of Dublin Core changed over time? Records harvested between January 1, 2001 and December 31, 2006. • Quantitative analysis • What measurable changes can we see in the metadata? • Qualitative analysis • How has use of fields changed over time?
Quantitative analysis • Repetition of elements • Length of fields • Use of core fields (Shreeves et al., 2005)
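The three measures can be computed per record once each harvested record is held as `{dc_element: [values]}`. The sketch below is an illustration, not the study's actual code. One assumption needs flagging: the slides name seven of the eight core elements (identifier, title, subject, creator, rights, format, description), and `date` is our guess for the eighth.

```python
from statistics import mean

# ASSUMPTION: seven of these eight core elements are named on the slides;
# "date" is a guess for the eighth.
CORE_FIELDS = frozenset({"title", "creator", "subject", "description",
                         "date", "format", "identifier", "rights"})


def record_metrics(record: dict) -> dict:
    """Repetition, average value length, and core-field coverage for one
    record given as {dc_element: [values]}; blank values are ignored."""
    filled = {field: [v for v in values if v.strip()]
              for field, values in record.items()}
    filled = {field: values for field, values in filled.items() if values}
    return {
        "repetition": {field: len(values) for field, values in filled.items()},
        "avg_length": {field: mean(len(v) for v in values)
                       for field, values in filled.items()},
        "has_all_core": CORE_FIELDS.issubset(filled),
    }


def percent_with_all_core(records: list) -> float:
    """Percentage of records that use all eight core elements."""
    if not records:
        return 0.0
    hits = sum(record_metrics(r)["has_all_core"] for r in records)
    return 100.0 * hits / len(records)
```

Grouping records by age before calling `percent_with_all_core` gives the kind of comparison plotted on the next slide.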
Quantitative analysis • Repetition of fields • Stable • Length of fields • Stable • Use of all 8 core fields • Declining
Quantitative Analysis • [Chart: percent of records containing all core DC fields, by age of record in months (34 or more, 21 to 33, 11 to 20, 10 or less), for IMLS and IMLS & CIC records; plotted values: 93.95%, 71.65%, 70.27%, 52.41%, 22.97%, 18.55%, 11.04%, 7.99%]
Quantitative Analysis • Of these eight elements, the two elements most often missing are creator (used in 39% of records) and rights (52%). • Identifier, title, and subject were each used in over 96% of all records. • Format and description fields have shown the most significant decline in use since 2003. • Decreased repetition and length of the description field, and an overall increase in use of the relation field.
Conclusions • Recommendations • Publish local metadata practices • Publish crosswalking information • Expose native metadata in addition to Dublin Core
Amy Jackson Project Coordinator IMLS Digital Collections and Content University of Illinois at Urbana Champaign amyjacks@uiuc.edu
• What does exporting mean? • Qualitative analysis - Changes over time - Unpacking MARC - Incorrect mapping - Misuse and confusion of DC elements - What to expose and what not - Lost in harvesting • What we have learned • Recommendations
What does exporting mean? • Exporting makes collection metadata available for service providers to harvest. • CONTENTdm has a turnkey option to make this possible. • It has a Dublin Core mapping that provides Dublin Core records to service providers.
Why export metadata? • Increases exposure of collections • Broadens user base • We can no longer assume that users will come through the front door; sharing metadata gets us ‘in the flow’ (Lorcan Dempsey) - Metadata for you & me
Qualitative analysis • 225 records from 6 repositories (time increments) - Document changes in practice over time - Compare original record vs. harvested record in service provider’s environment • 600 randomly selected records • 95 records from 11 repositories and 19 collections harvested from CONTENTdm
Any changes over time? Only one change was observed over time. Early records: <title>Frankie / Music by Neil Sedaka; words by Howard Greenfield</title> Later records: <title>Frankie</title><creator>Music by Neil Sedaka; words by Howard Greenfield</creator>
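The change in the later records amounts to splitting a statement of responsibility out of the title. A naive sketch of that split, assuming the title and statement are separated by the first " / " (the cataloging convention seen in the early records); the function name is hypothetical and real data will have exceptions.

```python
def split_responsibility(title: str) -> dict:
    """Split 'Title / statement of responsibility' into DC title and creator.

    Heuristic: everything before the first " / " is the title and the rest
    becomes the creator statement. Titles without " / " pass through.
    """
    main, sep, responsibility = title.partition(" / ")
    record = {"title": [main.strip()]}
    if sep and responsibility.strip():
        record["creator"] = [responsibility.strip()]
    return record


print(split_responsibility(
    "Frankie / Music by Neil Sedaka; words by Howard Greenfield"))
# {'title': ['Frankie'], 'creator': ['Music by Neil Sedaka; words by Howard Greenfield']}
```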
Other findings… Unpacking MARC Incorrect mapping Misuse and confusion of Dublin Core elements What to export and what not… And Lost in harvesting…
Unpacking MARC • Object Description: Photograph: b&w; 6 1/8x8 in. => <type>Photograph: b&w; 6 1/8 x 8 in.</type> • Publication Information: [Lancaster, Pa.? : Johann Albrecht und Comp.?, 1790?] => <publisher>[Lancaster, Pa.? : Johann Albrecht und Comp.?, 1790?]</publisher>
Unpacking MARC a. MARC 245 could be mapped to: • Subfield 'a' => <title> • Subfield 'b' => <title> or <alternative> • Subfield 'c' => <creator> or <contributor> • Subfield 'f' => <date> • Subfield 'g' => <date> • Subfield 'h' => <format> • Subfield 'k' => <type> • Subfield 'n' => <description> or <title> • Subfield 'p' => <description> or <title>
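Unpacking a 245 field subfield by subfield avoids exporting the whole field as one `<title>`. The sketch below takes the first option listed on the slide for each subfield, drops unlisted subfields, and removes one trailing ISBD separator left over from cataloging. The input shape `[(code, value), ...]` and the function names are assumptions for illustration, not CONTENTdm's importer.

```python
import re

# First option listed on the slide for each 245 subfield; alternatives in
# comments. Subfields not listed here are dropped.
MARC245_TO_DC = {
    "a": "title",
    "b": "title",        # or <alternative>
    "c": "creator",      # or <contributor>
    "f": "date",         # inclusive dates
    "g": "date",         # bulk dates
    "h": "format",       # medium
    "k": "type",         # form
    "n": "description",  # or <title> (number of part/section)
    "p": "description",  # or <title> (name of part/section)
}


def strip_isbd(value: str) -> str:
    """Remove one trailing ISBD separator such as ' /', ' :', ' ;' or ','."""
    return re.sub(r"\s*[/:;,=]\s*$", "", value.strip())


def crosswalk_245(subfields: list) -> dict:
    """Map [(code, value), ...] from a 245 field to {dc_element: [values]}."""
    dc = {}
    for code, value in subfields:
        target = MARC245_TO_DC.get(code)
        cleaned = strip_isbd(value)
        if target and cleaned:
            dc.setdefault(target, []).append(cleaned)
    return dc
```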
Unpacking MARC b. MARC 260: <publisher> <date> c. MARC 6xx: <subject> <coverage-temporal> <coverage-spatial> <type>
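The same approach extends to 260 and the 6xx fields. In this sketch, 260 $b goes to `<publisher>` and $c to `<date>` (place, $a, is simply dropped), and each 6xx tag is routed by its MARC meaning: names, uniform titles, and topical terms to `<subject>`, 648 (chronological) and 651 (geographic) to coverage, and 655 (genre/form) to `<type>`. The labels `coverage-temporal` and `coverage-spatial` mirror the slide rather than an official element name, and joining subfields with " -- " is one display convention among several.

```python
import re

# MARC 6xx tag -> DC target, following the slide's list of targets
MARC6XX_TO_DC = {
    "600": "subject",            # personal name
    "610": "subject",            # corporate name
    "611": "subject",            # meeting name
    "630": "subject",            # uniform title
    "650": "subject",            # topical term
    "648": "coverage-temporal",  # chronological term
    "651": "coverage-spatial",   # geographic name
    "655": "type",               # genre/form
}


def strip_isbd(value: str) -> str:
    """Remove one trailing ISBD separator such as ' /', ' :', ' ;' or ','."""
    return re.sub(r"\s*[/:;,=]\s*$", "", value.strip())


def crosswalk_260(subfields: list) -> dict:
    """260 $b -> publisher, $c -> date; $a (place) is dropped in this sketch."""
    targets = {"b": "publisher", "c": "date"}
    dc = {}
    for code, value in subfields:
        cleaned = strip_isbd(value)
        if code in targets and cleaned:
            dc.setdefault(targets[code], []).append(cleaned)
    return dc


def crosswalk_6xx(fields: list) -> dict:
    """Map [(tag, [(code, value), ...]), ...] to DC, joining subfields."""
    dc = {}
    for tag, subfields in fields:
        target = MARC6XX_TO_DC.get(tag)
        # Numeric subfields ($0, $2, ...) hold control data, not heading text
        heading = " -- ".join(v.strip() for code, v in subfields
                              if code.isalpha() and v.strip())
        if target and heading:
            dc.setdefault(target, []).append(heading)
    return dc
```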
Incorrect Mapping a. Digital Reproduction Information: Scanned as a 3000 pixel TIFF image in 8-bit grayscale, resized to 640 pixels in the longest dimension and compressed into JPEG format using Photoshop 6.0 and its JPEG quality measurement 3. Where do you map this? <format> Scanned as a 3000 pixel TIFF image in 8-bit grayscale, resized to 640 pixels in the longest dimension and compressed into JPEG format using Photoshop 6.0 and its JPEG quality measurement 3.
Incorrect Mapping b. Repository: University of Prominent Libraries. Special Collections Division. Repository Collection: Prominent Photograph Collection. PH Coll 282 Where do you map these? <source> University of Prominent Libraries. Special Collections Division. <source> Prominent Photograph Collection. PH Coll 282
Incorrect Mapping c. Physical description: 9 in. x 6 in. Where do you map this? <description> 9 in. x 6 in.
Misuse of Dublin Core elements a. <date> and <coverage> - Item about the nineteenth century, published in 2007. What should the metadata be? <date>1800-1899 OR <date>2007 <coverage>1800-1899
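Dublin Core 1.1 describes `date` as a point or period of time in the lifecycle of the resource and `coverage` as its spatial or temporal topic, which points to the second option for this item. A minimal sketch that serializes such a record as `oai_dc` with the standard library; the title is a hypothetical placeholder and the `xsi:schemaLocation` attribute a full OAI response would carry is omitted.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_oai_dc(fields: dict) -> str:
    """Serialize {dc_element: [values]} as an oai_dc record."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Item about the nineteenth century, published in 2007
print(build_oai_dc({
    "title": ["Example title (hypothetical)"],
    "date": ["2007"],            # when the resource was published
    "coverage": ["1800-1899"],   # the period the resource is about
}))
```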
Misuse of Dublin Core elements b. <source> and <relation> • Repository: PSMHS Collection is located at the Museum of History & Industry, Seattle • Repository Collection: Joe Williamson Collection • Both of them are mapped to <source> • <source>: A related resource from which the described resource is derived. • <relation>: A related resource. - Dublin Core Metadata Element Set, Version 1.1
Misuse of Dublin Core elements c. <type>, <format>, and <description> <type>Photograph: b&w; 6 1/8 x 8 in.</type> <format>1 tool : wood</format> <description>9 in. x 6 in.</description> <description>Material: Whale Bone</description>
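One way to catch this kind of confusion before exposing records is a small audit. Dublin Core 1.1 recommends a controlled vocabulary such as the DCMI Type Vocabulary for `type`, and its definition of `format` includes dimensions, so the sketch below flags `type` values outside that vocabulary and dimension-like strings in any field other than `format`. The dimension regex is a rough assumption tuned to the examples on this slide, and the function name is hypothetical.

```python
import re

# Terms of the DCMI Type Vocabulary
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})

# Rough pattern for dimensions such as "9 in. x 6 in." or "6 1/8 x 8 in."
DIMENSIONS = re.compile(r"\d+(?:\s+\d+/\d+)?\s*(?:(?:in|cm)\.?\s*)?x\s*\d",
                        re.IGNORECASE)


def mapping_warnings(record: dict) -> list:
    """Flag <type> values outside the DCMI Type Vocabulary and dimensions
    that appear anywhere other than <format>."""
    warnings = []
    for value in record.get("type", []):
        if value.strip() not in DCMI_TYPES:
            warnings.append(f"type not in DCMI Type Vocabulary: {value!r}")
    for field, values in record.items():
        if field == "format":
            continue
        for value in values:
            if DIMENSIONS.search(value):
                warnings.append(f"dimensions in <{field}>: {value!r}")
    return warnings
```

Run on the record above, this reports three problems: the free-text `type`, and the dimensions sitting in `type` and in `description`.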
After re-mapping the records… DC element usage (118 records)
After re-mapping the records… Number of records with 8 DC fields
What to export and what not… a. Information about scanning? <format>Three-dimensional objects, oversized prints and posters photographed with a Nikon D1X digital camera at resolution of 1312 x 2000 pixels, eight bits per RGB channel in TIF format. Images downloaded onto CD-R's, then copied using a Dell Optiplex GX150 and stored in Network Area Storage for non-display archival purposes. Additional copy created for further processing. If necessary, color correction performed using Levels in Photoshop. Resized at 720 dpi vertical, then compressed using Photoshop setting of 80 into JPG format for Web display.</format>
What to export and what not… b. Information about shelf, box, and folder number of item? <dc:source>99</dc:source> <dc:source>1</dc:source> <dc:source>14</dc:source> <dc:source>5</dc:source>
What to export and what not… c. Two publishers: which to export? Digital Publisher: Electronically reproduced by the Digital Services unit of the University of Central Florida Libraries, Orlando, 2005. Publisher: Students of Rollins College. <publisher>Students of Rollins College.</publisher> The Digital Publisher information is not mapped for export.
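Decisions like these can be recorded in a local field map in which each field points to a Dublin Core element, or to nothing when it stays out of the harvested record. The sketch below is a hypothetical configuration, not CONTENTdm's own field settings. It follows the slide's example in leaving Digital Publisher unexported; keeping box and folder numbers local is an assumed choice for illustration, since the slides leave that question open.

```python
# Hypothetical local field map: DC element name, or None = not exported.
# Fields absent from the map are also left out of the export.
FIELD_MAP = {
    "Title": "title",
    "Publisher": "publisher",
    "Digital Publisher": None,  # not exported, as in the example above
    "Box": None,                # assumption: container numbers kept local
    "Folder": None,
}


def export_dc(local_record: dict, field_map: dict) -> dict:
    """Build the Dublin Core view of a local record for harvesting."""
    dc = {}
    for field, value in local_record.items():
        target = field_map.get(field)
        if target and value.strip():
            dc.setdefault(target, []).append(value.strip())
    return dc
```

Keeping the map explicit also makes it easy to publish as the crosswalking information recommended earlier.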