
Metadata Interoperability and CONTENTdm

Metadata Interoperability and CONTENTdm. Midwest CONTENTdm Users Group, April 30, 2008, IUPUI, Indianapolis, IN. Amy Jackson, amyjacks@uiuc.edu; Myung-ja Han, mhan3@uiuc.edu. University of Illinois at Urbana Champaign.


Presentation Transcript


  1. Metadata Interoperability and CONTENTdm Midwest CONTENTdm Users Group April 30, 2008 IUPUI Indianapolis, IN Amy Jackson, amyjacks@uiuc.edu Myung-ja Han, mhan3@uiuc.edu University of Illinois at Urbana Champaign

  2. University of Illinois at Urbana Champaign • Producer/consumer of metadata in CONTENTdm • Currently use CONTENTdm to provide public access to 12 collections • Various projects harvest metadata from 11 CONTENTdm repositories around the nation

  3. Metadata Interoperability and CONTENTdm • Metadata and CONTENTdm • Service provider • Data provider • Longitudinal analysis of harvested metadata • Qualitative results • Quantitative results • Service provider view (Amy) • Data provider view (MJ)

  4. IMLS Digital Collections and Content • Project began December 2002 as an IMLS National Leadership Grant • Carole Palmer, Principal Investigator, 2007-2010 • Tim Cole, Principal Investigator, 2002-2007 • Amy Jackson, Project Coordinator • Collaboration between UIUC Library and Graduate School of Library and Information Science • http://imlsdcc.grainger.uiuc.edu/

  5. IMLS Digital Collections and Content • Project Objectives: • Implement a collection registry of digital collections created or developed with funding from IMLS NLG program • Use OAI-PMH to implement an item-level metadata repository for items contained in NLG collections • Carry out associated research related to: • Utility and usability of Registry & Repository • Current metadata practices of IMLS NLG grantees • Implications for interoperability (Framework of Guidance for Building Good Digital Collections)

  6. Item-level repository • Item-level Repository • Harvesting 71 of 195 Collections (36%) • 37 Repositories (some multiple institutions) • 10 CONTENTdm repositories • 310,448 records • Item Records (self identified types) • 86% images • 14% text

  7. Item-level repository Number of harvested collections using each DC field

  8. Item-level repository Top Item-level subjects Archaeology Buildings Photographers Mountains Men Archaeological site Insect Bodies of water

  9. OAI-PMH • All 37 repositories export metadata in simple Dublin Core • Five export in schemas other than simple or Qualified Dublin Core • MARC21 • MODS • OLAC • ETDMS

  10. Metadata harvesting • OAI-PMH • Harvested approach rather than federated approach • Data providers – create and expose metadata • Service providers – harvest and aggregate metadata • Based on HTTP and XML • Requires use of Dublin Core • Encourages and supports other formats

  11. How OAI Works (Technically) • 6 distinct ‘verbs’ or requests • OAI requests are sent via HTTP • Responses are sent in valid XML • [Diagram: on the service provider side, an OAI harvester feeds aggregated metadata; on the data provider side, an OAI data provider sits on the Digi. Mana. Sys. The harvester sends an HTTP request (OAI verb) and the data provider returns an HTTP response (valid XML).]

  12. OAI-PMH in CONTENTdm • Enable oai.txt file • CONTENTdm base url followed by /cgi-bin/oai.exe • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe • OAI “verbs” • ?verb=Identify • Return general information about the archive and its policies (e.g., datestamp granularity) • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=Identify

  13. OAI-PMH verbs • Identify • ListMetadataFormats • ListSets • ListIdentifiers • ListRecords • GetRecord
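The six verbs above are passed as a `verb` query parameter on the repository's base URL. The following is a minimal sketch of how a harvester might assemble such requests, not CONTENTdm's or any harvester's actual code. The function name `build_oai_url` is hypothetical; the base URL is the one shown on the slides, and no network request is made.

```python
from urllib.parse import urlencode

# The six request types defined by OAI-PMH, as listed on the slide.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_url(base_url, verb, **params):
    """Return an OAI-PMH request URL (hypothetical helper, for illustration).

    Extra arguments are appended as query parameters. Note that `from` is a
    Python keyword, so it has to be passed as **{"from": "2006-01-01"}.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Skip parameters explicitly set to None so callers can pass optional values.
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


# Base URL taken from the slides: CONTENTdm base URL + /cgi-bin/oai.exe
BASE = "http://images.library.uiuc.edu:8081/cgi-bin/oai.exe"

identify_url = build_oai_url(BASE, "Identify")
list_sets_url = build_oai_url(BASE, "ListSets")
print(identify_url)
print(list_sets_url)
```

Validating the verb before building the URL keeps typos from reaching the data provider, which would otherwise answer with an OAI-PMH error response.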

  14. OAI-PMH • ListSets • Purpose • Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat) • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListSets

  15. OAI-PMH • ListRecords • Purpose • Retrieves metadata records for multiple items • Parameters • from – start date • until – end date • set – set to harvest from • resumptionToken – flow control mechanism • metadataPrefix – metadata format • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc
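To make the ListRecords step concrete, here is a sketch of how a service provider might read the simple Dublin Core fields and the resumptionToken out of one response page. The XML below is a hand-written illustration, not output from a real repository: the identifier, datestamp, and token are made up, and the title/creator values are borrowed from the sheet-music example later in the talk. The helper name `parse_list_records` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI-PMH 2.0 responses carrying oai_dc records.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative response only; identifier, datestamp, and token are invented.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:sheetmusic/1</identifier>
        <datestamp>2006-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Frankie</dc:title>
          <dc:creator>Music by Neil Sedaka; words by Howard Greenfield</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>hypothetical-token-1</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response page.

    Each record is a dict mapping a DC element name to a list of values,
    because simple Dublin Core elements are repeatable.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        fields = {}
        dc_root = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc_root is not None:  # a record may carry no metadata block
            for el in dc_root:
                name = el.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
                fields.setdefault(name, []).append((el.text or "").strip())
        records.append(fields)
    token_el = root.find("oai:ListRecords/oai:resumptionToken", NS)
    has_token = token_el is not None and token_el.text
    token = token_el.text.strip() if has_token else None
    return records, token


records, token = parse_list_records(SAMPLE_RESPONSE)
print(records)
print(token)
```

A harvester would repeat the request with the returned token until no token comes back; that loop is left out here to keep the sketch free of network calls.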

  16. OAI-PMH • Barriers to sharing metadata through OAI-PMH • Technical Infrastructure • Metadata • Institution/Project • CONTENTdm • Compliant with OAI-PMH • Metadata is mapped to DC

  17. Harvested Metadata • How has use of Dublin Core changed over time? Records harvested between January 1, 2001 and December 31, 2006. • Quantitative analysis • What measurable changes can we see in the metadata? • Qualitative analysis • How has use of fields changed over time?

  18. Quantitative analysis • Quantitative analysis • Repetition of elements • Length of fields • Use of core fields (Shreeves et al. (2005))

  19. Quantitative analysis • Repetition of fields • Stable • Length of fields • Stable • Use of all 8 core fields • Declining

  20. Quantitative Analysis: Percent of records containing all core DC fields [Chart: percentage of records (0% to 100%) by age of record in months (34 or more, 21 to 33, 11 to 20, 10 or less), for two series, IMLS and IMLS & CIC. Data labels as extracted: 93.95%, 71.65%, 70.27%, 52.41%, 22.97%, 18.55%, 11.04%, 7.99%.]

  21. Quantitative Analysis • Of these eight elements, the two elements most often missing are creator (used in 39% of records) and rights (52%). • Identifier, title, and subject were each used in over 96% of all records. • Format and description fields have shown the most significant decline in use since 2003. • Decreased repetition and length of the description field, and an overall increase in use of the relation field.
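The "all core fields" measure can be sketched as a simple completeness check over harvested records. This is an illustration of the idea, not the project's analysis code. The slides name seven of the eight core elements (creator, rights, identifier, title, subject, format, description); treating `date` as the eighth is an assumption here, so the set is a parameter. The sample records and the function names are made up for the example.

```python
# Assumed core set: seven elements are named on the slides; "date" as the
# eighth is an assumption and can be overridden through the `core` argument.
CORE_FIELDS = (
    "title", "creator", "subject", "description",
    "date", "format", "identifier", "rights",
)


def has_all_core_fields(record, core=CORE_FIELDS):
    """True if every core element has at least one non-empty value.

    `record` maps a DC element name to a list of string values.
    """
    return all(any(v.strip() for v in record.get(field, [])) for field in core)


def percent_complete(records, core=CORE_FIELDS):
    """Percentage of records that contain all core elements."""
    if not records:
        return 0.0
    complete = sum(has_all_core_fields(r, core) for r in records)
    return 100.0 * complete / len(records)


# Two invented records: the second lacks creator and rights, the two
# elements the slide reports as most often missing.
sample = [
    {
        "title": ["Frankie"], "creator": ["Neil Sedaka"], "subject": ["Songs"],
        "description": ["Sheet music"], "date": ["1961"], "format": ["image/jpeg"],
        "identifier": ["oai:example.org:1"], "rights": ["See repository"],
    },
    {
        "title": ["Untitled photograph"], "subject": ["Mountains"],
        "description": ["b&w print"], "date": ["1920"], "format": ["image/jpeg"],
        "identifier": ["oai:example.org:2"],
    },
]

print(percent_complete(sample))
```

Grouping records by harvest date before calling `percent_complete` would give the per-age-bracket percentages charted on the previous slide.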

  22. Percent of Records Containing each DC field

  23. Conclusions • Recommendations • Publish local metadata practices • Publish crosswalking information • Expose native metadata in addition to Dublin Core

  24. Amy Jackson Project Coordinator IMLS Digital Collections and Content University of Illinois at Urbana Champaign amyjacks@uiuc.edu

  25. Data provider view

  26. What does exporting mean? Qualitative analysis - Changes over time - Unpacking MARC - Incorrect mapping - Misuse and confusion of DC elements - What to expose and what not - Lost in harvesting What we have learned Recommendations

  27. What does exporting mean? Exporting Makes collection metadata available for service providers to harvest. CONTENTdm has a turnkey option to make this possible. Has DC mapping to provide Dublin Core records to service providers.

  28. Why export metadata? Increases exposure of collections Broadens user base We can no longer assume that users will come through the front door; sharing metadata gets us ‘in the flow’ (Lorcan Dempsey) - Metadata for you & me

  29. Qualitative analysis 225 records from 6 repositories (time increments) - Document changes in practice over time - Compare original record vs. harvested record in service provider’s environment 600 randomly selected records 95 records from 11 repositories and 19 collections harvested from CONTENTdm

  30. Any changes over time? Only 1 observed change over time Early records: <title>Frankie / Music by Neil Sedaka; words by Howard Greenfield </title> Later records: <title>Frankie</title><creator>Music by Neil Sedaka; words by Howard Greenfield</creator>

  31. Other findings… Unpacking MARC Incorrect mapping Misuse and confusion of Dublin Core elements What to export and what not… And Lost in harvesting…

  32. Unpacking MARC Object Description Photograph: b&w; 6 1/8x8 in.<type>Photograph: b&amp;w; 6 1/8 x 8 in.</type> Publication Information [Lancaster, Pa.? : Johann Albrecht und Comp.?, 1790?] <publisher>[Lancaster, Pa.? : Johann Albrecht und Comp.?, 1790?]</publisher>

  33. Unpacking MARC a. MARC 245 could be mapped to: Subfield 'a' => <title> Subfield 'b' => <title> or <alternative> Subfield 'c' => <creator> or <contributor> Subfield 'f' => <date> Subfield 'g' => <date> Subfield 'h' => <format> Subfield 'k' => <type> Subfield 'n' => <description> or <title> Subfield 'p' => <description> or <title>
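The subfield mapping above can be written down as a lookup table. The sketch below takes the first candidate element for each subfield by default, which is a simplification: the slide deliberately lists alternatives, and `alternative` is a Qualified Dublin Core term rather than a simple DC element. The function name `unpack_245`, the `(code, value)` input shape, and the stripping of trailing ISBD punctuation such as " /" are all assumptions made for illustration.

```python
# Candidate Dublin Core elements per MARC 245 subfield, in the order
# given on the slide. The first entry is used unless `choose` says otherwise.
MARC_245_TO_DC = {
    "a": ["title"],
    "b": ["title", "alternative"],
    "c": ["creator", "contributor"],
    "f": ["date"],
    "g": ["date"],
    "h": ["format"],
    "k": ["type"],
    "n": ["description", "title"],
    "p": ["description", "title"],
}


def unpack_245(subfields, choose=None):
    """Map 245 subfields to (dc_element, value) pairs.

    `subfields` is a list of (code, value) tuples. `choose` optionally maps a
    subfield code to the preferred element, e.g. {"c": "contributor"}.
    Returns (mapped, unmapped) so nothing is silently dropped.
    """
    choose = choose or {}
    mapped, unmapped = [], []
    for code, value in subfields:
        candidates = MARC_245_TO_DC.get(code)
        if not candidates:
            unmapped.append((code, value))
            continue
        element = choose.get(code, candidates[0])
        if element not in candidates:
            raise ValueError(f"{element!r} is not a listed target for subfield {code!r}")
        # Assumption: drop trailing ISBD separators left over from the MARC field.
        mapped.append((element, value.rstrip(" /:;")))
    return mapped, unmapped


# The sheet-music title from the "changes over time" slide, split by subfield.
mapped, unmapped = unpack_245([
    ("a", "Frankie /"),
    ("c", "Music by Neil Sedaka; words by Howard Greenfield"),
])
print(mapped)
```

Splitting the field this way reproduces the "later records" form shown earlier, with the statement of responsibility in `<creator>` instead of packed into `<title>`.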

  34. Unpacking MARC b. MARC 260: <publisher> <date> c. MARC 6xx: <subject> <coverage-temporal> <coverage-spatial> <type>

  35. Incorrect Mapping a. Digital Reproduction Information Scanned as a 3000 pixel TIFF image in 8-bit grayscale, resized to 640 pixels in the longest dimension and compressed into JPEG format using Photoshop 6.0 and its JPEG quality measurement 3. Where do you map this? <format> Scanned as a 3000 pixel TIFF image in 8-bit grayscale, resized to 640 pixels in the longest dimension and compressed into JPEG format using Photoshop 6.0 and its JPEG quality measurement 3.

  36. Incorrect Mapping b. Repository University of Prominent Libraries. Special Collections Division. Repository Collection Prominent Photograph Collection. PH Coll 282 Where do you map these? <source> University of Prominent Libraries. Special Collections Division. <source> Prominent Photograph Collection. PH Coll 282

  37. Incorrect Mapping c. Physical description 9 in. x 6 in. Where do you map this? <description> 9 in. x 6 in.

  38. Misuse of Dublin Core elements a. <date> and <coverage> - Item about the nineteenth century, published in 2007. Metadata should be? <date>1800-1899 OR <date>2007 <coverage>1800-1899

  39. Misuse of Dublin Core elements b. <source> and <relation> Repository: PSMHS Collection is located at the Museum of History & Industry, Seattle Repository Collection: Joe Williamson Collection Both of them mapped to <source> <source>: A related resource from which the described resource is derived. <relation>: A related resource. - Dublin Core Metadata Element Set, Version 1.1

  40. Misuse of Dublin Core elements c.<type>, <format>, and <description> <type>Photograph: b&amp;w; 6 1/8 x 8 in.</type> <format>1 tool : wood</format> <description>9 in. x 6 in.</description> <description>Material: Whale Bone</description>

  41. After re-mapping the records… DC Elements Usages (118 records)

  42. After re-mapping the records… Number of records with 8 DC fields

  43. What to export and what not… a. Information about scanning? <format>Three-dimensional objects, oversized prints and posters photographed with a Nikon D1X digital camera at resolution of 1312 x 2000 pixels, eight bits per RGB channel in TIF format. Images downloaded onto CD-R's, then copied using a Dell Optiplex GX150 and stored in Network Area Storage for non-display archival purposes. Additional copy created for further processing. If necessary, color correction performed using Levels in Photoshop. Resized at 720 dpi vertical, then compressed using Photoshop setting of 80 into JPG format for Web display.</format>

  44. What to export and what not… b. Information about shelf, box, and folder number of item? <dc:source>99</dc:source> <dc:source>1</dc:source> <dc:source>14</dc:source> <dc:source>5</dc:source>

  45. What to export and what not… c. Two publishers, which to export? Digital Publisher Electronically reproduced by the Digital Services unit of the University of Central Florida Libraries, Orlando, 2005. Publisher Students of Rollins College. <publisher>Students of Rollins College.</publisher> The Digital Publisher information is not mapped to export.
