ALA/CLA Annual Meeting, 22 June 2003, Toronto, CA. Using OAI-PMH to Aggregate Metadata Describing Cultural Heritage Resources. Timothy W. Cole (t-cole3@uiuc.edu), University of Illinois at Urbana-Champaign. http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/
Order of Presentation • Perspectives on OAI-PMH • Illinois OAI metadata harvesting project • Goals & objectives • Findings regarding metadata • Findings regarding search & discovery • New OAI projects at Illinois • IMLS digital collections & content • CIC OAI metadata harvesting project ALA 2003 / OAI-PMH
OAI Protocol for Metadata Harvesting • Harvesting approach to interoperability at the metadata level • Divides the world into Metadata Providers & Service Providers • Builds on HTTP, XML, & Dublin Core • http://www.openarchives.org/
OAI Antecedents • Call to other E-Print archives (July 1999) Paul Ginsparg, Rick Luce, & Herbert Van de Sompel: “…mobilize core group to work towards achieving a universal service for author self-archived scholarly literature.” • Santa Fe Mtgs. (Oct. 1999 & June 2000) • OAI-PMH version history: • First Alpha Release, Sept. 2000 • 1.0 (Beta) Release January 2001 • 1.1 (Beta 2) Release July 2001 • 2.0 (Production) Release June 2002
Original OAI Organization • OAI Executive: • Carl Lagoze & Herbert Van de Sompel • OAI Steering Committee: • Co-Chairs: Dan Greenstein, Cliff Lynch • OAI Technical Committee • Funded by NSF, DLF & CNI • Seeks to be user community driven
OAI-PMH as a tool • All about moving metadata around • Designed to be a building block, usable by many different communities • Can facilitate (in some cases enable) services & functions • Assumes widely distributed content, but centralized indexing(!) & services • Build once, use for many applications • Focus of OAI is interoperability
Harvesting vs. Broadcast • Competing approaches to interoperability • Distributed/broadcast searching: search and discovery run against remote services and data • Harvesting: data/metadata is transferred from the remote source to the destination where search & discovery services are located (e.g., union catalogs) • OAI-PMH is a harvesting protocol
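The harvesting model described on this slide can be sketched as a short loop. This is a minimal illustration rather than the project's own harvester: the function name `harvest`, the injected `fetch` callable, and the base URL used in the note below are assumptions made for the example. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the response namespace come from OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every <record> element from a repository.

    `fetch` is a hypothetical callable that takes a URL and returns the
    response body (str or bytes). Injecting it keeps the sketch testable
    without network access.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iter(OAI_NS + "record")

        # A non-empty resumptionToken means the list continues.
        token = root.find(".//" + OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

For a live repository, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`, with `base_url` set to the provider's OAI endpoint (for example a placeholder such as `http://example.org/oai`). The harvested records are then indexed locally, which is what distinguishes this approach from broadcast searching.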
As Compared to Z39.50
Metadata vs. Resources • Resource refers to information objects or digital representations of information objects • Metadata item is a collection of properties about a resource (e.g. title, author, etc.) • Metadata record is a metadata item expressed in a specific syntax according to an XSD • OAI focuses on metadata, with the implicit understanding that metadata contains useful links to the source information object(s)
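To make the item/record distinction concrete, here is a small sketch that turns a metadata record (XML in the `oai_dc` format) back into a metadata item (a plain set of properties). The sample values are borrowed from the coverlet excerpt later in this deck. The identifier URL and the names `RECORD_XML` and `record_to_item` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the unqualified Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

# A metadata *record*: the item serialized in the oai_dc XML syntax.
# The identifier URL is a placeholder, not a real resource.
RECORD_XML = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>American woven coverlet</dc:title>
  <dc:date>Early 19th c. CE</dc:date>
  <dc:type>physical object</dc:type>
  <dc:identifier>http://example.org/items/1</dc:identifier>
</oai_dc:dc>"""


def record_to_item(xml_text):
    """Return the metadata *item*: element name -> list of values."""
    root = ET.fromstring(xml_text)
    item = {}
    for child in root:
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            # DC elements are repeatable, so collect values in a list.
            item.setdefault(name, []).append((child.text or "").strip())
    return item
```

The `identifier` value plays the role of the "useful link" back to the source information object that the last bullet mentions.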
When to use OAI-PMH • Metadata is sufficient for services desired • Normalization, deduplication, metadata augmentation desired • Content is widely distributed across small, non-Z39.50-enabled repositories • OAI-PMH is more lightweight than Z39.50 • Portals can use BOTH Z39.50 & OAI-PMH
What OAI-PMH Is Not • Not search & discovery on its own • Not a database management system • Not a single metadata schema • Not OAIS (the Open Archival Information System reference model)
How OAI Works • OAI "verbs": Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord • Service Provider (harvester) sends an HTTP request carrying an OAI verb • Metadata Provider (repository) returns an HTTP response (valid XML)
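Since every OAI-PMH request is an ordinary HTTP request with a `verb` parameter, the harvester side can be sketched in a few lines. The six verbs are the ones listed on the slide. The helper name `build_request` and the base URL `http://example.org/oai` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, **args):
    """Build the URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI-PMH verb: " + verb)
    params = {"verb": verb}
    for key, value in args.items():
        # 'from' is a Python keyword, so callers pass 'from_' instead.
        params[key.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


# Incremental harvest: records added or changed since 1 January 2003.
url = build_request(
    "http://example.org/oai", "ListRecords",
    metadataPrefix="oai_dc", from_="2003-01-01",
)
```

Here `url` is `http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2003-01-01`. The repository answers with an XML document that the harvester parses.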
OAI Provider Architectures • Descriptive metadata sources: HTML <meta>, XML, DBMS • OAI administrative metadata, e.g., IDs, datestamps, sets, formats • OAI application (CGI, ASP, PHP, etc.) • Web server (HTTP) • OAI harvesters
A few projects using OAI-PMH • Basic building block of the National Science Digital Library • Large-scale implementations in E-Prints, OLAC, NDLTD, … • Built into ENCompass, CONTENTdm, Michigan’s DLXS, DSpace, and other products • Open Archives Forum in Europe; will be part of federation activities in the UK and EU
Univ. of Illinois OAI Metadata Harvesting Project • Funded by Andrew W. Mellon Foundation (July 2001 – May 2003) • Primary objectives: • Develop & make available OAI harvesting tools • Build search services for aggregated metadata in the domain of cultural heritage • Examine metadata aggregation issues, including use of EAD in OAI context • Investigate utility of aggregated metadata, including preliminary testing with end-users
Type of resources • 39 data providers • Academic libraries • Museums / cultural orgs • Digital libraries • Public library • 1.1 million original DC records • + 1.5 million derived from EAD
Variations in DC element usage • Records containing subject & description element • Many different controlled and local vocabularies in use • Granularity: a record may describe a collection of coins, or a single coin
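One way to quantify variation of this kind is to measure, for each DC element, the share of harvested records that actually use it. The sketch below assumes records have already been parsed into dictionaries mapping element names to value lists. The function name `element_usage` and that record shape are assumptions for illustration, not the project's code.

```python
from collections import Counter


def element_usage(records):
    """Return {element name: fraction of records with a non-empty value}.

    `records` is an iterable of dicts mapping a DC element name to a
    list of string values (an assumed, simplified record shape).
    """
    counts = Counter()
    total = 0
    for record in records:
        total += 1
        for name, values in record.items():
            # Count the element once per record, ignoring blank values.
            if any(value.strip() for value in values):
                counts[name] += 1
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}
```

Running it per data provider, rather than over the whole aggregation, would show which collections supply `subject` or `description` at all.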
Excerpt of a metadata record describing a cotton coverlet
Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.
Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.
Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.
Coverage: (none)
Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?
Type: Image
Excerpt of a metadata record describing "American woven coverlet"
Description: Materials: Textile--Multi, Pigment--Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973
Source: (none)
Format: 228 x 169 x 1.2 cm (1,629 g)
Coverage: Euro-American; America, North; United States; Indiana? Illinois?
Date: Early 19th c. CE
Type: cultural; physical object; original
Implications • Service providers • Automatically normalize metadata encoding where possible (e.g., dates) • Normalize for and co-locate by type / format where possible • Metadata providers • Create metadata with interoperability in mind • Consider a more expressive schema, e.g., Qualified DC, MARC
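The date normalization suggested for service providers can be sketched with the date values from the cotton coverlet record shown earlier. This is an illustrative sketch under stated assumptions: the list of accepted formats covers only the encodings seen in that excerpt, and the name `normalize_date` is hypothetical. Values that cannot be parsed safely (such as "1912-1920?") are left for human review rather than guessed.

```python
from datetime import datetime

# Encodings observed in the coverlet excerpt; a real aggregation
# would need a longer list (an assumption of this sketch).
_FORMATS = (
    "%Y-%m-%d %H:%M:%S",  # 2001-09-19 09:45:18
    "%Y%m%d%H%M%S",       # 20011107162451
    "%Y-%m-%d",           # 2001-04-05
)


def normalize_date(value):
    """Return an ISO 8601 date (YYYY-MM-DD), or None if unparseable."""
    text = value.strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known encoding
    return None


raw = ["2001-09-19 09:45:18", "20011107162451", "2001-04-05", "1912-1920?"]
normalized = [normalize_date(value) for value in raw]
```

Returning `None` instead of a best guess keeps the original string available for display while excluding it from date-limited searches.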
Original interface • Portal had two search pages—simple (keyword) and advanced.
Pilot study with student teachers • 23 users in honors-level C&I class • Assignment: Use the site in preparing a lesson plan (high school social studies) • Introduced to “aggregated metadata” concept • Focus group interviews conducted • Students’ papers examined • Transaction logs analyzed
Results of initial user testing 1. Users expected all links to point to digital objects • Some records pointed to finding aids • Some records pointed to a collection’s web site • Some records described analog objects 2. Users were unable to make use of search results • Simple searches produced 1000s of unranked results • Advanced search (with limits) rarely used 3. Distinction between portal and data providers unimportant to users
What does “online access” mean? • To librarian & curator • To student teacher
Response to test results • EAD-derived records segregated • Analog-only collections excluded • Categories of resource types reduced to 3: • Images and Video • Text, Sheet Music, and Websites • Museums and Archival Collections
Revised interface • Simple keyword & advanced search put on one page • Clarify “online access” • Natural language in Boolean operators
Revised search results • Link goes to finding aid or collection page? “Learn more.” • Link displays object? “View item.” • Subject/description display expanded
IMLS Digital Collections & Content • Build a registry of all National Leadership Grant collections with digital content. • Assist and guide NLG projects in making item-level metadata sharable using OAI. • Build a repository and search & discovery tools for integrated access to the content of NLG collections (unique metadata schema?). • Research best practices for sharing metadata about diverse digital content and for supporting the interests of diverse user communities.
CIC OAI metadata harvesting • Univ. of Illinois at UC will host an OAI-PMH metadata harvesting service for 10 CIC libraries • Project Goals (3 year experimentation phase) • Improve access to selected resources at CIC libraries • Advertise these resources (internally & externally) • Prepare member institutions for future grant-mandated OAI-based resource sharing • Serve as a useful testbed for experimentation with OAI-PMH, development of metadata best practices, usability and user needs testing, etc.
Using OAI-PMH to Aggregate Metadata Describing Cultural Heritage Resources. http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/ Timothy W. Cole (t-cole3@uiuc.edu), University of Illinois at Urbana-Champaign