The Basics of OAI
An Introduction to the Protocol for Metadata Harvesting
Sarah Shreeves
University of Illinois at Urbana-Champaign
Basics and Beyond, July 27, 2004
Outline
• What the OAI protocol is & what it is not
• Place in digital library infrastructure
• How it works (basically)
• Challenges for data / service providers
OAI-PMH is a tool
• Moves metadata (not content) from a data provider to a service provider (or harvester)
• A set of rules that defines the communication between two systems (like FTP and HTTP)
• Build once, use for many applications: a building block for digital library services
• Facilitates the federation of metadata
OAI-PMH is not…
• Metadata
• A search tool
• A database
• Open Access
Who uses OAI?
• Approximately 400 data providers
• Basic building block of the National Science Digital Library (NSDL) and OAIster
• Incorporated into DSpace and Eprints.org
• Part of CONTENTdm, Michigan’s DLXS, and other products
• International use
Basic OAI-PMH Concepts
• “Aggregated search” rather than “federated search”
• Data providers support OAI-PMH as a means to expose metadata
• Service providers ‘harvest’ metadata from data providers via OAI-PMH
• OAI-PMH is based on HTTP and XML
• OAI-PMH requires support for simple (unqualified) Dublin Core
• BUT it also supports and encourages the use of other metadata schemas
• Each OAI record has a unique, persistent identifier and a datestamp
[Diagram: several OAI data providers, backed by a digital management system, a database, and XML files, answer OAI requests from an OAI harvester with OAI responses; the harvester builds aggregated metadata on which services are built.]
Examples of OAI Service Providers
• OAIster: http://oaister.umdl.umich.edu/o/oaister/
• Engineering, Computer Science, and Physics: http://g118.grainger.uiuc.edu/engroai/
• Open Language Archives Community: http://www.language-archives.org/
How OAI Works (Technically)
• 6 distinct ‘verbs’ or requests
• OAI requests are sent via HTTP
• Responses are sent in valid XML
[Diagram: the service provider's OAI harvester sends an HTTP request (an OAI verb) to the data provider's OAI interface, which sits on a digital management system; the data provider returns an HTTP response (valid XML), which the harvester adds to its aggregated metadata.]
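Because a request is just an HTTP GET (the protocol also allows POST) with the verb and its arguments in the query string, a harvester can build requests with plain URL encoding. A minimal sketch; `build_request` is a hypothetical helper, and the base URL is taken from the sample URLs on the verb slides (the server may no longer respond, and this code does not contact it).

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET URL: the verb plus its arguments, URL-encoded."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


base = "http://aerialphotos.grainger.uiuc.edu/oai.asp"
print(build_request(base, "ListRecords", metadataPrefix="oai_dc"))
print(build_request(base, "GetRecord",
                    identifier="oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940",
                    metadataPrefix="oai_dc"))
```

The colons in the identifier come out percent-encoded (`%3A`), since OAI-PMH expects reserved characters in argument values to be URL-encoded; the server decodes them back.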
An OAI Record
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
        xmlns="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>United States -- History -- Civil War, 1861-1865 -- Religious aspects.</subject>
      <subject>Confederate States of America -- Religion.</subject>
      <subject>Soldiers -- Religious life -- Confederate States of America.</subject>
      <subject>Soldiers -- Confederate States of America -- Conduct of life.</subject>
      <subject>Confederate States of America -- Church history.</subject>
      <subject>Sin.</subject>
      <publisher>[Raleigh, N. C.: s. n., between 1861 and 1865]</publisher>
      <date>2003-04-24T13:15:52Z</date>
      <type>Text</type>
      <format>text/html</format>
      <identifier>http://docsouth.unc.edu/royal/royal.html</identifier>
      <language>en-us</language>
    </oai_dc:dc>
  </metadata>
</record>
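A harvester handles the header (identifier, datestamp, sets) separately from the metadata payload. A minimal sketch using Python's standard `xml.etree.ElementTree`, run on a shortened copy of the record above; `parse_record` and the dictionary it returns are hypothetical choices, not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Prefixes for the three namespaces that appear in the record.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_record(xml_text: str) -> dict:
    """Split one OAI <record> into header fields and repeatable DC elements."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    result = {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "dc": {},
    }
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    if dc is not None:
        for element in dc:
            # Drop the "{namespace}" part of the tag: "{...}title" -> "title".
            name = element.tag.split("}")[-1]
            # DC elements may repeat (the record above has six <subject>s).
            result["dc"].setdefault(name, []).append(element.text)
    return result


sample = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns="http://purl.org/dc/elements/1.1/"
               xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>Sin.</subject>
      <subject>Confederate States of America -- Religion.</subject>
    </oai_dc:dc>
  </metadata>
</record>"""

rec = parse_record(sample)
print(rec["identifier"], rec["datestamp"], rec["sets"])
print(rec["dc"]["subject"])
```

Note that `<identifier>` appears twice with different meanings: the header's OAI identifier and the Dublin Core identifier (a URL to the item); the namespaces keep them apart.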
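OAI “VERBS”
• Identify
• ListMetadataFormats
• ListSets
• ListIdentifiers
• ListRecords
• GetRecord
Key for the parameter lists on the verb slides: (R) required, (O) optional, (X) exclusive (a resumptionToken must be the only argument besides the verb).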
Identify
• Purpose: Return general information about the archive and its policies (e.g., datestamp granularity)
• Parameters: None
• Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
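One practical use of Identify is learning the datestamp granularity before sending `from`/`until` arguments. A minimal sketch; the Identify response below is hypothetical and trimmed (a real one carries further required elements such as `adminEmail` and `earliestDatestamp`), and `datestamp_format` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def datestamp_format(identify_xml: str) -> str:
    """Return a strftime pattern for from/until that matches the declared granularity."""
    root = ET.fromstring(identify_xml)
    granularity = root.findtext(f"{OAI}Identify/{OAI}granularity")
    if granularity == "YYYY-MM-DDThh:mm:ssZ":
        return "%Y-%m-%dT%H:%M:%SZ"
    # Day granularity is the level every OAI-PMH 2.0 repository must support.
    return "%Y-%m-%d"


# Hypothetical, trimmed Identify response.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""

fmt = datestamp_format(sample)
# OAI-PMH datestamps are UTC, so keep the harvester's clock in UTC too.
last_harvest = datetime(2004, 7, 27, 14, 30, tzinfo=timezone.utc)
print(f"from={last_harvest.strftime(fmt)}")
```

Sending a seconds-level datestamp to a day-granularity repository is a protocol error, which is why the pattern is derived from Identify rather than hard-coded.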
ListSets
• Purpose: Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat)
• Parameters: resumptionToken – flow control mechanism (X)
• Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets
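Set hierarchy is encoded in the `setSpec` itself: a colon separates levels, so `photos:1940` is a set inside `photos`. A minimal sketch of reading a ListSets response and recovering each set's parent; the set names and specs are hypothetical, as are the helper names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_sets(list_sets_xml: str) -> Dict[str, str]:
    """Map each setSpec in a ListSets response to its human-readable setName."""
    root = ET.fromstring(list_sets_xml)
    return {
        s.findtext("oai:setSpec", namespaces=NS): s.findtext("oai:setName", namespaces=NS)
        for s in root.iterfind("oai:ListSets/oai:set", NS)
    }


def parent_set(set_spec: str) -> Optional[str]:
    """Return the enclosing setSpec ('photos' for 'photos:1940'), or None at the top."""
    head, sep, _ = set_spec.rpartition(":")
    return head if sep else None


# Hypothetical ListSets response.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>photos</setSpec><setName>Aerial photographs</setName></set>
    <set><setSpec>photos:1940</setSpec><setName>Aerial photographs, 1940</setName></set>
  </ListSets>
</OAI-PMH>"""

sets = list_sets(sample)
for spec, name in sets.items():
    print(spec, "->", name, "| parent:", parent_set(spec))
```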
ListMetadataFormats
• Purpose: List metadata formats supported by the archive, along with their schema locations and namespaces
• Parameters: identifier – for a specific record (O)
• Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats
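A harvester can use this verb to pick the richest format a repository offers and fall back to `oai_dc`, which every repository must support. A minimal sketch: the `oai_dc` entry uses the schema and namespace from the record slide, while the `local` format, its URLs, and the preference list are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def metadata_formats(xml_text: str) -> Dict[str, Tuple[str, str]]:
    """Map metadataPrefix -> (schema URL, metadata namespace)."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iterfind("oai:ListMetadataFormats/oai:metadataFormat", NS):
        prefix = fmt.findtext("oai:metadataPrefix", namespaces=NS)
        formats[prefix] = (fmt.findtext("oai:schema", namespaces=NS),
                           fmt.findtext("oai:metadataNamespace", namespaces=NS))
    return formats


def choose_prefix(formats: Dict[str, Tuple[str, str]], preferred: List[str]) -> str:
    """Pick the first preferred format on offer; oai_dc is the guaranteed fallback."""
    for prefix in preferred:
        if prefix in formats:
            return prefix
    return "oai_dc"


sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>local</metadataPrefix>
      <schema>http://example.org/local.xsd</schema>
      <metadataNamespace>http://example.org/local/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

formats = metadata_formats(sample)
print(sorted(formats))
print(choose_prefix(formats, ["mods", "local"]))
```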
ListIdentifiers
• Purpose: List headers for all items corresponding to the specified parameters
• Parameters:
  • from – start date (O) and/or until – end date (O)
  • set – set to harvest from (O)
  • metadataPrefix – metadata format to list identifiers for (R)
  • resumptionToken – flow control mechanism (X)
• Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc
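Because ListIdentifiers returns only headers, it is a cheap way to decide which records to fetch. Headers of records a repository has deleted carry `status="deleted"`, and a harvester should drop those from its aggregation rather than request them. A minimal sketch; the first identifier comes from the GetRecord sample URL, while the datestamps and the deleted identifier are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_headers(xml_text: str) -> list:
    """Return identifier, datestamp, and a deletion flag for each header."""
    root = ET.fromstring(xml_text)
    return [
        {
            "identifier": h.findtext("oai:identifier", namespaces=NS),
            "datestamp": h.findtext("oai:datestamp", namespaces=NS),
            # Repositories that track deletions mark them with status="deleted".
            "deleted": h.get("status") == "deleted",
        }
        for h in root.iterfind("oai:ListIdentifiers/oai:header", NS)
    ]


sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940</identifier>
      <datestamp>2004-05-01</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:removed-1</identifier>
      <datestamp>2004-06-15</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

headers = list_headers(sample)
to_fetch = [h["identifier"] for h in headers if not h["deleted"]]
print(to_fetch)
```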
GetRecord
• Purpose: Return the metadata for a single item in the form of an OAI record
• Parameters:
  • identifier – unique id for item (R)
  • metadataPrefix – metadata format for the record (R)
• Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
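When a request cannot be answered, the repository still replies with valid XML, but with an `<error>` element whose `code` names the problem (for GetRecord, typically `idDoesNotExist` or `cannotDisseminateFormat`). A minimal sketch of checking for that before reading the record; `OAIError` and `get_record` are hypothetical names, and the response texts and datestamp are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class OAIError(Exception):
    """An OAI-PMH <error> element, e.g. code="idDoesNotExist"."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def get_record(xml_text: str) -> ET.Element:
    """Return the <record> from a GetRecord response, or raise OAIError."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", NS)
    if error is not None:
        raise OAIError(error.get("code"), (error.text or "").strip())
    return root.find("oai:GetRecord/oai:record", NS)


ok = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><header>
    <identifier>oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940</identifier>
    <datestamp>2004-05-01</datestamp>
  </header></record></GetRecord>
</OAI-PMH>"""
bad = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <error code="idDoesNotExist">No matching identifier</error>
</OAI-PMH>"""

print(get_record(ok).findtext("oai:header/oai:identifier", namespaces=NS))
try:
    get_record(bad)
except OAIError as exc:
    print("skipping record:", exc.code)
```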
ListRecords
• Purpose: Retrieve metadata records for multiple items
• Parameters:
  • from – start date (O)
  • until – end date (O)
  • set – set to harvest from (O)
  • resumptionToken – flow control mechanism (X)
  • metadataPrefix – metadata format (R)
• Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc
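The required/optional/exclusive markings on the verb slides amount to a small rule table that a harvester (or a data provider) can check before a request goes out. A minimal sketch; the table follows the OAI-PMH 2.0 argument rules, under which ListSets, ListIdentifiers, and ListRecords accept an exclusive resumptionToken. The problem strings reuse the protocol's `badVerb` and `badArgument` error codes, but `check_args` itself is hypothetical.

```python
# Required and optional arguments per verb (resumptionToken handled separately).
RULES = {
    "Identify":            (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets":            (set(), set()),
    "ListIdentifiers":     ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords":         ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord":           ({"identifier", "metadataPrefix"}, set()),
}
# Verbs that may return a partial list and so accept a resumptionToken.
TOKEN_VERBS = {"ListSets", "ListIdentifiers", "ListRecords"}


def check_args(verb: str, args: dict) -> list:
    """Return a list of problems; an empty list means the arguments look valid."""
    if verb not in RULES:
        return [f"badVerb: {verb}"]
    names = set(args)
    if "resumptionToken" in names:
        if verb not in TOKEN_VERBS:
            return [f"badArgument: {verb} does not accept resumptionToken"]
        # (X) exclusive: the token must be the only argument besides the verb.
        if names != {"resumptionToken"}:
            return ["badArgument: resumptionToken must be the only argument"]
        return []
    required, optional = RULES[verb]
    problems = [f"badArgument: missing {a}" for a in sorted(required - names)]
    problems += [f"badArgument: unexpected {a}"
                 for a in sorted(names - required - optional)]
    return problems


print(check_args("GetRecord", {"identifier": "oai:x:1"}))
print(check_args("ListRecords", {"metadataPrefix": "oai_dc", "set": "4"}))
```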
Other Pieces of OAI
• Flow control
• Sets
• Multiple metadata schemas
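Flow control works through the resumptionToken: when a list is too long for one response, the repository returns part of it plus a token, and the harvester asks for the next part by sending that token alone; an empty or missing token marks the last part. A minimal sketch of a harvesting loop; `harvest`, `page`, and the canned `example.org` responses are hypothetical, and `fetch` is injected so the loop can run without a live server.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest(fetch: Callable[[str], str], base_url: str,
            metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> from ListRecords, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iterfind("oai:ListRecords/oai:record", NS)
        token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
        # A missing or empty resumptionToken marks the last part of the list.
        if not token:
            return
        # Follow-up requests send only the verb and the (exclusive) token.
        params = {"verb": "ListRecords", "resumptionToken": token}


def page(identifier: str, token: str) -> str:
    """Build a tiny fake ListRecords response for the demo below."""
    return ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
            f'<record><header><identifier>{identifier}</identifier>'
            '<datestamp>2004-07-27</datestamp></header></record>'
            f'<resumptionToken>{token}</resumptionToken></ListRecords></OAI-PMH>')


# Canned responses keyed by request URL, standing in for a real server.
PAGES = {
    "http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc":
        page("oai:example.org:1", "page2"),
    "http://example.org/oai?verb=ListRecords&resumptionToken=page2":
        page("oai:example.org:2", ""),
}

ids = [r.findtext("oai:header/oai:identifier", namespaces=NS)
       for r in harvest(PAGES.__getitem__, "http://example.org/oai")]
print(ids)
```

Against a real repository, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`; a production harvester would also need to handle an expired token (the `badResumptionToken` error).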
Challenges for the OAI Community
• Relatively recent protocol, with no established best practices (yet)
• ‘Shareability of metadata’
• Heterogeneity of items described
• Loss of context / information loss
• Knowledge structures differ, so…
  • Native metadata schemas differ
  • Controlled vocabularies differ
  • Use and presentation of items differ
Metadata for different communities
http://digital.lib.umn.edu/IMAGES/reference/mswp/MPW00476.jpg
Metadata for different communities
http://images.library.uiuc.edu:8081/cgi-bin/viewer.exe?CISOROOT=/tdc&CISOPTR=746
Loss of Context: Record in OAI aggregation
Context: Record in native database
Loss of context / data
Sense / Completeness of Metadata
• identifier: http://images.umdl.umich.edu/cgi/i/image/image-idx?view=entry;subview=detail;cc=fish3ic;entryid=X-0802;viewid=1004_112
• publisher: UMMZ Fish Division
• format: jpeg
• type: image
• subject: 1926-05-18
• subject: 1926;0812;18;Trib. to Sixteen Cr. Trib. Pine River, Manistee R.;R10W;S26; S27;JAM26-460;05;T21N;1926/05/18
• language: UND
• description: Flora and Fauna of the Great Lakes Region;
Granularity of Description: Excerpt of Metadata Record Describing "Cotton coverlet with embroidered butterfly design"
Digital Image of "Cotton Coverlet with Emboridered Butterfly Design"
Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.
Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.
Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.
Coverage: —
Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?
Type: Image
Granularity of Description: Excerpt of Metadata Record Describing “American Woven Coverlet”
Digital Image of "American Woven Coverlet"
Description: Materials: Textile--Multi, Pigment—Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973
Source: —
Format: 228 x 169 x 1.2 cm (1,629 g)
Coverage: Euro-American; America, North; United States; Indiana? Illinois?
Date: Early 19th c. CE
Type: cultural; physical object; original
Range of vocabularies in use
Data providers can:
• Create metadata for interoperability
• Make metadata reusable: think beyond your local users and environment
• Use well-structured, well-defined schemas; move beyond simple DC
• Use and identify controlled vocabularies
Service Providers can…
• Analyze metadata, then cluster and normalize some aspects of it
• Communicate with data providers about their metadata
• Build custom interfaces and selective views for target audiences / domains
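Normalization and clustering can start very simply. A crude sketch of grouping harvested records by subject heading after light normalization (trim, drop a trailing period, space out the `--` subdivision separator, casefold); the functions are hypothetical, the second record is made up, and real normalization would have to respect the controlled vocabulary in use (for example, a heading ending in "U.S." would lose its final period here).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalize_subject(value: str) -> str:
    """Crude normalization: trim, drop trailing periods, space out '--', casefold."""
    value = value.strip().rstrip(".")
    value = re.sub(r"\s*--\s*", " -- ", value)
    return value.casefold()


def cluster_by_subject(records: Iterable[Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """Group record identifiers under each normalized subject heading."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for identifier, subjects in records:
        for subject in subjects:
            clusters[normalize_subject(subject)].add(identifier)
    return dict(clusters)


harvested = [
    ("oai:docsouth.unc.edu:12", ["Sin.", "Confederate States of America -- Religion."]),
    ("oai:example.org:7", ["sin", "Confederate States of America--Religion"]),
]
for heading, ids in sorted(cluster_by_subject(harvested).items()):
    print(heading, sorted(ids))
```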
Resources
• OAI for beginners tutorial: http://www.oaforum.org/tutorial/
• OAI Frequently Asked Questions: http://www.openarchives.org/documents/FAQ.html
• IMLS Digital Collections and Content Project: http://imlsdcc.grainger.uiuc.edu/
Recap
• The OAI protocol is a tool
• OAI is easy; metadata is hard
• Better metadata = better interoperability
Contact Information
Sarah Shreeves
Project Coordinator, IMLS Digital Collections and Content
University of Illinois Library at Urbana-Champaign
Email: sshreeve@uiuc.edu
Phone: 217-244-7809
Website: http://imlsdcc.grainger.uiuc.edu/