
OAI Protocol for Metadata Harvesting

Tim Brody, Intelligence, Agents, Multimedia Group, University of Southampton. OpCit – http://opcit.eprints.org/, www.ecs.soton.ac.uk. BCS Metadata Meeting, London, 29th May 2002. (Many slides borrowed from Michael L. Nelson.)

Presentation Transcript


  1. OAI Protocol for Metadata Harvesting Tim Brody Intelligence, Agents, Multimedia Group University of Southampton OpCit – http://opcit.eprints.org/ www.ecs.soton.ac.uk BCS Metadata Meeting, London 29th May 2002 (Many slides borrowed from Michael L. Nelson)

  2. OAI 2.0
     • Public, stable version not released yet … (but very close)
     • Beta released mid-May
     • Public release scheduled: 1st June
     • 2.0 implementations in the pipeline:
       British Library, Cornell Univ, Ex Libris, my.OAI, Humboldt Univ, InQuirion Pty Ltd, Library of Congress, NASA, OCLC, Old Dominion Univ, U. of Illinois, U. of Southampton, UCLA, Johns Hopkins U., Indiana U., NYU, UKOLN, Virginia Tech

  3. Open Archives Initiative
     • The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!)
     • Archive defined as a “collection of stuff”, not the archivist’s definition of “archive”
     • “Repository” used in most OAI documents
     • OAI is happening at break-neck speed...

  4. Metadata Harvesting
     • Move away from distributed searching
     • Extract metadata from various sources
     • Build services on local copies of metadata
     • Resources remain at remote repositories
     [Figure: a user searches for “cfd applications” against a local copy of metadata, where all searching, browsing, etc. is performed; the metadata is harvested offline from each node; each node is independently maintained, and individual nodes can still support direct user interaction.]
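
The harvesting model on this slide can be illustrated with a minimal sketch: once metadata has been harvested offline, a search runs entirely against the local copy. The record contents, the `LOCAL_METADATA` name and the `search` helper are hypothetical and only serve the illustration; a real service would use a proper index.

```python
# Hypothetical local copy of harvested metadata, keyed by OAI identifier.
LOCAL_METADATA = {
    "oai:example:1": {"title": "CFD applications in aerodynamics"},
    "oai:example:2": {"title": "Metadata harvesting for digital libraries"},
}


def search(local_metadata, query):
    """Return identifiers whose title contains every query term."""
    terms = query.lower().split()
    return [
        identifier
        for identifier, record in local_metadata.items()
        if all(term in record["title"].lower() for term in terms)
    ]


# The search touches only the local copy; no remote repository is contacted.
hits = search(LOCAL_METADATA, "cfd applications")
```

The resources themselves stay at the remote repositories; only the identifiers returned here would be used to link back to them.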

  5. Metadata Harvesting
     • Repositories (archives etc.) = low implementation cost
     • Services = higher implementation cost
     • Similar to web search model
     • DP9 gateway makes it exactly the same

  6. (columns: Santa Fe convention | OAI-PMH v.1.0/1.1 | OAI-PMH v.2.0)
     nature: experimental | experimental | stable
     verbs: Dienst | OAI-PMH | OAI-PMH
     requests: HTTP GET/POST | HTTP GET/POST | HTTP GET/POST
     responses: XML | XML | XML
     transport: HTTP | HTTP | HTTP
     metadata: OAMS | unqualified Dublin Core | unqualified Dublin Core
     about: eprints | document like objects | resources
     model: metadata harvesting | metadata harvesting | metadata harvesting

  7. OAI-PMH v.2.0 [06/2002]
     • Goal: recurrent exchange of metadata about resources between systems
     • Input:
       • OAI-PMH v.1.0 [01/01 – 09/02]
       • feedback on OAI-implementers
       • deliberations by OAI-tech [09/01 -]
       • alpha test group of OAI-PMH v.2.0 [03/02 -]

  8. OAI-PMH v.2.0 [06/2002]
     • low-barrier interoperability specification
     • metadata harvesting model: data provider / service provider
     • metadata about resources
     • autonomous protocol
     • distinction between protocol and periphery
     • community-specific extensions
     • HTTP based
     • XML responses
     • unqualified Dublin Core
     • stable (1.0 characterized as experimental)

  9. OAI Data Model: Resources / Items / Records
     [Figure: resource → item (all available metadata about David) → records (Dublin Core metadata, MARC metadata, SPECTRUM metadata)]
     • item = identifier
     • record = identifier + metadata format + datestamp
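
As a rough illustration of the data model on this slide, the sketch below treats an item as an identifier that groups one record per metadata format, and a record as identifier + metadata format + datestamp. The class and field names are assumptions made for the example; they are not part of the protocol.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp."""
    identifier: str
    metadata_prefix: str  # e.g. "oai_dc" for unqualified Dublin Core
    datestamp: str
    metadata: str = ""    # the serialized metadata itself (omitted here)


@dataclass
class Item:
    """item = identifier; it groups all available metadata about a resource."""
    identifier: str
    records: dict = field(default_factory=dict)

    def add(self, metadata_prefix, datestamp, metadata=""):
        # One record per metadata format, all sharing the item's identifier.
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record


# Identifier taken from the sample response later in the slides;
# the "marc" prefix is a hypothetical second format.
item = Item("oai:arXiv:cs/0112017")
item.add("oai_dc", "2001-12-14")
item.add("marc", "2001-12-14")
```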

  10. Overview of OAI Verbs
      [Table labels: archival metadata; harvesting verbs]
      • most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)

  11. Identify
      • 1.1
        Arguments: none
        Errors: none
      • 2.0
        Arguments: none
        Errors: badArgument

  12. ListMetadataFormats
      • 1.1
        Arguments: identifier (OPTIONAL)
        Errors: id does not exist
      • 2.0
        Arguments: identifier (OPTIONAL)
        Errors: badArgument, noMetadataFormats, idDoesNotExist

  13. ListSets
      • 1.1
        Arguments: resumptionToken (EXCLUSIVE)
        Errors: no set hierarchy
      • 2.0
        Arguments: resumptionToken (EXCLUSIVE)
        Errors: badArgument, badResumptionToken, noSetHierarchy

  14. ListIdentifiers
      • 1.1
        Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE)
        Errors: no records match
      • 2.0
        Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
        Errors: badArgument, cannotDisseminateFormat, badResumptionToken, noSetHierarchy, noRecordsMatch

  15. ListRecords
      • 1.1
        Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
        Errors: no records match, metadata format cannot be disseminated
      • 2.0
        Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
        Errors: noRecordsMatch, cannotDisseminateFormat, badResumptionToken, noSetHierarchy, badArgument
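
The argument rules for ListRecords in 2.0 (metadataPrefix required, resumptionToken exclusive, the rest optional) can be sketched as a small URL builder. The function name and its Python parameter names are assumptions for illustration; the query parameter names are the ones listed on the slide, and the base URL is the one shown in the sample responses.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, from_=None,
                     until=None, set_spec=None, resumption_token=None):
    """Build a ListRecords request URL following the 2.0 argument rules."""
    if resumption_token is not None:
        # EXCLUSIVE: a resumptionToken may not be combined with other arguments.
        if any(v is not None for v in (metadata_prefix, from_, until, set_spec)):
            raise ValueError("resumptionToken is an exclusive argument")
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # REQUIRED: metadataPrefix must be present on the initial request.
        if metadata_prefix is None:
            raise ValueError("metadataPrefix is required")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        # OPTIONAL: from, until and set are added only when supplied.
        for name, value in (("from", from_), ("until", until), ("set", set_spec)):
            if value is not None:
                params[name] = value
    return base_url + "?" + urlencode(params)


url = list_records_url("http://arXiv.org/oai2", "oai_dc", from_="2002-01-01")
```

Rejecting bad combinations on the client side mirrors the badArgument error a 2.0 repository would return for the same request.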

  16. GetRecord
      • 1.1
        Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)
        Errors: id does not exist, metadata format cannot be disseminated
      • 2.0
        Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)
        Errors: badArgument, cannotDisseminateFormat, idDoesNotExist

  17. response, no errors
      <?xml version="1.0" encoding="UTF-8"?>
      <OAI-PMH>
        <responseDate>2002-02-08T08:55:46Z</responseDate>
        <request verb="GetRecord"… …>http://arXiv.org/oai2</request>
        <GetRecord>
          <record>
            <header>
              <identifier>oai:arXiv:cs/0112017</identifier>
              <datestamp>2001-12-14</datestamp>
              <setSpec>cs</setSpec>
              <setSpec>math</setSpec>
            </header>
            <metadata> ….. </metadata>
          </record>
        </GetRecord>
      </OAI-PMH>
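
A harvester might read the header of such a response roughly as follows. This is a sketch using Python's standard `xml.etree.ElementTree`; the `parse_header` name is an assumption, and the sample document is a trimmed copy of the slide's response with the OAI-PMH 2.0 namespace declaration added, since real 2.0 responses are namespaced even though the slide omits it.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response elements, in ElementTree's {uri} notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata/>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def parse_header(xml_bytes):
    """Extract identifier, datestamp and set memberships from a GetRecord response."""
    root = ET.fromstring(xml_bytes)
    header = root.find(f"{OAI_NS}GetRecord/{OAI_NS}record/{OAI_NS}header")
    if header is None:
        raise ValueError("response contains no GetRecord header")
    return {
        "identifier": header.findtext(f"{OAI_NS}identifier"),
        "datestamp": header.findtext(f"{OAI_NS}datestamp"),
        "sets": [s.text for s in header.findall(f"{OAI_NS}setSpec")],
    }


header = parse_header(SAMPLE)
```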

  18. response with error
      <?xml version="1.0" encoding="UTF-8"?>
      <OAI-PMH>
        <responseDate>2002-02-08T08:55:46Z</responseDate>
        <request>http://arXiv.org/oai2</request>
        <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
      </OAI-PMH>
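
Because 2.0 reports errors inside the XML response, a client can check for `error` elements before processing anything else. The sketch below assumes the OAI-PMH 2.0 namespace on the response (the slide omits it), and the `oai_errors` helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response elements, in ElementTree's {uri} notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

ERROR_SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>
"""


def oai_errors(xml_bytes):
    """Return a list of (code, message) pairs; empty when the response has no errors."""
    root = ET.fromstring(xml_bytes)
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall(f"{OAI_NS}error")
    ]


errors = oai_errors(ERROR_SAMPLE)
```

Returning a list rather than a single value leaves room for responses that carry more than one error element.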

  19. resumptionToken Flow-Control
      • Idempotency of resumptionToken: return same incomplete list when rT is re-issued
        • while no changes occur in the repo: strict
        • while changes occur in the repo: all items with unchanged datestamp
      • new attributes for the resumptionToken:
        • expirationDate
        • completeListSize
        • cursor
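
The flow control described here amounts to a loop: issue the initial request, then keep re-requesting with the returned resumptionToken until the repository sends an empty one. The sketch below is one possible harvester loop, not a reference implementation; `harvest_identifiers`, the injected `fetch` callable, `fake_fetch` and the example.org pages are all hypothetical, and the fake pages stand in for a real repository so the loop can be followed offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response elements, in ElementTree's {uri} notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(base_url, metadata_prefix, fetch):
    """Yield identifiers from ListIdentifiers, following resumptionTokens.

    `fetch` is any callable mapping a URL to the response body (bytes or str).
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(f"{OAI_NS}error")
        if error is not None:
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        listing = root.find(f"{OAI_NS}ListIdentifiers")
        for header in listing.findall(f"{OAI_NS}header"):
            yield header.findtext(f"{OAI_NS}identifier")
        token = listing.find(f"{OAI_NS}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            return
        # The token is exclusive: it replaces all other arguments.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}


def _page(identifier, token_xml):
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"<header><identifier>{identifier}</identifier></header>{token_xml}"
        "</ListIdentifiers></OAI-PMH>"
    )


def fake_fetch(url):
    """Stand-in for a repository: two pages linked by the token 't1'."""
    if "resumptionToken=t1" in url:
        return _page("oai:example:2", '<resumptionToken completeListSize="2" cursor="1"/>')
    return _page("oai:example:1", '<resumptionToken completeListSize="2" cursor="0">t1</resumptionToken>')


identifiers = list(harvest_identifiers("http://example.org/oai", "oai_dc", fake_fetch))
```

Against a live repository, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`; a production harvester would also want to honour expirationDate and retry on transient failures.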

  20. Adoption
      • evolution
        • from talking about OAI-PMH
        • to talking about projects that use OAI-PMH
        • to talking about projects and failing to mention they use OAI-PMH
      • => OAI-PMH becomes part of the infrastructure

  21. Data Providers (a.k.a. repositories)
      • 49 registered repositories [11/2001]
      • 65 registered repositories [03/2002]
      • 77 registered repositories [05/2002]
      • 5+ million records
      • many unregistered repositories
      • private implementations (e.g. RDN)

  22. Service Providers
      • Arc: cross-searching of registered repositories [http://arc.cs.odu.edu]
      • CiteBase: research literature search + citation ranking [http://citebase.eprints.org]
      • OLAC: cross-searching of Language Archive Community repositories [http://www.language-archives.org/index.html]

  23. Service Providers
      • Scirus: scientific search engine [Elsevier] [http://www.scirus.com]
      • my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.] [http://www.myoai.com]
      • Growing interest from web search engines

  24. OAI-PMH tools
      • Repository Explorer: interactive exploration of repositories [Virginia Tech] [http://www.purl.org/NET/oai_explorer]
      • eprints.org: generic OAI-PMH compliant repository software [U of Southampton] [http://www.eprints.org]
      • ALCME: repository and harvester software [OCLC] [http://alcme.oclc.org/index.html]
      • APIs, other tools @ www.openarchives.org

  25. http://www.openarchives.org/ openarchives@openarchives.org
