the OAI Protocol for Metadata Harvesting: an update • Herbert Van de Sompel • Los Alamos National Laboratory – Research Library
The Open Archives Initiative has been set up to create a forum to discuss and solve matters of interoperability between preprint solutions, as a way to promote their global acceptance. Paul Ginsparg, Rick Luce & Herbert Van de Sompel
2 core motivations • as a systems librarian: change the system • as a researcher: find (technical) ways to facilitate the change
as a systems librarian • optimizing the output • the input is far from optimal • [diagram; labels: PUB, DIS, LIB, A, R]
eprint systems • xxx e-print archive (Physics - 1991 - Los Alamos - Ginsparg) • RePEc (Economics - Surrey U - Krichel) • NCSTRL (Computer Science - Cornell U - Lagoze) • NDLTD (Theses - Virginia Tech - Fox) • CogPrints (Cognitive Sciences - Southampton U - Harnad)
as a researcher • eprints are an attractive building block in the ongoing transformation of scholarly communication • but: interoperability could increase the impact of e-prints: • amongst e-print solutions • with building blocks that implement other functions of scholarly communication • with the established communication system
UPS Prototype: eprints discovery • 1999: Van de Sompel, Krichel, Nelson • results: • insights regarding how un-interoperable the systems were • a cross-repository searching and linking service • recommendations to the Santa Fe meeting: • data provider / service provider model • metadata harvesting • simplicity
evolution towards OAI-PMH v.2.0 • Santa Fe Convention [02/2000] • OAI-PMH 1.0 [01/2001] • OAI-PMH 2.0 [06/2002]
Santa Fe convention | OAI-PMH v.1.0/1.1 | OAI-PMH v.2.0 • nature: experimental | experimental | stable • verbs: Dienst | OAI-PMH | OAI-PMH • requests: HTTP GET/POST | HTTP GET/POST | HTTP GET/POST • responses: XML | XML | XML • transport: HTTP | HTTP | HTTP • metadata: OAMS | unqualified Dublin Core | unqualified Dublin Core • about: eprints | document-like objects | resources • model: metadata harvesting | metadata harvesting | metadata harvesting
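The "responses: XML" row can be illustrated with a minimal sketch that reads an OAI-PMH v.2.0 reply to an Identify request. The sample reply is abbreviated and made up for illustration (the repository name and the `repository.example.org` URL are assumptions), and `summarize_identify` is a hypothetical helper, not part of any OAI tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH v.2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Abbreviated, made-up Identify reply (a real one carries more elements).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T12:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def summarize_identify(xml_text):
    """Return the echoed verb, repository name and protocol version."""
    root = ET.fromstring(xml_text)
    identify = root.find(f"{OAI_NS}Identify")
    if identify is None:
        raise ValueError("not an Identify response")
    return {
        "verb": root.find(f"{OAI_NS}request").get("verb"),
        "repository_name": identify.findtext(f"{OAI_NS}repositoryName"),
        "protocol_version": identify.findtext(f"{OAI_NS}protocolVersion"),
    }


summary = summarize_identify(SAMPLE_RESPONSE)
print(summary)
```

A harvester could use the reported protocol version to decide whether a repository has already migrated to v.2.0.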
OAI-PMH model • harvester (service provider) sends OAI-PMH requests • repository (data provider) returns replies
OAI-PMH model: harvester (service provider) and repository (data provider) • supporting protocol requests: • Identify • ListMetadataFormats • ListSets • harvesting protocol requests: • ListRecords • ListIdentifiers • GetRecord
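As a rough sketch of how a harvester might issue the six requests listed above over HTTP GET, the helper below builds request URLs from a base URL, a verb, and optional arguments. The function name `build_request_url`, the base URL, and the record identifier are assumptions made for illustration.

```python
from urllib.parse import urlencode

# The six OAI-PMH requests, grouped as on the slide.
SUPPORTING_VERBS = ("Identify", "ListMetadataFormats", "ListSets")
HARVESTING_VERBS = ("ListRecords", "ListIdentifiers", "GetRecord")


def build_request_url(base_url, verb, **arguments):
    """Build an HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in SUPPORTING_VERBS + HARVESTING_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments)
    # urlencode percent-escapes reserved characters such as ':' in identifiers.
    return f"{base_url}?{urlencode(query)}"


BASE_URL = "http://repository.example.org/oai"  # assumed, not a real repository

identify_url = build_request_url(BASE_URL, "Identify")
get_record_url = build_request_url(
    BASE_URL, "GetRecord", identifier="oai:example.org:1", metadataPrefix="oai_dc"
)
print(identify_url)
print(get_record_url)
```

Rejecting unknown verbs locally keeps malformed requests from ever reaching the repository.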
OAI-PMH model: harvester (service provider) and repository (data provider) • records • identifier • datestamp • set
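The identifier, datestamp and set attached to each record are what make selective harvesting possible. The sketch below shows, under simplifying assumptions, how a repository might pick the records that match a date range and a set. `RecordHeader`, `select` and the sample data are hypothetical; datestamps are assumed to be day-granularity ISO strings (so plain string comparison orders them correctly), and set specs are assumed to be colon-separated hierarchies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordHeader:
    identifier: str
    datestamp: str          # assumed format: YYYY-MM-DD
    set_specs: tuple = ()   # e.g. ("physics:hep",)


def in_set(header, set_spec):
    """True if the record is in set_spec or in one of its sub-sets."""
    return any(
        s == set_spec or s.startswith(set_spec + ":") for s in header.set_specs
    )


def select(headers, from_date=None, until_date=None, set_spec=None):
    """Keep headers inside the inclusive date range and the requested set."""
    selected = []
    for header in headers:
        if from_date is not None and header.datestamp < from_date:
            continue
        if until_date is not None and header.datestamp > until_date:
            continue
        if set_spec is not None and not in_set(header, set_spec):
            continue
        selected.append(header)
    return selected


# Made-up sample data.
HEADERS = [
    RecordHeader("oai:example.org:1", "2001-11-15", ("physics",)),
    RecordHeader("oai:example.org:2", "2002-03-02", ("physics:hep",)),
    RecordHeader("oai:example.org:3", "2002-06-20", ("economics",)),
]

recent_physics = select(HEADERS, from_date="2002-01-01", set_spec="physics")
print([h.identifier for h in recent_physics])
```

A harvester that remembers the date of its last visit can pass it as the lower bound and fetch only what changed since then.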
federated services • [diagram; repositories: A&I, image, FTXT, OPAC, e-print]
metadata harvesting via OAI-PMH • [diagram; a harvester collects metadata from the A&I, image, OPAC, e-print and FTXT repositories]
metadata harvesting via OAI-PMH • harvested metadata: Author, Title, Abstract, Identifier • [diagram; repositories: A&I, image, FTXT, e-print, OPAC]
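To make the harvested fields concrete, here is a minimal sketch that pulls author, title, abstract and identifier out of one record expressed in unqualified Dublin Core (the `oai_dc` format of OAI-PMH v.2.0). The sample record is invented for illustration, and the mapping of author to `dc:creator` and abstract to `dc:description` is an assumption about how a repository would typically fill these elements.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up record in unqualified Dublin Core.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:42</identifier>
    <datestamp>2002-05-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example e-print</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:description>A made-up abstract.</dc:description>
      <dc:identifier>http://repository.example.org/eprints/42</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""


def extract_fields(record_xml):
    """Return the fields a discovery service would typically index."""
    record = ET.fromstring(record_xml)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    if dc is None:
        raise ValueError("record carries no oai_dc metadata")

    def values(tag):
        # Dublin Core elements are repeatable, so always return a list.
        return [element.text for element in dc.findall(f"dc:{tag}", NS)]

    return {
        "oai_identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "authors": values("creator"),
        "titles": values("title"),
        "abstracts": values("description"),
        "links": values("identifier"),
    }


fields = extract_fields(SAMPLE_RECORD)
print(fields)
```

The `dc:identifier` value is what lets a service provider link from a search result back to the item in its home repository.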
issue solved? • no, just a tiny part of the technical challenges to support discovery • many more technical issues • even more non-technical issues
issue solved? • [diagram; labels: A, R, interoperable grid, technical, registration, certification, awareness, archiving, rewarding]
issue solved? non-technical • I am happy to leave those to you • but: even for non-technological issues, part of the answer might be found in applying technology
indicators of adoption of OAI-PMH • data providers • service providers • tools • structural support
data providers • 49 registered repositories [11/2001] • 65 registered repositories [03/2002] • 5+ million records • many unregistered repositories
service providers • Arc : cross-searching of registered repositories [Old Dominion U] • [ http://arc.cs.odu.edu ] • OLAC: cross-searching of Language Archive Community repositories • [ http://www.language-archives.org/index.html ]
service providers • Scirus scientific search engine [Elsevier] • [ http://www.scirus.com ] • my.OAI : user-tailorable cross-searching of registered repositories [FS Consulting, Inc.] • [http://www.myoai.com] • growing interest from web search engines
OAI-PMH tools • Repository Explorer: interactive exploration of repositories [Virginia Tech] • [ http://www.purl.org/NET/oai_explorer ] • eprints.org: generic OAI-PMH compliant repository software [U of Southampton] • [ http://www.eprints.org ] • ALCME repository and harvester software [OCLC] • [ http://alcme.oclc.org/index.html ]
OAI-PMH flies: structural support • Metadata Harvesting Initiative of the Mellon Foundation • NSDL (NSF funded) • UK FAIR call for proposals to support disclosure of institutional assets (papers, learning materials, etc.) • Institute of Museum and Library Services • several EC projects exploring/supporting usage of OAI-PMH: TEL, Leaf, Cyclades, OA Forum
OAI-PMH flies: and also … • Australian Museums Online & CIMI : OAI conference • NIMH white paper on data archiving for Animal Cognition Research • Library of Congress • National Library of Canada • OCLC thesis database • Illinois State Library Catalogue
future • OAI • OAI-PMH • communities • adoption
the OAI-PMH • release of OAI-PMH v.2.0 [06/2002] • no backwards compatibility with v.1.0/1.1 • stable • migration process for registered repos • ? formal standardization ? • ? SOAP version ~ web services framework [SOAP, WSDL, UDDI] ?
communities • proliferation of community-specific add-ons for: • collection & set level metadata • expressive metadata formats (e.g. qualified DC XML Schema) • shared set-structures • machine readable rights (about the metadata)
adoption • evolution • from talking about OAI-PMH • to talking about projects that use OAI-PMH • to talking about projects and failing to mention they use OAI-PMH • => OAI-PMH becomes part of the infrastructure
I just wanted to report what I consider an OAI success. I discovered that RLG had harvested records for two of the American Memory collections I had made available and integrated them into their Cultural Materials Initiative service without the need for a single e-mail or phone call. They reported that it was working very well for them. [Caroline Arms, Library of Congress]
http://www.openarchives.org openarchives@openarchives.org
the OAI: not really an organization • Executive: Carl Lagoze & Herbert Van de Sompel • 2000 – 2002 funding from CNI and DLF • Steering Committee • Technical Committee: • protocol revision & stabilization • Alpha testers
OAI-tech • US representatives: Thomas Krichel (Long Island U) - Jeff Young (OCLC) - Tim Cole (U of Illinois at Urbana-Champaign) - Hussein Suleman (Virginia Tech) - Simeon Warner (Cornell U) - Michael Nelson (NASA) - Caroline Arms (LoC) - Muhammad Zubair (Old Dominion U) - Steven Bird (U Penn.) • European representatives: Andy Powell (Bath U. & UKOLN) - Mogens Sandfaer (DTV) - Thomas Baron (CERN) - Les Carr (U of Southampton)
OAI-PMH 2.0 alpha testers (1/2) • The British Library • Cornell U. -- NSDL project & e-print arXiv • Ex Libris • FS Consulting Inc -- harvester for my.OAI • Humboldt-Universität zu Berlin • InQuirion Pty Ltd, RMIT University • Library of Congress • NASA • OCLC
OAI-PMH 2.0 alpha testers (2/2) • Old Dominion U. -- Arc, DP9 • U. of Illinois at Urbana-Champaign • U. of Southampton -- OAIA, CiteBase, eprints.org • UCLA, Johns Hopkins U., Indiana U., NYU -- sheet music collection • UKOLN, U. of Bath -- RDN • Virginia Tech -- repository explorer