0 a brief history of the OAI
Source: Herbert van de Sompel
Open Archives Initiative – Protocol for Metadata Harvesting
Yaşar Tonta, HÜ BBY
DOK 422: Information Networks
the OAI roots
The Open Archives Initiative has been set up to create a forum to discuss and solve matters of interoperability between preprint solutions, as a way to promote their global acceptance.
Paul Ginsparg, Rick Luce & Herbert Van de Sompel => Santa Fe Convention: preprint metadata harvesting
Source: Herbert van de Sompel
interest from other communities
• Digital Library Federation meetings:
  ~ research library community has many materials for which they would like to ‘expose’ metadata
• OAI San Antonio meeting:
  ~ interest from librarians, publishers, others, ...
Source: Herbert van de Sompel
resulting actions: organizational
• establish organizational stability for the OAI:
  • institutional backing from CNI & DLF
  • steering committee: policy guidance
  • technical committee: technical specifications
  • executive group: day-to-day coordination
  • workshops: public dissemination, feedback
Source: Herbert van de Sompel
resulting actions: technical
• [09/2000] revise specifications to allow adoption beyond preprints: technical committee
• [09/2000-01/2001] compile new specifications: editing by Carl and Herbert
• [11/2000-01/2001] alpha-test specifications: oai-alpha group
• [01/2001] discontinue the Santa Fe Convention
• [01/2001] release version 1.0 of the OAI protocol
Source: Herbert van de Sompel
1 the OAI Metadata Harvesting protocol
Source: Herbert van de Sompel
The OAMH protocol is a low-barrier interoperability specification for the recurrent exchange of metadata between systems.
Source: Herbert van de Sompel
the OAMH protocol
[Figure: the harvester (service provider) sends requests to the repository (data provider), which returns replies]
Source: Herbert van de Sompel
federated services
[Figure: A&I, image, FTXT, OPAC and e-print resources]
Source: Herbert van de Sompel
metadata harvesting via OAMH
[Figure: a harvester collects metadata from A&I, image, OPAC, e-print and FTXT resources]
Source: Herbert van de Sompel
federated services via OAMH metadata
[Figure: A&I, image, FTXT, e-print and OPAC resources; metadata fields shown: Author, Title, Abstract, Identifier]
Source: Herbert van de Sompel
core concepts in OAMH
• low-barrier interoperability
• data-provider & service-provider model
• metadata harvesting model
• OAMH protocol: HTTP based
• Reply: XML Schema, self-contained
• shared metadata format (Dublin Core) and parallel, community-specific metadata formats
Source: Herbert van de Sompel
OAI harvesting tools
[Figure: harvester (service provider) and repository (data provider); labels: Datestamp, Identifier, Set, Records]
Source: Herbert van de Sompel
OAI harvesting tools
[Figure: harvester (service provider) and repository (data provider)]
• Supporting protocol requests:
  • Identify
  • ListMetadataFormats
  • ListSets
• Harvesting protocol requests:
  • ListRecords
  • ListIdentifiers
  • GetRecord
Source: Herbert van de Sompel
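The six requests listed above can be expressed as URLs that carry the request name in a `verb` parameter. The following is a minimal sketch, not an official client: the base URL `http://repository.example.org/oai`, the identifier `oai:example.org:1`, and the helper name `build_request_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# The six protocol requests named on the slide, grouped the same way.
SUPPORTING_VERBS = {"Identify", "ListMetadataFormats", "ListSets"}
HARVESTING_VERBS = {"ListRecords", "ListIdentifiers", "GetRecord"}


def build_request_url(base_url, verb, **arguments):
    """Return the GET URL for one protocol request (hypothetical helper)."""
    if verb not in SUPPORTING_VERBS | HARVESTING_VERBS:
        raise ValueError(f"unknown verb: {verb}")
    query = {"verb": verb}
    # Leave out arguments that were not supplied.
    query.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


# Hypothetical repository address, used for illustration only.
BASE_URL = "http://repository.example.org/oai"

print(build_request_url(BASE_URL, "Identify"))
print(build_request_url(BASE_URL, "GetRecord",
                        identifier="oai:example.org:1",
                        metadataPrefix="oai_dc"))
```

Rejecting unknown verbs on the harvester side is a design choice of this sketch; a repository would also answer such a request with an error of its own.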
supporting protocol requests
[Figure: the harvester (service provider) sends ListMetadataFormats to the repository (data provider)]
Reply:
• ListMetadataFormats / Time / Request
• REPEAT
  • Format prefix
  • Format XML schema
• /REPEAT
Source: Herbert van de Sompel
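The repeated "format prefix / format XML schema" pairs in a ListMetadataFormats reply can be read with a few lines of XML parsing. This sketch assumes the version 2.0 response layout (`metadataFormat` elements holding `metadataPrefix` and `schema`); the sample reply is hand-written for illustration, not captured from a real repository, and `parse_metadata_formats` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample reply (assumption: version 2.0 layout).
SAMPLE_REPLY = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-12-01T12:00:00Z</responseDate>
  <request verb="ListMetadataFormats">http://repository.example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>marc21</metadataPrefix>
      <schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>
      <metadataNamespace>http://www.loc.gov/MARC21/slim</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
""".strip()


def parse_metadata_formats(xml_text):
    """Return a list of (prefix, schema URL) pairs from a reply."""
    root = ET.fromstring(xml_text)
    formats = []
    for fmt in root.findall("oai:ListMetadataFormats/oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", namespaces=OAI_NS)
        schema = fmt.findtext("oai:schema", namespaces=OAI_NS)
        formats.append((prefix, schema))
    return formats


for prefix, schema in parse_metadata_formats(SAMPLE_REPLY):
    print(prefix, schema)
```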
harvesting requests
[Figure: the harvester (service provider) sends ListRecords to the repository (data provider)]
Request: ListRecords
* from=a
* until=b
* set=klm
* metadataPrefix=dc
Reply:
• ListRecords / Time / Request
• REPEAT
  • Identifier
  • Datestamp
  • Metadata
• /REPEAT
Source: Herbert van de Sompel
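A selective ListRecords request and the identifier/datestamp/metadata triples in its reply might look as follows. This is a sketch under stated assumptions: the base URL, the dates, the record identifiers, and the helper names are invented for illustration, the reply uses the version 2.0 layout, and the prefix `oai_dc` is used in place of the slide's `dc`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# "from" is a Python keyword, so the arguments are kept in a dict.
arguments = {
    "verb": "ListRecords",
    "from": "2004-01-01",
    "until": "2004-12-31",
    "set": "klm",
    "metadataPrefix": "oai_dc",
}
# Hypothetical repository address.
request_url = "http://repository.example.org/oai?" + urlencode(arguments)

# Hand-written sample reply, not taken from a real repository.
SAMPLE_REPLY = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-01-10T09:30:00Z</responseDate>
  <request verb="ListRecords">http://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2004-01-15</datestamp>
        <setSpec>klm</setSpec>
      </header>
      <metadata><placeholder xmlns="">first record metadata</placeholder></metadata>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:2</identifier>
        <datestamp>2004-06-30</datestamp>
        <setSpec>klm</setSpec>
      </header>
      <metadata><placeholder xmlns="">second record metadata</placeholder></metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def parse_records(xml_text):
    """Return one dict per record: identifier, datestamp, metadata element."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", OAI_NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=OAI_NS),
            "datestamp": record.findtext("oai:header/oai:datestamp", namespaces=OAI_NS),
            "metadata": record.find("oai:metadata", OAI_NS),
        })
    return records


print(request_url)
for rec in parse_records(SAMPLE_REPLY):
    print(rec["identifier"], rec["datestamp"])
```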
Applications of the OAMH protocol?
• federated services [S&R, SDI, alerting, linking, ...]
• database synchronization
• harvesting the deep Web
• ...
Source: Herbert van de Sompel
OAI background
• background in the e-prints (pre-prints) community
• need to provide ‘search’ services across multiple e-prints archives
• distributed cross-searching felt not to be appropriate
• adopted approach based on metadata harvesting
• OAI has been linked to political agenda that wants to change the academic publishing model, but...
• ...core activity is the OAI-MHP - the OAI Metadata Harvesting Protocol
[Figure: static repository 1 at http://an.oai.org/ma/mini.xml ... static repository n at http://site1.org/mini/file1]
Source: Lagoze, http://eprints.rclis.org/archive/00000789/
[Figure: a static repository gateway at http://gateway.institution.org/oai/ exposes static repository 1 (http://an.oai.org/ma/mini.xml) as http://gateway.institution.org/oai/an.oai.org/ma/mini.xml and static repository n (http://site1.org/mini/file1) as http://gateway.institution.org/oai/site1.org/mini/file1]
Source: Lagoze, http://eprints.rclis.org/archive/00000789/
[Figure: an OAI-PMH harvester talks OAI-PMH to the static repository gateway at http://gateway.institution.org/oai/; the gateway fetches static repository 1 (http://an.oai.org/ma/mini.xml) and static repository n (http://site1.org/mini/file1) over HTTP and exposes them as http://gateway.institution.org/oai/an.oai.org/ma/mini.xml and http://gateway.institution.org/oai/site1.org/mini/file1]
Source: Lagoze, http://eprints.rclis.org/archive/00000789/
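The URLs in the figure suggest how the gateway address for a static repository is formed: the gateway URL followed by the static repository URL without its `http://` prefix. A minimal sketch of that mapping, assuming it holds as the figure shows; `gateway_base_url` is a hypothetical helper name, not part of any gateway software.

```python
def gateway_base_url(gateway_url, static_repository_url):
    """Map a static repository URL to the address the gateway exposes it at."""
    scheme_prefix = "http://"
    if not static_repository_url.startswith(scheme_prefix):
        raise ValueError("expected an http:// static repository URL")
    # Drop the scheme and append the remainder to the gateway URL.
    remainder = static_repository_url[len(scheme_prefix):]
    return gateway_url.rstrip("/") + "/" + remainder


# URLs taken from the figure.
GATEWAY = "http://gateway.institution.org/oai/"
print(gateway_base_url(GATEWAY, "http://an.oai.org/ma/mini.xml"))
print(gateway_base_url(GATEWAY, "http://site1.org/mini/file1"))
```

A harvester would then send ordinary OAI-PMH requests to the resulting address, while the gateway reads the static file over plain HTTP.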
[Figure: The OAI-PMH data model]
Source: http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html
[Figure: Content transfer between archives using the OAI-PMH]
Source: http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html
What’s in a name?
• ‘open’ means that specs are freely available
• may be some formal standards activity in the future
• currently at version 2.0
• ‘archive’ as in e-print archive - i.e. repository of documents
• NOT ‘archive’ as used by the library and archival communities
OAI-MHP
• generic protocol for sharing metadata between services
• NOT a distributed search protocol
[Figure: service providers send requests to repositories and receive responses]
• Repositories: databases of stuff - metadata and/or full-text. May be partitioned into ‘sets’.
OAI-MHP
• requests sent as HTTP GET
• responses returned as XML over HTTP
• based on HTTP, XML, XML schemas, XML namespaces
• 6 requests: Identify, ListIdentifiers, ListRecords, GetRecord, ListMetadataFormats, ListSets
• large responses may be split using simple ‘resumption token’ mechanism
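The ‘resumption token’ mechanism can be sketched as a loop that keeps requesting the next part of a list until the repository stops returning a token. The sketch below assumes the version 2.0 response layout and replaces the network with an in-memory `fake_fetch` function; the page contents, the tokens, and all helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def make_page(identifiers, token):
    """Build a hand-written ListIdentifiers reply (illustration only)."""
    headers = "".join(
        f"<header><identifier>{i}</identifier>"
        f"<datestamp>2004-01-01</datestamp></header>"
        for i in identifiers
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        f"<ListIdentifiers>{headers}"
        f"<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


# Two pages: the first carries a token, the last an empty token element.
FAKE_PAGES = {
    None: make_page(["oai:example.org:1", "oai:example.org:2"], "token-1"),
    "token-1": make_page(["oai:example.org:3"], ""),
}


def fake_fetch(params):
    """Stand-in for an HTTP GET; selects a page by resumption token."""
    return FAKE_PAGES[params.get("resumptionToken")]


def harvest_identifiers(fetch, metadata_prefix):
    """Collect identifiers across all parts of a split response."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(params))
        for element in root.findall(
                "oai:ListIdentifiers/oai:header/oai:identifier", OAI_NS):
            identifiers.append(element.text)
        token = (root.findtext("oai:ListIdentifiers/oai:resumptionToken",
                               default="", namespaces=OAI_NS) or "").strip()
        if not token:
            return identifiers
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


print(harvest_identifiers(fake_fetch, "oai_dc"))
```

Passing the fetch function in as a parameter keeps the loop testable without a live repository; a real harvester would substitute a function that performs the HTTP GET.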
Harvesting metadata
• service provider can ask repository for
  • all records
  • records in particular set
  • records modified in particular date span
• metadata records returned using XML
• support for arbitrary XML schemas
• repositories MUST support ‘simple DC’ XML record format
• some existing support for other schemas including an XML encoding for MARC
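As an illustration of the ‘simple DC’ record format, the sketch below reads the Dublin Core elements out of one metadata block. It assumes the version 2.0 `oai_dc` container and the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`; the sample record is hand-written, and `dublin_core_fields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"

# Hand-written sample metadata block, not a real record.
SAMPLE_METADATA = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A sample e-print</dc:title>
  <dc:creator>First Author</dc:creator>
  <dc:creator>Second Author</dc:creator>
  <dc:identifier>http://repository.example.org/eprint/1</dc:identifier>
</oai_dc:dc>
""".strip()


def dublin_core_fields(xml_text):
    """Return {element name: [values]} for every Dublin Core element."""
    root = ET.fromstring(xml_text)
    fields = {}
    prefix = "{" + DC_NAMESPACE + "}"
    for element in root.iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            # Elements may repeat, so values are collected in lists.
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


for name, values in dublin_core_fields(SAMPLE_METADATA).items():
    print(name, values)
```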