510 likes | 524 Views
Interoperability in Digital Libraries Open Archives Initiative and the NSDL. CS 502 – 20020326 Carl Lagoze – Cornell University. Acknowledgements: Bill Arms Herbert Van de Sompel. Beyond the walls.
E N D
Interoperability in Digital LibrariesOpen Archives Initiative and the NSDL CS 502 – 20020326 Carl Lagoze – Cornell University Acknowledgements: Bill Arms Herbert Van de Sompel 20020307
Beyond the walls The Library should selectively adopt the portal model for targeted program areas. By creating links from the Library’s Web site, this approach would make available the ever-increasing body of research materials distributed across the Internet. The Library would be responsible for carefully selecting and arranging for access to licensed commercial resources for its users, but it would not house local copies of materials or assume responsibility for long-term preservation. LC21: Digital Strategy for the Library of Congresspage 5 20020307
A portal should mean more than access….. • Traditional portal (e.g., Yahoo!) • linkage with limited responsibility • Hybrid Portal • Asserting some semblance of curatorial role over linked resources • Providing a rich fabric of services across those resources 20020307
Interoperability standards enable service creation…. • Search and discovery • Z39.50 • Metadata vocabularies and syntax • MARC • Dublin Core • XML/RDF • Object models • METS • FEDORA 20020307
Z39.50SGML MetadataHarvesting Cost DublinCore HTTPGoogle Functionality Interoperability Trade-offs 20020307
Yes, its about resource discovery over distributed collections metadata Author Title Abstract Identifer 20020307
Facilitating/Monitoring Longevity of Distributed Content PreservationService 20020307
View A: • View Slides • View Video • View synchronized presentation using applet • View B: • Get Transcript of Audio • Search for keyword • Get Slides translated to French Portal A Portal B Tool Repository structural metadata DigitalObject Powerpoint presentation SMIL synchronization metadata Realaudio video Personalization of Content 20020307
citation metadata citation metadata citation metadata citation metadata citation metadata Cross-Repository Reference Linking Linkage Service 20020307
Origins of the OAI • Increasing interest in alternative scholarly publishing solutions – e.g., LANL arXiv • Increasing impact through federation • UPS Mtg., Sante Fe, October 1999 • Representatives of various ePrint, library, publishing, communities • Goal: definition of an interoperability framework among ePrint providers • Result: Santa Fe Convention, interoperability through metadata harvesting 20020307
“Open” Archives • Political Agenda? • Author self-archiving of E-Prints • “Mission” to reformulate scholarly publishing framework • Technical? • Infrastructure to facilitate interoperability across multiple domains 20020307
Technical Umbrella for Practical Interoperability… Metadata Harvesting Reference Libraries Museums Publishers E-PrintArchives …that can be exploited by different communities 20020307
OAI Technical Infrastructure Key technical features • Deploy now technology – 80/20 rule • Two-party model – providers (data providers) and consumers (service providers) • Simple HTTP encoding • XML schema for some degree of protocol conformance • Extensibility • Multiple item-level metadata • Collection level metadata 20020307
Metadata harvesting The World According to OAI Service Providers Discovery Current Awareness Preservation Data Providers 20020307
Content and Metadata Item (metadata) repository resource record 010010 20020307
OAI-PMH History • Version 1.0 – January 21, 2001 • Version 1.1 – July 2, 2001 • W3C XML schema changes • Version 2.0a – March 1, 2002 • Production release – June 3, 2002 • No major functionality changes • Numerous functional tweaks • Harvesting granularity, flow control, error handling 20020307
definitions & concepts repository record identifier datestamp set protocol features HTTP encoding metadata prefix & schema flow control protocol requests supporting requests harvesting requests Key Features of the OAI Metadata Harvesting Protocol 20020307
supportdata repos i tory harves ter oai protocol items harvesting data repository 20020307
protocol support format-specificmetadata community-specificrecord data record <record> <header> <identifier>oai:eg:001</identifier> <datestamp>1999-01-01</datestamp> </header> <metadata> <dc xmlns=“http://purl.org/dc”> <title>My Example</title> </dc> </metadata> <about> <ea xmlns=“http://www.arXiv.org/ea” <usage>No restrictions</usage> </ea> </about></record> 20020307
Registered URI Scheme Unique ID within archive: (syntax is archive-specific) Archive Idendifier: Registered within OAI identifiers locally unique key for extracting a record from a repository oai-identifier = oai:archive-identifier:record-identifier example = oai:ncstrl:ncstrl.cornellcs/TR94-1418 20020307
harvest withindate range repos i tory record record selective harvesting - datestamps 20020307
S1 harvest within set repos i tory record record record selective harvesting - sets S2 20020307
set specifics • repositories define hierarchical organization • each item in a repository may be organized in one set, several sets, or no sets at all • meaning of sets or of set hierarchy is not defined in protocol • individual communities may formulate common set configurations 20020307
HTTP encoding - requests BASE-URL -----------> an.oa.org/OAI-scriptkeyword arguments --> verb=ListIdentifers&set=S1 GET http://an.oa.org/OAI-script?verb=ListIdentifers&set=S1 POST POST http://an.oa.org/OAI-script HTTP/1.0 Content-Length: 78 Content-Type: application/x-www-form-urlencoded verb=ListIdentifers&set=S1 20020307
xml namespaces responseheader responsedata HTTP encoding - responses <xml version=1.0 encoding=“UTF-9” ?><GetRecord xmlns=“http://oai.namespace.uri” xmlns:xsi=“http://w3.namespace.uri” xsi:schemaLocation=“http://oai.namespace.uri http://oai.schemaURL”> <responseDate>2000-19-01T19:30:30-04:00</responseDate> <requestURL>http://an.oa.org/OAI-script?verb=GetRecord &identifier=oai%3AarXiv%3A0001 &metadataPrefix=oai_dc</requestURL> <record>record contents </recordadditional records</GetRecord> 20020307
metadata prefix and schema • support for harvesting multiple metadata formats • metadata schema: each format must have a validating XML schema at a publicly accessible URL (communities may define shared formats and schema. • metadata prefix: each repository maps a prefix to the schema it supports, which is used in protocol requests. • support for unqualified Dublin Core mandatory • DC OAI record syntax that builds on base DCMI schema • reserved prefix oai_dc. 20020307
protocol request harves ter repos i tory flow control 20020307
flow control specifics • applies to all protocol requests that return lists: ListRecords, ListIdentifiers, ListSets • resumptionToken is opaque • semantics of partitioning of responses within resumption requests is undefined 20020307
Extensibility Feature Summary • Multiple metadata formats • Collection level metadata • Identify “about” container • Record data • Terms and conditions • Provenance • Set structure • Pre-configured “queries” 20020307
repos i tory harves ter OAI Protocol service provider data provider • Supporting protocol requests: • Identify • ListMetadataFormats • ListSets • Harvesting protocol requests: • ListRecords • ListIdentifiers • GetRecord 20020307
repos i tory harves ter Supporting Protocol Requests service provider data provider Identify • Repository name • Base-URL • Admin e-mail • OAI protocol version • Description Container 20020307
repos i tory harves ter Supporting Protocol Requests service provider data provider ListMetadataFormats • REPEAT • Format prefix • Format XML schema • /REPEAT 20020307
repos i tory harves ter Supporting Protocol Requests service provider data provider ListSets • REPEAT • Set Specification • Set Name • /REPEAT 20020307
repos i tory harves ter Harvesting Protocol Requests service provider data provider * from=a * until=b * set=klm ListRecords * metadataPrefix=oai_dc • REPEAT • Identifier • Datestamp • Metadata • About Container • /REPEAT 20020307
repos i tory harves ter Harvesting Protocol Requests service provider data provider * from=a * until=b ListIdentifiers * set=klm • REPEAT • Identifier • Datestamp • /REPEAT 20020307
repos i tory harves ter Harvesting Protocol Requests service provider data provider * identifier=oai:mlib:123a GetRecord * metadataPrefix=oai_dc • Identifier • Datestamp • Metadata • About 20020307
Measures of Success • >100 implementers of the protocol • 64 registered • Basis for much research and implementation • JCDL 2002 • A subject category for paper submission! • Numerous papers building on OAI • Research Projects and Funding 20020307
Externally funded initiatives • European Community • Open Archives Forum • Cyclades Project • Andrew W. Mellon Foundation • Funding for 7 service providers • Digital Library Federation • Gateways for access to member's digital collecitons • National Science Foundation • National Science Foundation Core Infrastructure 20020307
DP9 Architecture • Giving search engines access to the “deep web” 20020307
NSDL (National Digital Library for Science, Mathematics, and Engineering ) • Large-scale digital library technology • 1,000,000 users • 10,000,000 items • 100,000 collections • Diverse participants • Libraries • Academic/research institutions • Individuals 20020307
NSDL References • http://comm.nsdlib.org/ • Zia, L.,Growing a National Learning Environments and Resources Network for Science, Mathematics, Engineering, and Technology Education, D-Lib, March 2001 • Arms, W. et. al., A Spectrum of Interoperability: The Site for Science Prototype for the NSDL, D-Lib, January 2002 • Lagoze, C. et. Al., Core Services in the Architecture of the NSDL, JCDL 2002, July 2002. 20020307
The Challenge Provide coherent services for users across diverse collections, while retaining the individuality and richness of the collections. 20020307
The strategy • A Spectrum of Interoperability • Open framework for collections & services • Embrace collections with rich metadata & support for standards, ... accommodate collections with limited metadata & limited support for interoperability. • Technical basis • Follow library tradition of metadata sharing • Use automated methods to generate, normalize, & translate metadata • Distribute metadata to service providers 20020307
The Metadata Repository Services Users Metadata repository The metadata repository is a resource for service providers. It holds information about every collection and item known to the NSDL. Collections 20020307
Metadata repository MR Ingest and Exposure OAI-PMH OAI-PMH Normalization Generation Cross-walking MR Front Porch OAI-PMH gathering Directentry 20020307