U.S. Government Use of the OAI-PMH • Michael L. Nelson, Old Dominion University, Norfolk, Virginia, USA • mln@cs.odu.edu • http://www.cs.odu.edu/~mln/ • Indo-US Workshop on Open Digital Libraries and Interoperability, Arlington, VA, June 23-25, 2003
Acknowledgements • ODU: K. Maly, M. Zubair, J. Bollen, X. Liu • LANL: R. Luce, X. Liu • NASA: G. Roncaglia, J. Rocker • MAGiC (UK): P. Needham
Outline • Review: • OAI-PMH • data provider / service provider model • including “aggregators” • Role of registration for repositories • NASA projects • OSTI demo project • Technical Report Interchange (TRI) • NASA, DOE, DOD
Disclaimer: Scientific and Technical Information (STI) • This talk will cover US Government focused / sponsored STI only • This talk will not cover American Memory • a cultural history project from the Library of Congress (LoC) • http://memory.loc.gov/ • the LoC played a significant role in the definition and early adoption of the OAI-PMH
Acronym Review • LANL = Los Alamos National Laboratory • Sandia = Sandia National Laboratories • LaRC = Langley Research Center • AFRL = Air Force Research Laboratory • NASA: CASI (Center for AeroSpace Information) http://www.sti.nasa.gov/ • Department of Energy: OSTI (Office of Scientific and Technical Information) http://www.osti.gov/ • Department of Defense: DTIC (Defense Technical Information Center) http://www.dtic.mil/
The Rise and Fall of Distributed Searching • wholesale distributed searching, popular at the time, is attractive in theory but troublesome in practice • Davis & Lagoze, JASIS 51(3), pp. 273-80 • Powell & French, Proc 5th ACM DL, pp. 264-265 • distributed searching of N nodes still viable, but only for small values of N • NCSTRL: N > 100; bad • NTRS/NIX: N<=20; ok (but could be better)
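One common way to see why distributed searching degrades as N grows is a back-of-the-envelope availability estimate. The sketch below is not taken from the cited papers; it assumes every node fails independently and uses an illustrative per-node availability of 0.99.

```python
def chance_all_respond(per_node_availability: float, nodes: int) -> float:
    """Probability that every node answers, assuming independent failures."""
    return per_node_availability ** nodes


# 0.99 is an illustrative figure, not a measurement from NCSTRL or NTRS.
for nodes in (20, 100):
    print(nodes, round(chance_all_respond(0.99, nodes), 3))
```

Under these assumptions a query across 20 nodes returns a complete answer far more often than one across 100 nodes, which is consistent with the "small values of N" observation on the slide.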
Resource – Item – Record • resource: the object itself (in this example, David) • item: all available metadata about David; item = identifier • records: Dublin Core metadata, MARC metadata, SPECTRUM metadata; record = identifier + metadata format + datestamp • set-membership is an item-level property
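The resource / item / record distinction can be sketched as a small data model. This is only an illustration under assumed names: the classes, the identifier and the set name below are hypothetical and are not part of the protocol or of any NASA implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Record:
    """One metadata rendering of an item: identifier + format + datestamp."""
    identifier: str
    metadata_prefix: str  # e.g. "oai_dc" or "marc"
    datestamp: str        # e.g. "2003-06-23"
    metadata: str         # serialized metadata (XML in a real repository)


@dataclass
class Item:
    """All available metadata about one resource, keyed by identifier."""
    identifier: str
    sets: Set[str] = field(default_factory=set)  # set membership lives on the item
    records: Dict[str, Record] = field(default_factory=dict)

    def add_record(self, metadata_prefix: str, datestamp: str, metadata: str) -> Record:
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record


# Hypothetical identifier and set name, for illustration only.
item = Item("oai:example.org:report-1", sets={"reports"})
item.add_record("oai_dc", "2003-06-23", "<dc>...</dc>")
item.add_record("marc", "2003-06-23", "<marc>...</marc>")
print(sorted(item.records))
```

Keeping `sets` on the item rather than on each record mirrors the slide's point that set membership is an item-level property.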
Overview of OAI-PMH Verbs • metadata about the repository: Identify, ListMetadataFormats, ListSets • harvesting verbs: ListIdentifiers, ListRecords, GetRecord • most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)
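OAI-PMH 2.0 defines six verbs, each sent as an HTTP request with key=value arguments. A minimal sketch of building such request URLs follows; the helper name `build_request` and the base URL are hypothetical, and argument validation per verb is deliberately left out.

```python
from urllib.parse import urlencode

REPOSITORY_VERBS = {"Identify", "ListMetadataFormats", "ListSets"}
HARVESTING_VERBS = {"ListIdentifiers", "ListRecords", "GetRecord"}


def build_request(base_url: str, verb: str, **arguments: str) -> str:
    """Return an OAI-PMH request URL for the given verb and arguments."""
    if verb not in REPOSITORY_VERBS | HARVESTING_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        # 'from' is a Python keyword, so callers pass from_ and it is renamed here.
        params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


# Hypothetical base URL, for illustration only.
print(build_request("http://example.org/oai", "Identify"))
print(build_request("http://example.org/oai", "ListRecords",
                    metadataPrefix="oai_dc", from_="2003-01-01"))
```

A follow-up request for the next chunk of a long list would pass only `resumptionToken`, which is how the protocol handles flow control.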
Data Providers / Service Providers • data providers (repositories) • service providers (harvesters)
Aggregators • aggregators allow for: • scalability for OAI-PMH • load balancing • community building • discovery • figure: an aggregator sits between data providers (repositories) and service providers (harvesters)
Aggregators • Frequently interchangeable terms: • aggregators: likely to be community / institutionally focused • caches: stores a copy, less likely to be community-oriented • proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols • Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03 • To learn more about aggregators, caches & proxies: • http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm • http://www.cs.odu.edu/~mln/jcdl03/
Example Aggregators • Arc - http://arc.cs.odu.edu/ • first described “hierarchical harvesting” in D-Lib Magazine, 7(4) 2001 • http://www.dlib.org/dlib/april01/liu/04liu.html • Celestial - http://celestial.eprints.org/ • among other services, it provides a history of harvests (successful vs. errors) • http://celestial.eprints.org/cgi-bin/status
OAI-PMH 2.0 Registration • 75 repositories registered • ??? unregistered repositories • unregistered because: • testing / development • not for public harvesting • public, but “low-profile” • never got around to it… • ??? • Data Providers: http://www.openarchives.org/Register/BrowseSites.pl • Service Providers: http://www.openarchives.org/service/listproviders.html • DP:SP ~= 5:1
Registration is Nice… But Not Required • OAI-PMH is (becoming) the “http” for digital libraries • there is no central registry of http servers • remember the NCSA “What’s New” page? (ca. 1994) • There will never be “registration support” in OAI-PMH • registries are a type of service provider, built on top of OAI-PMH • registration will be an integral part of community building • friends…
<friends> • A lightweight, optional, DP-centric method to communicate the existence of “others”
http://techreports.larc.nasa.gov/ltrs/oai2.0/?verb=Identify
..
<description>
  <friends ..namespace stuff..>
    <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
    <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
    <baseURL>http://horus.riacs.edu/perl/oai/</baseURL>
    <baseURL>http://ston.jsc.nasa.gov/collections/TRS/oai/</baseURL>
  </friends>
</description>
..
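A harvester can pull the friends' baseURLs out of an Identify response with a few lines of XML handling. The sketch below is an assumption-laden illustration: it matches elements by local name so that it does not depend on the exact namespace declarations (elided on the slide), and the sample document is a trimmed, namespace-free stand-in for a real response.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_friends(identify_xml: str) -> List[str]:
    """Return every baseURL listed inside a <friends> container."""
    root = ET.fromstring(identify_xml)
    friends = []
    for element in root.iter():
        if local_name(element.tag) == "friends":
            for child in element:
                if local_name(child.tag) == "baseURL" and child.text:
                    friends.append(child.text.strip())
    return friends


# Trimmed stand-in for an Identify response; real responses carry namespaces.
SAMPLE = """<Identify>
  <baseURL>http://techreports.larc.nasa.gov/ltrs/oai2.0/</baseURL>
  <description>
    <friends>
      <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
      <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
    </friends>
  </description>
</Identify>"""
print(extract_friends(SAMPLE))
```

The repository's own baseURL is not returned, because only children of the friends container are collected.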
NASA <friends> example • harvester sends Identify to http://techreports.larc.nasa.gov/ltrs/oai2.0/ • <friends>…</friends> in the response leads to: • http://naca.larc.nasa.gov/oai2.0/ • http://ston.jsc.nasa.gov/collections/TRS/oai/ • http://ntrs.nasa.gov/oai2.0/ • http://horus.riacs.edu/perl/oai/
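Following friends links from one repository to the next amounts to a graph traversal. The sketch below shows one way to do it, breadth-first, with the network call injected as a function so the example runs offline; `discover`, the toy graph and the example.org-style URLs are all hypothetical. Trailing slashes are normalized because the same repository may be listed with and without one.

```python
from collections import deque
from typing import Callable, List


def discover(start: str, get_friends: Callable[[str], List[str]]) -> List[str]:
    """Breadth-first walk over friends links, visiting each repository once."""
    def normalize(url: str) -> str:
        return url.rstrip("/")

    first = normalize(start)
    seen = {first}
    order = [first]
    queue = deque([first])
    while queue:
        current = queue.popleft()
        for friend in get_friends(current):
            key = normalize(friend)
            if key not in seen:
                seen.add(key)
                order.append(key)
                queue.append(key)
    return order


# Toy friends graph standing in for live Identify requests.
GRAPH = {
    "http://a.example/oai": ["http://b.example/oai/", "http://c.example/oai"],
    "http://b.example/oai": ["http://a.example/oai"],
    "http://c.example/oai": ["http://d.example/oai"],
}
print(discover("http://a.example/oai/", lambda url: GRAPH.get(url, [])))
```

In a real harvester, `get_friends` would issue an Identify request and parse the friends container from the response; the `seen` set keeps mutual friends from causing an endless loop.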
Use of <friends> Slide from S. Warner, Cornell University
Langley Technical Report Server • publicly available • began as an anonymous ftp server in 1992; http access in 1993 • model for other technical report servers at other NASA centers • details in NASA TM-109162 • mostly LaTeX, MS Word, other systems • some scanned reports http://techreports.larc.nasa.gov/ltrs/ http://techreports.larc.nasa.gov/ltrs/oai2.0/
NACA Technical Report Server • publicly available • began in 1996 • details in NASA TM-1999-209127 • scanned reports from 1917-1958 • NACA = predecessor to NASA • contents mirrored with the MAGiC project • a UK-based grey-literature preservation project • OAI-PMH used to mirror contents http://naca.larc.nasa.gov/ http://naca.larc.nasa.gov/oai2.0/
NACA Report 1345 as seen through its native DL http://naca.larc.nasa.gov/
NACA Report 1345 as seen through MAGiC http://www.magic.ac.uk/
NACA Report 1345 as seen through Scirus (Elsevier) http://www.scirus.com/
NACA Report 1345 as seen through OAIster http://oaister.umdl.umich.edu/
NACA Report 1345 as seen through my.OAI (FS Consulting) http://www.myoai.com/
NTRS OAI Architecture • user queries NTRS (e.g., search for “cfd applications”) • all searching, browsing, etc. performed on the NTRS local copy of metadata • metadata harvested offline, through OAI interface • nodes: LTRS, ATRS, GTRS, CASITRS, . . . • each node independently maintained • individual nodes can still support direct user interaction • content (reports) remains archived at the local sites
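The harvest-then-search-locally pattern can be sketched in a few lines. This is an illustration only, not the NTRS implementation: the page-fetching function is injected so the example runs offline, and the record fields, identifiers and titles are made up. The loop follows resumption tokens until the repository stops returning one.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[Dict[str, str]], Optional[str]]  # (records, resumption token)


def harvest(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, str]]:
    """Yield every record from one node, following resumption tokens."""
    token = None
    while True:
        records, token = fetch_page(token)
        yield from records
        if not token:
            break


def search(index: List[Dict[str, str]], query: str) -> List[str]:
    """Return identifiers whose title contains every query term."""
    terms = query.lower().split()
    return [r["identifier"] for r in index
            if all(term in r["title"].lower() for term in terms)]


# Made-up pages standing in for ListRecords responses from one node.
PAGES = {
    None: ([{"identifier": "oai:node.example:1", "title": "CFD applications in wing design"},
            {"identifier": "oai:node.example:2", "title": "Wind tunnel calibration"}], "t1"),
    "t1": ([{"identifier": "oai:node.example:3", "title": "Applications of CFD to turbomachinery"}], None),
}

local_index = list(harvest(lambda token: PAGES[token]))  # done offline, ahead of time
print(search(local_index, "cfd applications"))           # no remote node is contacted
```

Because the query touches only `local_index`, search latency and availability no longer depend on the remote nodes, while the reports themselves stay at the originating sites.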
NASA Technical Report Server • publicly available • replacement for the former distributed searching version of NTRS • MySQL • Va Tech harvester • modified “bucket” • details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (July 2003) • a service provider & aggregator • same OAI-PMH baseURL as used for interactive searching http://ntrs.nasa.gov/
NASA Technical Report Server • advanced, fielded search • explicit query routing • 12 NASA repositories • 4 non-NASA repositories • turned “off” by default
non-NASA repositories > 0.5M records
NASA DLs in the Larger STI Realm • NTRS sits above CASITRS, LTRS, ATRS, … • other DLs in the realm: DOE, DOD, Publishers, Universities, International, . . . • this could be a fully connected graph • NTRS could also be a data provider from the point of view of other DLs, allowing the harvesting of NASA report metadata • NTRS could also harvest metadata from other DLs, and provide access to non-NASA content • We hope to influence the direction of the science.gov effort to use OAI-PMH
OSTI Energy Citations Database • OAI-PMH support just recently added (Feb 2003) • not yet officially announced or registered • 20k records, 8k full-text • other OSTI collections planned http://www.osti.gov/energycitations/
Technical Report Interchange • Goal: share technical reports between 4 US government labs without creating new digital libraries for users to learn! • NASA Langley Research Center • Air Force Research Laboratory • Los Alamos National Laboratory (DOE) • Sandia National Laboratories (DOE) • Solution: use cooperating OAI-PMH caches at each site to • export local contents • ingest remote contents
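The export/ingest idea behind cooperating caches can be sketched as follows. This is a toy model under stated assumptions, not the TRI software: the `Cache` class, site names and identifiers are hypothetical, records are plain tuples, and a cache exports only its own local contents (one simple way to avoid records circulating between sites).

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (identifier, datestamp, metadata)


class Cache:
    """A site-level cache that exports local records and ingests remote ones."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.local: Dict[str, Record] = {}
        self.remote: Dict[str, Record] = {}

    def publish(self, identifier: str, datestamp: str, metadata: str) -> None:
        self.local[identifier] = (identifier, datestamp, metadata)

    def export(self, since: str = "") -> List[Record]:
        """Local records with a datestamp on or after 'since' (ISO dates sort as text)."""
        return [r for r in self.local.values() if r[1] >= since]

    def ingest(self, records: List[Record]) -> None:
        """Store remote records, keeping the newest copy of each identifier."""
        for record in records:
            current = self.remote.get(record[0])
            if current is None or record[1] > current[1]:
                self.remote[record[0]] = record


# Hypothetical sites and records, for illustration only.
site_a, site_b = Cache("site-a"), Cache("site-b")
site_a.publish("oai:site-a.example:1", "2003-05-01", "<dc>report A</dc>")
site_b.publish("oai:site-b.example:9", "2003-06-01", "<dc>report B</dc>")
site_a.ingest(site_b.export())
site_b.ingest(site_a.export(since="2003-01-01"))
print(sorted(site_a.remote), sorted(site_b.remote))
```

Users at each site keep using the digital library they already know; the remote reports simply show up in the local cache.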
TRI Production System - Status • LaRC TRI System • LANL TRI System • Sandia TRI System • AFRL TRI System • ODU TRI System (Listener) • figure legend: records coming in from other TRI systems; records going out to other TRI systems; In Production; Proposed • Slide from M. Zubair, ODU
Mappings in TRI • Details in Liu, et al. ECDL 2002; the table on this slide is taken from the same paper
A Single TRI Module Slide from M. Zubair, ODU
The Future: Community Building • Ultimately, protocols and metadata formats are not what make a difference • Rather, what matters is the critical mass afforded by a common set of utilities (cf. http, Dublin Core, XML) • The best current example: The Open Language Archives Community • http://www.language-archives.org/ • OAI-PMH provides the basis for communication between strangers, but allows even richer communication between friends
STI Communities • Government produced/sponsored STI • http://ntrs.nasa.gov/ • http://www.osti.gov/energycitations/ • http://dlib.cs.odu.edu/tri/ • Academia • self-archiving vs. institutional archives • http://www.soros.org/openaccess/ • http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm • Commercial publishers • e.g. BioMed Central • http://www.biomedcentral.com/