Implementation of Digital Libraries: Core Technologies and Issues

Learn about the implementation of digital libraries, including core technologies like OAI-PMH, deep web, OpenURLs, and object models. Explore example implementations and the mechanics of OAI-PMH in this informative presentation.


Presentation Transcript


  1. Implementation of Digital Libraries • Michael L. Nelson • Old Dominion University • mln@cs.odu.edu • http://www.cs.odu.edu/~mln/ • Congreso Internacional de Información en Salud • Lima, Peru • May 28, 2004

  2. Acknowledgements • ODU: K. Maly, M. Zubair, J. Bollen • LANL: R. Luce, X. Liu • NASA: G. Roncaglia, J. Rocker, C. Mackey • Cornell: C. Lagoze, S. Warner • MAGiC (UK): Paul Needham • and, of course, Herbert Van de Sompel (LANL) • the OpenURL slides are nicked from his presentations

  3. Outline • A bit of history • Core technologies & Issues • OAI-PMH • deep web • OpenURL • Handles / DOIs • Object Models • Example implementations • Download and go… covered only briefly

  4. OAI-PMH

  5. Background • I met Herbert Van de Sompel in April 1999... • we spoke of a demonstration project he had in mind, for which he had received sponsorship from Paul Ginsparg and Rick Luce • We wanted to demonstrate a multi-disciplinary DL that leveraged the large number of high quality, yet often isolated, tech report servers, e-print servers, etc. • most digital libraries (DLs) had grown up along single disciplines or institutions • little to no interoperability; isolated DL “gardens” • Universal Preprint Service • Demonstrated at Santa Fe NM, October 21-22, 1999 • http://web.archive.org/web/*/http://ups.cs.odu.edu/ • D-Lib Magazine, 6(2) 2000 (2 articles) • http://www.dlib.org/dlib/february00/02contents.html • UPS was soon renamed the Open Archives Initiative (OAI) http://www.openarchives.org/

  6. Result… OAI • The OAI was the result of the demonstration and discussion during the Santa Fe meeting • OAI = a bunch of people, a religion, a cult, etc. • OAI Protocol For Metadata Harvesting (OAI-PMH) = the protocol created and maintained by the OAI • Initial focus was on federating collections of scholarly e-print materials… • …however, interest grew and the scope and application of OAI-PMH expanded to become a generic bulk metadata transport protocol • Note: • OAI-PMH is only about metadata -- not full text! • but what is metadata vs. full-text? • OAI is neutral with respect to the nature of the metadata or the resources the metadata describes • read: commercial publishers have an interest in OAI-PMH too...

  7. OAI-PMH Mechanics • Request is encoded in HTTP • Response is encoded in XML • XML Schemas for the responses are defined in the OAI-PMH document
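
A minimal sketch of these mechanics, assuming Python's standard library only: the request is an HTTP GET URL carrying a `verb` argument, and the response is namespaced XML. The base URL `http://repository.example.org/oai` and the sample response are hand-written placeholders, not output from a real repository; `build_request` and `repository_name` are illustrative names.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url, verb, **arguments):
    """Encode an OAI-PMH request as an HTTP GET URL."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical, hand-written Identify response (assumption, for illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-05-28T12:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def repository_name(xml_text):
    """Pull the repository name out of an Identify response."""
    root = ET.fromstring(xml_text)
    return root.findtext("oai:Identify/oai:repositoryName", namespaces=OAI_NS)


if __name__ == "__main__":
    print(build_request("http://repository.example.org/oai", "Identify"))
    print(repository_name(SAMPLE_RESPONSE))
```

In a real harvester the URL would be fetched over HTTP and the response validated against the schemas the protocol document defines; both steps are left out here.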

  8. Overview of OAI-PMH Verbs • two groups: archival metadata verbs and harvesting verbs • most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)
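
The flow-control role of the resumption token can be sketched as a loop: keep re-issuing the same verb with the token from the previous response until the repository returns an empty or missing token. This is a sketch under stated assumptions: `fetch` is a hypothetical callable that takes the request arguments and returns the response XML, and `PAGES` / `fake_fetch` stand in for a real repository. The grouping of the six OAI-PMH 2.0 verbs in the constants follows the slide's two categories.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# The six OAI-PMH 2.0 verbs, grouped as on the slide.
ARCHIVAL_METADATA_VERBS = ("Identify", "ListMetadataFormats", "ListSets")
HARVESTING_VERBS = ("ListIdentifiers", "ListRecords", "GetRecord")


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield every identifier from ListIdentifiers, following resumption tokens."""
    args = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(args))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(f"{OAI}ListIdentifiers/{OAI}resumptionToken")
        if not token:  # missing or empty token: the list is complete
            break
        # The resumption token is sent alone with the verb on follow-up requests.
        args = {"verb": "ListIdentifiers", "resumptionToken": token}


# Hypothetical two-page response set (assumption, not real repository output).
PAGES = {
    None: (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        "<header><identifier>oai:example.org:1</identifier></header>"
        "<header><identifier>oai:example.org:2</identifier></header>"
        "<resumptionToken>page2</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    ),
    "page2": (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        "<header><identifier>oai:example.org:3</identifier></header>"
        "<resumptionToken/>"
        "</ListIdentifiers></OAI-PMH>"
    ),
}


def fake_fetch(args):
    """Stand-in for an HTTP fetch: picks a canned page by resumption token."""
    return PAGES[args.get("resumptionToken")]


if __name__ == "__main__":
    print(list(harvest_identifiers(fake_fetch)))
```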

  9. OAI-PMH Data Model • resource • item: all available metadata about David • records: Dublin Core metadata, MARC metadata, SPECTRUM metadata • set-membership is item-level property • item = identifier • record = identifier + metadata format + datestamp
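
One way to picture the data model is as two small types: an item that owns an identifier and its set memberships, and records that pair that identifier with a metadata format and a datestamp. The class names, the `add_record` helper, and the example identifier and set name are illustrative assumptions; the protocol does not prescribe any storage layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp."""
    identifier: str
    metadata_prefix: str
    datestamp: str
    metadata: str = ""  # serialized metadata in the named format


@dataclass
class Item:
    """item = identifier; set membership is a property of the item."""
    identifier: str
    sets: set = field(default_factory=set)
    records: dict = field(default_factory=dict)  # keyed by metadata prefix

    def add_record(self, metadata_prefix, datestamp, metadata=""):
        """Attach one record (one metadata format) to this item."""
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record


if __name__ == "__main__":
    david = Item("oai:example.org:david", sets={"sculpture"})
    david.add_record("oai_dc", "2004-05-28")
    david.add_record("marc21", "2004-05-28")
    print(sorted(david.records))
```

Keeping the sets on `Item` rather than on `Record` mirrors the slide's point that every record of an item shares the same set membership.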

  10. Data Providers / Service Providers • data providers (repositories) • service providers (harvesters)

  11. Aggregators • aggregators allow for: • scalability for OAI-PMH • load balancing • community building • discovery • [figure: an aggregator sits between data providers (repositories) and service providers (harvesters)]

  12. Aggregators • Frequently interchangeable terms: • aggregators: likely to be community / institutionally focused • caches: stores a copy, less likely to be community-oriented • proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols • Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03 • To learn more about aggregators, caches & proxies: • http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm • http://www.cs.odu.edu/~mln/jcdl03/

  13. Example Aggregators • Arc - http://arc.cs.odu.edu/ • first described “hierarchical harvesting” in D-Lib Magazine, 7(4) 2001 • http://www.dlib.org/dlib/april01/liu/04liu.html • Celestial - http://celestial.eprints.org/ • among other services, it provides a history of harvests (successful vs. errors) • http://celestial.eprints.org/cgi-bin/status

  14. OAI-PMH 2.0 Registration • 150+ repositories registered • ??? unregistered repositories • unregistered because: • testing / development • not for public harvesting • public, but “low-profile” • never got around to it… • ??? • DP:SP ~= 5:1 • Data Providers: http://www.openarchives.org/Register/BrowseSites.pl • Service Providers: http://www.openarchives.org/service/listproviders.html

  15. Registration is Nice……But Not Required • OAI-PMH is (becoming) the “http” for digital libraries • there is no central registry of http servers • remember the NCSA “What’s New” page? (ca. 1994) • There will never be “registration support” in OAI-PMH • registries are a type of service provider, built on top of OAI-PMH • registration will be an integral part of community building • friends…

  16. NASA <friends> example • harvester sends Identify • response includes <friends>…</friends> listing: • http://techreports.larc.nasa.gov/ltrs/oai2.0/ • http://naca.larc.nasa.gov/oai2.0/ • http://ston.jsc.nasa.gov/collections/TRS/oai/ • http://ntrs.nasa.gov/oai2.0/ • http://horus.riacs.edu/perl/oai/
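
A harvester could discover neighbouring repositories by reading the friends container out of an Identify response, roughly as sketched below. The sample XML is hand-written for illustration (it reuses two base URLs from the slide but is not a captured NASA response), and `friend_base_urls` is a hypothetical helper name. The friends namespace is the one given in the OAI implementation guidelines.

```python
import xml.etree.ElementTree as ET

# Namespace of the optional "friends" description container.
FRIENDS = "{http://www.openarchives.org/OAI/2.0/friends/}"

# Hand-written sample (assumption), shaped like an Identify response.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <baseURL>http://naca.larc.nasa.gov/oai2.0/</baseURL>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://techreports.larc.nasa.gov/ltrs/oai2.0/</baseURL>
        <baseURL>http://ntrs.nasa.gov/oai2.0/</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""


def friend_base_urls(identify_xml):
    """Return the base URLs listed inside <friends>, in document order."""
    root = ET.fromstring(identify_xml)
    # Only baseURL elements in the friends namespace match, so the
    # repository's own baseURL (OAI-PMH namespace) is not included.
    return [el.text.strip() for el in root.iter(FRIENDS + "baseURL")]


if __name__ == "__main__":
    for url in friend_base_urls(SAMPLE_IDENTIFY):
        print(url)
```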

  17. NACA Technical Report Server • publicly available • began in 1996 • details in NASA TM-1999-209127 • scanned reports from 1917-1958 • NACA = predecessor to NASA • contents mirrored with the MAGiC project • a UK-based grey-literature preservation project • OAI-PMH used to mirror contents • http://naca.larc.nasa.gov/ • http://naca.larc.nasa.gov/oai2.0/

  18. NACA Report 1345 as seen through its native DL http://naca.larc.nasa.gov/

  19. NACA Report 1345 as seen through MAGiC http://www.magic.ac.uk/

  20. NACA Report 1345 as seen through Scirus (Elsevier) http://www.scirus.com/

  21. NACA Report 1345 as seen through my.OAI (FS Consulting) http://www.myoai.com/

  22. NASA Technical Report Server • replacement for the previous distributed searching version of NTRS • MySQL • Va Tech harvester • modified “bucket” • details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (March 2003) • a service provider & aggregator • same OAI baseURL as used for interactive searching http://ntrs.nasa.gov/

  23. NASA Technical Report Server • advanced, fielded search • explicit query routing • 12 NASA repositories • 4 non-NASA repositories • turned “off” by default • >600k abstracts; >300k full-text

  24. Service Providers • It is clear that SPs are proliferating, despite (because of?) the inherent bias toward DPs in the protocol • easy to be a DP -> many DPs -> SPs eventually emerge • hard to be a DP -> SPs starve • currently about 5x more DPs than SPs • SPs are beginning to offer increasingly sophisticated services • competitive market originally envisioned for SPs is emerging

  25. Community Building • www.ndltd.org • Universidad Nacional Mayor de San Marcos • Colegio America • Pontificia Universidad Catolica del Peru • Universidad Nacional Federico Villarreal • Colegio Universitario Andino • Universidad Nacional de Trujillo • Universidad del Pacifico • Universidad de Lima • Universidad Peruana de Ciencias Aplicadas • Universidad Nacional Jorge Basadre Grohmann

  26. OAI-PMH & The Deep Web

  27. Exposing Repository Contents • DP9: Webcrawler access to OAI-PMH repositories • http://dlib.cs.odu.edu/dp9/ • JCDL 02 http://www.cs.odu.edu/~liu_x/dp9/dp9.pdf • An Apache module for OAI-PMH • http://www.modoai.org/ • Extensible Repository Resource Locators (ERRoLs) for OAI Identifiers • http://www.oclc.org/research/projects/oairesolver/default.htm

  28. Race for This New Market… • Yahoo! & University of Michigan • http://www.umich.edu/news/index.html?Releases/2004/Mar04/r031004 • Google & CrossRef • http://www.nature.com/nature/focus/accessdebate/17.html

  29. OpenURL slides from Herbert Van de Sompel, LANL

  30. Origins & Motivation • The Context: Library Automation Environment anno 1998 • distributed information environment • local & remote A&I databases • rapidly growing e-journal collection • need to interlink the available information • The Problem: • links are delivered by info providers • links are not sensitive to user’s context • appropriate copy problem • links dependent on business agreements between information vendors • links don’t cover the complete collection

  31. Origins & Motivation • The Context: Library Automation Environment anno 1998 • distributed information environment • local & remote A&I databases • rapidly growing e-journal collection • need to interlink the available information • The REAL Problem: • libraries have no say in linking • libraries are losing core part of the “organizing information” task • expensive collection is not used optimally • users are not well served

  32. Origins & Motivation • The Solution: • In information services: • DO NOT provide a link which is an actual service related to a referenced item (e.g. a link from a record in an A&I database to the corresponding full-text) • BUT rather provide a link that transports metadata about the referenced item to others that are better placed to provide service links • OpenURL • Linking server operated by library

  33. non-OpenURL linking • [figure: a link source resource holds a reference; resolution of metadata into link yields a link to the referenced work at the link destination resource]

  34. OpenURL linking • link source: provision of OpenURL • OpenURL: transportation of metadata & identifiers • linking server: user-specific, context-sensitive resolution of metadata & identifiers into services • [figure: a reference in the link source leads via an OpenURL to the linking server, which offers links to several link destinations]
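
The idea that an OpenURL transports metadata to whichever linking server serves the user can be sketched as follows. The same citation metadata is attached to two different resolver base URLs; the resolver addresses, the placeholder ISSN and the helper name `make_openurl` are all assumptions for illustration. The key names (`genre`, `issn`, `volume`, `issue`, `spage`) follow the early, pre-standard OpenURL syntax rather than the later NISO key/value format.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def make_openurl(resolver_base, **metadata):
    """Attach citation metadata to a library's linking-server base URL."""
    return f"{resolver_base}?{urlencode(metadata)}"


# Placeholder citation metadata (assumption, not a real article).
CITATION = {
    "genre": "article",
    "issn": "0000-0000",
    "volume": "6",
    "issue": "2",
    "spage": "1",
}

# Two hypothetical libraries, each running its own linking server.
LIBRARY_A = "http://resolver.library-a.example/menu"
LIBRARY_B = "http://resolver.library-b.example/menu"

if __name__ == "__main__":
    for base in (LIBRARY_A, LIBRARY_B):
        url = make_openurl(base, **CITATION)
        print(url)
        # Each linking server decodes the same metadata and decides,
        # in its own context, which services to offer.
        print(parse_qs(urlsplit(url).query))
```

The information provider only needs to know the user's resolver base URL; which service links result is decided entirely by the library's server.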

  35. default links • default links: • restricted in nature • action-radius restricted by business agreements • not context-sensitive • [figure: resource1, resource2, resource3 on the metadata plane; credit: Herbert Van de Sompel]

  36. appropriate links • [figure: appropriate links via OpenURL alongside default links; service component1 and service component2 on the extended services plane; resource1, resource2, resource3 on the metadata plane; credit: Herbert Van de Sompel]

  37. NISO OpenURL Standardization Charge • Use existing “OpenURL Framework” as starting point • notion of context-sensitive services • notion of transporting “contextual” metadata packages to obtain context-sensitive services • Define syntax and transport-method for “contextual” metadata packages • Ensure extensibility: • must support future applications • must support other information communities • => Generalize and Standardize

  38. NISO OpenURL Standardization Charge • Therefore, to be addressed were: • OpenURL Framework beyond scholarly resources • “contextual” metadata packages • Syntax for “contextual” metadata packages • Transport of “contextual” metadata packages

  39. OpenURL Status • (Nearly) a NISO standard • check for details: • http://library.caltech.edu/openurl/

  40. Naming: Handles & DOIs

  41. Naming • Fundamental to other technologies (OAI-PMH, OpenURL, etc.) • Options • URNs • Persistent URLs (PURLs) • http://purl.org/ • Handles • http://www.handle.net/ • Digital Object Identifiers • http://www.doi.org/ • ARK • http://www.cdlib.org/inside/diglib/ark/

  42. “Inverted Archives” • Unit of discourse is no longer an archive or service, but a DOI which has services linked from it • cf.: • UPS demonstration prototype • “Smart Objects, Dumb Archives” (SODA) model

  43. Example http://dx.doi.org/10.1145/374308.374342
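
The example URL is simply the DOI appended to a resolver address. A small sketch, assuming the `http://dx.doi.org/` resolver shown on the slide; `split_doi` and `doi_to_url` are illustrative helper names.

```python
from urllib.parse import quote

DOI_RESOLVER = "http://dx.doi.org/"  # resolver address used on the slide


def split_doi(doi):
    """Split a DOI into its prefix (registrant) and suffix (item)."""
    prefix, suffix = doi.split("/", 1)
    return prefix, suffix


def doi_to_url(doi, resolver=DOI_RESOLVER):
    """Build the resolvable URL for a DOI, escaping unsafe characters."""
    return resolver + quote(doi, safe="/")


if __name__ == "__main__":
    doi = "10.1145/374308.374342"
    print(split_doi(doi))
    print(doi_to_url(doi))
```

The DOI stays the stable name; only the resolver decides where the URL currently leads.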

  44. Object Models

  45. Popular Object Models • METS • used in DSpace, Fedora • http://www.loc.gov/standards/mets/ • MPEG-21 DIDL • http://xml.coverpages.org/mpeg21-didl.html • used in LANL DLs • http://www.dlib.org/dlib/november03/bekaert/11bekaert.html • http://www.dlib.org/dlib/february04/bekaert/02bekaert.html • http://lib-www.lanl.gov/~herbertv/papers/jcdl2004-submitted-draft.pdf

  46. Object Models & OAI-PMH • Move from simple metadata files “pointing” to resources… • …to records as “modeled representations” of resources • [figure: resource; item oai:foo.edu:1234; records, including METS]

  47. Download and Go!

  48. Where Do You Want to Build? • [figure: user; local context-sensitive services; service provider; data providers; products shown: CDSware, EPrints.org]

  49. Fedora • joint project between Cornell & UVa • funded by the Mellon Foundation • a repository management system • focuses on complex digital objects and their behaviors • more info: • http://www.fedora.info/ • D-Lib Magazine, 9(4) • http://www.dlib.org/dlib/april03/staples/04staples.html

  50. DSpace (MIT + HP Labs) • constructed to capture all the output of MIT’s faculty • now generalized to the DSpace Federation • 8 top universities in the US & Canada • More info: • http://www.dspace.org/ • http://sourceforge.net/projects/dspace/ • D-Lib Magazine 9(1) • http://www.dlib.org/dlib/january03/smith/01smith.html
