Building Reliable Distributed Information Spaces

Explore the evolution and characteristics of digital libraries, focusing on organization, access, and trade-offs. Learn about the National Science Digital Library and strategies for building a large-scale digital library with limited resources.


Presentation Transcript


  1. Building Reliable Distributed Information Spaces • Carl Lagoze • CS 430 • 10/22/2002

  2. Characteristics of a library • Functions: Selection, Access, Organization, User support, Preservation • Characteristics: Standardized, Professionalized, Service-oriented, In it for the long haul, Conservative, Trustworthy, Expensive (human centric)

  3. Perspective on the Budget

  4. Library in current environment • “I don’t do libraries” – anonymous Cornell undergrad to Bob Constable • How do you use the library? • Go to the library to study? • Go to the library to do research? • Talked to a reference librarian? • Use the library gateway or electronic resources?

  5. Characteristics of the Web • Decentralized/Anarchic/Illegal • Agreements are technical (at best) • Roles are undefined and fluid • Immediate • Ephemeral • Integrity not established • Anonymous (or “no one knows you are a dog”)

  6. What is a Digital Library? Evolutionary perspective: digital libraries as institutions that are the continuation of libraries (library automation and digitization as the link between libraries and digital libraries). Revolutionary perspective: digital libraries as technical/organizational/economic/legal layers on top of networked information (the Web) that render existing libraries obsolete.

  7. What is a Digital Library? A digital library is a managed collection of information, with associated services, where the information is stored in digital formats and is accessible over a network. [Arms CS502 sp00]

  8. Many facets of the problem/solution • economy • technology • sociology • law

  9. Technical Trade-offs (figure axes: Cost, Functionality)

  10. National Science Digital Library (NSDL) • Goal: Reform science education in the US in the digital age • $25M in funding 2002-2006 • Over 80 institutional grants for collections, services, core infrastructure (technical, economic, organizational) • Cornell is primary technical development partner • Carl Lagoze, Director of Technology • http://www.nsdl.org

  11. Building service and knowledge layers over a variety of resources for a variety of users • Services: browsing, annotating, searching, filtering, quality rating, curriculum building • Resources: Open Access Web, NSF-funded Collections, Publishers

  12. How Big might the NSDL be? All branches of science, all levels of education, very broadly defined: Five year targets • 1,000,000 different users • 10,000,000 digital objects • 10,000 to 100,000 independent sites

  13. Core Integration Philosophy It is possible to build a very large digital library with a small staff. But ... • Every aspect of the library must be planned with scalability in mind. • Some compromises will be made. • Lots of standard library functions must be automated.

  14. Resources for Core Integration • Core Integration budget: $4-6 million • Staff: 25-30 • Management: diffuse • How can a small team, without direct management control, create a very large-scale digital library?

  15. Collections: the Basic Assumption The Core Integration team will not manage any collections

  16. Collections: The NSDL program funds only a fraction of the relevant collections.

  17. Every Collection is Different

  18. The Core Integration Task ... ... to provide a coherent set of collections and services across great diversity.

  19. Interoperability • The Problem: Conventional approaches to interoperability require partners to support agreements (technical, content, and business). But NSDL needs thousands of very different partners, most of whom are not directly part of the NSDL program. • The Approach: A spectrum of interoperability

  20. Levels of interoperability (level: agreements; example) • Federation: strict use of standards (syntax, semantic, and business); example: AACR, MARC, Z39.50 • Harvesting: digital libraries expose metadata, with a simple protocol and registry; example: Open Archives metadata harvesting • Gathering: digital libraries do not cooperate, so services must seek out information; example: Web crawlers and search engines

  21. Searching • What to Index? When possible, full text indexing is excellent, but full text indexing is not possible for all materials (non-textual, no access for indexing). Comprehensive metadata is an alternative, but it is available for very few of the materials. • What Architecture to Use? Few collections support an established search protocol (e.g., Z39.50).

  22. Function versus cost of acceptance (figure: Z39.50, SDLIP, and Metadata Harvesting placed on axes of Function and Cost of acceptance)

  23. Z39.50 principles • Servers store a set of databases with searchable indexes • Interactions are based on a session • The client opens a connection with the server(s), carries out a sequence of interactions and then closes the connection. • During the course of the session, both the server and the client remember the state of their interaction.

  24. State in Z39.50 • The server carries out the search and builds a result set. • The server saves the result set. • Subsequent messages from the client can reference the result set. • Thus the client can narrow a large set with increasingly precise requests, or can request a presentation of any record in the set, without searching the entire database.
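
The stateful interaction described on slides 23 and 24 can be sketched as a toy in-memory model. This is not the Z39.50 wire protocol or any real client library; the class and method names (`ToySession`, `search`, `refine`, `present`) are hypothetical, chosen only to show a server holding a named result set between requests.

```python
class ToySession:
    """Toy stand-in for a stateful search session (hypothetical, not real Z39.50)."""

    def __init__(self, database):
        self.database = database      # list of record strings held by the "server"
        self.result_sets = {}         # server-side state: name -> list of record indexes
        self.records_scanned = 0      # running count of records examined by all requests

    def search(self, name, term):
        """Scan the whole database once and save the matches as a named result set."""
        self.records_scanned += len(self.database)
        self.result_sets[name] = [
            i for i, rec in enumerate(self.database) if term in rec.lower()
        ]
        return len(self.result_sets[name])

    def refine(self, source, name, term):
        """Narrow a saved result set; only its members are examined."""
        candidates = self.result_sets[source]
        self.records_scanned += len(candidates)
        self.result_sets[name] = [
            i for i in candidates if term in self.database[i].lower()
        ]
        return len(self.result_sets[name])

    def present(self, name, position):
        """Return one record from a saved result set without searching again."""
        return self.database[self.result_sets[name][position]]


database = [
    "Digital libraries and metadata",
    "Metadata harvesting with OAI",
    "Web crawlers and search engines",
    "Library automation history",
]
session = ToySession(database)
hits = session.search("A", "metadata")        # examines all four records
narrowed = session.refine("A", "B", "oai")    # examines only the saved hits
first = session.present("B", 0)               # no search at all
```

The cost of this convenience is that the server must keep `result_sets` alive for every open session, which is part of why session-based protocols are more demanding to support than stateless ones.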

  25. Broadcast Searching does not Scale (figure labels: User, User interface server, Collections)
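
One way to see why sending every query to every collection scales poorly is a back-of-the-envelope availability estimate. The sketch below assumes each site answers independently with the same probability. The 99.9% figure is an illustrative assumption, not a number from the lecture; the larger site counts reuse the five-year targets from slide 12.

```python
def probability_all_respond(sites: int, per_site_availability: float) -> float:
    """Chance that every one of `sites` independent servers answers a broadcast query."""
    return per_site_availability ** sites


# Assumed availability per site (illustrative only, not from the slides).
availability = 0.999

for sites in (10, 1_000, 10_000, 100_000):
    p = probability_all_respond(sites, availability)
    print(f"{sites:>7} sites: P(all respond) = {p:.6f}")
```

Under these assumptions a complete answer becomes unlikely long before the target number of sites is reached, and the response time of a broadcast query is bounded by the slowest site that does answer.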

  26. Open Archives Initiative Protocol for Metadata Harvesting • Low-barrier protocol for exposing structured information (metadata) from cooperating repositories • Provides opportunity for building comprehensive service network • http://www.openarchives.org

  27. Metadata harvesting • OAI-PMH: a simple two-party model for sharing structured information • Service Providers: Discovery, Current Awareness, Preservation • Data Providers

  28. Resource discovery over distributed collections • metadata: Author, Title, Abstract, Identifier

  29. OAI-PMH Key technical features • Deploy now technology – 80/20 rule • Simple HTTP encoding • Foundation of established XML standards • Multiple metadata formats • Repository partitioning (sets) • Selective harvesting (sets and dates) • Clean partition between core and implementation-specific extensions • Multiple item-level metadata • Collection level metadata
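
To make "simple HTTP encoding" and "foundation of established XML standards" concrete, here is a minimal sketch of reading a harvested response with Python's standard library. The namespace URIs are the ones used by OAI-PMH 2.0 and unqualified Dublin Core; the sample response, its identifier, and the helper name `parse_records` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI-PMH 2.0 responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response fragment; the identifier and values are invented.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item-1</identifier>
        <datestamp>2002-10-01</datestamp>
        <setSpec>physics</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example resource</dc:title>
          <dc:creator>Example Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Extract header fields and the Dublin Core title from each harvested record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.findall(".//oai:record", NS):
        header = record.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "sets": [s.text for s in header.findall("oai:setSpec", NS)],
            "title": record.findtext(".//dc:title", namespaces=NS),
        })
    return records


records = parse_records(SAMPLE_RESPONSE)
```

The header carries the datestamp and set membership that make selective harvesting possible, while the metadata element can hold any of the formats a repository supports.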

  30. OAI Verbs • Identify – repository characteristics • ListMetadataFormats – DC required • ListSets – repository partitioning • ListRecords – (selectively) harvest metadata • ListIdentifiers – (selectively) harvest metadata identifiers • GetRecord – known item retrieval
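
Each verb above is carried as an ordinary HTTP query parameter, so a request is just a URL. The sketch below builds a few such URLs; the argument names (`metadataPrefix`, `from`, `set`, `identifier`) come from the OAI-PMH specification, while the endpoint `http://example.org/oai`, the set name, the item identifier, and the helper `oai_request` are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/oai"  # hypothetical repository endpoint


def oai_request(verb, **arguments):
    """Build an OAI-PMH request URL; a trailing underscore lets callers pass `from_`."""
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            params[name.rstrip("_")] = value
    return f"{BASE_URL}?{urlencode(params)}"


# Repository characteristics.
identify_url = oai_request("Identify")

# Selective harvesting by date and set.
records_url = oai_request(
    "ListRecords", metadataPrefix="oai_dc", from_="2002-01-01", set="physics"
)

# Known-item retrieval.
item_url = oai_request(
    "GetRecord", identifier="oai:example.org:item-1", metadataPrefix="oai_dc"
)
```

Because no session is involved, a harvester can issue these requests on its own schedule and resume later using only a date.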

  31. The Metadata Repository • The metadata repository is a resource for service providers. It holds information about every collection and item known to the NSDL. (figure labels: Users, Services, Metadata repository, Collections)

  32. Metadata Repository • Central storage of all metadata about all resources in the NSDL • Defines the extent of the NSDL collection • Metadata includes collections, items, annotations, etc. • MR main functions: aggregation, normalization, redistribution • Ingest of metadata by various means: harvesting, manual, automatic, cross-walking • Open access to MR contents for service builders via OAI-PMH
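
Normalization and cross-walking can be pictured as mapping each collection's native field names onto a shared vocabulary. The sketch below is a minimal illustration, not the NSDL implementation: the target names are Dublin Core elements, while the native field names, the sample record, and the `crosswalk` helper are hypothetical.

```python
# Hypothetical mapping from one collection's native field names to Dublin Core elements.
CROSSWALK = {
    "name": "title",
    "author": "creator",
    "summary": "description",
    "url": "identifier",
}


def crosswalk(native_record, mapping=CROSSWALK):
    """Return a normalized record keyed by Dublin Core element names."""
    normalized = {}
    for native_field, value in native_record.items():
        target = mapping.get(native_field)
        if target is None:
            continue  # fields with no mapping are dropped in this sketch
        if isinstance(value, str):
            value = " ".join(value.split())  # collapse stray whitespace
        normalized[target] = value
    return normalized


native = {
    "name": "  Intro to   Plate Tectonics ",
    "author": "Example Author",
    "url": "http://example.org/items/42",
    "internal_code": "X-17",
}
normalized = crosswalk(native)
```

A real crosswalk would need one mapping per source format and a policy for unmapped or conflicting fields; dropping them silently, as here, is the simplest choice rather than a recommended one.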

  33. Importing metadata into the MR (figure labels: Collections, Harvest, Staging area, Cleanup and crosswalks, Database load, Metadata Repository)

  34. Exporting metadata from the MR

  35. Search Architecture (figure labels: Collections, Metadata repository, OAI, SDLIP, Search and Discovery Services, http, Portal) • James Allan, Bruce Croft (University of Massachusetts, Amherst)

  36. The Metadata Repository as a Resource: Support for Service Providers • Records are exposed through the Open Archives Initiative harvesting protocol. • The Core Integration team will provide some services based on the metadata repository. • The architecture encourages others to build services.

  37. Building on the basics • Gathering resources from the open web • Automated collection aggregation • Automated metadata generation • Content of resource • Context of resource • Automated quality assessment • Annotation, review, and aggregation environment

  38. If you find this all interesting • CS502 – Architecture of Web Information Systems