1 / 35

Lecture 14 Interoperability for information discovery

Lecture 14 Interoperability for information discovery. CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel herbertv@cs.cornell.edu. Acknowledgements: Carl Lagoze. Why interoperability?.

aileen
Download Presentation

Lecture 14 Interoperability for information discovery

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 14 Interoperability for information discovery CS 502 Computing Methods for Digital LibrariesCornell University – Computer ScienceHerbert Van de Sompelherbertv@cs.cornell.edu Acknowledgements: Carl Lagoze

  2. Why interoperability? The distributed information environment creates enormous challenges regarding the provision of coherent services. Addressing these challenges requires some form of interoperability.

  3. OPAC FTXT FTXT e-print A&I A&I Information resources: distributed distributed herbert van de sompel

  4. OPAC FTXT FTXT e-print A&I A&I Information resources: different authorities, technologies range of authorities, technologies herbert van de sompel

  5. OPAC FTXT FTXT e-print A&I A&I Information resources: challenges re coherent services ¡¡challenges re integrated access !! herbert van de sompel

  6. Interoperability helps! • UNICODE • XML • XML Schema, XML Namespaces • HTTP • Metadata formats (MARC, Dublin Core, …) • This lecture: interoperability for resource discovery (searching) • Next week: interoperability for reference linking

  7. Interoperability: technical & organizational • reaching interoperability is an organizational challenge (not only technical) • the challenge is to create incentives for independent digital libraries to adopt specifications

  8. Functionality versus cost of acceptance This picture does not show the eventual benefit of a protocol in the creation of coherent services Cost of acceptance Z39.50 SDLIP Metadata Harvesting OAI Functionality

  9. A&I image FTXT OPAC e-print federated searching

  10. Z39.50 http://www.loc.gov/z3950/agency/

  11. Aims of z39.50 • Permits one computer (z3950 client) to search and retrieve information on another computer serving a database (z3950 target) • Important both technically and for its wide use in library systems, A&I databases • Most development has concentrated on bibliographic data • Most implementations emphasize searches that use a bibliographic set of attributes to search databases of MARC records

  12. Technical history of z39.50 • Developed for X.25 networks (connection orientation), conversion to run over TCP fitted later • Original concept in days when repeating a search was expensive computation (about 1980) • NISO standard

  13. z39.50 - principles • Abstract view of database searching. • Server stores a set of databases with searchable indexes • Interactions are based on a session • The client opens a connection with the server, carries out a sequence of interactions and then closes the connection. • During the course of the session, both the server and the client remember the state of their interaction.

  14. z39.50 - state • The server carries out the search and builds a results set • Server saves the results set. • Subsequent message from the client can reference the result set. • Thus the client can modify a large set by increasingly precise requests, or can request a presentation of any record in the set, without searching entire database.

  15. z39.50 - services • init -- client connects to the server and exchanges initial information, e.g., preferred message size • explain -- client inquires of the server what databases are available for searching, the fields that are available, the syntax and formats supported, etc. • search -- client presents a query to a database; choices of syntax for specifying searches • • Boolean queries widely implemented manipulation of results sets -- e.g., sort or delete present -- requests the server to send specified records from the results set to the client in a specified format scan – browse indexes

  16. z39.50 – sample query • In the database named Books find all records for which the access pointtitle contains the value evangeline and the access pointauthor contains the value longfellow. • Z39.50 defines a rich variety of search access points (use attributes) that can be extended by implementers http://lcweb.loc.gov/z3950/agency/defns/bib1.html

  17. z39.50 – issues • No general acceptance (for instance not supported by web browsers => httpd/z3950 gateways) • Merging results for distributed search • Semantic interoperability ~ z39.50 profiles

  18. Simple Digital Library Interoperability Protocol http://www-diglib.stanford.edu/~testbed/doc2/SDLIP/

  19. SDLIP • Compromise between a full-scale, all encompassing approach such as Z39.50 and the “anything goes” approach typical for ad-hoc search interface design on web • Developed jointly by Stanford, Berkeley, and UC Santa Barbara • Heavily influenced by DASL from IETF

  20. SDLIP – search middleware

  21. SDLIP – managing complexity via separating interfaces

  22. SDLIP - interfaces • Search Interface – defines simple query language, protocol can then include other languages • Result Interface – parking meter metaphor (server decides on length of session) • Source Metadata Interface – allows to query a library proxy about its capabilities (subcollections, attributes that can be searched, …)

  23. Open Archives Initiative Metadata Harvesting Protocol http://www.openarchives.org

  24. OAMH protocol • Low-barrier framework for repository interoperability • Minimal burden for parties providing databases • Plug-in concept to allow community and service specialization (placeholders – about, descriptor ; parallel metadata formats; …)

  25. A&I image OPAC e-print harvester FTXT OAMH - metadata harvesting metadata FTXT

  26. A&I image FTXT e-print OPAC Author Title Abstract Identifer OAMH – federated services metadata

  27. Requests repos i tory harves ter Replies OAMH – protocol service provider data provider 6

  28. Reply • XML Schema • Self contained OAMH – core concepts • low-barrier interoperability • data-provider & service-provider model • metadata harvesting model OAMH protocol HTTP based • shared metadata format and parallel, community-specific metadata formats Dublin Core • authentication etc. : on purpose outside of the protocol

  29. repos i tory harves ter OAMH – protocol tools service provider data provider Datestamp Identifier Set Records

  30. repos i tory harves ter OAMH – protocol tools service provider data provider • Supporting protocol requests: • Identify • ListMetadataFormats • ListSets • Harvesting protocol requests: • ListRecords • ListIdentifiers • GetRecord

  31. repos i tory harves ter OAMH – a supporting protocol request service provider data provider ListMetadataFormats • ListMetadataFormats / Time / Request • REPEAT • Format prefix • Format XML schema • /REPEAT

  32. repos i tory harves ter OAMH – a harvesting protocol request service provider data provider * from=a * until=b * set=klm ListRecords * metadataPrefix=dc • ListRecords / Time / Request • REPEAT • Identifier • Datestamp • Metadata • /REPEAT

  33. OAMH – applications? • federated services [S&R, SDI, alerting, linking, ...] • database synchronization • harvesting the deep Web • ...

  34. OAMH – issues • sync between data provider and service provider (cf. Web page harvesters) • everyone harvesting from everyone? (cf. brokers) • CAN attractive cross-community services be built? • gaining critical mass

  35. Some thoughts • There is (and will never be) one right solution (technical vs. cost vs. complexity vs. ??) (cf metadata) • Distributed technical solutions have organizational ramifications • A revival of the simplicity approach: specifications that require little effort by the digital libraries involved, but that can eventually lead to appealing services (when combined with brute force computing and intelligent tools)

More Related