TLA NetFair, April 7, 2005, Austin, TX. A Web Services Approach for Search and Retrieve: The Next Generation Z39.50. William E. Moen <wemoen@unt.edu>, School of Library and Information Sciences, Texas Center for Digital Knowledge, University of North Texas, Denton, TX 76203.
Overview
• Quick description of SRW
• Brief background – historical, political, conceptual
• Non-technical (almost) introduction to SRW
• Common Query Language (CQL) briefly
• Concluding thoughts
NetFair -- Texas Library Association -- April 2005
What is SRW/U?
• An XML-based protocol for searching, retrieving, and other information retrieval transactions
• Cast in the standards/technologies for web services
  • XML
  • SOAP
  • HTTP (POST and GET)
• Brings the concepts and experience of Z39.50 into the web environment using web technologies
Why SRW/U?
• Genesis: several years of soul searching by Z39.50 developers and implementors
• The “web” had become the common implementation environment
• Z39.50 was not perceived as web friendly
• What was needed:
  • Simpler
  • More comprehensible
  • More easily implemented
  • Web compatible
  • Retain the intellectual contribution of Z39.50
Taking action: June 2001
• Invitational meeting to discuss moving Z39.50 to an XML-based protocol
• Goal
  • Lower the barriers to implementation while preserving the existing intellectual contributions of Z39.50, discarding those aspects no longer useful or meaningful
• Objective
  • Define specifications for a new web service definition based on Z39.50 together with web technologies
  • Separate the Z39.50 abstract and associated semantic model from its specific encoding and wire protocol (i.e., ASN.1/BER and TCP/IP)
• Initially called Z39.50 Next Generation (ZNG)
• Intended as proof-of-concept
• Defining only those protocol specifications that would actually be implemented by participants
ZING – Z39.50 International Next Generation
• Make intellectual/semantic content of Z39.50 more broadly available
• Several ZING initiatives: ZOOM, ez39.50, ZeeRex, SRW/U
• Make Z39.50 more attractive by lowering barriers to implementation
  • Use of XML – to represent and encode data
  • Use of HTTP – for transport
  • Use of SOAP – for interaction between client and server based on Remote Procedure Call (RPC)
For more information, visit the ZING website: http://www.loc.gov/z3950/agency/zing/
SRW/U, SRW, SRU
• SRW/U: Search and Retrieve for the Web
  • General designation for this initiative
• SRW: Search and Retrieve Web Service
  • HTTP POST
  • Simple Object Access Protocol (SOAP)
  • XML messages
• SRU: Search and Retrieve URL Service
  • HTTP GET
  • Request parameters included in URL syntax
• Development
  • Version 1.0 November 2001
  • Version 1.1 February 2002
  • Registered with NISO in Fall 2004
For more information, visit the SRW website: http://www.loc.gov/srw
Networked information retrieval
• What’s needed:
  • Identifying a target to search
  • A vocabulary for expressing search requests, search criteria, retrieval requests, etc.
  • Methods to encode the requests and responses from the target
  • Methods to transport the requests and responses across a network
• In other words, a protocol and supporting specifications
Abstract Model of IR
Abstract model of Z39.50
Z39.50 classic & SRW
SRW Overview
• Builds on Z39.50 concepts and web technologies
  • Web technologies: XML, SOAP, HTTP
• Uses a new, human-readable query language
• Combines several Z39.50 features into three “operation types”
  • searchRetrieve operation
  • scan operation
  • explain operation
searchRetrieve operation
• The core of the protocol
• Expresses the search and additional criteria
• Records are returned in XML
• Request parameters
  • version
  • query
  • Optional parameters
    • sortKeys
    • recordPacking
    • recordSchema
    • recordXPath
    • stylesheet
• Response parameters
  • version
  • numberOfRecords
  • Optional parameters
    • resultSetId
    • resultSetIdleTime
    • records
    • diagnostics
SRW & XML
• XML as foundation for protocol
• Provides syntax for intelligent markup
• Defines or references XML schemas
  • searchRetrieveRequest
  • searchRetrieveResponse
searchRetrieveRequest example
• Sent as an HTTP POST
• XML document is sent to the server
• Using SOAP to wrap the request
<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title all "Squirrel Hungry"</query>
  <maximumRecords>1</maximumRecords>
  <startRecord>1</startRecord>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>
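The SOAP wrapping mentioned above can be sketched in Python with the standard library. This is a minimal illustration and not a reference client: the function name `build_srw_request` is hypothetical, the SRW namespace URI is an assumption that should be checked against the target server's WSDL, and the sketch only builds the message without sending it.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SRW_NS = "http://www.loc.gov/zing/srw/"  # assumed SRW namespace; verify against the server's WSDL


def build_srw_request(query, start_record=1, maximum_records=1, record_schema="dc"):
    """Return a SOAP envelope (as a string) wrapping a searchRetrieveRequest."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    # Same parameters, in the same order, as the example on the slide.
    fields = [
        ("version", "1.1"),
        ("query", query),
        ("maximumRecords", maximum_records),
        ("startRecord", start_record),
        ("recordSchema", record_schema),
    ]
    for name, value in fields:
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_srw_request('dc.title all "Squirrel Hungry"'))
```

The resulting string would be the body of the HTTP POST; how it is sent (headers, endpoint, SOAPAction) depends on the server and is left out here.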
searchRetrieveResponse example
<searchRetrieveResponse>
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
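A client reading such a response mainly needs the hit count and the record data. The sketch below parses the slide's example with Python's standard library. It makes two assumptions: the `dc` prefix is bound to the Dublin Core element namespace (the slide omits namespace declarations), and the protocol elements carry no namespace, as on the slide; a real server response would normally qualify them. The name `summarize_response` is hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace (assumed binding for "dc")

# The slide's example, with the dc prefix declared so the document parses.
SAMPLE_RESPONSE = f"""\
<searchRetrieveResponse xmlns:dc="{DC_NS}">
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
"""


def summarize_response(xml_text):
    """Return (total hit count, list of dc:title values in the returned records)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("numberOfRecords"))
    titles = [title.text for title in root.iter(f"{{{DC_NS}}}title")]
    return total, titles


if __name__ == "__main__":
    print(summarize_response(SAMPLE_RESPONSE))
```

Note that numberOfRecords reports the size of the whole result set (10 here), while only the records requested via maximumRecords are returned (1 here).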
searchRetrieve response
• Records returned in response
• All records in XML syntax
• According to one or more XML schemas (semantics)
  • Dublin Core
  • ONIX
  • MODS
  • MARCXML
searchRetrieve example
<searchRetrieveRequest>
  <version>1.1</version>
  <query>dc.title computer</query>
  <startRecord>1</startRecord>
  <maximumRecords>10</maximumRecords>
  <recordPacking>xml</recordPacking>
  <recordSchema>dc</recordSchema>
</searchRetrieveRequest>
• Retrieval results
  • XML view
  • Screen shot with stylesheet applied
SRW results
SRW Model
[Diagram. Labels: Client, HTTP POST, SOAP, SRW, App, DB, Server side]
SRU briefly
• Protocol requests can be carried via HTTP GET
• searchRetrieveRequest parameters expressed in standard URL syntax
• baseURL and search part separated by question mark “?”
• Response is an XML document containing records
• A searchRetrieveRequest in SRU:
  • http://www.loc.gov/z39voy?operation=searchRetrieve&version=1.1&query=texas&recordSchema=mods&startRecord=1&maximumRecords=1
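Because an SRU request is just a URL, it can be assembled with ordinary URL tooling. The following Python sketch rebuilds the example above from its parts; the function name `build_sru_url` and its defaults are illustrative choices, not part of the SRU specification.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, query, record_schema="mods", start_record=1, maximum_records=1):
    """Join the base URL and the search part with '?', URL-encoding each parameter."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,  # a CQL query; urlencode escapes spaces, quotes, etc.
        "recordSchema": record_schema,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_sru_url("http://www.loc.gov/z39voy", "texas"))
    print(build_sru_url("http://www.loc.gov/z39voy", 'dc.title all "Squirrel Hungry"', record_schema="dc"))
```

The encoding step matters once the query is more than a single word: CQL queries routinely contain spaces, quotes, and relation symbols that are not legal in a raw URL.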
SRU Model
[Diagram. Labels: Client, HTTP GET, Web server, SRU, SRW, App, DB, Server side]
Example: http://www.loc.gov/z39voy?operation=searchRetrieve&version=1.1&query=texas&recordSchema=mods&startRecord=1&maximumRecords=1
searchRetrieve query
• An SRW query consists of one or more query statements linked by Boolean operators
• Five categories of query statements:
  • single search clause
  • two or more search clauses linked by Boolean
  • search clauses and result sets linked by Boolean
  • two or more result sets linked by Boolean
  • single result set
• Expressed in the Common Query Language (CQL)
Common Query Language (CQL)
• A formal language for representing queries to information retrieval systems
• Human-readable
• Search clause
  • Always includes a term
    • Simple terms consist of one or more words
  • May include index name
    • To limit search to a particular field/element
  • Index name includes base name and may include prefix
    • title, subject
    • dc.title, dc.subject
• Several index sets have been defined (called Context Sets in SRW)
  • dc
  • bath
  • srw
• Context set defines the available indexes for a particular application
Other components of CQL
• Relation
  • <, >, <=, >=, =, <>
  • exact: used for string matching
  • all: when the term is a list of words, indicates that all of the words must be found
  • any: when the term is a list of words, indicates that at least one of the words must be found
• Boolean operators: and, or, not
• Proximity (prox operator)
  • relation (<, >, <=, >=, =, <>)
  • distance (integer)
  • unit (word, sentence, paragraph, element)
  • ordering (ordered or unordered)
• Masking rules and special characters
  • single asterisk (*) to mask zero or more characters
  • single question mark (?) to mask a single character
  • caret/hat (^) to indicate anchoring, left or right
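One way to see what the masking rules mean is to translate a masked term into a regular expression. The Python sketch below does this for single-word terms only. It is an illustration under stated assumptions: the function names are hypothetical, matching is case-sensitive, and caret anchoring and backslash escapes are deliberately not handled.

```python
import re


def mask_to_regex(term):
    """Translate CQL masking characters in a single-word term into a compiled regex."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".*")  # asterisk masks zero or more characters
        elif ch == "?":
            parts.append(".")  # question mark masks exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts))


def matches(term, word):
    """True if the whole word matches the masked term."""
    return mask_to_regex(term).fullmatch(word) is not None


if __name__ == "__main__":
    for term, word in [("dino*", "dinosaur"), ("c?t", "cat"), ("c?t", "cart")]:
        print(term, word, matches(term, word))
```

A real server would typically push masking down into its index rather than scan words with a regex; the translation is only meant to make the semantics concrete.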
CQL examples
• Simple queries:
  • dinosaur
  • "the complete dinosaur"
• Boolean
  • dinosaur and bird or dinobird
  • "feathered dinosaur" and (yixian or jehol)
• Proximity
  • foo prox bar
  • foo prox/>/4/word/ordered bar
• Indexes
  • title = dinosaur
  • bath.title="the complete dinosaur"
  • srw.serverChoice=dinosaur
• Relations
  • year > 1998
  • title all "complete dinosaur"
  • title any "dinosaur bird reptile"
  • title exact "the complete dinosaur"
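Queries like these are easy to assemble programmatically, since CQL is plain text. The helper functions below are a hypothetical sketch, not part of any CQL library. They assume that a term needs double quotes when it contains whitespace or certain special characters, or when it is one of the words and, or, not, prox, and that an embedded double quote is escaped with a backslash.

```python
RESERVED_WORDS = {"and", "or", "not", "prox"}  # words that would otherwise read as operators


def quote_term(term):
    """Wrap a term in double quotes when it could not stand alone as a bare word."""
    needs_quotes = (
        term == ""
        or term.lower() in RESERVED_WORDS
        or any(ch.isspace() or ch in '()=<>/"' for ch in term)
    )
    if not needs_quotes:
        return term
    return '"' + term.replace('"', '\\"') + '"'


def search_clause(term, index=None, relation="="):
    """Build 'term' or 'index relation term'."""
    if index is None:
        return quote_term(term)
    return f"{index} {relation} {quote_term(term)}"


def group(operator, *clauses):
    """Join clauses with a Boolean operator and parenthesize the result."""
    return "(" + f" {operator} ".join(clauses) + ")"


if __name__ == "__main__":
    print(search_clause("the complete dinosaur", index="bath.title"))
    print(search_clause("1998", index="year", relation=">"))
    print(search_clause("feathered dinosaur") + " and " + group("or", "yixian", "jehol"))
```

Parenthesizing every group makes the intended nesting explicit, so the generated query does not depend on how a server orders unparenthesized Boolean operators.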
SRW/U & classic Z39.50
• Z39.50 concepts retained
  • Result sets
  • Abstract access points
  • Abstract record schemas
  • Explain
  • Diagnostics

SRW/U                                                   | Classic Z39.50
No explicit concept of connection, session, or state    | Stateful
Result sets named by server                             | Result sets named by client
Single record syntax (XML), multiple schemas            | Multiple record syntaxes
String (i.e., human-readable) queries                   | No human-readable query language
CQL                                                     | Type 1 query using attribute sets
Named indexes                                           | Use attribute to identify access point
What problems does SRW solve?
• Addresses need for standards-based searching in the networked environment
• Shows the vitality of the Z39.50 concepts and implements those in a web services & URL access context
• Offers database providers a web-friendly method for offering standards-based searching of resources
• Provides a low-barrier-to-entry solution using commonly available technologies
• XML format of records provides for more reuse, and more interesting use, of resources
Possible implementation venues
• Gateways to existing Z39.50 servers
• Lightweight SRW/U servers to specialized databases
• Cost-effective search access to commercial databases (e.g., citation, full-text)
• Metasearching
• Beyond libraries to many other information communities
SRU at The European Library (TEL)
Graphic from: van Veen, T. & Oldroyd, B. (2004, February). Search and retrieval in The European Library. D-Lib Magazine 10(2). Retrieved February 24, 2005 from D-Lib Magazine website: http://www.dlib.org/dlib/february04/vanveen/02vanveen.html
References
• Z39.50 International Next Generation – ZING
  • http://www.loc.gov/z3950/agency/zing/
• Search and Retrieve for the Web – SRW/U
  • http://www.loc.gov/srw
• A Gentle Introduction to SRW
  • http://www.loc.gov/z3950/agency/zing/srw/introduction.html
• A Gentle Introduction to CQL
  • http://zing.z3950.org/cql/intro.html
• Search and Retrieval in The European Library: A New Approach, by van Veen and Oldroyd, in D-Lib Magazine (February 2004)
  • http://www.dlib.org/dlib/february04/vanveen/02vanveen.html