6th Annual State GILS Conference, March 31 – April 3, 2004, Raleigh, NC. Next Generation Z39.50: A Web Services Approach for Search and Retrieve.
6th Annual State GILS Conference, March 31 – April 3, 2004, Raleigh, NC
Next Generation Z39.50: A Web Services Approach for Search and Retrieve
William E. Moen <wemoen@unt.edu>
School of Library and Information Sciences
Texas Center for Digital Knowledge
University of North Texas
Denton, TX 72603
Overview
• Quick description of SRW
• Brief background – historical, political, conceptual
• Non-technical (almost) introduction to SRW
• Common Query Language (CQL) briefly
• Concluding thoughts
What is SRW?
• Search and Retrieve Web Service (SRW)
• An XML-based protocol for searching, retrieving, and other information retrieval transactions
• Cast in the standards/technologies for web services
    • XML
    • SOAP
    • HTTP
• Brings the concepts and experience of Z39.50 into the web environment using web technologies
Why SRW?
• Genesis: several years of soul searching by Z39.50 developers and implementors
• The “web” had become the common implementation environment
• Z39.50 was not perceived as web friendly
• Pivotal moments:
    • December 2000 ZIG meeting
    • July 2001 meeting
Turning point: December 2000
• “Z39.50 Future” discussion
• Perceptions of Z39.50
    • broken
    • heavy-weight
    • difficult and complex
    • old technology
    • not web friendly
• Several options presented
    • Rewrite the protocol from the ground up
    • Rewrite as an XML protocol
    • Separate the Z39.50 protocol from its use of BER as a wire protocol
    • Simplify the protocol specifications to focus on core features
• Recognition of the intellectual contribution of Z39.50
Taking action: June 2001
• Invitational meeting to discuss moving Z39.50 to an XML-based protocol
• Goal
    • Lower the barriers to implementation while preserving the existing intellectual contributions of Z39.50, discarding those aspects no longer useful or meaningful
• Objective
    • Define specifications for a new web service definition based on Z39.50 together with web technologies
    • Separate the Z39.50 abstract and associated semantic model from its specific encoding and wire protocol (i.e., ASN.1/BER and TCP/IP)
• Initially called Z39.50 Next Generation (ZNG)
• Intended as proof-of-concept
• Defining only those protocol specifications that would actually be implemented by participants
ZING – Z39.50 International Next Generation
• Make intellectual/semantic content of Z39.50 more broadly available
• Make Z39.50 more attractive by lowering barriers to implementation
    • Use of XML – to represent and encode data
    • Use of HTTP – for transport
    • Use of SOAP – for interaction between client and server based on Remote Procedure Call (RPC)
• Several ZING initiatives: ZOOM, ez39.50, ZeeRex, SRW/U
For more information, visit the ZING website: http://www.loc.gov/z3950/agency/zing/
SRW/U, SRW, SRU
• SRW/U: Search and Retrieve for the Web
    • General designation for this initiative
• SRW: Search and Retrieve Web Service
    • HTTP POST
    • Simple Object Access Protocol (SOAP)
    • XML messages
• SRU: Search and Retrieve URL Service
    • HTTP GET
    • Request parameters included in URL syntax
• Development
    • Version 1.0 November 2001
    • Version 1.1 February 2002
For more information, visit the SRW website: http://www.loc.gov/srw
Networked information retrieval
• What’s needed:
    • Identifying a target to search
    • A vocabulary for expressing search requests, search criteria, retrieval requests, etc.
    • Methods to encode the requests and responses from the target
    • Methods to transport the requests and responses across a network
• In other words, a protocol and supporting specifications
Abstract Model of IR
Abstract model of Z39.50
Z39.50 classic & SRW
SRW Overview
• Builds on Z39.50 concepts and web technologies
• Web technologies: XML, SOAP, HTTP
• Uses new, human-readable query language
• Combines several Z39.50 features into several “operation types”
    • searchRetrieve operation
    • scan operation
    • explain operation
searchRetrieve operation
• The core of the protocol
• Expresses the search and additional criteria
• Records are returned in XML
• Request parameters
    • version
    • query
    • Optional parameters
        • sortKeys
        • recordPacking
        • recordSchema
        • recordXPath
        • stylesheet
• Response parameters
    • version
    • numberOfRecords
    • Optional parameters
        • resultSetId
        • resultSetIdleTime
        • records
        • diagnostics
SRW & XML
• XML as foundation for protocol
• Provides syntax for intelligent markup
• Defines or references XML schemas
• Example XML schema for SRW specifications
    • searchRetrieveRequest
    • searchRetrieveResponse
searchRetrieveRequest example
• Sent as an HTTP POST
• XML document is sent to the server
• Using SOAP to wrap the request
    <searchRetrieveRequest>
      <version>1.1</version>
      <query>dc.title all "Squirrel Hungry"</query>
      <maximumRecords>1</maximumRecords>
      <startRecord>1</startRecord>
      <recordSchema>dc</recordSchema>
    </searchRetrieveRequest>
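The request above can be assembled and wrapped in a SOAP envelope programmatically. The following is a minimal sketch using only the Python standard library. The SOAP 1.1 envelope namespace is standard; the SRW namespace URI and the helper name `build_search_request` are assumptions made for illustration and should be checked against the target server's service description. The sketch only builds the message body; it does not send it.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SRW_NS = "http://www.loc.gov/zing/srw/"  # assumed SRW namespace; verify against the server


def build_search_request(query, start_record=1, maximum_records=1, record_schema="dc"):
    """Return a SOAP-wrapped searchRetrieveRequest as UTF-8 bytes (hypothetical helper)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    fields = [
        ("version", "1.1"),
        ("query", query),
        ("maximumRecords", str(maximum_records)),
        ("startRecord", str(start_record)),
        ("recordSchema", record_schema),
    ]
    # Each parameter becomes a child element of searchRetrieveRequest.
    for name, value in fields:
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


payload = build_search_request('dc.title all "Squirrel Hungry"')
print(payload.decode("utf-8"))
```

The resulting bytes would form the body of an HTTP POST to the SRW endpoint, typically with a `Content-Type: text/xml` header.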
searchRetrieveResponse example
    <searchRetrieveResponse>
      <version>1.1</version>
      <numberOfRecords>10</numberOfRecords>
      <records>
        <record>
          <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
          <recordData>
            <dc:record>
              <dc:title>Squirrel is Hungry</dc:title>
            </dc:record>
          </recordData>
        </record>
      </records>
    </searchRetrieveResponse>
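A client typically pulls the hit count and the record fields out of such a response. The sketch below parses a response shaped like the slide's example. The slide omits namespace declarations, so the sample here declares the `dc` prefix with the Dublin Core elements namespace so that it parses; that declaration, the sample text, and the `summarize` helper are assumptions for illustration. A real SRW response also places the protocol elements in an SRW namespace, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Prefix map for lookups; the dc namespace declaration is added for this sketch.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

SAMPLE_RESPONSE = """\
<searchRetrieveResponse xmlns:dc="http://purl.org/dc/elements/1.1/">
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
"""


def summarize(xml_text):
    """Return (total hit count, titles of the records actually returned)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("numberOfRecords"))
    titles = [title.text for title in root.iterfind(".//dc:title", NS)]
    return total, titles


total, titles = summarize(SAMPLE_RESPONSE)
print(total, titles)
```

Note that `numberOfRecords` reports the size of the whole result set, while the number of returned records is bounded by `maximumRecords` in the request.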
searchRetrieve response
• Records returned in response
• All records in XML syntax
• According to one or more XML schemas (semantics)
    • Dublin Core
    • ONIX
    • MODS
    • MARCXML
searchRetrieve example
    <searchRetrieveRequest>
      <version>1.1</version>
      <query>dc.title computer</query>
      <startRecord>1</startRecord>
      <maximumRecords>10</maximumRecords>
      <recordPacking>xml</recordPacking>
      <recordSchema>dc</recordSchema>
    </searchRetrieveRequest>
• Retrieval results
    • XML view
    • Screen shot
SRW results
SRU briefly
• Protocol requests can be carried via HTTP GET
• searchRetrieveRequest parameters expressed in standard URL syntax
• baseURL and search part separated by question mark “?”
• Response is XML document containing records
• The searchRetrieveRequest in SRU:
    • http://alcme.oclc.org/srw/search/SOAR?operation=searchRetrieve&version=1.1&query=dc.title=%22computer%22&recordSchema=DC&startRecord=1&maximumRecords=10&recordPacking=xml
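Because an SRU request is just a base URL plus query parameters, it can be built with ordinary URL tooling. The sketch below reconstructs the request from the slide using Python's `urllib.parse`. The helper name `build_sru_url` is hypothetical, and the sketch only builds the URL rather than contacting the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(base_url, query, **extra):
    """Join the base URL and the percent-encoded SRU parameters with '?'."""
    params = {"operation": "searchRetrieve", "version": "1.1", "query": query}
    params.update(extra)  # optional parameters such as recordSchema
    return base_url + "?" + urlencode(params)


url = build_sru_url(
    "http://alcme.oclc.org/srw/search/SOAR",
    'dc.title="computer"',
    recordSchema="DC",
    startRecord=1,
    maximumRecords=10,
    recordPacking="xml",
)
print(url)

# Decoding the search part recovers the original CQL query string.
print(parse_qs(urlsplit(url).query)["query"][0])
```

`urlencode` also percent-encodes the `=` inside the CQL query (as `%3D`), whereas the URL on the slide leaves it bare; both decode to the same query string.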
searchRetrieve query
• SRW query consists of one or more query statements linked by Boolean operators
• Five categories of query statements:
    • single search clause
    • two or more search clauses linked by Boolean
    • search clauses and result sets linked by Boolean
    • two or more result sets linked by Boolean
    • single result set
• Expressed in the Common Query Language (CQL)
Common Query Language (CQL)
• A formal language for representing queries to information retrieval systems
• Human-readable
• Search clause
    • Always includes a term
        • simple terms consist of one or more words
    • May include index name
        • To limit search to a particular field/element
    • Index name includes base name and may include prefix
        • title, subject
        • dc.title, dc.subject
• Several index sets have been defined (called Context Sets in SRW)
    • dc
    • bath
    • srw
• Context set defines the available indexes for a particular application
Other components of CQL
• Relation
    • <, >, <=, >=, =, <>
    • exact: used for string matching
    • all: when the term is a list of words, indicates that all of the words must be found
    • any: when the term is a list of words, indicates that at least one of the words must be found
• Boolean operators: and, or, not
• Proximity (prox operator)
    • relation (<, >, <=, >=, =, <>)
    • distance (integer)
    • unit (word, sentence, paragraph, element)
    • ordering (ordered or unordered)
• Masking rules and special characters
    • single asterisk (*) to mask zero or more characters
    • single question mark (?) to mask a single character
    • caret/hat (^) to indicate anchoring, left or right
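The masking rules above map naturally onto regular expressions. The following is a minimal sketch of how a server might interpret a masked term against a single word. It covers only `*` and `?`; anchoring with `^` and backslash escapes are deliberately left out, and case-insensitive matching is an assumption of this sketch rather than something the slide specifies. The function names are hypothetical.

```python
import re


def mask_to_regex(term):
    """Translate a masked term into a compiled regular expression."""
    pieces = []
    for ch in term:
        if ch == "*":
            pieces.append(".*")  # asterisk masks zero or more characters
        elif ch == "?":
            pieces.append(".")   # question mark masks exactly one character
        else:
            pieces.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(pieces), re.IGNORECASE)


def matches(term, word):
    """Report whether the whole word matches the masked term."""
    return mask_to_regex(term).fullmatch(word) is not None


for term, word in [("dino*", "dinosaur"), ("b?rd", "bird"), ("dino?", "dinosaur")]:
    print(term, word, matches(term, word))
```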
CQL examples
• Simple queries:
    • dinosaur
    • "the complete dinosaur"
• Boolean
    • dinosaur and bird or dinobird
    • "feathered dinosaur" and (yixian or jehol)
• Proximity
    • foo prox bar
    • foo prox/>/4/word/ordered bar
• Indexes
    • title = dinosaur
    • bath.title="the complete dinosaur"
    • srw.serverChoice=dinosaur
• Relations
    • year > 1998
    • title all "complete dinosaur"
    • title any "dinosaur bird reptile"
    • title exact "the complete dinosaur"
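Queries like the examples above can be composed from small building blocks rather than concatenated by hand. The sketch below shows one possible approach. The helper names (`quote_term`, `clause`, `boolean`) are hypothetical, and the quoting rule (quote terms containing whitespace or double quotes, escaping embedded quotes with a backslash) is a simplification of CQL's rules.

```python
def quote_term(term):
    """Wrap a term in double quotes when it contains whitespace or quotes."""
    if term == "" or '"' in term or any(ch.isspace() for ch in term):
        return '"' + term.replace('"', '\\"') + '"'
    return term


def clause(term, index=None, relation="="):
    """Build a search clause: a bare term, or index + relation + term."""
    quoted = quote_term(term)
    if index is None:
        return quoted
    return f"{index} {relation} {quoted}"


def boolean(operator, *operands):
    """Link clauses with a Boolean operator, parenthesized to make grouping explicit."""
    if operator not in ("and", "or", "not"):
        raise ValueError(f"unsupported Boolean operator: {operator}")
    return "(" + f" {operator} ".join(operands) + ")"


print(clause("the complete dinosaur"))
print(clause("complete dinosaur", index="title", relation="all"))
print(clause("1998", index="year", relation=">"))
print(boolean("and", clause("feathered dinosaur"),
              boolean("or", clause("yixian"), clause("jehol"))))
```

Always parenthesizing Boolean groups adds some redundant parentheses, but it avoids relying on operator precedence in mixed queries such as `dinosaur and bird or dinobird`.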
SRW & classic Z39.50
SRW
• No explicit concept of connection, session, or state
• Result sets named by server
• Single record syntax (XML), multiple schemas
• String (i.e., human-readable) queries
• CQL
• Named indexes
Classic Z39.50
• Stateful
• Result sets named by client
• Multiple record syntaxes
• No human-readable query language
• Type 1 query using attribute sets
• Use attribute to identify access point
Z39.50 concepts retained
• Result sets
• Abstract access points
• Abstract record schemas
• Explain
• Diagnostics
What problems does SRW solve?
• Addresses need for standards-based searching in the networked environment
• Shows the vitality of the Z39.50 concepts and implements those in a web services & URL access context
• Offers database providers a web-friendly method for offering standards-based searching of resources
• Provides a low-barrier-to-entry solution using commonly available technologies
• XML format of records provides for more reuse, and more interesting use, of resources
Possible implementation venues
• Gateways to existing Z39.50 servers
• Lightweight SRW/U servers to specialized databases
• Cost-effective search access to commercial databases (e.g., citation, full-text)
• Metasearching
• Beyond libraries to many other information communities
References
• Z39.50 International Next Generation (ZING): http://www.loc.gov/z3950/agency/zing/
• Search and Retrieve for the Web (SRW/U): http://www.loc.gov/srw
• A Gentle Introduction to SRW: http://www.loc.gov/z3950/agency/zing/srw/introduction.html
• A Gentle Introduction to CQL: http://zing.z3950.org/cql/intro.html
• van Veen and Oldroyd, "Search and Retrieval in The European Library: A New Approach," D-Lib Magazine, February 2004: http://www.dlib.org/dlib/february04/vanveen/02vanveen.html