Federated Searching: The ABC’s of HSE, XML, & Z39.50. Harry Samuels Product Manager Linking & Searching August 27, 2004. Topics. The Challenge of Federated Searching Z39.50 XML Gateways HTTP Searching So, Where Are We Now? The Future SRW/SRU NISO Metasearch Initiative
Federated Searching: The ABC’s of HSE, XML, & Z39.50 • Harry Samuels, Product Manager, Linking & Searching • August 27, 2004
Topics • The Challenge of Federated Searching • Z39.50 • XML Gateways • HTTP Searching • So, Where Are We Now? • The Future • SRW/SRU • NISO Metasearch Initiative • The Generic XML Gateway API
The Challenge of Federated Searching • To execute federated searching, one needs a protocol or mechanism to search each of the electronic resources one would like to search • But one protocol does not fit all in the federated search environment - different electronic resources require different mechanisms • The challenge is to figure out how an electronic resource can be searched and have the right mechanism in place for each situation
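One way to picture "the right mechanism for each situation" is a table that maps every resource to its own connector. The following is only a sketch under stated assumptions: the resource names and the three stub connectors are hypothetical stand-ins, not part of any real product.

```python
from typing import Callable, Dict, List

# A connector takes a search term and returns a list of result titles.
Connector = Callable[[str], List[str]]

# Hypothetical stubs standing in for real Z39.50, XML gateway, and HTTP connectors.
def z3950_stub(term: str) -> List[str]:
    return [f"catalog record about {term}"]

def xml_gateway_stub(term: str) -> List[str]:
    return [f"journal article about {term}"]

def http_stub(term: str) -> List[str]:
    return [f"web page about {term}"]

# Hypothetical resource names; each resource is paired with the mechanism it needs.
RESOURCES: Dict[str, Connector] = {
    "Library Catalog": z3950_stub,
    "Journal Database": xml_gateway_stub,
    "Publisher Web Site": http_stub,
}

def federated_search(term: str, resources: Dict[str, Connector] = RESOURCES) -> Dict[str, List[str]]:
    """Send the same term to every resource through its own connector."""
    results: Dict[str, List[str]] = {}
    for name, connector in resources.items():
        try:
            results[name] = connector(term)
        except Exception:
            # One failing resource should not sink the whole search.
            results[name] = []
    return results
```

Calling `federated_search("metadata")` returns one list of results per resource, keyed by resource name.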
Z39.50 • The protocol we love to hate • Z39.50 is the oldest of the commonly used search mechanisms • Almost every integrated library system can be searched using Z39.50 • Despite the issues with Z39.50 it provides a fairly dependable mechanism for searching
Z39.50 • The main problem with Z39.50 is that very few content providers implemented Z39.50 • But it is the content of the commercial providers that we really want to search from our federated search systems
XML Gateways • Enter the XML gateway • But first of all, what does XML gateway mean? • As in Z39.50, there must be an XML gateway client that transmits search queries and accepts results – This is the part of the XML gateway that is in the federated search system • There must also be an XML gateway server that responds to search queries – This is the part of the XML gateway that is at the content provider site
XML Gateways • An XML gateway client sends a search query over HTTP • The query is (1) packaged into the query string of a URL or (2) packaged into an XML document that is posted to the resource • Regardless of how the query is packaged, the results are sent back in an XML document over HTTP • The use of XML in at least one of the steps gave rise to the name XML Gateway
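The two ways of packaging a query, and the XML that comes back, can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the endpoint URL, the parameter names (`query`, `max`), and the element names (`search`, `results`, `record`, `title`) are hypothetical, since every real XML gateway defines its own.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical gateway endpoint; not a real provider URL.
GATEWAY_URL = "http://provider.example/xmlgateway"

def query_as_url(term, max_records=10):
    """Option 1: package the query into the query string of a URL."""
    return GATEWAY_URL + "?" + urlencode({"query": term, "max": max_records})

def query_as_xml(term, max_records=10):
    """Option 2: package the query into an XML document to be POSTed."""
    root = ET.Element("search")
    ET.SubElement(root, "query").text = term
    ET.SubElement(root, "max").text = str(max_records)
    return ET.tostring(root, encoding="unicode")

def parse_results(xml_text):
    """Either way, results come back as XML; pull out the record titles."""
    root = ET.fromstring(xml_text)
    return [rec.findtext("title") for rec in root.iter("record")]

# A made-up response of the kind a gateway server might return.
SAMPLE_RESPONSE = """<results>
  <record><title>Federated Searching Basics</title></record>
  <record><title>Z39.50 in Practice</title></record>
</results>"""
```

The sketch stops short of the network call on purpose; in practice the client would fetch the URL from option 1, or POST the document from option 2, and hand the response body to `parse_results`.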
XML Gateways • XML gateways provide an alternative mechanism for searching an electronic resource • Every XML gateway is different and every XML gateway requires special programming or special configuration • As electronic resource providers implement search mechanisms they are implementing XML gateways and not Z39.50 servers • XML gateways are the future – the world of electronic resources and federated searching just needs to catch up with the future
HTTP Searching • Z39.50 was implemented by very few content providers and XML gateways are just now catching on – so how do we search everything else? • The same way a user does… • The federated search system pretends to be a user sitting at a web browser – it simulates the actions of a human user by generating URLs that are understood by the electronic resource – and then extracting the information from the web pages that are returned
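A rough sketch of those two steps, generating the URL and then extracting information from the returned page, might look like this. The search URL, the `q` parameter, and the `class="title"` markup are all hypothetical; a real connector would be written against the actual pages of one specific resource.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical search form URL for an imaginary resource.
SEARCH_URL = "http://resource.example/search"

def build_search_url(term):
    """Generate the URL a browser would request when a user submits the form."""
    return SEARCH_URL + "?" + urlencode({"q": term})

class TitleExtractor(HTMLParser):
    """Collect the text of every <a class="title"> link on a results page."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("class") == "title":
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False

def extract_titles(html):
    """Scrape result titles out of the HTML that came back."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles]

# A made-up results page of the kind a human user would see.
SAMPLE_PAGE = """<html><body>
<div class="hit"><a class="title" href="/doc/1">Metasearch Basics</a></div>
<div class="hit"><a class="title" href="/doc/2">Linking &amp; Searching</a></div>
</body></html>"""
```

Note that the extractor knows nothing about the resource beyond how its pages happen to be marked up today.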
HTTP Searching • This is possible because almost all electronic resources are accessed over the web • At Endeavor, we simply call the HTTP Search Engine the HSE • It is capable of searching hundreds of web sites and databases that are inaccessible via Z39.50 or XML gateways • Some federated search engines use HTTP searching as the preferred search mechanism
HTTP Searching • Despite its reach, there are issues with HTTP searching • It usually cannot retrieve a large set of metadata in its result sets • If the user interface of an electronic resource changes then the HSE connector for that resource usually breaks – this means that HTTP searching is fragile and requires constant maintenance
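The fragility is easy to demonstrate with a toy connector. This is an illustration only: the pattern and both pages are invented, and the "looks broken" check is one crude idea for spotting trouble, not a description of how the HSE works.

```python
import re

# Hypothetical connector rule: titles live in <a class="title"> elements.
TITLE_PATTERN = re.compile(r'<a class="title"[^>]*>(.*?)</a>')

def extract_titles(html):
    """Return every title the connector rule can find on the page."""
    return TITLE_PATTERN.findall(html)

# The page as it looked when the connector was written.
OLD_PAGE = '<a class="title" href="/doc/1">Metasearch Basics</a>'
# The same record after an imagined redesign that renames the CSS class.
NEW_PAGE = '<a class="result-link" href="/doc/1">Metasearch Basics</a>'

def connector_looks_broken(html):
    """Crude check: a known-good query that yields no titles is suspicious."""
    return len(extract_titles(html)) == 0
```

The record is still on the redesigned page, but the connector no longer sees it, and nothing fails loudly; the search simply comes back empty until someone updates the rule.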
So Where Are We Now? • Adoption of Z39.50 has stalled • XML gateway adoption is in the early stages and many content providers do not yet have them • HTTP searching can search far more resources than Z39.50 or XML gateways, but it is fragile and usually does not retrieve a robust set of metadata
The Future • SRW/SRU • NISO Metasearch Initiative • The Generic XML Gateway API
SRW/SRU • The next generation of Z39.50 over the web • “Search and Retrieve Web Service (SRW) and Search and Retrieve URL Service (SRU) are Web Services-based protocols for querying databases and returning search results.” Eric Lease Morgan • http://www.loc.gov/z3950/agency/zing/srw/ • It is a version of an XML gateway that holds the promise of a standard XML Gateway protocol
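To give a feel for the URL flavor (SRU), here is a minimal sketch of building a searchRetrieve request. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU specification and the query is written in CQL; the base URL is a hypothetical placeholder, and the `dc.title` index is assumed to be supported by the imagined server.

```python
from urllib.parse import urlencode

# Hypothetical SRU server address; substitute a real base URL.
SRU_BASE_URL = "http://sru.example/catalog"

def build_sru_url(cql_query, maximum_records=10, version="1.1"):
    """Package a CQL query into an SRU searchRetrieve URL."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return SRU_BASE_URL + "?" + urlencode(params)

# Example: search the title index for the word "federated".
EXAMPLE_URL = build_sru_url("dc.title = federated")
```

Because the parameters are standardized, the same function would work against any SRU server by changing only the base URL, which is exactly the promise a standard XML gateway protocol holds out.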
NISO Metasearch Initiative “NISO's metasearch Initiative will identify, develop, and frame the standards and other common understandings that are needed to enable an efficient and robust information environment. The goal of NISO's Metasearch Initiative is to enable: • metasearch service providers to offer more effective and responsive services • content providers to deliver enhanced content and protect their intellectual property • libraries to deliver services that distinguish their services from Google and other free web services.” • http://www.niso.org/committees/MS_initiative.html
The Generic XML Gateway API • We couldn’t wait… • ENCompass already had an XML gateway search infrastructure • From that infrastructure, we created a generic gateway and documented it • It is freely available to Endeavor customers • When content providers ask us “how to build an XML gateway” we share the specification with them