Metasearch Technologies: Definitions, Issues, Reference Applications William H. Mischo & Mary C. Schlembach w-mischo@uiuc.edu schlemba@uiuc.edu Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign Session 2441: Federated Searching ASEE 2004 National Conference June 22, 2004
Outline • Distributed, heterogeneous repositories require federation and linking. • Metasearch definitions. • Technologies. • UIUC RFP process. • Issues and trends. • Expanding use of metasearch. • Custom reference and search applications.
Distributed Information Environment • We live in a world of multiple, heterogeneous information resources. • OPACs: local, regional, national shared. • Locally mounted and remote A & I Services. • Discrete publisher and vendor full-text repositories. • E-Resource registries: Serials Solutions, TDNet. • OAI search services (OAIster, NSDL) and preprint servers. • Web search engines. • Vertical publisher and vendor portals (ARL Portal, DOE Information Bridge, Elsevier Scirus & Scopus, EI Village, BioMed Central, Public Library of Science). Surface Web and Hidden Web. • Institutional Repositories (DSpace). • Instructional (course) management systems (WebCT, Blackboard). • David Seaman: ‘we don’t shelve by publisher, why do we expect users to search by publisher?’
Metasearch as a Solution • Distributed, heterogeneous resources and repositories require federation and linking. • Terminology: Metasearch, parallel search, federated search, broadcast search, cross-database search, simultaneous search, search portal. • Defined by allowing search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at once. • NISO Metasearch Initiative: emerging standards and best practices.
Value of Metasearch (Pro) • To recommend and rank specific information resources for users; to facilitate search over multiple information resources. • “Metasearching and other means of unifying search across heterogeneous products…most significant trend.” • Deployment of algorithmic searches that mimic the behavior of a reference librarian. • Integration of E-resources, Local Link Resolvers, and Metasearch.
Value of Metasearch (caveats) • Metasearch vendor: “metasearch does not provide the robust search functionality of native interfaces.” Thesauri browse, e-mailing large (non-displayed) sets. • LJ editorial 4/1/04, “Do we want or need metasearching?” Content, Limiting, full-text links, User Education, Boolean, Thesauri. • NISO Workshop: Local Link resolution a mission-critical application; metasearch not yet.
Nomenclature • Metasearch is also used to refer to systems that search multiple Web search engines, each with its own previously crawled index, such as Google, AlltheWeb, AltaVista. • Examples: EZ2Find, Vivisimo, Dogpile, Kartoo. • In our world, it refers to systems that work over the distributed information environment dominated by bibliographic resources.
Federated vs. Broadcast Searching • Federated: heterogeneous information resources are imported or “harvested” (sometimes using the OAI-PMH protocol) into a local, central site and the normalized results are placed into a homogeneous database system for search and discovery. • EI Village, ISI Web of Knowledge, OAIster, Grainger OAI Search Service, NSDL.
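The harvest-and-normalize step of the federated model can be sketched roughly as follows. This is a minimal illustration, assuming OAI-PMH with the `oai_dc` metadata format; the repository URL, the sample XML, and the function names `list_records_url` and `normalize` are hypothetical and not taken from any of the systems named on the slide. A real harvester would also fetch over HTTP and follow resumption tokens, which are omitted here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (base_url is a placeholder)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def normalize(xml_text, source):
    """Flatten harvested Dublin Core records into uniform rows for a local index."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ident = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [e.text for e in rec.iter(f"{{{DC_NS}}}title")]
        rows.append({"source": source, "id": ident,
                     "title": titles[0] if titles else None})
    return rows

# Made-up response body standing in for what a repository would return.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Metasearch Technologies</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repo.example.org/oai"))
    print(normalize(SAMPLE, "demo"))
```

Once every source has been reduced to the same row shape, search runs against the single local index rather than against the remote systems.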
Federated vs. Broadcast Searching • Broadcast: user search arguments are sent concurrently (all at the same time) to remote, distributed systems and the search results are collected, normalized, and displayed to the user. • MetaLib, EnCompass for Resource Access, WebFeat. • Not mutually exclusive. Can do broadcast over federated systems.
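The broadcast pattern (send the query to every target at once, collect whatever comes back, normalize, display) can be sketched with Python's `asyncio`. This is an illustration only: the three sources are simulated with in-memory title lists and artificial delays, and `search_source`, `broadcast`, and `SOURCES` are hypothetical names. It does not reflect how MetaLib, EnCompass, or WebFeat are implemented.

```python
import asyncio

async def search_source(name, delay, titles, query):
    """Stand-in for one remote target; a real connector would speak
    Z39.50, HTTP, or an XML gateway instead of sleeping."""
    await asyncio.sleep(delay)
    return [{"source": name, "title": t}
            for t in titles if query.lower() in t.lower()]

async def broadcast(query, sources, timeout=1.0):
    """Query all sources at the same time; drop any that miss the timeout."""
    async def guarded(src):
        try:
            return await asyncio.wait_for(search_source(*src, query), timeout)
        except asyncio.TimeoutError:
            return []  # slow target: show the user what the others returned
    per_source = await asyncio.gather(*(guarded(s) for s in sources))
    # Flatten into one normalized list, in source order.
    return [rec for hits in per_source for rec in hits]

# Simulated targets: (name, response delay in seconds, titles held).
SOURCES = [
    ("opac", 0.01, ["Federated search in libraries"]),
    ("a_and_i", 0.02, ["Broadcast search performance",
                       "Federated search usability"]),
    ("slow_vendor", 5.0, ["Federated search at scale"]),
]

def run_demo():
    return asyncio.run(broadcast("federated", SOURCES, timeout=0.2))

if __name__ == "__main__":
    for rec in run_demo():
        print(rec["source"], "|", rec["title"])
```

The per-source timeout is one possible design choice: total response time is bounded by the timeout rather than by the slowest vendor, at the cost of silently omitting late results.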
Broadcast Search Basic Technologies • Z39.50 • HTTP “screen-scraping” • XML gateway and Web Services. • Proprietary APIs.
MetaSearch Implementations • Ex Libris MetaLib. • Endeavor EnCompass. • Innovative Interfaces MetaFind. • MuseGlobal MuseSearch. • EI Village; ISI Web of Knowledge. • WebFeat. • California Digital Library SearchLight system. • Fretwell-Downing. • Locals (NCSU, Grainger Library, Los Alamos).
Retrieval Issues • Pass-through to native interface at point of search departure. • Coupling of metasearch records with Local Link Resolvers. • Providing OpenURL enabled links to full-text, other services. • Merging and De-Duplication. • Partial de-duping of sequentially retrieved sets. • Pulling over already extant full-text links from vendor systems.
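Two of the retrieval issues above, merging with de-duplication and OpenURL-enabled linking, can be sketched as follows. The matching rule (normalized title plus year) is a deliberately crude assumption; production systems typically compare more fields. The link uses OpenURL 1.0 key/encoded-value syntax (Z39.88-2004) as an assumed target format, and the resolver address, record fields, and function names are hypothetical.

```python
import re
from urllib.parse import urlencode

def match_key(rec):
    """Crude duplicate key: lowercase alphanumeric title plus year."""
    title = re.sub(r"[^a-z0-9]+", " ", rec["title"].lower()).strip()
    return (title, rec.get("year"))

def merge_dedupe(result_sets):
    """Merge per-source result lists, keeping the first copy of each record
    and noting every source that also returned it."""
    merged = {}
    for records in result_sets:
        for rec in records:
            key = match_key(rec)
            if key in merged:
                merged[key]["sources"].append(rec["source"])
            else:
                merged[key] = {**rec, "sources": [rec["source"]]}
    return list(merged.values())

def openurl(resolver_base, rec):
    """Build an OpenURL 1.0 key/encoded-value link for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": rec["title"],
        "rft.date": rec["year"],
    }
    if rec.get("issn"):
        params["rft.issn"] = rec["issn"]
    return resolver_base + "?" + urlencode(params)

if __name__ == "__main__":
    set_a = [{"source": "ei", "title": "Metasearch Basics", "year": 2004}]
    set_b = [{"source": "isi", "title": "Metasearch basics.", "year": 2004},
             {"source": "isi", "title": "Link resolvers", "year": 2003}]
    for rec in merge_dedupe([set_a, set_b]):
        print(rec["sources"], openurl("https://resolver.example.edu/openurl", rec))
```

Because results arrive set by set, this approach de-duplicates only against records already seen, which matches the "partial de-duping of sequentially retrieved sets" noted on the slide.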
Technology Issues • Consortium-based implementations. • Search Statistics (COUNTER compliance). • Vendor concerns with supporting multiple metasearch sessions – throwing a logoff to kill a session. • Search query standards – SRW/SRU, XQuery, OpenURL, one-step URL-launch searches.
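As an illustration of a one-step URL-launch search, the sketch below builds an SRU `searchRetrieve` request. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) follow SRU 1.1, with the query written in CQL; the server address and the function name `sru_search_url` are hypothetical, and the index name `dc.title` assumes the target supports a Dublin Core context set.

```python
from urllib.parse import urlencode

def sru_search_url(base_url, cql_query, max_records=10, start_record=1):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)

if __name__ == "__main__":
    # Placeholder endpoint; any SRU server would be substituted here.
    print(sru_search_url("https://sru.example.org/catalog",
                         'dc.title = "metasearch"'))
```

Because the whole search is carried in the URL, no session has to be opened or logged off on the vendor side, which bears on the session concern raised above.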
Future and Custom Applications • Time of rapid development and growth in Metasearch applications. Expect continuing evolution. • Metasearch technology fairly easy to implement locally over selected resources. • Focusing on apps that allow custom Best-Match and algorithmic searching that mimics a reference librarian.
Our Approach • User interface and discovery systems that emphasize function or needs-based approaches to retrieval. Reference and Known-item. • Metasearch technologies that offer additional opportunities beyond simultaneous search of discrete A & I Services. • Performing multiple searches within individual resources to determine “Best-Match” search results. Combined with selected simultaneous search of other resources.
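One way to read "multiple searches within individual resources to determine Best-Match results" is as a cascade that tries the strictest search first and relaxes it only when nothing is found. The sketch below is an assumption about the general idea, not the UIUC implementation; the three strategies, the in-memory `CATALOG`, and its titles are made up for illustration.

```python
def exact_phrase(query, title):
    """Strictest test: the whole query appears as a phrase in the title."""
    return query.lower() in title.lower()

def all_words(query, title):
    """Every query word appears somewhere in the title, in any order."""
    words = title.lower().split()
    return all(w in words for w in query.lower().split())

def any_word(query, title):
    """Loosest test: at least one query word appears in the title."""
    words = title.lower().split()
    return any(w in words for w in query.lower().split())

# Ordered from strictest to loosest.
STRATEGIES = [("exact phrase", exact_phrase),
              ("all words", all_words),
              ("any word", any_word)]

def best_match(query, catalog):
    """Run successively looser searches and stop at the first with hits.
    The returned label can be shown to the user as an explanation."""
    for label, matches in STRATEGIES:
        hits = [t for t in catalog if matches(query, t)]
        if hits:
            return label, hits
    return "no match", []

# Toy stand-in for a single resource such as an OPAC title index.
CATALOG = [
    "Proceedings of the Annual Engineering Conference",
    "Journal of Engineering Education",
    "Engineering Libraries Quarterly",
]

if __name__ == "__main__":
    for q in ("engineering education", "education engineering",
              "engineering design", "chemistry"):
        print(q, "->", best_match(q, CATALOG))
```

Returning the strategy label alongside the hits lets the interface explain why a result set was chosen, much as a reference librarian would.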
UIUC Examples • Conference (Paper) Search: • Multiple searches within OPAC for held conference proceedings + EI Village for specific paper and OCLC Conference Papers. Failed conference search presents similar journal articles. • Journal Finder: • Searches the e-resource registry (based on TDNet) and local serial databases, and runs two different OPAC searches for holdings. Searches CrossRef for a DOI full-text link. • Used in training reference staff and assisting in patron point-of-need services.
Features • Performing multiple searches within a specific resource in order to arrive at the optimum result set. • Interpret the user-entered search argument and then route the query to selected resources: ACM, IEEE. • Takes user-entered title search string and checks against an abbreviation database at the title and word level. Stop words in OPAC. • Search results presented as they are returned or having the aggregate results interpreted and presented with accompanying explanations.
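The abbreviation check and query routing described above might look something like the sketch below. The abbreviation tables and routing table are tiny invented samples, and `expand_title` and `route` are hypothetical names; the actual UIUC abbreviation database and routing rules are not described in the slides.

```python
# Title-level table: the whole entered string is a known abbreviation.
TITLE_ABBREVS = {"jacs": "journal of the american chemical society"}

# Word-level table: individual abbreviated words.
WORD_ABBREVS = {
    "j": "journal",
    "proc": "proceedings",
    "trans": "transactions",
    "int": "international",
    "conf": "conference",
}

# Publisher cues in the query and the resource each one routes to.
ROUTES = {"ieee": "IEEE", "acm": "ACM"}

def _words(query):
    """Lowercase the query and split it, treating periods as spaces."""
    return query.lower().replace(".", " ").split()

def expand_title(query):
    """Check the title-level table first, then expand word by word."""
    words = _words(query)
    whole = " ".join(words)
    if whole in TITLE_ABBREVS:
        return TITLE_ABBREVS[whole]
    return " ".join(WORD_ABBREVS.get(w, w) for w in words)

def route(query):
    """Return the resources whose cue word appears in the query."""
    words = _words(query)
    return [target for cue, target in ROUTES.items() if cue in words]

if __name__ == "__main__":
    q = "Proc. IEEE Int. Conf. Robotics"
    print(expand_title(q))
    print(route(q))
```

Checking the whole string before individual words keeps a known title abbreviation from being partially rewritten at the word level.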