The Search for Quality: productive Web searching
John Cox, James Hardiman Library, NUI, Galway
The Problem • 7.3 million new Web pages daily • Quality varies, mainly due to ease of publication and lack of checks • Quality is in the eye of the beholder • Over-dependence on general search engines • Simplistic use of search tools
Some Usage Findings • NUI, Galway Library survey, March 2000: • Search engines cited by 79 of 167 respondents (47%) • Used exclusively for, eg, Nazism, defamation law, hepatitis C • Fewer than 50% satisfied • Other surveys show very simplistic use: • 33% of users enter one word only • A further 33% of users enter two words only • UK survey indicates 80% of searchers waste some time • US survey shows “search rage” within 12 minutes
Key Question • “How much better than users are information staff at finding high-quality information on the Web and what leadership do we provide?” • 5 key actions needed
5 Key Actions • Get the best from the search engines • Go vertical: subject-specific sources • Take time to experiment, eg helper software • Exploit the invisible Web • Actively promote quality searching
1: Get the Best from the Search Engines • Understand how they work • Know their limitations • Use advanced features • Search more than one • Know when not to use them
Search Engine Components • Crawler: follows links • Indexer: builds database • Query processor: lets us search
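The three components can be sketched in Python over a tiny, hypothetical in-memory "Web" (the page names and text are invented for illustration; real engines are far more complex). The crawler follows links from a seed page, the indexer builds an inverted index mapping each word to the pages containing it, and the query processor intersects those page sets. Note that `orphan.html`, which nothing links to, is never crawled: one reason unlinked pages end up in the invisible Web.

```python
from collections import defaultdict, deque

# Hypothetical toy "Web": each URL maps to (page text, outgoing links).
WEB = {
    "a.html": ("hepatitis C treatment guidelines", ["b.html", "c.html"]),
    "b.html": ("defamation law overview", ["a.html"]),
    "c.html": ("hepatitis C epidemiology data", []),
    "orphan.html": ("unlinked page the crawler never reaches", []),
}

def crawl(seed):
    """Crawler: follow links breadth-first from a seed URL."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        for link in WEB[url][1]:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def build_index(urls):
    """Indexer: map each word to the set of pages containing it."""
    index = defaultdict(set)
    for url in urls:
        for word in WEB[url][0].lower().split():
            index[word].add(url)
    return index

def query(index, text):
    """Query processor: pages containing every query word (implicit AND)."""
    words = text.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(crawl("a.html"))
print(sorted(query(index, "hepatitis C")))  # ['a.html', 'c.html']
print(sorted(query(index, "unlinked")))     # []
```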
Common Limitations • Profit-oriented • Paid entries listed at top • Out of date • Partial site indexing • For technical reasons, must exclude many sites, eg: • Password-protected • Registration needed • Database-driven • Hidden search facilities
Understanding Google
Strengths • Coverage • Cached pages • File types, eg PDF, .doc, .ppt • Relevance: link popularity • Beyond pages: images, newsgroups
Weaknesses • Poor Boolean support • No truncation • Limited date searching • Invisible search facilities • Two pages per site displayed by default
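"Link popularity" can be illustrated with a simplified PageRank calculation, the link-analysis idea Google published early on: a page ranks higher when well-ranked pages link to it. This is a minimal sketch, not Google's actual ranking, which combines many undisclosed signals; the link graph is hypothetical and the damping factor of 0.85 is simply a commonly cited value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages with no
    outgoing links spread their score evenly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical link graph: three pages link to "guide.html".
links = {
    "guide.html": ["faq.html"],
    "faq.html": ["guide.html"],
    "blog.html": ["guide.html"],
    "news.html": ["guide.html"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The most-linked page, `guide.html`, comes out on top, and the scores always sum to 1.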
Google: search modes • Basic • Advanced
Google: Boolean limitations 1 • Correct syntax: medline OR embase
Google: Boolean limitations 2 • Correct syntax: medline -embase (or use Advanced Search)
Google: no truncation • Instead, list the word forms: clinton (tax OR taxes OR taxation)
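The syntax rules on these slides (OR in capitals, a plain hyphen for exclusion, and listing word forms by hand in place of truncation) can be captured in a small query-building helper. The function names are hypothetical; the sketch only assembles query strings and does not contact Google.

```python
def expand_stem(stem, endings):
    """Manual substitute for truncation: spell out each word form."""
    return [stem + ending for ending in endings]

def build_query(required=(), alternatives=(), excluded=()):
    """Assemble a query string from required, alternative and excluded terms."""
    parts = list(required)
    if alternatives:
        group = " OR ".join(alternatives)  # OR must be in capitals
        # Bracket the OR group only when other terms share the query.
        if len(alternatives) > 1 and (parts or excluded):
            group = f"({group})"
        parts.append(group)
    # Exclusion uses a plain hyphen-minus, not an en dash.
    parts.extend("-" + term for term in excluded)
    return " ".join(parts)

print(build_query(["clinton"], expand_stem("tax", ["", "es", "ation"])))
# clinton (tax OR taxes OR taxation)
print(build_query(alternatives=["medline", "embase"]))
# medline OR embase
print(build_query(["medline"], excluded=["embase"]))
# medline -embase
```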
Google: hidden features 1 • Discovered at www.searchengineshowdown.com (buried in Google's help pages)
Google: hidden features 2 • Partial URL v specific site search • Not possible on Advanced Search, despite its “Domains” limit
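The difference between a partial URL (a whole domain such as `ac.uk`) and a specific site (such as `www.hw.ac.uk`) comes down to how a host name is matched. The helper below is a hypothetical local filter over result URLs, not a Google feature, and assumes "partial URL" means a domain suffix; apart from the Heriot-Watt address on the New Media slide, the URLs are invented examples.

```python
from urllib.parse import urlparse

def matches_site(url, site):
    """True if the URL's host equals `site` or is a subdomain of it.

    "ac.uk" (partial) matches any host ending in ".ac.uk";
    "www.hw.ac.uk" (specific) matches only that host.
    """
    host = (urlparse(url).hostname or "").lower()
    site = site.lower().strip(".")
    return host == site or host.endswith("." + site)

results = [
    "http://www.hw.ac.uk/libWWW/irn/irn.html",
    "http://library.example.ac.uk/",   # hypothetical
    "http://www.example.com/",         # hypothetical
]
print([u for u in results if matches_site(u, "ac.uk")])
print([u for u in results if matches_site(u, "www.hw.ac.uk")])
```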
Other Search Engines • Always worth searching more than one, eg • All the Web (FAST) • AltaVista • Lycos/HotBot • Northern Light (?) • Overlap may be limited • Different ranking criteria
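Because overlap between engines may be limited, it can be worth measuring it and merging the results. This sketch assumes you already have each engine's ranked result URLs (the lists below are invented placeholders); it measures overlap as Jaccard similarity (shared URLs divided by all distinct URLs) and merges the lists round-robin, so neither engine's ranking criteria dominates.

```python
def overlap(results_a, results_b):
    """Jaccard similarity: shared URLs / all distinct URLs."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def merge_results(*result_lists):
    """Interleave ranked lists round-robin, skipping duplicates."""
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for i in range(longest):
        for results in result_lists:
            if i < len(results) and results[i] not in seen:
                seen.add(results[i])
                merged.append(results[i])
    return merged

# Hypothetical top-five results from two engines for the same query.
engine_a = ["u1", "u2", "u3", "u4", "u5"]
engine_b = ["u3", "u6", "u7", "u1", "u8"]
print(f"{overlap(engine_a, engine_b):.2f}")  # 0.25
print(merge_results(engine_a, engine_b))
# ['u1', 'u3', 'u2', 'u6', 'u7', 'u4', 'u5', 'u8']
```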
3: Experimentation • Try out “add-on” search software, eg • BullsEye Pro • Copernic • Copernic Summarizer
4: Explore the “Invisible Web” • Material, often of high quality, that general search engines can’t or won’t index • Unlinked pages • Non-HTML file types, eg audio, video, PDF • Authenticated sites • Databases • Much larger than the visible Web
5: Promote Quality Searching • Old sources • Old habits • New media
Old Habits • Concept analysis • Search strategy formulation • Critical source selection • Patience • Flexibility • Critical appraisal of search hits
New Media • Library Web site • E-newsletter (http://www.hw.ac.uk/libWWW/irn/irn.html) • Weblog
Towards a Brighter Future • Automatically generated, accurate metadata • Smarter search engines • More quality-sensitive • More penetrative • XML: structured data
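XML's advantage is that metadata fields are explicitly labelled, so software can extract them directly instead of guessing from page text. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming a hypothetical record that uses Dublin Core element names to describe the Sherman and Price book listed in the References.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record using Dublin Core element names.
RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>The Invisible Web</dc:title>
  <dc:creator>Sherman, Chris</dc:creator>
  <dc:creator>Price, Gary</dc:creator>
  <dc:date>2001</dc:date>
</record>"""

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

def read_record(xml_text):
    """Pull labelled fields out of the record by element name."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "creators": [el.text for el in root.findall("dc:creator", NS)],
        "date": root.findtext("dc:date", namespaces=NS),
    }

print(read_record(RECORD))
```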
References • Sherman, Chris and Price, Gary. The Invisible Web: Uncovering Information Sources Search Engines Can't See. Medford, N.J.: Information Today, 2001. ISBN 091096551X. Accompanying database at http://invisible-web.net • Search Engine Watch: http://www.searchenginewatch.com • Search Engine Showdown: http://www.searchengineshowdown.com