Things You Just Have to Know About Search Engines Ran Hock Online Strategies May 14, 2002 InfoToday 2002
Things You Just Have to Know About Search Engines • 1 - No Search Engine Covers Everything • 2 - Different Engines "Miss" and Find Different Things • 3 - Large Numbers Aren’t Necessarily Bad Searches • 4 - All Search Engines Have Techniques That Allow You to Improve Results
Things You Just Have to Know About Search Engines • 5 - Metasearch engines are not "search engines" • 6 - Google is great, but not the only one you should use. • 7 - Some Things Change, Some Don't
1 - No Search Engine Covers Everything • There are pages no engine covers: “invisible” pages • Unlinked pages, database pages, password-protected sites, “deep” pages, etc. • Different engines “miss” and find different things (Point #2)
2 - Different Engines Find and Miss Different Things • Each engine may find something the others missed. • Even “second-tier” engines find things the top three miss. • Consider the results of this search: “erris head” sailing
2 - Different Engines Find and Miss Different Things • Of the 20 unique records retrieved by all the engines combined, Google found (only) 14 (70%) • Google missed 6 (30%) • If you had searched Google and then just one more engine, your retrieval would have increased by 15% • Even HotBot found 2 that the other three engines missed.
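The coverage arithmetic on this slide (one engine's share of all unique records, records only one engine found, and the gain from adding a second engine) can be sketched with plain set operations. This is a minimal illustration, not the author's method: the engine names and record IDs below are made up, and measuring the "gain" as a share of all unique records is one assumed reading of the 15% figure.

```python
def coverage(results: dict[str, set[str]]) -> dict[str, float]:
    """Each engine's share of all unique records retrieved by any engine."""
    all_records = set().union(*results.values())
    return {engine: len(found) / len(all_records) for engine, found in results.items()}


def unique_finds(results: dict[str, set[str]], engine: str) -> set[str]:
    """Records that `engine` retrieved and no other engine did."""
    others = set().union(*(found for name, found in results.items() if name != engine))
    return results[engine] - others


def gain(results: dict[str, set[str]], base: str, extra: str) -> float:
    """Share of all unique records added by searching `extra` after `base`."""
    all_records = set().union(*results.values())
    return len(results[extra] - results[base]) / len(all_records)


# Hypothetical record IDs for four hypothetical engines (not the slide's real data).
results = {
    "engine_a": {"r1", "r2", "r3", "r4", "r5", "r6", "r7"},
    "engine_b": {"r1", "r2", "r8"},
    "engine_c": {"r3", "r9"},
    "engine_d": {"r4", "r10"},
}
print(coverage(results)["engine_a"])          # 0.7
print(unique_finds(results, "engine_b"))      # {'r8'}
print(gain(results, "engine_a", "engine_b"))  # 0.1
```

Here the largest engine finds 70% of the unique records, yet a much smaller one still contributes a record nobody else found, which is the same pattern as the HotBot result above.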
2 - Different Engines Find and Miss Different Things - Why? • Indexing “policies” • What words and other items get indexed • How those things are “parsed” • Crawling differences • Starting points • Depth / Breadth of crawling, etc. • Spam policies • Ranking
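To see why crawling differences alone change what an engine can find, here is a minimal breadth-first crawler with a depth limit over a toy link graph. The graph, page names, and the `crawl` function are hypothetical; real crawlers also apply politeness rules, spam filters, and scheduling that this sketch leaves out.

```python
from collections import deque


def crawl(links: dict[str, list[str]], seeds: list[str], max_depth: int) -> set[str]:
    """Breadth-first crawl from `seeds`, following links at most `max_depth` hops."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical toy web. Nothing links to "orphan", so only a crawler
# that happens to use it as a starting point will ever index it.
WEB = {
    "home": ["about", "news"],
    "news": ["archive"],
    "archive": ["deep-article"],
    "blog": ["deep-article"],
    "orphan": ["home"],
}

print(sorted(crawl(WEB, ["home"], max_depth=1)))  # ['about', 'home', 'news']
print(sorted(crawl(WEB, ["home"], max_depth=3)))  # ['about', 'archive', 'deep-article', 'home', 'news']
print(sorted(crawl(WEB, ["blog"], max_depth=3)))  # ['blog', 'deep-article']
```

A shallow crawler misses the “deep” page, a crawler starting elsewhere sees a different slice of the web, and no crawler reaches the unlinked page unless it starts there.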
3 - Large Numbers Aren’t Necessarily Bad Searches • The most common complaint: too many results • You’re not “obligated” to look at them all • All engines use some form of relevance ranking • Relevance ranking does, at least to some degree, the same things we do to find the best items • What relevance ranking uses:
3 - Large Numbers Aren’t Necessarily Bad Searches Relevance ranking uses some combination of: • Popularity • Frequency of terms • Weighting by field (e.g., Title counts more than Summary) • Proximity of terms • Weighting by type (font) size • Weighting according to the order in which the searcher entered terms • Etc.
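A toy scoring function can show how several of these signals might be combined. This is a sketch under stated assumptions, not any engine's actual formula: the field weights, the phrase bonus, and the 1/position weighting for term order are all hypothetical choices, and type size is not modeled.

```python
import re

# Hypothetical weights: a match in the title counts three times a match in the summary.
FIELD_WEIGHTS = {"title": 3.0, "summary": 1.0}
PHRASE_BONUS = 2.0  # hypothetical bonus when query terms appear adjacent, in order


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def score(doc: dict[str, str], query: list[str], popularity: float = 0.0) -> float:
    """Toy relevance score combining field weighting, frequency, order, proximity, popularity."""
    query = [term.lower() for term in query]
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(doc.get(field, ""))
        # Term frequency, weighted by field and by the order the searcher typed the terms.
        for position, term in enumerate(query):
            total += weight * tokens.count(term) / (position + 1)
        # Proximity: reward the query terms appearing next to each other, in order.
        n = len(query)
        if n >= 2:
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == query:
                    total += weight * PHRASE_BONUS
    return total + popularity  # a popularity score (e.g. link-based) is simply added


doc_a = {"title": "Erris Head sailing", "summary": "Photos from the coast"}
doc_b = {"title": "Sailing news", "summary": "erris head mentioned"}
print(score(doc_a, ["erris", "head"]))  # 10.5
print(score(doc_b, ["erris", "head"]))  # 3.5
```

Both pages contain the phrase, but the title match outranks the summary match, which is why the best items tend to rise to the top of a large result set.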
3 - Large Numbers Aren’t Necessarily Bad Searches Most search engines automatically “enhance” your search • Automatic phrase identification • Word variants (and/or truncation) • Case sensitivity • Analysis of documents in the database (links, term association, associative networks, cluster analysis, co-occurrence, etc.) • Etc.
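Two of these automatic enhancements, case folding and matching word variants, can be sketched with a crude suffix stripper. The suffix list and the minimum stem length are hypothetical and far simpler than a real stemmer (such as Porter's); the point is only that “Sailing”, “sails”, and “SAILED” can all be made to match the same indexed form.

```python
# Hypothetical suffix list; real engines use much more careful stemming rules.
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM = 3  # do not strip a suffix if fewer than 3 letters would remain


def normalize(word: str) -> str:
    """Case-fold, then strip at most one common suffix so word variants match."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def expand_matches(query_word: str, vocabulary: list[str]) -> list[str]:
    """Vocabulary words that normalize to the same form as `query_word`."""
    stem = normalize(query_word)
    return sorted(v for v in vocabulary if normalize(v) == stem)


print(normalize("Sailing"), normalize("sails"), normalize("SAILED"))  # sail sail sail
print(expand_matches("Sailing", ["sail", "sails", "sailor", "Heads"]))  # ['sail', 'sails']
```

Enhancements like this are one reason a short query can return far more records than the literal words would suggest.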
4 - All Search Engines Provide Options for You to Enhance Your Search • Field Searching • title • URL • date • language • etc. • Boolean (yes, “Boolean,” which is neither difficult nor bad)
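Field searching and Boolean logic fit together naturally if the index remembers which field each word came from. The sketch below assumes a hypothetical inverted index keyed by (field, term); Boolean AND, OR, and AND NOT then become set intersection, union, and difference. The documents and URLs are made up, and real engines each have their own query syntax for these operators.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, dict[str, str]]) -> dict[tuple[str, str], set[int]]:
    """Inverted index keyed by (field, term), so a search can be limited to one field."""
    index: dict[tuple[str, str], set[int]] = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in re.findall(r"[a-z0-9]+", text.lower()):
                index[(field, term)].add(doc_id)
    return index


def lookup(index, term, field=None) -> set[int]:
    """Doc IDs containing `term`, either in one named field or in any field."""
    term = term.lower()
    if field is not None:
        return set(index.get((field, term), set()))
    return {doc for (_, t), ids in index.items() if t == term for doc in ids}


# Hypothetical documents.
DOCS = {
    1: {"title": "Erris Head sailing", "url": "example.org/erris", "body": "Sailing directions for the coast"},
    2: {"title": "Sailing news", "url": "example.org/news", "body": "Race results near Erris"},
    3: {"title": "Erris Head walks", "url": "example.org/walks", "body": "Coastal walking routes"},
}
index = build_index(DOCS)
erris_in_title = lookup(index, "erris", field="title")  # field search: title only
sailing = lookup(index, "sailing")                      # any field

print(erris_in_title & sailing)            # AND      -> {1}
print(lookup(index, "erris") | sailing)    # OR       -> {1, 2, 3}
print(lookup(index, "erris") - sailing)    # AND NOT  -> {3}
```

Restricting “erris” to the title drops document 2, where the word appears only in passing in the body, which is exactly the kind of precision field searching buys.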
4 - All Search Engines Provide Options for You to Enhance Your Search How do you know about these options? • Use the Advanced Search page • Read the documentation • ________________
4 - All Search Engines Provide Options for You to Enhance Your Search • Use the Advanced Search page
5 - Metasearch engines are not “search engines” • Consider the following example of a search done in individual engines, then in metasearch engines
5 - Metasearch engines are not “search engines” • Most don’t search all of the largest engines • Most don’t give you more than 10 or 20 records from each engine • Most don’t convey your full query syntax to the target engines • Most list “paid sites” first • “Client-side” metasearch programs, e.g., Copernic and Bulls-Eye, do NOT have the above problems. • Even online metasearch engines have occasional socially redeeming features (Vivisimo’s clustering).
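The “10 or 20 records from each engine” limitation is easy to see in a sketch of how a simple metasearch merge might work. This is a hypothetical illustration, not how any particular metasearch engine is implemented: it takes only the top `per_engine_cap` results from each engine, interleaves them by rank, and drops duplicates.

```python
def metasearch(engine_results: dict[str, list[str]], per_engine_cap: int = 10) -> list[str]:
    """Merge ranked lists by rank (round-robin), keeping only the top
    `per_engine_cap` results from each engine and dropping duplicates."""
    capped = [results[:per_engine_cap] for results in engine_results.values()]
    merged: list[str] = []
    seen: set[str] = set()
    for rank in range(per_engine_cap):
        for results in capped:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Hypothetical ranked results: engine_a actually has 50 hits.
ranked = {
    "engine_a": [f"a{i}" for i in range(50)],
    "engine_b": ["a0", "b1", "b2"],
}
print(metasearch(ranked, per_engine_cap=2))  # ['a0', 'a1', 'b1']
```

Whatever sits below the cap (here, 48 of engine_a's 50 hits) never reaches the searcher, no matter how relevant it is; searching the engine directly avoids that cutoff.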
6 - Google is Great, But Not the Only One You Should Use • Points 1 and 2 - No search engine finds everything and different engines find different things
6 - Google is Great, But Not the Only One You Should Use Great Because of: • Size • Popularity-based ranking • Unique content • newsgroups • PDFs and other file types • largest image collection • Dandy little features like addresses, definitions, etc. • Pretty good search options
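“Popularity-based ranking” refers to link analysis in the spirit of PageRank, where a page linked to by many (or by important) pages scores higher. The sketch below is the textbook power-iteration form of PageRank with a damping factor of 0.85, applied to a hypothetical three-page graph; Google’s actual ranking is not public in detail and is not reproduced here.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration; scores sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page passes its rank evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical graph: both "a" and "b" link to "c"; "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

Page “c” ranks highest because it has the most incoming links, and “a” benefits from being linked to by the popular “c”.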
6 - Google is Great, But Not the Only One You Should Use But Doesn’t Have: • Everything • The truncation and NEAR operators that AltaVista has • As much news coverage as AllTheWeb • As current an index as AllTheWeb (maybe) • Etc.
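Truncation and a NEAR operator are both simple to express once a page is split into words. This sketch assumes `*` marks right-hand truncation and that NEAR means “within a window of N words, in either order”; AltaVista’s NEAR was commonly described as a ten-word window, but treat the default here as an assumption, and the function names as hypothetical.

```python
import re


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(pattern: str, word: str) -> bool:
    """Truncation: 'sail*' matches any word that starts with 'sail'."""
    pattern = pattern.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def near(text: str, left: str, right: str, window: int = 10) -> bool:
    """True if a word matching `left` is within `window` words of one matching `right`."""
    words = tokens(text)
    left_pos = [i for i, w in enumerate(words) if matches(left, w)]
    right_pos = [i for i, w in enumerate(words) if matches(right, w)]
    return any(abs(i - j) <= window for i in left_pos for j in right_pos)


text = "Erris Head is a fine place for sailors and sailing boats"
print(near(text, "erris", "sail*"))            # True  ("sailors" is 7 words away)
print(near(text, "erris", "sail*", window=3))  # False
```

NEAR sits between a strict phrase search and a plain AND: it requires the terms to be close without requiring them to be adjacent.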
7 - Search Engines Change • In some ways a lot, in other ways very little
7 - Search Engines Change Areas of little change • For most engines: How they do basic things such as phrases, Boolean, truncation, field searching etc.
7 - Search Engines Change Areas of frequent/considerable change • Some come, some go. Gone: Go/InfoSeek, et al. Arrived: WiseNut, Teoma • How things are arranged on the home page (esp. AltaVista) • Partners (which directory they use, featured partners and tools, etc.) • Added content, esp. content types (PDFs, newsgroups, etc. in Google)
In Summary • 1 - No Search Engine Covers Everything • 2 - Different Engines "Miss" and Find Different Things • 3 - Large Numbers Aren’t Necessarily Bad Searches • 4 - All Search Engines Have Techniques That Allow You to Improve Results • 5 - Metasearch engines are not "search engines" • 6 - Google is great, but not the only one you should use. • 7 - Some Things Change, Some Don't
Ran Hock Online Strategies 1-800-871-4033 www.onstrat.com ran@onstrat.com