Web Search Engines by Greg R. Notess notess@imt.net imt.net/~notess/search
Overview: • Comparing the database content • Change • Comparative Size • Overlap • Looking towards future developments • Portal or Destination • Output sorting
Results are limited by • Database content • The Web sites included • The depth to which they are indexed
If it’s not in the database, the best search engine will not be able to find the Web page
So what’re they like? • Very large databases • Most index all words on page • None index words in images • Let’s see how the databases compare to the real Web
Overall Size Change: Is the Web in general • Growing? • Shrinking? • Remaining the same?
What about the rest? • Who’s the biggest? • How to measure? By actual search results and verified hits
And over time? • 8/98 -- AltaVista, Northern Light, HotBot • 5/98 -- AltaVista, HotBot, Northern Light • 2/98 -- HotBot, AltaVista, Northern Light • 10/97 -- AltaVista, HotBot, Northern Light • 9/97 -- Northern Light, Excite, HotBot • 6/97 -- HotBot, AltaVista, Infoseek • 10/96 -- HotBot, Excite, AltaVista
Back to change in size • Let’s look at six search engines • Over the course of two years
But at least • They have a high degree of duplication between them • Right?
Try 4 small searches • Using five search engines • How many pages are found by all five or at least by four of them?
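The overlap question on this slide can be sketched in a few lines of Python. This is only an illustration, not the method behind the original study: the engine names and URLs in `sample` are hypothetical placeholders, and `found_by_at_least` is a helper name introduced here.

```python
from collections import Counter


def overlap_counts(results_by_engine):
    """Map each URL to the number of engines that returned it."""
    counts = Counter()
    for urls in results_by_engine.values():
        # Use a set so an engine counts at most once per URL.
        counts.update(set(urls))
    return counts


def found_by_at_least(results_by_engine, k):
    """Return the sorted URLs that at least k engines returned."""
    counts = overlap_counts(results_by_engine)
    return sorted(url for url, n in counts.items() if n >= k)


# Hypothetical result lists for one small search on five engines.
sample = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u1", "u3"],
    "engine_c": ["u1", "u4"],
    "engine_d": ["u1", "u3", "u5"],
    "engine_e": ["u1", "u3"],
}

all_five = found_by_at_least(sample, 5)
four_or_more = found_by_at_least(sample, 4)
```

Dividing `len(four_or_more)` by the number of distinct URLs gives a rough duplication rate for a single search.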
And they exclude most: • Content of Adobe PDF and formatted files • The content in most sites requiring a log in • CGI output: data requested by a form • Other dynamically produced data • Pages protected by a robots.txt file • Intranets, pages not linked from anywhere else • Commercial resources with domain limitations • Non-Web resources
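As one illustration of the robots.txt exclusion, the sketch below uses Python's standard `urllib.robotparser` to check whether a crawler may fetch a page. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are assumptions made up for this example, not taken from any real site or search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked_page = allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/report.html")
open_page = allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html")
```

A crawler that honors the file would skip the first page, so that page never reaches the search engine's database.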
Scope Summary: • Inconsistent growth • Not full coverage • Surprisingly low duplication
Positive Side? • Essential for searching the Net • Can be used effectively: phrase search, use more than one, smart searching
Incredibly popular • Even when they fail • But then, since when is finding information always easy?
Overview: • Comparing the database content • Change • Comparative Size • Overlap • Looking towards future developments • Portal or Destination • Output sorting
What is a search engine? • Portal? • Gateway? • Destination?
Search Engine • the software that searches a database
Development • Database of Web pages • adds Supplementary Database (phone numbers, reference, businesses, news) • then adds Subject directory • then Services (email, ISP, shopping, travel agent) • now Communities
Portal to Destination? • Driving force: advertising revenue (keep users longer for more) • Conflicts with portal and gateway principle
Future possibilities? • Smaller databases • Less pointing to external pages • Paid advertising or sponsorship for visibility • Rise of search-only sites?
Output Development • Initially, “Relevance” ranking • Crude • Not site or URL based • Some site sorting from Excite • No date sorting
Site Sorting • Infoseek, then Lycos, now HotBot • Group together by site • More relevant than prior algorithms • Northern Light includes it in Custom Folders
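Grouping results by site can be approximated by keying each hit on its host name. The sketch below is an assumption about the general idea only, not a description of how Infoseek, Lycos, HotBot, or Northern Light implement it; `group_by_site` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def group_by_site(urls):
    """Group result URLs by host, keeping the original ranking order."""
    groups = {}
    for url in urls:
        host = urlsplit(url).netloc.lower()
        groups.setdefault(host, []).append(url)
    return groups


# Hypothetical ranked result list with two hits from the same site.
hits = [
    "http://a.example/x",
    "http://b.example/y",
    "http://a.example/z",
]

grouped = group_by_site(hits)
```

Because dictionaries preserve insertion order, each site appears at the rank of its first hit, with its other pages listed under it.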
Other Output • RealNames on AltaVista • Direct Hit on HotBot • Subject Directory Categories • News • Books, CDs, etc. “about search term”
Search Engine Showdown • imt.net/~notess/search • Search engine features • See also: www.searchenginewatch.com • See also: Rich Wiggins, coming up next . . .