Search Engines • Search engines take massive amounts of text and index it, so you can quickly and easily find key words • Special: e.g., an engine that archives USENET posts • Portal: search engines that archive web pages • Local: search engines limited to one site (e.g., Travelocity.com) • Meta: query other search engines
Web Search Engine Examples • AltaVista (www.altavista.com) • lycos.com • excite.com • yahoo.com (actually indexed/filed by humans) • google.com • dogpile.com (meta-search engine)
Search Engines • How they work • [Diagram; components shown: User, Browser or meta-search engine, Query, User Interface, Filtered database, Searcher, Evaluator, Indexed database, Indexer, Raw database, Gatherer (spider), Web pages]
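The indexer step named on this slide can be sketched as a small inverted index: a map from each word to the pages that contain it. This is a minimal illustration, not how any particular engine does it. The page names, page texts, and the helpers `tokenize` and `build_index` are all assumptions made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


# Hypothetical "raw database": page name -> text fetched by the gatherer.
raw_pages = {
    "wine.html": "Notes on red wine and white wine",
    "beer.html": "Notes on beer brewed in the Northwest",
}

index = build_index(raw_pages)
print(sorted(index["notes"]))  # pages containing "notes"
print(sorted(index["wine"]))   # pages containing "wine"
```

The searcher then only has to look a key word up in `index` instead of rereading every page.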
Meta Search Engines • How they work • [Diagram; components shown: User, Browser, User Interface, Filtered database, Searcher, Evaluator, Raw response, three Search Engines, Web pages]
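A meta-search engine forwards the query to several engines and combines the raw responses. The sketch below shows one simple way to merge them, assuming each engine is just a function that returns a list of URLs. `engine_a`, `engine_b`, and the URLs are hypothetical stand-ins, not real services.

```python
def meta_search(query, engines):
    """Send the query to every engine and merge the results.

    Results keep the order in which they first appear; duplicates
    returned by more than one engine are listed only once.
    """
    merged = []
    seen = set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical stand-ins for real search engines.
def engine_a(query):
    return ["a.example/1", "shared.example/x"]


def engine_b(query):
    return ["shared.example/x", "b.example/2"]


results = meta_search("wine", [engine_a, engine_b])
print(results)
```

A real evaluator would also re-rank the merged list; here the order is simply first come, first listed.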
Key Word Search • Search options: match all words (AND); match any words (OR); match whole phrase • Sort criteria: key word frequency; key word in title; meta tag; number of key words; proximity to start; number of page references to this page • [Diagram; components shown: User, Browser, User Interface, Filtered database, Evaluator, Searcher, Indexed database]
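The three search options and two of the sort criteria (key word frequency, key word in title) can be sketched as follows. The function names and the title weight of 5 are arbitrary assumptions for illustration; real engines combine many more signals.

```python
import re


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query, text, mode):
    """Return True if text satisfies the query under the given option."""
    q = words(query)
    t = words(text)
    if mode == "all":      # match all words (AND)
        return all(w in t for w in q)
    if mode == "any":      # match any words (OR)
        return any(w in t for w in q)
    if mode == "phrase":   # match the whole phrase, in order
        n = len(q)
        return any(t[i:i + n] == q for i in range(len(t) - n + 1))
    raise ValueError(f"unknown mode: {mode}")


def score(query, title, body):
    """Rank by key word frequency in the body plus a bonus for title hits.

    The weight of 5 for a title hit is an arbitrary assumption.
    """
    q = set(words(query))
    frequency = sum(1 for w in words(body) if w in q)
    in_title = sum(1 for w in words(title) if w in q)
    return frequency + 5 * in_title


text = "A fine red wine"
print(matches("red wine", text, "phrase"))
print(matches("wine red", text, "phrase"))
print(matches("wine red", text, "all"))
print(score("wine", "Wine notes", "wine and more wine"))
```

Sorting the matching pages by `score`, highest first, gives the filtered result list.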
Web Search Engines • Search engines usually work via “spiders” • A spider visits a site and downloads a web page. It indexes that page, then looks for any pages that are linked from it, and repeats the process • Use the <meta> tags (keywords, description) to describe your page to the spider
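How a spider might read the <meta> keywords and description can be sketched with Python's standard `html.parser` module. The class name `MetaTagParser` and the sample page are assumptions for illustration; how much weight any engine gives these tags is not stated on the slide.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page used only for this example.
page = """
<html><head>
<title>Wine notes</title>
<meta name="keywords" content="wine, bordeaux">
<meta name="description" content="Tasting notes on red wine">
</head><body>...</body></html>
"""

parser = MetaTagParser()
parser.feed(page)
print(parser.meta["keywords"])
print(parser.meta["description"])
```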
Web Search Strategies • Each page is downloaded, indexed, and any links followed • Depth-first search • Breadth-first search • [Diagram: link tree starting at myPage.html, with linked1.html through linked7.html and pages numbered 1 to 7]
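The difference between the two strategies is only the order in which linked pages are visited. The sketch below crawls an in-memory link table instead of the real web. The link structure in `LINKS` is an assumption, since the slide's diagram does not survive in text form; only the page names are borrowed from it.

```python
from collections import deque

# Hypothetical link structure: page -> pages it links to.
LINKS = {
    "myPage.html": ["linked1.html", "linked2.html"],
    "linked1.html": ["linked3.html", "linked4.html"],
    "linked2.html": ["linked5.html"],
    "linked3.html": [],
    "linked4.html": [],
    "linked5.html": [],
}


def breadth_first(start, links):
    """Visit every page one link away, then two links away, and so on."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def depth_first(start, links):
    """Follow the first link as deep as possible before backing up."""
    order, seen, stack = [], set(), [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        # Push in reverse so the first link on the page is visited first.
        for target in reversed(links.get(page, [])):
            if target not in seen:
                stack.append(target)
    return order


print(breadth_first("myPage.html", LINKS))
print(depth_first("myPage.html", LINKS))
```

Both strategies reach the same set of pages; the `seen` set keeps the spider from indexing a page twice when links form a cycle.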
Web Search Engines • If web pages are not linked, you may get independent “trees” that are not indexed • [Diagram: separate trees of pages; pages shown: wine.html, beer.html, laFite.html, Oly.html, Ranier.html]
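A spider can only index what it can reach by following links. The sketch below computes the reachable set from one starting page and shows which pages are missed. The page names come from the slide, but which page links to which is an assumption made for the example.

```python
def reachable(start, links):
    """Return the set of pages a spider can reach from start."""
    seen = set()
    pending = [start]
    while pending:
        page = pending.pop()
        if page in seen:
            continue
        seen.add(page)
        pending.extend(links.get(page, []))
    return seen


# Hypothetical link structure: two trees with no link between them.
links = {
    "wine.html": ["laFite.html"],
    "laFite.html": [],
    "beer.html": ["Oly.html", "Ranier.html"],
    "Oly.html": [],
    "Ranier.html": [],
}

indexed = reachable("wine.html", links)
missed = set(links) - indexed
print(sorted(indexed))
print(sorted(missed))
```

Starting from wine.html, the beer.html tree is never found unless some page links to it or it is submitted to the engine directly.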
Percentage Indexed? • Maybe 10-15% of web pages are indexed • Hidden web: not freely accessible to the public • Some sites disallow spiders/bots: load issues; content ownership issues; site traffic pattern issues
Getting Indexed • You can submit a page to a search engine to be indexed; see the search engine's instructions, usually linked from its main page • http://www.google.com/intl/en/about.html • You can see the spiders' visits in the logs of the server hosting the web pages • They seem to visit about once a week • You can configure your server to refuse bots • Search engines can be out of date!
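One common way to ask bots to stay out is a robots.txt file at the site root, which well-behaved spiders check before fetching a page. The slide does not say which mechanism it has in mind, so treating robots.txt as the example is an assumption, as are the bot name `ExampleBot` and the example.com URLs. The sketch uses Python's standard `urllib.robotparser` to show how a spider would interpret the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = parser.can_fetch(
    "ExampleBot", "http://www.example.com/private/report.html"
)
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
print(blocked)  # False: the path is disallowed
print(allowed)  # True: nothing forbids this page
```

robots.txt is a request, not an enforcement mechanism; a server that must actually refuse a bot has to block it by other means.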
Search for "search engine submission" • http://www.wpromote.com/ - $50/mo submission service • www.addme.com/ - free and paid submission; advertised as: "With this service your site will be optimized, submitted, and monitored to achieve top ranking in the most popular search engines", with a guaranteed top 20 ranking • http://www.spider-food.net/
Examples of Special Search Sites • thomas.loc.gov: bills, Congressional Record, reports • uscode.house.gov • www.leginfo.ca.gov • www.fas.org • www.janes.com
Media on the Net • Papers: www.washingtonpost.com, nytimes.com, mercurynews.com, latimes.com • Networks: cnn.com, abcnews.com, nbc.com, cbs.com
English & non-English • Just about everything on the web is in English • You can get a (very!) approximate translation with Babel Fish at AltaVista • Sometimes very entertaining
Maps • Driving maps at yahoo.com • Satellite photos at www.terraserver.microsoft.com • USGS and Russian imagery, down to 1 m resolution; 1 terabyte of data • (They did it because it was big. All transactions on the NYSE in history are 0.5 terabytes)
Summary • Search engines use “spiders” to index text and build local databases • Indexing of your material can be controlled by HTML <meta> tags, word placement in text, and page submission • Meta-search engines, search engines, portals, subject matter pages: progressively more special • Beware of commercial motives; most of the Web is not indexed • Other services: news, stocks, the .NET initiative, maps, weather, etc.