Search Engines HIMA 4160 Fall 2008
Agenda • Components of a Search Engine • Boolean Search • Advanced Features of Google
Which search engine do you usually use to search for information online?
What is in the deep web? • Dynamic webpages • Un-indexed webpages • Frequently updated webpages • Facebook etc.
Do you think your personal page can be searched on Google? • Yes • No • Maybe
Submit your website to Google http://www.google.com/addurl
Search Engines • Crawler • Indexer • Query processor
Search Engines • Web Crawler: fetches webpages from the Internet • Indexer: indexed webpages are saved in a database • Query processor
Web Crawler • A computer program that can send HTTP requests to various websites • Can follow hyperlinks to continuously visit webpages • Similar to a web browser • Also called a spider or web robot
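The crawling loop on this slide can be sketched in a few lines. This is a minimal illustration, not how any real search engine crawls: `FAKE_WEB`, `fetch`, and `crawl` are hypothetical names, and the "web" is an in-memory dictionary so the example runs without network access. A real crawler would send HTTP requests in `fetch`.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML source of that page.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": '<a href="http://d.example/">D</a>',
    "http://d.example/": "no links here",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Stand-in for an HTTP request (assumption: looks up FAKE_WEB instead)."""
    return FAKE_WEB.get(url, "")


def crawl(seed, max_pages=10):
    """Visit pages breadth-first by following hyperlinks, each page once."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("http://a.example/")` visits the seed page first and then every page reachable from it through hyperlinks. The `seen` set is what keeps the robot from looping forever between pages that link to each other.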
How to build the Index • Robot – a computer program that moves from webpage to webpage through hyperlinks • You can also submit a URL to Google • After a page is visited, a snapshot is stored on the search engine’s server • An index file is created – words, frequency, URL and other useful information
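The index file described here (words, frequency, URL) is commonly organized as an inverted index: a mapping from each word to the pages that contain it. The sketch below is a simplified illustration under that assumption; `build_index` and the sample pages are hypothetical, and a production index stores far more information.

```python
import re
from collections import Counter, defaultdict


def build_index(pages):
    """Map each word to {url: number of times the word occurs on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        for word, freq in counts.items():
            index[word][url] = freq
    return index


# Hypothetical page snapshots: URL -> visible text.
sample_pages = {
    "http://a.example/": "Health information management and health records",
    "http://b.example/": "Information systems",
}
sample_index = build_index(sample_pages)
```

With this structure, answering "which pages contain the word health?" is a single dictionary lookup, `sample_index["health"]`, instead of a scan of every stored page.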
Boolean • AND • OR • NOT • female AND student • female OR student • NOT female • (NOT female) AND (NOT student)
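The Boolean operators map directly onto set operations, which may make the four example queries easier to picture. The names and memberships below are made up for illustration.

```python
# Hypothetical population and two attributes.
everyone = {"ann", "bob", "cara", "dan"}
female = {"ann", "cara"}
student = {"ann", "bob"}

female_and_student = female & student        # female AND student
female_or_student = female | student         # female OR student
not_female = everyone - female               # NOT female
# (NOT female) AND (NOT student)
neither = (everyone - female) & (everyone - student)
```

Note that NOT only has a meaning relative to the whole collection (`everyone` here). The last query is also equivalent to NOT (female OR student), that is, `everyone - (female | student)`.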
Let’s tweak it (Google) • health information management • health AND information AND management • health OR information OR management • “health information management”
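To see why these variants return different results, here is a small sketch of how AND, OR, and phrase matching differ for a single page of text. The function names are hypothetical and the matching is deliberately naive (whole words, case-insensitive); it is not Google's implementation.

```python
import re


def tokens(text):
    """Lowercase a string and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_all(query, text):
    """AND: every query word appears somewhere in the text."""
    words = set(tokens(text))
    return all(term in words for term in tokens(query))


def match_any(query, text):
    """OR: at least one query word appears in the text."""
    words = set(tokens(text))
    return any(term in words for term in tokens(query))


def match_phrase(phrase, text):
    """Phrase: the query words appear next to each other, in order."""
    p, t = tokens(phrase), tokens(text)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


query = "health information management"
doc_program = "Health Information Management at ECU"
doc_scattered = "Management of health records and information systems"
doc_news = "Public health news"
```

`doc_scattered` satisfies the AND query but not the phrase query, and `doc_news` satisfies only the OR query. So OR gives the most results, AND fewer, and the quoted phrase the fewest.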
Basic Google search features • Choose search terms • Usually 1 – 3 words • Capitalization • NOT case sensitive • Automatic “and” query • Health Information Management = Health AND Information AND Management • Automatic exclusion of common words • Where, when, in, single digits, single letters …
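The basic features on this slide amount to a normalization step applied to the query before it is matched. A rough sketch follows, assuming a tiny hypothetical stop-word list (`STOP_WORDS`); the real list of excluded words is not given in the slides.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"where", "when", "in", "the", "a", "of"}


def normalize_query(query):
    """Lowercase the query, then drop common words and single digits."""
    terms = []
    for word in query.lower().split():
        if word in STOP_WORDS:
            continue
        if len(word) == 1 and word.isdigit():
            continue
        terms.append(word)
    return terms


def as_boolean(query):
    """Show the implicit AND between the remaining terms."""
    return " AND ".join(normalize_query(query))
```

For example, `as_boolean("Health Information Management")` gives `"health AND information AND management"`, which is the automatic "and" query shown on the slide with capitalization ignored.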
Basic Google Search Features • Word variation • Diet, dietary • Phrase search • “Health Information Management” • Negative terms • bass -music • I’m Feeling Lucky • Goes straight to the first result in the list
Advanced Google Search • Site search • health information management site:ecu.edu • Filetype search • health information management filetype:pdf • “+” search • Star Wars Episode +I • Synonym search • ~food ~fact
Advanced Google Search Features • “OR” search • vacation London OR Paris • Number range search • DVD player $100..$200 • Wild card search • East * university
Advanced Google Operators • cache: • cache:www.ecu.edu • link: • link:www.ecu.edu • related: • related:www.ecu.edu • info: • info:www.ecu.edu
Advanced Google Operators • define: • define:health • stocks: • stocks:GOOG • allintitle: • allintitle: health information management • intitle: • intitle:health information management • allinurl: • allinurl:ecu hsim • inurl: • inurl:ecu hsim
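One way to think about these operators is that the query processor first splits the query into plain terms, quoted phrases, excluded terms, and `name:value` operators. The parser below is a hypothetical sketch of that first step only; `parse_query` and `KNOWN_OPERATORS` are illustrative names, and the sketch does not handle the `allintitle:`/`allinurl:` forms, `OR`, or number ranges.

```python
import re

# Operators taken from the slides; how they are grouped here is an assumption.
KNOWN_OPERATORS = {
    "site", "filetype", "intitle", "inurl", "define",
    "cache", "link", "related", "info", "stocks",
}


def parse_query(query):
    """Split a query into terms, phrases, excluded terms, and operators."""
    parsed = {"terms": [], "phrases": [], "excluded": [], "operators": {}}
    # A token is either a "quoted phrase" or a run of non-space characters.
    for token in re.findall(r'"[^"]+"|\S+', query):
        if token.startswith('"'):
            parsed["phrases"].append(token.strip('"'))
        elif token.startswith("-") and len(token) > 1:
            parsed["excluded"].append(token[1:])
        elif ":" in token and token.split(":", 1)[0].lower() in KNOWN_OPERATORS:
            name, value = token.split(":", 1)
            parsed["operators"][name.lower()] = value
        else:
            parsed["terms"].append(token)
    return parsed
```

In this sketch, `intitle:health information management` attaches the operator to `health` only and leaves the other two words as ordinary terms, which is the difference between `intitle:` and `allintitle:` that the slide contrasts.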
Other things Google can do • Converter • Calculator • Translator • Scholar • Shopper • Usher • Map • Instant messaging • Switchboard • Many more … http://www.pcmag.com/article2/0,1895,1858681,00.asp
Additional Information • http://douweosinga.com/
Rank – how do you know which link is important? • Before Google • Conventional information retrieval • After Google • Link analysis
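Link analysis treats a hyperlink as a vote: a page is important if important pages link to it. The sketch below is a simplified PageRank-style calculation, offered as an illustration of the idea rather than Google's actual ranking; the function name, the damping value of 0.85, and the iteration count are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict of page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a and b both link to c, and c links back to a.
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Here `c` ends up with the highest score because two pages link to it, and `b` with the lowest because nothing links to it. Conventional information retrieval, by contrast, looks only at the words on the page itself.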
Google Bomb • “Miserable failure” • Defused by Google in 2007 through algorithm changes
Google’s business model • How to make money from search? • Click stream • Pay per click
Google becomes the Information Portal • Google has become the portal to the world’s information • How scary is it? • “Don’t be evil”
Tips for Efficient Search • Be clear about what sort of page you seek • Think about what type of organization might publish the page you want • List terms that are likely to appear on the pages you are looking for • Assess the results • Consider a two-pass strategy
Summary • Three parts of a search engine • How the indexer works • How the query processor works