Slide 1:How Search Engines Work: General Search Strategies
Dr. Dania Bilal, IS 587, SIS, Fall 2007
Slide 2:Fun Quiz
Take the search engine quiz located at http://websearch.about.com/library/quizzes/search_engine_quiz/blsearchenginequiz.htm
Record the number of incorrect answers
Share the results of the quiz with a classmate
Slide 3:How Do Search Engines Work?
They collect information from selected web sites
They employ special software robots, called spiders, to crawl web pages
Spiders build lists of the words found on web sites; while a spider is building its lists, it is Web crawling
Spiders store the lists in the engine’s database
The engine’s indexing software builds an index of the words
Query input is matched against the index and the matching information is retrieved (processing algorithm)
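The steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine is built: the `PAGES` dictionary is a hypothetical stand-in for pages a spider has fetched, and the function names are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for fetched pages: URL -> page text.
PAGES = {
    "http://example.org/a": "Spiders crawl web pages",
    "http://example.org/b": "An index of words speeds up retrieval",
    "http://example.org/c": "Spiders store word lists in a database",
}

def word_list(text):
    """The spider's job: the list of words found on a page."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """The indexer's job: map each word to the URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in word_list(text):
            index[word].add(url)
    return index

def search(index, query):
    """Match query words against the index; a page must contain all of them."""
    words = word_list(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "spiders")))
```

A query for "spiders database" narrows the result to the one page that contains both words.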
Slide 4:How Do Spiders and Crawlers Work?
They begin with popular and heavily used web servers
Starting from a popular site, they collect the words on its pages and follow every link found within the site
Spiders travel from page to page, spreading across the most widely used portions of the Web
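Following every link from a popular starting page amounts to a breadth-first traversal. The sketch below assumes a small in-memory link graph (`LINKS`, with made-up URLs) instead of real HTTP requests; `max_pages` is an illustrative safety limit.

```python
from collections import deque

# Hypothetical link graph: URL -> links found on that page.
LINKS = {
    "popular.example/home": ["popular.example/news", "popular.example/about"],
    "popular.example/news": ["popular.example/home", "other.example/story"],
    "popular.example/about": [],
    "other.example/story": ["popular.example/home"],
}

def crawl(seed, links, max_pages=100):
    """Visit pages breadth-first from the seed, following every link once."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                queue.append(target)
    return order

visited = crawl("popular.example/home", LINKS)
print(visited)
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other.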
Slide 5:How Do Spiders and Crawlers Work?
A search engine company (e.g., Google) builds a dedicated server that supplies URLs to its spiders so that they collect information quickly
More than one spider is used to crawl web pages at a time
Google uses 3-4 spiders, which together collect over 100 pages per second
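One way to picture several spiders sharing a single URL server is a work queue read by a few threads. This is a sketch under stated assumptions, not Google's design: `run_spiders` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for an actual download.

```python
import queue
import threading

def run_spiders(urls, fetch, spider_count=3):
    """Let several spiders pull URLs from one shared URL server (a queue)."""
    url_server = queue.Queue()
    for url in urls:
        url_server.put(url)

    collected = {}
    lock = threading.Lock()

    def spider():
        while True:
            try:
                url = url_server.get_nowait()
            except queue.Empty:
                return  # no URLs left, so this spider stops
            page = fetch(url)
            with lock:  # protect the shared result store
                collected[url] = page

    spiders = [threading.Thread(target=spider) for _ in range(spider_count)]
    for s in spiders:
        s.start()
    for s in spiders:
        s.join()
    return collected

def fake_fetch(url):
    # Hypothetical stand-in for downloading a page.
    return "contents of " + url

urls = ["http://example.org/page%d" % i for i in range(10)]
pages = run_spiders(urls, fake_fetch)
print(len(pages))
```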
Slide 6:How Do Spiders and Crawlers Work?
When no dedicated URL server is used, the search engine company relies on an ISP to translate domain names into addresses for crawling the Web
This results in:
Delay in gathering information
Delay in updating information
Lack of control over URL addresses
Slide 7:Google Spider and How it Works
A spider looks at the HTML, XML, or other coding used to build a web page and collects information from the meta-tags
It indexes words within the actual text of a page
It indicates where the words were found (URL, title, headings, etc.)
It disregards initial articles (a, an, the)
It disregards pages that should not be crawled or indexed
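The spider's reading of a page can be approximated with Python's standard `html.parser` module. This is only a sketch: the `PageIndexer` class, the three location labels, and the sample page are assumptions made for illustration, and articles are dropped wherever they occur, which is a simplification.

```python
import re
from html.parser import HTMLParser

ARTICLES = {"a", "an", "the"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class PageIndexer(HTMLParser):
    """Collect meta-tag content and (word, location) pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}    # meta name -> content
        self.words = []   # (word, location) in document order
        self._stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            if attrs.get("name") and attrs.get("content"):
                self.meta[attrs["name"].lower()] = attrs["content"]
            return
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop back to the matching open tag.
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        if "title" in self._stack:
            location = "title"
        elif any(t in HEADINGS for t in self._stack):
            location = "heading"
        else:
            location = "body"
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            if word not in ARTICLES:
                self.words.append((word, location))

SAMPLE = """<html><head>
<title>The Spider Guide</title>
<meta name="keywords" content="spiders, crawling">
</head><body>
<h1>How a Spider Works</h1>
<p>A spider reads the page.</p>
</body></html>"""

indexer = PageIndexer()
indexer.feed(SAMPLE)
indexer.close()
print(indexer.meta)
print(indexer.words)
```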
Slide 8:Google Spider and How it Works
It uses the Robot Exclusion Protocol to decide which pages to disregard
The protocol is implemented in the meta-tag section at the beginning of a Web page
It tells a spider to leave the page alone: neither index the words on the page nor try to follow its links
Source: Franklin, C. How Internet Search Engines Work. http://computer.howstuffworks.com/search-engine.htm
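A well-behaved spider checks a page's robots meta tag before indexing it or following its links. The sketch below reads the common `noindex`, `nofollow`, and `none` values; the class and function names are hypothetical. Sites also commonly publish crawl rules in a site-wide robots.txt file, which this example does not cover.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def robots_policy(html):
    """Return (may_index, may_follow) for a page's HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    blocked_all = "none" in parser.directives
    may_index = not blocked_all and "noindex" not in parser.directives
    may_follow = not blocked_all and "nofollow" not in parser.directives
    return may_index, may_follow

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(robots_policy(page))
```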
Slide 9:How Do Search Engines Store Indexed Words?
The process varies among engines
Words are stored with the number of times they appear on a page (posting)
A weight is assigned to each word
Words appearing near the top of a page, in subheadings, in links, in meta tags, or in the title may receive more weight than words elsewhere on the page
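A posting can be modeled as a per-page record of how often a word appears and how much weight its locations add up to. The weights in `LOCATION_WEIGHT` are invented for this sketch; each engine chooses and tunes its own.

```python
from collections import defaultdict

# Hypothetical weights; real engines choose and tune their own.
LOCATION_WEIGHT = {"title": 5, "heading": 3, "meta": 2, "link": 2, "body": 1}

def add_page(index, url, located_words):
    """Record, per word and page, the occurrence count and total weight."""
    for word, location in located_words:
        posting = index[word].setdefault(url, {"count": 0, "weight": 0})
        posting["count"] += 1
        posting["weight"] += LOCATION_WEIGHT.get(location, 1)

def ranked(index, word):
    """URLs containing the word, heaviest posting first."""
    postings = index.get(word, {})
    return sorted(postings, key=lambda url: postings[url]["weight"], reverse=True)

index = defaultdict(dict)
add_page(index, "http://example.org/a",
         [("spider", "title"), ("spider", "body"), ("crawl", "body")])
add_page(index, "http://example.org/b", [("spider", "body")])
print(ranked(index, "spider"))
```

Here page a outranks page b for "spider" because one of its two occurrences is in the title.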
Slide 10:How Do Search Engines Store Indexed Words?
Information is encoded to save space
Information is indexed
An index of words is built by the automatic indexer (indexing software)
A hash table is created with an assigned weight or value for each word indexed
Hashing distributes entries evenly, so that popular sections (e.g., words beginning with M) and less popular ones (e.g., words beginning with X) can be retrieved equally quickly
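The idea of hashing can be shown with a small bucketed table. Instead of filing words by first letter, each word is run through a hash function and stored in the bucket that number selects. The sketch uses `zlib.crc32` only because it gives the same value on every run; `HashedIndex` is a hypothetical class, and Python's built-in `dict` already does this kind of hashing internally.

```python
import zlib

def bucket_for(word, bucket_count):
    """Deterministic hash of the word, reduced to a bucket number."""
    return zlib.crc32(word.encode("utf-8")) % bucket_count

class HashedIndex:
    """A tiny hash table mapping each indexed word to its weight."""

    def __init__(self, bucket_count=8):
        self.buckets = [dict() for _ in range(bucket_count)]

    def put(self, word, weight):
        self.buckets[bucket_for(word, len(self.buckets))][word] = weight

    def get(self, word):
        # Only one bucket is examined, however many words are stored.
        return self.buckets[bucket_for(word, len(self.buckets))].get(word)

    def bucket_sizes(self):
        return [len(bucket) for bucket in self.buckets]

hashed = HashedIndex()
for weight, word in enumerate(["map", "meta", "music", "mail", "xml", "xylophone"]):
    hashed.put(word, weight)

print(hashed.get("music"))
print(hashed.bucket_sizes())
```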
Slide 11:Using General Directories
Yahoo and its family
Browsing the directory: directory database; small, human-selected and human-indexed
Searching using keywords: search database; larger and non-selective; spider and machine indexing
Slide 12:Yahoo
Yahoo.com works like a search engine rather than a directory
It searches the web
Exercise: search under my name and see how Yahoo processes the query while you’re inputting information
The directory is found under "More" or at http://search.yahoo.com/dir
Slide 13:Yahoo Search Engine
Search:
Web
Images
Videos
Local information
Shopping
More…
Slide 14:Yahoo Advanced Search
Advanced Search feature
Shown on screen after you perform a search, or by going directly to http://search.yahoo.com/web/advanced?ei=UTF-8&p=dr+dania+bilal&fr=yfp-t-471
Lots of search features to explore
Slide 15:Yahoo Advanced Search Features
Boolean
Phrase
Currency
Domain
File format
Country
Language
Other
Slide 16:Yahoo Advanced Search Features
Exercise
Perform a search on a topic of your choice
Use Boolean equivalents:
All the words = AND
The exact phrase = phrase; proximity search
Any of these words = OR
None of these words = NOT
Choose the part of the page to search
Choose a language other than English
Report results in class
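The Boolean equivalents map directly onto set operations over the pages that contain each word. The sketch below uses three made-up documents in `DOCS`; it illustrates the logic only and is not Yahoo's implementation.

```python
import re

# Hypothetical mini-collection: document id -> text.
DOCS = {
    1: "search engines crawl the web",
    2: "web directories are selected by people",
    3: "search the web directories",
}

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def docs_with(word):
    return {doc_id for doc_id, text in DOCS.items() if word in tokens(text)}

def all_words(words):
    """All the words = AND (set intersection)."""
    sets = [docs_with(w) for w in words]
    return set.intersection(*sets) if sets else set()

def any_words(words):
    """Any of these words = OR (set union)."""
    return set().union(*(docs_with(w) for w in words))

def none_of(candidates, words):
    """None of these words = NOT (set difference)."""
    return candidates - any_words(words)

def exact_phrase(phrase):
    """The exact phrase: the words must appear next to each other, in order."""
    target = tokens(phrase)
    n = len(target)
    hits = set()
    for doc_id, text in DOCS.items():
        toks = tokens(text)
        if n and any(toks[i:i + n] == target for i in range(len(toks) - n + 1)):
            hits.add(doc_id)
    return hits

print(all_words(["search", "web"]))
print(exact_phrase("web directories"))
```

Note the difference between AND and phrase searching: "search" AND "web" matches documents 1 and 3, but the phrase "search web" matches neither.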
Slide 17:Yahoo Search Services
For searching specific content areas, Yahoo offers these search services:
Web Search: Find anything from across the Web
Answers: Ask questions and get answers from real people
Audio Search: Find over 50 million audio files from across the Web
Creative Commons Search: Find Creative Commons content that you can share or re-use in your own works
Directory Search: Search or browse Yahoo!'s categorized guide to the Web
Image Search: Find over 1.6 billion photos and illustrations from all over the Web
Job Search: Search for jobs, post your resume and more on Yahoo! HotJobs
Local: Find everything in your area from dry cleaners to day spas
Maps: Find maps and driving directions for anywhere you want to go
Mobile Search: Find whatever, wherever you are
My Web (Beta): The newest way to save, share and organize any page you want on the Web
News Search: Search for news stories and related photos, videos and audio clips
Slide 18:Yahoo Next
http://next.yahoo.com/
Cutting-edge technology at Yahoo
Blogs, Web 2.0, use of alltheweb, Yahoo Maps, podcasts, audio, and all other features that are in Beta testing
Slide 19:Yahoo Preferences
Customize Yahoo to fit your needs
Go to Preferences from the Web search page
Edit preferences based on your needs
Edited preferences are saved in the browser on your desktop
Slide 20:General Search Strategies in Search Engines
Slide 21:Strategies
Boolean
Boolean equivalents
Proximity and phrase searching
Searching within a field
Search limits
Slide 22:Yahoo Search Strategies
Explore Yahoo’s help page
Read the Search Tips
Read the search limit parameters such as intitle:, url:, and inurl:
Read how to use Boolean equivalents and other search parameters
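Field limits such as intitle: and inurl: restrict where a term must match. The sketch below shows one plausible way a query with such prefixes could be split and applied; `parse_query`, `matches`, and the sample page are hypothetical, and the exact behavior of each parameter should be taken from Yahoo's help page, not from this example.

```python
FIELDS = ("intitle", "inurl", "url")

def parse_query(query):
    """Split a query into free-text terms and field-restricted terms."""
    parsed = {"terms": []}
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and value and field.lower() in FIELDS:
            parsed.setdefault(field.lower(), []).append(value.lower())
        else:
            parsed["terms"].append(token.lower())
    return parsed

def matches(page, parsed):
    """Apply intitle: and inurl: limits; free-text terms must appear in the text.

    url: is parsed but not applied here, since its behavior is engine-specific.
    """
    title = page["title"].lower()
    url = page["url"].lower()
    text = page["text"].lower()
    return (all(value in title for value in parsed.get("intitle", []))
            and all(value in url for value in parsed.get("inurl", []))
            and all(term in text for term in parsed["terms"]))

page = {
    "url": "http://www.example.edu/spiders",
    "title": "Spider basics",
    "text": "Crawling the web",
}
parsed = parse_query("intitle:spider crawling inurl:edu")
print(parsed)
print(matches(page, parsed))
```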
Slide 23:General Search Engines Besides Yahoo Search
Slide 24:Engines and Information Need
Several general search engines are available on the Web
Select the engine(s) that best fit your need
Visit the Web Search Guide for the latest information: http://websearch.about.com/od/generalsearchengines/General_AllPurpose_Search_Engines.htm
Slide 25:Hands-on Activity
Browse the list of general search engines in the Web Search Guide
Explore 4 of the engines listed: Wisenut, Snap.com, Lycos, Exalead
Search under my name in each engine
Compare the results by viewing the first two pages retrieved
How many overlaps were found among the four engines?
How many unique results were found in each engine?
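Once the result URLs from each engine are recorded, counting overlaps and unique results is a small set exercise. The URLs below are placeholders, not real results from these engines; replace them with the ones you collect.

```python
from collections import Counter

def compare_engines(results):
    """results: engine name -> list of result URLs (first two pages)."""
    sets = {engine: set(urls) for engine, urls in results.items()}
    # How many engines returned each URL.
    counts = Counter(url for urls in sets.values() for url in urls)
    overlaps = {url for url, n in counts.items() if n > 1}
    unique = {
        engine: {url for url in urls if counts[url] == 1}
        for engine, urls in sets.items()
    }
    return overlaps, unique

# Placeholder data for illustration only.
results = {
    "Wisenut": ["u1", "u2", "u3"],
    "Snap.com": ["u2", "u4"],
    "Lycos": ["u2", "u3", "u5"],
    "Exalead": ["u6"],
}

overlaps, unique = compare_engines(results)
print(len(overlaps), "overlapping results")
for engine, urls in unique.items():
    print(engine, len(urls), "unique results")
```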
Slide 26:Specialized Search Engines
The Web Search Guide has a listing of specialized search engines
Chapter 3 of the Web companion to the textbook describes a variety of specialized engines
Explore chapter 3 and familiarize yourself with the engines described
Slide 27:Hands-on Activity
Find the answer or relevant information for these two queries using an appropriate specialized search engine:
Do squirrels hibernate?
Find me a list of foreign-owned companies based in the U.S., organized by state.