1 / 13

Chapter 5 Web Crawler & Search Engine

Chapter 5 Web Crawler & Search Engine . I. What is Web Crawler. A  Web crawler  is an  Internet bot  that systematically browses the  World Wide Web , typically for the purpose of  Web indexing (store in database easy to search).

gerry
Download Presentation

Chapter 5 Web Crawler & Search Engine

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 5 Web Crawler & Search Engine

  2. I. What is Web Crawler • A Web crawler is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (store in database easy to search). • Web search engines and some other sites use Web crawling or spidering software to update their web content or indexes of others sites' web content. Web crawlers can copy all the pages they visit for later processing by a search engine that indexes the downloaded pages so that users can search them much more quickly. • Crawlers can validate hyperlinks and HTML code. They can also be used for web scraping (see also data-driven programming).

  3. 1. What is Web Crawler (con.) • A Web crawler starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. • URLs from the frontier are recursively visited according to a set of policies. If the crawler is performing archiving of websitesit copies and saves the information as it goes. Such archives are usually stored such that they can be viewed, read and navigated as they were on the live web, but are preserved as ‘snapshots'

  4. 2. How Crawler works? • Tovisitpage : URL Queue • visitedpage : Extract

  5. Software of Crawler

  6. Process of Web Crawler with Search Engine

  7. Basic Website Layout

  8. 3. Technology for Web Crawler • A

  9. 4. Web Crawler Search Engine • A

  10. 5. How Web Crawler Search Engine works?

  11. 6. Money behind Search Engine • A

More Related