Web Categorization Crawler
Students: Mohammed Agabaria, Adam Shobash
Advisor: Victor Kulikov

"A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner."

• Crawler Overview:
• The crawler starts with a list of URLs to visit, called the seeds list.
• The crawler visits these URLs, identifies all the hyperlinks in each fetched page, and adds them to the list of URLs to visit, called the frontier.
• URLs from the frontier are visited recursively according to a predefined set of policies.

The main role of the categorizer is to classify the currently fetched page into the categories defined by the user. The crawler passes through all the categories and determines to which categories the page is ascribed.

[Architecture diagram: a Crawler Server runs crawl tasks; each task gets a scheduled URL, fetches the page from the Internet (downloading, then matching), and categorizes it into user-defined categories such as Category A, Category B, and Category C.]

This project implements a multi-threaded Web Categorization Crawler: it fetches pages from the Internet, extracts all the hyperlinks from each fetched page, and ranks every link according to its relevance. Each page is then categorized, and the results are saved in the database.
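The seeds-and-frontier loop described in the Crawler Overview can be sketched as follows. This is a minimal illustration, not the project's actual implementation: to stay self-contained it replaces real HTTP fetching and HTML parsing with a hard-coded link graph, and the names (`LINK_GRAPH`, `crawl`, `max_depth`) are assumptions introduced here.

```python
from collections import deque

# Hypothetical link graph standing in for the live web: "fetching"
# a URL returns the hyperlinks that would be found on that page.
LINK_GRAPH = {
    "http://a.example": ["http://b.example", "http://c.example"],
    "http://b.example": ["http://c.example", "http://d.example"],
    "http://c.example": [],
    "http://d.example": ["http://a.example"],
}

def crawl(seeds, max_depth=2):
    """Visit pages breadth-first starting from the seeds list.

    The frontier holds (url, depth) pairs; a seen-set and a
    maximum-depth limit act as simple crawl policies that keep
    the crawl from revisiting pages or looping forever.
    """
    frontier = deque((url, 0) for url in seeds)
    seen = set(seeds)
    visited = []
    while frontier:
        url, depth = frontier.popleft()
        visited.append(url)                  # "fetch" the page
        if depth >= max_depth:
            continue                         # policy: depth limit
        for link in LINK_GRAPH.get(url, []): # extracted hyperlinks
            if link not in seen:             # policy: visit once
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited

print(crawl(["http://a.example"]))
```

A real crawler would also apply politeness policies (robots.txt, per-host delays) and, as the project summary notes, rank frontier links by relevance rather than visiting them in plain FIFO order.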
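The categorizer pass, which checks the fetched page against every user-defined category and may ascribe it to several at once, could be sketched like this. The keyword-matching rule and the `CATEGORIES` table are assumptions made for illustration; the source does not specify how the project's matcher actually scores a page.

```python
# Hypothetical user-defined categories: category name -> keyword set.
CATEGORIES = {
    "Sports": {"football", "league", "match"},
    "Technology": {"software", "crawler", "server"},
    "Cooking": {"recipe", "oven"},
}

def categorize(page_text, categories):
    """Return every category whose keywords appear in the page text.

    The categorizer passes through all the categories, so a single
    page can be ascribed to more than one category.
    """
    words = set(page_text.lower().split())
    return sorted(name for name, keywords in categories.items()
                  if words & keywords)

print(categorize("The crawler server software fetched the match report",
                 CATEGORIES))
```

In the full system the resulting category labels, together with the page, would then be saved to the database.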