Internet Resources Discovery (IRD)


  1. Internet Resources Discovery (IRD) General Search Engines Development/Examples

  2. Search Engine Generations
     • 1st Generation - Basic SEs
     • 2nd Generation - Meta SEs
     • 3rd Generation - Popularity SEs
     T.Sharon-A. Frank

  3. 1st Generation SEs
     • Basic data about the websites against which queries are executed.
     • Directories that include basic indices, both general and specialized.
     • Website ranking based on page content.

  4. Vector Space Model
     • Documents and queries are represented as vectors.
     • The vector features are the words in the document or query, after stemming and stop-word removal.
     • The vectors are weighted to emphasize important terms.
     • The query vector is compared to each document vector. The documents closest to the query are considered similar and are returned.
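The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular engine: the slide only says the vectors are "weighted", so the TF-IDF weighting, the cosine measure of closeness, the tiny stop-word list, and the crude suffix-stripping `stem` helper are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of", "the", "to"}

def stem(word):
    """Crude suffix stripping, standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split into words, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf: rarer terms get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf

def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query, docs):
    """Return indices of matching documents, closest to the query first."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for score, i in sorted(scored, key=lambda p: (-p[0], p[1])) if score > 0]
```

Calling `search("information retrieval", docs)` returns document indices ordered by similarity; documents sharing no term with the query are left out.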

  5. Example of Computing Scores
     Document (d): "Information retrieval abstract. Meant to show how results are evaluated for all kinds of queries. There are two measures are recall and precision and they change if the evaluation method changes. Information retrieval is important! It is used a lot for search engines that store and retrieve a lot of information, to help us search the World Wide Web."
     [Table columns: Related Part, Document (d), w(t,d); cell values not preserved]

  6. Example of Computing Scores
     Score = 300 + 300 + 20 = 620
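The transcript keeps only the final sum, so the sketch below shows one plausible reading: the score as an inner product of query weights and document weights w(t,d), summed over shared terms. The terms and weights are hypothetical, chosen only so that the per-term products reproduce 300 + 300 + 20; they are not the values from the original slide.

```python
def score(query_weights, doc_weights):
    """Inner product over the terms the query and the document share."""
    return sum(w_q * doc_weights.get(term, 0) for term, w_q in query_weights.items())

# Hypothetical weights, picked only so the per-term products match the
# 300 + 300 + 20 shown on the slide; the slide's real tables are not available.
query_weights = {"information": 10, "retrieval": 10, "search": 2}
doc_weights = {"information": 30, "retrieval": 30, "search": 10, "web": 5}

# Contribution of each query term to the final score.
per_term = {t: w * doc_weights.get(t, 0) for t, w in query_weights.items()}
total = score(query_weights, doc_weights)
```

Terms that appear only in the document (here `web`) contribute nothing, because the query gives them no weight.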

  7. AltaVista’s Search Ranking
     • Prominence: how close the keywords are to the start of the page or the start of a sentence (also title/heading/bottom).
     • Proximity: how close the keywords are to each other.
     • Density and Frequency:
       • Density: the proportion (%) of keywords relative to the other text.
       • Frequency: the number of times the keywords occur within the text.
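The four factors can be turned into simple numeric features. The formulas below are assumptions made for illustration, since the slide names the factors but not how they were computed or combined. The function name `ranking_features` is hypothetical, and keywords are treated as single words.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ranking_features(page_text, keywords):
    """Compute illustrative prominence, proximity, density and frequency values."""
    tokens = tokenize(page_text)
    keys = {k.lower() for k in keywords}
    positions = [i for i, tok in enumerate(tokens) if tok in keys]
    if not positions:
        return {"prominence": 0.0, "proximity": 0.0, "density": 0.0, "frequency": 0}
    n = len(tokens)
    # Prominence: 1.0 when a keyword is the first word, falling as it appears later.
    prominence = 1.0 - positions[0] / n
    # Proximity: inverse of the smallest gap between two different keywords.
    gaps = [j - i for i, j in zip(positions, positions[1:]) if tokens[i] != tokens[j]]
    proximity = 1.0 / min(gaps) if gaps else 0.0
    # Frequency: raw keyword count. Density: that count relative to all words.
    frequency = len(positions)
    density = frequency / n
    return {"prominence": prominence, "proximity": proximity,
            "density": density, "frequency": frequency}
```

A ranking function would then combine these features, for example as a weighted sum. The weights are a design choice the slide does not specify.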

  8. 2nd Generation SEs
     • Several SEs are queried in parallel.
     • The results are filtered, ranked and presented to the user as a unified list.
     • The ranking combines the number of sources in which each page appeared with its ranking in each source.
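One simple way to combine "number of sources" and "ranking in each source" is to sum reciprocal ranks, so a page gains from every engine that returns it and gains more when it is placed higher. This is an assumed scoring rule used for illustration, not the formula of any specific meta search engine. The engine names and URLs in the usage note are placeholders.

```python
from collections import defaultdict

def merge_results(result_lists, top_k=10):
    """Merge ranked lists. result_lists maps engine name -> URLs, best first."""
    rank_score = defaultdict(float)
    sources = defaultdict(set)
    for engine, urls in result_lists.items():
        for rank, url in enumerate(urls, start=1):
            if engine in sources[url]:
                continue  # ignore duplicates inside one engine's list
            sources[url].add(engine)
            rank_score[url] += 1.0 / rank  # reciprocal rank: 1, 1/2, 1/3, ...
    # Highest combined score first; ties go to the page found by more engines.
    merged = sorted(rank_score, key=lambda u: (-rank_score[u], -len(sources[u]), u))
    return merged[:top_k]
```

For example, with `{"engine_a": ["x.com", "y.com", "z.com"], "engine_b": ["y.com", "x.com"], "engine_c": ["y.com", "w.com"]}`, the page `y.com` comes first because all three engines return it and two of them rank it first.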

  9. Meta SE is a Meta-Service
     • It doesn’t use an index/database of its own.
     • It uses external search services that provide the information necessary to fulfill user queries.
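Because a meta search engine has no index of its own, its core is a dispatcher that forwards the query to external services and collects whatever comes back. The sketch below uses Python's standard `concurrent.futures` thread pool. The three backend functions are hypothetical stand-ins for real HTTP calls, and one of them fails on purpose to show that an unavailable service should not break the whole search.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in backends: a real meta search engine would send HTTP requests to
# external services here. Names and results are hypothetical.
def engine_a(query):
    return [f"a.example/{query}/1", f"a.example/{query}/2"]

def engine_b(query):
    return [f"b.example/{query}/1"]

def engine_down(query):
    raise TimeoutError("backend did not answer")

def meta_search(query, backends, timeout=5.0):
    """Send the query to every backend in parallel; skip the ones that fail."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception:
                results[name] = []  # an unavailable engine contributes nothing
    return results
```

The returned dictionary maps each backend name to its result list, ready to be filtered and merged into a single ranking.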

  10. Meta Search Engine
      [Diagram: MetaCrawler querying Yahoo, WebCrawler, Open Text, Lycos, InfoSeek, Inktomi, Galaxy and Excite]

  11. Premises of MetaCrawler
      • No single search engine is sufficient.
      • Expressing the query is itself a problem.
      • Low-quality references can be detected.

  12. Search Service - Motivation
      • The number and variety of SEs.
      • Each SE provides an incomplete snapshot of the Web.
      • Users are forced to try and retry their queries across different SEs.
      • Each SE has its own interface.
      • Irrelevant, outdated or unavailable responses.
      • There is no time for intelligence.
      • Each query is independent.
      • No individual customization.
      • The results are not homogenized.

  13. Problems
      • No advanced search options.
      • Using the lowest common denominator.
      • Sponsored results from the SEs are not highlighted.

  14. 3rd Generation SEs
      • Emphasis on a wide variety of services.
      • Higher quality.
      • Faster search.
      • Usually rely mainly on external, "out of page" information.
      • Better ranking methods.

  15. Google
      • Ranks websites according to the number of links from other pages.
      • Increases the ranking based on page characteristics (keywords).
      • Disadvantage: new pages are unlikely to appear on the results page, because it takes time for them to get linked (sandboxing).
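The slide describes link-based ranking only in outline. A common formalization is the PageRank iteration, sketched below. Here a page's rank depends on the ranks of the pages linking to it, not only on how many there are. The damping factor of 0.85, the fixed iteration count, and the handling of pages without outgoing links are conventional choices assumed for this example, not details taken from the slide.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank. links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank
```

With `links = {"a": ["c"], "b": ["c"], "c": ["a"], "new": ["c"]}`, page `c` ends up with the highest rank, while `new`, which nothing links to yet, keeps only the baseline share. This matches the disadvantage the slide notes for new pages.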

  16. AskJeeves (Teoma)
      • Tries to direct the searcher exactly to the page that answers the question.
      • When it cannot find something suitable in its own resources, it directs the searcher to other sites using additional SEs.
      • Uses a natural language interface.

  17. DirectHit (1)
      • Allows users, rather than search engines or directory editors, to organize search results.
      • For each query, saves the websites that users chose from the results page (websites list).
      • Over time, learns the popular pages for each query.

  18. DirectHit (2)
      • Calculates Click Popularity and Stickiness.
      • Click popularity is a measure of the number of clicks received by each site on the results page.
      • Stickiness is a measure of the amount of time a user spends at a site. It is calculated from the time that elapses between each of the user's clicks on the search engine's results page.
      • Clicks on low-scoring sites are given more weight.
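Both measures can be derived from a log of clicks on the results page. In the sketch below the log format, the function name `click_stats`, and the `log2(1 + position)` weighting are assumptions for illustration. The slide says only that clicks on low-scoring sites count more, not by how much. The time spent at a site is estimated as the gap until the user's next click on the results page, so the last click of a session yields no estimate.

```python
import math
from collections import defaultdict

def click_stats(sessions):
    """Compute click popularity and stickiness per site.

    sessions: list of click lists; each click is (seconds, site, position),
    where position is the site's 1-based place on the results page.
    """
    popularity = defaultdict(float)
    dwell = defaultdict(list)
    for clicks in sessions:
        ordered = sorted(clicks)  # chronological order within the session
        for i, (t, site, position) in enumerate(ordered):
            # A click on a lower-placed site counts more (assumed weighting).
            popularity[site] += math.log2(1 + position)
            if i + 1 < len(ordered):
                # Time until the next results-page click approximates the visit.
                dwell[site].append(ordered[i + 1][0] - t)
    stickiness = {s: sum(d) / len(d) for s, d in dwell.items()}
    return dict(popularity), stickiness
```

Sites whose only clicks were the last in their sessions appear in the popularity table but not in the stickiness table, since no later click bounds the visit.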

  19. Problems
      • New websites will not get a high ranking, because most searchers visit only a limited number of websites (usually the first three).
      • Spamming:
        • Programs can search for a certain keyword, find a company's site and click on it.
        • After remaining on the site for a specified amount of time, the program goes back and repeats the process.

  20. Problem: Quality/Reliability Tsunami?

  21. Some Evaluation Techniques
      • Who wrote the page
      • Use info: or look at “about us”
      • Check how popular/authoritative
      • When was it written
      • Other indicators:
        • Why was the page written
        • References/bibliography
        • Links to other resources

  22. Tools for Checking Quality
      • Toolbars (Google, Alexa)
      • Backward links (Google)
      • PRSearch.net http://www.prsearch.net/inbx.php
      • Internet SEs FAQ http://www.internet-search-engines-faq.com/find-page-rank.shtml

  23. Google and Alexa Toolbars

  24. Example: Google’s PageRank

  25. Alexa Toolbar Info

  26. References
      • http://www.searchengines.com/ranking_factors.html
      • http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Evaluate.html
      • http://gateway.lib.ohio-state.edu/tutor/les1/index.html
      • http://www2.widener.edu/Wolfgram-Memorial-Library/webevaluation/webeval.htm
