Challenges in Web Search

Amit Singhal. An overview of open problems in web search, grouped into crawling and indexing (freshness; coverage, including page selection and the deep web) and search (adversarial IR and trust; evaluation; partitioning the query space).

Presentation Transcript


  1. Challenges in Web Search (Amit Singhal)

  2. Web Search
     • Crawl, Index, Search
     • Crawl and Index
        • freshness
        • coverage (page selection, deep web)
     • Search
        • adversarial IR, trust
        • evaluation
        • partitioning the query space

  3. Crawl and Index
     • Freshness
        • pages are deleted, created, and changed
        • How to keep the index fresh?
     • Coverage
        • which 2.5B pages to index?
        • a lot of useful information lives in databases
        • How to index “hidden” content?
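
One common way to reason about the freshness question, and an assumption here rather than something stated on the slide, is to model each page's changes as a Poisson process with rate λ (changes per day). The indexed copy is then unchanged t days after a crawl with probability e^(-λt), and its average freshness over a recrawl interval I is (1 - e^(-λI)) / (λI). The sketch below computes both; the page names and change rates in the demo are hypothetical, not measured data.

```python
import math

def prob_unchanged(change_rate_per_day: float, days_since_crawl: float) -> float:
    """Probability the indexed copy still matches the live page,
    assuming (hypothetically) Poisson-distributed changes with the given rate."""
    return math.exp(-change_rate_per_day * days_since_crawl)

def avg_freshness(change_rate_per_day: float, recrawl_interval_days: float) -> float:
    """Time-averaged freshness over one recrawl interval I:
    (1/I) * integral_0^I exp(-lambda*t) dt = (1 - exp(-lambda*I)) / (lambda*I)."""
    x = change_rate_per_day * recrawl_interval_days
    if x == 0:
        return 1.0  # a page that never changes is always fresh
    return -math.expm1(-x) / x  # expm1 keeps precision for small x

if __name__ == "__main__":
    # Hypothetical change rates (changes per day), for illustration only.
    pages = {"news front page": 1.0, "blog post": 0.05, "archived paper": 0.001}
    for name, rate in pages.items():
        print(f"{name}: weekly recrawl keeps it fresh {avg_freshness(rate, 7):.1%} of the time")
```

Under this model, a page that changes about once a day and is recrawled weekly is fresh only about 14% of the time, which is why a crawler with a fixed budget tends to spend more of it on fast-changing pages.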

  4. Search
     • Adversarial IR
        • all useful signals are spammed

  5. Search
     • Trust
        • How much can we trust a site?
        • an article hosted at BBC is much more trustworthy than the same article hosted at yet-another-news-company.com
        • How trustworthy is a site, and how to use this information in ranking?

  6. Search
     • Evaluation
        • the collection changes continuously
           • relevant pages become non-relevant, and vice versa
           • can’t easily freeze a copy
           • relevance is a function of rendering: need all images, all redirects, CSS, …
           • linkage characteristics change over time
        • the query space is huge (over 150M queries/day)
           • most popular query: 0.037% of traffic; 10th most popular: 0.011%
           • need a very large query set, which is expensive
        • How to evaluate given a changing collection and a very big query space?
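
The slide's traffic shares are easier to judge as absolute counts. The sketch below converts them using the slide's "over 150M/day" figure as a lower bound; exact fractions avoid floating-point rounding in the percentages.

```python
from fractions import Fraction

# Lower bound taken from the slide ("over 150M/day").
DAILY_QUERIES = 150_000_000

def queries_per_day(share_percent: str, daily_total: int = DAILY_QUERIES) -> Fraction:
    """Convert a percentage share of daily traffic (given as a decimal string)
    into an exact query count."""
    return daily_total * Fraction(share_percent) / 100

if __name__ == "__main__":
    for label, share in [("most popular query", "0.037"), ("10th most popular query", "0.011")]:
        print(f"{label}: at least {int(queries_per_day(share)):,} queries/day")
```

This prints at least 55,500 queries/day for the most popular query and at least 16,500 for the 10th. Even the top query is a tiny slice of daily traffic, so a handful of head queries says little about overall quality; an evaluation set has to sample deep into the long tail.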

  7. Search
     • Ranking in a huge query space
        • specific methods work well for specific query types
           • e.g., strong proximity helps for people’s names
        • identify the query type and use type-specific ranking algorithms
        • How to partition the query space into meaningful and useful partitions?
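
The "identify the query type, then use a type-specific ranker" idea can be sketched as a dispatch table. This is a toy illustration under stated assumptions, not the method behind the slides: `classify_query` uses a hypothetical first-name lexicon in place of a real query classifier, proximity is measured as the shortest token window covering all query terms, and the 0.2/0.8 weights in `name_scorer` are arbitrary.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical toy lexicon; a real system would use a trained query classifier.
KNOWN_FIRST_NAMES = {"amit", "arthur"}

def classify_query(terms: Sequence[str]) -> str:
    """Toy classifier: a two-term query starting with a known first name is
    treated as a person-name query; everything else falls back to 'default'."""
    if len(terms) == 2 and terms[0] in KNOWN_FIRST_NAMES:
        return "person_name"
    return "default"

def min_cover_span(doc: Sequence[str], terms: Sequence[str]) -> Optional[int]:
    """Length in tokens of the shortest window of `doc` containing every query
    term (two-pointer sliding window), or None if some term never occurs."""
    needed = set(terms)
    if not needed:
        return 0
    counts: Dict[str, int] = {}
    covered, left, best = 0, 0, None
    for right, tok in enumerate(doc):
        if tok in needed:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers every term.
        while covered == len(needed):
            span = right - left + 1
            if best is None or span < best:
                best = span
            if doc[left] in needed:
                counts[doc[left]] -= 1
                if counts[doc[left]] == 0:
                    covered -= 1
            left += 1
    return best

def term_overlap(terms: Sequence[str], doc: Sequence[str]) -> float:
    """Default scorer: fraction of distinct query terms present in the document."""
    needed = set(terms)
    return len(needed & set(doc)) / len(needed) if needed else 0.0

def proximity(terms: Sequence[str], doc: Sequence[str]) -> float:
    """1.0 when the terms are adjacent, shrinking as the covering window grows."""
    span = min_cover_span(doc, terms)
    return 0.0 if span is None or span == 0 else len(set(terms)) / span

def name_scorer(terms: Sequence[str], doc: Sequence[str]) -> float:
    """Person-name scorer: weights proximity heavily (arbitrary weights)."""
    return 0.2 * term_overlap(terms, doc) + 0.8 * proximity(terms, doc)

RANKERS: Dict[str, Callable[[Sequence[str], Sequence[str]], float]] = {
    "default": term_overlap,
    "person_name": name_scorer,
}

def rank(query: str, docs: List[List[str]]) -> List[int]:
    """Classify the query, then order document indices by the type-specific score."""
    terms = query.lower().split()
    scorer = RANKERS[classify_query(terms)]
    scores = [scorer(terms, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

if __name__ == "__main__":
    d1 = "amit singhal gave a talk on web search".split()
    d2 = "amit spoke about ranking while a colleague named singhal chaired".split()
    print(rank("amit singhal", [d2, d1]))
```

Both example documents contain both query terms, so plain term overlap scores them equally. The person-name ranker separates them because the terms are adjacent in `d1` (window of 2 tokens) but 9 tokens apart in `d2`, so the demo prints `[1, 0]`. Keeping rankers in a table makes adding a new query partition a matter of one classifier rule and one scorer.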

  8. Web Search
     • How to keep the index fresh?
     • How to index “hidden” content?
     • How trustworthy is a site, and how to use this information in ranking?
     • How to evaluate given a changing collection and a very big query space?
     • How to partition the query space into meaningful and useful partitions?

     “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Sir Arthur Conan Doyle (1859-1930)
