Challenges in Web Information Retrieval (2004)
1. Information Retrieval on the Web • Identify the quality of pages • PageRank and HITS • Variants of these • Anchor text • An indication of the context of the web page
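As a rough illustration of the link-based quality scoring named on this slide, here is a minimal power-iteration sketch of PageRank. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy link graph are assumptions made for the example; they are not taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical toy representation of the web graph).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution over pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly among its out-links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)
```

In the toy graph, page "c" is linked from both "a" and "b", so it ends up with a higher score than "b", which is linked only from "a".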
1. Information Retrieval on the Web • Adversarial Classification: Dealing with Spam on the Web • Text spam and link spam • Adversarial classification • Evaluating Search Results • TREC • Using clickthrough data • Principled automated means for large-scale evaluation of ranking results
2. Using the Web to Create “Kernels” of Meaning • Relatedness of fragments of text • A real-valued kernel function K(x,y) • Utilize external resources, such as search engines • Query expansion, QE(x) and QE(y) • Compute the cosine between QE(x) and QE(y) • Open research issues • Effective algorithms for certain tasks • Identify poor expansions
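The steps on this slide (expand each text fragment through a search engine, then take the cosine between the two expansions) can be sketched as follows. This is a simplified illustration: the `search` callable is a hypothetical stand-in for a real search engine that returns result snippets, and raw term counts are used as the expansion vector, which is an assumption rather than the weighting the authors used.

```python
import math
from collections import Counter


def expansion_vector(snippets):
    """Build QE(x) as a bag-of-words vector over the returned snippets."""
    counts = Counter()
    for snippet in snippets:
        counts.update(snippet.lower().split())
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def kernel(x, y, search):
    """K(x, y): cosine between the query expansions of x and y.

    `search` is a hypothetical function mapping a text fragment to a
    list of snippet strings.
    """
    return cosine(expansion_vector(search(x)), expansion_vector(search(y)))


# Toy stand-in for a search engine (made-up snippets, illustration only).
fake_results = {
    "ai": ["artificial intelligence research"],
    "artificial intelligence": ["artificial intelligence research lab"],
    "pizza": ["cheese tomato"],
}


def fake_search(query):
    return fake_results.get(query, [])
```

With this toy data, `kernel("ai", "artificial intelligence", fake_search)` is high even though the two fragments share no words, while `kernel("ai", "pizza", fake_search)` is zero. A fragment whose expansion is empty or off-topic gives a misleading score, which is the "identify poor expansions" issue listed on the slide.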
3. Retrieval of UseNet Articles • Newsgroups and documents • Compute the inherent quality of an author • Netscan project • Ranking methods
4. Retrieval of Images and Sounds • Retrieve images and sounds • Content detection • Content similarity assessment • Using surrounding textual information • Other • Near-duplicates • Ranking • Video retrieval
5. Harnessing Vast Quantities of Data • Spelling correction • Probabilistic context-sensitive model for spelling correction • “Mehran Sahami”, “Tehran”, “Mehran Salhami” • Query classification into the Open Directory Project (ODP) • Short queries and ODP • A variety of different approaches • Enough training data makes up for weaker modeling techniques
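The “Mehran Sahami” example on this slide is about context: a word can look like a typo in isolation yet be correct inside a longer query. The sketch below is not the probabilistic model the slide refers to. It is a much simpler frequency-weighted string-similarity lookup, with a hypothetical `phrase_counts` table standing in for query-log statistics and made-up counts, meant only to show how the whole query can change the correction.

```python
from difflib import SequenceMatcher


def correct(query, phrase_counts, min_similarity=0.8):
    """Return the known phrase that best explains the whole query.

    phrase_counts: hypothetical mapping of known phrases to how often
    they were observed (for example in a query log).
    Candidates are scored by string similarity times frequency; the
    query is returned unchanged if nothing is similar enough.
    """
    best, best_score = query, 0.0
    for phrase, count in phrase_counts.items():
        similarity = SequenceMatcher(None, query.lower(), phrase.lower()).ratio()
        if similarity < min_similarity:
            continue  # too different to be a plausible correction
        score = similarity * count
        if score > best_score:
            best, best_score = phrase, score
    return best


# Made-up counts, for illustration only.
toy_counts = {"tehran": 1000, "mehran sahami": 50}

alone = correct("mehran", toy_counts)
in_context = correct("mehran salhami", toy_counts)
```

With these toy counts, the isolated word "mehran" is pulled toward the frequent "tehran", while the full query "mehran salhami" is matched to "mehran sahami" and "mehran" is left alone. A real system would estimate such preferences from far larger data, which is the point of the slide's last bullet.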
Appendix: Challenges in Search Engines (2002) • Spam • Content quality • Quality evaluation • Web conventions • Duplicate hosts • Vaguely structured data