Web Information Retrieval (Web IR)
Handout #13: Ranking Based on User Behavior
Ali Mohammad Zareh Bidoki, ECE Department, Yazd University
alizareh@yaduni.ac.ir
Finding Ranking Function
• R = f(query, user behavior, web graph & content features)
• How can we use the user behavior?
  • Explicit feedback
  • Implicit feedback
• About 80% of user clicks are related to the query
• Click-through data
  • Collected from search engine logs
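One way to make R = f(query, user behavior, web graph & content features) concrete is a weighted combination of per-feature scores. A minimal Python sketch, with hypothetical feature names and weights that are not from the handout:

```python
# Minimal sketch of a ranking function R = f(query, user behavior, web graph & content).
# Feature names and weights are hypothetical, for illustration only.

def rank_score(content_score, click_score, link_score,
               w_content=0.5, w_click=0.3, w_link=0.2):
    """Combine content, user-behavior, and web-graph evidence into a single score."""
    return w_content * content_score + w_click * click_score + w_link * link_score

# Example: a document with a strong content match, some click evidence
# for this query, and moderate link authority.
print(rank_score(content_score=0.8, click_score=0.6, link_score=0.4))
```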
Click-through Data
• Triple (q, r, c)
  • q = query
  • r = ranked list of results
  • c = set of clicked documents
[Figure: click-through data (q, r, c), after Joachims]
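As a minimal sketch, the triple can be represented directly as a data structure; the field values below are made-up examples, not taken from an actual search engine log:

```python
# Minimal sketch of one click-through record (q, r, c), following Joachims' triple.
# The example values are hypothetical.
from dataclasses import dataclass

@dataclass
class ClickThrough:
    query: str        # q: the submitted query
    ranking: list     # r: the ranked list of returned document ids
    clicked: set      # c: the subset of documents the user clicked

record = ClickThrough(
    query="yazd university",
    ranking=["d1", "d2", "d3", "d4"],
    clicked={"d2", "d4"},
)
```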
Benefits of Using Click-through Data
• Democracy on the Web
• Fills the gap between user needs and returned results
• User clicks are more valuable than page content (search engine precision is judged by users, not by page creators)
• The degree of relevancy between a query and documents increases (click metadata is added to the document)
[Figure: web entities — documents, words, queries, and users — and the web graph connecting them]
Document Expansion Using Click-through Data
• Google first used anchor text as part of a document's content
• Anchor text is a view of a document from another document
• Similarly, the queries for which a document is clicked can be added to it as user-provided metadata
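A minimal sketch of document expansion with click metadata, assuming the simplest scheme of appending clicked-query terms to the document's indexed text (the weighting actually used is not given in the handout):

```python
# Minimal sketch: expand a document with the queries for which it was clicked,
# analogous to expanding it with anchor text. The flat concatenation is an assumption.

def expand_document(doc_text, clicked_queries):
    """Append clicked-query terms to the document as extra click metadata."""
    click_metadata = " ".join(clicked_queries)
    return doc_text + " " + click_metadata

doc = "Yazd University, ECE Department, course pages ..."
expanded = expand_document(doc, ["yazd university ece", "web information retrieval"])
print(expanded)
```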
Long-term Incremental Learning
• D_i: the vector of a document in the i-th iteration
• Q: the vector of a query for which this document is clicked
• α (alpha): the learning rate
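The update rule itself does not survive in this text; a plausible reconstruction from these definitions, stated as an assumption rather than the slide's exact formula, is an exponentially weighted move of the document vector toward each clicking query:

D_{i+1} = (1 - \alpha) D_i + \alpha Q, \quad 0 < \alpha < 1

With each click, the document representation drifts toward the queries for which users consider it relevant, which is the long-term incremental learning effect described above.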
Naïve Method (NM)
• A bipartite graph of documents and queries
• M_ij: the number of clicks on document j for query i
Naïve Method (Cont.)
• The weight between query q_j and document d_i:
• The metadata for document d_i:
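The two formulas were on the original slide but are not present in this text; a plausible form, consistent with the definition of M_ij (clicks on document j for query i) and given here as an assumption, normalizes clicks per query and attaches the weighted queries to the document as metadata:

W(q_j, d_i) = \frac{M_{ji}}{\sum_k M_{jk}}

meta(d_i) = \{ (q_j, W(q_j, d_i)) : M_{ji} > 0 \}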
Co-Visited Method
• If two pages are clicked for the same query, they are called co-visited.
• The similarity between two documents d_i and d_j is defined below, where visited(d_i) is the number of clicks on d_i and visited(d_i, d_j) is the number of queries for which both are clicked:
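The formula itself is not present in this text; the usual co-visited similarity built from these counts is a Jaccard-style ratio, given here as an assumption rather than a quote from the slide:

sim(d_i, d_j) = \frac{visited(d_i, d_j)}{visited(d_i) + visited(d_j) - visited(d_i, d_j)}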
Co-Visited Disadvantages
• It only considers document similarity, not query similarity
• Since users mostly click within the top 10 results, click data are sparse (about 1.5 queries per page)
• So the similarity estimates are not precise
Iterative Method (IM)
• O(q): the set of pages clicked for query q
• O_i(q): the i-th page clicked for query q
• I(d): the set of queries for which document d is clicked
• I_i(d): the i-th query for which document d is clicked
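The iterative equations themselves are not in this text; with this notation they typically take a SimRank-style mutually recursive form (the exact formula on the slide, and the decay constant C, are assumptions here):

sim_{k+1}(q, q') = \frac{C}{|O(q)| \, |O(q')|} \sum_{i=1}^{|O(q)|} \sum_{j=1}^{|O(q')|} sim_k(O_i(q), O_j(q'))

sim_{k+1}(d, d') = \frac{C}{|I(d)| \, |I(d')|} \sum_{i=1}^{|I(d)|} \sum_{j=1}^{|I(d')|} sim_k(I_i(d), I_j(d'))

starting from sim_0(x, x) = 1 and sim_0(x, y) = 0 for x ≠ y, and alternating between query-query and document-document similarities until convergence.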
Experimental Results
• Experiments on a large real-world click-through log (MSN query log data) indicate that, in terms of precision at the top 20 results, the proposed algorithm outperforms:
  • the baseline search system by 157%,
  • naïve query log mining by 17%, and
  • the co-visited algorithm by 17%.