
Interactive Retrieval Based on Faceted Feedback

This study explores the use of faceted feedback in personalized search systems and proposes a mechanism for users to provide feedback based on document facets. It also investigates different approaches to recommending facet-value pairs and compares the performance of Boolean and soft retrieval models.


Presentation Transcript


  1. Interactive Retrieval Based on Faceted Feedback (SIGIR '10) Lanbo Zhang, Yi Zhang 2010/09/06 Speaker: Hsu, Yu-wen

  2. Outline • Introduction • Faceted Feedback • Facet-value pair recommendation • Incorporate faceted feedback into retrieval • Experimental Methodology • Experimental Results • Conclusions

  3. Introduction • A personalized search or filtering system usually suffers from the "cold start" problem. • One remedy is to borrow information from other users. • Another is to develop user interaction mechanisms that collect more information from users, i.e., an interactive user feedback mechanism. • Goal: learn more about user information needs with limited user interaction.

  4. Faceted search • Documents have their own facets, either manually assigned or generated automatically. • Users could provide structured queries to describe their information needs, but they do so infrequently and sometimes incorrectly.

  5. Faceted feedback mechanism • A simple interactive user feedback mechanism based on document facets. • Faceted constraints are presented in the form of facet-value pairs. • Users can choose interesting facet-value pairs to improve the returned documents.

  6. Two major problems • The candidate facets and their possible values for products are usually designed manually; we investigate four approaches to recommending good facet-value pairs automatically. • Existing e-commerce websites often use a Boolean filtering strategy when retrieving products; we propose a soft retrieval model in which a document that satisfies a user-selected faceted constraint receives a certain amount of credit.

  7. Faceted Feedback • Facet-value pair recommendation • Each metadata field is called a facet, and a facet with a specific value is called a facet-value pair, e.g., language: Chinese, format: ppt, subject: IR, genre: comedy.

  8. [Diagram: initial query → baseline retrieval algorithm → ranked documents (Document 1 … Document N); facet-value pair candidates are drawn from the top K documents] • *Top Document Frequency (TDF) • Assumption: the more frequently a facet-value pair appears in the top-ranked documents, the more likely the user is to like it.
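
A minimal sketch of the TDF idea, assuming each retrieved document carries a dict of facet-value pairs; the function name and the `top_k` cutoff are illustrative, not from the paper:

```python
from collections import Counter

def tdf_scores(ranked_docs, top_k=10):
    """Count how often each facet-value pair appears in the top-ranked documents.

    ranked_docs: list of dicts (facet -> value), ordered by the baseline score.
    Higher count = more likely to be recommended under the TDF assumption.
    """
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        for facet, value in doc.items():
            counts[(facet, value)] += 1
    return counts

# Example usage: recommend the most frequent pairs among the top documents
docs = [{"language": "Chinese", "format": "ppt"},
        {"language": "Chinese", "subject": "IR"},
        {"language": "English", "subject": "IR"}]
print(tdf_scores(docs, top_k=3).most_common(3))
```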

  9. *TDF-IDF • f: a facet-value pair • TDF(f, q): the top document frequency of f for query q • Each pair is scored by its top document frequency weighted by its inverse document frequency, TDF(f, q) · IDF(f).
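
A sketch of TDF-IDF scoring under the reading above; the dictionary of collection document frequencies per facet-value pair is an assumed input:

```python
import math
from collections import Counter

def tdf_idf_scores(ranked_docs, collection_df, n_docs, top_k=10):
    """Score each facet-value pair by TDF(f, q) * IDF(f).

    collection_df: dict (facet, value) -> number of documents in the whole
                   collection carrying that pair (used for IDF).
    n_docs: total number of documents in the collection.
    """
    tdf = Counter()
    for doc in ranked_docs[:top_k]:
        for pair in doc.items():
            tdf[pair] += 1
    return {pair: count * math.log(n_docs / collection_df[pair])
            for pair, count in tdf.items()}
```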

  10. *Query Likelihood (QL) • c(w, Q): the frequency of term w in the query Q • p(w|d): a translation model of document d • C: the whole corpus • p(d|f): assumed to be uniform over all documents that contain the facet-value pair f • The facet-value pairs with the largest query likelihoods are chosen as the candidates.
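
The slide's formula symbols were lost in transcription; the sketch below is one hedged reading of the query-likelihood idea, where P(Q|f) is estimated by averaging smoothed document models over the documents containing the pair (p(d|f) uniform). The Jelinek-Mercer smoothing and the `lam` weight are assumptions, not details from the paper:

```python
import math
from collections import Counter

def query_likelihood(query_terms, docs_with_pair, corpus_lm, lam=0.5):
    """log P(Q | f): likelihood of the query given a facet-value pair f.

    docs_with_pair: list of term-frequency Counters, one per document that
                    contains the pair (p(d|f) taken as uniform over them).
    corpus_lm: dict of corpus-level term probabilities used for smoothing.
    """
    q_counts = Counter(query_terms)            # c(w, Q)
    log_p = 0.0
    for w, c_wq in q_counts.items():
        # p(w|f) = (1/|D_f|) * sum_d p(w|d), with p(w|d) smoothed by the corpus
        p_w = 0.0
        for tf in docs_with_pair:
            doc_len = max(sum(tf.values()), 1)
            p_w += (1 - lam) * tf[w] / doc_len + lam * corpus_lm.get(w, 1e-9)
        p_w /= max(len(docs_with_pair), 1)
        log_p += c_wq * math.log(max(p_w, 1e-12))
    return log_p
```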

  11. *TDF-QL • Combines the TDF and QL features. • Each feature is normalized before the combination. • S: the set of scores of all considered facet-value pairs, used for the normalization.
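
A sketch of combining the two features; the slide only says each feature is normalized over the set of scores of all considered pairs, so the min-max normalization and the equal weighting below are assumptions:

```python
def minmax_normalize(scores):
    """Scale a dict of scores to [0, 1] using the set of scores of all
    considered facet-value pairs."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {pair: (s - lo) / span for pair, s in scores.items()}

def tdf_ql_scores(tdf, ql):
    """Combine normalized TDF and QL scores (equal weights assumed)."""
    tdf_n, ql_n = minmax_normalize(tdf), minmax_normalize(ql)
    return {pair: tdf_n.get(pair, 0.0) + ql_n.get(pair, 0.0)
            for pair in set(tdf) | set(ql)}
```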

  12. Incorporate faceted feedback into retrieval • F: the set of facet-value pairs chosen by the user. • Scoring documents with the Boolean model: return the set of documents that satisfy the chosen faceted constraints, ranked by S0(d), the score of document d computed with a baseline ranking method (e.g., TF-IDF, BM25).
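
A sketch of the Boolean filtering step, assuming the chosen pairs act as a conjunction (every chosen pair must be satisfied); the function and argument names are illustrative:

```python
def boolean_rerank(doc_facets, baseline_scores, chosen_pairs):
    """Keep only documents that satisfy every user-chosen facet-value pair,
    then rank the survivors by the baseline score (e.g. TF-IDF or BM25).

    doc_facets: dict doc_id -> dict of facet -> value
    baseline_scores: dict doc_id -> baseline retrieval score S0(d)
    chosen_pairs: iterable of (facet, value) tuples chosen by the user
    """
    chosen = set(chosen_pairs)
    survivors = [d for d, facets in doc_facets.items()
                 if chosen <= set(facets.items())]
    return sorted(survivors, key=lambda d: baseline_scores[d], reverse=True)
```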

  13. Score documents by the Soft Model • A document's final score is its normalized original score plus credit for each satisfied faceted constraint, weighted per facet. • w: the weight of each facet, learned automatically • S0(d): the original score of document d • norm(·): the standard normalization
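
A sketch of the soft model's scoring as described above: the normalized baseline score plus per-facet credit for each satisfied constraint. How the per-facet weights are learned (3-fold cross-validation is mentioned on slide 18) is outside this sketch:

```python
def soft_score(doc_facet_values, norm_baseline_score, chosen_pairs, facet_weights):
    """Soft model: credit every satisfied facet-value constraint instead of
    filtering the document out.

    doc_facet_values: dict facet -> value for the document
    norm_baseline_score: the document's original score after standard normalization
    facet_weights: dict facet -> weight learned automatically; a satisfied pair
                   from facet g contributes facet_weights[g]
    """
    credit = sum(facet_weights.get(facet, 0.0)
                 for facet, value in chosen_pairs
                 if doc_facet_values.get(facet) == value)
    return norm_baseline_score + credit
```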

  14. Experimental Methodology • Datasets • OHSUMED dataset • 348,566 medical articles from 270 medical journals • topics are used as user information needs • the MeSH (Medical Subject Headings) metadata field provides the facets • RCV1 (Reuters Corpus Volume 1) • about 810,000 Reuters news stories • topic, geographical region, and industry metadata provide the facets • the first 50 topics of the TREC 2002 filtering track are used as user information needs

  15. *Evaluation Based on Mechanical Turk • Three workers work on each query.

  16. *Experimental settings • Compared against • baseline retrieval method: BM25, without feedback • pseudo relevance feedback (PRF) • real document relevance feedback (RRF) • Evaluation measures • Mean Average Precision (MAP) • Precision@N (P@N): the precision of the top N documents • Recall@N (R@N): the recall of the top N documents • Settings • = 10 • = 100 • = 0.5
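
Minimal sketches of the P@N and R@N measures listed above (a MAP example accompanies slide 23 below):

```python
def precision_at_n(ranked_ids, relevant_ids, n):
    """P@N: fraction of the top-N retrieved documents that are relevant."""
    return sum(1 for d in ranked_ids[:n] if d in relevant_ids) / n

def recall_at_n(ranked_ids, relevant_ids, n):
    """R@N: fraction of all relevant documents retrieved within the top N."""
    return sum(1 for d in ranked_ids[:n] if d in relevant_ids) / max(len(relevant_ids), 1)
```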

  17. Experimental Results • Overall performance of faceted feedback • 1, 2, 3 for OHSUMED; 4, 5, 6 for RCV1 • Soft Model

  18. *Boolean model vs. Soft model • OHSUMED, user 1; RCV1, user 6 • The parameter is learned with 3-fold cross-validation.

  19. Performance of different facet-value pair recommendation approaches on OHSUMED and RCV1. • PRF@5: pseudo relevance feedback using the top 5 documents • RRF@5: real document-based relevance feedback using the top 5 documents.

  20. Conclusions • We studied a user feedback mechanism based on faceted document metadata. • The Boolean model is inappropriate for metadata-based, general-purpose document retrieval. • The Soft model is more effective on both datasets, as it automatically learns a weight for each facet, which captures the facet's quality.

  21. What is IDF? • Inverse document frequency (IDF) measures how important a term is across the collection. The IDF of a term is obtained by dividing the total number of documents by the number of documents containing that term, and then taking the logarithm of the quotient. • Example: if "mining" appears in 100 documents and the collection contains 1,000 documents in total, then IDF = log(1000/100). back
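
The slide's numeric example, reproduced; the logarithm base is a convention (base 10 gives 1, the natural log gives about 2.3):

```python
import math

# "mining" appears in 100 of the 1,000 documents in the collection
print(math.log10(1000 / 100))   # 1.0 with log base 10
print(math.log(1000 / 100))     # ≈ 2.303 with the natural log
```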

  22. What is BM25? • BM (Best Match) is a relevance scoring function used in search: it measures how relevant a given document D is to a given query Q, with higher scores indicating higher relevance. back
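
For reference, a sketch of the standard Okapi BM25 scoring function; this is the textbook form with the usual default parameters k1 and b, not necessarily the exact variant used as the paper's baseline:

```python
import math

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Okapi BM25 score of a document D for a query Q.

    doc_tf: term -> frequency in the document
    df: term -> number of documents in the collection containing the term
    """
    score = 0.0
    for t in query_terms:
        f = doc_tf.get(t, 0)
        if f == 0 or t not in df:
            continue
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```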

  23. What is MAP? • MAP (Mean Average Precision): the average precision for a single topic is the mean of the precision values computed after each relevant document is retrieved. MAP over a set of topics is the mean of the per-topic average precisions. MAP is a single-value measure that reflects system performance over all relevant documents: the higher the relevant documents are ranked, the higher the MAP. If the system returns no relevant documents, the precision defaults to 0. • Example: suppose there are two topics; topic 1 has 4 relevant pages and topic 2 has 5. A system retrieves 4 relevant pages for topic 1, at ranks 1, 2, 4, 7, and 3 relevant pages for topic 2, at ranks 1, 3, 5. For topic 1, average precision = (1/1 + 2/2 + 3/4 + 4/7)/4 ≈ 0.83. For topic 2, average precision = (1/1 + 2/3 + 3/5 + 0 + 0)/5 ≈ 0.45. MAP = (0.83 + 0.45)/2 = 0.64. back
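
A small sketch that reproduces the slide's worked example; the ranks of the retrieved relevant documents are given as 1-based positions:

```python
def average_precision(relevant_ranks, n_relevant):
    """Mean of the precision values after each retrieved relevant document;
    relevant documents that were never retrieved contribute 0."""
    precisions = [(i + 1) / rank for i, rank in enumerate(sorted(relevant_ranks))]
    return sum(precisions) / n_relevant

ap1 = average_precision([1, 2, 4, 7], 4)   # ≈ 0.83
ap2 = average_precision([1, 3, 5], 5)      # ≈ 0.45
print(round((ap1 + ap2) / 2, 2))           # MAP ≈ 0.64
```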

  24. Pseudo Relevance Feedback [diagram: initial query → baseline retrieval algorithm → ranked documents (Document 1 … Document N); the top K documents are assumed to be relevant and used to produce a new ranking] back
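
A sketch of the pseudo relevance feedback loop shown on this slide: the top K documents are assumed relevant and used to revise the query. The simple term-expansion step below is just one common way to use them; the slides do not specify the exact feedback method behind PRF@5:

```python
from collections import Counter

def pseudo_relevance_expand(query_terms, ranked_doc_tfs, k=5, n_expansion_terms=5):
    """Expand the query with frequent terms from the top-K retrieved documents,
    which are assumed to be relevant (no user judgment involved)."""
    pool = Counter()
    for tf in ranked_doc_tfs[:k]:
        pool.update(tf)
    for t in query_terms:          # keep only terms not already in the query
        pool.pop(t, None)
    expansion = [t for t, _ in pool.most_common(n_expansion_terms)]
    return list(query_terms) + expansion
```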

  25. Real Document Relevance Feedback [diagram: initial query → baseline retrieval algorithm → ranked documents (Document 1 … Document N); the user decides which documents are relevant, and those judgments are used to produce a new ranking] back
