
Interactive Retrieval Based on Faceted Feedback

This study explores the use of faceted feedback in personalized search systems and proposes a mechanism for users to provide feedback based on document facets. It also investigates different approaches to recommending facet-value pairs and compares the performance of Boolean and soft retrieval models.


Presentation Transcript


  1. Interactive Retrieval Based on Faceted Feedback (SIGIR '10) Lanbo Zhang, Yi Zhang 2010/09/06 Speaker: Hsu, Yu-wen

  2. Outline • Introduction • Faceted Feedback • Facet-value pair recommendation • Incorporate faceted feedback into retrieval • Experimental Methodology • Experimental Results • Conclusions

  3. Introduction • A personalized search or filtering system usually suffers from the "cold start" problem. • One remedy is to borrow information from other users. • Another is to develop user interaction mechanisms that collect more information from users, i.e., an interactive user feedback mechanism. • Goal: learn more about user information needs with limited user interaction.

  4. Faceted search • Documents have their own facets, either manually assigned or generated automatically. • Users could provide structured queries to describe their information needs, but they do so infrequently and sometimes incorrectly.

  5. Faceted feedback mechanism • A simple interactive user feedback mechanism based on document facets. • Faceted constraints are presented in the form of facet-value pairs. • Users can choose interesting facet-value pairs to improve the returned documents.

  6. Two major problems • The candidate facets and their possible values for products are usually designed manually; we investigate four approaches to recommending good facet-value pairs automatically. • Existing e-commerce websites often use a Boolean filtering strategy when retrieving products; we propose a soft retrieval model in which a document that satisfies a user-selected faceted constraint receives a certain amount of credit.

  7. Faceted Feedback • Facet-value pair recommendation • Each metadata field is called a facet, and a facet with a specific value is called a facet-value pair, e.g., language: Chinese, format: ppt, subject: IR, genre: comedy.

  8. [Diagram: initial query → baseline retrieval algorithm → ranked documents (Document 1 … Document N); facet-value pair candidates are drawn from the top K documents] • *Top Document Frequency (TDF) • Assumption: the more frequently a facet-value pair appears in the top-ranked documents, the more likely the user is to like it.
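
A minimal sketch of the TDF idea, assuming each retrieved document carries a dict of facet-value pairs; the function name and the `top_k` cutoff are illustrative, not from the paper:

```python
from collections import Counter

def tdf_scores(ranked_docs, top_k=10):
    """Count how often each facet-value pair appears in the top-ranked documents.

    ranked_docs: list of dicts (facet -> value), ordered by the baseline score.
    Higher count = more likely to be recommended under the TDF assumption.
    """
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        for facet, value in doc.items():
            counts[(facet, value)] += 1
    return counts

# Example usage: recommend the most frequent pairs among the top documents
docs = [{"language": "Chinese", "format": "ppt"},
        {"language": "Chinese", "subject": "IR"},
        {"language": "English", "subject": "IR"}]
print(tdf_scores(docs, top_k=3).most_common(3))
```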

  9. *TDF-IDF • f: a facet-value pair • TDF(f, q): the top document frequency of f for query q • Each pair is scored by its top document frequency weighted by its inverse document frequency, TDF(f, q) · IDF(f).
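
A sketch of TDF-IDF scoring under the reading above; the dictionary of collection document frequencies per facet-value pair is an assumed input:

```python
import math
from collections import Counter

def tdf_idf_scores(ranked_docs, collection_df, n_docs, top_k=10):
    """Score each facet-value pair by TDF(f, q) * IDF(f).

    collection_df: dict (facet, value) -> number of documents in the whole
                   collection carrying that pair (used for IDF).
    n_docs: total number of documents in the collection.
    """
    tdf = Counter()
    for doc in ranked_docs[:top_k]:
        for pair in doc.items():
            tdf[pair] += 1
    return {pair: count * math.log(n_docs / collection_df[pair])
            for pair, count in tdf.items()}
```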

  10. *Query Likelihood (QL) • c(w, Q): the frequency of term w in the query Q • p(w|d): a translation model of document d • C: the whole corpus • p(d|f): assumed to be uniform over all documents that contain the facet-value pair f • The facet-value pairs with the largest query likelihoods are chosen as the candidates.
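
The slide's formula symbols were lost in transcription; the sketch below is one hedged reading of the query-likelihood idea, where P(Q|f) is estimated by averaging smoothed document models over the documents containing the pair (p(d|f) uniform). The Jelinek-Mercer smoothing and the `lam` weight are assumptions, not details from the paper:

```python
import math
from collections import Counter

def query_likelihood(query_terms, docs_with_pair, corpus_lm, lam=0.5):
    """log P(Q | f): likelihood of the query given a facet-value pair f.

    docs_with_pair: list of term-frequency Counters, one per document that
                    contains the pair (p(d|f) taken as uniform over them).
    corpus_lm: dict of corpus-level term probabilities used for smoothing.
    """
    q_counts = Counter(query_terms)            # c(w, Q)
    log_p = 0.0
    for w, c_wq in q_counts.items():
        # p(w|f) = (1/|D_f|) * sum_d p(w|d), with p(w|d) smoothed by the corpus
        p_w = 0.0
        for tf in docs_with_pair:
            doc_len = max(sum(tf.values()), 1)
            p_w += (1 - lam) * tf[w] / doc_len + lam * corpus_lm.get(w, 1e-9)
        p_w /= max(len(docs_with_pair), 1)
        log_p += c_wq * math.log(max(p_w, 1e-12))
    return log_p
```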

  11. *TDF-QL • Combines the TDF and QL features. • Each feature is normalized before the combination. • S: the set of scores of all considered facet-value pairs, used for the normalization.
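
A sketch of combining the two features; the slide only says each feature is normalized over the set of scores of all considered pairs, so the min-max normalization and the equal weighting below are assumptions:

```python
def minmax_normalize(scores):
    """Scale a dict of scores to [0, 1] using the set of scores of all
    considered facet-value pairs."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {pair: (s - lo) / span for pair, s in scores.items()}

def tdf_ql_scores(tdf, ql):
    """Combine normalized TDF and QL scores (equal weights assumed)."""
    tdf_n, ql_n = minmax_normalize(tdf), minmax_normalize(ql)
    return {pair: tdf_n.get(pair, 0.0) + ql_n.get(pair, 0.0)
            for pair in set(tdf) | set(ql)}
```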

  12. Incorporate faceted feedback into retrieval • F: the set of facet-value pairs chosen by the user. • Scoring documents with the Boolean model: return the set of documents that satisfy the chosen faceted constraints, ranked by S0(d), the score of document d computed with a baseline ranking method (e.g., TF-IDF, BM25).
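
A sketch of the Boolean filtering step, assuming the chosen pairs act as a conjunction (every chosen pair must be satisfied); the function and argument names are illustrative:

```python
def boolean_rerank(doc_facets, baseline_scores, chosen_pairs):
    """Keep only documents that satisfy every user-chosen facet-value pair,
    then rank the survivors by the baseline score (e.g. TF-IDF or BM25).

    doc_facets: dict doc_id -> dict of facet -> value
    baseline_scores: dict doc_id -> baseline retrieval score S0(d)
    chosen_pairs: iterable of (facet, value) tuples chosen by the user
    """
    chosen = set(chosen_pairs)
    survivors = [d for d, facets in doc_facets.items()
                 if chosen <= set(facets.items())]
    return sorted(survivors, key=lambda d: baseline_scores[d], reverse=True)
```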

  13. Score documents by the Soft Model • A document's final score is its normalized original score plus credit for each satisfied faceted constraint, weighted per facet. • w: the weight of each facet, learned automatically • S0(d): the original score of document d • norm(·): the standard normalization
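
A sketch of the soft model's scoring as described above: the normalized baseline score plus per-facet credit for each satisfied constraint. How the per-facet weights are learned (3-fold cross-validation is mentioned on slide 18) is outside this sketch:

```python
def soft_score(doc_facet_values, norm_baseline_score, chosen_pairs, facet_weights):
    """Soft model: credit every satisfied facet-value constraint instead of
    filtering the document out.

    doc_facet_values: dict facet -> value for the document
    norm_baseline_score: the document's original score after standard normalization
    facet_weights: dict facet -> weight learned automatically; a satisfied pair
                   from facet g contributes facet_weights[g]
    """
    credit = sum(facet_weights.get(facet, 0.0)
                 for facet, value in chosen_pairs
                 if doc_facet_values.get(facet) == value)
    return norm_baseline_score + credit
```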

  14. Experimental Methodology • Datasets • OHSUMED dataset • 348,566 medical articles from 270 medical journals • topics are used as user information needs • the MeSH (Medical Subject Headings) metadata field provides the facets • RCV1 (Reuters Corpus Volume 1) • about 810,000 Reuters news stories • topic, geographical region, and industry metadata provide the facets • the first 50 topics of the TREC 2002 filtering track are used as user information needs

  15. *Evaluation Based on Mechanical Turk • Three workers work on each query.

  16. *Experimental settings • Compared against • baseline retrieval method: BM25, without feedback • pseudo relevance feedback (PRF) • real document relevance feedback (RRF) • Evaluation measures • Mean Average Precision (MAP) • Precision@N (P@N): the precision of the top N documents • Recall@N (R@N): the recall of the top N documents • Settings • = 10 • = 100 • = 0.5
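
Minimal sketches of the P@N and R@N measures listed above (a MAP example accompanies slide 23 below):

```python
def precision_at_n(ranked_ids, relevant_ids, n):
    """P@N: fraction of the top-N retrieved documents that are relevant."""
    return sum(1 for d in ranked_ids[:n] if d in relevant_ids) / n

def recall_at_n(ranked_ids, relevant_ids, n):
    """R@N: fraction of all relevant documents retrieved within the top N."""
    return sum(1 for d in ranked_ids[:n] if d in relevant_ids) / max(len(relevant_ids), 1)
```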

  17. Experimental Results • Overall performance of faceted feedback • 1, 2, 3 for OHSUMED; 4, 5, 6 for RCV1 • Soft Model

  18. *Boolean model vs. Soft model • OHSUMED, user 1; RCV1, user 6 • The parameter is learned with 3-fold cross-validation.

  19. Performance of different facet-value pair recommendation approaches on OHSUMED and RCV1. • PRF@5: pseudo relevance feedback using the top 5 documents • RRF@5: real document-based relevance feedback using the top 5 documents.

  20. Conclusions • We studied a user feedback mechanism based on faceted document metadata. • The Boolean model is inappropriate for metadata-based, general-purpose document retrieval. • The Soft model is more effective on both datasets, as it automatically learns a weight for each facet, which captures the facet's quality.

  21. What is IDF? • Inverse document frequency (IDF) measures how important a term is across the collection. The IDF of a term is obtained by dividing the total number of documents by the number of documents containing that term, and then taking the logarithm of the quotient. • Example: if "mining" appears in 100 documents and the collection contains 1,000 documents in total, then IDF = log(1000/100). back
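
The slide's numeric example, reproduced; the logarithm base is a convention (base 10 gives 1, the natural log gives about 2.3):

```python
import math

# "mining" appears in 100 of the 1,000 documents in the collection
print(math.log10(1000 / 100))   # 1.0 with log base 10
print(math.log(1000 / 100))     # ≈ 2.303 with the natural log
```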

  22. What is BM25? • BM (Best Match) is a relevance scoring function used in search: it measures how relevant a given document D is to a given query Q, with higher scores indicating higher relevance. back
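
For reference, a sketch of the standard Okapi BM25 scoring function; this is the textbook form with the usual default parameters k1 and b, not necessarily the exact variant used as the paper's baseline:

```python
import math

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Okapi BM25 score of a document D for a query Q.

    doc_tf: term -> frequency in the document
    df: term -> number of documents in the collection containing the term
    """
    score = 0.0
    for t in query_terms:
        f = doc_tf.get(t, 0)
        if f == 0 or t not in df:
            continue
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```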

  23. What is MAP? • MAP (Mean Average Precision): the average precision for a single topic is the mean of the precision values computed after each relevant document is retrieved. MAP over a set of topics is the mean of the per-topic average precisions. MAP is a single-value measure that reflects system performance over all relevant documents: the higher the relevant documents are ranked, the higher the MAP. If the system returns no relevant documents, the precision defaults to 0. • Example: suppose there are two topics; topic 1 has 4 relevant pages and topic 2 has 5. A system retrieves 4 relevant pages for topic 1, at ranks 1, 2, 4, 7, and 3 relevant pages for topic 2, at ranks 1, 3, 5. For topic 1, average precision = (1/1 + 2/2 + 3/4 + 4/7)/4 ≈ 0.83. For topic 2, average precision = (1/1 + 2/3 + 3/5 + 0 + 0)/5 ≈ 0.45. MAP = (0.83 + 0.45)/2 = 0.64. back
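
A small sketch that reproduces the slide's worked example; the ranks of the retrieved relevant documents are given as 1-based positions:

```python
def average_precision(relevant_ranks, n_relevant):
    """Mean of the precision values after each retrieved relevant document;
    relevant documents that were never retrieved contribute 0."""
    precisions = [(i + 1) / rank for i, rank in enumerate(sorted(relevant_ranks))]
    return sum(precisions) / n_relevant

ap1 = average_precision([1, 2, 4, 7], 4)   # ≈ 0.83
ap2 = average_precision([1, 3, 5], 5)      # ≈ 0.45
print(round((ap1 + ap2) / 2, 2))           # MAP ≈ 0.64
```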

  24. Pseudo Relevance Feedback [diagram: initial query → baseline retrieval algorithm → ranked documents (Document 1 … Document N); the top K documents are assumed to be relevant and used to produce a new ranking] back
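
A sketch of the pseudo relevance feedback loop shown on this slide: the top K documents are assumed relevant and used to revise the query. The simple term-expansion step below is just one common way to use them; the slides do not specify the exact feedback method behind PRF@5:

```python
from collections import Counter

def pseudo_relevance_expand(query_terms, ranked_doc_tfs, k=5, n_expansion_terms=5):
    """Expand the query with frequent terms from the top-K retrieved documents,
    which are assumed to be relevant (no user judgment involved)."""
    pool = Counter()
    for tf in ranked_doc_tfs[:k]:
        pool.update(tf)
    for t in query_terms:          # keep only terms not already in the query
        pool.pop(t, None)
    expansion = [t for t, _ in pool.most_common(n_expansion_terms)]
    return list(query_terms) + expansion
```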

  25. Real Document Relevance Feedback [diagram: initial query → baseline retrieval algorithm → ranked documents (Document 1 … Document N); the user decides which documents are relevant, and those judgments are used to produce a new ranking] back
