
Statistical Machine Translation Models for Personalized Search

Statistical Machine Translation Models for Personalized Search. Rohini U AOL India R&D, Bangalore India Rohini.uppuluri@corp.aol.com Vamshi Ambati Language Technologies Institute Carnegie Mellon University Pittsburgh, USA vamshi@cs.cmu.edu Vasudeva Varma,




  1. Statistical Machine Translation Models for Personalized Search Rohini U AOL India R&D, Bangalore India Rohini.uppuluri@corp.aol.com Vamshi Ambati Language Technologies Institute Carnegie Mellon University Pittsburgh, USA vamshi@cs.cmu.edu Vasudeva Varma, SIEL, LTRC, IIIT Hyderabad, India vv@iiit.ac.in

  2. Agenda • Introduction • Related Work • Background • User Profile as Translation Model • Personalized Search • Learning User Profile • Re-ranking • Experiments • Conclusions and Future Work

  3. Introduction • Current Web search engines • Provide users with documents “relevant” to their information need • Issues • Information overload • Must cater to hundreds of millions of users • Terabytes of data • Poor description of the information need • Short queries are difficult to understand • Word ambiguities • Users only see the top few results • Relevance • Subjective: depends on the user • One size fits all?

  4. Introduction (continued) • Search is not a solved problem! • Poorly described information need • Java – (Java island / Java programming language) • Jaguar – (cat / car) • Lemur – (animal / Lemur toolkit) • SBH – (State Bank of Hyderabad / Syracuse Behavioral Health care) • Given prior information • I am into biology – best guess for Jaguar? • Past queries - { information retrieval, language modeling } – best guess for Lemur?

  5. Review of Personalized Search • Approaches to personalized search • Query logs • Machine learning • Language modeling • Community based • Others

  6. Statistical Language Modeling based Approaches: Introduction • Statistical language modeling: the task of estimating a probability distribution that captures the statistical regularities of natural language • Applied to a number of problems – Speech, Machine Translation, IR, Summarization

  7. Statistical Language Modeling based Approaches: Background • [Diagram labels: User Information need, Ideal Document, Query Formulation Model, Query (example: “Lemur”)] • Given a query, which document is most likely to be the Ideal Document? • P(Q/D) = P(q1 … qn/D) = Π P(qi/D) • In spite of the progress, not much work has been done to capture, model and integrate user context!
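
A minimal sketch of the query-likelihood score P(Q/D) = Π P(qi/D) from this slide, computed in log space. The slides do not say how P(qi/D) is estimated, so the Jelinek-Mercer smoothing against a collection model, the weight `lam`, and the two toy documents (paraphrased from the Lemur example later in the deck) are assumptions made only for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, document, collection, lam=0.5):
    """Return log P(Q/D) = sum_i log P(q_i/D).

    P(q_i/D) is a Jelinek-Mercer mixture of the document model and the
    collection model. The smoothing choice is an assumption, not taken
    from the slides.
    """
    doc_counts = Counter(document)
    coll_counts = Counter(collection)
    doc_len = len(document)
    coll_len = len(collection)
    score = 0.0
    for q in query:
        p_doc = doc_counts[q] / doc_len if doc_len else 0.0
        p_coll = coll_counts[q] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # query word unseen everywhere
        score += math.log(p)
    return score


# Toy documents (hypothetical tokenization of the Lemur example).
d1 = "lemur encyclopedia brief description physical traits animal".split()
d4 = "lemur toolkit language modeling information retrieval download".split()
collection = d1 + d4

# The bare query "lemur" cannot separate the two senses.
tie_d1 = query_log_likelihood(["lemur"], d1, collection)
tie_d4 = query_log_likelihood(["lemur"], d4, collection)

# Adding a context word favours the toolkit document.
ctx_d1 = query_log_likelihood(["lemur", "retrieval"], d1, collection)
ctx_d4 = query_log_likelihood(["lemur", "retrieval"], d4, collection)
```

With the one-word query both documents receive the same score, which is the ambiguity the deck sets out to resolve with user context.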

  8. Noisy Channel based approach: Motivation • [Diagram labels: Ideal Document, Query Generation Process (Noisy Channel), Retrieval]

  9. Similar to Statistical Machine Translation • Given an English sentence, translate it into French • Given a query, retrieve documents closer to the ideal document • Noisy Channel 1: English Sentence, French Sentence, P(e/f) • Noisy Channel 2: Ideal Document, Query, P(q/w)

  10. Learning user profile • User profile: a translation model, stored as triples (qw, dw, p(qw/dw)), i.e. query word, document word, and the probability of the query word given the document word • Use Statistical Machine Translation methods • Learning a user profile amounts to training a translation model • In SMT, a translation model is trained • From parallel texts • Using the EM algorithm

  11. Learning User profile • Extracting parallel texts • From queries and the corresponding snippets of clicked documents • Training a translation model • GIZA++ - an open source toolkit widely used for training translation models in Statistical Machine Translation research.
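
To make the training step concrete, here is a small sketch of EM for a word-translation table over (query, snippet) pairs. It is an IBM Model 1 style simplification written for illustration, not GIZA++ and not necessarily the configuration the authors ran: there is no NULL word and no alignment model, and the function name `train_model1` and the toy pairs are hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate p(qw/dw) with EM, in the style of IBM Model 1.

    pairs: list of (query_words, doc_words) "parallel texts".
    Returns a dict mapping (qw, dw) -> p(qw/dw).
    """
    query_vocab = {qw for q_words, _ in pairs for qw in q_words}
    # Uniform initialisation over the query vocabulary.
    t = defaultdict(lambda: 1.0 / len(query_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (qw, dw) links
        total = defaultdict(float)  # expected count of links into dw
        # E-step: spread each query word over the snippet words.
        for q_words, d_words in pairs:
            for qw in q_words:
                norm = sum(t[(qw, dw)] for dw in d_words)
                for dw in d_words:
                    frac = t[(qw, dw)] / norm
                    count[(qw, dw)] += frac
                    total[dw] += frac
        # M-step: renormalise so that sum over qw of p(qw/dw) is 1.
        t = defaultdict(
            float,
            {(qw, dw): c / total[dw] for (qw, dw), c in count.items()},
        )
    return dict(t)


# Hypothetical (query, clicked-snippet) pairs for one user.
pairs = [
    (["lemur"], ["lemur", "toolkit", "retrieval"]),
    (["lemur", "download"], ["lemur", "toolkit", "download"]),
    (["language", "modeling"], ["language", "modeling", "retrieval"]),
]
profile = train_model1(pairs)
```

Each key of `profile` together with its value corresponds to one (qw, dw, p(qw/dw)) triple of the user profile.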

  12. Sample user profile

  13. Reranking • Recall, in general LM for IR: P(Q/D) = Π P(qi/D) • Noisy Channel based approach • Example query: lemur; user profile entry: P(lemur/retrieval) • D1: “Lemur - Encyclopedia gives a brief description of the physical traits of this animal.” (Lemur encyclopedia … brief …) • D4: “The Lemur toolkit for language modeling and information retrieval is documented and made available for download.” (Lemur toolkit … information retrieval …)
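
The slide does not spell out the noisy-channel scoring formula, so the sketch below assumes the standard translation-model form P(Q/D) = Π_i Σ_w p(qi/w) P(w/D), with P(w/D) taken as the relative frequency of w in the document. The `floor` value, the function names, and the profile probabilities are hypothetical and chosen only to show how a profile entry such as P(lemur/retrieval) can move the toolkit document above the animal one.

```python
import math
from collections import Counter


def translation_score(query, document, profile, floor=1e-6):
    """log of prod_i sum_w p(q_i/w) * P(w/D) (assumed scoring form).

    profile maps (query_word, doc_word) -> p(query_word/doc_word).
    floor avoids log(0) for unexplained query words (an assumption).
    """
    doc_counts = Counter(document)
    doc_len = len(document)
    score = 0.0
    for qw in query:
        p = sum(
            profile.get((qw, dw), 0.0) * c / doc_len
            for dw, c in doc_counts.items()
        )
        score += math.log(max(p, floor))
    return score


def rerank(query, docs, profile):
    """Return document names sorted by descending translation score."""
    return sorted(
        docs,
        key=lambda name: translation_score(query, docs[name], profile),
        reverse=True,
    )


# Hypothetical profile of a user interested in information retrieval.
profile = {
    ("lemur", "retrieval"): 0.4,
    ("lemur", "toolkit"): 0.3,
    ("lemur", "lemur"): 0.2,
    ("lemur", "animal"): 0.01,
}
docs = {
    "D1": "lemur encyclopedia brief description physical traits animal".split(),
    "D4": "lemur toolkit language modeling information retrieval download".split(),
}
ranking = rerank(["lemur"], docs, profile)
```

Under these made-up probabilities the one-word query ranks D4 first, because the profile links "lemur" to "retrieval" and "toolkit".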

  14. Experiments • Performed evaluation on explicit feedback data collected from 7 users • Experiments • Comparison with Contextless Ranking • Comparison between different training models and contexts

  15. Data and Set up • Data • Explicit feedback data collected from 7 users • For each query, each user examined the top 10 documents and identified the relevant ones • Collected the top 10 results for all queries: 3469 documents in total • Set up • Created a Lucene index over the 3469 documents • For reranking, first retrieve the results using Lucene and then rerank them using the noisy channel approach • We perform 10-fold cross validation
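
As an illustration of the 10-fold protocol, the sketch below splits a list of feedback records into train and test parts. The slides do not state the unit of splitting (per user, per query, or per judgment), so treating each record as an opaque item, and the helper name `k_fold_splits`, are assumptions.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) lists for k-fold cross validation.

    Items are dealt round-robin into k folds; each fold is used once
    as the test set while the remaining folds form the training set.
    """
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical record ids standing in for one user's judged queries.
records = list(range(25))
splits = list(k_fold_splits(records, k=10))
```

In the setting described above, the training part of each split would feed the user profile learner and the test part would be reranked and scored.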

  16. Data

  17. Metrics • Precision@n • Number of relevant documents among the top n results / n
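
A short sketch of Precision@n as defined on this slide. The function name and the document ids in the example are hypothetical; the denominator stays n even when fewer than n results are returned, which follows the slide's definition literally.

```python
def precision_at_n(ranked_doc_ids, relevant_doc_ids, n):
    """Fraction of the top-n ranked documents that are relevant."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    relevant = set(relevant_doc_ids)
    hits = sum(1 for doc_id in ranked_doc_ids[:n] if doc_id in relevant)
    return hits / n


# Hypothetical reranked list and one user's relevance judgments.
ranked = ["D4", "D1", "D7", "D2", "D9"]
relevant = {"D4", "D2"}
p_at_1 = precision_at_n(ranked, relevant, 1)  # 1 hit in top 1 -> 1.0
p_at_5 = precision_at_n(ranked, relevant, 5)  # 2 hits in top 5 -> 0.4
```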

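  18. Set up • [Diagram: Data is split into Train Data and Test Data; Train Data feeds the User Profile Learner, which produces User Profiles; the Reranker takes Test Data and User Profiles and outputs Reranked Results]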

  19. Results

  20. Results • I: Document Training and Document Testing • II: Document Training and Snippet Testing • III: Snippet Training and Document Testing • IV: Snippet Training and Snippet Testing

  21. Conclusions and Future Work • Proposed a statistical MT based approach for modeling the user • Captures richer context and relations between query words (q) and document words (w) • In future • N-gram based method: trigrams etc. • Noisy Channel based method: bigram

  22. Questions?

  23. Thank you

