
Topic-level Random Walk Through Probabilistic Model



Presentation Transcript


  1. Topic-level Random Walk Through Probabilistic Model Zi Yang, Jie Tang, Jing Zhang, Juanzi Li, Bo Gao KEG, DCST Tsinghua University 4/4/2009

  2. Search Engine [diagram: a query is matched against the document collection by keyword matching (language model, vector space model), producing a sorted result]
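The keyword-matching pipeline on slide 2 can be sketched as a query-likelihood unigram language model with Jelinek-Mercer smoothing. The two toy documents, the smoothing weight, and the convention of putting the weight on the collection model are illustrative assumptions, not taken from the slide:

```python
import math

# Sketch of keyword matching with a query-likelihood unigram language model.
# The documents, the smoothing weight, and the convention of putting
# lam on the collection model are illustrative assumptions.
docs = {
    "d1": "random walk on topic graphs".split(),
    "d2": "language model for document retrieval".split(),
}
collection = [w for d in docs.values() for w in d]
lam = 0.15  # weight on the background collection model

def p_word(w, doc):
    # Jelinek-Mercer smoothing: mix document and collection frequencies.
    p_doc = doc.count(w) / len(doc)
    p_col = collection.count(w) / len(collection)
    return (1 - lam) * p_doc + lam * p_col

def score(query, doc):
    # log p(query | doc) under the smoothed unigram model
    return sum(math.log(p_word(w, doc)) for w in query.split())

ranked = sorted(docs, key=lambda name: score("topic walk", docs[name]),
                reverse=True)
```

Smoothing keeps the score finite when a query word is missing from a document, so every document in the collection can still be ranked.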

  3. Random Walk [figure: toy link graph; each page starts with score 0.25, split evenly across its out-links (0.125 each)]

  4. Random Walk [figure: scores after propagation: 0.4625, 0.25, 0.1437, 0.1437]

  5. Random Walk [figure: converged scores 0.359, 0.3426, 0.1831, 0.1153; both highly scored pages are accepted for their high rank score]
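The score propagation pictured in slides 3-5 is the standard PageRank random walk; a minimal power-iteration sketch follows. The toy graph and the damping factor 0.85 are illustrative, not the example from the slides:

```python
# Minimal PageRank power iteration on a toy directed graph.
# The graph and damping factor are illustrative, not taken from the slides.
def pagerank(links, d=0.85, iters=50):
    nodes = sorted(links)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}        # uniform start, as on slide 3
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}  # random-jump mass
        for u in nodes:
            out = links[u] or nodes            # dangling page: jump anywhere
            for v in out:
                new[v] += d * score[u] / len(out)  # split score over out-links
        score = new
    return score

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(links)
```

After 50 iterations the scores are effectively stationary; pages pointed to by many or important pages (here C) accumulate the most mass, and the scores always sum to 1.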

  6. Topic-level Random Walk Importance score of a page for each topic, e.g. data mining and machine learning • Challenges • How to discover topics from both the query and the documents? • How to implement a topic-level random walk? Both pages accepted for a high topic-level rank score

  7. Outline • Related work • Our approach • Topic modeling • Topic-level random walk • Search with topics • Experimental results • Conclusion

  8. Related Work • Search with keywords • Language Model [Zhai, 01], VSM, etc. • Random walk • PageRank [Page, 99], HITS [Kleinberg, 99] • Topic-sensitive PageRank [Haveliwala, 02], Topical PageRank [Nie, 06], etc. • Search with semantic topics • LSI [Berry, 95], pLSI [Hofmann, 99], LDA [Blei, 03] [Wei, 06], etc.

  9. Outline • Related work • Our approach • Topic modeling • Topic-level random walk • Search with topics • Experimental results • Conclusion

  10. Approach Overview [diagram: the query goes through topic distribution analysis; the document collection goes through topic-level importance analysis; the resulting topic-related, topic-level random walk score is combined with a keyword-based language model]

  11. Approach Overview [diagram: the pipeline with its three components labeled: topic modeling (topic distribution analysis), topic-level random walk (topic-level importance analysis), and search with topics (combining the topic-level random walk score with the keyword-based language model)]

  12. Topic Modeling • For each topic z, draw the word distribution φ_z from the Dirichlet prior β • For each document d: • Draw the topic mixture θ_d from the Dirichlet prior α • For each word position in document d: • draw a topic z from the multinomial distribution θ_d • draw a word w from the multinomial distribution φ_z • Automatically find topics in documents: LDA • Automatically assign topics to the query: inference
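The generative process on slide 12 can be sketched directly by sampling from the model (rather than fitting it). The vocabulary, topic count, and hyperparameter values below are illustrative:

```python
import random

# Sketch of the LDA generative process from slide 12, sampling from the model.
# The vocabulary, topic count, and hyperparameters are illustrative.
random.seed(0)

def sample_dirichlet(conc, k):
    # Symmetric Dirichlet draw via normalized Gamma samples.
    xs = [random.gammavariate(conc, 1.0) for _ in range(k)]
    total = sum(xs)
    return [x / total for x in xs]

def sample_categorical(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

vocab = ["mining", "learning", "graph", "topic", "rank"]
num_topics, alpha, beta = 2, 0.1, 0.1

# For each topic z, draw a word distribution phi_z from Dirichlet(beta).
phi = [sample_dirichlet(beta, len(vocab)) for _ in range(num_topics)]

def generate_document(length):
    # Draw theta_d from Dirichlet(alpha); then, per word position,
    # draw topic z from Mult(theta_d) and word w from Mult(phi_z).
    theta = sample_dirichlet(alpha, num_topics)
    return [vocab[sample_categorical(phi[sample_categorical(theta)])]
            for _ in range(length)]

doc = generate_document(10)
```

Inference (the reverse direction, used on slide 12 to assign topics to a query) inverts this process to estimate θ and φ from observed words.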

  13. Topic-level Random Walk • Transition probability • Ranking score
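The transcript does not capture slide 13's exact transition-probability and ranking formulas. The sketch below follows the general topic-sensitive PageRank idea: one biased random walk per topic, with the random jump landing on page v in proportion to an assumed topic weight p(topic | v). The graph and the weights are invented for illustration:

```python
# Hedged sketch of a topic-level random walk: one biased PageRank per topic,
# where the random jump is distributed according to p(topic | page).
# Graph, weights, and parameters are illustrative assumptions.
def topical_pagerank(links, topic_weight, d=0.85, iters=50):
    nodes = sorted(links)
    total = sum(topic_weight[v] for v in nodes)
    jump = {v: topic_weight[v] / total for v in nodes}  # topic-biased jump
    score = dict(jump)
    for _ in range(iters):
        new = {v: (1 - d) * jump[v] for v in nodes}
        for u in nodes:
            out = links[u] or nodes        # dangling page: jump anywhere
            for v in out:
                new[v] += d * score[u] / len(out)
        score = new
    return score

links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
dm_weight = {"A": 0.7, "B": 0.2, "C": 0.1}  # illustrative p("data mining" | page)
dm_scores = topical_pagerank(links, dm_weight)
```

Running one such walk per topic yields a per-topic importance score for every page, which is what the later search step combines with the query's topic distribution.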

  14. Search • Score combined with the language model • Proposed 1: TPR+ • Proposed 2: TPR* • Search with query modeling • Proposed 3: TPRq [diagram labels: text-based language model, topic-level random walk score, topic-related information]
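Slide 14 names the TPR+ and TPR* combinations but not their formulas. A plausible reading, with an additive (weighted-sum) form and a multiplicative form and γ as the interpolation weight, is sketched below; both forms are assumptions, not taken from the slide:

```python
# Hedged sketch of combining the language-model score with the
# topic-level random-walk score; both formulas are assumed readings
# of the names "TPR+" and "TPR*" on the slide.
def tpr_plus(lm_score, rw_score, gamma=0.5):
    # TPR+: additive (weighted-sum) combination.
    return gamma * lm_score + (1 - gamma) * rw_score

def tpr_times(lm_score, rw_score):
    # TPR*: multiplicative combination.
    return lm_score * rw_score

combined = tpr_plus(0.6, 0.2, gamma=0.5)  # 0.5*0.6 + 0.5*0.2 = 0.4
```

The γ sweep on slide 19 ("tuning parameter γ") corresponds to varying exactly this kind of interpolation weight between the two score sources.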

  15. Outline • Related work • Our approach • Topic modeling • Topic-level random walk • Search with topics • Experimental results • Conclusion

  16. Experimental Results • Data sets • Arnetminer (http://www.arnetminer.org) • 14,134 authors, 10,716 papers • 7 most frequently searched queries • Evaluation measures • P@5, P@10, P@20, R-prec, MAP

  17. Experimental Results • Baseline methods • Language model, BM25, LDA, PageRank • Several forms of combinations • LM+LDA, LM*LDA, LM*PR, LM+PR, LM*LDA*PR, BM25*PR • Parameter settings • α = 0.1, β = 0.1, λ = 0.15, |T| = 5, 15, 80 • γ = 0 to 1.0 (interval 0.1) • t = 0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0

  18. Experimental Results • Performance of retrieving papers/authors [table: improvements of +2.10% and +9.14% over the baselines]

  19. Experimental Results • Tuning parameter γ

  20. Experimental Results • Tuning parameter |T|

  21. Experimental Results • Tuning parameter t

  22. Experimental Results • Example analysis • Topic distribution for different query words, e.g. "intelligent agents" and "natural language processing"

  23. Experimental Results • Importance scores of documents by TPR+ and PR A: Verifiable Semantics for Agent Communication Language B: Probabilistic Parsing Using Left Corner Language Models C: The GRAIL Concept Modeling Language for Medical Terminology D: Agent-based Business Process Management

  24. Outline • Related work • Our approach • Topic modeling • Topic-level random walk • Search with topics • Experimental results • Conclusion

  25. Conclusion • Propose a 4-step framework for search through topic-level random walk. • Employ a probabilistic topic model to automatically extract topics from documents and further model queries • Perform random walk at topic level • Propose combination methods

  26. Conclusion • Experimental results show improvements (+9.14% and +2.10%) • Future work • Distributed calculation? • Semantic link?

  27. Thanks Q & A Demo: http://www.arnetminer.org
