
Web Search Clustering and Labeling with Hidden Topics

Web Search Clustering and Labeling with Hidden Topics. Presenter: Chien-Hsing Chen. Authors: Cam-Tu Nguyen, Xuan-Hieu Phan, Susumu Horiguchi, Thu-Trang Nguyen, Quang-Thuy Ha. 2009.TALIP.40. Outline: Motivation, Objective, Method, Experiments, Conclusion.


Presentation Transcript


  1. Web Search Clustering and Labeling with Hidden Topics • Presenter: Chien-Hsing Chen • Authors: Cam-Tu Nguyen, Xuan-Hieu Phan, Susumu Horiguchi, Thu-Trang Nguyen, Quang-Thuy Ha • 2009.TALIP.40

  2. Outline • Motivation • Objective • Method • Experiments • Conclusion • Comment

  3. Motivation • d1: ezPeer+ music downloads, music previews, lyrics, MP3, music site - Jolin Tsai (蔡依林) - albums over the years • ezPeer+ - Jolin Tsai (蔡依林) - J1 Live Concert complete concert video record, J-game, 看我72變, 城堡, J9 Party 派對精選, JolinJ-Top 冠軍精選, 舞孃, 蔡依林唯舞獨尊演唱會鮮聽版 & 混音專輯 & 花... • web.ezpeer.com/singer/s120.html - Cached - Similar • d2: ezPeer+ music downloads, music prev... • 花蝴蝶 sounds great… • web.ezpeer.com/singer/s120.html - Cached - Similar • Snippets are usually noisier, less topic-focused, and much shorter than ordinary documents • 花?? • Similarity evaluation between snippets may therefore fail • d3: {He is an author} • d4: {The writer is standing behind you}

  4. Objective • Evaluate the similarity between snippets with reference to a set of hidden topics • di: {He is an author} • dj: {The writer is standing behind you} • (a document may be related to multiple topics)

  5. Framework • [Figure: framework overview; example words: music, movie, radio, player] • di > topic10 • dj > topic10 • (label candidate generation)

  6. LDA • Example topics: cul., hel., politics, edu., entertainment • In the training step, a keyword is related to a topic when it often occurs in the documents on that topic (example topic: show business) • Notation: k = topic, m = document, n = word • z_{m,n} refers to topic k (z1, z2, z3) • w_{m,n} refers to a vocabulary word (w1, w2, w3) • k = 10 (show business), K = 60 • The word "music" in topic 10 can explain the occurrence of the words in documents m = 1, 2, 3

  7. LDA • [Figure: k = topic, m = document, n = word; z_{m,n} (z1), w_{m,n} (w1); k = topic 10, K = 60]

  8. LDA • [Figure: document d_m; k = topic, m = document, n = word; z_{m,n} (z1), w_{m,n} (w1); k = topic 10, K = 60] • p(.|.) = ?

  9. LDA • [Figure: document d_m; k = topic, m = document, n = word; z_{m,n} (z1), w_{m,n} (w1); k = topic 10, K = 60] • p(.|.) = 1/60 • p(.|.) = ?
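
The LDA slides above assign every word w_{m,n} a topic z_{m,n}, starting from a uniform guess (1/60 per topic when K = 60). A minimal sketch of how such assignments can be estimated is collapsed Gibbs sampling; the slides do not say which estimator was used, so the sampler, the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus with K = 2 are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not the authors' code).

    docs is a list of documents, each a list of integer word ids.
    Returns (theta, z): per-document topic proportions and per-word topic assignments.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]              # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # words per topic
    z = []
    for m, doc in enumerate(docs):
        z_m = []
        for w in doc:
            k = rng.randrange(num_topics)  # uniform start: each topic has probability 1/K
            z_m.append(k)
            n_dk[m][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_m)

    for _ in range(iters):
        for m, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment of word n in document m.
                k = z[m][n]
                n_dk[m][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised p(z_{m,n} = j | all other assignments, words).
                weights = [
                    (n_dk[m][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[m][n] = k
                n_dk[m][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[m][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for m, doc in enumerate(docs)
    ]
    return theta, z


# Toy corpus (hypothetical): ids index this small vocabulary.
vocab = ["music", "movie", "radio", "player", "election", "vote"]
docs = [[0, 1, 2, 0], [0, 3, 1], [4, 5, 4], [5, 4, 5, 4]]
theta, z = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
```

Each row of `theta` is the estimated topic mixture of one document, which is the quantity later used to compare snippets.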

  10. Framework

  11. Similarity between di and dj • t: the t-th term in the vocabulary V • k: the k-th topic
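
The equation on this slide did not survive in the transcript; only its two index definitions remain. One form consistent with them is a weighted sum of two cosine similarities, one over the K topic weights and one over the |V| term weights. The mixing weight λ and the symbols ϑ (topic weight) and w (term weight) are assumptions for this sketch, not taken from the slide.

```latex
\[
\mathrm{sim}(d_i, d_j) =
\lambda \,
\frac{\sum_{k=1}^{K} \vartheta_{i,k}\,\vartheta_{j,k}}
     {\sqrt{\sum_{k=1}^{K} \vartheta_{i,k}^{2}}\;\sqrt{\sum_{k=1}^{K} \vartheta_{j,k}^{2}}}
+ (1-\lambda)\,
\frac{\sum_{t=1}^{|V|} w_{i,t}\,w_{j,t}}
     {\sqrt{\sum_{t=1}^{|V|} w_{i,t}^{2}}\;\sqrt{\sum_{t=1}^{|V|} w_{j,t}^{2}}}
\]
```

With λ = 0 this reduces to plain term-based cosine similarity; with λ = 1 only the hidden topics are compared.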

  12. Framework • similarity matrix between snippets
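
As a rough illustration of this step, the sketch below builds a symmetric similarity matrix from per-snippet topic vectors and term vectors, mixing two cosine similarities with a weight `lam`. The function names, the mixing scheme, the stop-word-filtered vocabulary, and the topic proportions for the two example snippets from the Motivation slide are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(topic_vecs, term_vecs, lam=0.5):
    """Pairwise similarity: lam * topic cosine + (1 - lam) * term cosine."""
    n = len(topic_vecs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = lam * cosine(topic_vecs[i], topic_vecs[j]) \
                + (1 - lam) * cosine(term_vecs[i], term_vecs[j])
            sim[i][j] = sim[j][i] = s
    return sim


# "He is an author" vs "The writer is standing behind you", after removing
# stop words; assumed vocabulary: [author, writer, standing, behind].
term_vecs = [[1, 0, 0, 0], [0, 1, 1, 1]]
# Hypothetical topic proportions: both snippets lean toward the same topic.
topic_vecs = [[0.9, 0.1], [0.8, 0.2]]

words_only = similarity_matrix(topic_vecs, term_vecs, lam=0.0)
with_topics = similarity_matrix(topic_vecs, term_vecs, lam=0.5)
```

Here `words_only[0][1]` is zero because the two snippets share no content word, while `with_topics[0][1]` is clearly positive because their topic vectors point the same way.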

  13. Label Candidate Generation • [Figure: snippet collection D, topic k]

  14. Label assignment for clustering snippets • [Figure: snippet collection D, topic k, snippets di and dj]

  15. Framework • [Figure: framework overview; example words: music, movie, radio, player] • di > topic10 • dj > topic4, topic10 • (label candidate generation)

  16. Experiment • Wikipedia dataset • Vnexpress dataset

  17. Experimental dataset • Web dataset: 2,357 snippets in 9 categories • 20 queries submitted to Google, obtaining about 150 distinct snippets

  18. Experiments • F-measure
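
For reference, the F-measure combines precision and recall into one score. The sketch below uses the standard weighted harmonic-mean definition; how precision and recall were computed per cluster in these experiments is not stated on the slide, and the function name `f_measure` is made up for the example.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when nothing was retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


balanced = f_measure(0.5, 0.5)
skewed = f_measure(1.0, 0.5)
```

A cluster with perfect precision but only half the relevant snippets (`skewed`) scores below 1, since the harmonic mean penalises the weaker of the two values.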

  19. Experiments

  20. Experiments

  21. Experiments

  22. Experiments

  23. Experiments

  24. Experiments

  25. Conclusion • clustering snippets with hidden topics • labeling clusters using hidden topic analysis

  26. My Comment • Advantage • labeling clusters with the help of hidden topics • the size of snippets is small • Two datasets: 2,357 and 150 snippets • (in our work: more than 2 million snippets) • Disadvantage • depends less on the snippets themselves • Application • snippets are useful for sense-making
