Summarizing Definition from Wikipedia Articles

This study explores a method for summarizing definitions from multiple Wikipedia articles, emphasizing the benefits of Wikipedia concept models for precision and recall. The experiments compare word models with Wikipedia concept models, and a single article with multiple related articles. Results suggest that the related article set contributes significantly to performance, especially in terms of nugget recall. The research aims to summarize definitions more effectively by leveraging Wikipedia concepts and related articles.

Presentation Transcript


  1. Summarizing Definition from Wikipedia Articles • Zhicheng Zheng, Xiaoyan Zhu • Tsinghua University

  2. Outline • Introduction • Related work • Method • Experiments • Conclusion

  4. Introduction • “Definition” from Wikipedia • A definition is a passage describing the meaning of a term (a word, phrase or other set of symbols), or a type of thing. • “Definition” as treated in TREC-QA • “What is XXX” or “Who is XXX” • Or rephrased as “Tell me something important or interesting about XXX”

  6. Related work • Pattern-based methods [1,2] • Use patterns to generate potential definitional sentences • Advantage: accurate • Disadvantage: require manual labor or labeled data to generate the patterns

  7. Related work • Relevance-based methods • Interesting nuggets often come in the form of trivia, novel or rare facts about the topic that tend to strongly co-occur with direct mention of topic keywords [3] • Using summarization techniques • Using Wikipedia articles [4] • Using a single article

  8. Related work • Using multiple Wikipedia articles? • Benefit: improved summarization performance • Users understand a topic better if they read more related articles

  10. Method • Representation Model • Represent articles/sentences as vectors of words/concepts • Word Model: TF/IDF • Not accurate enough • “Jordan Hill” vs. “Michael Jordan”, “Grant Hill” • Wikipedia Concept Model

  11. Wikipedia concept model • Wikipedia concept: the title of a Wikipedia article • Represent a piece of text by the Wikipedia concepts that occur in it • Concepts are weighted in the same TF/IDF way as words in the word model
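The difference between the word model and the concept model can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy longest-match lookup, the function names `extract_concepts` and `tf_idf`, the `max_len` parameter, and the smoothed IDF variant are all assumptions made for illustration.

```python
import math
from collections import Counter


def extract_concepts(text, titles, max_len=4):
    """Greedy longest-match of token n-grams against lower-cased article titles.

    `titles` is a hypothetical set of Wikipedia article titles; `max_len`
    (an assumed cap) is the longest title length, in tokens, that is tried.
    """
    tokens = text.lower().split()
    concepts = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in titles:
                concepts.append(candidate)
                i += n  # skip past the matched title
                break
        else:
            i += 1  # no title starts at this token
    return concepts


def tf_idf(doc_terms, corpus):
    """Weight each term by tf * idf, using a smoothed idf (one common variant).

    `doc_terms` is a list of terms (words or concepts); `corpus` is a list of
    such lists. The same function serves both the word and the concept model.
    """
    n_docs = len(corpus)
    weights = {}
    for term, count in Counter(doc_terms).items():
        df = sum(1 for doc in corpus if term in doc)
        weights[term] = count * (math.log((1 + n_docs) / (1 + df)) + 1.0)
    return weights


titles = {"jordan hill", "michael jordan", "grant hill"}
text = "Jordan Hill played with Grant Hill"
print(text.lower().split())            # word model: "jordan" and "hill" are ambiguous
print(extract_concepts(text, titles))  # concept model: whole titles are kept intact
```

With concepts as terms, “Jordan Hill” no longer shares the components “jordan” and “hill” with “Michael Jordan” or “Grant Hill”, which is the ambiguity the slide points out in the word model.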

  12. Method • Wikipedia article expansion • Extract all Wikipedia articles related to the main Wikipedia article • d and d′ are related ⇔ d and d′ link to each other
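Reading the relatedness rule as mutual linking (an interpretation of the slide, whose connective symbol was lost in the transcript), the expansion step can be sketched as follows. The `links` dictionary and the function name `related_articles` are hypothetical; how links are extracted from a Wikipedia snapshot is not covered here.

```python
def related_articles(main, links):
    """Return the articles that link to `main` and are linked from `main`.

    `links` maps an article title to the set of titles it links to
    (an assumed, pre-extracted link graph).
    """
    outgoing = links.get(main, set())
    return {d for d in outgoing if main in links.get(d, set())}


# Toy link graph: A and B link to each other; A links to C, but C does not link back.
links = {"A": {"B", "C"}, "B": {"A"}, "C": {"D"}}
print(related_articles("A", links))
```

Requiring links in both directions is stricter than taking all outgoing links, so it keeps the related set small and filters out passing mentions.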

  13. Method • Summarizing definitions • Constraint: select sentences only from the original article (not from the other related articles) • Summarization method: • Maximal Marginal Relevance (MMR) [5] • Maximum Coverage (MC) [6]
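The two selection strategies named above can be sketched on sparse vectors. This is an illustration under stated assumptions, not the configuration used in the experiments: the trade-off value `lam`, the choice of query vector, and the sentence budget are placeholders, and the greedy routine is a simple stand-in for the coverage objective (the cited work [6] formulates it as a global optimization problem rather than a greedy loop).

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_select(sentences, query, k, lam=0.7):
    """Maximal Marginal Relevance: pick k sentence indices, trading
    relevance to `query` against similarity to already selected sentences."""
    selected = []
    candidates = list(range(len(sentences)))

    def score(i):
        relevance = cosine(sentences[i], query)
        redundancy = max(
            (cosine(sentences[i], sentences[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def greedy_coverage(sentence_concepts, weights, budget):
    """Greedy approximation of maximum coverage: repeatedly take the
    sentence that adds the most not-yet-covered concept weight."""
    covered, chosen = set(), []
    for _ in range(budget):
        best, best_gain = None, 0.0
        for i, concepts in enumerate(sentence_concepts):
            if i in chosen:
                continue
            gain = sum(weights.get(c, 0.0) for c in concepts - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # nothing left adds new weight
            break
        chosen.append(best)
        covered |= sentence_concepts[best]
    return chosen


# Candidate sentences come only from the original article, per the constraint above.
sentences = [{"a": 1.0}, {"a": 1.0}, {"b": 1.0}]
query = {"a": 1.0, "b": 1.0}  # assumed: e.g. weights estimated from the related article set
print(mmr_select(sentences, query, k=2, lam=0.5))
print(greedy_coverage([{"x", "y"}, {"x"}, {"z"}], {"x": 2.0, "y": 1.0, "z": 1.5}, budget=2))
```

In both routines the related articles would influence only the weights (the query vector or the concept weights), while the candidate sentences stay restricted to the original article.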

  15. Experiment • Wikipedia articles: • Snapshots of English Wikipedia (2009) • Question sets: • Questions from TREC 13 – 15 • 215 definition questions • 190 could be found in Wikipedia • Evaluation metrics • Precision, Recall, F1, F3
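F1 and F3 are both instances of the general F-measure with different values of beta, where a larger beta weights recall more heavily. The sketch below shows only that generic formula; the nugget-based precision and recall used in TREC definition evaluation have their own definitions, which are not reproduced here, and the function name `f_beta` is chosen for illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-measure: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# With low precision and high recall, F3 rewards the system more than F1 does.
p, r = 0.2, 0.8
print(f_beta(p, r, beta=1.0), f_beta(p, r, beta=3.0))
```

Because F3 favors recall, a gain in nugget recall shows up more strongly in F3 than in F1.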

  16. Experiment • Results • Compare: • Word model vs. Wikipedia concept model • Single article vs. Multiple articles

  17. Experiments • Analysis • Both algorithms benefit from the Wikipedia concept model • The related article set helps improve the performance of both algorithms • The Wikipedia concept model contributes more to precision than to recall • The related article set leads to more improvement in nugget recall than the Wikipedia concept model does

  18. Conclusion • A method for summarizing definitions from multiple Wikipedia articles • Experiments show that Wikipedia concepts benefit the extraction of definitions • They also show that the related articles help weight concepts more effectively

  19. Thank you • Any questions?

  20. References [1] Xu, J., Licuanan, A., Weischedel, R.: TREC 2003 QA at BBN: Answering definitional questions. In: TREC 2003. [2] Cui, H., Kan, M.Y., Chua, T.S.: Generic soft pattern models for definitional question answering. In: SIGIR 2005. [3] Kor, K.W., Chua, T.S.: Interesting nuggets and their impact on definitional question answering. In: SIGIR 2007. [4] Ye, S., Chua, T.S., Lu, J.: Summarizing definition from Wikipedia. In: ACL 2009. [5] Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: SIGIR 1998. [6] Gillick, D., Favre, B.: A scalable global model for summarization. In: ACL 2009.
