This study explores a method to summarize definitions from multiple Wikipedia articles, emphasizing the benefits of utilizing Wikipedia concept models for improved precision and recall. The experiment compares word models with Wikipedia concept models, as well as the impact of single versus multiple articles. Results suggest that the related article set contributes significantly to performance enhancement, especially in terms of nugget recall. The research aims to provide a more effective way to summarize definitions by leveraging Wikipedia concepts and related articles.
Summarizing Definition from Wikipedia Articles Zhicheng Zheng, Xiaoyan Zhu Tsinghua University
Outline • Introduction • Related work • Method • Experiments • Conclusion
Introduction • “Definition” from Wikipedia • A definition is a passage describing the meaning of a term (a word, phrase or other set of symbols), or a type of thing. • “Definition” as treated in TREC-QA • “What is XXX” or “Who is XXX” • Or translated to “Tell something important or interesting about XXX”
Related work • Pattern-based methods [1,2] • Use patterns to generate potential definitional sentences • Advantages: accurate • Disadvantages: require manual labor or labeled data to generate patterns
Related work • Relevance-based methods • Interesting nuggets often come in the form of trivia, novel or rare facts about the topic that tend to strongly co-occur with direct mention of topic keywords [3] • Using summarization techniques • Using Wikipedia articles [4] • Using a single article
Related work • Using multiple Wikipedia articles? • Benefits: improve summarization performance • users would better understand a topic if they read more related articles
Method • Representation Model • Represent articles/sentences as vectors of words/concepts • Word Model: TF/IDF • Not accurate enough • “Jordan Hill” vs “Michael Jordan”, “Grant Hill” • Wikipedia Concept Model
Wikipedia concept model • Wikipedia concept: title of a Wikipedia article • Represent a piece of text by the Wikipedia concepts that occur in it • Concepts are weighted in the same TF/IDF way as in the word model
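A minimal sketch of the concept representation described above, not the authors' implementation. It assumes concepts are found by greedy longest-match of whitespace tokens against a set of lowercase article titles, and that weights follow a plain TF × log(N/df) scheme. The names `extract_concepts`, `tfidf_vector`, `titles`, `doc_freq`, and `n_docs` are hypothetical.

```python
import math
from collections import Counter


def extract_concepts(text, titles):
    """Return the concepts (article titles) mentioned in `text`.

    Assumption: greedy longest-match over whitespace tokens; `titles`
    is a set of lowercase titles. Punctuation is not stripped here.
    """
    tokens = text.lower().split()
    max_len = max((len(t.split()) for t in titles), default=0)
    concepts, i = [], 0
    while i < len(tokens):
        # Try the longest span first so "jordan hill" wins over shorter titles.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in titles:
                concepts.append(candidate)
                i += n
                break
        else:
            i += 1  # no title starts at this token
    return concepts


def tfidf_vector(concepts, doc_freq, n_docs):
    """Weight each concept by term frequency times log(N / document frequency)."""
    tf = Counter(concepts)
    return {c: n * math.log(n_docs / doc_freq.get(c, 1)) for c, n in tf.items()}
```

With titles `{"jordan hill", "michael jordan", "grant hill"}`, a sentence mentioning Jordan Hill yields the single concept "jordan hill", so it shares no dimension with "Michael Jordan" or "Grant Hill". A word model would match both on the tokens "jordan" and "hill".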
Method • Wikipedia article expansion • Extract all Wikipedia articles related to the main Wikipedia article • d and d’ are related ⇔ d and d’ link to each other
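The expansion step can be sketched as follows, assuming "related" means a mutual link (each article links to the other) and that the link graph is available as a dictionary from a title to the set of titles it links to. The function name `related_articles` and the `links` structure are illustrative assumptions, not taken from the slides.

```python
def related_articles(main, links):
    """Return the articles that link to `main` and are linked from it.

    `links` maps an article title to the set of titles it links to
    (a hypothetical in-memory link graph).
    """
    outgoing = links.get(main, set())
    return {d for d in outgoing if main in links.get(d, set())}
```

For example, if A links to B and C, B links back to A, and C does not, only B is included in the related set of A.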
Method • Summarizing definitions • Constraint: select sentences only from the original article (not from the other related articles) • Summarization method: • Maximal Marginal Relevance (MMR) [5] • Maximum Coverage (MC) [6]
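As an illustration of the first summarization method, here is a minimal greedy MMR sketch over sparse vectors (dictionaries from word or concept to weight). It follows the general MMR formulation of [5]: each step picks the sentence that maximizes relevance to a query minus redundancy with sentences already chosen. The trade-off `lam`, the cosine similarity, and the choice of `query` (for example, a vector built from the main article and its related articles) are assumptions, not details given in the slides. Candidate sentences would come only from the original article, per the constraint above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(sentences, query, k, lam=0.7):
    """Greedily pick up to k sentence indices by Maximal Marginal Relevance.

    sentences: list of sparse vectors, one per candidate sentence.
    query:     sparse vector describing what the summary should cover.
    lam:       weight on relevance versus redundancy (assumed value).
    """
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any chosen sentence.
            redundancy = max(
                (cosine(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return lam * cosine(sentences[i], query) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Given two identical sentences and one that covers a different part of the query, the second pick skips the duplicate in favor of the sentence that adds new content.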
Experiment • Wikipedia articles: • Snapshots of English Wikipedia (2009) • Question sets: • Questions from TREC 13 – 15 • 215 definition questions • 190 could be found in Wikipedia • Evaluation metrics • Precision, Recall, F1, F3
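For reference, F1 and F3 are both instances of the general F-measure, F_β = (1 + β²)·P·R / (β²·P + R), with β = 1 and β = 3; a larger β weights recall more heavily. The sketch below computes it from a given precision and recall. How nugget precision and recall themselves are computed is not specified in the slides and is left out.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-measure; beta > 1 favors recall over precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With a low precision and a high recall, `f_beta(p, r, beta=3)` is noticeably higher than `f_beta(p, r, beta=1)`, which is why gains in nugget recall show up more strongly in F3.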
Experiment • Results • Compare: • Word model vs. Wikipedia concept model • Single article vs. Multiple articles
Experiments • Analysis • Both algorithms benefit from the Wikipedia concept model • The related article set helps improve the performance of both algorithms • The Wikipedia concept model contributes more to precision than to recall • The related article set leads to more improvement in nugget recall than the Wikipedia concept model does
Conclusion • A method for summarizing definitions from multiple Wikipedia articles • Experiments show that Wikipedia concepts benefit the extraction of definitions • They also show that the related articles help weight concepts more effectively
Thank you Any questions?
References [1] Xu, J., Licuanan, A., Weischedel, R.: TREC 2003 QA at BBN: Answering definitional questions. In: TREC 2003. [2] Cui, H., Kan, M.Y., Chua, T.S.: Generic soft pattern models for definitional question answering. In: SIGIR 2005. [3] Kor, K.W., Chua, T.S.: Interesting nuggets and their impact on definitional question answering. In: SIGIR 2007. [4] Ye, S., Chua, T.S., Lu, J.: Summarizing definition from Wikipedia. In: ACL 2009. [5] Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: SIGIR 1998. [6] Gillick, D., Favre, B.: A scalable global model for summarization. In: ACL 2009.