Summarizing Definition from Wikipedia Articles

Summarizing Definition from Wikipedia Articles • Zhicheng Zheng, Xiaoyan Zhu • Tsinghua University

Outline • Introduction • Related work • Method • Experiments • Conclusion

Introduction • “Definition” from Wikipedia • A definition is a passage describing the meaning of a term (a word, phrase or other set of symbols), or a type of thing. • “Definition” as treated in TREC-QA • “What is XXX” or “Who is XXX” • Or rephrased as “Tell me something important or interesting about XXX”

Related work • Pattern-based methods [1,2] • Use patterns to generate potential definitional sentences • Advantage: accurate • Disadvantage: require manual labor or labeled data to generate the patterns

Related work • Relevance-based methods • Interesting nuggets often come in the form of trivia, novel or rare facts about the topic that tend to strongly co-occur with direct mention of topic keywords [3] • Using summarization techniques • Using Wikipedia articles [4] • Using a single article

Related work • Using multiple Wikipedia articles? • Benefits: improve summarization performance • users would better understand a topic if they read more related articles

Method • Representation Model • Represent articles/sentences as vectors of words/concepts • Word Model: TF/IDF • Not accurate enough: “Jordan Hill” shares words with “Michael Jordan” and “Grant Hill”, although all three are different people • Wikipedia Concept Model
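The word-model weakness named on this slide can be illustrated with a minimal sketch. It assumes whitespace tokenization and a smoothed TF/IDF weighting; the helper names `tfidf_vectors` and `cosine` are hypothetical and are not taken from the talk.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {word: tf * idf} dict per document (smoothed idf, an assumption)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0) for w in tf}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

texts = ["jordan hill", "michael jordan", "grant hill"]
vecs = tfidf_vectors(texts)
sim_michael_jordan = cosine(vecs[0], vecs[1])  # shares the word "jordan"
sim_grant_hill = cosine(vecs[0], vecs[2])      # shares the word "hill"
```

Both similarities come out positive and equal, so a bag-of-words vector cannot tell that the three names refer to three unrelated entities.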

Wikipedia concept model • Wikipedia concept: the title of a Wikipedia article • Represent a piece of text by the Wikipedia concepts that occur in it • Concepts are weighted in the same way as words in the TF/IDF word model
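One way to picture the concept representation is the sketch below. It assumes concepts are found by greedy longest-match against a set of article titles; the slides do not say how concept mentions are actually detected, so the matching strategy, the function `extract_concepts`, and the tiny title list are all illustrative assumptions.

```python
from collections import Counter

def extract_concepts(text, titles, max_len=4):
    """Count article titles found in text, preferring the longest match (assumed strategy)."""
    tokens = text.lower().split()
    title_set = {t.lower() for t in titles}
    concepts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest span first so "jordan hill" wins over "jordan" or "hill".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in title_set:
                concepts[candidate] += 1
                i += n
                break
        else:
            i += 1  # no title starts at this token
    return concepts

# Hypothetical title inventory, for illustration only.
TITLES = ["Jordan Hill", "Michael Jordan", "Grant Hill", "basketball"]
a = extract_concepts("Jordan Hill plays basketball", TITLES)
b = extract_concepts("Michael Jordan played basketball", TITLES)
shared = set(a) & set(b)
```

Here the two texts share only the concept "basketball"; the surface word "jordan" no longer creates a spurious match. The resulting counts can then be weighted TF/IDF-style exactly as word counts would be.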

Method • Wikipedia article expansion • Extract all Wikipedia articles related to the main Wikipedia article • d and d’ are related if d and d’ link to each other
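The expansion step can be sketched as follows, assuming that "link with each other" means a mutual hyperlink (d links to d’ and d’ links back to d). The link graph and the function name `related_articles` are hypothetical.

```python
def related_articles(main, links):
    """Return articles that the main article links to and that link back to it."""
    outgoing = links.get(main, set())
    return {d for d in outgoing if main in links.get(d, set())}

# Hypothetical link graph: article -> set of articles it links to.
LINKS = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"B"},        # C does not link back to A
    "D": {"A", "C"},
}
related = related_articles("A", LINKS)
```

With this toy graph, B and D are related to A, while C is excluded because the link is one-directional.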

Method • Summarizing definitions • Constraint: select sentences only from the original article (not from the other related articles) • Summarization method: • Maximal Marginal Relevance (MMR) [5] • Maximum Coverage (MC) [6]
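As a rough illustration of MMR-style selection [5], the sketch below greedily picks the sentence that maximizes lam * relevance - (1 - lam) * redundancy. It assumes sentences and the query are sparse concept vectors compared by cosine similarity; the function `mmr_select`, the value lam=0.5, and the toy vectors are assumptions, not settings reported in the talk.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(candidates, query, k, lam=0.5):
    """Greedily pick up to k candidate indices by the MMR criterion."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(candidates[i], query)
            # Redundancy: similarity to the closest already-selected sentence.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy candidate sentences from the original article, as concept vectors.
sentences = [
    {"nba": 1.0, "player": 1.0},
    {"nba": 1.0, "player": 1.0},  # duplicate of sentence 0
    {"nba": 1.0, "draft": 1.0},
]
# The query vector could be built from the main article plus its related articles.
query = {"nba": 1.0, "player": 1.0, "draft": 0.5}
picked = mmr_select(sentences, query, k=2)
```

The duplicate sentence is skipped in favor of the less relevant but novel one. Note that only sentences from the original article are candidates, which matches the constraint on the slide; the related articles would influence only the query and concept weights.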

Experiment • Wikipedia articles: • Snapshots of English Wikipedia (2009) • Question sets: • Questions from TREC 13 – 15 • 215 definition questions • 190 could be found in Wikipedia • Evaluation metrics • Precision, Recall, F1, F3
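F1 and F3 are instances of the general F-beta measure, which weights recall beta times as heavily as precision. The sketch below uses the standard formula; the precision and recall values are made up for illustration and are not results from the talk, and the nugget-based definitions of precision and recall used in TREC are not reproduced here.

```python
def f_beta(precision, recall, beta):
    """General F-measure: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0

# Illustrative values only.
p, r = 0.2, 0.8
f1 = f_beta(p, r, beta=1)
f3 = f_beta(p, r, beta=3)
```

For a system with low precision and high recall, F3 is much higher than F1, which is why F3 rewards recall-oriented definition summaries.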

Experiment • Results • Compare: • Word model vs. Wikipedia concept model • Single article vs. multiple articles

Experiments • Analysis • Both algorithms benefit from the Wikipedia concept model • The related article set improves the performance of both algorithms • The Wikipedia concept model contributes more to precision than to recall • The related article set improves nugget recall more than the Wikipedia concept model does

Conclusion • A method for summarizing definitions from multiple Wikipedia articles • Experiments show that Wikipedia concepts benefit the extraction of definitions • They also show that the related articles help weight concepts more effectively

Thank you Any questions?

References [1] Xu, J., Licuanan, A., Weischedel, R.: TREC 2003 QA at BBN: Answering definitional questions. In: TREC 2003. [2] Cui, H., Kan, M.Y., Chua, T.S.: Generic soft pattern models for definitional question answering. In: SIGIR 2005. [3] Kor, K.W., Chua, T.S.: Interesting nuggets and their impact on definitional question answering. In: SIGIR 2007. [4] Ye, S., Chua, T.S., Lu, J.: Summarizing definition from Wikipedia. In: ACL 2009. [5] Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: SIGIR 1998. [6] Gillick, D., Favre, B.: A scalable global model for summarization. In: ACL 2009.
