Discovering Interesting Usage Patterns in Text Collections: Integrating Text Mining with Visualization. Presenter: Wei-Hao Huang
Discovering Interesting Usage Patterns in Text Collections: Integrating Text Mining with Visualization Presenter: Wei-Hao Huang Authors: Anthony Don, Elena Zheleva, Machon Gregory, Sureyya Tarkan, Loretta Auvil, Tanya Clement, Ben Shneiderman, Catherine Plaisant CIKM 2007
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Critical interpretation of literary works is difficult. • Researchers are rarely able to support their interpretations and the development of new hypotheses. • Text mining algorithms typically return a large number of patterns, which are difficult to interpret out of context.
Objectives • To combine text mining with visualization so that results are easier to interpret for humanities scholars, journalists, intelligence analysts, and other researchers, in order to support the analysis of text collections.
Methodology • FeatureLens • Frequent expressions • Frequent words • Frequent closed itemsets of n-grams
Frequent expressions • "Expression" refers to a word or a longer sequence of words • N-gram: a sequence of n consecutive words • Support of an expression • Ex: This is a book. 2-grams: {“This is”, “is a”, “a book”} 3-grams: {“This is a”, “is a book”}
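The n-gram example above can be reproduced with a short sketch. The function names `tokenize`, `ngrams`, and `support` are hypothetical. The slide does not define support, so this sketch assumes it is the fraction of paragraphs that contain the expression; the presented work may define it differently.

```python
import re

def tokenize(text):
    # Split into word tokens, dropping punctuation such as the final period.
    return re.findall(r"[A-Za-z0-9']+", text)

def ngrams(text, n):
    # All sequences of n consecutive words, in reading order.
    words = tokenize(text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def support(expression, paragraphs):
    # ASSUMPTION: support = fraction of paragraphs containing the expression.
    n = len(tokenize(expression))
    hits = sum(1 for p in paragraphs if expression in ngrams(p, n))
    return hits / len(paragraphs)

bigrams = ngrams("This is a book.", 2)   # ['This is', 'is a', 'a book']
trigrams = ngrams("This is a book.", 3)  # ['This is a', 'is a book']

# Made-up paragraphs, for illustration only.
toy_paragraphs = ["This is a book.", "This is a pen.", "That was a book."]
this_is_support = support("This is", toy_paragraphs)  # 2 of 3 paragraphs
```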
Frequent words D2K/T2K provides the means to perform the frequent words analysis with stemming.
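As a rough illustration of frequent-word counting with stemming, the sketch below uses a deliberately crude suffix-stripping function. It is a stand-in, not the D2K/T2K stemmer, and the names `toy_stem` and `frequent_words` as well as the sample text are hypothetical.

```python
import re
from collections import Counter

def toy_stem(word):
    # ASSUMPTION: a crude stand-in for a real stemmer; strips a few suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def frequent_words(text, min_count):
    # Count stemmed words and keep those seen at least min_count times.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(toy_stem(w) for w in words)
    return {w: c for w, c in counts.items() if c >= min_count}

# Made-up text, for illustration only.
sample = "Terrorists threaten nations. A terrorist threatened our nation."
frequent = frequent_words(sample, min_count=2)
```

With stemming, "terrorists" and "terrorist" are counted as the same word, which is the point of stemming before counting frequencies.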
Frequent closed itemsets of n-grams • Ex: “improve our health care system”, “improve our health our citizens” • Items: I = {“I will improve”, “will improve medical”, “will improve security”, “will improve education”, “improve medical aid”, “improve security in”, “improve education in”, “medical aid in”, “aid in our”, “security in our”, “education in our”, “in our country”} • Transaction: <par_id, X> • X1 is a frequent closed itemset, but X2 and X3 are not.
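A small brute-force sketch may help make "frequent closed itemset" concrete. It assumes the standard definitions: an itemset is frequent if at least `min_support` transactions contain it, and closed if no proper superset has the same support. The three paragraphs are reconstructed from the 3-grams listed in I, and all function names are hypothetical. This is not the mining algorithm used by the authors, only an exhaustive check that suits tiny inputs.

```python
from itertools import combinations

def ngrams(text, n):
    words = text.split()
    return frozenset(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

def support(itemset, transactions):
    # Number of transactions (paragraphs) whose n-gram set contains the itemset.
    return sum(1 for items in transactions.values() if itemset <= items)

def closure(itemset, transactions):
    # Intersection of all transactions containing the itemset.
    covering = [items for items in transactions.values() if itemset <= items]
    return frozenset.intersection(*covering) if covering else itemset

def frequent_closed_itemsets(transactions, min_support):
    # Every closed itemset is the intersection of the transactions that contain
    # it, so intersecting every group of >= min_support transactions finds all
    # frequent closed itemsets. Exponential: for toy inputs only.
    found = {}
    sets = list(transactions.values())
    for k in range(max(1, min_support), len(sets) + 1):
        for group in combinations(sets, k):
            candidate = frozenset.intersection(*group)
            if candidate:
                found[candidate] = support(candidate, transactions)
    return found

# <par_id, X> transactions: X is the set of 3-grams in each paragraph.
paragraphs = {
    1: "I will improve medical aid in our country",
    2: "I will improve security in our country",
    3: "I will improve education in our country",
}
transactions = {pid: ngrams(text, 3) for pid, text in paragraphs.items()}
closed = frequent_closed_itemsets(transactions, min_support=2)
```

In this toy input, the only frequent closed itemset at `min_support=2` is {“I will improve”, “in our country”}, with support 3. The single item {“I will improve”} is frequent but not closed, because adding “in our country” leaves its support unchanged.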
Experiments • With two different types of text • The State of the Union Addresses • The Making of Americans
The State of the Union Addresses 1. How many times did “terrorist” appear in 2002? The president mentions “the American people” and “terrorist” in the same speeches; did the two terms ever appear in the same paragraph? 2. What was the longest pattern? In which year and paragraphs did it occur? What does it mean?
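The first question comes down to two simple operations on a speech split into paragraphs: counting occurrences of a term and finding paragraphs where two terms co-occur. The sketch below uses made-up paragraphs, not text from the actual addresses, and the function names are hypothetical. Matching is by case-insensitive substring, so "terrorist" also matches "terrorists".

```python
def count_occurrences(term, paragraphs):
    # Total case-insensitive substring matches across all paragraphs.
    return sum(p.lower().count(term.lower()) for p in paragraphs)

def cooccurring_paragraphs(term_a, term_b, paragraphs):
    # Indices of paragraphs that contain both terms.
    a, b = term_a.lower(), term_b.lower()
    return [i for i, p in enumerate(paragraphs)
            if a in p.lower() and b in p.lower()]

# Made-up paragraphs, for illustration only.
speech = [
    "The American people stand united.",
    "Every terrorist threat will be met, and the American people will prevail.",
    "Terrorist networks are being disrupted.",
]
terrorist_count = count_occurrences("terrorist", speech)
shared = cooccurring_paragraphs("the American people", "terrorist", speech)
```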
Conclusions • These text mining concepts can help users analyze the text and form insights and new hypotheses. • FeatureLens helps to discover and present interesting insights about the text.
Comments • Advantages • Text mining with visualization • Applications • Text mining