Discovering Interesting Usage Patterns in Text Collections: Integrating Text Mining with Visualization. Presenter: Wei-Hao Huang

Presentation Transcript


  1. Discovering Interesting Usage Patterns in Text Collections: Integrating Text Mining with Visualization Presenter: Wei-Hao Huang Authors: Anthony Don, Elena Zheleva, Machon Gregory, Sureyya Tarkan, Loretta Auvil, Tanya Clement, Ben Shneiderman, Catherine Plaisant CIKM 2007

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Critical interpretation of literary works is difficult. • Computers are rarely used to support researchers' interpretation of texts and the development of new hypotheses. • Text mining algorithms typically return a large number of patterns, which are difficult to interpret out of context.

  4. Objectives • To make text mining results easier to interpret for humanities scholars, journalists, intelligence analysts, and other researchers by integrating text mining with visualization, in order to support the analysis of text collections.

  5. Methodology • FeatureLens • Frequent expressions • Frequent words • Frequent closed itemsets of n-grams

  6. FeatureLens

  7. Frequent expressions • “Frequent expression” qualifies a word or a longer expression that occurs frequently • N-gram: a sequence of n consecutive words • Support of an expression • Ex: This is a book. 2-gram: {“This is”, “is a”, “a book”}; 3-gram: {“This is a”, “is a book”}
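The n-gram example on this slide can be reproduced with a short sketch. The function names `ngrams` and `support` are illustrative and not from the presentation, and the slide does not give a formula for support: defining it as the fraction of paragraphs that contain the expression is an assumption made here.

```python
import re


def ngrams(text, n):
    """Return all runs of n consecutive words in text, ignoring punctuation."""
    words = re.findall(r"\w+", text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def support(expression, paragraphs):
    """Fraction of paragraphs containing the expression as consecutive words.

    Assumption: the slide names "support" without defining it; this
    paragraph-level ratio is one common choice, not necessarily the authors'.
    """
    n = len(expression.split())
    hits = sum(1 for p in paragraphs if expression in ngrams(p, n))
    return hits / len(paragraphs)


print(ngrams("This is a book.", 2))  # ['This is', 'is a', 'a book']
print(ngrams("This is a book.", 3))  # ['This is a', 'is a book']
```

An expression would then count as "frequent" when its support reaches a threshold chosen by the analyst.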

  8. Frequent words D2K/T2K provides the means to perform the frequent words analysis with stemming.
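As a rough illustration of frequent-word analysis with stemming, the sketch below counts stemmed words across paragraphs. It does not use D2K/T2K: `crude_stem` is a toy suffix stripper invented for this example, and a real analysis would rely on a proper stemmer.

```python
import re
from collections import Counter


def crude_stem(word):
    """Toy stand-in for a real stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def frequent_words(paragraphs, min_count):
    """Count stemmed words over all paragraphs; keep those seen >= min_count times."""
    counts = Counter()
    for paragraph in paragraphs:
        tokens = re.findall(r"[a-z]+", paragraph.lower())
        counts.update(crude_stem(token) for token in tokens)
    return {word: count for word, count in counts.items() if count >= min_count}


print(frequent_words(["The terrorist fled.", "Terrorists were stopped."], 2))
# {'terrorist': 2}
```

Stemming is what lets “terrorist” and “terrorists” be counted as the same word.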

  9. Frequent closed itemsets of n-grams • Ex: “improve our health care system”, “improve our health our citizens” • I = {“I will improve”, “will improve medical”, “will improve security”, “will improve education”, “improve medical aid”, “improve security in”, “improve education in”, “medical aid in”, “aid in our”, “security in our”, “education in our”, “in our country”} • <par_id, X> • X1 is a frequent closed itemset, but X2 and X3 are not.
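A brute-force sketch can make the idea of a closed itemset concrete: an itemset is closed when no larger itemset occurs in exactly the same paragraphs. The three sentences below are an assumption, reassembled from the 3-grams in I so that their 3-grams give back exactly that set, and `closed_frequent_itemsets` is an illustrative name, not the algorithm used by the authors. The sketch relies on the fact that the intersection of any group of transactions is always a closed itemset, which is only practical for tiny inputs.

```python
from itertools import combinations


def trigrams(sentence):
    """Set of 3-grams (runs of three consecutive words) in a sentence."""
    words = sentence.split()
    return {" ".join(words[i:i + 3]) for i in range(len(words) - 2)}


def closed_frequent_itemsets(transactions, min_support):
    """Map each closed itemset to its support (number of paragraphs).

    transactions: dict of par_id -> set of items.
    Exponential in the number of transactions; for illustration only.
    """
    ids = list(transactions)
    closed = {}
    for size in range(min_support, len(ids) + 1):
        for group in combinations(ids, size):
            common = frozenset.intersection(
                *(frozenset(transactions[i]) for i in group)
            )
            if common:
                closed[common] = sum(
                    1 for items in transactions.values() if common <= items
                )
    return closed


# Hypothetical paragraphs, one sentence each (not taken from the slide).
paragraphs = {
    1: trigrams("I will improve medical aid in our country"),
    2: trigrams("I will improve security in our country"),
    3: trigrams("I will improve education in our country"),
}
result = closed_frequent_itemsets(paragraphs, min_support=2)
for itemset, count in result.items():
    print(sorted(itemset), count)
# ['I will improve', 'in our country'] 3
```

Here {“I will improve”} alone is frequent but not closed, because the larger set {“I will improve”, “in our country”} appears in the same three paragraphs.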

  10. Experiments • With two different types of text • The State of the Union Addresses • The Making of Americans

  11. The State of the Union Addresses • How many times did “terrorist” appear in 2002? • The president mentions “the American people” and “terrorist” in the same speeches; did the two terms ever appear in the same paragraph? • What was the longest pattern? In which year and paragraphs did it occur? What does it mean?

  12. The Making of Americans

  13. Conclusions • These text mining concepts can help the user to analyze the text, and to create insights and new hypotheses. • FeatureLens helps to discover and present interesting insights about the text.

  14. Comments • Advantages • Text mining with visualization • Applications • Text mining
