
TIARA: A Visual Exploratory Text Analytic System

TIARA (Text Insight via Automated Responsive Analytics) is a visual exploratory text analytic system that combines text analytics with interactive visualization to help users explore and analyze large collections of text documents. Its methodology builds on unsupervised topic analysis, topic ranking, keyword-based topic summarization, and time-sensitive keyword extraction. The time-sensitive keyword extraction is evaluated for completeness, distinctiveness, and response time. Future work includes adding sentence-based summaries, supporting other languages, and improving performance.


Presentation Transcript


  1. TIARA: A Visual Exploratory Text Analytic System • Presenter: Wei-Hao Huang • Authors: Furu Wei, Shixia Liu, Yangqiu Song, Shimei Pan, Michelle X. Zhou, Weihong Qian, Lei Shi, Li Tan, Qiang Zhang • SIGKDD 2010

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Going through a large collection of text to locate needed information, or simply to reach a decision, is very costly and time-consuming. • A number of text analysis technologies exist, but their results are often abstract and complex and may not be consumable by users.

  4. Objectives • To present an exploratory visual analytic system called TIARA (Text Insight via Automated Responsive Analytics). • To combine text analytics and interactive visualization to help users explore and analyze large collections of text. • [Figure: Documents, TIARA System]

  5. Methodology • TIARA • Topic Analysis • Topic Ranking • Keyword based Topic Summarization • Time-sensitive Keyword Extraction

  6. TIARA

  7. TIARA System architecture Database File system

  8. Topic Analysis • Uses unsupervised learning methods. • Quantities in the topic model: the number of documents; the words of each document; the vocabulary; K, the number of topics; the document-topic distribution matrix; the topic-word distribution matrix. • [Figure: term frequencies in each cluster]
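The two matrices named on this slide can be pictured with a small sketch. It is only an illustration under stated assumptions: the names `theta` and `phi`, the toy vocabulary, and all numeric values are hypothetical, and the slide does not say which unsupervised method produces the matrices.

```python
# Hypothetical toy values; `theta` and `phi` are assumed names, not the slide's symbols.
theta = [  # document-topic distribution matrix, one row per document (D x K)
    [0.9, 0.1],
    [0.2, 0.8],
]
phi = [  # topic-word distribution matrix, one row per topic (K x V)
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
]
vocabulary = ["meeting", "schedule", "invoice"]  # assumed example words


def word_distribution(doc_index):
    """Mix the topic-word rows using the document's topic weights."""
    weights = theta[doc_index]
    return [
        sum(weights[k] * phi[k][w] for k in range(len(phi)))
        for w in range(len(vocabulary))
    ]


def dominant_topic(doc_index):
    """Return the index of the topic with the largest weight in the document."""
    row = theta[doc_index]
    return max(range(len(row)), key=row.__getitem__)
```

Each row of `theta` and `phi` sums to 1, so the mixture returned by `word_distribution` is itself a probability distribution over the vocabulary.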

  9. Topic Ranking Topic rank is measured by a combination of both topic content coverage and topic variance.
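The slide names the two ingredients of the rank but not how they are combined. A minimal sketch, assuming coverage is the length-weighted mean of a topic's share across documents, variance is the matching weighted variance, and the two are combined as coverage times standard deviation (the combination rule and the function names are assumptions, not the authors' formula):

```python
import math


def topic_rank_scores(theta, doc_lengths):
    """Score each topic from a document-topic matrix (rows are documents).

    Assumed scoring: weighted mean share (coverage) times weighted
    standard deviation of that share across documents.
    """
    total = sum(doc_lengths)
    num_topics = len(theta[0])
    scores = []
    for k in range(num_topics):
        coverage = sum(n * row[k] for row, n in zip(theta, doc_lengths)) / total
        variance = sum(
            n * (row[k] - coverage) ** 2 for row, n in zip(theta, doc_lengths)
        ) / total
        scores.append(coverage * math.sqrt(variance))
    return scores


def rank_topics(theta, doc_lengths):
    """Return topic indices ordered from highest to lowest score."""
    scores = topic_rank_scores(theta, doc_lengths)
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
```

Under this assumed rule, a topic spread evenly over every document has zero variance and scores zero, while a topic that covers a lot of content and is concentrated in some documents ranks high.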

  10. Keyword based Topic Summarization

  11. Time-sensitive Keyword Extraction

  12. Time-sensitive Keyword Extraction

  13. Experiments • Evaluation of the time-sensitive keyword extraction procedure: • Completeness • Distinctiveness • Response time • Data sets: • A personal email collection with 8,326 email messages. • An emergency room data set containing 23,501 patient records.

  14. Completeness • Defined as whether we can recover the original keywords of a topic by combining the keywords associated with each time segment.
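One way to make this definition concrete is a recall-style score. The sketch below is an assumption about how completeness could be measured, not the metric reported in the talk; the function name and the example keywords are hypothetical.

```python
def completeness(topic_keywords, segment_keywords):
    """Fraction of a topic's keywords recovered by pooling all segments.

    topic_keywords: iterable of the topic's original keywords.
    segment_keywords: list of keyword iterables, one per time segment.
    """
    topic = set(topic_keywords)
    if not topic:
        return 0.0
    # Pool the keywords of every time segment into one set.
    recovered = set().union(*segment_keywords)
    return len(topic & recovered) / len(topic)
```

For example, `completeness(["budget", "audit", "tax", "payroll"], [["budget", "audit"], ["audit", "tax"]])` recovers three of the four topic keywords.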

  15. Distinctiveness • Defined as whether we can distinguish one topic segment from another based on their associated keywords, to avoid redundancy.
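Distinctiveness can be sketched as the opposite of keyword overlap between segments. The version below assumes one minus the mean pairwise Jaccard similarity; this choice of measure and the function names are illustrative assumptions, not the metric used in the talk.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distinctiveness(segment_keywords):
    """One minus the mean pairwise Jaccard similarity across time segments."""
    pairs = list(combinations(segment_keywords, 2))
    if not pairs:
        # A single segment has nothing to be redundant with.
        return 1.0
    mean_overlap = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_overlap
```

Segments with disjoint keyword sets score 1.0 and segments with identical keyword sets score 0.0, so higher values mean less redundancy.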

  16. Completeness and Distinctiveness Results

  17. Response Time

  18. Conclusions • TIARA tightly integrates text analytics with interactive visualization to support effective exploratory text analysis. • Future work • Add sentence-based summaries • Support other languages • Improve performance

  19. Comments • Advantages • To explore and analyze large text collections with interactive visualization • Applications • Text mining
