Summarization and Personal Information Management

Explore the field of summarization and learn how to effectively manage personal information overload. Topics include text mining, text categorization, topic modeling, and sentiment analysis.


Presentation Transcript


  1. Summarization and Personal Information Management Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

  2. What is summarization?

  3. What’s New about This Course? • Information Science and HCI • Understand what problem we are trying to solve • Linguistics • Rhetorical style and strategies • Integrating Multiple Language Technologies • Summarization • Text mining • Text categorization • Topic modeling • Sentiment analysis

  4. Overwhelmed with data… Innovation to the rescue! www.powerfulinformation.org

  5. A solved problem?

  6. Who we are and how to find us… Carolyn Penstein Rosé Course Instructor GHC 4515 cprose@cs.cmu.edu Mahesh Joshi Teaching Assistant GHC 5517 maheshj@cs.cmu.edu

  7. Mahesh winning an award for his research

  8. Mahesh’s office… But where is Mahesh?

  9. If you make an appointment, he will come!

  10. What is this course about?

  11. Information Overload … more harmful than smoking marijuana?

  12. Why do we care?

  13. Personal connection… Now 86,703!!!

  14. Industry Relevance

  15. Summarization and my research

  16. Zooming in and Out of Text • Using statistical text compression models • Exploring the use of syntactic dependency features to increase robustness at severe levels of compression • In-depth analysis of how humans compress text

  17. Analysis of What Happened in a Conversation • Conversational Roles • Positioning in Negotiations • Exchanging social support • Socialization processes • Knowledge integration processes

  18. Processing conversational data (figure axes: Topic, Time)

  19. Supporting Project Course Instructors (Rosé et al., 2007; Gweon et al., In Press) • Interviews with 9 project course instructors • 3 Important types of Assessment Categories • Group processes most important

  20. Supporting Project Course Instructors

  21. Course Specifics

  22. Course Objectives • Explore summarization from a needs-focused perspective • Broaden the definition of summarization • Let’s revolutionize the field! • Explore a variety of analytical and technical approaches • Learn from your fellow students in addition to learning from your instructor • Gain practical experience while doing a cool project!

  23. Course Requirements • Reading Assignments + Postings • 1 in-class paper presentation • 3 (short!) Homework Assignments • Summary Design • Rhetorical analysis • SIDE exercise • Term Project (Poster and 4-page Report) • Final Exam (critique)

  24. Term Project • Multi-document summarization of scientific literature • Summarizing Web searches • Text Compression and Summarization for handhelds • Summarizing Social Interactions Grand Challenges

  25. Resources

  26. SIDE: Summarization Integrated Development Environment • Annotate Data • Define Summaries • Train automatic annotators • Visualize Annotated Data • SIDE facilitates rapid prototyping of summarization systems

  27. SIDE • Download: www.cs.cmu.edu/~cprose/SIDE.html • Documentation: www.cs.cmu.edu/~emayfiel/SIDE-documentation.pdf • If you need help: elijah@cmu.edu • We’re adding support for topic modeling and text compression

  28. Error Analysis

  29. Matt Kam – Offers Help with HandHeld Projects

  30. Data set from Rajiv Gandhi University for Knowledge Technologies • Student population: The highest achieving kid in each rural village • All computer based instruction: Every kid is given a laptop • All English medium instruction • 2,000 students • Mostly grew up with Telugu medium instruction • 10 different search tasks

  31. Investigating how low English literacy affects information seeking • Students were given a search task in English • We collected logs of their search behavior and the result of their search • Analysis • We examined the results of their search • We modeled their search strategy • We looked for connections between these two things Search Task: Imagine that you have uncle in Pittsburgh who recently went to a dentist and was diagnosed with an abscess in his tooth. He had to undergo a painful treatment for the infection. You have to search for the necessary information on the Internet, in order to prevent your uncle from having a recurrence of the abscess or any other tooth disease in general.

  32. Example • Student didn’t understand “uncle” • Searches for uncle and finds a page about “calling uncle” • Concludes that uncle means “child” • Finds a page about infant tooth problems • Thinks she has found the correct information Search Task: Imagine that you have uncle in Pittsburgh who recently went to a dentist and was diagnosed with an abscess in his tooth. He had to undergo a painful treatment for the infection. You have to search for the necessary information on the Internet, in order to prevent your uncle from having a recurrence of the abscess or any other tooth disease in general.

  33. Problems we can address… • Evidence that students ignore portions of tasks that they don’t understand • Students frequently found information about abscesses but not prevention • Trouble with query formation • MY UNCLE HAS A TOOTH PROBLEM WT CAN I DO FOR HIM • Evidence that students don’t know how to “recover” from an unsuccessful attempt • Repeated queries • Queries with minimal modifications
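The two "recovery" problems on this slide (repeated queries and queries with minimal modifications) could be flagged automatically in a search log. The following is only a minimal sketch, not the analysis used in the study: the function name `label_reformulations`, the 0.8 similarity threshold, and the use of character-level similarity from Python's standard `difflib` module are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(query):
    """Lowercase a query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def label_reformulations(queries, threshold=0.8):
    """Label each query in a session relative to the query before it.

    Labels: "first", "repeat" (identical after normalization),
    "minimal_change" (character similarity >= threshold), or "new".
    The threshold is an illustrative assumption, not a value from the study.
    """
    labels = []
    previous = None
    for raw in queries:
        current = normalize(raw)
        if previous is None:
            labels.append("first")
        elif current == previous:
            labels.append("repeat")
        elif SequenceMatcher(None, previous, current).ratio() >= threshold:
            labels.append("minimal_change")
        else:
            labels.append("new")
        previous = current
    return labels


# The first query is the one quoted on the slide; the rest are hypothetical.
session = [
    "MY UNCLE HAS A TOOTH PROBLEM WT CAN I DO FOR HIM",
    "my uncle has a tooth problem wt can i do for him",
    "my uncle has a tooth problem what can i do for him",
    "how to prevent tooth abscess",
]
labels = label_reformulations(session)
```

A session dominated by "repeat" and "minimal_change" labels would be one sign that the student did not know how to recover from an unsuccessful attempt.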

  34. Grading • Class participation (10%) • Homework Assignments (10% each) • Class paper presentation (10%) • Term project (40%) • Final exam (10%)
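As a quick check of the arithmetic, the weights above add up to 100% once the three homework assignments listed under Course Requirements are counted at 10% each. The sketch below is illustrative only; the dictionary keys and the `final_grade` helper are hypothetical names, and it assumes each component is scored on a 0 to 100 scale.

```python
# Weights in percent, taken from the Grading slide; three homeworks at 10% each.
WEIGHTS = {
    "class_participation": 10,
    "homework_1": 10,
    "homework_2": 10,
    "homework_3": 10,
    "paper_presentation": 10,
    "term_project": 40,
    "final_exam": 10,
}


def final_grade(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


total_weight = sum(WEIGHTS.values())

# Hypothetical student: 80 on every component except a 90 on the term project.
example_scores = {name: 80 for name in WEIGHTS}
example_scores["term_project"] = 90
example_grade = final_grade(example_scores)
```

Because the term project carries 40% of the grade, a 10-point improvement on it moves the final grade by 4 points, as much as the same improvement on four of the smaller components combined.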

  35. For next time! • On Drupal: Read one or the other and post to discussion board • Kim, K., Lustria, M., & Burke, D. (2007). Predictors of cancer information overload: findings from a national survey, Information Research, Vol 12, No 4. • Janssen, R. & de Poot, H. (2006). Information overload: Why some people seem to suffer more than others, Proceedings of NordiCHI.

  36. Questions?
