Explore the field of summarization and learn how to effectively manage personal information overload. Topics include text mining, text categorization, topic modeling, and sentiment analysis.
Summarization and Personal Information Management
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
What’s New about This Course?
• Information Science and HCI
  • Understand what problem we are trying to solve
• Linguistics
  • Rhetorical style and strategies
• Integrating Multiple Language Technologies
  • Summarization
  • Text mining
  • Text categorization
  • Topic modeling
  • Sentiment analysis
Overwhelmed with data… Innovation to the rescue! www.powerfulinformation.org
Who we are and how to find us…
• Carolyn Penstein Rosé, Course Instructor, GHC 4515, cprose@cs.cmu.edu
• Mahesh Joshi, Teaching Assistant, GHC 5517, maheshj@cs.cmu.edu
What is this course about?
Information Overload … more harmful than smoking marijuana?
Personal connection… Now 86,703!!!
Industry Relevance
Summarization and my research
Zooming In and Out of Text
• Using statistical text compression models
• Exploring the use of syntactic dependency features to increase robustness at severe levels of compression
• In-depth analysis of how humans compress text
Analysis of What Happened in a Conversation
• Conversational roles
• Positioning in negotiations
• Exchanging social support
• Socialization processes
• Knowledge integration processes
Processing conversational data
[Figure: axes labeled Topic and Time]
Supporting Project Course Instructors (Rosé et al., 2007; Gweon et al., In Press)
• Interviews with 9 project course instructors
• Three important assessment categories
• Group processes were most important
Course Objectives
• Explore summarization from a needs-focused perspective
• Broaden the definition of summarization
• Let’s revolutionize the field!
• Explore a variety of analytical and technical approaches
• Learn from your fellow students in addition to learning from your instructor
• Gain practical experience while doing a cool project!
Course Requirements
• Reading assignments + postings
• 1 in-class paper presentation
• 3 (short!) homework assignments
  • Summary design
  • Rhetorical analysis
  • SIDE exercise
• Term project (poster and 4-page report)
• Final exam (critique)
Term Project: Grand Challenges
• Multi-document summarization of scientific literature
• Summarizing Web searches
• Text compression and summarization for handhelds
• Summarizing social interactions
SIDE: Summarization Integrated Development Environment
• Annotate data
• Define summaries
• Train automatic annotators
• Visualize annotated data
SIDE facilitates rapid prototyping of summarization systems.
SIDE
• Download: www.cs.cmu.edu/~cprose/SIDE.html
• Documentation: www.cs.cmu.edu/~emayfiel/SIDE-documentation.pdf
• If you need help: elijah@cmu.edu
• We’re adding support for topic modeling and text compression
Data set from Rajiv Gandhi University for Knowledge Technologies
• Student population: the highest-achieving kid in each rural village
• All computer-based instruction: every kid is given a laptop
• All English-medium instruction
• 2,000 students
  • Most grew up with Telugu-medium instruction
• 10 different search tasks
Investigating how low English literacy affects information seeking
• Students were given a search task in English
• We collected logs of their search behavior and the results of their search
• Analysis
  • We examined the results of their search
  • We modeled their search strategy
  • We looked for connections between the two
Search Task: Imagine that you have an uncle in Pittsburgh who recently went to a dentist and was diagnosed with an abscess in his tooth. He had to undergo a painful treatment for the infection. You have to search for the necessary information on the Internet in order to prevent your uncle from having a recurrence of the abscess or any other tooth disease in general.
Example
• Student didn’t understand “uncle”
• Searches for “uncle” and finds a page about “calling uncle”
• Concludes that “uncle” means “child”
• Finds a page about infant tooth problems
• Thinks she has found the correct information
Problems we can address…
• Evidence that students ignore portions of tasks that they don’t understand
  • Students frequently found information about abscesses but not about prevention
• Trouble with query formation
  • MY UNCLE HAS A TOOTH PROBLEM WT CAN I DO FOR HIM
• Evidence that students don’t know how to “recover” from an unsuccessful attempt
  • Repeated queries
  • Queries with minimal modifications
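The last problem on this slide (repeated queries and queries with minimal modifications) is something one could try to flag automatically in the search logs. The sketch below is a hypothetical illustration, not the analysis method used in this study. It assumes each student's log is an ordered list of query strings, uses Python's standard `difflib.SequenceMatcher` ratio as a stand-in similarity measure, and treats 0.8 as an arbitrary cutoff for a "minimal modification". The function name `classify_reformulations` is made up, and only the first query in the example log comes from the slide; the others are invented.

```python
from difflib import SequenceMatcher


def classify_reformulations(queries, threshold=0.8):
    """Label each consecutive pair of queries in one student's log.

    Hypothetical helper. Labels:
      'repeat'  - next query is identical after lowercasing and collapsing spaces
      'minimal' - character-level similarity ratio >= threshold (assumed cutoff)
      'new'     - anything else
    """
    # Normalize case and whitespace so trivial differences count as repeats.
    normalized = [" ".join(q.lower().split()) for q in queries]
    labels = []
    for prev, curr in zip(normalized, normalized[1:]):
        if prev == curr:
            labels.append("repeat")
        elif SequenceMatcher(None, prev, curr).ratio() >= threshold:
            labels.append("minimal")
        else:
            labels.append("new")
    return labels


# Toy log: only the first query is from the slide; the rest are invented.
log = [
    "MY UNCLE HAS A TOOTH PROBLEM WT CAN I DO FOR HIM",
    "my uncle has a tooth problem wt can i do for him",
    "my uncle has tooth problem what can i do for him",
    "abscess prevention",
]
print(classify_reformulations(log))
```

On this toy log the function returns `['repeat', 'minimal', 'new']`. A character-level ratio is crude; a word-level edit distance would be an equally reasonable choice.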
Grading
• Class participation (10%)
• Homework assignments (10% each)
• Class paper presentation (10%)
• Term project (40%)
• Final exam (10%)
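Counting the three homework assignments at 10% each, the weights listed above account for the whole grade:

$$10\% + 3 \times 10\% + 10\% + 40\% + 10\% = 100\%$$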
For next time!
• On Drupal: read one of the two papers below and post to the discussion board
  • Kim, K., Lustria, M., & Burke, D. (2007). Predictors of cancer information overload: Findings from a national survey. Information Research, 12(4).
  • Janssen, R., & de Poot, H. (2006). Information overload: Why some people seem to suffer more than others. Proceedings of NordiCHI.