TagHelper Tools Supporting the Analysis of Conversational Data. Carolyn P. Rosé Language Technologies Institute and Human-Computer Interaction Institute Carnegie Mellon University. Outline. What is TagHelper tools? What can TagHelper Tools do for YOU?
TagHelper Tools: Supporting the Analysis of Conversational Data Carolyn P. Rosé Language Technologies Institute and Human-Computer Interaction Institute Carnegie Mellon University
Outline • What is TagHelper tools? • What can TagHelper Tools do for YOU? • How EASY is it to use TagHelper tools? • What are some TagHelper success stories? • What problems are we working on?
What is TagHelper tools? • A PSLC Enabling Technology project • Machine learning technology for processing conversational data • Chat data • Newsgroup style conversational data • Short answers and explanations • Goal: automate the categorization of spans of text
What is TagHelper tools? • An add-on to Microsoft Excel • Research Focus: identify and solve text classification problems specific to learning sciences • Types of categories, nature and size of data sets
Main Uses for TagHelper tools • Supporting data analysis involving conversational data • Triggering interventions • Supporting on-line assessment
Example: Triggering an Intervention • ST1: well what values do u have for the reheat cycle ? • ST2: for some reason I said temperature at turbine to be like 400 C • Tutor: Let's think about the motivation for Reheat. What process does the steam undergo in the Turbines ? • …
Example: Supporting on-line assessment • Using instructor assigned ratings as gold standard • Best performance without TagHelper tools: .16 correlation coefficient • Best performance with TagHelper tools: .63 correlation coefficient
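The comparison above scores predicted ratings against instructor ratings with a correlation coefficient. The slide does not say which coefficient was used; the sketch below assumes Pearson's r, and the rating lists are hypothetical values for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: instructor (gold standard) vs. automatically assigned ratings.
instructor_ratings = [1, 2, 3, 4, 5]
predicted_ratings = [2, 4, 6, 8, 10]
r = pearson_r(instructor_ratings, predicted_ratings)
```

A value near 1 means the automatic ratings rank and space the answers much as the instructor did; a value near 0 means there is little linear relationship.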
Iterative Process for Using TagHelper tools • Obtain data in natural language form • Iterative process • Decide on a unit of analysis • Single contributions, topic segments, whole messages, etc. • Decide on a set of categories or a rating system • Set up data in Excel • Assign categories to part of your data • Use TagHelper to assign categories to the remaining portion of your data
Training and Testing • Start TagHelper tools by double clicking on the portal.bat icon • You will then see the following tool palette • Train a prediction model on your coded data and then apply that model to uncoded data
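The train-then-apply idea can be illustrated outside the tool. The sketch below is not TagHelper's implementation: it assumes a small multinomial Naive Bayes classifier over lowercase word tokens, and the texts and the labels "help" and "no-help" are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()

def train(coded_rows):
    """Build per-label word counts from a list of (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in coded_rows:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["labels"].items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(model["words"][label].values())
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["words"][label][tok]
            # Add-one smoothing keeps unseen word/label pairs non-zero.
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical hand-coded rows (the "coded data").
coded = [
    ("try factoring the left side first", "help"),
    ("you should subtract x from both sides", "help"),
    ("ok", "no-help"),
    ("lol i am bored", "no-help"),
]
model = train(coded)

# Apply the trained model to hypothetical uncoded rows.
uncoded = ["subtract x from both sides first", "lol ok"]
predictions = [predict(model, text) for text in uncoded]
```

With so few training rows the predictions are only illustrative; in practice a much larger hand-coded portion is needed before the automatic codes can be trusted.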
Loading a File • First click on Add a File • Then select a file
Simplest Usage • Once your file is loaded, you have two options • The first option is to code your data using the default settings • To do this, simply click on “GO!” • The second option is to modify the default settings and then code • We will start with the first option • Note that the performance will not be optimal
Results • Performance on coded data • Results on uncoded data
Success Story 1: Supporting Data Analysis • Peer tutoring in Algebra LearnLab • Data coded for high-level-help, low-level-help, and no-help • Important predictor of learning (e.g., Webb et al., 2003) • TagHelper achieves agreement of .82 Kappa • Can be used for follow-up studies in same domain * Contributed by Erin Walker
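Agreement figures such as the .82 above are chance-corrected. The slides say only "Kappa"; the sketch below assumes Cohen's kappa for two coders (for example, a human and the trained model), and the label lists are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected by chance, from each coder's label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use one identical label")
    return (observed - expected) / (1 - expected)

# Hypothetical codes for four contributions.
human_codes = ["high-level-help", "high-level-help", "low-level-help", "no-help"]
model_codes = ["high-level-help", "high-level-help", "low-level-help", "no-help"]
kappa = cohens_kappa(human_codes, model_codes)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which is why it is stricter than raw percent agreement on skewed category distributions.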
Success Story 2: Triggering Interventions • Collaborative idea generation in the Earth Sciences domain • Chinese TagHelper learns hand-coded topic analysis • Human agreement .84 Kappa • TagHelper performance .7 Kappa • Trained models used in follow-up study to trigger interventions and facilitate data analysis
Example Dialogue * Feedback during idea generation increases both idea generation and learning (Wang et al., 2007)
Unique Ideas [Figure: number of unique ideas (y-axis, 0 to 12) over time stamp (x-axis, 0 to 30) for four conditions: Nom+N, Nom+F, Real+N, Real+F] Process Analysis • Conditions: Individuals+Feedback, Individuals+NoFeedback, Pairs+Feedback, Pairs+NoFeedback • Process loss, Pairs vs Individuals: F(1,24)=12.22, p<.005, 1 sigma • Process loss, Pairs vs Individuals: F(1,24)=4.61, p<.05, .61 sigma • Negative effect of feedback: F(1,24)=7.23, p<.05, -1.03 sigma • Positive effect of feedback: F(1,24)=16.43, p<.0005, 1.37 sigma
Interesting Problems • Highly skewed data sets • Very infrequent classes are often the most interesting and important • Careful feature space design helps more than powerful algorithms • Huge problem with non-independence of data points from same student • Off-the-shelf machine learning algorithms not set up for this • New sampling techniques offer promise • “Medium” sized data sets • Contemporary machine learning approaches designed for huge data sets • Supplementing with alternative data sources may help
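One common way to respect the non-independence of contributions from the same student during evaluation is to split folds by student rather than by row. This is a standard grouped cross-validation sketch, not necessarily the sampling technique the slide refers to; the function name, the `(student_id, text, label)` row layout, and the data are hypothetical.

```python
from collections import defaultdict

def student_folds(rows, n_folds):
    """Yield (train, test) splits in which no student appears on both sides.

    rows: list of (student_id, text, label) tuples.
    """
    by_student = defaultdict(list)
    for row in rows:
        by_student[row[0]].append(row)
    students = sorted(by_student)
    if n_folds < 2 or n_folds > len(students):
        raise ValueError("n_folds must be between 2 and the number of students")
    for fold in range(n_folds):
        # Every n_folds-th student (starting at `fold`) is held out for testing.
        test_students = set(students[fold::n_folds])
        train = [r for s in students if s not in test_students for r in by_student[s]]
        test = [r for s in students if s in test_students for r in by_student[s]]
        yield train, test

# Hypothetical coded contributions from three students.
rows = [
    ("s1", "what values do u have", "question"),
    ("s1", "i got 400", "answer"),
    ("s2", "why reheat", "question"),
    ("s3", "to dry the steam", "answer"),
    ("s3", "ok thanks", "other"),
]
folds = list(student_folds(rows, 3))
```

Splitting this way gives a more honest estimate of how a trained model will perform on students it has never seen.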
Example Lesson Learned • Problem: Context-oriented coding • Finding: Careful feature space design goes farther than powerful algorithms
Sequential Learning • Notes sequential dependencies • Perhaps claims are stated before their warrants • Perhaps counter-arguments are given before new arguments • Perhaps people first build on their partner’s ideas and then offer a new idea
Thread Structure Features • Thread depth • Best parent semantic similarity [Diagram: a thread of messages, each divided into segments Seg1, Seg2, Seg3]
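The two feature names on this slide, thread depth and best parent semantic similarity, can be sketched as follows. The slide does not define either computation, so this is an assumption: depth is the number of reply links back to the thread root, and "best parent similarity" is taken to be the highest bag-of-words cosine similarity between a segment and any segment of the parent message. All function names and data are hypothetical.

```python
import math
from collections import Counter

def thread_depth(message_id, parent_of):
    """Count reply links from a message back to the root of its thread."""
    depth = 0
    seen = {message_id}
    current = parent_of.get(message_id)
    while current is not None:
        if current in seen:
            raise ValueError("cycle in reply structure")
        seen.add(current)
        depth += 1
        current = parent_of.get(current)
    return depth

def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_parent_similarity(segment, parent_segments):
    """Highest similarity between a segment and any segment of its parent message."""
    return max((cosine_similarity(segment, p) for p in parent_segments), default=0.0)

# Hypothetical thread: m2 replies to m1, m3 replies to m2.
parent_of = {"m1": None, "m2": "m1", "m3": "m2"}
depth_m3 = thread_depth("m3", parent_of)
similarity = best_parent_similarity(
    "the steam expands in the turbine",
    ["hello everyone", "what happens to the steam in the turbine"],
)
```

Both values would be added as extra columns alongside the word-based features for each segment.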
Sequence Oriented Features • Notes whether text is within a certain proximity to quoted material
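A proximity-to-quote feature of this kind might be computed as below. The sketch assumes newsgroup-style messages in which quoted lines start with ">", and the window size and function name are hypothetical choices, not taken from the slides.

```python
def near_quote_flags(lines, window=1):
    """For each line, report whether it is unquoted text within `window` lines of a quote."""
    quoted = [i for i, line in enumerate(lines) if line.lstrip().startswith(">")]
    flags = []
    for i, line in enumerate(lines):
        is_quote = line.lstrip().startswith(">")
        # Quoted lines themselves are not flagged; only the author's own text is.
        near = (not is_quote) and any(abs(i - q) <= window for q in quoted)
        flags.append(near)
    return flags

# Hypothetical message with one quoted line.
message = [
    "> what is the motivation for reheat?",
    "It keeps the steam from getting too wet.",
    "",
    "Anyway, see you tomorrow.",
]
flags = near_quote_flags(message, window=1)
```

Text flagged this way is likely a direct response to the quoted material, which is the kind of contextual cue a word-only representation misses.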
What did we learn? • Intuition confirmed • Different dimensions responded differently to context based enhancements • Feature based approach was more effective • Thread structure features were especially informative for Social Modes dimension • Thread structure information is more difficult to extract from chat data • Best results of similar approach on chat data only achieved a kappa of .45
Special Thanks To: William Cohen, Pinar Donmez, Jaime Arguello, Gahgene Gweon, Rohit Kumar, Yue Cui, Mahesh Joshi, Yi-Chia Wang, Hao-Chuan Wang, Emil Albright, Cammie Williams. Questions?