TagHelper Tools: Supporting the Analysis of Conversational Data
Carolyn P. Rosé
Language Technologies Institute and Human-Computer Interaction Institute
Carnegie Mellon University
Outline • What is TagHelper tools? • What can TagHelper Tools do for YOU? • How EASY is it to use TagHelper tools? • What are some TagHelper success stories? • What problems are we working on?
What is TagHelper tools? • A PSLC Enabling Technology project • Machine learning technology for processing conversational data • Chat data • Newsgroup style conversational data • Short answers and explanations • Goal: automate the categorization of spans of text
What is TagHelper tools? • An add-on to Microsoft Excel • Research Focus: identify and solve text classification problems specific to learning sciences • Types of categories, nature and size of data sets
Main Uses for TagHelper tools • Supporting data analysis involving conversational data • Triggering interventions • Supporting on-line assessment
Example: Triggering an Intervention • ST1: well what values do u have for the reheat cycle ? • ST2: for some reason I said temperature at turbine to be like 400 C • Tutor: Let's think about the motivation for Reheat. What process does the steam undergo in the Turbines ? • …
Example: Supporting on-line assessment • Using instructor-assigned ratings as the gold standard • Best performance without TagHelper tools: .16 correlation coefficient • Best performance with TagHelper tools: .63 correlation coefficient
Iterative Process for Using TagHelper tools • Obtain data in natural language form • Iterative process • Decide on a unit of analysis • Single contributions, topic segments, whole messages, etc. • Decide on a set of categories or a rating system • Set up data in Excel • Assign categories to part of your data • Use TagHelper to assign categories to the remaining portion of your data
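The code-then-apply loop above can be sketched with a tiny hand-rolled Naive Bayes text classifier. This is illustrative only: TagHelper's actual feature extraction and learning algorithms differ, and the function names and toy dialogue data here are invented.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase bag-of-words features, a stand-in for real feature extraction
    return text.lower().split()

def train(coded_rows):
    """Learn per-category word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in coded_rows:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def predict(text, model):
    """Assign the most probable category to an uncoded span."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / total)
        n = sum(word_counts[label].values())
        for w in tokenize(text):
            # Laplace smoothing handles words unseen for this category
            lp += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hand-coded portion of the data (toy example)
coded = [
    ("what values do you have for the reheat cycle", "question"),
    ("i said temperature at turbine to be 400", "answer"),
    ("what process does the steam undergo", "question"),
]
model = train(coded)
# Apply the trained model to an uncoded span
print(predict("what temperature do you have", model))  # → question
```

The same pattern scales to the Excel workflow: rows with a filled-in category column form the training set, and the model fills in the rest.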
Training and Testing • Start TagHelper tools by double-clicking on the portal.bat icon • You will then see the following tool palette • Train a prediction model on your coded data and then apply that model to uncoded data
Loading a File • First click on Add a File • Then select a file
Simplest Usage • Once your file is loaded, you have two options • The first option is to code your data using the default settings • To do this, simply click on “GO!” • The second option is to modify the default settings and then code • We will start with the first option • Note that the performance will not be optimal
Results • Performance on coded data • Results on uncoded data
Success Story 1: Supporting Data Analysis • Peer tutoring in Algebra LearnLab • Data coded for high-level-help, low-level-help, and no-help • Important predictor of learning (e.g., Webb et al., 2003) • TagHelper achieves agreement of .82 Kappa • Can be used for follow-up studies in same domain * Contributed by Erin Walker
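The Kappa figures reported in these success stories are Cohen's Kappa, agreement corrected for chance. A minimal sketch of the computation (the coder names and toy labels below are invented; they are not the Algebra LearnLab data):

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two label sequences."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if both coders labeled at random
    # according to their own marginal label frequencies
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

human     = ["high", "high", "low", "none", "high", "low", "none", "high"]
taghelper = ["high", "high", "low", "none", "low",  "low", "none", "high"]
print(round(cohen_kappa(human, taghelper), 2))  # → 0.81
```

Because Kappa discounts agreement expected by chance, it is a more honest measure than raw percent agreement when some categories dominate.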
Success Story 2: Triggering Interventions • Collaborative idea generation in the Earth Sciences domain • A Chinese version of TagHelper learns the hand-coded topic analysis • Human agreement .84 Kappa • TagHelper performance .7 Kappa • Trained models used in a follow-up study to trigger interventions and facilitate data analysis
Example Dialogue * Feedback during idea generation increases both idea generation and learning (Wang et al., 2007)
Unique Ideas
[Figure: number of unique ideas (0–12) over time for four conditions: Nom+N, Nom+F, Real+N, Real+F]
Process Analysis
• Process loss, Pairs vs. Individuals: F(1,24)=12.22, p<.005, 1 sigma
• Process loss, Pairs vs. Individuals: F(1,24)=4.61, p<.05, .61 sigma
• Negative effect of Feedback: F(1,24)=7.23, p<.05, -1.03 sigma
• Positive effect of Feedback: F(1,24)=16.43, p<.0005, 1.37 sigma
(Conditions compared: Individuals+Feedback, Individuals+NoFeedback, Pairs+Feedback, Pairs+NoFeedback)
Interesting Problems • Highly skewed data sets • Very infrequent classes are often the most interesting and important • Careful feature space design helps more than powerful algorithms • Huge problem with non-independence of data points from same student • Off-the shelf machine learning algorithms not set up for this • New sampling techniques offer promise • “Medium” sized data sets • Contemporary machine learning approaches designed for huge data sets • Supplementing with alternative data sources may help
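One simple instance of the sampling techniques mentioned above is to oversample minority classes in the training set. A naive sketch (the function name and toy data are invented for illustration; practical approaches also include cost-sensitive learning and undersampling):

```python
import random
from collections import Counter, defaultdict

def oversample(rows, seed=0):
    """Duplicate minority-class examples at random until every
    class matches the majority class in size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    target = max(len(v) for v in by_label.values())
    balanced = []
    for label, examples in by_label.items():
        balanced.extend(examples)
        # Draw with replacement to fill the gap up to the majority size
        balanced.extend(rng.choices(examples, k=target - len(examples)))
    return balanced

# A highly skewed toy data set: 9 common examples, 1 rare one
rows = [("a", "common")] * 9 + [("b", "rare")]
print(Counter(label for _, label in oversample(rows)))
```

Oversampling must be applied only to the training split, never to the evaluation data, or performance estimates become inflated.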
Example Lesson Learned
• Problem: Context-oriented coding
• Finding: Careful feature space design goes farther than powerful algorithms
Sequential Learning • Notes sequential dependencies • Perhaps claims are stated before their warrants • Perhaps counter-arguments are given before new arguments • Perhaps people first build on their partner’s ideas and then offer a new idea
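One lightweight way to let a classifier exploit such sequential dependencies is to add the previous contribution's label as a feature. A hypothetical sketch (all names and the toy dialogue are invented; this is not TagHelper's implementation):

```python
def add_sequence_features(segments):
    """Augment each segment's feature dict with the label of the
    previous contribution, so a classifier can learn patterns such
    as warrants tending to follow claims."""
    featured = []
    prev_label = "<start>"
    for text, label in segments:
        feats = {"text": text, "prev_label": prev_label}
        featured.append((feats, label))
        # At training time the gold label is available; at test time
        # the model's prediction for the previous segment is used instead
        prev_label = label
    return featured

dialogue = [
    ("we should use a reheat cycle", "claim"),
    ("because it raises turbine inlet quality", "warrant"),
    ("but reheat adds cost", "counter"),
]
out = add_sequence_features(dialogue)
print(out[1][0]["prev_label"])  # → claim
```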
Thread Structure Features
[Figure: three thread-structure features illustrated over segments Seg1–Seg3: Thread Depth, Best Parent, Semantic Similarity]
Sequence Oriented Features • Notes whether text is within a certain proximity to quoted material
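A feature of this kind can be sketched as a window test around each line, assuming newsgroup-style ">" quoting; the function name and example message below are hypothetical:

```python
def near_quote(lines, index, window=2):
    """True if the line at `index` falls within `window` lines of
    quoted material (here, lines beginning with '>')."""
    lo = max(0, index - window)
    hi = min(len(lines), index + window + 1)
    return any(line.lstrip().startswith(">") for line in lines[lo:hi])

message = [
    "> what values do you have for the reheat cycle?",
    "I used 400 C at the turbine inlet.",
    "Also, the condenser pressure seemed low.",
    "Separately, when is the report due?",
]
print(near_quote(message, 1))  # → True
print(near_quote(message, 3))  # → False
```

Text near quoted material often responds to it, so this binary feature gives the classifier a cheap proxy for conversational context.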
What did we learn? • Intuition confirmed • Different dimensions responded differently to context-based enhancements • The feature-based approach was more effective • Thread structure features were especially informative for the Social Modes dimension • Thread structure information is more difficult to extract from chat data • The best results of a similar approach on chat data reached a Kappa of only .45
Special Thanks To: William Cohen Pinar Donmez Jaime Arguello Gahgene Gweon Rohit Kumar Yue Cui Mahesh Joshi Yi-Chia Wang Hao-Chuan Wang Emil Albright Cammie Williams Questions?