Carolyn Penstein Rosé Language Technologies Institute Human-Computer Interaction Institute School of Computer Science

LightSIDE Carolyn Penstein Rosé Language Technologies Institute Human-Computer Interaction Institute School of Computer Science With funding from the National Science Foundation and the Office of Naval Research

lightsidelabs.com/research/

Click here to load a file

Select Heteroglossia as the predicted category

Make sure the text field is selected to extract text features from

Feature Space Customizations
• Punctuation can be a “stand in” for mood
  • “you think the answer is 9?”
  • “you think the answer is 9.”
• Bigrams capture simple lexical patterns
  • “common denominator” versus “common multiple”
• Trigrams (just like bigrams, but with 3 words next to each other)
  • Carnegie Mellon University
• POS bigrams capture syntactic or stylistic information
  • “the answer which is …” vs “which is the answer”
• Line length can be a proxy for explanation depth
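The surface features on this slide can be sketched in a few lines of Python. This is a minimal illustration, not LightSIDE's own extractor: the tokenizer, the feature-name format, and the function names `tokenize`, `ngrams`, and `extract_features` are all assumptions made for the example. POS bigrams are left out because they need a part-of-speech tagger.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of word characters, or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return TOKEN_RE.findall(text.lower())

def ngrams(tokens, n):
    """Return all runs of n adjacent tokens, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(text):
    """Count unigrams, bigrams, and trigrams, and record line length."""
    tokens = tokenize(text)
    feats = Counter()
    for n in (1, 2, 3):
        for gram in ngrams(tokens, n):
            feats[f"{n}gram={gram}"] += 1
    # Line length in tokens (punctuation marks count as tokens here).
    feats["line_length"] = len(tokens)
    return feats
```

With this sketch, “you think the answer is 9?” and “you think the answer is 9.” differ only in their punctuation features, which is what lets punctuation act as a stand-in for mood.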

Feature Space Customizations
• “Contains non-stop word” can be a predictor of whether a conversational contribution is contentful
  • “ok sure” versus “the common denominator”
• “Remove stop words” removes some distracting features
• Stemming allows some generalization
  • Multiple, multiply, multiplication
• Removing rare features is a cheap form of feature selection
  • Features that only occur once or twice in the corpus won’t generalize, so they are a waste of time to include in the vector space
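A rough Python sketch of three of these customizations follows. The stop list is a tiny made-up one, and the names `contains_non_stop`, `remove_stop_words`, and `prune_rare` are hypothetical, not LightSIDE options. The threshold of 3 reflects the slide's "once or twice" wording. Stemming is omitted because it needs a real stemmer.

```python
from collections import Counter

# Tiny illustrative stop list (an assumption); a real list is much longer.
STOP_WORDS = {"ok", "sure", "the", "a", "an", "is", "of", "to", "and", "yes", "no"}

def contains_non_stop(tokens):
    """True if at least one token is not a stop word."""
    return any(t not in STOP_WORDS for t in tokens)

def remove_stop_words(tokens):
    """Drop stop words, keeping the remaining tokens in order."""
    return [t for t in tokens if t not in STOP_WORDS]

def prune_rare(docs, min_count=3):
    """Drop tokens that occur fewer than min_count times in the whole corpus."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    keep = {t for t, c in counts.items() if c >= min_count}
    return [[t for t in tokens if t in keep] for tokens in docs]
```

For example, `contains_non_stop(["ok", "sure"])` is false while `contains_non_stop(["the", "common", "denominator"])` is true, matching the contrast on the slide.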

Feature Space Customizations
• Think like a computer!
  • Machine learning algorithms look for features that are good predictors, not features that are necessarily meaningful
• Look for approximations
  • If you want to find questions, you don’t need to do a complete syntactic analysis
  • Look for question marks
  • Look for wh-terms that occur immediately before an auxiliary verb
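The question-finding approximation can be sketched as follows. The word lists are short illustrative assumptions, and `looks_like_question` is a hypothetical helper name.

```python
import re

# Illustrative, incomplete word lists (assumptions for this sketch).
WH_TERMS = {"who", "what", "when", "where", "why", "which", "how"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "should", "have", "has", "had"}

def looks_like_question(text):
    """Cheap question detector: a question mark, or a wh-term right before an auxiliary."""
    if "?" in text:
        return True
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(a in WH_TERMS and b in AUXILIARIES
               for a, b in zip(tokens, tokens[1:]))
```

Because this is only an approximation, it also fires on relative clauses such as “the answer which is …”. The learner decides whether the feature is a good enough predictor.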

Click to extract text features

Select Logistic Regression as the Learner

Evaluate result by cross validation over sessions
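The idea behind cross validation over sessions is that all contributions from one session land in the same fold, so the model is never tested on a session it was trained on. The sketch below is an assumption about the general technique, not a description of how LightSIDE builds its folds. It holds out one session per fold, and `session_folds` is a hypothetical name.

```python
def session_folds(session_ids):
    """Yield (held_out_session, train_indices, test_indices), one fold per session."""
    for held_out in sorted(set(session_ids)):
        train = [i for i, s in enumerate(session_ids) if s != held_out]
        test = [i for i, s in enumerate(session_ids) if s == held_out]
        yield held_out, train, test
```

With session labels `["s1", "s1", "s2", "s3", "s3"]` this produces three folds, and the first fold tests on instances 0 and 1 while training on the rest.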

Run the experiment

Stretchy Patterns (Gianfortoni, Adamson, & Rosé, 2011)
• A sequence of 1 to 6 categories
• May include GAPs
  • Can cover any symbol
  • GAP+ may cover any number of symbols
• Must not begin or end with a GAP
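A small matcher makes the pattern rules concrete. This is a sketch under stated assumptions, not the authors' implementation: here GAP covers exactly one symbol, GAP+ covers one or more symbols, and the 1-to-6 length limit counts every pattern element including gaps. The slide does not settle any of these details, and the function names are hypothetical.

```python
GAP, GAP_PLUS = "GAP", "GAP+"

def is_valid_pattern(pattern):
    """Check length (assumed to count gaps) and that no gap sits at either end."""
    if not 1 <= len(pattern) <= 6:
        return False
    return pattern[0] not in (GAP, GAP_PLUS) and pattern[-1] not in (GAP, GAP_PLUS)

def _match_from(pattern, symbols, p, s):
    """True if pattern[p:] matches symbols starting exactly at position s."""
    if p == len(pattern):
        return True
    if s == len(symbols):
        return False
    element = pattern[p]
    if element == GAP:
        # Assumption: GAP skips exactly one symbol.
        return _match_from(pattern, symbols, p + 1, s + 1)
    if element == GAP_PLUS:
        # Assumption: GAP+ skips one or more symbols; try every possible end.
        return any(_match_from(pattern, symbols, p + 1, end)
                   for end in range(s + 1, len(symbols) + 1))
    return symbols[s] == element and _match_from(pattern, symbols, p + 1, s + 1)

def pattern_occurs(pattern, symbols):
    """True if the pattern matches anywhere in the symbol sequence."""
    if not is_valid_pattern(pattern):
        raise ValueError("pattern must have 1 to 6 elements and no leading or trailing gap")
    return any(_match_from(pattern, symbols, 0, start)
               for start in range(len(symbols)))
```

Against the sequence `["the", "answer", "is", "nine"]`, the pattern `["the", GAP, "is"]` matches, while `["the", GAP, "nine"]` needs GAP+ in place of GAP to stretch over two symbols.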

Now it’s your turn! We’ll explore some advanced features and error analysis after the break!

Error Analysis Process
• Identify large error cells
• Make comparisons
  • Ask yourself how the misclassified instances are similar to the instances that were correctly classified with the same class (vertical comparison)
  • Ask how they differ from the instances of the class they should have been classified as (horizontal comparison)
(Figure labels: Positive, Negative)
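The process can be sketched over plain lists of actual and predicted labels. The code assumes a confusion matrix with actual classes as rows and predicted classes as columns, so the vertical comparison stays in the predicted-class column and the horizontal comparison stays in the actual-class row. The function names are hypothetical.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count instances per (actual, predicted) cell."""
    return Counter(zip(actual, predicted))

def largest_error_cell(actual, predicted):
    """Return ((actual, predicted), count) for the biggest off-diagonal cell, or None."""
    errors = {cell: n for cell, n in confusion_counts(actual, predicted).items()
              if cell[0] != cell[1]}
    if not errors:
        return None
    return max(errors.items(), key=lambda item: item[1])

def comparison_sets(actual, predicted, cell):
    """Instance indices for the error cell and its two comparison cells."""
    true_label, guessed_label = cell
    pairs = list(zip(actual, predicted))
    return {
        "errors": [i for i, p in enumerate(pairs) if p == (true_label, guessed_label)],
        # Vertical: correctly classified instances of the predicted class.
        "vertical": [i for i, p in enumerate(pairs) if p == (guessed_label, guessed_label)],
        # Horizontal: correctly classified instances of the actual class.
        "horizontal": [i for i, p in enumerate(pairs) if p == (true_label, true_label)],
    }
```

Reading the instances in `errors` next to those in `vertical` and `horizontal` is the manual step where you look for features the two groups share or lack.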

Error Analysis on Development Set

What’s different?
• Positive: is interesting, an interesting scene
• Negative: would have been more interesting, potentially interesting, etc.

* Note that in this case we get no benefit if we use feature selection over the original feature space.

Feature Splitting (Daumé III, 2007)
(Figure labels: General, General, Domain A, Domain B)
Why is this nonlinear? It represents the interaction between each feature and the Domain variable.
Now that the feature space represents the nonlinearity, the algorithm to train the weights can be linear.
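Feature splitting can be illustrated with dictionaries of feature values: each feature is copied into a general version and a version tagged with the instance's domain. The key format, the function names, and the weights below are made up for illustration and are not trained values.

```python
def split_features(features, domain):
    """Copy every feature into a general version and a domain-specific version."""
    augmented = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value
        augmented[f"{domain}:{name}"] = value
    return augmented

def linear_score(weights, features):
    """Plain linear model: weighted sum of feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical weights: "interesting" leans positive overall,
# more so in domain A, and negative in domain B.
WEIGHTS = {"general:interesting": 0.5, "A:interesting": 1.0, "B:interesting": -1.0}

score_a = linear_score(WEIGHTS, split_features({"interesting": 1}, "A"))
score_b = linear_score(WEIGHTS, split_features({"interesting": 1}, "B"))
```

Here the same input feature scores 1.5 in domain A and -0.5 in domain B even though the model is linear, because the interaction with the domain has been moved into the feature space.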

Healthcare Bill Dataset

