Explore the use of computational models in analyzing discourse, including likelihood probabilities and speaker assignments. Understand the collaboration between Language Technologies and HCI students.
Computational Models of Discourse Analysis Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute
BTW: Excellent posts! As usual, you’re doing MUCH better than you seem to think!! Warm-Up • Can you describe the takeaways from these graphs and tables? CD = the likelihood of state X is conditioned on state Y: 2^k(2^k - 1) transition probabilities. CI = the likelihood of person X’s behavior is conditioned on state Y: k(2^k) transition probabilities. MI = the likelihood of person X’s behavior depends only on that person’s behavior at the previous time point: 2k transition probabilities. Models with subscript k are person specific. Models with subscript any are general across people.
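To get a feel for how quickly the three models grow, here is a minimal sketch that counts their transition probabilities. It assumes k participants who are each either speaking or silent at a time point (so 2^k joint states), and it reads the counts on the slide as 2^k(2^k - 1), k(2^k), and 2k. The function name `transition_counts` is made up for this illustration and is not from the paper.

```python
def transition_counts(k):
    """Count transition probabilities for k participants (assumed: each is speaking or silent)."""
    n_states = 2 ** k  # every on/off combination of the k participants
    return {
        # CD: next joint state conditioned on current joint state;
        # each of the n_states rows sums to 1, leaving n_states - 1 free values
        "CD": n_states * (n_states - 1),
        # CI: each person's next behavior conditioned on the current joint state
        "CI": k * n_states,
        # MI: each person's next behavior conditioned only on their own previous behavior
        "MI": 2 * k,
    }


for k in (2, 3, 4, 5):
    print(k, transition_counts(k))
```

The CD count roughly quadruples with each added participant, while the MI count grows by two, which is one way to see why the amount of available data matters when comparing these models.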
Summary of Results In Laskowski's model, how does the system know who out of K participants is speaking next? How does it assign the speaker? From what I could tell it only knows someone else is most likely speaking.
Elijah’s continuation of SIDE discussion and discussion about Assignment 2….
What does it take for Language Technologies students and HCI students to be able to collaborate? • Pace yourself. Keep your expectations realistic. • Find your place. You all have something very valuable to contribute from your own field and expertise. • Respect what others can contribute from their field. • Have the patience to listen to each other. Ask questions. Answer questions. And approach issues iteratively.
What usually happens…. * And I would add this is mostly what is happening in the fields we are drawing from….
What I would rather see… * My goal is just to start you on this path… the big question is where you’ll go from here.
Tips! • Look up words you don’t know in Wikipedia • Look for definitions embedded in the text • Don’t get hung up on formalism – look for where the author gives a conceptual feel for the argument • If you’re not a computational linguist, then a good goal to shoot for is just to be able to join in a discussion about whether the knowledge being brought to bear in a model is reasonable
Tips! • Use your knowledge about the rhetorical structure of a research paper to zen out the message in a top-down way. • Keep it high level! Don’t get bogged down… • For example, start with the conclusion - that will tell you what the evaluation is trying to show. Then look at what is being compared: there are several different models. Now go back to the intro to see what the hypotheses or questions are -- they must map onto what is being compared. Now look at the graphs and tables -- try to match them to the structure of the evaluation and the conclusions that are drawn.
Interesting Student/Levinson quote • Student: Entirely non-vocal cues like gaze and posture probably do help, when available - I'd be curious to compare the rate of turn-taking collisions between telephone and in-person conversations (especially with more than two participants). • Levinson: the same system seems to work equally well both in face-to-face interaction and in the absence of visual monitoring, as on the telephone…
What is a turn? • Syntactic units: sentences, clauses, noun phrases • Prosody: intonation tells us where we are in “the arc” of communicating an idea • Projectability: we need to be able to identify places where control over the floor could shift – doesn’t mean it will shift. • What do you think would happen if the data was not from meetings per se?
Tricky… • How were backchannels handled in the Laskowski models? I was just saying how it doesn't look like anyone has jumped on the use of "talk spurts" as opposed to something more linguistically relevant. I would argue that the use of a heuristic like this encourages Laskowski to not look at the data for linguistic elements if the data was not split correctly in the first place.
What else comes into play? • Devices for selecting a next speaker: • Questions directed at a person • Address terms • Gaze • Gesture
How would Levinson evaluate the models in today’s paper? (p299 of Levinson) • A good model should • Predict where we find overlaps? • Predict which overlaps seem rude? • Predict where we find pauses? • So what would he say? • Does the perplexity measure answer any of these questions?
Nice perspective!! • For question 2, there are a number of ways that we judge when we should talk. One way is using physical cues. In a multi-party conversation, the speaker turning to you is a good indication that you should be the next person to speak. Similarly, another indicator could be when someone gestures towards you. • One thing missing from Laskowski is a notion that it’s someone’s turn.
Good question! • Beka's musings on chatiquette suggest both one-on-one and group chats as an interesting environment for further experiments - with only the written text and the one nonverbal hint that another speaker is preparing to take the floor (plus explicit naming of participants by @username), how would some of these turn-taking models fare? Are the turn-taking patterns of group-chat participants comparable to speakers in a meeting? How can interruptions be discussed when the utterances are delivered in non-overlapping (but inter-weaving) chunks?
Another good point! • One thing I found odd about Laskowski's approach is that it's rare to have more than one speaker going at once (less than 5% of the time, according to Levinson). How is it that a representation of who's talking is useful, especially if precisely who doesn't matter? Won't every erstwhile non-speaker be equally likely to take their turn next? • How could you tweak his model to do this?
Student comment Exactly right to question this!! • For question 1, I wanted to first outline what it meant for something to be locally managed. In the Levinson reading, he describes a theory suggested by Sacks, Schegloff, and Jefferson, that turns are constructed of syntactic units which a speaker can employ until the end of a unit/transitional relevance place where speakers may change. Levinson later writes that a locally managed system is indifferent to the pool of potential next speakers, and is, instead, concerned with the transitions/relationships between speakers. Consequently, speaking is somewhat dependent on what other participants say, if only to fit in with where the last speaker left off and to resolve overlaps. This fits in with the idea of adjacency pairs that he discusses in the next section. • After reading the Laskowski paper, I'm not sure how what I've read fits in with the idea of conversations being locally managed, though I am leaning towards Laskowski rejecting models that suggest that dialogue can be analyzed with turns/speakers being analyzed separately, and therefore supporting the ideas put forth by Levinson.
Interesting Student Idea! • I guess what I'm trying to say is that one feature of the interpersonal dependence should be their relative rank (I understand that this is hard because a younger sister may be the boss of an older brother in the workplace, but they are ranked higher in some cultures at home). • Note: studies show men interrupt women more than the reverse…
Food for thought… • Of course, then the problem becomes, how do we define what these roles are? When do we assign someone to one role or another? When do we allow people to shift from one role to another? What happens when multiple people are acting in the same role at the same time? All of these things make that modeling enormously complex and probably reliant on far more data than is actually available, especially annotated. …
In your experience with Machine Learning, what has a bigger effect on performance: the representation of the data or the algorithm you use? Try to think of specific examples…
Nice Idea! • I understand the computational challenges of storing and computing on exponential factors but I think they could have picked the most likely, say 10 future instances of who was talking for the vocal interaction record and add the rest on, or hash the different possibilities. • How might you rework the model? What would be the states?
What do you conclude from this: • R-specific: specific to one set of speakers • K-specific: specific to one ordering of users • See other interesting comments in the conclusion – where you’ll find a lot of what is most relevant for this class.
Tips for next time • We will look at a paper about turn taking • When perplexity is high, the model is having a harder time predicting what is next • For turn taking perplexity, we have a state representation that specifies at one time point which participants are talking and which are not • The model takes the current state into account and measures how surprised it is at the next state • If the next state is surprising given the current state, the perplexity at that time point is high
Tips for next time • If you compare models based on turn taking perplexity, the one with lower perplexity probably has more of the information needed to account for transitions between states • Differences between models: • Whose behavior is contingent on whose behavior • Which data is used to build the model, and which data is used to test
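To make the perplexity idea concrete, here is a minimal sketch, not the method from the paper. It assumes a state is a tuple with one 0/1 entry per participant (1 = talking), fits a simple first-order model with add-one smoothing (a choice made only for this example), and reports how surprised that model is by a new sequence. The names `train`, `perplexity`, and the toy data are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(states, alpha=1.0):
    """Estimate P(next state | current state) from one sequence of joint states.

    alpha is an add-alpha smoothing constant (an assumption of this sketch),
    so unseen transitions still get a small non-zero probability.
    """
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    n_states = 2 ** len(states[0])  # each participant is talking or not

    def prob(cur, nxt):
        row = counts.get(cur, Counter())
        return (row[nxt] + alpha) / (sum(row.values()) + alpha * n_states)

    return prob


def perplexity(prob, states):
    """2 to the power of the average negative log2 probability per transition."""
    pairs = list(zip(states, states[1:]))
    total_log = sum(math.log2(prob(cur, nxt)) for cur, nxt in pairs)
    return 2 ** (-total_log / len(pairs))


# Toy data with two participants: A talks for four frames, then B does.
A, B, BOTH, NONE = (1, 0), (0, 1), (1, 1), (0, 0)
training = ([A] * 4 + [B] * 4) * 10
model = train(training)

familiar = [A] * 4 + [B] * 4                      # looks like the training data
surprising = [A, BOTH, NONE, B, BOTH, NONE, A, NONE]  # overlaps and silences

familiar_pp = perplexity(model, familiar)
surprising_pp = perplexity(model, surprising)
print("familiar:", familiar_pp)
print("surprising:", surprising_pp)
```

The sequence that resembles the training data gets the lower perplexity. Swapping in a model that conditions on different information, or training and testing on different meetings, changes these numbers, which is the kind of comparison the tips above describe.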