Computational Models of Discourse Analysis

Carolyn Penstein Rosé • Language Technologies Institute / Human-Computer Interaction Institute

Warm Up Discussion • Look at the Bitter Lemons entries in the handout. What stands out to you as evidence for the Israeli versus Palestinian perspective? • Do you think the models are just picking up on writing style? • What would it mean to pick up on perspective versus writing style? Examples from the paper:

Student Comment • If we were indeed looking for patterns of language-usage, some sort of template-finding approach (as we mused over in the last sentiment paper) might be interesting - are there rhetorical techniques or sentence structures favored by one perspective over the other (or out of many)? Are there structures that indicate perspective-heavy sentences shared by both (or many) sides, or that couch/leverage the "opponents'" rhetoric within opposite-perspective texts?

Positioning the paper * Note: It’s true that it was unique at the time. Since then, there has been a lot of follow-up work modeling perspectives using word distributions like this, which shows that the community valued the work.

Student Comment • It seems like the discourse-level problem that the authors are trying to solve is that of bias. I differentiate between bias and perspective because people can have the same bias but different perspectives (a casual moviegoer's positive review vs. a weathered movie critic's positive review); people can also have different biases but the same perspective (a Democrat who believes that taxes should be increased because the money will go towards social programs they support versus a Republican who believes that taxes should be increased because constituents voted Democrat and that's what you get when you vote Democrat). • Bias is?: Positive versus negative, or reasons for being for or against something • Perspective is?: Level of expertise, or position on a bill

Student Comment • There's also a weird thing going on in the comments about saying "perspective" and really meaning "background, upbringing, culture, and innate biases of the writers." Personally, I'd be much happier about being able to model that kind of detail about a writer than I would about some one-off measure of "perspective."

Form-Function Correspondence • Range of meanings for the word “sustainability” • Imagine an environmentalist commercial • [Figure: Conversation: Global Warming; Discourses: Environmentalism, Status Quo; Socially Situated Identity: Environmentalist; Social Language: Liberal rhetoric; Figured World: expected structure of a conservationist commercial; Situated Meaning: meaning of “sustainability” in the commercial] • Where does perspective fit in this picture?


One take on Perspective in SFL • Does this suggest an answer to this student question: “are there rhetorical techniques or sentence structures favored by one perspective over the other (or out of many)? Are there structures that indicate perspective-heavy sentences shared by both (or many) sides, or that couch/leverage the "opponents'" rhetoric within opposite-perspective texts?”

Perspective from Rhetoric • Implied author: Communication style is a projection of identity • Impression management, not necessarily the ground truth • Implied reader: What we assume about who is listening • Real assumptions, possibly incorrect • What we want recipients or overhearers to think are our assumptions • Reader: may or may not understand the text the way it was intended • [Figure: Author, Implied Author, Implied Reader, Text, Effect, Reader]

A good example of perspective…

3 Views on Perspective • Unit 3 Connection: perspective is kind of like sentiment • Unit 4 Connection: perspective is kind of like Personality and Identity • Presentation of self models • We’ll look at the Blog corpus • Unit 5 Connection: perspective is kind of like positioning • Get back to Appraisal: Engagement metafunction • We’ll come back to the Bitter Lemons Corpus Would you prefer to swap Units 4 and 5 so we do Bitter Lemons next?

Revisiting Tips for Monday’s Reading Assignment • Skip Section 4 and the Appendix the first time you read the paper • Then skim through Section 4, skipping over any sentences you don’t understand • Focus on the initial paragraphs in sections and subsections, as these tend to give a high-level idea of what the message is • Keep in mind that their Latent Sentence Perspective Model is just Naïve Bayes with one twist: can you find what that one twist is?
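For reference while reading, here is a minimal sketch of the plain multinomial Naïve Bayes baseline that the twist modifies, with add-alpha smoothing. The function names, the toy labels P1/P2, and the tiny training set are illustrative assumptions, not the paper's data or code.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes over perspective labels.

    docs: list of token lists; labels: one perspective label per document.
    """
    vocab = {w for doc in docs for w in doc}
    n_docs = Counter(labels)
    word_counts = {y: Counter() for y in n_docs}
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    log_prior = {y: math.log(n / len(labels)) for y, n in n_docs.items()}
    log_lik = {}
    for y, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        # Add-alpha smoothing so unseen (word, label) pairs do not get probability zero.
        log_lik[y] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik

def classify_nb(doc, log_prior, log_lik):
    # Each known word adds its own log-likelihood, regardless of which
    # sentence it appears in; words never seen in training are skipped.
    scores = {
        y: log_prior[y] + sum(log_lik[y][w] for w in doc if w in log_lik[y])
        for y in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy data with two perspectives, P1 and P2.
docs = [["growth", "jobs", "tax"], ["jobs", "growth"],
        ["programs", "services", "tax"], ["services", "programs"]]
labels = ["P1", "P1", "P2", "P2"]
prior, lik = train_nb(docs, labels)
print(classify_nb(["growth", "jobs"], prior, lik))  # P1
```

Note that every occurrence of a word counts the same here, no matter how polarized the rest of its sentence is.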

Statistical Model • [Figure: model diagram with labels “The document”, “Perspective”, and “Strength of Bias”] • In the original (Naïve Bayes) model, each word contributes to the likelihood depending on its own strength. • In the revised model, polarized words within sentences that are, on the whole, less polarizing count less than the same words in sentences that are, on the whole, more polarizing. • Or: increase certainty by deemphasizing sentences that appear to be leaning towards the minority view. • Will this work? It would work if the kinds of things that you mention but don’t take responsibility for are consistent within perspectives.
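The reweighting idea can be sketched as a two-component mixture at the sentence level: each sentence is either perspective-bearing (its words come from the document perspective's word distribution) or not (its words come from a shared background distribution pooled over all perspectives). This is a simplified illustration in the spirit of the description above, not the paper's LSPM or its inference procedure; the mixing weight `tau` is fixed by hand here rather than learned (an assumption), and all names and toy data are hypothetical.

```python
import math
from collections import Counter

def word_logprobs(token_lists, vocab, alpha=1.0):
    # Smoothed unigram log-probabilities estimated from a set of documents.
    counts = Counter(w for toks in token_lists for w in toks)
    denom = sum(counts.values()) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / denom) for w in vocab}

def train(docs, labels, alpha=1.0):
    # docs: list of documents, each a list of sentences (token lists).
    flat = [[w for sent in doc for w in sent] for doc in docs]
    vocab = {w for toks in flat for w in toks}
    theta = {y: word_logprobs([t for t, l in zip(flat, labels) if l == y], vocab, alpha)
             for y in set(labels)}
    background = word_logprobs(flat, vocab, alpha)  # shared across perspectives
    log_prior = {y: math.log(labels.count(y) / len(labels)) for y in theta}
    return vocab, log_prior, theta, background

def components(sent, y, model, tau):
    # Log joint probability of the sentence under each branch of the mixture.
    vocab, _, theta, background = model
    words = [w for w in sent if w in vocab]
    persp = math.log(tau) + sum(theta[y][w] for w in words)
    neutral = math.log(1 - tau) + sum(background[w] for w in words)
    return persp, neutral

def logaddexp(a, b):
    # Numerically stable log(exp(a) + exp(b)).
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def sentence_weight(sent, y, model, tau=0.5):
    # Posterior probability that the sentence is perspective-bearing under label y.
    persp, neutral = components(sent, y, model, tau)
    return math.exp(persp - logaddexp(persp, neutral))

def classify(doc, model, tau=0.5):
    # Document score = log prior + sum over sentences of log(mixture likelihood).
    _, log_prior, _, _ = model
    scores = {y: log_prior[y] + sum(logaddexp(*components(s, y, model, tau)) for s in doc)
              for y in log_prior}
    return max(scores, key=scores.get), scores

docs = [[["growth", "jobs"], ["the", "budget"]],
        [["jobs", "growth"], ["the", "vote"]],
        [["programs", "services"], ["the", "budget"]],
        [["services", "programs"], ["the", "vote"]]]
labels = ["P1", "P1", "P2", "P2"]
model = train(docs, labels)
label, _ = classify([["growth", "jobs"], ["the", "budget"]], model)
print(label)  # P1
# The polarized sentence gets more weight than the near-neutral one.
print(sentence_weight(["growth", "jobs"], "P1", model) >
      sentence_weight(["the", "budget"], "P1", model))  # True
```

Because the background branch can absorb a sentence's words, a sentence whose vocabulary fits the shared distribution contributes little perspective-specific evidence, which is the "count less" behavior described on the slide.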

Student Comment • Their model did show a slight improvement over the other models at document classification, probably because treating sentence evidence as a latent variable is almost like smoothing, in a sense.

Evaluation • Note that both the words themselves and their ranking will influence the model. * Bigger difference on experts (they are always the same two people, who might be more consistent about what they mention as ancillary details).

Student Comment • It really does not seem to me that "the small but positive improvement due to sentence-level modeling in LSPM is encouraging." Their specialized model is only very slightly better than Naive Bayes.

Student Comment • ...and does anyone in this community ever run statistics to see if their accuracy is actually statistically significantly different from other models' accuracy?
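One common way to check this for two classifiers scored on the same test documents is McNemar's test, which looks only at the documents where exactly one of the two models is correct. Below is a sketch of the exact (binomial) version; the choice of test and the function name are assumptions here, not something reported in the paper.

```python
from math import comb

def mcnemar_exact(gold, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on the same items.

    Returns (b, c, p_value): b counts items only A gets right,
    c counts items only B gets right.
    """
    b = sum(1 for g, pa, pb in zip(gold, pred_a, pred_b) if pa == g and pb != g)
    c = sum(1 for g, pa, pb in zip(gold, pred_a, pred_b) if pa != g and pb == g)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never differ in correctness
    # Under H0, each discordant item is equally likely to favor A or B: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical example: 20 test documents; model A is right on all of them,
# model B gets the first 3 wrong.
gold = ["P1"] * 20
pred_a = ["P1"] * 20
pred_b = ["P2"] * 3 + ["P1"] * 17
print(mcnemar_exact(gold, pred_a, pred_b))  # (3, 0, 0.25)
```

In this toy case a 15-point accuracy gap on only 20 documents still gives p = 0.25, which is why small reported gains are hard to interpret without a test.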

Questions?
