Computational Models of Discourse Analysis Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute
Warm-Up Discussion • What is the distinction between personality, identity, and perspective? • Does the distinction matter computationally? • How do they relate to one another as lenses for understanding social media data? • What do we take from today’s readings for assignment 4? • (Diagram labels: Identity, Personality, Perspective)
Student Comment • At first the paper did not seem related to our task of identifying gender but perhaps this paper shows that the way we see ourselves is extremely consistent. No matter how you ask the question a subject will always give you an honest answer as to how they see themselves. This could mean that no matter how hard we try we will sooner or later embed signals into our blog posts that indicate our perceived gender.
Student Comment • It seems that the importance of "spiritual self" in presentation is the most important takeaway from this paper. 96% of users attempt to describe themselves with aspects of their "spiritual self" (i.e., perceived abilities). So focusing on these instead of the material or the social might be better (although, it's possible that a particular gender uses one of these sub-types significantly more than another, which could also be handy, but we don't have that information). • Is this personality or identity? How would you expect it to relate to other online behavior?
Semester in Review • Unit 1: Theoretical Foundation • Unit 2: Linguistic Structure • Unit 3: Sentiment • Unit 4: Identity and Personality • Unit 5: Social Positioning • In each Unit: • Readings from Discourse Analysis and Sociolinguistics • Readings from Language Technologies • Hands-on assignment • Implementation and corpus based experiment • Competitive error analysis • Student Presentations
Building Tasks • According to Gee’s theory, whenever we speak or write, we are constructing 7 areas of reality • What we build: Significance, Practices, Identities, Relationships, Politics, Connections, Sign systems and knowledge • How we build them: Social languages, Socially situated identities, Discourses, Conversations, Figured worlds, intertextuality
What we Build • Significance: things and people made more or less significant through the text • Practices: ritualized activities and how they are being enacted through the text (for example, lecturing or mentoring) • Identities: manner in which things and people are being cast in a role through the text • Relationships: style of social relationship, like level of formality • Politics: how “social goods” are being distributed, who is responsible for the flow, where it is going • Connections: connections and disconnections between things and people, e.g., what ideas are related, how are things causally connected, what is affecting what? • Sign Systems and Knowledge: languages, social languages, and ways of knowing; what ways of communicating and knowing are treated as standard and acceptable in the context, e.g., that you’re expected to speak in English in class
Imagine an environmentalist commercial • Form-Function Correspondence: Range of meanings for the word “sustainability” • Conversation: Global Warming • Discourse: Environmentalism • Discourse: Status Quo • Socially Situated Identity: Environmentalist • Social Language: Liberal rhetoric • Figured World: Expected structure of Conservationist Commercial • Situated Meaning: Meaning of “sustainability” in the commercial
Computationalizing Gee? • Challenge: not variationist • Form-function correspondences can be modeled naturally through rules • Cells of table like feature extractors? • Social Languages like topic models? • Figured worlds related to “social causality”
Computationalizing SFL? • See Elijah’s ACL paper! • We had to REALLY simplify to get there • Not clear how to do that for Heteroglossia yet
Computational Techniques • Textual entailment, similarity measures, paraphrase, constraint relaxation • Topic models • Machine Learning • Techniques: bootstrapping, HMMs, other statistical modeling techniques • Basic features: unigrams, bigrams, POS bigrams, acoustic and prosodic features (speech) • Created features: dictionaries, templates, syntactic dependency relations
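To make the "basic" versus "created" feature distinction concrete, here is a minimal sketch in plain Python. The tokenizer is deliberately crude, and the `FIRST_PERSON` word list and the example sentence are hypothetical illustrations, not taken from the course materials.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count contiguous n-grams, each represented as a tuple of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def dictionary_count(tokens, lexicon):
    """A 'created' feature: number of tokens found in a hand-built word list."""
    return sum(1 for tok in tokens if tok in lexicon)


# Hypothetical lexicon, for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

tokens = tokenize("I think my design is better, but I am not sure.")
unigrams = ngram_counts(tokens, 1)   # basic feature
bigrams = ngram_counts(tokens, 2)    # basic feature
first_person = dictionary_count(tokens, FIRST_PERSON)  # created feature
```

POS bigrams would follow the same pattern as `ngram_counts`, applied to a sequence of part-of-speech tags from a tagger instead of the word tokens.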
Basic Aspects of Discourse Structure are Easiest to Model • Turn taking • Topic segments • Speech acts (at least direct ones) • More recent computational work focuses on more challenging “discoursey” problems like sentiment and stance • Some recent work on metaphors (related to frames), but not applied to discourse level problems
Problems • Labels in public datasets don’t necessarily match the theory • Computational approaches embody variationist assumptions, but much of the theory is grounded in a more contextualized view of meaning making • Lack of a fully satisfying operationalization of style (style is hard to separate from content) • Grammatical metaphor and other indirect strategies • Same effect can be achieved in so many ways – each technique only captures one slice – so you’re always just grasping a glimpse of what’s there • Overfitting spurious correlations • “subpopulations” leading to problems with generalization • Similar variation arising due to numerous different factors (gender, age, SES) • Features at too low level – words serving multiple purposes simultaneously
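One common way to probe the "subpopulations" and spurious-correlation problems is to hold out an entire group (for example, all posts by one author) when testing, so that a model cannot score well just by memorizing group-specific cues. The sketch below is an assumption about how one might set this up, not a method described in the lecture; the function name and the author labels are hypothetical.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), one split per group.

    `groups` holds one group label per instance, e.g. the author of each post.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            i for label, idxs in by_group.items() if label != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical author labels for five blog posts.
authors = ["ann", "bob", "ann", "cat", "bob"]
splits = list(leave_one_group_out(authors))
```

A large gap between accuracy under random splits and accuracy under group-wise splits like these suggests the features are picking up on who wrote the text, not on the category of interest.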
How would you expect an Engagement style analysis to relate to personality? • What effect would you expect to see on conversations? • Are these necessarily connected?
Freshman Engineering Study • 131 freshman engineering students worked in groups of 3 or 4 to design a better wrench • Applying principles related to stress and leverage • Procedure • Tutorial on computer-aided engineering • Pretest • Collaborative design activity • Posttest • Questionnaire
Tutor Agent Design • Architecture diagram components: ConcertChat Server, ConcertChatActor, ConcertChatListener, MessageFilter, PresenceFilter, DiscourseMemory, AnnotationFilter, OutputCoordinator, SocialController, ActivityDetector, ProgressDetector, PlanExecutor, RequestDetector, T.TakingCoordinator, IntroductionsManager, PromptingManager, TutoringManager, TutoringActor, IntroductionsActor, PromptingActor • Kumar, R. & Rosé, C. P. (2011). Architecture for building Conversational Agents that support Collaborative Learning, IEEE Transactions on Learning Technologies special issue on Intelligent and Innovative Support Systems for Computer Supported Collaborative Learning
Results on Breadth of Coverage of Design Space • Significant main effect of Heteroglossia on number of ideas mentioned • Heteroglossia was better than Monoglossia and Neutral • Significant interaction • In the Social condition, Monoglossia was worse than the other two
Results on Perception • Students were significantly happier with the interaction in the Heteroglossia condition than Neutral, with Monoglossia in the middle • Students liked the Heteroglossic and Monoglossic agents better than the Neutral agent • Students in the Heteroglossia condition felt marginally more successful than students in the Monoglossia condition • No effect on Personality indicators such as Pushy, Wishy Washy, etc. • Does that mean that impression of personality and how you feel about an interaction with someone are not linked?
Student Comment • I would also note that English is a very gender neutral language, so gender performativity is harder to classify.
Engagement • Already established: Positioning a proposition • But can it also be primarily positioning between people? • Patterns of positioning propositions as having the same or different alignment between speaker and hearer could do this • Is positioning in communication always positioning by means of propositional content?
Connection between Heteroglossia and Attitude • But is this really different from a disclaim? • And is this really different from a proclaim?
Hedging and Occupation? • And as such, I believe hedging is a much more effective tool in showing generational or occupational differences rather than gender differences. • For example, teenagers often use verbs such as 'like' and 'all' to report speech: he was all 'that's stupid' and then he was like 'but I'm stupid too'. The occupational differences I would attribute to the differences between people who need exact values as opposed to people who can accept generalizations or approximations.
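One simple way to test the student's hunch would be to compare hedge rates across groups of writers. The sketch below counts matches against a small hedge lexicon and normalizes by text length. The `HEDGES` list and both example sentences are hypothetical, and plain string matching will overcount ambiguous cues such as "about" used as a preposition.

```python
import re

# Hypothetical hedge cues, for illustration only; a real study would need a
# validated lexicon and some handling of ambiguous words.
HEDGES = ("sort of", "kind of", "maybe", "perhaps", "probably",
          "about", "roughly", "i think", "i guess")


def hedge_rate(text):
    """Return the number of hedge-cue matches per 100 word tokens."""
    lowered = text.lower()
    n_tokens = len(re.findall(r"[a-z']+", lowered))
    if n_tokens == 0:
        return 0.0
    hits = 0
    for cue in HEDGES:
        # Word boundaries keep "about" from matching inside longer words.
        hits += len(re.findall(r"\b" + re.escape(cue) + r"\b", lowered))
    return 100.0 * hits / n_tokens


exact = hedge_rate("The beam is 2.54 cm thick.")
approximate = hedge_rate("I guess it is maybe about an inch thick.")
```

Averaging `hedge_rate` over the texts written by each occupational or age group would give a first, rough comparison of the kind the comment proposes.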