Modeling Latent Biographic Attributes in Conversational Genres

Modeling Latent Biographic Attributes in Conversational Genres NikeshGarera David Yarowsky

Introduction • Latent classification of biographic attribute such as gender, age and native language. • Modeling and classifying biographic attributes based on lexical and discourse factors. • A speaker’s lexical choice and discourse style may differ substantially depending on the gender/age of the speaker’s interlocutor.

Corpus Detail • Fisher telephone conversation corpus • Standard Switchboard conversational corpus • Both are transcript and annotated.

Modeling Gender via Ngram Features • Using unigram and bigram features with tf-idf weighting. • Stop words were retained in the feature set. • Only Ngrams with frequency greater than 5 were retained. • SVM model. • 90.84% on Fisher corpora for gender classification • 90.22% on Switchboard corpora for gender classification

Effect of Partner’s Gender • Modeling of speaker gender/age based on the prior and joint modeling of the partner speaker’s gender/age • People tend to use stronger gender-specific, age-specific or dialect-specific word/phrase usage and discourse properties when speaking with someone of a similar gender/age/dialect than when speaking with someone of a different gender/age/dialect, when they may adapt a more neutral speaking style

Effect of Partner’s Gender

Effect of Partner’s Gender • Oracle Experiment • Assume we know whether the conversation is homogeneous (same gender) or heterogeneous (different gender). • Classify both the test conversation side and the partner side, and if the classifier is more confident about the partner side then we choose the gender of the test conversation side based on the heterogeneous/homogeneous information. • Test vs. Partner • If Confidence(T) > Confidence(P) • If Confidence(T) < Confidence(P) • If Hetero, Class(T) = !Class(P) • If Hemo, Class(T) = Class(P) • Overall accuracy improves to 96.46% on Fisher corpus from 90.84%

Effect of Partner’s Gender • Replacing Oracle by a Hemo vs. Hetero Classify • Classifying the conversation as mixed or single-gender • Low accuracy 68.35% on Fisher Corpus as male-male and female-female conversations are grouped into one class. • Create two different classifiers • Male-male vs. Rest • Female-female vs. Rest

Effect of Partner’s Gender • Modeling partner via conditional model and whole-conversation model • The following classifiers were trained and each of their scores was used as a feature in a meta SVM classifier: • Male-male vs. Rest • Female-female vs. Rest • Conditional model of gender given most likely partner's gender • Ngram model

Effect of Partner’s Gender • Conditional model of gender given most likely partner's gender • Two separate classifiers were trained for classifying the gender of a given conversation side, one where the partner is male and other where the partner is female. • Given a test conversation side, we first choose the most likely gender of the partner’s conversation side using the Ngram based model. • Choose the gender of the test conversation side using the appropriate conditional model.

Effect of Partner’s Gender • Incorporating Sociolinguistic Features • 1. % of conversation spoken: • 2. Speaker rate: • 3. %of pronoun usage: • 4. % of back-channel responses such as "(laughter)" and "(lipsmacks)". • 5. % of passive usage: • 6. % of short utterances • 7. % of modal auxiliaries, subordinate clauses. • 8. % of “mm” tokens such as "mhm", "um", "uh-huh", "uh", "hm", "hmm",etc. • 9. Type-token ratio • 10. Mean inter-utterance time: • 11. % of “yeah” occurences. • 12. % of WH-question words. • 13. % Mean word and utterance length.

Gender Classification Results

Reference NikeshGarera. Multilingual Acquisition of Structured Information via Novel Relationship Extraction Models over Diverse Knowledge Sources. Ph.D. Thesis, Johns Hopkins University, Baltimore, Maryland, September 2009.

Modeling Latent Biographic Attributes in Conversational Genres

Modeling Latent Biographic Attributes in Conversational Genres

Presentation Transcript

Genres in Literature

Modeling Acculturation Using Latent Class Analysis

Genres in Fiction

GENRES

Genres in Nonfiction

Grounding in Conversational Systems

Genres in Catcher

Genres

Genres

Life stories the biographic method

Problems of Modeling Phone Deletion in Conversational Speech for Speech Recognition

GENRES

Latent Growth Modeling

Bayesian dynamic modeling of latent trait distributions

New Directions in Latent Growth Modeling

Genres

Structural Equation Modeling (SEM) With Latent Variables

Grounding in Conversational Systems

Latent Variable Modeling of Cognitive Reserve

Latent Growth Modeling Using Mplus