A Two-Dimensional Topic-Aspect Model for Discovering Multi-Faceted Topics MICHAEL PAUL and ROXANA GIRJU University of Illinois at Urbana-Champaign
Probabilistic Topic Models • Each word token associated with a hidden “topic” variable • Probabilistic approach to dimensionality reduction • Useful for uncovering latent structures in text • Basic formulation: • P(w|d) = Σz P(w|z) P(z|d), summing over topics z
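A minimal NumPy sketch of this mixture formulation, using a toy two-topic model over a four-word vocabulary (all numbers are illustrative, not taken from the paper):

```python
import numpy as np

# Toy topic-word distributions P(w|z): 2 topics over 4 words (illustrative values)
phi = np.array([
    [0.5, 0.3, 0.1, 0.1],   # topic 0
    [0.1, 0.1, 0.4, 0.4],   # topic 1
])
theta_d = np.array([0.7, 0.3])  # document's topic mixture P(z|d)

# Marginal word distribution for the document: P(w|d) = sum_z P(w|z) P(z|d)
p_w_given_d = theta_d @ phi
print(p_w_given_d)  # [0.38 0.24 0.19 0.19]
```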
Probabilistic Topic Models • “Topics” are latent distributions over words • A topic can be interpreted as a cluster of words • Topic models often cluster words by what people would consider topicality • There are often other dimensions along which words could be clustered • Sentiment/perspective/theme • What if we want to model both?
Previous Work • Topic-Sentiment Mixture Model (Mei et al., 2007) • Words come from either topic distribution or sentiment distribution • Topic+Perspective Model (Lin et al., 2008) • Words are weighted as topical vs. ideological
Previous Work • Cross-Collection LDA (Paul and Girju, 2009) • Each document belongs to a collection • Each topic has a word distribution shared across collections plus distributions unique to each collection • What if the “collection” were a hidden variable? • → Topic-Aspect Model (TAM)
Topic-Aspect Model • Each document has • a multinomial topic mixture • a multinomial aspect mixture • Words may depend on both!
Topic-Aspect Model • Topic and aspect mixtures are drawn independently of one another • This differs from hierarchical topic models where one depends on the other • Can be thought of as two separate clustering dimensions
Topic-Aspect Model • Each word token also has 2 binary variables: • the “level” (background or topical) denotes whether the word depends on the topic • the “route” (neutral or aspectual) denotes whether the word depends on the aspect • A word may depend on a topic, an aspect, both, or neither
Topic-Aspect Model • Example word assignments under the “Computational” and “Linguistic” aspects (slide figures omitted) • A word may depend on a topic, an aspect, both, or neither
Topic-Aspect Model • Generative process for a document d: • Sample a topic z from P(z|d) • Sample an aspect y from P(y|d) • Sample a level l from P(l|d) • Sample a route x from P(x|l,z) • Sample a word w from either: • P(w|l=0,x=0), • P(w|z,l=1,x=0), • P(w|y,l=0,x=1), • P(w|z,y,l=1,x=1)
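A hedged sketch of this generative process for one document, in NumPy. The dimensions (K topics, A aspects, vocabulary size V) and all hyperparameter values are assumptions for illustration, and the level/route probabilities are fixed here, whereas the full model learns P(l|d) and P(x|l,z):

```python
import numpy as np

rng = np.random.default_rng(0)

K, A, V = 12, 2, 5000           # topics, aspects, vocabulary size (illustrative)
alpha, gamma = 0.1, 1.0         # Dirichlet hyperparameters (assumed values)

# Document-level mixtures
theta = rng.dirichlet([alpha] * K)     # topic mixture P(z|d)
psi   = rng.dirichlet([gamma] * A)     # aspect mixture P(y|d)

# Word distributions; drawn at random here purely for illustration
phi_bg    = rng.dirichlet([0.01] * V)                # P(w | l=0, x=0)  background
phi_topic = rng.dirichlet([0.01] * V, size=K)        # P(w | z, l=1, x=0)  topic only
phi_asp   = rng.dirichlet([0.01] * V, size=A)        # P(w | y, l=0, x=1)  aspect only
phi_ta    = rng.dirichlet([0.01] * V, size=(K, A))   # P(w | z, y, l=1, x=1)  both

def generate_word():
    z = rng.choice(K, p=theta)     # topic for this token
    y = rng.choice(A, p=psi)       # aspect for this token
    l = rng.binomial(1, 0.5)       # level (fixed 0.5 here; the model learns P(l|d))
    x = rng.binomial(1, 0.5)       # route (fixed 0.5 here; the model learns P(x|l,z))
    if l == 0 and x == 0:
        dist = phi_bg
    elif l == 1 and x == 0:
        dist = phi_topic[z]
    elif l == 0 and x == 1:
        dist = phi_asp[y]
    else:
        dist = phi_ta[z, y]
    return rng.choice(V, p=dist)   # emit a word index

doc = [generate_word() for _ in range(200)]
```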
Topic-Aspect Model • Distributions have Dirichlet/Beta priors • Built on the Latent Dirichlet Allocation framework • Numbers of aspects and topics are user-supplied parameters • Straightforward inference with Gibbs sampling
Topic-Aspect Model • Semi-supervised TAM when aspect label is known • Two options: • Fix P(y|d)=1 for the correct aspect label and 0 otherwise • Behaves like ccLDA (Paul and Girju, 2009) • Define a prior for P(y|d) to bias it toward the true label
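A small sketch of the two options, assuming A=2 aspects and a known label y_true for the document; the variable names and pseudo-count values are hypothetical:

```python
import numpy as np

A = 2          # number of aspects
y_true = 1     # known aspect label for this document (hypothetical)

# Option 1: clamp P(y|d) to the observed label (behaves like ccLDA)
psi_fixed = np.zeros(A)
psi_fixed[y_true] = 1.0

# Option 2: an asymmetric Dirichlet prior that only biases P(y|d) toward the label
gamma = np.ones(A)             # base concentration (assumed value)
gamma[y_true] += 5.0           # extra pseudo-counts for the true label (assumed value)
psi_biased = np.random.default_rng(0).dirichlet(gamma)
```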
Experiments • Three Datasets: • 4,247 abstracts from the ACL Anthology • 2,173 abstracts from linguistics journals • 594 articles from the Bitterlemons corpus (Lin et al., 2006) • a collection of editorials on the Israeli/Palestinian conflict
Experiments • Example: computational linguistics (example topic and aspect word lists omitted)
Experiments • Example: Israeli/Palestinian conflict • Results shown both unsupervised and with a prior on P(aspect|d) favoring the true label (example word lists omitted)
Evaluation • Cluster coherence • “Word intrusion” method (Chang et al., 2009) • 5 human annotators • Compare against ccLDA and LDA • TAM clusters are as coherent as those of the other established models
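A minimal sketch of how one word-intrusion item could be built, assuming a topic-word probability matrix phi (shape: topics × vocabulary) and a vocab list; the function name and parameters are hypothetical, not from Chang et al. (2009):

```python
import numpy as np

def intrusion_item(phi, vocab, topic, other_topic, n_top=5, rng=None):
    """Build one word-intrusion item: top words of a topic plus one intruder."""
    rng = rng or np.random.default_rng()
    top = np.argsort(phi[topic])[::-1][:n_top]      # top words of the tested topic
    # Intruder: a high-probability word from another topic, outside this topic's top words
    candidates = np.argsort(phi[other_topic])[::-1]
    intruder = next(w for w in candidates if w not in top)
    words = [vocab[w] for w in top] + [vocab[intruder]]
    rng.shuffle(words)                              # annotators see a shuffled list
    return words, vocab[intruder]                   # coherent topics make the intruder easy to spot
```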
Evaluation • Document classification • Classify Bitterlemons perspectives (Israeli vs Palestinian) • Use TAM (2 aspects + 12 topics) output as input to SVM • Use aspect mixtures and topic mixtures as features • Compare against LDA
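A sketch of this classification setup with scikit-learn. The TAM output is stood in for by random placeholders here; in practice aspect_mix and topic_mix would be the inferred P(aspect|d) and P(topic|d) for each Bitterlemons document (all variable names and the kernel choice are assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
N = 594                                           # number of Bitterlemons documents

# Placeholders for the per-document mixtures inferred by TAM
aspect_mix = rng.dirichlet([1.0, 1.0], size=N)    # P(aspect|d), 2 aspects
topic_mix  = rng.dirichlet([0.1] * 12, size=N)    # P(topic|d), 12 topics
labels     = rng.integers(0, 2, size=N)           # Israeli vs. Palestinian (placeholder)

features = np.hstack([aspect_mix, topic_mix])     # use both mixtures as features
clf = SVC(kernel="linear")
scores = cross_val_score(clf, features, labels, cv=10)
print("mean accuracy:", scores.mean())
```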
Evaluation • Document classification • The 2 aspects from TAM are much more strongly associated with the true perspectives than the 2 topics from LDA • Suggests that TAM clusters along a different dimension than LDA: the separate 12-topic dimension absorbs the purely “topical” variation, leaving the aspects to capture perspective
Summary: Topic-Aspect Model • Can cluster along two independent dimensions • Words may be generated by both dimensions, thus clusters can be inter-related • Cluster definitions are arbitrary and their structure will depend on the data and the model parameterization (especially # of aspects/topics) • Modeling with 2 aspects and many topics is shown to produce aspect clusters corresponding to document perspectives on certain corpora
References • Chang, J.; Boyd-Graber, J.; Gerrish, S.; Wang, C.; and Blei, D. 2009. Reading tea leaves: How humans interpret topic models. In Neural Information Processing Systems. • Lin, W.; Wilson, T.; Wiebe, J.; and Hauptmann, A. 2006. Which side are you on? Identifying perspectives at the document and sentence levels. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL). • Lin, W.; Xing, E.; and Hauptmann, A. 2008. A joint topic and perspective model for ideological discourse. In ECML PKDD ’08: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, Part II, 17–32. Berlin, Heidelberg: Springer-Verlag. • Mei, Q.; Ling, X.; Wondra, M.; Su, H.; and Zhai, C. 2007. Topic sentiment mixture: Modeling facets and opinions in weblogs. In WWW ’07: Proceedings of the 16th International Conference on World Wide Web, 171–180. • Paul, M., and Girju, R. 2009. Cross-cultural analysis of blogs and forums with mixed-collection topic models. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 1408–1417.