Conditional Topic Random Fields Jun Zhu and Eric P. Xing (@cs.cmu.edu) ICML 2010 Presentation and Discussion by Eric Wang January 12, 2011
Overview • Introduction – nontrivial input features for text. • Conditional Random Fields • CdTM and CTRF • Model Inference • Experimental Results
Introduction • Topic models such as LDA are not “feature-based”: they cannot efficiently incorporate nontrivial features (contextual or summary features). • Further, they assume a bag-of-words construction, discarding word-order information that may be important. • The authors propose a model that addresses both the feature and independence limitations by using a conditional random field (CRF) rather than a fully generative model.
Conditional Random Fields • A conditional random field (CRF) is a way to label and segment structured data that removes the independence assumptions imposed by HMMs. • The underlying idea of CRFs is that a sequence of random variables Y is globally conditioned on a sequence of observations X; a minimal linear-chain sketch follows. Image source: Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical report, Department of Computer and Information Science, University of Pennsylvania, 2004.
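To make the conditioning concrete, here is a minimal linear-chain CRF sketch (not the authors' code): it scores a label sequence y against observations x through log-potentials and normalizes with the forward algorithm. The array shapes and names (unary, pairwise) are illustrative assumptions.

```python
import numpy as np

def crf_log_partition(unary, pairwise):
    """unary: (T, K) log-potentials per position/label;
    pairwise: (K, K) log-potentials for adjacent labels.
    Returns log A(x), the log-normalizer, via the forward algorithm."""
    T, K = unary.shape
    alpha = unary[0].copy()                       # log forward messages
    for t in range(1, T):
        # logsumexp over the previous label for each current label
        scores = alpha[:, None] + pairwise + unary[t][None, :]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def crf_log_prob(y, unary, pairwise):
    """log p(y | x) = score(y, x) - log A(x)."""
    score = unary[0, y[0]]
    for t in range(1, len(y)):
        score += pairwise[y[t - 1], y[t]] + unary[t, y[t]]
    return score - crf_log_partition(unary, pairwise)

# Toy check: 3 positions, 2 labels, random potentials.
rng = np.random.default_rng(0)
unary, pairwise = rng.normal(size=(3, 2)), rng.normal(size=(2, 2))
print(np.exp(crf_log_prob([0, 1, 1], unary, pairwise)))
```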
Conditional Topic Model • Assume a set of features a denoting arbitrary local and global features. • The topic weight vector is defined as a log-linear function of the features, θ_k ∝ exp(λ_kᵀ f(a)), where f is a vector of feature functions defined on the features a and λ is the corresponding vector of feature weights.
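A minimal sketch of this log-linear construction, assuming a softmax normalization over the K topics (lam and f are illustrative names, not the authors' exact parameterization):

```python
import numpy as np

def topic_weights(lam, f):
    """lam: (K, M) per-topic feature weights; f: (M,) feature-function
    values on the document features a. Returns θ, a distribution over
    the K topics, via a numerically stable softmax."""
    s = lam @ f                  # per-topic scores λ_kᵀ f(a)
    s -= s.max()                 # stabilize before exponentiating
    e = np.exp(s)
    return e / e.sum()
```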
Conditional Topic Model • The inclusion of Y follows sLDA, where the topic model regresses to a continuous or discrete response. • β is the standard set of topic distributions over words. • This model does not impose word-order dependence.
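For the continuous case, sLDA (Blei & McAuliffe, 2007) draws the response from a Gaussian whose mean is a linear function of the empirical topic frequencies; a hedged sketch, with eta and sigma2 as illustrative parameter names:

```python
import numpy as np

def slda_response(z, K, eta, sigma2, rng=np.random.default_rng()):
    """z: topic assignments of the N words (ints in [0, K)).
    Returns one draw of the response Y ~ N(etaᵀ z̄, sigma2)."""
    zbar = np.bincount(z, minlength=K) / len(z)   # empirical topic frequencies
    return rng.normal(eta @ zbar, np.sqrt(sigma2))
```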
Feature Functions • Consider, for example, the set of word features “positive adjective”, “negative adjective”, “positive adjective with an inverting word”, “negative adjective with an inverting word”, so M = 4. • The word “good” will yield the feature function vector [1 0 0 0]ᵀ, while “not bad” will yield [0 0 0 1]ᵀ. • The features are then concatenated depending on the topic assignment z_n of the word w_n (see the sketch below). Suppose z_n = h; then the feature vector f for “good” is a length-MK vector whose only nonzero block is the one for topic h: [0 0 0 0 | 0 0 0 0 | … | 1 0 0 0 | … | 0 0 0 0]ᵀ, with blocks ordered k = 1, 2, …, h, …, K.
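A small sketch of this topic-indexed concatenation (0-indexed topic blocks for convenience; the index conventions are illustrative):

```python
import numpy as np

M, K = 4, 10                     # features per word, number of topics

def concat_features(f_word, h):
    """Place the M word-level indicators into topic block h of a
    length-MK vector; all other blocks stay zero."""
    f = np.zeros(M * K)
    f[h * M:(h + 1) * M] = f_word
    return f

f_good = np.array([1.0, 0, 0, 0])             # "good": positive adjective
print(concat_features(f_good, h=2).nonzero()) # single nonzero in block h=2
```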
Conditional Topic Random Fields • The generative process of CTRF for a single document is, roughly: for each sentence, draw the topic assignments z from a conditional random field given the features a; then draw each word w_n from the topic distribution β indexed by z_n; in the supervised variant, draw the response Y from the topic assignments as in sLDA.
Conditional Topic Random Fields • The term p(z | a) is a conditional topic random field over the topic assignments of all the words in one sentence and has the linear-chain form p(z | a) ∝ exp{ Σ_n λᵀ f(z_n, a) + Σ_n μᵀ g(z_n, z_{n+1}, a) }. • In the linear-chain CTRF, the authors consider both singleton feature functions f(z_n, a) and pairwise feature functions g(z_n, z_{n+1}, a). • The cumulative feature function value on a sentence is the sum over its word positions, F(z, a) = Σ_n f(z_n, a). • The pairwise feature function is assumed to be zero if the neighboring topic assignments differ, i.e., if z_n ≠ z_{n+1}.
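A hedged sketch of the unnormalized sentence score implied by this form (the callables f_single and g_pair, and all names, stand in for the paper's feature functions):

```python
import numpy as np

def sentence_score(z, lam, mu, f_single, g_pair):
    """Unnormalized log-potential of one sentence:
    λᵀ Σ_n f(z_n) + μᵀ Σ_n g(z_n, z_{n+1})."""
    score = sum(lam @ f_single(n, z[n]) for n in range(len(z)))
    score += sum(mu @ g_pair(z[n], z[n + 1]) for n in range(len(z) - 1))
    return score

# Toy demo: 3-dim singleton features, pairwise features that fire only
# when neighboring topics agree (matching the zero-if-different assumption).
f = lambda n, k: np.eye(3)[k % 3]
g = lambda k1, k2: np.array([float(k1 == k2)])
lam, mu = np.ones(3), np.ones(1)
print(sentence_score([0, 0, 1], lam, mu, f, g))   # 3 singleton + 1 pairwise
```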
Model Inference • Inference is performed in a similar variational fashion as in correlated topic models (CTM). • The authors introduce a relaxation of the lower bound due to the introduction of the CRF, although for the univariate CdTM the variational posterior can be computed exactly. • A closed-form solution is not available for the feature weights, so an efficient gradient-descent approach is used instead.
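As a toy illustration of why gradient methods apply here (not the paper's exact variational update): for CRF weights, the log-likelihood gradient is observed minus expected feature counts, brute-forced below over all label sequences of a tiny chain.

```python
import itertools
import numpy as np

def seq_features(y, K):
    """Pairwise transition counts of a label sequence, as a (K, K) matrix."""
    F = np.zeros((K, K))
    for a, b in zip(y, y[1:]):
        F[a, b] += 1
    return F

def grad_step(mu, y_obs, T, K, lr=0.1):
    """One gradient-ascent step on pairwise weights mu:
    mu += lr * (observed - expected) feature counts."""
    seqs = list(itertools.product(range(K), repeat=T))
    scores = np.array([(seq_features(y, K) * mu).sum() for y in seqs])
    p = np.exp(scores - scores.max()); p /= p.sum()
    expected = sum(pi * seq_features(y, K) for pi, y in zip(p, seqs))
    return mu + lr * (seq_features(y_obs, K) - expected)

print(grad_step(np.zeros((2, 2)), y_obs=(0, 0, 1), T=3, K=2))
```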
Empirical Results • The authors use a hotel-review dataset built by crawling TripAdvisor. • The dataset consists of 5000 reviews with lengths between 1500 and 6000 words, along with an integer (1–5) rating for each review; each rating is represented by 1000 documents. • POS tags were employed to find adjectives. • Noun-phrase chunking was used to associate words with good or bad connotations. The authors also extracted whether an inverting word occurs within 4 words of each adjective (a rough sketch of this step follows). • The lexicon size was 12000 after rare and stop words were removed.
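A rough sketch of the adjective/inverting-word extraction using NLTK's tagger; the authors' exact pipeline and lexicons are not given, so POSITIVE, NEGATIVE, and INVERTERS are stand-in word lists.

```python
import nltk  # requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

POSITIVE = {"good", "great", "clean"}        # stand-in sentiment lexicons
NEGATIVE = {"bad", "dirty", "noisy"}
INVERTERS = {"not", "never", "no", "hardly"}

def adjective_features(text, window=4):
    """Tag adjectives and check for an inverting word within `window`
    tokens before each one, yielding (word, feature-name) pairs."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text.lower()))
    feats = []
    for i, (tok, tag) in enumerate(tagged):
        if not tag.startswith("JJ"):
            continue
        inverted = any(t in INVERTERS for t, _ in tagged[max(0, i - window):i])
        if tok in POSITIVE:
            feats.append((tok, "Re-Pos-JJ" if inverted else "Pos-JJ"))
        elif tok in NEGATIVE:
            feats.append((tok, "Re-Neg-JJ" if inverted else "Neg-JJ"))
    return feats

print(adjective_features("The room was not bad but hardly clean"))
```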
Comparison of Rating Prediction Accuracy Equation source: Blei, D. & McAuliffe, J. Supervised topic models. NIPS, 2007.
Ratings and Topics • Here, the authors show that supervised CTRF (sCTRF) achieves good separation of rating scores among the topics (top row) compared to MedLDA (bottom row).
Feature Weights • Five features were considered: Default – equal to one for any word; Pos-JJ – a positive adjective; Neg-JJ – a negative adjective; Re-Pos-JJ – a positive adjective preceded by an inverting word; and Re-Neg-JJ – a negative adjective preceded by an inverting word. • The Default feature dominates when the model is truncated to 5 topics, but becomes less important at higher truncation levels.