Opinion Detection by Transfer Learning

Opinion Detection by Transfer Learning 11-742 Information Retrieval Lab Grace Hui Yang Advised by Prof. Yiming Yang

Outline • Introduction • The Problem • Transfer Learning by Constructing Informative Prior • Datasets • Evaluation Method • Experimental Results • Conclusion

Introduction • TREC 2006 Blog Track • Opinion Detection Task <num> Number: 851 <title> "March of the Penguins" <desc> Description: Provide opinion of the film documentary "March of the Penguins". <narr> Narrative: Relevant documents should include opinions concerning the film documentary "March of the Penguins". Articles or comments about penguins outside the context of this film documentary are not relevant.

Opinion Detection Literature Review • Researchers in Natural Language Processing (NLP) community • Turney (2002) : groups online words whose point mutual information close to "excellent" and "poor" • Riloff & Wiebe (2003): use a high-precision classifier to get high quality opinions and non-opinions, and then extract syntactic patterns. Repeat this process to bootstrap • Pang et al. (2002): treat opinion and sentiment detection and as a text classification problem • Naive Bayes, Maximum Entropy, SVM +unigram pres. (82.9%) • Pang & Lee (2005): use Minicuts to cluster sentences based on their subjectivity and sentiment orientation. • Researchers from data mining community • Morinaga et al. (2002) : use word polarity, syntactic pattern matching rules to extract opinions, PCA to create correspondence between the product names and keywords

Existing System • Query Expansion • Document Retrieval • Binary Text Classification by Bayesian Logistic Regression

No Available Training Data • Transfer Learning • Transfer knowledge over similar tasks but different domain • Generalize knowledge from limited training data • Discover underlying general structures across domains

Transfer Learning Literature Review • Baxter(1997) and Thrun(1996): both used hierarchical Bayesian learning • Lawrence and Platt (2004), Yu et al. (2005): also use hierarchical Bayesian models to learn hyper-parameters of Gaussian process • Ando and Zhang (2005): proposed a framework for Gaussian logistic regression for text classification . • Raina et al. (2006): continued this approach and built informative priors for Gaussian logistic regression

Transfer Learning • The Approach presented in this project is Inspired by the work done by Raina, Ng & Koller (2006) on text classification • Transferring common knowledge (word dependence) in similar tasks by constructing a informative prior in a Bayesian Logistic Regression Framework

Logistic Regression Framework • Logistic regression assumes sigmoid-like data distribution • To avoid overfitting, multivariate Gaussian prior is added on θ • Maximum a posteriori (MAP) Estimation

Non-diagonal Covariance • Zero-mean, equal variance Prior • Cannot capture relationship among words • Zero-mean, non-diagonal covariance Prior • Model word dependency in covariance matrix’s off-diagonal entries

Pair-wised Covariance • Covariance Definition: • Given zero mean,

Get Covariance by MCMC • Markov Chain Monte Carlo (MCMC) • Sample V (V=4) small vocabularies with size S (S=5) containing the two words wi and wjcorresponding to θi and θj. • From each vocabulary, sample T (T=4) training sets with size Z(Z=3) to train an ordinary Log. Reg. model on labeled datasets

Get Covariance by MCMC • Subtract a bootstrap estimation of the covariance due to randomness of training set change

Learning a Covariance Matrix • Learning a single covariance for pairs of regression coefficients is NOT all we need • Two Challenges: (1) Valid Covariance Matrix • A valid covariance matrix needs to be positive semi-definite (PSD) • Hermitian matrix (square, self-adjoint) with nonnegative eigen values. • Project the matrix on to a PSD cone

Learning a Covariance Matrix (2) Pair-wise calculations increase the complexity quadratically with vocabulary size • represent the word dependence as linear combination of underlying features • Learn the coefficients by Least Squared Error

Learning a Covariance Matrix By Joint Minimization • λ is the trade-off coefficient between the two objectives. • As λ-> 0, only care about PSD cone • As λ-> 1, only care about word pair relationship • Set to 0.6

Solve the Joint Minimization • Convex problem, converge to global minimum • Fix Σ , minimize over ψ • Use Quadratic Program (QP) Solver • Fix ψ , minimize over Σ • A special semi-definite programming (SDP) • Eigen decomposition and keep the nonnegative values

Feature Design • Model word dependency • Wordnet synset • and? • People do not always use the same general syntactic patterns to express opinion • "blah blah is good", • "awesome blah blah!"

Target-Opinion Word Pair • Different opinion targets relate to different customary expression • A person is knowledgeable • A computer processor is fast • A computer processor is knowledgeable (ill) • A person is fast (ill) • A computer processor is running like a horse (word polarity test fails)

Target-Opinion Word Pair • From training corpus, extract from a positive example • subject and object (excludes pronouns) • “Melvin, pig” • subject and BE-predicate • “lens, clear”, “base, heavy” • modifier and subject • “good, coffee” , “interesting, movie”

Word Synonym • Bridge vocabulary gap from training to testing • “This movie is good" in training corpus • "The film is really good" in the testing corpus

Feature Vector Log-co-occurrence Target-Opinion Synonym

Datasets • Training Corpus • Movie reviews [Pang & Lee from Cornell] • 10,000 sentences (5,000 opinions, 5,000 non-opinions) • Product reviews [Hu & Liu from UIC] • 4,000+ sentences (2,034 opinions, 2,173 non-opinions. • Digital camera, cell phone, DVD player, Jukebox, …

Datasets • Test Corpus – TREC 2006 Blog corpus • 3,201,002 articles (TREC reports 3,215,171) • December 2005 to February 2006 • Technorati, Bloglines, Blogpulse … • For each topic, 5,000 passages are retrieved • Using Lemur as search engine • 132,399 passages in total • 2,648 passages per topic • Each passage 1-10 sentences ( less than 100 words)

Evaluation Method • Precision at 11-pt recall level • Mean average precision (MAP) • Answers are provided by TREC qrels, • Document ids of documents containing an opinion • Note that our system is developed for opinion detection at sentence level • An averaged score of all the sentences in a retrieved passages • Extract Unique document ids to compare with TREC qrels

Experimental Results • Effects of Using Non-diagonal Prior Covariance • Baseline: Using movie reviews to train the Gaussian log. Reg. model with Prior ~N(0,σ2) • Feature Selection: Using common word features in movie reviews and product reviews to train the Gaussian log. Reg. model with Prior ~N(0,σ2) • Informative Prior:Using movie reviews to calculate prior covariance, train the Gaussian log. Reg. model with theinformative prior ~N(0,Σ)

32% improvement

Experimental Results • Effects of Feature Design • Baseline: Using movie reviews to train the Gaussian log. Reg. model with Prior ~N(0,σ2), bi-gram model • Transfer Learning Using Synonyms: Using informative prior ~N(0,Σ) • Transfer Learning Using Target-Opinion pairs: informative prior ~N(0,Σ) • Transfer Learning Using Both: informative prior ~N(0,Σ)

A good feature

Experimental Results • Effects on External Dataset Selection Negative Effect of Transfer Learning

Why Negative Effect Occurs? • Movie covers more general topics • Product only share 23% topics

Conclusion • Applying Transfer Learning in Opinion Detection • Transfer Learning by Informative Prior improves brutal transfer learning by 32% • Discovering a good feature for opinion detection • Target-Opinion pair • Need to be careful when choosing external datasets to help

Thank You!

Opinion Detection by Transfer Learning

Opinion Detection by Transfer Learning

Presentation Transcript

issues in transfer learning

SAS Deep Learning Object Detection, Keypoint Detection

Transfer of Learning

Transfer of Learning

Transfer to Learning

TRANSFER OF LEARNING

Learning theory Transfer of Learning

Transfer Learning

imaginary phone— learning by transfer

Opinion Spam Detection

Transfer of Learning

TRANSFER OF LEARNING

Towards Heterogeneous Transfer Learning

Transfer Learning Segment

Transfer Learning by Mapping and Revising Relational Knowledge

Knowledge Transfer and Opinion Detection in Blog Track

Dual Transfer Learning

Learning for Transfer

Transfer Learning by Mapping and Revising Relational Knowledge

Cash Transfer Learning

Transfer of Learning