Identifying Features in Opinion Mining via Intrinsic and Extrinsic Domain Relevance. Published in: IEEE Transactions on Knowledge and Data Engineering, Volume 26, Issue 3, March 2014. M1 sakusa.
Introduction • Text Mining • To discover novel structures or knowledge • in large amounts of unstructured text data
Opinion Mining (≈ Sentiment Analysis) • Opinion feature (or simply "feature")
Related Work • Opinion Feature Extraction • Association Rule Mining (ARM) approach • Latent Dirichlet Allocation (LDA) approach
Related Work • Most existing approaches to feature extraction in opinion mining • Typically use only the knowledge or patterns mined from a single given review corpus. • Completely ignore the variations that may be present in a different corpus, such as one about Culture.
Proposal • Intrinsic and Extrinsic Domain Relevance (IEDR) approach • Identifies opinion features from online reviews by exploiting the difference in opinion feature statistics across two corpora: • Domain-specific (domain-dependent) corpus: the given review corpus • Domain-independent corpus: any other corpus • Domain relevance (DR)
IEDR approach • Overview
IEDR approach • Candidate Feature Extraction • Dependency grammar • subject-verb (SBV) • verb-object (VOB) • preposition-object (POB) • NN : noun (noun phrases) • CF : candidate features
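The candidate extraction step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Token` type, the hand-written part-of-speech tags, and the dependency labels in the example sentence are all assumptions standing in for the output of a real dependency parser.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One parsed token (hypothetical structure, hand-labeled below)."""
    word: str
    pos: str       # part-of-speech tag, e.g. "NN" for a noun
    relation: str  # dependency relation to the head, e.g. "SBV"


# Dependency relations named on the slide.
FEATURE_RELATIONS = {"SBV", "VOB", "POB"}


def extract_candidates(sentence: List[Token]) -> List[str]:
    """Return nouns (NN) that stand in an SBV, VOB, or POB relation."""
    return [
        tok.word
        for tok in sentence
        if tok.pos.startswith("NN") and tok.relation in FEATURE_RELATIONS
    ]


# "I like the screen of this phone", labeled by hand for illustration only.
sentence = [
    Token("I", "PN", "SBV"),       # subject, but a pronoun: skipped
    Token("like", "VB", "HED"),
    Token("the", "DT", "ATT"),
    Token("screen", "NN", "VOB"),  # object of the verb
    Token("of", "IN", "ATT"),
    Token("this", "DT", "ATT"),
    Token("phone", "NN", "POB"),   # object of the preposition
]

candidates = extract_candidates(sentence)
print(candidates)
```

The resulting list is the set of candidate features (CF); it is deliberately noisy, which is why a later filtering step is needed.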
IEDR approach • Calculating Domain-Relevance • Dispersion: how significantly a term is mentioned across all documents, measured by its distributional significance across the documents of the entire corpus. • Deviation: how frequently a term is mentioned in a particular document, measured by its distributional significance in that document. • Both are computed from TF-IDF term weights
IEDR approach • Calculating Domain-Relevance • N: total number of documents • M: total number of terms • i = 1, …, M (term index); j = 1, …, N (document index)
IEDR approach Calculating Domain-Relevance (IDR, EDR)
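The slide equations did not survive, so the following is only a sketch under stated assumptions. It assumes a log-scaled TF-IDF weight, dispersion defined as a term's mean weight divided by its standard deviation across documents, deviation defined as a term's weight minus the average weight of that document, and domain relevance as dispersion times the summed deviations. The function names are hypothetical. Running the same function on the review corpus would give IDR scores, and on the domain-independent corpus EDR scores.

```python
import math
from typing import Dict, List


def tfidf_weights(docs: List[List[str]], vocab: List[str]) -> Dict[str, List[float]]:
    """w[term][j] = (1 + log TF) * log(N / DF) if TF > 0, else 0 (assumed form)."""
    n_docs = len(docs)
    weights = {}
    for term in vocab:
        df = sum(1 for doc in docs if term in doc)
        row = []
        for doc in docs:
            tf = doc.count(term)
            row.append((1 + math.log(tf)) * math.log(n_docs / df) if tf > 0 else 0.0)
        weights[term] = row
    return weights


def domain_relevance(docs: List[List[str]], vocab: List[str]) -> Dict[str, float]:
    """Assumed formulation: relevance = dispersion * sum of deviations."""
    n_docs = len(docs)
    w = tfidf_weights(docs, vocab)
    # Average weight of each document j over all M terms.
    doc_mean = [sum(w[t][j] for t in vocab) / len(vocab) for j in range(n_docs)]
    scores = {}
    for term in vocab:
        mean = sum(w[term]) / n_docs
        std = math.sqrt(sum((x - mean) ** 2 for x in w[term]) / n_docs)
        dispersion = mean / std if std > 0 else 0.0  # guard against zero spread
        deviation_sum = sum(w[term][j] - doc_mean[j] for j in range(n_docs))
        scores[term] = dispersion * deviation_sum
    return scores


# Toy corpus of tokenized reviews (made up for illustration).
reviews = [
    ["battery", "screen", "price"],
    ["battery", "battery", "screen"],
    ["price", "service"],
]
vocab = sorted({tok for doc in reviews for tok in doc})
for term, score in domain_relevance(reviews, vocab).items():
    print(f"{term}: {score:.4f}")
```

On a corpus this small the scores are not meaningful; the sketch is only meant to make the roles of N, M, i, and j concrete.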
IEDR approach Calculating Intrinsic-relevance & Extrinsic-relevance
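A possible reading of the IEDR filtering criterion is sketched below: a candidate is kept when its intrinsic relevance is above one threshold and its extrinsic relevance is below another. The exact rule, the function name, the scores, and the threshold values are assumptions for illustration; none of them are taken from the slides.

```python
from typing import Dict, List


def iedr_filter(
    candidates: List[str],
    idr: Dict[str, float],
    edr: Dict[str, float],
    intrinsic_threshold: float,
    extrinsic_threshold: float,
) -> List[str]:
    """Keep candidates that are domain-specific: high IDR and low EDR.

    A candidate missing from either score table is treated as having
    relevance 0.0 in that corpus (an assumption of this sketch).
    """
    return [
        c
        for c in candidates
        if idr.get(c, 0.0) > intrinsic_threshold
        and edr.get(c, 0.0) < extrinsic_threshold
    ]


# Made-up scores, for illustration only.
candidates = ["screen", "battery", "time", "thing"]
idr_scores = {"screen": 2.1, "battery": 1.8, "time": 1.5, "thing": 0.2}
edr_scores = {"screen": 0.3, "battery": 0.1, "time": 2.4, "thing": 1.9}

features = iedr_filter(candidates, idr_scores, edr_scores, 1.0, 1.0)
print(features)
```

In this toy input, "time" scores well in the review corpus but is also common elsewhere, so it is rejected by the extrinsic threshold, while "thing" fails the intrinsic threshold.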
Experiments • Corpus Description • Table 3 (review corpora): product.tech.163.com, www.lvping.com/hotels • Table 4 (domain-independent corpora): www.sogou.com/labs
Experiments Evaluated Methods
Experiments • Results • Precision vs. Recall (figure panels: cellphone review, hotel review)
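For reference, precision and recall for extracted features can be computed against a hand-labeled gold set as in the sketch below. The feature sets are invented for illustration and do not reproduce any result from the experiments.

```python
from typing import Iterable, Tuple


def precision_recall_f1(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Score extracted opinion features against a gold-standard set."""
    extracted_set, gold_set = set(extracted), set(gold)
    true_positives = len(extracted_set & gold_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: 2 of 3 extracted features are correct, 2 of 4 gold features found.
p, r, f = precision_recall_f1(
    ["screen", "battery", "time"],
    ["screen", "battery", "price", "camera"],
)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```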
Experiments • Results • Size of Domain-Independent Corpus • 12 new corpora of sizes 2, 4, 6, 8, 10, 20, 30, 40, 50, 60, 70, and 80 (in thousands)
Experiments • Results • Choice of Domain-Independent Corpus • Single Topic Domain-Independent Corpus (figure panels: cellphone review, hotel review)
Experiments • Results • Choice of Domain-Independent Corpus • Top-K Topics Domain-Independent Corpus • Cellphone review • top-1: 8000 (Culture) • top-2: 4000 (Culture) + 4000 (Sports) • top-3: 2667 (Culture) + 2667 (Sports) + 2666 (Tourism) • Hotel review • top-1: 8000 (Culture) • top-2: 4000 (Culture) + 4000 (Education) • top-3: 2667 (Culture) + 2667 (Education) + 2666 (Health)
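The per-topic counts above are consistent with splitting a fixed budget of 8000 documents as evenly as possible across the top-K topics. The sketch below reproduces that arithmetic; the function name is hypothetical, and whether the corpora were actually built this way is an assumption.

```python
from typing import Dict, List


def split_corpus(total: int, topics: List[str]) -> Dict[str, int]:
    """Divide `total` documents evenly; earlier topics absorb the remainder."""
    base, remainder = divmod(total, len(topics))
    return {
        topic: base + (1 if i < remainder else 0)
        for i, topic in enumerate(topics)
    }


print(split_corpus(8000, ["Culture"]))
print(split_corpus(8000, ["Culture", "Sports"]))
print(split_corpus(8000, ["Culture", "Sports", "Tourism"]))
```

With three topics, 8000 divides into 2666 with a remainder of 2, so the first two topics receive 2667 documents each and the third receives 2666.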
Experiments • Results • Domain Relevance Thresholds
Experiments • Results • Feature-Based Opinion Mining Application
Conclusion • We proposed a novel intercorpus statistics approach to opinion feature extraction based on the IEDR feature-filtering criterion. • Experimental results demonstrate that the proposed IEDR not only leads to noticeable improvement over either IDR or EDR, but also outperforms four mainstream methods. • For future work, we will employ a fine-grained topic modeling approach to jointly identify opinion features, including non-noun features, infrequent features, and implicit features. • We plan to further test the IEDR opinion feature extraction in several … • We have had some preliminary success in applying IEDR to extract English opinion features from hotel reviews.