
Identifying Features in Opinion Mining via Intrinsic and Extrinsic Domain Relevance

Published in: IEEE Transactions on Knowledge and Data Engineering, Volume 26, Issue 3, March 2014. M1 sakusa.


Presentation Transcript


  1. Identifying Features in Opinion Mining via Intrinsic and Extrinsic Domain Relevance • Published in: IEEE Transactions on Knowledge and Data Engineering, Volume 26, Issue 3, March 2014 • M1 sakusa

  2. Introduction • Text Mining • To discover novel structures or knowledge • from large amounts of text data, such as unstructured data

  3. Opinion Mining (≒ Sentiment Analysis)

  4. Opinion Mining (≒ Sentiment Analysis) opinion feature (or feature)

  5. Related Work • Opinion Feature Extraction • Association Rule Mining (ARM) approach • Latent Dirichlet Allocation (LDA) approach

  6. Related Work • Most existing approaches to opinion feature extraction • Typically use only the knowledge or patterns mined from a single given review corpus. • Completely ignore the variations that may be present in a different corpus, for example one about Culture.

  7. Proposal • Intrinsic and Extrinsic Domain Relevance (IEDR) approach • Identifies opinion features from online reviews by exploiting the difference in opinion feature statistics across two corpora. • Domain-specific (domain-dependent) corpus (the given review corpus) • Domain-independent corpus (any other corpus) • Domain relevance (DR)

  8. IEDR approach • Overview

  9. IEDR approach • Candidate Feature Extraction • Dependency grammar • subject-verb (SBV) • verb-object (VOB) • preposition-object (POB)

  10. IEDR approach • Candidate Feature Extraction • Dependency grammar • subject-verb (SBV) • verb-object (VOB) • preposition-object (POB) • NN : noun (noun phrases) • CF : candidate features
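
The rule on this slide (a noun in an SBV, VOB, or POB relation becomes a candidate feature) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Token` type, the `extract_candidates` function, the `"OTHER"` relation label, and the hand-annotated toy sentence are all hypothetical, and a real system would take its tags from a dependency parser.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One parsed token (hypothetical structure for this sketch)."""
    word: str
    pos: str       # part-of-speech tag; "NN" marks a noun, as on the slide
    relation: str  # dependency relation of this token to its head


# Dependency relations named on the slide.
FEATURE_RELATIONS = {"SBV", "VOB", "POB"}


def extract_candidates(sentence: list[Token]) -> list[str]:
    """Return nouns that take part in an SBV, VOB, or POB relation."""
    return [
        tok.word
        for tok in sentence
        if tok.pos == "NN" and tok.relation in FEATURE_RELATIONS
    ]


# Hand-annotated toy parse of "The screen is great and I love the battery".
# The tags are assumptions made for illustration only.
sentence = [
    Token("The", "DT", "OTHER"),
    Token("screen", "NN", "SBV"),    # subject of "is"
    Token("is", "VB", "OTHER"),
    Token("great", "JJ", "OTHER"),
    Token("and", "CC", "OTHER"),
    Token("I", "PRP", "SBV"),        # subject, but not a noun: skipped
    Token("love", "VB", "OTHER"),
    Token("the", "DT", "OTHER"),
    Token("battery", "NN", "VOB"),   # object of "love"
]

candidates = extract_candidates(sentence)
print(candidates)
```

Only the noun filter and the three relation names come from the slide; everything else is scaffolding to make the rule concrete.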

  11. IEDR approach • Calculating Domain-Relevance • Dispersion • how significantly a term is mentioned across all documents by measuring the distributional significance of the term across different documents in the entire corpus. • Deviation • how frequently a term is mentioned in a particular document by measuring its distributional significance in the document. • Using TF-IDF term weights

  12. IEDR approach • Calculating Domain-Relevance • N : total number of documents • M : total number of terms • i = 1, …, M • j = 1, …, N
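
The formulas for this slide are not in the transcript. As a hedged reconstruction, assuming a log-scaled TF-IDF weight and the usual mean and standard deviation, dispersion and deviation might be written as below. The symbols $TF_{ij}$, $DF_i$, $w_{ij}$, $\bar{w}_i$, $\bar{w}_j$, $s_i$, $disp_i$, and $devi_{ij}$ are assumptions introduced here; only $N$, $M$, $i$, and $j$ come from the slide.

```latex
\begin{align}
w_{ij} &=
  \begin{cases}
    (1 + \log TF_{ij}) \cdot \log \dfrac{N}{DF_i} & \text{if } TF_{ij} > 0,\\[4pt]
    0 & \text{otherwise,}
  \end{cases} \\
\bar{w}_i &= \frac{1}{N} \sum_{j=1}^{N} w_{ij},
\qquad
s_i = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left( w_{ij} - \bar{w}_i \right)^2}, \\
disp_i &= \frac{\bar{w}_i}{s_i}, \\
devi_{ij} &= w_{ij} - \bar{w}_j,
\qquad
\bar{w}_j = \frac{1}{M} \sum_{i=1}^{M} w_{ij}.
\end{align}
```

Here $TF_{ij}$ is the count of term $i$ in document $j$ and $DF_i$ is the number of documents containing term $i$. Under this reading, dispersion is large when a term has a high and stable weight across documents, and deviation is large when a term outweighs the average term in a given document.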

  13. IEDR approach Calculating Domain-Relevance (IDR, EDR)
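
A domain relevance score can be sketched by combining dispersion and deviation over TF-IDF weights. The code below assumes one plausible formulation (log-scaled TF-IDF, dispersion as mean over standard deviation, deviation as weight minus the document's mean weight, and their product summed over documents); the function names and the toy count matrix are hypothetical. Running the same function on the domain-specific corpus would give IDR scores, and on the domain-independent corpus EDR scores.

```python
import math


def tfidf_weights(tf):
    """tf[i][j] is the raw count of term i in document j.

    Returns w[i][j] = (1 + log tf) * log(N / df_i), or 0 when tf is 0.
    """
    n_docs = len(tf[0])
    weights = []
    for row in tf:
        df = sum(1 for count in row if count > 0)  # document frequency
        weights.append([
            (1 + math.log(count)) * math.log(n_docs / df) if count > 0 else 0.0
            for count in row
        ])
    return weights


def domain_relevance(tf):
    """Return one score per term: dispersion_i * sum_j deviation_ij."""
    w = tfidf_weights(tf)
    n_terms, n_docs = len(w), len(w[0])
    # Average weight of all terms inside each document.
    doc_mean = [sum(w[i][j] for i in range(n_terms)) / n_terms
                for j in range(n_docs)]
    scores = []
    for i in range(n_terms):
        term_mean = sum(w[i]) / n_docs
        std = math.sqrt(sum((x - term_mean) ** 2 for x in w[i]) / n_docs)
        # Guard: a term with identical weight everywhere has zero spread.
        dispersion = term_mean / std if std > 0 else 0.0
        deviation_sum = sum(w[i][j] - doc_mean[j] for j in range(n_docs))
        scores.append(dispersion * deviation_sum)
    return scores


# Toy corpus: 3 terms (rows) by 4 documents (columns); counts are made up.
tf = [
    [3, 2, 4, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 5],
]
scores = domain_relevance(tf)
print(scores)
```

Returning 0 for a zero-spread term is a choice made for this sketch so that the division is always defined; it is not taken from the slides.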

  14. IEDR approach Calculating Intrinsic-relevance & Extrinsic-relevance

  15. IEDR approach Calculating Intrinsic-relevance & Extrinsic-relevance
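
The filtering step can be sketched as a pair of threshold tests: keep a candidate when its intrinsic relevance is high and its extrinsic relevance is low. This is a minimal sketch under that assumption; `iedr_filter`, the threshold names, and every score below are hypothetical values made up for illustration.

```python
def iedr_filter(idr, edr, intrinsic_threshold, extrinsic_threshold):
    """Keep candidates that are domain-specific by both criteria.

    idr: term -> score on the domain-specific (review) corpus
    edr: term -> score on the domain-independent corpus
    A term absent from the domain-independent corpus is treated as having
    the lowest possible extrinsic relevance (an assumption of this sketch).
    """
    return [
        term
        for term in idr
        if idr[term] > intrinsic_threshold
        and edr.get(term, float("-inf")) < extrinsic_threshold
    ]


# Made-up scores for four candidate features from cellphone reviews.
idr = {"screen": 0.9, "battery": 0.7, "thing": 0.8, "culture": 0.1}
edr = {"screen": 0.1, "battery": 0.2, "thing": 0.9, "culture": 0.8}

features = iedr_filter(idr, edr, intrinsic_threshold=0.5, extrinsic_threshold=0.5)
print(features)
```

In this toy data, "thing" passes the intrinsic test but fails the extrinsic one because it is also common outside the domain, which is the kind of generic term the two-corpus comparison is meant to remove.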

  16. Experiments • Corpus Description • Table 3 : product.tech.163.com, www.lvping.com/hotels • Table 4 : www.sogou.com/labs

  17. Experiments Evaluated Methods

  18. Experiments • Results • Precision vs Recall (panels: cellphone review, hotel review)
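
For reference, precision and recall of an extracted feature set against a hand-labeled gold set can be computed as in the sketch below. These are the standard definitions; the function name and the example feature lists are hypothetical and are not results from the paper.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted features with gold-standard features.

    precision = correct / extracted, recall = correct / gold,
    F1 = harmonic mean of the two. Empty inputs yield 0.0.
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Made-up example: 3 extracted features, 4 gold features, 2 in common.
extracted = ["screen", "battery", "thing"]
gold = ["screen", "battery", "price", "camera"]

precision, recall, f1 = precision_recall_f1(extracted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```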

  19. Experiments • Results • Size of Domain-Independent Corpus • 12 new corpora of size 2, 4, 6, 8, 10, 20, 30, 40, 50, 60, 70, 80 (thousands)

  20. Experiments • Results • Choice of Domain-Independent Corpus • Single Topic Domain-Independent Corpus (panels: cellphone review, hotel review)

  21. Experiments • Results • Choice of Domain-Independent Corpus • Top-K Topics Domain-Independent Corpus • Cellphone review • top-1 : 8000 (Culture) • top-2 : 4000 (Culture) + 4000 (Sports) • top-3 : 2667 (Culture) + 2667 (Sports) + 2666 (Tourism) • Hotel review • top-1 : 8000 (Culture) • top-2 : 4000 (Culture) + 4000 (Education) • top-3 : 2667 (Culture) + 2667 (Education) + 2666 (Health)
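
The top-K sizes above are consistent with splitting a fixed budget of 8000 as evenly as possible across the K topics, with any remainder going to the first topics. A small sketch of that arithmetic follows; the `split_evenly` helper is hypothetical, and how the items were actually sampled is not stated on the slide.

```python
def split_evenly(total, topics):
    """Divide `total` items across topics; earlier topics absorb the remainder."""
    base, extra = divmod(total, len(topics))
    return {
        topic: base + (1 if index < extra else 0)
        for index, topic in enumerate(topics)
    }


top3_cellphone = split_evenly(8000, ["Culture", "Sports", "Tourism"])
print(top3_cellphone)  # {'Culture': 2667, 'Sports': 2667, 'Tourism': 2666}

top2_hotel = split_evenly(8000, ["Culture", "Education"])
print(top2_hotel)      # {'Culture': 4000, 'Education': 4000}
```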

  22. Experiments • Results • Domain Relevance Thresholds

  23. Experiments • Results • Feature-Based Opinion Mining Application

  24. Conclusion • We proposed a novel intercorpus statistics approach to opinion feature extraction based on the IEDR feature-filtering criterion. • Experimental results demonstrate that the proposed IEDR not only leads to noticeable improvement over either IDR or EDR, but also outperforms four mainstream methods. • For future work, we will employ a fine-grained topic modeling approach to jointly identify opinion features, including non-noun features, infrequent features, as well as implicit features. • We plan to further test the IEDR opinion feature extraction in several • We have had some preliminary success in applying IEDR to extract English opinion features from hotel reviews.
