Comparing Methods to Improve Information Extraction System using Subjectivity Analysis
Prepared by: Heena Waghwani
Guided by: Dr. M. B. Chandak
Contents
• Introduction
• Combining the Two Approaches
• Two Research Directions
• Subjectivity Analysis
• Learning Subjective Words and Expressions with Extraction Patterns
• Improving IE Systems with Subjectivity Classification
• Strategies
• Conclusion
• Future Scope
• References
Introduction
• Subjectivity Analysis systems automatically identify and extract information relating to attitudes, opinions, and sentiments from unstructured text.
• For example:
  • The world is a stage.
  • The sea of grief.
• Information Extraction (IE) systems typically involve the automatic identification and extraction of factual information relating to events.
• For example, IE systems have been built to extract facts associated with terrorist incidents, disease outbreaks, and job and seminar announcements.
Combining the Two Approaches
• The idea behind combining the two approaches is to use a subjective sentence classifier to identify and filter out subjective sentences before information is extracted from them, which minimizes false hits.
• For example, an IE system searching for bombings might incorrectly interpret the sentence “The Parliament exploded into a fury.” as describing a physical explosion.
Two Research Directions
• Use weakly supervised IE techniques to automatically discover subjective words and expressions in unannotated text.
• Use subjectivity analysis to improve the accuracy of fact-based information extraction systems.
Subjectivity Analysis
• Subjective expressions are words and phrases used to express opinions, sentiments, speculations, etc.
• Two types:
  1. Nouns having subjective meaning
  2. Expressions that capture subjectivity
• A sentence is subjective if it contains one or more subjective expressions of medium or high intensity.
Learning Subjective Words and Expressions with Extraction Patterns
• Input: an existing subjective lexicon, a set of seed nouns, and a small amount of human review.
• Extraction patterns: lexico-syntactic patterns that represent one or more words appearing in a specific syntactic context.
• An extraction pattern is created by instantiating one of the syntactic templates with specific words.
• Example: “She wanted desperately to believe in humanity.”
• This sentence produces four patterns:
  • <subj> active-verb(wanted)
  • <subj> verb(wanted) infinitive(believe)
  • infinitive(believe) <dobj>
  • verb(wanted) infinitive(believe) <dobj>
[Figure: Syntactic templates for extraction patterns]
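The instantiation step can be illustrated with a minimal sketch. It assumes a parser has already identified the clause's syntactic slots; the `TEMPLATES` list and the `instantiate` function are hypothetical names that mirror only the four patterns listed on the slide, not the full template set of any real system.

```python
# Hypothetical templates mirroring the four patterns on the slide.
# Each entry pairs a format string with the slots it needs.
TEMPLATES = [
    ("<subj> active-verb({verb})", ("verb",)),
    ("<subj> verb({verb}) infinitive({infinitive})", ("verb", "infinitive")),
    ("infinitive({infinitive}) <dobj>", ("infinitive",)),
    ("verb({verb}) infinitive({infinitive}) <dobj>", ("verb", "infinitive")),
]


def instantiate(clause):
    """Fill every template whose required slots are present in the clause.

    clause: dict of slot name -> word, assumed to come from a parser.
    """
    patterns = []
    for template, required in TEMPLATES:
        if all(slot in clause for slot in required):
            patterns.append(template.format(**clause))
    return patterns


# "She wanted desperately to believe in humanity."
clause = {"verb": "wanted", "infinitive": "believe"}
patterns = instantiate(clause)
for pattern in patterns:
    print(pattern)
```

A clause with no infinitive, such as `{"verb": "exploded"}`, would instantiate only the first template.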
Learning Subjective Nouns Using Extraction Pattern Context
• Two bootstrapping algorithms have been developed to create semantic dictionaries by exploiting extraction patterns:
  • Meta-Bootstrapping
  • Basilisk
• Both algorithms begin with unannotated texts and seed words that represent a semantic category (here, subjective terms).
• The bootstrapping process looks for words that appear in the same extraction patterns as the seed words and hypothesizes that those words belong to the same semantic class.
• Example: expressed <dobj>: hope, grief, views, worries
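The shared-pattern hypothesis can be sketched in a few lines. This is a simplified stand-in, not the scoring used by Meta-Bootstrapping or Basilisk: it ranks candidate nouns by how many seed-bearing patterns also extract them. The function name `bootstrap_step` and the toy extractions are assumptions made for illustration.

```python
from collections import defaultdict


def bootstrap_step(extractions, seeds, top_k=2):
    """One simplified bootstrapping iteration.

    extractions: iterable of (pattern, noun) pairs (toy data here).
    seeds: set of nouns already believed to be subjective.
    Returns up to top_k new nouns, ranked by the number of
    seed-bearing patterns that also extract them.
    """
    nouns_by_pattern = defaultdict(set)
    for pattern, noun in extractions:
        nouns_by_pattern[pattern].add(noun)

    # Patterns that extract at least one seed word.
    seed_patterns = [p for p, nouns in nouns_by_pattern.items() if nouns & seeds]

    # Score each non-seed noun by how many seed patterns it shares.
    scores = defaultdict(int)
    for p in seed_patterns:
        for noun in nouns_by_pattern[p] - seeds:
            scores[noun] += 1

    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:top_k]


extractions = [
    ("expressed <dobj>", "hope"),
    ("expressed <dobj>", "grief"),
    ("expressed <dobj>", "views"),
    ("expressed <dobj>", "worries"),
    ("voiced <dobj>", "worries"),
    ("voiced <dobj>", "hope"),
    ("bought <dobj>", "car"),
]
seeds = {"hope", "grief"}
new_words = bootstrap_step(extractions, seeds)
print(new_words)
```

In a full bootstrapping loop the selected words would be added to the seed set and the step repeated, with human review pruning bad candidates.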
Learning Subjective Expressions as Extraction Patterns
• To automatically learn extraction patterns that are associated with subjectivity, a procedure similar to AutoSlog-TS is used.
• For training, AutoSlog-TS uses a text corpus consisting of two distinct sets of texts: “relevant” texts (subjective sentences) and “irrelevant” texts (objective sentences).
• A set of syntactic templates represents the space of possible extraction patterns.
Steps in the learning process
• Generate extraction patterns for every possible instantiation of the templates that appears in the corpus.
• Apply all of the learned extraction patterns to the training corpus and gather statistics on how often each pattern occurs in subjective versus objective sentences.
• Rank the extraction patterns using a conditional probability measure:
  Pr(subjective | pattern) = subjfreq(pattern) / freq(pattern)
  where subjfreq(pattern) is the pattern's frequency in subjective training sentences and freq(pattern) is its frequency in the whole training corpus.
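The counting and ranking steps can be sketched as follows. The sketch assumes each training sentence arrives as a list of the patterns that matched it plus a subjective/objective label, and that a pattern is counted once per sentence; the function name `rank_patterns` and the toy corpus are hypothetical.

```python
from collections import Counter


def rank_patterns(labeled_sentences):
    """Rank patterns by Pr(subjective | pattern) = subjfreq / freq.

    labeled_sentences: iterable of (patterns_in_sentence, is_subjective).
    Returns (pattern, probability) pairs, highest probability first.
    """
    freq = Counter()
    subj_freq = Counter()
    for patterns, is_subjective in labeled_sentences:
        for p in set(patterns):  # count each pattern once per sentence
            freq[p] += 1
            if is_subjective:
                subj_freq[p] += 1

    probs = {p: subj_freq[p] / freq[p] for p in freq}
    # Ties are broken by higher frequency, then alphabetically.
    return sorted(probs.items(), key=lambda kv: (-kv[1], -freq[kv[0]], kv[0]))


corpus = [
    (["<subj> was outraged", "attack on <np>"], True),
    (["<subj> was outraged"], True),
    (["attack on <np>"], False),
    (["attack on <np>", "<subj> was killed"], False),
]
ranking = rank_patterns(corpus)
for pattern, prob in ranking:
    print(f"{prob:.2f}  {pattern}")
```

On this toy corpus "<subj> was outraged" ranks first because both of its occurrences are in subjective sentences.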
Improving IE Systems with Subjectivity Classification
• Information Extraction systems suffer from false hits, and many of these false hits occur in subjective sentences.
• Many incorrect extractions can be prevented by identifying sentences that contain subjective language and disallowing extraction from them.
Strategies
• Aggressive Subjective Sentence Filtering
  • Discards all extractions that occur in sentences labeled as subjective by the classifier.
• Source Attribution Modification
  • When a source attribution occurs in a sentence with only a modest subjectivity score, the sentence's extractions are not discarded.
  • For example, sentences in news articles often carry source attributions such as “The President stated …” or “The Associated Press reported …”
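A minimal sketch of these two strategies together is shown below. It assumes the classifier returns a subjectivity score in [0, 1]; the two thresholds, the list of attribution verbs, and the function name `keep_extraction` are all hypothetical choices made for illustration, not values from the original work.

```python
import re

# Hypothetical attribution cues; a real system would use a richer list.
ATTRIBUTION = re.compile(r"\b(stated|said|reported|announced)\b", re.IGNORECASE)


def keep_extraction(sentence, subj_score, subj_threshold=0.5, strong_threshold=0.8):
    """Decide whether extractions from a sentence survive filtering.

    subj_score: subjectivity score in [0, 1] from some classifier (assumed).
    Both thresholds are illustrative assumptions.
    """
    if subj_score < subj_threshold:
        return True  # objective sentence: always keep
    if subj_score < strong_threshold and ATTRIBUTION.search(sentence):
        return True  # modestly subjective, but carries a source attribution
    return False  # subjective: discard (aggressive filtering)


examples = [
    ("A bomb exploded downtown.", 0.1),
    ("The President stated that the attack killed ten people.", 0.6),
    ("The Parliament exploded into a fury.", 0.6),
    ("The spokesman said it was a disgraceful, cowardly act.", 0.9),
]
for sentence, score in examples:
    print(keep_extraction(sentence, score), sentence)
```

Dropping the second `if` gives plain aggressive filtering, in which every sentence at or above `subj_threshold` loses its extractions.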
Strategies (continued)
• Selective Subjective Sentence Filtering
  • Facts and opinions frequently coexist in the same sentence.
  • Indicator patterns should always be allowed to extract information.
  • Example: “He was outraged by the terrorist attack on the World Trade Center.”
  • If a pattern has a conditional probability P(relevant | pattern) >= 0.65 and a frequency >= 10, it is labeled as an indicator pattern, because it is highly correlated with domain-relevant text.
  • Otherwise, the pattern is labeled as a non-indicator pattern.
  • Extractions from indicator patterns are never discarded, but extractions from non-indicator patterns are discarded if they appear in a subjective sentence.
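The indicator rule can be sketched directly from the thresholds above (0.65 and 10). The data layout, the function names `is_indicator` and `selective_filter`, and the pattern statistics in the example are assumptions for illustration.

```python
def is_indicator(p_relevant, frequency, min_prob=0.65, min_freq=10):
    """A pattern is an indicator if it is both reliable and frequent."""
    return p_relevant >= min_prob and frequency >= min_freq


def selective_filter(extractions, pattern_stats):
    """Keep extractions from objective sentences and from indicator patterns.

    extractions: list of (pattern, extracted_text, sentence_is_subjective).
    pattern_stats: pattern -> (P(relevant | pattern), frequency).
    """
    kept = []
    for pattern, text, sentence_is_subjective in extractions:
        p_rel, freq = pattern_stats.get(pattern, (0.0, 0))
        if not sentence_is_subjective or is_indicator(p_rel, freq):
            kept.append((pattern, text))
    return kept


# Made-up statistics for two patterns.
pattern_stats = {
    "attack on <np>": (0.90, 40),
    "exploded into <np>": (0.30, 12),
}
extractions = [
    ("attack on <np>", "the World Trade Center", True),  # subjective sentence
    ("exploded into <np>", "a fury", True),              # subjective sentence
    ("exploded into <np>", "flames", False),             # objective sentence
]
kept = selective_filter(extractions, pattern_stats)
print(kept)
```

Here the extraction from "attack on <np>" survives even though its sentence is subjective, while "a fury" is discarded because its pattern is a non-indicator.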
Strategies (continued)
• Subjective Extraction Pattern Filtering
  • Anticipating which patterns will perform well is difficult.
  • Subjectivity analysis can provide an alternative, empirical assessment of each pattern, not just in terms of domain relevance but in terms of whether the pattern is used more frequently in subjective or objective contexts.
  • The probability that a sentence is subjective given that it contains the pattern is Pr(subjective | pattern).
  • An extraction pattern is said to be subjective if Pr(subjective | pattern) >= 0.50 and its frequency >= 10.
  • Example: the pattern “was aimed at <np>”
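As a rough sketch, the subjectivity test for a pattern uses the thresholds above (0.50 and 10). The sketch assumes that patterns labeled subjective are then removed from the extraction pattern set, which the slide implies but does not state; the counts and function names are hypothetical.

```python
def is_subjective_pattern(subj_count, total_count, min_prob=0.50, min_freq=10):
    """True if the pattern is frequent and mostly seen in subjective sentences."""
    if total_count < min_freq:
        return False  # too rare to judge reliably
    return subj_count / total_count >= min_prob


def filter_patterns(pattern_counts):
    """Return the patterns that are NOT labeled subjective.

    pattern_counts: pattern -> (count in subjective sentences, total count).
    """
    return [
        pattern
        for pattern, (subj, total) in pattern_counts.items()
        if not is_subjective_pattern(subj, total)
    ]


# Made-up counts for illustration.
pattern_counts = {
    "was aimed at <np>": (14, 20),   # 0.70, frequent: subjective
    "bombed <np>": (3, 30),          # 0.10: objective
    "lashed out at <np>": (4, 5),    # 0.80 but too rare to label
}
usable = filter_patterns(pattern_counts)
print(usable)
```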
Conclusion
• Subjectivity Analysis and Information Extraction are distinct but mutually beneficial areas.
• Subjectivity Analysis can improve the performance of an Information Extraction system.
Future Scope
• Different methods of Subjectivity Analysis can be applied in different contexts to improve Information Extraction systems and give better results.
References
• Janyce Wiebe and Ellen Riloff, “Finding Mutual Benefit between Subjectivity Analysis and Information Extraction.”
• E. Riloff, “Automatically Generating Extraction Patterns from Untagged Text,” Proc. 13th Nat’l Conf. Artificial Intelligence, pp. 1044-1049, 1996.
• P. Turney and M.L. Littman, “Measuring Praise and Criticism: Inference of Semantic Orientation from Association,” ACM Trans. Information Systems, vol. 21, no. 4, pp. 315-346, 2003.
• B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs Up? Sentiment Classification Using Machine Learning Techniques,” Proc. Conf. Empirical Methods in NLP, pp. 79-86, 2002.
• K. Dave, S. Lawrence, and D.M. Pennock, “Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews,” Proc. 12th Int’l World Wide Web Conf., http://www2003.org, 2003.
• T. Nasukawa and J. Yi, “Sentiment Analysis: Capturing Favorability Using Natural Language Processing,” Proc. Second Int’l Conf. Knowledge Capture, pp. 70-77, 2003.
• S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushima, “Mining Product Reputations on the Web,” Proc. Eighth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, pp. 341-349, 2002.
• J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, “Sentiment Analyzer: Extracting Sentiments about a Given Topic Using Natural Language Processing Techniques,” Proc. Third IEEE Int’l Conf. Data Mining, pp. 427-434, 2003.