Comparing Methods to Improve Information Extraction System using Subjectivity Analysis

Prepared by: Heena Waghwani • Guided by: Dr. M. B. Chandak

Contents • Introduction • Combining the Two Approaches • Two Research Directions • Subjectivity Analysis • Learning Subjective Words and Expressions with Extraction Patterns • Improving IE Systems with Subjectivity Classification • Strategies • Conclusion • Future Scope • References

Introduction • Subjectivity analysis systems automatically identify and extract information about attitudes, opinions, and sentiments from unstructured text. • For example: • The world is a stage. • The sea of grief. • Information Extraction (IE) systems typically identify and extract factual information about events automatically. • For example, IE systems have been built to extract facts associated with terrorist incidents, disease outbreaks, and job and seminar announcements.

Combining the Two Approaches • The idea is to use a subjective sentence classifier to identify and filter out subjective sentences before information is extracted from them, which minimizes false hits. • For example, an IE system searching for bombings might incorrectly interpret the sentence “The Parliament exploded into a fury.” as describing a physical explosion.

Two Research Directions • Using weakly supervised IE techniques to automatically discover subjective words and expressions in unannotated text. • Using subjectivity analysis to improve the accuracy of fact-based information extraction systems.

Subjectivity Analysis • Subjective expressions are words and phrases used to express opinions, sentiments, speculations, etc. • Two types: 1. Nouns that have a subjective meaning 2. Expressions that capture subjectivity • A sentence is subjective if it contains one or more subjective expressions of medium or high intensity.
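The sentence-level rule above can be sketched in a few lines of Python. This is only an illustration: the lexicon entries, their intensity labels, and the whitespace-style tokenization are assumptions made for the example, and multi-word expressions are not handled.

```python
import re

# Hypothetical lexicon (assumption): subjective word -> intensity label.
LEXICON = {
    "outraged": "high",
    "hope": "medium",
    "said": "low",
}

def is_subjective(sentence, lexicon=LEXICON):
    """Return True if the sentence contains at least one subjective
    expression of medium or high intensity (single words only here)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(lexicon.get(tok) in ("medium", "high") for tok in tokens)

print(is_subjective("He was outraged by the attack."))
print(is_subjective("The spokesman said the bomb exploded."))
```

Low-intensity clues such as "said" do not trigger the rule, which mirrors the medium-or-high condition stated in the slide.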

Learning Subjective Words and Expressions with Extraction Patterns • Input: an existing subjectivity lexicon, a set of seed nouns, and a small amount of human review. • Extraction patterns: lexico-syntactic patterns that represent one or more words appearing in a specific syntactic context. • An extraction pattern is created by instantiating one of the syntactic templates with specific words. • Example: “She wanted desperately to believe in humanity.” • This sentence produces 4 patterns: • <subj> active-verb(wanted) • <subj> verb(wanted) infinitive(believe) • infinitive(believe) <dobj> • verb(wanted) infinitive(believe) <dobj> • [Figure: Syntactic templates for extraction patterns]

Learning Subjective Nouns Using Extraction Pattern Context • Two bootstrapping algorithms have been developed to create semantic dictionaries by exploiting extraction patterns: • Meta-Bootstrapping • Basilisk • Both algorithms begin with unannotated texts and seed words that represent a semantic category (here, subjective terms). • The bootstrapping process looks for words that appear in the same extraction patterns as the seed words and hypothesizes that those words belong to the same semantic class. • Example: expressed <dobj>: hope, grief, views, worries
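The core bootstrapping idea, that nouns sharing extraction patterns with seed words are candidates for the same class, can be sketched as follows. This is a simplified illustration and not the actual Meta-Bootstrapping or Basilisk scoring: candidates are ranked only by how many seed-bearing patterns they share, and the patterns "voiced <dobj>" and "bought <dobj>" are hypothetical additions to the slide's "expressed <dobj>" example.

```python
from collections import defaultdict

def propose_candidates(extractions, seeds):
    """extractions: iterable of (pattern, noun) pairs.
    Returns non-seed nouns that co-occur with a seed in some pattern,
    ranked by the number of seed-bearing patterns they appear in."""
    by_pattern = defaultdict(set)
    for pattern, noun in extractions:
        by_pattern[pattern].add(noun)

    scores = defaultdict(int)
    for nouns in by_pattern.values():
        if nouns & seeds:                 # pattern extracted at least one seed
            for noun in nouns - seeds:    # every other noun becomes a candidate
                scores[noun] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

extractions = [
    ("expressed <dobj>", "hope"), ("expressed <dobj>", "grief"),
    ("expressed <dobj>", "views"), ("expressed <dobj>", "worries"),
    ("voiced <dobj>", "worries"), ("voiced <dobj>", "hope"),
    ("bought <dobj>", "car"),
]
candidates = propose_candidates(extractions, seeds={"hope", "grief"})
print(candidates)
```

In a real bootstrapping loop the best candidates would be added to the seed set and the process repeated; that iteration is omitted here.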

Learning Subjective Expressions as Extraction Patterns • To automatically learn extraction patterns that are associated with subjectivity, a procedure similar to AutoSlog-TS is used. • For training, AutoSlog-TS uses a text corpus consisting of two distinct sets of texts: “relevant” texts (subjective sentences) and “irrelevant” texts (objective sentences). • A set of syntactic templates represents the space of possible extraction patterns.

Steps in the learning process • Generate extraction patterns for every possible instantiation of the templates that appears in the corpus. • Apply all of the learned extraction patterns to the training corpus and gather statistics on how often each pattern occurs in subjective versus objective sentences. • Rank the extraction patterns using a conditional probability measure: Pr(subjective | pattern) = subjfreq(pattern) / freq(pattern), where subjfreq(pattern) is the pattern's frequency in subjective sentences and freq(pattern) is its frequency in the whole training corpus.
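The counting and ranking steps can be sketched as below. The input format (each training sentence given as a list of matched patterns plus a subjective/objective label) and the example patterns are assumptions for illustration; the sketch also counts a pattern at most once per sentence, which is one possible reading of "how often each pattern occurs".

```python
from collections import Counter

def rank_patterns(labeled_sentences):
    """labeled_sentences: iterable of (patterns_in_sentence, is_subjective).
    Returns (pattern, Pr(subjective | pattern), freq) tuples, best first."""
    freq = Counter()
    subjfreq = Counter()
    for patterns, subjective in labeled_sentences:
        for p in set(patterns):           # count once per sentence (assumption)
            freq[p] += 1
            if subjective:
                subjfreq[p] += 1
    ranked = [(p, subjfreq[p] / freq[p], freq[p]) for p in freq]
    # Highest probability first; ties broken by frequency, then by name.
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked

corpus = [
    (["<subj> was outraged", "expressed <dobj>"], True),
    (["expressed <dobj>"], True),
    (["expressed <dobj>"], False),
    (["<subj> exploded"], False),
]
ranked = rank_patterns(corpus)
for pattern, prob, count in ranked:
    print(f"{pattern}: Pr={prob:.2f}, freq={count}")
```

A pattern seen only once can reach a probability of 1.0, which is why the later slides pair the probability threshold with a minimum frequency.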

Improving IE Systems with Subjectivity Classification • Information Extraction systems suffer from false hits, and many of these false hits occur in subjective sentences. • Many incorrect extractions can be prevented by identifying sentences that contain subjective language and disallowing extraction from them.

Strategies • Aggressive Subjective Sentence Filtering • Discards all extractions that occur in sentences labeled as subjective by the classifier. • Source Attribution Modification • When a source attribution occurs in a sentence that has only a modest subjectivity score, the sentence's extractions are not discarded. • For example, sentences in news articles with source attributions such as: “The President stated ….” , “The Associated Press reported…..”
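A minimal sketch of aggressive filtering with the source attribution modification follows. The `Sentence` structure, the numeric subjectivity score, both thresholds, and the example sentences and scores are assumptions for illustration; the slides do not specify how "modest" is measured.

```python
from dataclasses import dataclass

@dataclass
class Sentence:
    text: str
    subjectivity: float           # hypothetical classifier score in [0, 1]
    has_source_attribution: bool  # e.g. "The President stated ..."
    extractions: list

SUBJ_THRESHOLD = 0.5    # assumed: at or above this, the sentence is subjective
STRONG_THRESHOLD = 0.8  # assumed: below this, subjectivity is only "modest"

def keep_extractions(sentences, use_source_attribution=True):
    """Drop extractions from subjective sentences, except modestly
    subjective sentences that contain a source attribution."""
    kept = []
    for s in sentences:
        subjective = s.subjectivity >= SUBJ_THRESHOLD
        modest = subjective and s.subjectivity < STRONG_THRESHOLD
        rescued = use_source_attribution and modest and s.has_source_attribution
        if subjective and not rescued:
            continue
        kept.extend(s.extractions)
    return kept

sentences = [
    Sentence("The Parliament exploded into a fury.", 0.9, False, ["The Parliament"]),
    # Made-up sentence with a source attribution and a modest score.
    Sentence("The President stated that a bomb damaged the embassy.", 0.6, True, ["the embassy"]),
    Sentence("A bomb destroyed the bridge.", 0.1, False, ["the bridge"]),
]
print(keep_extractions(sentences))
print(keep_extractions(sentences, use_source_attribution=False))
```

With the modification switched off, the function reduces to plain aggressive filtering and the attributed sentence loses its extraction as well.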

Strategies (continued) • Selective Subjective Sentence Filtering • Facts and opinions frequently coexist in the same sentence. • Indicator patterns should always be allowed to extract information. • Example: “He was outraged by the terrorist attack on the World Trade Center.” • If a pattern has a conditional probability Pr(relevant | pattern) >= 0.65 and a frequency >= 10, it is labeled an indicator pattern, because it is highly correlated with domain-relevant text. • Otherwise, the pattern is labeled a non-indicator pattern. • Extractions from indicator patterns are never discarded, but extractions from non-indicator patterns are discarded if they appear in a subjective sentence.
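Selective filtering can be sketched as below, using the 0.65 and 10 thresholds from the slide. The pattern names, their statistics, and the tuple-based input format are hypothetical values chosen for the example.

```python
def is_indicator(p_relevant, frequency, min_prob=0.65, min_freq=10):
    """Indicator pattern test: Pr(relevant | pattern) >= 0.65 and freq >= 10."""
    return p_relevant >= min_prob and frequency >= min_freq

def selective_filter(extractions, pattern_stats):
    """extractions: list of (pattern, extracted_text, sentence_is_subjective).
    pattern_stats: pattern -> (Pr(relevant | pattern), frequency).
    Indicator patterns always extract; other patterns extract only
    from sentences that are not subjective."""
    kept = []
    for pattern, text, sentence_is_subjective in extractions:
        p_rel, freq = pattern_stats.get(pattern, (0.0, 0))
        if is_indicator(p_rel, freq) or not sentence_is_subjective:
            kept.append((pattern, text))
    return kept

# Hypothetical statistics for two patterns.
pattern_stats = {
    "attack on <np>": (0.90, 40),
    "<subj> exploded": (0.40, 25),
}
extractions = [
    ("attack on <np>", "the World Trade Center", True),
    ("<subj> exploded", "The Parliament", True),
    ("<subj> exploded", "a car bomb", False),
]
kept = selective_filter(extractions, pattern_stats)
print(kept)
```

The first extraction survives even though its sentence is subjective, which is the behavior the "He was outraged by the terrorist attack" example is meant to motivate.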

Strategies (continued) • Subjective Extraction Pattern Filtering • Anticipating which patterns will perform well is difficult. • Subjectivity analysis can provide an alternative, empirical assessment of each pattern, not just in terms of domain relevance, but in terms of whether it is used more frequently in subjective or objective contexts. • The probability that a sentence is subjective given that it contains the pattern is Pr(subjective | pattern). • An extraction pattern is said to be subjective if Pr(subjective | pattern) >= 0.50 and its frequency >= 10. • Example: the pattern “was aimed at <np>”
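The pattern-level test can be sketched as a simple partition of the pattern set. The thresholds come from the slide; the statistics below are invented for illustration, and what is then done with the subjective patterns (for example, disabling them) is left open because the slide does not say.

```python
def is_subjective_pattern(p_subjective, frequency, min_prob=0.50, min_freq=10):
    """Subjective pattern test: Pr(subjective | pattern) >= 0.50 and freq >= 10."""
    return p_subjective >= min_prob and frequency >= min_freq

def split_patterns(pattern_stats):
    """pattern_stats: pattern -> (Pr(subjective | pattern), frequency).
    Returns (subjective_patterns, other_patterns), each sorted by name."""
    subjective, other = [], []
    for pattern, (p_subj, freq) in sorted(pattern_stats.items()):
        (subjective if is_subjective_pattern(p_subj, freq) else other).append(pattern)
    return subjective, other

# Hypothetical statistics (assumptions, not figures from the slides).
stats = {
    "was aimed at <np>": (0.70, 30),
    "bomb exploded in <np>": (0.10, 50),
    "<subj> was outraged": (0.95, 4),   # too rare to be labeled subjective
}
subjective_patterns, other_patterns = split_patterns(stats)
print(subjective_patterns)
print(other_patterns)
```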

Conclusion • Subjectivity Analysis and Information Extraction are distinct but mutually beneficial areas. • Subjectivity Analysis can improve the performance of an Information Extraction system.

Future Scope • Different subjectivity analysis methods could be applied in different contexts to further improve Information Extraction systems.

References • J. Wiebe and E. Riloff, “Finding Mutual Benefit between Subjectivity Analysis and Information Extraction.” • E. Riloff, “Automatically Generating Extraction Patterns from Untagged Text,” Proc. 13th Nat’l Conf. Artificial Intelligence, pp. 1044-1049, 1996. • P. Turney and M.L. Littman, “Measuring Praise and Criticism: Inference of Semantic Orientation from Association,” ACM Trans. Information Systems, vol. 21, no. 4, pp. 315-346, 2003. • B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs Up? Sentiment Classification Using Machine Learning Techniques,” Proc. Conf. Empirical Methods in NLP, pp. 79-86, 2002. • K. Dave, S. Lawrence, and D.M. Pennock, “Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews,” Proc. 12th Int’l World Wide Web Conf., http://www2003.org, 2003. • T. Nasukawa and J. Yi, “Sentiment Analysis: Capturing Favorability Using Natural Language Processing,” Proc. Second Int’l Conf. Knowledge Capture, pp. 70-77, 2003. • S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushima, “Mining Product Reputations on the Web,” Proc. Eighth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, pp. 341-349, 2002. • J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, “Sentiment Analyzer: Extracting Sentiments about a Given Topic Using Natural Language Processing Techniques,” Proc. Third IEEE Int’l Conf. Data Mining, pp. 427-434, 2003.

Thank You…
