
LEDIR : An Unsupervised Algorithm for Learning Directionality of Inference Rules


Presentation Transcript


  1. LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules From EMNLP & CoNLL 2007 (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning) Advisor: Hsin-Hsi Chen Reporter: Chi-Hsin Yu Date: 2007.12.11

  2. Outline • Introduction • Related Work • Learning Directionality of Inference Rules • Experimental Setup • Experimental Results • Conclusion

  3. Introduction (1) • Inference rule: X eats Y ⇔ X likes Y • Examples: • "I eat spicy food." ⇒ "I like spicy food." (YES) • "I like rollerblading (inline skating)." ⇒ "I eat rollerblading." (NO) • Preferred reading: X eats Y ⇒ X likes Y (the rule is asymmetric) • For a candidate rule pi ⇔ pj there are four cases: (1) pi ⇒ pj only, (2) pj ⇒ pi only, (3) both directions, (4) neither • Plausibility: splits the four cases into two sets, {1, 2, 3} (plausible) vs. {4} (implausible) • Directionality: distinguishes the three sets {1}, {2}, {3}

  4. Introduction (2) • Applications (where inference rules improve performance) • QA (Harabagiu and Hickl, 2006) • Multi-Document Summarization (Barzilay et al., 1999) • IR (Anick and Tipirneni, 1999) • Proposed algorithm • LEDIR (LEarning Directionality of Inference Rules, pronounced "Leader") • Filters incorrect rules (case 4) • Identifies the directionality of the correct ones (case 1, 2, or 3)

  5. Related Work • Learning inference rules • Barzilay and McKeown (2001) for paraphrases, DIRT (Lin and Pantel 2001) and TEASE (Szpektor et al. 2004) for inference rules • Low precision and bidirectional rules only • Learning directionality • Chklovski and Pantel (2004) • Zanzotto et al. (2006) • Torisawa (2006) • Geffet and Dagan (2005)

  6. Learning Directionality of Inference Rules (1) – Formal Definition • An instance is a triple <x, p, y> • p is a binary semantic relation (a verb or another predicate) • x, y are entities • Plausibility: splits the four cases into {1, 2, 3} vs. {4} • Directionality: distinguishes {1}, {2}, {3} • A minimal representation of instances and tags is sketched below
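
For concreteness, a minimal Python sketch of the objects involved; the names (`Instance`, the tag strings) are illustrative, not from the paper:

```python
from dataclasses import dataclass

# The four possible tags for a candidate rule pi <=> pj:
# "pi=>pj" (case 1), "pj=>pi" (case 2), "bidirectional" (case 3),
# "no_relation" (case 4).
RULE_TAGS = ("pi=>pj", "pj=>pi", "bidirectional", "no_relation")

@dataclass(frozen=True)
class Instance:
    """One corpus occurrence of a binary relation: <x, p, y>."""
    x: str  # first argument, e.g. "I"
    p: str  # relation, e.g. "eat"
    y: str  # second argument, e.g. "spicy food"
```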

  7. Learning Directionality of Inference Rules (2) – Underlying Assumptions • Distributional hypothesis (Harris 1954) • Words that appear in the same contexts tend to have similar meanings • Used for modeling lexical semantics • Directionality hypothesis • If two binary semantic relations tend to occur in similar contexts and the first one occurs in significantly more contexts than the second, then the second most likely implies the first and not vice versa. • Generality example: if "X eats Y" occurs 3,000 times and "X likes Y" occurs 8,000 times, the rule should be X eats Y ⇒ X likes Y

  8. Learning Directionality of Inference Rules (3) – Underlying Assumptions (cont.) • Concepts in semantic space • Much richer for reasoning about inferences than simple surface words • Modeling the context of a relation p of the form <x, p, y> • using the semantic classes cx and cy of the words that can be instantiated for x and y respectively • Context similarity of two relations • Overlap coefficient: |X ∩ Y| / min(|X|, |Y|) (implemented in the sketch below)
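
A direct implementation of the overlap coefficient as defined above (a minimal sketch; the example set contents are invented for illustration):

```python
def overlap_coefficient(xs: set, ys: set) -> float:
    """Overlap coefficient: |X ∩ Y| / min(|X|, |Y|)."""
    if not xs or not ys:
        return 0.0  # guard against empty sets
    return len(xs & ys) / min(len(xs), len(ys))

# Example with invented semantic classes:
print(overlap_coefficient({"individual", "food", "activity"},
                          {"individual", "food"}))  # -> 1.0
```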

  9. Learning Directionality of Inference Rules (4) – Selectional Preferences • Relational selectional preferences (RSPs) of a binary relation p in <x, p, y> • the sets of semantic classes C(x) and C(y) of the words x and y • C(x) = { cx : x occurs in an instance <x, p, y>, cx a semantic class of x } • C(y) = { cy : y occurs in an instance <x, p, y>, cy a semantic class of y } • Example: x likes y, using semantic classes from WordNet • C(x) = {individual, social_group, …} • C(y) = {individual, food, activity, …} • A sketch of collecting RSPs follows below
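
A sketch of how the RSP sets might be collected from instance triples, assuming a `word_to_classes` mapping (e.g. from WordNet or CBC); the helper name is my own:

```python
from collections import defaultdict

def relational_selectional_preferences(instances, word_to_classes):
    """Collect C(x) and C(y) per relation p from <x, p, y> triples."""
    c_x = defaultdict(set)  # p -> semantic classes seen in the x slot
    c_y = defaultdict(set)  # p -> semantic classes seen in the y slot
    for x, p, y in instances:
        c_x[p] |= word_to_classes.get(x, set())
        c_y[p] |= word_to_classes.get(y, set())
    return c_x, c_y

# Toy example mirroring the "x likes y" slide:
classes = {"I": {"individual"}, "food": {"food"},
           "rollerblading": {"activity"}}
c_x, c_y = relational_selectional_preferences(
    [("I", "like", "food"), ("I", "like", "rollerblading")], classes)
# c_x["like"] == {"individual"}; c_y["like"] == {"food", "activity"}
```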

  10. Learning Directionality of Inference Rules (5) – Inference Plausibility and Directionality • Context similarity of two relations pi and pj: the overlap coefficient of their context sets • sim(pi, pj) = |C(pi) ∩ C(pj)| / min(|C(pi)|, |C(pj)|), where C(p) denotes the set of contexts of relation p

  11. Learning Directionality of Inference Rules (6) – Inference Plausibility and Directionality (cont.) • The parameters α and β are determined empirically (a hedged sketch of how they might enter the decision rule follows below).
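
The transcript omits the decision rule itself, so the following is only a hedged sketch of how α and β plausibly enter it, based on the plausibility and directionality hypotheses above (thresholding context similarity at α, comparing relative generality via a ratio β); the paper's exact formulation may differ:

```python
def classify_rule(sim: float, n_contexts_i: int, n_contexts_j: int,
                  alpha: float, beta: float) -> str:
    """Sketch: tag a candidate rule pi <=> pj from its context
    similarity and the relative generality of the two relations."""
    if sim < alpha:
        return "no_relation"        # case 4: contexts too dissimilar
    if n_contexts_j >= beta * n_contexts_i:
        return "pi=>pj"             # pj is significantly more general
    if n_contexts_i >= beta * n_contexts_j:
        return "pj=>pi"             # pi is significantly more general
    return "bidirectional"          # comparable generality
```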

  12. Learning Directionality of Inference Rules (7) – Two Models (JRM and IRM) • Model 1: Joint Relational Model (JRM) • counts the actual occurrences of relation p with semantic-class pairs in the corpus • Model 2: Independent Relational Model (IRM) • collects the x-slot and y-slot classes independently and combines them via a Cartesian product • context similarity of two relations is then computed over these context sets (see the sketch below)
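
A sketch contrasting the two models, under the assumption that a "context" is a pair of semantic classes (cx, cy); the function names are illustrative:

```python
from collections import defaultdict
from itertools import product

def jrm_contexts(instances, word_to_classes):
    """JRM: contexts are (cx, cy) pairs actually observed with p."""
    ctx = defaultdict(set)
    for x, p, y in instances:
        for cx in word_to_classes.get(x, set()):
            for cy in word_to_classes.get(y, set()):
                ctx[p].add((cx, cy))
    return ctx

def irm_contexts(instances, word_to_classes):
    """IRM: x- and y-slot classes are collected independently, then
    combined with a Cartesian product (cheaper to compute, but admits
    (cx, cy) combinations never actually observed together)."""
    c_x, c_y = defaultdict(set), defaultdict(set)
    for x, p, y in instances:
        c_x[p] |= word_to_classes.get(x, set())
        c_y[p] |= word_to_classes.get(y, set())
    return {p: set(product(c_x[p], c_y[p])) for p in c_x}
```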

  13. Experiment Setup (1) • Inference rules • chosen from the DIRT resource (Lin and Pantel 2001) • DIRT consists of 12 million rules extracted from 1 GB of newspaper text

  14. Experiment Setup (2) • Semantic classes • must strike the right balance between abstraction and discrimination • First set of semantic classes • obtained by running the CBC clustering algorithm (Pantel and Lin, 2002) • on the TREC-9 and TREC-2002 newswire collections, consisting of over 600 million words • resulting in 1628 clusters, each representing a semantic class • Second set of semantic classes • obtained from WordNet 2.1 (Fellbaum 1998) • a cut at depth four of the WordNet noun hierarchy yielded 1287 semantic classes (approximated in the sketch below)
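
A rough NLTK-based approximation of a depth-four cut of the WordNet noun hierarchy (the paper used WordNet 2.1, so the exact class count will differ):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def noun_classes_at_depth(depth: int = 4):
    """Synsets exactly `depth` hyponym links below the noun root
    'entity', each treated as one semantic class."""
    frontier = {wn.synset("entity.n.01")}
    seen = set(frontier)
    for _ in range(depth):
        # Expand one level, keeping only first-time (shallowest) synsets.
        frontier = {h for s in frontier for h in s.hyponyms()} - seen
        seen |= frontier
    return frontier
```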

  15. Experiment Setup (3) • Implementation • parsed the 1999 AP newswire collection, consisting of 31 million words, with Minipar (Lin 1993) • Gold standard construction • randomly sampled 160 inference rules of the form pi ⇔ pj from DIRT; removing 3 nominalization rules left 157 rules • two annotators • 57 rules used as a training set to train the annotators • 100 rules used as a blind test set for the two annotators • inter-annotator agreement: kappa = 0.63 (computed as in the sketch below) • disagreements were resolved jointly to produce the final gold standard
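
For reference, the standard Cohen's kappa computation behind the reported 0.63 (a generic implementation, not the authors' code):

```python
def cohens_kappa(tags_a, tags_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is chance agreement from each annotator's
    marginal tag distribution."""
    n = len(tags_a)
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    labels = set(tags_a) | set(tags_b)
    p_e = sum((tags_a.count(l) / n) * (tags_b.count(l) / n)
              for l in labels)
    return (p_o - p_e) / (1 - p_e)
```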

  16. Experiment Setup (4) • Baselines (sketched in code below) • B-random • randomly assigns one of the four possible tags to each candidate inference rule • B-frequent • assigns the most frequently occurring tag in the gold standard to each candidate inference rule • B-DIRT • assumes each inference rule is bidirectional and assigns the bidirectional tag to each candidate inference rule
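
The three baselines are simple enough to state directly in code (a sketch, using the tag strings from the earlier sketches):

```python
import random
from collections import Counter

TAGS = ("pi=>pj", "pj=>pi", "bidirectional", "no_relation")

def b_random(rules):
    """B-random: uniform choice among the four tags."""
    return [random.choice(TAGS) for _ in rules]

def b_frequent(rules, gold_tags):
    """B-frequent: the most frequent tag in the gold standard."""
    majority, _ = Counter(gold_tags).most_common(1)[0]
    return [majority for _ in rules]

def b_dirt(rules):
    """B-DIRT: every DIRT rule is assumed bidirectional."""
    return ["bidirectional" for _ in rules]
```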

  17. Experimental Results (1) • Evaluation criterion: accuracy • Parameter tuning • ran all four systems with different parameter combinations on the development set (the 57 DIRT training rules), for a total of 420 experiments • used accuracy to pick the best parameter combination for each of the four systems • then used these parameter values to obtain the corresponding percentage accuracies on the test set (a grid-search sketch follows below)
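
The tuning procedure amounts to a grid search over parameter combinations scored by development-set accuracy; a minimal sketch, where `classify` stands in for any of the four systems and the parameter grids are hypothetical:

```python
from itertools import product

def accuracy(predicted, gold):
    """Fraction of rules whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def tune(dev_rules, dev_gold, classify, alphas, betas):
    """Return the (alpha, beta) pair maximizing dev-set accuracy."""
    return max(product(alphas, betas),
               key=lambda ab: accuracy(
                   [classify(r, *ab) for r in dev_rules], dev_gold))
```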

  18. Experimental Results (2) [table of per-tag result counts from the original slide; its layout is not recoverable from this transcript]

  19. Experimental Results (3) [accuracy charts from the original slide; the two reported baselines are 66% and 48.48%]

  20. Conclusion • The problem of semantic inference • fundamental to understanding natural language • an integral part of many natural language applications • The Directionality Hypothesis • can indeed be used to filter incorrect inference rules • this result is one step toward solving the basic problem of semantic inference

  21. Thanks!!
