Improving Semi-Supervised Acquisition of Relation Extraction Patterns
Mark A. Greenwood
Mark Stevenson
Introduction and Overview
• Recently a number of semi-supervised approaches to acquiring Information Extraction (IE) patterns have been reported.
• Many of these approaches use iterative algorithms to learn new patterns from a small seed set.
• These approaches tend to be limited by their use of simplistic pattern representations, such as subject-verb-object (Yangarber et al., 2000).
• Other approaches to IE have used pattern representations derived from dependency trees:
  • Sudo et al. (2003) used patterns consisting of a path from a verb to any of its descendants, direct or indirect (the chain model)
  • Bunescu and Mooney (2005) suggest the shortest path between the items being related.
Information Extraction Beyond the Document
Introduction and Overview
• These more complex pattern models:
  • Are capable of representing more of the information present in text
  • Require more complex methods of determining similarity between patterns, which limits their use.
• We present a structural similarity measure inspired by the kernel methods used in non-iterative learning algorithms (Culotta and Sorensen, 2004).
• This allows us to use more complex pattern models while retaining the semi-supervised iterative approach to acquiring new extraction patterns.
Learning Extraction Patterns
Iterative Learning Algorithm
1. Begin with a set of seed patterns which are known to be good extraction patterns
2. Compare every other (candidate) pattern with the ones known to be good
3. Choose the highest scoring of these and add them to the set of good patterns
4. Stop if enough patterns have been learned, else repeat from step 2.
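The loop above can be sketched in Python as follows. The function and variable names are illustrative rather than taken from the original system, the similarity measure is passed in as a parameter, and for simplicity one pattern is accepted per iteration:

```python
def learn_patterns(seeds, candidates, similarity, max_patterns=20):
    """Iteratively grow a set of good extraction patterns from a seed set.

    A sketch of the generic semi-supervised loop: score candidates against
    the accepted set, promote the best, repeat until enough are learned.
    """
    accepted = list(seeds)
    pool = [c for c in candidates if c not in accepted]
    while pool and len(accepted) < max_patterns:
        # Score each candidate by its best match against any accepted pattern.
        scored = [(max(similarity(c, a) for a in accepted), c) for c in pool]
        best_score, best = max(scored, key=lambda t: t[0])
        accepted.append(best)
        pool.remove(best)
    return accepted
```

Any pattern model and similarity function can be plugged in, which is exactly the modularity the next slides rely on.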
Learning Extraction Patterns
• Such an algorithm requires, for each IE task:
  • Unannotated text from which to acquire patterns
  • A small set of representative seed patterns
• Independent of the IE task, this iterative algorithm requires:
  • An extraction pattern model
  • A measure of how similar two patterns are to each other
Extraction Patterns
• The linked chain model (Greenwood et al., 2005) is used as the pattern representation
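A linked chain can be represented directly as a small tree: a root verb with up to two chains of descendants linked through it. A minimal sketch (the class and field names are our own, not from the original paper):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in a dependency-based extraction pattern."""
    word: str                  # lexical item, or a slot such as PERSON
    reln: str                  # dependency relation to the parent
    pos: str                   # part-of-speech tag
    children: list = field(default_factory=list)

# The linked chain for "COMPANY appointed PERSON": a verb root with a
# subject chain and an object chain joined through it.
appoint = Node("appoint", "root", "V", [
    Node("COMPANY", "subj", "N"),
    Node("PERSON", "obj", "N"),
])
```

An SVO pattern is the special case where each chain is a single node; longer chains let the model capture material such as prepositional phrases that SVO discards.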
Extraction Patterns
[Figure: example extraction patterns]
Structural Similarity Measure
• This similarity measure is inspired by the tree kernel proposed by Culotta and Sorensen (2004).
• It compares patterns by following their structure from the root nodes down through the patterns until they diverge too far to be considered similar.
• Each node in a pattern has three features:
  • The word (n_word)
  • The relation to its parent (n_reln)
  • The part-of-speech (POS) tag (n_pos)
• Nodes can be compared by examining these features and by the semantic similarity between words.
Structural Similarity Measure
• A set of four functions F = {word, relation, pos, semantic} is used to compare nodes.
• The first three correspond to the node features of the same name and return 1 if the value of the feature is equal for the two nodes and 0 otherwise.
  • For example, the pos function compares the values of the POS features of nodes n1 and n2.
• The semantic function returns a value between 0 and 1 signifying the semantic similarity of the lexical items represented by the two nodes. We compute this over WordNet (Fellbaum, 1998) using the similarity function introduced by Lin (1998).
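The four comparison functions might be sketched as below. Nodes are simple records with word, reln and pos fields, and the semantic function here is only a stand-in for the Lin WordNet measure, which requires WordNet data to compute properly:

```python
from collections import namedtuple

Node = namedtuple("Node", "word reln pos")

def word(n1, n2):      # 1 iff the lexical items match
    return 1 if n1.word == n2.word else 0

def relation(n1, n2):  # 1 iff the relations to the parent match
    return 1 if n1.reln == n2.reln else 0

def pos(n1, n2):       # 1 iff the POS tags match
    return 1 if n1.pos == n2.pos else 0

def semantic(n1, n2):
    # Stand-in for the Lin (1998) WordNet measure: identical words get 1,
    # everything else 0.  A real implementation would return a graded
    # similarity in [0, 1] based on information content in WordNet.
    return 1.0 if n1.word == n2.word else 0.0
```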
Structural Similarity Measure
• The similarity of two nodes is zero if their POS tags differ; otherwise it is simply the sum of the scores given by the four functions in F.
• The similarity of a pair of linked chains l1 and l2 is given by a recursive formula (not reproduced on this slide) in which r1 and r2 are the root nodes of patterns l1 and l2 and Cr is the set of children of node r.
Structural Similarity Measure
• The final part of the measure calculates the similarity between the child nodes of n1 and n2.
• As only the root nodes of the patterns have multiple children, in all but the first application this formula simplifies to a direct comparison of the single child on each side.
• As the maximum similarity between two nodes is 4, we normalise by dividing the score by 4 times the size (in nodes) of the larger pattern, removing any bias towards longer patterns.
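Putting the pieces together, the whole measure might look like the sketch below. The greedy pairing of root children and the stand-in for Lin's WordNet measure are our simplifications, not details from the original paper:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    reln: str
    pos: str
    children: list = field(default_factory=list)

def node_sim(n1, n2):
    """Sum of the four F scores; zero if the POS tags differ (max 4)."""
    if n1.pos != n2.pos:
        return 0.0
    word_f = 1 if n1.word == n2.word else 0
    reln_f = 1 if n1.reln == n2.reln else 0
    pos_f = 1  # POS tags are known to match at this point
    sem_f = 1.0 if n1.word == n2.word else 0.0  # stand-in for Lin's measure
    return word_f + reln_f + pos_f + sem_f

def sim(n1, n2):
    """Node score plus the score of the best pairing of their children.

    Below the root every node has at most one child, so after the first
    application this reduces to walking down the two chains in step.
    """
    s = node_sim(n1, n2)
    if s == 0.0:
        return 0.0  # diverged too far: stop following this branch
    unmatched = list(n2.children)
    for c1 in n1.children:
        if not unmatched:
            break
        best = max(unmatched, key=lambda c2: sim(c1, c2))
        s += sim(c1, best)
        unmatched.remove(best)
    return s

def size(n):
    return 1 + sum(size(c) for c in n.children)

def similarity(l1, l2):
    """Normalised similarity of two linked chains, given their root nodes."""
    return sim(l1, l2) / (4 * max(size(l1), size(l2)))
```

With this normalisation an identical pair of patterns scores exactly 1, and a shared structure with different verbs scores strictly between 0 and 1.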
Experiments - Overview
• We used the similarity measure in the iterative algorithm described earlier.
• The four highest scoring patterns are accepted at each iteration, but only if their score is within 0.95 of that of the highest scoring pattern.
• We compare this approach with our previous approach (Stevenson and Greenwood, 2005), which is based on the vector space model and cosine similarity.
• Three separate configurations:
  • Cosine (SVO): uses the SVO model with the cosine similarity measure
  • Cosine (Linked Chains): same as above but uses linked chains
  • Structural (Linked Chains): uses linked chain patterns with the new structural similarity measure
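The acceptance rule at each iteration might be written as follows; reading "within 0.95 of the highest" as a multiplicative threshold is our assumption:

```python
def accept_candidates(scored, k=4, threshold=0.95):
    """From (pattern, score) pairs, accept up to k top-ranked patterns
    whose score is at least `threshold` times the best score."""
    ranked = sorted(scored, key=lambda ps: ps[1], reverse=True)
    if not ranked:
        return []
    best = ranked[0][1]
    return [p for p, s in ranked[:k] if s >= threshold * best]
```

The relative threshold keeps low-confidence patterns out of the accepted set even when fewer than four candidates score close to the best.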
Experiments - IE Scenario
• We use the data from the MUC-6 management succession task.
• We use a sentence-level version produced by Soderland (1999).
• This corpus contains four types of relation: Person-Person, Person-Post, Person-Organisation, and Post-Organisation.
• At each iteration of the algorithm, related items recognised by the current set of acquired patterns are extracted and evaluated.
• The texts have previously been annotated with named entities, and MINIPAR (Lin, 1999) is used to produce the dependency analysis.
Experiments - Seed Patterns
COMPANY subj— appoint —obj PERSON
COMPANY subj— elect —obj PERSON
COMPANY subj— promote —obj PERSON
PERSON subj— resign
PERSON subj— depart
PERSON subj— quit
• These seeds were chosen due to their use in previously reported work.
• No tuning of this set was performed.
• It should be noted that they do not contain the Person-Post or Post-Organisation relations.
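For reference, the six seed patterns can be written down as simple subject-verb-object triples, with None marking the missing object slot of the intransitive verbs:

```python
SEED_PATTERNS = [
    ("COMPANY", "appoint", "PERSON"),
    ("COMPANY", "elect",   "PERSON"),
    ("COMPANY", "promote", "PERSON"),
    ("PERSON",  "resign",  None),
    ("PERSON",  "depart",  None),
    ("PERSON",  "quit",    None),
]
# Note: none of these seeds expresses the Person-Post or
# Post-Organisation relations, so those must be discovered.
```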
Results and Analysis
• The seed patterns achieve an F-measure of 0.044 (P=0.833, R=0.022).
• Cosine similarity performs poorly irrespective of the pattern model.
• Linked chains perform better than SVO under this similarity measure, which suggests the model is inherently superior.
• The best result is the combination of linked chains and the structural similarity measure: an F-measure of 0.329 (P=0.434, R=0.265) after 190 iterations.
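As a quick sanity check, the reported F-measures follow from the precision and recall figures via the usual harmonic mean; small deviations arise because the published P and R are themselves rounded:

```python
def f_measure(p, r):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

print(round(f_measure(0.833, 0.022), 2))  # seed patterns: ~0.04
print(round(f_measure(0.434, 0.265), 3))  # best configuration: 0.329
```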
Results and Analysis
[Figure: performance of the three configurations over iterations]
Conclusions
• The results show that semi-supervised approaches to IE pattern acquisition benefit from the use of more expressive extraction pattern models.
• Using linked chains resulted in better performance than using SVO, even with the same similarity measure.
• The structural similarity measure introduced here outperforms a previously proposed method based on cosine similarity and a vector space representation.
• Similarity measures (such as kernel methods) developed for supervised learning can be adapted and applied to semi-supervised approaches.
• Future work should look at other similarity functions used in supervised learning to see whether they can also be adapted for use with semi-supervised approaches.
Any Questions?
Bibliography
Razvan Bunescu and Raymond Mooney. 2005. A Shortest Path Dependency Kernel for Relation Extraction. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 724-731, Vancouver, B.C.
Aron Culotta and Jeffrey Sorensen. 2004. Dependency Tree Kernels for Relation Extraction. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database and some of its Applications. MIT Press, Cambridge, MA.
Dekang Lin. 1998. An Information-Theoretic Definition of Similarity. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98), Madison, Wisconsin.
Dekang Lin. 1999. MINIPAR: A Minimalist Parser. In Maryland Linguistics Colloquium, University of Maryland, College Park.
Stephen Soderland. 1999. Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, 34(1-3):233-272.
Mark Stevenson and Mark A. Greenwood. 2005. A Semantic Approach to IE Pattern Induction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 379-386, Ann Arbor, MI.
Kiyoshi Sudo, Satoshi Sekine, and Ralph Grishman. 2003. An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-03), pages 224-231, Sapporo, Japan.
Roman Yangarber, Ralph Grishman, Pasi Tapanainen, and Silja Huttunen. 2000. Automatic Acquisition of Domain Knowledge for Information Extraction. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pages 940-946, Saarbrücken, Germany.