Improving Subcategorization Acquisition using Word Sense Disambiguation

Anna Korhonen and Judith Preiss
University of Cambridge, Computer Laboratory, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
Anna.Korhonen@cl.cam.ac.uk, Judita.Preiss@cl.cam.ac.uk

Outline
• Subcategorization Acquisition
  - Baseline System
  - Baseline System combined with WSD
• Probabilistic WSD
• Experiment
  - Evaluation Methods

Introduction
• Subcategorization
  - The dependents of a verb are classified as:
    arguments (subject, object, direct object), divided into the subject and the non-subject arguments (complements), e.g. Mary knows that she is winning.
    adjuncts, e.g. She read the book with great interest.
  - The types of complements that a verb permits determine the verb's classification
  - This verb classification is called subcategorization
  - SCFs: subcategorization frames for a given predicate; essential for parsing

Introduction
• SCF: a particular set of arguments that a verb can appear with
  - Intransitive verb. NP[subject]. They danced.
  - Transitive verb. NP[subject], NP[object]. Mary appreciates her professor.
  - Intransitive with PP. NP[subject], PP. He lives in Paris.
  - Transitive with PP. NP[subject], NP[object], PP. She put the book on the table.

Introduction
• Manual versus automatic subcategorization acquisition
  - Manual: does not provide the relative frequency of SCFs; predicates change their behavior
  - Automatic: no lexical/semantic information is exploited; reveals only syntactic aspects; no distinction between predicate senses
• Korhonen (2002) model: back-off estimates based on the predominant sense of a verb (WordNet)
• Acquisition goal: a domain-specific lexicon (written vs. spoken; genre based on different senses)

Subcategorization Acquisition
• Baseline System: a system with knowledge of verb semantics
  - Levin (1993): verb senses divide into classes that are distinctive for subcategorization
  - Korhonen (2002): verb forms can be assigned to semantic classes based on their predominant sense (fly - move); first determine the sense, then the semantic class (Levin class "Motion verbs")
  - Briscoe and Carroll (1997): SCF distributions are acquired from corpus data

Subcategorization Acquisition
• Baseline System: description
  - Linear interpolation smoothing with back-off estimates is used for the SCF distribution
  - Method of obtaining the back-off estimates:
    a) 4-5 representative verbs are chosen from a verb class
    b) for these verbs, the SCF distribution is built by manual analysis of 300 occurrences of each verb (BNC)
    c) the resulting SCF distributions are merged, giving equal weight to each distribution
    E.g. fly: move, slide, arrive, travel, sail
  - An empirical threshold is used to filter out noisy SCFs
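The baseline steps above (equal-weight merging, linear interpolation, threshold filtering) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the frame labels, the counts, the interpolation weight `lam`, the threshold value, and the renormalization after filtering are all assumptions made for the example.

```python
def normalize(counts):
    """Turn raw SCF counts into a probability distribution."""
    total = sum(counts.values())
    return {scf: c / total for scf, c in counts.items()}

def merge_equal_weight(distributions):
    """Step (c): merge per-verb SCF distributions, each with equal weight."""
    merged = {}
    for dist in distributions:
        for scf, p in dist.items():
            merged[scf] = merged.get(scf, 0.0) + p / len(distributions)
    return merged

def interpolate(observed, backoff, lam=0.5):
    """Linear interpolation of observed and back-off estimates (lam is assumed)."""
    scfs = set(observed) | set(backoff)
    return {s: lam * observed.get(s, 0.0) + (1 - lam) * backoff.get(s, 0.0)
            for s in scfs}

def filter_noisy(dist, threshold=0.1):
    """Drop SCFs below an empirical threshold; renormalizing is an assumption."""
    return normalize({s: p for s, p in dist.items() if p >= threshold})

# Hypothetical manual counts for two representative verbs of one class.
class_counts = {
    "move": {"NP": 150, "NP_PP": 90, "NP_NP": 60},
    "slide": {"NP": 120, "NP_PP": 180},
}
backoff = merge_equal_weight([normalize(c) for c in class_counts.values()])

# Hypothetical corpus counts for the target verb "fly".
observed_fly = normalize({"NP": 8, "NP_PP": 2})
smoothed_fly = interpolate(observed_fly, backoff)
lexicon_fly = filter_noisy(smoothed_fly)
```

In this toy run the rare `NP_NP` frame enters the smoothed distribution through the back-off estimate but falls below the threshold and is filtered out again.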

Subcategorization Acquisition
• Combining with WSD: Preiss and Korhonen (2002)
  - created separate corpus datasets for the senses being disambiguated (the first and/or second sense) and other datasets for the remaining senses
  - SCFs were acquired from both types of datasets
  - back-off estimates were used for the SCFs acquired from the initial dataset; the estimates were used for smoothing according to the relevant sense
  - the acquired SCF lexicons were merged at the end
• The SCF distribution was specific to a verb rather than to a sense
  - problems with subcategorization acquisition: datasets too small; separation of the data was unnecessary

Subcategorization Acquisition
• New method: does not involve separating the data; it uses back-off estimates for the whole sense distribution given by the WSD system, not only for the predominant sense
  - $p_j(scf_i)$, $j = 1, \dots, nb0$: the probabilities of SCFs in the different back-off distributions (nb0 = the number of back-off estimates)
  - $P(scf_i) = \sum_{j=1}^{nb0} \lambda_j \, p_j(scf_i)$, where the weights $\lambda_j$ of the different distributions sum to 1 and are obtained from the probabilistic WSD system
• Probabilistic WSD
  - able to determine a probability distribution for each noun, verb, adjective and adverb
  - able to determine a probability distribution over the senses of each verb and compute its average
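The weighted sum P(scf_i) = Σ λ_j · p_j(scf_i) can be illustrated with a short Python sketch. The sense names, frame labels, and probabilities below are hypothetical, and obtaining the weights by averaging per-occurrence WSD distributions is an assumed reading of "compute the average", not a confirmed detail of the system.

```python
def average_sense_distribution(per_occurrence):
    """Average per-occurrence sense distributions into one weight per sense."""
    weights = {}
    for dist in per_occurrence:
        for sense, p in dist.items():
            weights[sense] = weights.get(sense, 0.0) + p / len(per_occurrence)
    return weights

def mix_backoff(sense_weights, backoff_by_sense):
    """P(scf_i) = sum_j lambda_j * p_j(scf_i), with lambda_j summing to 1."""
    if abs(sum(sense_weights.values()) - 1.0) > 1e-9:
        raise ValueError("sense weights must sum to 1")
    mixed = {}
    for sense, lam in sense_weights.items():
        for scf, p in backoff_by_sense[sense].items():
            mixed[scf] = mixed.get(scf, 0.0) + lam * p
    return mixed

# Hypothetical WSD output for two occurrences of one verb.
occurrences = [
    {"motion": 0.8, "transport": 0.2},
    {"motion": 0.4, "transport": 0.6},
]
# Hypothetical back-off estimates, one distribution per sense.
backoff_by_sense = {
    "motion": {"NP": 0.5, "NP_PP": 0.5},
    "transport": {"NP_NP": 1.0},
}

lambdas = average_sense_distribution(occurrences)
mixed_backoff = mix_backoff(lambdas, backoff_by_sense)
```

Because every sense contributes in proportion to its weight, frames typical of a non-predominant sense (here `NP_NP`) receive probability mass without the corpus data having to be split by sense.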

Subcategorization Acquisition
• System Description
  - based on the Stevenson and Wilks (2001) system, which combines knowledge sources to produce a WSD tool
  - the probabilistic WSD system combines the probability distributions over senses produced by each module (modules described in Yarowsky (2000), Mihalcea (2002), Pedersen (2002))
  - each module's output is smoothed according to its confidence value; a module with low confidence is smoothed extensively towards the uniform distribution
  - the optimal combination of modules is chosen by accuracy (F-measure) on the English all-words task
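The slide does not give the smoothing or combination formulas, so the following Python sketch only shows one plausible reading: each module's sense distribution is interpolated with the uniform distribution in proportion to its confidence, and the smoothed distributions are then averaged. Both the interpolation and the averaging are assumptions, as are the module outputs and confidence values.

```python
def smooth_towards_uniform(dist, confidence):
    """Interpolate with the uniform distribution; confidence 0 gives uniform."""
    n = len(dist)
    return {s: confidence * p + (1 - confidence) / n for s, p in dist.items()}

def combine_modules(module_outputs):
    """Average the confidence-smoothed distributions of all modules.

    module_outputs is a list of (distribution, confidence) pairs that are
    assumed to be defined over the same set of senses.
    """
    senses = list(module_outputs[0][0])
    combined = {s: 0.0 for s in senses}
    for dist, confidence in module_outputs:
        smoothed = smooth_towards_uniform(dist, confidence)
        for s in senses:
            combined[s] += smoothed[s] / len(module_outputs)
    return combined

# Hypothetical outputs of two modules for one verb occurrence.
modules = [
    ({"sense1": 0.9, "sense2": 0.1}, 0.8),  # confident module
    ({"sense1": 0.2, "sense2": 0.8}, 0.1),  # unreliable module, mostly flattened
]
combined_senses = combine_modules(modules)
```

With these toy values the unreliable module is pulled almost entirely to the uniform distribution, so the combined result still favors `sense1`.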

Subcategorization Acquisition
• Experiment: Test Data
  - polysemous verbs whose predominant sense is not very frequent; 29 verbs chosen randomly
  - the WordNet senses of the chosen verbs are mapped to Levin-style senses
  - the maximum number of Levin senses considered was 4, and some of the given senses were left out


Subcategorization Acquisition
• Evaluation Method
  - took 20 million words of the BNC and extracted all sentences for the test verbs
  - 1000 sentences for each verb were disambiguated with the probabilistic WSD system
  - applied the modified subcategorization system
  - for each verb, an individual set of back-off estimates was built based on the frequencies of the different senses in the corpus data
  - results were evaluated against a manual analysis of the corpus data
  - from an average of 300 occurrences of each verb in the BNC test data, 5-21 gold standard SCFs were found (16 SCFs per verb)

Subcategorization Acquisition
• Evaluation Method
  - F-measure = 2·P·R / (P + R), where P is precision and R is recall
  - RC: Spearman rank correlation
  - KL: Kullback-Leibler distance
  - CE: cross entropy
  - the total number of SCFs missing from the distribution is recorded to determine the accuracy of the back-off estimates
  - comparison with other systems: the baseline and another system which assumed no sense at all
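The four measures listed above can be computed as in the following Python sketch. It makes several assumptions that the slides do not state: KL and CE are measured from the gold standard distribution to the acquired one, natural logarithms are used, zero probabilities in the acquired distribution are floored at a small `eps`, and tied ranks receive their average rank. The example distributions are hypothetical.

```python
import math

def f_measure(acquired, gold):
    """F = 2PR / (P + R) over sets of SCF types."""
    tp = len(acquired & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(acquired), tp / len(gold)
    return 2 * p * r / (p + r)

def kl_distance(p, q, eps=1e-9):
    """D(p || q) = sum_x p(x) log(p(x) / q(x)), with q floored at eps."""
    return sum(px * math.log(px / max(q.get(x, 0.0), eps))
               for x, px in p.items() if px > 0)

def cross_entropy(p, q, eps=1e-9):
    """H(p, q) = -sum_x p(x) log q(x), with q floored at eps."""
    return -sum(px * math.log(max(q.get(x, 0.0), eps))
                for x, px in p.items() if px > 0)

def ranks(values):
    """Rank values in descending order; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(p, q):
    """Spearman rank correlation: Pearson correlation of the SCF ranks."""
    scfs = sorted(set(p) | set(q))
    rp = ranks([p.get(s, 0.0) for s in scfs])
    rq = ranks([q.get(s, 0.0) for s in scfs])
    mean = (len(scfs) + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rp, rq))
    var = sum((a - mean) ** 2 for a in rp) * sum((b - mean) ** 2 for b in rq)
    return cov / math.sqrt(var) if var > 0 else 0.0

gold = {"NP": 0.5, "NP_PP": 0.3, "NP_NP": 0.2}   # hypothetical gold standard
acquired = {"NP": 0.6, "NP_PP": 0.4}              # hypothetical acquired lexicon
f_score = f_measure(set(acquired), set(gold))
```

In this toy case the acquired lexicon misses `NP_NP`, which lowers recall and, through the `eps` floor, produces a large KL distance; this is the situation the count of missing SCFs is meant to expose.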

Subcategorization Acquisition
• Results
  - of a total of 175 gold standard SCFs unseen in the unsmoothed lexicon, 107 remain unseen after using the predominant sense method
  - using the WSD method, only 22 remain unseen
  - the performance improves with the number of senses
  - the IS measure shows that the acquired and the gold standard SCFs intersect when WSD is used


Subcategorization Acquisition
• Results
  - improvement for the highly polysemous verbs (bear, count, roar, etc.)
  - verbs whose senses differ substantially in terms of subcategorization (conceive, continue, grasp, etc.)
  - verbs whose senses involve mainly NP/PP
  - SCFs seem to appear in the data as "families" for a sense of a verb
  - worse performance for seek using WSD, even though it is highly polysemous and its senses differ in terms of subcategorization
  - no clear improvement: choose, compose, induce, watch

Subcategorization Acquisition
• Conclusions
  - using WSD, an improvement can be shown in SCF acquisition for difficult verbs, because their senses differ in terms of subcategorization and not only in the degree of polysemy
• Future work
  - a better way of integrating the frequency of acquired senses into the SCFs, and a refinement of the subcategorization method
