Lexical Acquisition Extending our information about words, particularly quantitative information
Why lexical acquisition? • “one cannot learn a new language by reading a bilingual dictionary” -- Mercer • Parsing ‘postmen’ requires context • quantitative information is difficult to collect by hand • e.g., priors on word senses • productivity of language • Lexicons need to be updated for new words and usages
Machine-readable Lexicons contain... • Lexical vs syntactic information • Word senses • Classifications, subclassifications • Collocations • Arguments, preferences • Synonyms, antonyms • Quantitative information
Gray area between lexical and syntactic • The rules of grammar are syntactic: • S ::= NP V NP • S ::= NP [V NP PP] • But which one to use, when? That depends on the particular words: • The children ate the cake with their hands. (the PP modifies the verb) • The children ate the cake with blue icing. (the PP modifies the cake)
Outline of chapter • verb subcategorization • Which arguments (e.g., infinitive, direct object) does a particular verb admit? • attachment ambiguity • What does the modifier refer to? • selectional preferences • Does a verb tend to restrict its object to a certain class? • semantic similarity between words • Which existing words is this new word most like?
Verb subcategorization frames • Assign to each verb the subcategorization frames (SFs) that are legal for it (see diagram). • Crucial for parsing: • She told the man where Peter grew up. • (NP NP S -- the where-clause is a complement of 'told') • She found the place where Peter grew up. • (NP NP -- the where-clause is a relative clause modifying 'the place', not an argument of 'found')
Brent's method (1993) • Learn subcategorizations given a corpus, a lexical analyzer, and cues. • A cue is a pair <L, SF>: • L is a star-free regular expression over lexemes, e.g. • (OBJ | SUBJ-OBJ | CAP) (PUNC | CC) • SF is a subcategorization frame, e.g. • NP NP • Strategy: find verb SFs for which the cues provide strong evidence.
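A minimal sketch of the regular-expression side of such a cue, assuming toy word lists for the lexical categories OBJ, SUBJ-OBJ, CAP, PUNC, and CC; the function name and the word lists are illustrative, not Brent's actual implementation:

```python
# Hypothetical word lists standing in for the cue's lexical categories.
OBJ_PRONOUNS = {"me", "him", "her", "us", "them"}   # OBJ: object-only pronouns
SUBJ_OBJ_PRONOUNS = {"you", "it"}                   # SUBJ-OBJ: subject-or-object pronouns
CONJUNCTIONS = {"and", "or", "but"}                 # CC
PUNCTUATION = {".", ",", ";", ":", "!", "?"}        # PUNC

def cue_matches(tok1, tok2):
    """True if the two tokens following a verb match the pattern
    (OBJ | SUBJ-OBJ | CAP) (PUNC | CC)."""
    first_ok = (tok1.lower() in OBJ_PRONOUNS
                or tok1.lower() in SUBJ_OBJ_PRONOUNS
                or tok1[:1].isupper())              # CAP: capitalized word
    second_ok = tok2 in PUNCTUATION or tok2.lower() in CONJUNCTIONS
    return first_ok and second_ok

print(cue_matches("Peter", "."))    # True: "... met Peter ." triggers the cue
print(cue_matches("quickly", "."))  # False
```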
Brent's method (cont'd) • Compute the error rate of the cue, E = Pr(false positive): the probability that the cue occurs with a verb that does not actually admit SF. • For each verb v and cue c = <L, SF>: • Test the hypothesis H0 that verb v does not admit SF. • pE = P(seeing the cue m or more times in n occurrences of v if all were false positives) = Σ_{r=m..n} C(n,r) E^r (1-E)^(n-r), where n = # occurrences of v and m = # occurrences of v with cue c • If pE < a threshold, reject H0 and record SF for v.
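A short sketch of this hypothesis test, using the binomial form of pE given above; the function names and the 0.02 threshold are illustrative choices, not fixed by the method:

```python
from math import comb

def brent_p_value(n, m, error_rate):
    """pE: probability of seeing the cue m or more times in n occurrences of the
    verb if the verb did NOT admit the frame, i.e. if every cue occurrence were
    a false positive with probability error_rate."""
    return sum(comb(n, r) * error_rate**r * (1 - error_rate)**(n - r)
               for r in range(m, n + 1))

def admits_frame(n, m, error_rate, threshold=0.02):
    """Reject H0 (and so assign the frame to the verb) when the false-positive
    explanation of the cue occurrences is too unlikely."""
    return brent_p_value(n, m, error_rate) < threshold

# e.g. verb seen n=80 times, cue fired m=6 times, cue error rate 2%
print(admits_frame(80, 6, 0.02))   # True: 6 false positives out of 80 is unlikely
```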
Subcategorization Frames: Ideas • Hypothesis testing gives high precision but low recall. • Less reliable cues are necessary and helpful (under an independence assumption). • Find SFs for verb classes, rather than individual verbs, using an error-prone tagger. • As long as the cues' error rates are incorporated into pE, this still works well. • Manning did this and improved recall.
Attachment Ambiguity: PPs • NP V NP PP -- does the PP modify V or NP? • Assumption: there is only one meaningful parse for each sentence: • The children ate the cake with a spoon. • Bush sent 100,000 soldiers into Kuwait. • Brazil honored their deal with the IMF. • Straw man: compare co-occurrence counts between the pairs <send, into> and <soldiers, into>.
Bias defeats simple counting • Prob(into | send) > Prob(into | soldiers), so counting works there. • But sometimes there is a strong association between the PP and both V and NP: • Ford ended its venture with Fiat. • In this case, there is a bias toward "low attachment" -- attaching the PP to the nearer referent, the NP.
Hindle and Rooth (1993) • An elegant (?) method of quantifying the low-attachment bias • Express P(first PP after the object attaches to the object) and P(first PP after the object attaches to the verb) as functions of • P(NAp) = P(there is a PP headed by p following the object that attaches to the object noun) • P(VAp) = P(there is a PP headed by p following the object that attaches to the verb) • Estimate P(NAp) and P(VAp) by counting
Estimating P(NAp) and P(VAp) • <v, n, p> are a particular verb, noun, and preposition • P(VAp | v) = (# times p attaches to v) / (# occurrences of v) • P(NAp | n) = (# times p attaches to n) / (# occurrences of n) • The two are treated as independent!
Attachment of first PP • P(Attach(p,n) | v,n) = P(NAp | n) • Whenever some PP attaches to the noun, the first PP after the object attaches to the noun (attachments cannot cross). • P(Attach(p,v) | v,n) = P(not NAp | n) P(VAp | v) • The verb gets the first PP only when no PP attaches to the noun AND some PP attaches to the verb. • I (put the [book on the table) on WW2] -- the crossing brackets mark an impossible attachment
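A sketch of the whole pipeline with assumed count dictionaries: the maximum-likelihood estimates and the attachment probabilities follow the formulas above, while the variable names, the toy counts, and the simple "compare the two probabilities" decision rule are illustrative:

```python
def p_verb_attach(p, v, va_counts, v_counts):
    """P(VAp | v): fraction of occurrences of verb v with a p-PP attached to it."""
    return va_counts.get((v, p), 0) / v_counts[v]

def p_noun_attach(p, n, na_counts, n_counts):
    """P(NAp | n): fraction of occurrences of noun n with a p-PP attached to it."""
    return na_counts.get((n, p), 0) / n_counts[n]

def first_pp_attachment(v, n, p, va_counts, v_counts, na_counts, n_counts):
    """Decide where the first PP headed by p attaches after 'v ... n',
    treating noun attachment and verb attachment as independent events."""
    p_na = p_noun_attach(p, n, na_counts, n_counts)
    p_va = p_verb_attach(p, v, va_counts, v_counts)
    attach_noun = p_na                 # if the noun has any p-PP, it takes the first PP
    attach_verb = (1 - p_na) * p_va    # the verb gets the first PP only if the noun has none
    return "noun" if attach_noun >= attach_verb else "verb"

# toy counts: 'into' attaches to 'send' far more often than to 'soldiers'
va_counts = {("send", "into"): 50};    v_counts = {"send": 200}
na_counts = {("soldiers", "into"): 1}; n_counts = {"soldiers": 300}
print(first_pp_attachment("send", "soldiers", "into",
                          va_counts, v_counts, na_counts, n_counts))   # -> verb
```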
Selectional Preferences • Verbs prefer classes of subjects, objects: • Objects of ‘eat’ tend to be food items • Subjects of ‘think’ tend to be people • Subjects of ‘bark’ tend to be dogs • Used to • disambiguate word sense • infer class of new words • rank multiple parses
Disambiguate the class (Resnik) • She interrupted the chair. • A(nc, v) = P(nc | v) log(P(nc | v) / P(nc)) -- the contribution of class nc to the relative entropy (Kullback-Leibler divergence) D(P(C | v) || P(C)) • e.g. A(furniture, interrupted) = P(furniture | interrupted) log(P(furniture | interrupted) / P(furniture))
Estimating P(nc | v) • P(nc | v) = P(nc, v) / P(v) • P(v) is estimated as the proportion of occurrences of v among all verb occurrences • P(nc, v) is estimated as • 1/N Σ(n in nc) C(v,n) / |classes(n)|, where N is the total number of verb-object pairs and classes(n) is the set of classes containing n • Then take the class nc with the highest A(nc, v) as the preferred word sense.
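A sketch of this Resnik-style selectional association using the estimates above; the toy verb-object counts and the class membership table are made up for illustration (note 'chair' belongs to both person and furniture, as in the example):

```python
from math import log

def selectional_association(verb, noun_class, cooc, classes_of, total_pairs):
    """A(nc, v) = P(nc | v) * log(P(nc | v) / P(nc)), with P(nc, v) estimated
    by spreading each verb-object count evenly over the object's classes."""

    def p_class_verb(v, nc):
        # P(nc, v) = 1/N * sum over n in nc of C(v, n) / |classes(n)|
        return sum(count / len(classes_of[n])
                   for (vv, n), count in cooc.items()
                   if vv == v and nc in classes_of[n]) / total_pairs

    p_v = sum(count for (vv, _), count in cooc.items() if vv == verb) / total_pairs
    p_nc_given_v = p_class_verb(verb, noun_class) / p_v
    p_nc = sum(p_class_verb(vv, noun_class) for vv in {vv for (vv, _) in cooc})
    if p_nc_given_v == 0:
        return 0.0
    return p_nc_given_v * log(p_nc_given_v / p_nc)

# toy verb-object counts and a toy class inventory ('chair' is ambiguous)
cooc = {("interrupt", "speaker"): 8, ("interrupt", "chair"): 2,
        ("eat", "apple"): 10}
classes_of = {"speaker": {"person"}, "chair": {"person", "furniture"},
              "apple": {"food"}}
N = sum(cooc.values())
print(selectional_association("interrupt", "person", cooc, classes_of, N))     # high
print(selectional_association("interrupt", "furniture", cooc, classes_of, N))  # low
```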
Semantic similarity • Uses: • classifying a new word • expanding queries in IR • Are two words similar... • When they are used together? • IMF and Brazil • When they are on the same topic? • astronaut and spacewalking • When they function interchangeably? • Soviet and American • When they are synonymous? • astronaut and cosmonaut
Cosine is no panacea • For length-normalized vectors, cosine corresponds to Euclidean distance between points. • But should document-space vectors be treated as points? • Alternative: treat them as probability distributions (after normalizing to sum to 1). • Then there is no particular reason to use cosine -- why not try an information-theoretic approach?
Alternative distance metrics to cosine • Cosine of square roots (Goldszmidt) • L1 norm -- Manhattan distance • sum of the absolute values of the componentwise differences • KL distance • D(p || q) • Mutual information (why not?) • D(p ^ q || pq), i.e. the KL divergence between the joint distribution and the product of the marginals • Information radius -- the information lost by describing both p and q by their midpoint m = (p + q)/2 • IRad(p, q) = D(p || m) + D(q || m)
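A sketch of these measures on probability distributions represented as aligned lists; KL divergence is asymmetric and infinite when q has zeros where p does not, which is one reason the information radius (with m the average of p and q) is often preferred:

```python
from math import log, sqrt

def cosine(p, q):
    """Cosine of the angle between two vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    return dot / (sqrt(sum(pi * pi for pi in p)) * sqrt(sum(qi * qi for qi in q)))

def l1(p, q):
    """Manhattan (L1) distance: sum of absolute componentwise differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl(p, q):
    """D(p || q); asymmetric, and infinite if q is zero where p is not."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def information_radius(p, q):
    """IRad(p, q) = D(p || m) + D(q || m), with m the midpoint of p and q;
    symmetric and always finite."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m) + kl(q, m)

# two normalized word-context distributions
p = [0.5, 0.3, 0.2, 0.0]
q = [0.4, 0.4, 0.1, 0.1]
print(cosine(p, q), l1(p, q), information_radius(p, q))
```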