Extracting lexical information with statistical models
• Dennis (2003) compares two methods for inducing a word similarity measure from local context in a corpus.
• Claims such measures capture syntactic, semantic, and associative information about words.
The Syntagmatic-Paradigmatic (SP) Model
• Partition the corpus into equivalence classes of equal-length sentence fragments:
  A nice picture OF THE
  A quick copy OF THE
  A nice description OF THE
  ONTO THE picture OF
  ONTO THE copy OF
The Syntagmatic-Paradigmatic (SP) Model
  A picture OF THE
  A copy OF THE
  A description OF THE
• Define similarity within an equivalence class C:
  Pr_C(w1, w2) = (# words the w1 and w2 fragments share) / (total # words shared with the w1 fragment in C)
• Overall similarity: Pr(w1, w2) = mean of Pr_C(w1, w2) over the classes C containing w1 (a toy sketch follows below).
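A toy sketch of the SP similarity above, under assumptions the slides leave open: each equivalence class is represented as a list of equal-length word tuples, a word's fragment in a class is the first fragment containing it, and overlap is counted over word types. The data and names are illustrative, not from the paper.

```python
# Toy sketch of the SP similarity (illustrative data, not from the paper).
from statistics import mean

classes = [
    [("a", "nice", "picture", "of", "the"),
     ("a", "quick", "copy", "of", "the"),
     ("a", "nice", "description", "of", "the")],
    [("onto", "the", "picture", "of"),
     ("onto", "the", "copy", "of")],
]

def shared(f1, f2):
    """Number of word types the two fragments have in common."""
    return len(set(f1) & set(f2))

def pr_in_class(cls, w1, w2):
    """Pr_C(w1, w2): words shared by the w1 and w2 fragments, normalised by
    all words the w1 fragment shares with the other fragments in the class."""
    f1 = next(f for f in cls if w1 in f)
    f2 = next((f for f in cls if w2 in f), None)
    denom = sum(shared(f1, g) for g in cls if g is not f1)
    if f2 is None or denom == 0:
        return 0.0
    return shared(f1, f2) / denom

def sp_similarity(w1, w2):
    """Pr(w1, w2): mean of Pr_C(w1, w2) over the classes that contain w1."""
    scores = [pr_in_class(c, w1, w2) for c in classes if any(w1 in f for f in c)]
    return mean(scores) if scores else 0.0

print(sp_similarity("picture", "copy"))  # ~0.71 on this toy data
```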
The Pooled Adjacent Context (PAC) Model
• Scan the corpus with five-word-wide windows:
  found a picture of the
  found a picture in her
  a pretty picture of her
• Assign each word a high-dimensional vector:
  one component for each <word, relpos> pair; component values are occurrence counts.
• Similarity: Spearman's rank correlation between vectors (a minimal sketch follows below).
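A minimal sketch of the PAC model as described above. Details not given on the slide are assumed here: whitespace tokenisation, lower-casing, and computing the rank correlation over the union of the two words' observed <word, relpos> features; the toy corpus and names are illustrative.

```python
# Minimal sketch of the PAC model (illustrative names and toy corpus).
from collections import Counter, defaultdict
from scipy.stats import spearmanr

WINDOW = 2  # two words on either side -> five-word-wide windows

def build_vectors(sentences):
    """Map each word to counts over <context word, relative position> features."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, w in enumerate(tokens):
            for offset in range(-WINDOW, WINDOW + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    vectors[w][(tokens[j], offset)] += 1
    return vectors

def pac_similarity(vectors, w1, w2):
    """Spearman rank correlation between the two words' context-count vectors,
    restricted (by assumption) to the union of their observed features."""
    feats = sorted(set(vectors[w1]) | set(vectors[w2]))
    v1 = [vectors[w1][f] for f in feats]
    v2 = [vectors[w2][f] for f in feats]
    rho, _ = spearmanr(v1, v2)
    return rho

corpus = [
    "found a picture of the wall",
    "found a picture in her room",
    "a pretty picture of her dog",
    "found a copy of the book",
]
vectors = build_vectors(corpus)
print(pac_similarity(vectors, "picture", "copy"))
```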
Sample results — most similar words:
• Band
  SP: group, kind, piece, statement, degree, bridge, hat, amount, lot, set, …
  PAC: clock, tribe, scene, …
• Agree
  SP: want, believe, deal, depend, forget, realize, listen, play, try, talk, …
  PAC: survive, seek, recognize, …
• Nine
  SP: six, four, several, five, twelve, fifteen, fifty, twenty, lunch, seven, eight, ten, least, …
  PAC: younger, rough, thirty, dinner, …
Syntactic results
• 90% of the time, the cue and its "similar" words share a basic WordNet syntactic category (N, V, ADJ, ADV) (both SP and PAC, top 10 similar words; chance: 60%). A rough sketch of this check appears below.
• 60–70% agreement when all 45 extended WordNet categories are used (chance: 25%).
• Can we do POS tagging with less labeled data? Note: no phrase structure yet.
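A hedged sketch of the agreement check in the first bullet above. The slides do not specify tooling; NLTK's WordNet interface is an assumed choice here, and the cue/neighbour words are illustrative examples taken from the SP column.

```python
# Sketch of the WordNet part-of-speech agreement check (assumed tooling:
# NLTK's WordNet interface; requires nltk.download('wordnet') once).
from nltk.corpus import wordnet as wn

def basic_pos(word):
    """Basic WordNet categories (n, v, a, r) the word can take;
    adjective satellites ('s') are folded into 'a'."""
    return {'a' if s.pos() == 's' else s.pos() for s in wn.synsets(word)}

def pos_agreement(cue, neighbours):
    """Fraction of neighbours sharing at least one basic category with the cue."""
    cue_pos = basic_pos(cue)
    hits = [w for w in neighbours if basic_pos(w) & cue_pos]
    return len(hits) / len(neighbours)

# Illustrative call with words from the "Agree" row above.
print(pos_agreement("agree", ["want", "believe", "deal", "depend", "forget"]))
```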
Semantic results 1
• Nine
  SP: six, four, several, five, twelve, fifteen, fifty, twenty, lunch, seven, eight, ten, least, …
  PAC: younger, rough, thirty, dinner, …
• Australia
  SP: China, India, Europe, Philadelphia, Brazil, Florida, Kansas, power, Canada, California, Cuba, vapor, senate, males, England, …
  PAC: Pennsylvania, …
• Mean LSA cosine between cue and similar words is roughly 0.15 (both SP and PAC, top 10 similar words; chance: 0.1).
Semantic results 2
• Can compare to human free association (FA) studies. Looking at 1,934 words:
  300 FA words judged most similar (SP)
  1000 FA words in top 5 similar (SP)
  1400 FA words in top 10 similar (SP)
  (chance: < 100 in all cases)
Discussion – uses
• What can we do with similarities?
• Bootstrapping other learning processes (e.g., learning color words)
• Retrieval of related information from a DB
• ???
Discussion – reference
• What can we not do with similarities alone?
  Person to ATM: "I need ninety dollars."
  Ninety: seventy, sixty, ten, most, eighty, lunch, rough, dinner, …
• Intuitively, useful agents need to know more than what "ninety", "Australia", etc., are similar to; they need to know what those words refer to.
Discussion – inference
• The DB contains "Cats only eat mice." Query: "Do cats eat dogs?"
  Only: every, just, no, usually, none, forever, lunch, …
• Communication by inference is ubiquitous. Intuitively, to answer such queries we need to know not just what "only" is similar to, but also what inferences it licenses.