
Extracting lexical information with statistical models

This presentation summarizes Dennis (2003), which compares two methods for inducing a word-similarity measure from local context in a corpus. Such measures capture syntactic, semantic, and associative information about words. The slides describe the Syntagmatic-Paradigmatic (SP) Model and the Pooled Adjacent Context (PAC) Model, and present sample results and implications for various applications.



Presentation Transcript


  1. Extracting lexical information with statistical models • Dennis (2003) compares two methods for inducing a word similarity measure from local context in a corpus. • Claims such measures capture syntactic, semantic, and associative information about words.

  2. The Syntagmatic-Paradigmatic (SP) Model Partition the corpus into equivalence classes of equal-length sentence fragments:
     A nice picture OF THE
     A quick copy OF THE
     A nice description OF THE
     ONTO THE picture OF
     ONTO THE copy OF

  3. The Syntagmatic-Paradigmatic (SP) Model Example equivalence class:
     A picture OF THE
     A copy OF THE
     A description OF THE
   Define similarity within an equivalence class C:
     PrC(w1, w2) = (# words shared by the w1 and w2 fragments) / (total # words shared with the w1 fragment across C)
   Overall similarity: Pr(w1, w2) = mean of PrC(w1, w2) over the classes C that contain w1.
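As a concrete illustration of the definition above, here is a minimal Python sketch that scores a toy equivalence class. It assumes positionwise word overlap between equal-length fragments and, for simplicity, one fragment per word per class; the helper names and toy data are illustrative, not Dennis's implementation.

```python
# Minimal sketch of the SP within-class similarity described above,
# assuming positionwise word overlap between equal-length fragments and,
# for simplicity, one fragment per word per class.  Toy data only.

def shared_words(frag_a, frag_b):
    """Count positions at which two equal-length fragments carry the same word."""
    return sum(1 for a, b in zip(frag_a, frag_b) if a == b)

def sp_similarity(word1, word2, equivalence_classes):
    """Mean Pr_C(word1, word2) over classes whose fragments contain word1."""
    scores = []
    for fragments in equivalence_classes:
        frags_w1 = [f for f in fragments if word1 in f]
        frags_w2 = [f for f in fragments if word2 in f]
        if not frags_w1 or not frags_w2:
            continue
        f1, f2 = frags_w1[0], frags_w2[0]
        # Total overlap between the w1 fragment and every other fragment in C.
        total = sum(shared_words(f1, f) for f in fragments if f is not f1)
        if total:
            scores.append(shared_words(f1, f2) / total)
    return sum(scores) / len(scores) if scores else 0.0

# The toy equivalence class from the slide.
classes = [[
    ("a", "picture", "of", "the"),
    ("a", "copy", "of", "the"),
    ("a", "description", "of", "the"),
]]
print(sp_similarity("picture", "copy", classes))  # 3 shared / 6 total = 0.5
```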

  4. The Pooled Adjacent Context (PAC) Model Scan the corpus with five-word-wide windows:
     found a picture of the
     found a picture in her
     a pretty picture of her
   Assign each word a high-dimensional vector: one component for each <word, relpos> pair; component values are occurrence counts. Similarity: Spearman’s rank correlation between vectors.
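The windowing and comparison steps above can be sketched in a few lines of Python. The two-words-each-side window, the toy sentence, and the function names are assumptions made here for illustration; SciPy's spearmanr supplies the rank correlation.

```python
# Rough sketch of PAC-style context vectors: one count per
# <context word, relative position> feature from five-word windows,
# compared with Spearman's rank correlation.  Tiny illustrative corpus.

from collections import Counter, defaultdict
from scipy.stats import spearmanr

def pac_vectors(tokens, half_window=2):
    """Map each word to occurrence counts over (context word, offset) features."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for offset in range(-half_window, half_window + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens):
                vectors[word][(tokens[j], offset)] += 1
    return vectors

def pac_similarity(word1, word2, vectors):
    """Spearman rank correlation between two words' feature vectors."""
    features = sorted(set(vectors[word1]) | set(vectors[word2]))
    v1 = [vectors[word1][f] for f in features]
    v2 = [vectors[word2][f] for f in features]
    rho, _ = spearmanr(v1, v2)
    return rho

tokens = ("we found a picture of the old wall and later "
          "she took a pretty photo of her house").split()
vectors = pac_vectors(tokens)
print(pac_similarity("picture", "photo", vectors))
```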

  5. Sample results Most similar words (SP vs. PAC):
     Band
       SP: group, kind, piece, statement, degree, bridge, hat, amount, lot, set, …
       PAC: clock, tribe, scene, …
     Agree
       SP: want, believe, deal, depend, forget, realize, listen, play, try, talk, …
       PAC: survive, seek, recognize, …
     Nine
       SP: six, four, several, five, twelve, fifteen, fifty, twenty, lunch, seven, eight, ten, least, …
       PAC: younger, rough, thirty, dinner, …

  6. Syntactic results • 90% of the time, the cue and its “similar” words share a basic WordNet syntactic category (N, V, ADJ, ADV) (both SP & PAC, top 10 similar words; chance: 60%) • 60-70% agreement with all 45 extended WordNet categories (chance: 25%) → Can we do POS tagging with less labeled data? → Note: no phrase structure yet.
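A rough sketch of how the category-agreement figure could be checked with NLTK's WordNet interface (the wordnet data must be downloaded first). Treating the set of POS tags under which a word has any synset as its basic syntactic categories is a simplifying assumption made here, not necessarily the paper's evaluation procedure.

```python
# Small sketch of a WordNet category-agreement check via NLTK
# (requires the data: nltk.download('wordnet')).  A word's "basic
# syntactic categories" are taken to be the POS tags of its synsets,
# which is an assumption made for this illustration.

from nltk.corpus import wordnet as wn

def basic_categories(word):
    """WordNet POS tags (n, v, a, r) under which the word has any synset."""
    return {s.pos().replace("s", "a") for s in wn.synsets(word)}

def category_agreement(cue, neighbours):
    """Fraction of neighbours sharing at least one basic category with the cue."""
    cue_cats = basic_categories(cue)
    hits = [w for w in neighbours if basic_categories(w) & cue_cats]
    return len(hits) / len(neighbours) if neighbours else 0.0

print(category_agreement("nine", ["six", "four", "several", "five", "twelve"]))
```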

  7. Semantic results 1 Most similar words (SP vs. PAC):
     Nine
       SP: six, four, several, five, twelve, fifteen, fifty, twenty, lunch, seven, eight, ten, least, …
       PAC: younger, rough, thirty, dinner, …
     Australia
       SP: China, India, Europe, Philadelphia, Brazil, Florida, Kansas, power, Canada, California, Cuba, vapor, senate, males, England, …
       PAC: Pennsylvania, …
   • Mean LSA cosine between cue and similar words is roughly 0.15 (both SP & PAC, top 10 similar words; chance: 0.1)
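For the LSA comparison, a toy sketch of computing cosines in a latent space is shown below. The mini-corpus, the 2-dimensional space, and the scikit-learn pipeline are placeholders, since the original evaluation would have relied on an LSA space built from a much larger corpus.

```python
# Toy sketch of an LSA-cosine comparison: word vectors from a truncated
# SVD of a small term-document count matrix, compared by cosine.
# Corpus and dimensionality are placeholders, not the original LSA space.

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "six children and nine dogs played in the park",
    "four cats and five dogs slept by the fire",
    "the band played for the group all night",
    "a rough and younger crowd joined the band",
]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)                          # documents x terms
terms = vectorizer.get_feature_names_out()
word_vecs = TruncatedSVD(n_components=2).fit_transform(counts.T)  # terms x 2
index = {t: i for i, t in enumerate(terms)}

def lsa_cosine(w1, w2):
    """Cosine between two words' latent-space vectors."""
    a, b = word_vecs[index[w1]], word_vecs[index[w2]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(lsa_cosine("nine", "six"), lsa_cosine("nine", "rough"))
```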

  8. Semantic results 2 • Can compare to human free-association studies. Looking at 1,934 words:
     300 FA words judged most similar (SP)
     1000 FA words in the top 5 most similar (SP)
     1400 FA words in the top 10 most similar (SP)
     (chance: < 100 in all cases)

  9. Discussion - uses • What can we do with similarities? • Bootstrapping other learning processes (e.g. learning color words) • Retrieval of related information from DB • ???

  10. Discussion – reference • What can we not do with similarities alone? Person to ATM: “I need ninety dollars.” Most similar words to Ninety: seventy, sixty, ten, most, eighty, lunch, rough, dinner, … → Intuitively, useful agents need to know more than what “ninety”, “Australia”, etc., are similar to; they need to know what they refer to.

  11. Discussion – inference DB contains “Cats only eat mice.” Query: “Do cats eat dogs?” Most similar words to only: every, just, no, usually, none, forever, lunch, … Communication by inference is ubiquitous. → Intuitively, to answer such queries, we need to know not only what “only” is similar to, but also what inferences it licenses.
