Extracting lexical information with statistical models
• Dennis (2003) compares two methods for inducing a word similarity measure from local context in a corpus.
• Claims such measures capture syntactic, semantic, and associative information about words.
The Syntagmatic-Paradigmatic (SP) Model
• Partition the corpus into equivalence classes of equal-length sentence fragments:
  A nice picture OF THE
  A quick copy OF THE
  A nice description OF THE
  ONTO THE picture OF
  ONTO THE copy OF
The Syntagmatic-Paradigmatic (SP) Model
  A picture OF THE
  A copy OF THE
  A description OF THE
• Define similarity within an equivalence class C:
  Pr_C(w1, w2) = (# words the w1 and w2 fragments share) / (total # words shared with the w1 fragment in C)
• Overall similarity: Pr(w1, w2) = mean of Pr_C(w1, w2) over the classes C containing w1 (a toy sketch follows below).
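A toy sketch of the SP similarity above, under assumptions the slides leave open: each equivalence class is represented as a list of equal-length word tuples, a word's fragment in a class is the first fragment containing it, and overlap is counted over word types. The data and names are illustrative, not from the paper.

```python
# Toy sketch of the SP similarity (illustrative data, not from the paper).
from statistics import mean

classes = [
    [("a", "nice", "picture", "of", "the"),
     ("a", "quick", "copy", "of", "the"),
     ("a", "nice", "description", "of", "the")],
    [("onto", "the", "picture", "of"),
     ("onto", "the", "copy", "of")],
]

def shared(f1, f2):
    """Number of word types the two fragments have in common."""
    return len(set(f1) & set(f2))

def pr_in_class(cls, w1, w2):
    """Pr_C(w1, w2): words shared by the w1 and w2 fragments, normalised by
    all words the w1 fragment shares with the other fragments in the class."""
    f1 = next(f for f in cls if w1 in f)
    f2 = next((f for f in cls if w2 in f), None)
    denom = sum(shared(f1, g) for g in cls if g is not f1)
    if f2 is None or denom == 0:
        return 0.0
    return shared(f1, f2) / denom

def sp_similarity(w1, w2):
    """Pr(w1, w2): mean of Pr_C(w1, w2) over the classes that contain w1."""
    scores = [pr_in_class(c, w1, w2) for c in classes if any(w1 in f for f in c)]
    return mean(scores) if scores else 0.0

print(sp_similarity("picture", "copy"))  # ~0.71 on this toy data
```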
The Pooled Adjacent Context (PAC) Model
• Scan the corpus with five-word-wide windows:
  found a picture of the
  found a picture in her
  a pretty picture of her
• Assign each word a high-dimensional vector:
  one component for each <word, relpos> pair; component values are occurrence counts.
• Similarity: Spearman's rank correlation between vectors (a minimal sketch follows below).
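A minimal sketch of the PAC model as described above. Details not given on the slide are assumed here: whitespace tokenisation, lower-casing, and computing the rank correlation over the union of the two words' observed <word, relpos> features; the toy corpus and names are illustrative.

```python
# Minimal sketch of the PAC model (illustrative names and toy corpus).
from collections import Counter, defaultdict
from scipy.stats import spearmanr

WINDOW = 2  # two words on either side -> five-word-wide windows

def build_vectors(sentences):
    """Map each word to counts over <context word, relative position> features."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, w in enumerate(tokens):
            for offset in range(-WINDOW, WINDOW + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    vectors[w][(tokens[j], offset)] += 1
    return vectors

def pac_similarity(vectors, w1, w2):
    """Spearman rank correlation between the two words' context-count vectors,
    restricted (by assumption) to the union of their observed features."""
    feats = sorted(set(vectors[w1]) | set(vectors[w2]))
    v1 = [vectors[w1][f] for f in feats]
    v2 = [vectors[w2][f] for f in feats]
    rho, _ = spearmanr(v1, v2)
    return rho

corpus = [
    "found a picture of the wall",
    "found a picture in her room",
    "a pretty picture of her dog",
    "found a copy of the book",
]
vectors = build_vectors(corpus)
print(pac_similarity(vectors, "picture", "copy"))
```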
Sample results — most similar words:
• Band
  SP: group, kind, piece, statement, degree, bridge, hat, amount, lot, set, …
  PAC: clock, tribe, scene, …
• Agree
  SP: want, believe, deal, depend, forget, realize, listen, play, try, talk, …
  PAC: survive, seek, recognize, …
• Nine
  SP: six, four, several, five, twelve, fifteen, fifty, twenty, lunch, seven, eight, ten, least, …
  PAC: younger, rough, thirty, dinner, …
Syntactic results
• 90% of the time, the cue and its "similar" words share a basic WordNet syntactic category (N, V, ADJ, ADV) (both SP and PAC, top 10 similar words; chance: 60%). A rough sketch of this check appears below.
• 60–70% agreement when all 45 extended WordNet categories are used (chance: 25%).
• Can we do POS tagging with less labeled data? Note: no phrase structure yet.
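A hedged sketch of the agreement check in the first bullet above. The slides do not specify tooling; NLTK's WordNet interface is an assumed choice here, and the cue/neighbour words are illustrative examples taken from the SP column.

```python
# Sketch of the WordNet part-of-speech agreement check (assumed tooling:
# NLTK's WordNet interface; requires nltk.download('wordnet') once).
from nltk.corpus import wordnet as wn

def basic_pos(word):
    """Basic WordNet categories (n, v, a, r) the word can take;
    adjective satellites ('s') are folded into 'a'."""
    return {'a' if s.pos() == 's' else s.pos() for s in wn.synsets(word)}

def pos_agreement(cue, neighbours):
    """Fraction of neighbours sharing at least one basic category with the cue."""
    cue_pos = basic_pos(cue)
    hits = [w for w in neighbours if basic_pos(w) & cue_pos]
    return len(hits) / len(neighbours)

# Illustrative call with words from the "Agree" row above.
print(pos_agreement("agree", ["want", "believe", "deal", "depend", "forget"]))
```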
Semantic results 1
• Nine
  SP: six, four, several, five, twelve, fifteen, fifty, twenty, lunch, seven, eight, ten, least, …
  PAC: younger, rough, thirty, dinner, …
• Australia
  SP: China, India, Europe, Philadelphia, Brazil, Florida, Kansas, power, Canada, California, Cuba, vapor, senate, males, England, …
  PAC: Pennsylvania, …
• Mean LSA cosine between cue and similar words is roughly 0.15 (both SP and PAC, top 10 similar words; chance: 0.1).
Semantic results 2
• Can compare to human free association (FA) studies. Looking at 1,934 words:
  300 FA words judged most similar (SP)
  1000 FA words in top 5 similar (SP)
  1400 FA words in top 10 similar (SP)
  (chance: < 100 in all cases)
Discussion – uses
• What can we do with similarities?
• Bootstrapping other learning processes (e.g., learning color words)
• Retrieval of related information from a DB
• ???
Discussion – reference
• What can we not do with similarities alone?
  Person to ATM: "I need ninety dollars."
  Ninety: seventy, sixty, ten, most, eighty, lunch, rough, dinner, …
• Intuitively, useful agents need to know more than what "ninety", "Australia", etc., are similar to; they need to know what those words refer to.
Discussion – inference
• The DB contains "Cats only eat mice." Query: "Do cats eat dogs?"
  Only: every, just, no, usually, none, forever, lunch, …
• Communication by inference is ubiquitous. Intuitively, to answer such queries we need to know not just what "only" is similar to, but also what inferences it licenses.