From Grammar to N-grams: Estimating N-grams From a Context-Free Grammar and Sparse Data. Thomas K Harris, May 16, 2002
Motivation • Recognizers typically use n-grams. • Systems are typically defined by CFGs. • Data collection is difficult. • Goal: To have a language model that benefits from the grammar and the priors of the parses.
Other Approaches • Ignore data, use a language model derived from the grammar alone. • Ignore grammar, use a language model derived from the data alone. • Interpolate between these two models.
PCFG Strategy • Train grammar with some data. • Smooth grammar. • Compute n-grams. [Diagram: the CFG and the data feed into a PCFG, from which the n-grams are computed.]
The Software • Work in progress - available at http://www.cs.cmu.edu/~tkharris/pcfg • Written in C++ • A library (API) consisting of a PCFG class and an n-gram class. • A program which uses the library to create n-grams from Phoenix grammars and data. • A make script to automate building and testing.
Procedure • Read Phoenix grammar file. • Convert to Chomsky Normal Form. • Read data and train grammar. • Smooth the grammar. • Compute n-grams from the smoothed PCFG.
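As a rough orientation, the procedure above might be driven end to end as in the sketch below. The class names, method names, and file names are hypothetical illustrations, not the actual interface of the pcfg library.

```cpp
// Hypothetical driver for the five-step procedure above.
// All identifiers and file names are illustrative assumptions,
// not the real API of the pcfg library.
#include "pcfg.h"
#include "ngram.h"

int main() {
    PCFG grammar;
    grammar.readPhoenix("movieline.gra");     // 1. read the Phoenix grammar file
    grammar.toCNF();                          // 2. convert to Chomsky Normal Form
    grammar.train("train.txt");               // 3. inside-outside (EM) training on data
    grammar.smooth(0.05);                     // 4. redistribute mass over unseen rules
    NGram bigram = grammar.computeNGrams(2);  // 5. compute n-grams from the smoothed PCFG
    bigram.write("movieline.2gram");
    return 0;
}
```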
Reading Phoenix Formats • Doesn’t handle the #include directive. • Doesn’t handle the +* (Kleene closure) marker. • The net/rewrite distinction is ignored. • + and * markers are rewritten as rules. • Conversion to CNF permanently mangles rules.
Chomsky Normal Form • Remove ε-transitions. • Remove unit productions. • Change all rules A->βaγ of length >1 to A->βNγ and N->a. • Recursively shorten all rules A->βBC of length >2 to A->βN and N->BC.
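A minimal C++ sketch of the last step (recursively shortening long rules), assuming a simple Rule struct and a made-up example rule; this illustrates the transformation only and is not the library's actual data structure.

```cpp
// Sketch of the rule-shortening step: A -> beta B C (length > 2)
// becomes A -> beta N and N -> B C, with N a fresh nonterminal.
#include <iostream>
#include <string>
#include <vector>

struct Rule {
    std::string lhs;               // left-hand side nonterminal
    std::vector<std::string> rhs;  // right-hand side symbols
};

std::vector<Rule> shorten(Rule r) {
    std::vector<Rule> out;
    int counter = 0;
    while (r.rhs.size() > 2) {
        std::string n = r.lhs + "_N" + std::to_string(counter++);
        std::string c = r.rhs.back(); r.rhs.pop_back();
        std::string b = r.rhs.back(); r.rhs.pop_back();
        out.push_back({n, {b, c}});  // N -> B C
        r.rhs.push_back(n);          // A -> beta N
    }
    out.push_back(r);
    return out;
}

int main() {
    // Made-up rule; terminals were already lifted to their own
    // nonterminals by the previous step.
    Rule r{"[Time]", {"[At]", "[Hour]", "[Minute]", "[Meridian]"}};
    for (const Rule& s : shorten(r)) {
        std::cout << s.lhs << " ->";
        for (const std::string& sym : s.rhs) std::cout << ' ' << sym;
        std::cout << '\n';
    }
    return 0;
}
```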
Training • Initialize rule probabilities. • For each sentence, • Use CYK chart parser to compute inside and outside probabilities. • Use those probabilities to determine the expected number of times the rule is used in the sentence. • Use the expectations to get a new set of rule probabilities. • Repeat until the corpus likelihood appears to asymptote.
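The slide does not spell the update out; the standard inside-outside (EM) re-estimation it refers to, for a sentence w1…wm with inside probabilities α and outside probabilities β from the CYK chart, is

$$
\hat{c}(A \to B\,C) = \frac{1}{P(w_1 \dots w_m)} \sum_{1 \le i \le k < j \le m} \beta_A(i,j)\, P(A \to B\,C)\, \alpha_B(i,k)\, \alpha_C(k{+}1,j)
$$

$$
P_{\text{new}}(A \to B\,C) = \frac{\hat{c}(A \to B\,C)}{\sum_{\gamma} \hat{c}(A \to \gamma)}
$$

where $\alpha_X(i,j)$ is the probability that $X$ derives $w_i \dots w_j$ and $\beta_X(i,j)$ is the probability of deriving $w_1 \dots w_{i-1}\, X\, w_{j+1} \dots w_m$ from $S$.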
Smoothing • A user-specified probability mass can be redistributed over unseen rules. • At the bottom of the tree this generalizes a class-based model. • This only smoothes the trained grammar over other grammatical sentences.
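The slides leave the exact redistribution scheme open; one simple reading (an assumption, with $\lambda$ as the user-specified mass) is to discount each nonterminal's trained rules and spread $\lambda$ uniformly over that nonterminal's unseen rules:

$$
P_{\text{smoothed}}(A \to \alpha) =
\begin{cases}
(1-\lambda)\, P_{\text{trained}}(A \to \alpha) & \text{if } A \to \alpha \text{ has nonzero trained probability},\\[4pt]
\lambda \,/\, |\text{unseen}(A)| & \text{otherwise.}
\end{cases}
$$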
Precise N-grams • Precise n-grams can be computed from a PCFG. • $P(w_n \mid w_1 \dots w_{n-1}) = E(w_1 \dots w_n \mid S) \,/\, E(w_1 \dots w_{n-1} \mid S)$, where $E(\cdot \mid S)$ is the expected number of times the word string occurs in a sentence derived from the start symbol $S$.
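For instance, in the bigram case this is just a ratio of expected substring counts under sentences generated by the PCFG:

$$
P(w_2 \mid w_1) = \frac{E(w_1 w_2 \mid S)}{E(w_1 \mid S)}.
$$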
Divide and Conquer • [Diagram: parse trees rooted at S, with subtrees headed by nonterminals A and B spanning parts of the word string w1…wn, illustrating how the expected counts are computed by breaking the problem down over the grammar.]
Data • USI MovieLine oracle transcripts • 2,000 sentences • Used only parsable sentences (85%) • Divided into 60% training, 40% test
Conclusions • Lower perplexities than pure-grammar method, comparable perplexities to pure-data method. • More flexible and cheaper than pure-data methods.
Future Directions • More smoothing work needs to be done. • Different smoothing over different classes. • Other smoothing methods? • Trigrams. • Testing for word error rate improvements. • Adapting to modified grammars.