CS626-449: Speech, NLP and the Web/Topics in AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture-14: Probabilistic parsing; sequence labeling, PCFG
Recap: Two Views of NLP • Classical View: layered processing; various ambiguities (already discussed) • Statistical/Machine Learning View
What is the output of an ML-NLP System (1/2) • Option 1: A set of rules, e.g., • If the word to the left of the verb is a noun and has the animacy feature, then it is the likely agent of the action denoted by the verb. • The child broke the toy (child is the agent) • The window broke (window is not the agent; it is inanimate)
What is the output of an ML-NLP System (2/2) • Option 2: A set of probability values • P(agent | word is to the left of the verb and has animacy) > P(object | word is to the left of the verb and has animacy) > P(instrument | word is to the left of the verb and has animacy), etc.
How is this different from classical NLP? • Classical NLP: a linguist writes the rules that are given to the computer • Statistical NLP: rules/probabilities are learned automatically from text data (a corpus)
A set of Sequence Labeling Tasks: smaller to larger units • Words: • Part of Speech tagging • Named Entity tagging • Sense marking • Phrases: Chunking • Sentences: Parsing • Paragraphs: Co-reference annotating
Example of word labeling: POS Tagging <s> Come September, and the UJF campus is abuzz with new and returning students. </s> <s> Come_VB September_NNP ,_, and_CC the_DT UJF_NNP campus_NN is_VBZ abuzz_JJ with_IN new_JJ and_CC returning_VBG students_NNS ._. </s>
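A minimal sketch of POS tagging in Python with NLTK, assuming nltk and its tokenizer/tagger resources are installed; NLTK's default tagger uses the Penn Treebank tagset, so its output may differ slightly from the hand-tagged example above.

```python
import nltk

# assumes the 'punkt' and 'averaged_perceptron_tagger' resources have been downloaded
sentence = "Come September, and the UJF campus is abuzz with new and returning students."
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)   # list of (word, tag) pairs
print(tagged)                   # e.g. [('Come', 'VB'), ('September', 'NNP'), (',', ','), ...]
```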
Example of word labeling: Named Entity Tagging <month_name> September </month_name> <org_name> UJF </org_name>
Example of word labeling: Sense Marking
Word     Synset                       WN synset no.
come     {arrive, get, come}          01947900
. . .
abuzz    {abuzz, buzzing, droning}    01859419
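A minimal sketch of looking up candidate senses (synsets) in WordNet through NLTK, assuming the 'wordnet' corpus is downloaded; synset offsets depend on the WordNet version, so they need not match the numbers above.

```python
from nltk.corpus import wordnet as wn

for syn in wn.synsets("come", pos=wn.VERB)[:3]:
    # each synset is a set of synonymous lemmas with a numeric offset and a gloss
    print(syn.offset(), [lemma.name() for lemma in syn.lemmas()], syn.definition())
```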
Example of phrase labeling: Chunking Come July, and [the UJF campus] is abuzz with [new and returning students].
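A minimal sketch of chunking with NLTK's regular-expression chunker over an already POS-tagged sentence; the NP pattern is an illustrative assumption, not the chunker used in the lecture.

```python
import nltk

tagged = [("the", "DT"), ("UJF", "NNP"), ("campus", "NN"), ("is", "VBZ"),
          ("abuzz", "JJ"), ("with", "IN"), ("new", "JJ"), ("and", "CC"),
          ("returning", "VBG"), ("students", "NNS")]
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<VBG>*<NN.*>+}")  # toy noun-phrase pattern
print(chunker.parse(tagged))  # chunks e.g. (NP the/DT UJF/NNP campus/NN) and (NP returning/VBG students/NNS)
```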
Example of Sentence labeling: Parsing [S1 [S [S [VP [VB Come] [NP [NNP July]]]] [, ,] [CC and] [S [NP [DT the] [JJ UJF] [NN campus]] [VP [AUX is] [ADJP [JJ abuzz] [PP [IN with] [NP [ADJP [JJ new] [CC and] [VBG returning]] [NNS students]]]]]] [. .]]]
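The bracketed parse above can be loaded and inspected programmatically; a minimal sketch with NLTK's Tree, rewritten with parentheses, which is the bracket style Tree.fromstring expects by default.

```python
from nltk import Tree

parse = ("(S1 (S (S (VP (VB Come) (NP (NNP July)))) (, ,) (CC and) "
         "(S (NP (DT the) (JJ UJF) (NN campus)) (VP (AUX is) (ADJP (JJ abuzz) "
         "(PP (IN with) (NP (ADJP (JJ new) (CC and) (VBG returning)) (NNS students)))))) (. .)))")
tree = Tree.fromstring(parse)
tree.pretty_print()   # draws the constituency tree as ASCII art
```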
Handling labeling through the Noisy Channel Model w = (wn, wn-1, …, w1) → [Noisy Channel] → t = (tm, tm-1, …, t1) Sequence w is transformed into sequence t.
Bayesian Decision Theory and the Noisy Channel Model are close to each other • Bayes' Theorem: Given the random variables A and B, P(A|B) = P(B|A) · P(A) / P(B), where P(A|B) is the posterior probability, P(A) the prior probability, and P(B|A) the likelihood.
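A tiny worked example of the formula with invented numbers, just to keep the roles of posterior, prior, and likelihood straight.

```python
prior = 0.7          # P(A): prior probability of the hypothesis A
likelihood = 0.2     # P(B|A): probability of the observation B given A
evidence = 0.25      # P(B): marginal probability of the observation B

posterior = likelihood * prior / evidence   # P(A|B) by Bayes' theorem
print(posterior)                            # ~0.56
```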
Corpus • A collection of text, called a corpus, is used for collecting various language data • With annotation: more information, but manual-labor intensive • Practice: label automatically, then correct manually • The famous Brown Corpus contains 1 million tagged words • Switchboard: a very famous corpus of 2,400 conversations, 543 speakers, many US dialects, annotated with orthography and phonetics
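A minimal sketch of accessing such a corpus through NLTK, assuming the Brown corpus data is downloaded; it ships with its POS annotation.

```python
from nltk.corpus import brown

tagged = brown.tagged_words()
print(len(tagged))    # on the order of a million tagged tokens
print(tagged[:3])     # e.g. [('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL')]
```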
Discriminative vs. Generative Model W* = argmax over W of P(W|SS) • Discriminative Model: compute P(W|SS) directly • Generative Model: compute it from P(W) · P(SS|W) (via Bayes' theorem)
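A minimal sketch of the generative decision rule over a toy candidate set; the candidate word sequences and both probability tables are invented, with SS read as the observed speech signal.

```python
# score each candidate W by P(W) * P(SS|W) and take the argmax
candidates = ["recognise speech", "wreck a nice beach"]
p_w = {"recognise speech": 0.6, "wreck a nice beach": 0.4}            # language model P(W)
p_ss_given_w = {"recognise speech": 0.3, "wreck a nice beach": 0.2}   # channel/acoustic model P(SS|W)

w_star = max(candidates, key=lambda w: p_w[w] * p_ss_given_w[w])
print(w_star)   # 'recognise speech'
```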
Language Models • N-grams: sequences of n consecutive words/characters • Probabilistic/Stochastic Context-Free Grammars: • Simple probabilistic models capable of handling recursion • A CFG with probabilities attached to rules • Rule probabilities: how likely is it that a particular rewrite rule is used?
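A minimal sketch of a bigram model estimated by relative frequency (maximum likelihood, no smoothing) over a toy corpus reusing sentences from the earlier slide; the corpus is illustrative only.

```python
from collections import Counter

corpus = "the child broke the toy the window broke".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    # maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_prob("the", "child"))   # 1/3: 'the' occurs 3 times, once followed by 'child'
```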
PCFGs • Why PCFGs? • Intuitive probabilistic models for tree-structured languages • Algorithms are extensions of HMM algorithms • Better than the n-gram model for language modeling.
Formal Definition of PCFG • A PCFG consists of • A set of terminals {w_k}, k = 1, …, V, e.g., {w_k} = {child, teddy, bear, played, …} • A set of non-terminals {N_i}, i = 1, …, n, e.g., {N_i} = {NP, VP, DT, …} • A designated start symbol N_1 • A set of rules {N_i → ζ_j}, where ζ_j is a sequence of terminals & non-terminals, e.g., NP → DT NN • A corresponding set of rule probabilities
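A minimal sketch of this definition in NLTK: a toy PCFG over the terminals and non-terminals mentioned above; the grammar and its probabilities are invented for illustration.

```python
from nltk import PCFG

toy_grammar = PCFG.fromstring("""
    S  -> NP VP            [1.0]
    NP -> DT NN            [0.2]
    NP -> NN               [0.5]
    NP -> NP PP            [0.3]
    PP -> IN NP            [1.0]
    VP -> VB NP            [1.0]
    DT -> 'the'            [1.0]
    NN -> 'child' [0.4] | 'teddy' [0.3] | 'bear' [0.3]
    IN -> 'with'           [1.0]
    VB -> 'played'         [1.0]
""")
print(toy_grammar.start())             # the designated start symbol (here S)
for production in toy_grammar.productions()[:4]:
    print(production)                  # each rule carries its attached probability
```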
Rule Probabilities • Rule probabilities are such that, for every non-terminal, the probabilities of all its rewrite rules sum to 1: Σ_j P(N_i → ζ_j) = 1 for all i. E.g., P(NP → DT NN) = 0.2, P(NP → NN) = 0.5, P(NP → NP PP) = 0.3 • P(NP → DT NN) = 0.2 means that 20% of the NP expansions in the training-data parses use the rule NP → DT NN
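A minimal sketch of how such rule probabilities can be estimated from a treebank by relative frequency; the rule counts are invented so that the NP figures above come out exactly.

```python
from collections import Counter

# hypothetical counts of NP rewrite rules in a training treebank
np_rule_counts = Counter({
    ("NP", ("DT", "NN")): 20,
    ("NP", ("NN",)):      50,
    ("NP", ("NP", "PP")): 30,
})

total_np = sum(np_rule_counts.values())
for (lhs, rhs), count in np_rule_counts.items():
    # P(NP -> rhs) = count(NP -> rhs) / count of all NP expansions
    print(lhs, "->", " ".join(rhs), "=", count / total_np)
# NP -> DT NN = 0.2, i.e. 20% of NP expansions in the training parses use this rule
```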