Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 17th March, 2011

CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 27: SMT Assignment; HMM recap; Probabilistic Parsing contd.)

CMU Pronunciation Dictionary Assignment

Data
• The Carnegie Mellon University Pronouncing Dictionary
• A machine-readable pronunciation dictionary for North American English that contains over 125,000 words and their transcriptions
• The current phoneme set contains 39 phonemes

“Parallel” Corpus
Phoneme  Example  Translation
-------  -------  -----------
AA       odd      AA D
AE       at       AE T
AH       hut      HH AH T
AO       ought    AO T
AW       cow      K AW
AY       hide     HH AY D
B        be       B IY

“Parallel” Corpus (contd.)
Phoneme  Example  Translation
-------  -------  -----------
CH       cheese   CH IY Z
D        dee      D IY
DH       thee     DH IY
EH       Ed       EH D
ER       hurt     HH ER T
EY       ate      EY T
F        fee      F IY
G        green    G R IY N
HH       he       HH IY
IH       it       IH T
IY       eat      IY T
JH       gee      JH IY
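One way to read the dictionary as a parallel corpus is to treat the letters of a word as the source sentence and its phonemes as the target sentence. The following is a minimal sketch under that assumption. It assumes the plain-text dictionary line format (`WORD  PH1 PH2 ...`, comment lines starting with `;;;`, alternate pronunciations marked as `WORD(1)`, stress digits on vowels). The function name `to_parallel`, the decision to strip stress digits, and the sample lines are illustrative choices, not part of the assignment.

```python
import re


def to_parallel(lines):
    """Turn dictionary-style lines into (letters, phonemes) sentence pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word)             # drop alternate-pronunciation marker
        phones = [re.sub(r"\d", "", p) for p in phones]  # drop stress digits (assumption)
        pairs.append((" ".join(word.lower()), " ".join(phones)))
    return pairs


# Illustrative lines written in the dictionary's format.
sample = [";;; a comment line", "ODD  AA1 D", "CHEESE  CH IY1 Z", "HUT  HH AH1 T"]
for source, target in to_parallel(sample):
    print(source, "|||", target)
```

Each pair can then be written to two line-aligned files (one for letters, one for phonemes), which is the usual input shape for word-alignment tools.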

The tasks
• First obtain the Carnegie Mellon University Pronouncing Dictionary
• Create the phrase table using GIZA++
• For language modeling, use SRILM
• For decoding, use Moses
• Calculate precision, recall and F-score
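The slide does not say how precision, recall and F-score are to be computed for a predicted pronunciation. The sketch below assumes one simple convention: count phoneme tokens shared between the predicted and the reference sequence (multiset overlap, ignoring order). The function name `prf` and this matching convention are assumptions; an alignment-based count would be an equally reasonable choice.

```python
from collections import Counter


def prf(predicted, reference):
    """Precision, recall and F-score over phoneme tokens (multiset overlap)."""
    overlap = sum((Counter(predicted) & Counter(reference)).values())
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Predicted "K AW T" against the reference "K AW" for the word "cow".
print(prf("K AW T".split(), "K AW".split()))
```

Here two of the three predicted phonemes are correct and both reference phonemes are found, so precision is 2/3, recall is 1 and the F-score is 0.8.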

Probabilistic Parsing

Bridging Classical and Probabilistic Parsing
• The bridge between probabilistic parsing and classical parsing is the concept of domination
• Frequency: P(NP → DT NN) = 0.5 means that 50% of the noun phrases in the corpus are composed of a determiner and a noun
• Phenomenon: P(NP → DT NN) is actually P(DT NN | NP), i.e., the probability that domination by DT and by NN together gives rise to domination by NP
• The concept of domination is the bridge between Frequency (probabilistic parsing) and Phenomenon (classical parsing)

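Calculating Probability of a Sentence
• We can calculate $P(s = w_{1m})$ either with a naive n-gram based approach or by summing over the parse trees of the sentence: $P(s) = \sum_{t} P(s, t)$
• Which approach to choose?
• Example: "The velocity of waves rises near the shore."
• A plural noun immediately followed by a singular verb ("waves rises") is unlikely in the corpus, so an n-gram model gives the sentence a low probability, even though the sentence is grammatical (the verb agrees with "velocity").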

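Parse Tree
[Figure: parse tree of "The velocity of waves rises near the shore". S expands to NP and VP. The subject NP covers "The velocity of waves" (the NP "The velocity" followed by the PP "of waves"); the VP covers "rises near the shore" (the verb "rises" followed by the PP "near the shore").]
No other parse tree is possible for the sentence.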

Various ways to calculate probability of sentence
• Naïve n-gram based
• Syntactic level (parse tree)
• Semantic level
• Pragmatics
• Discourse

Probabilistic Context Free Grammars
• DT → the        1.0
• NN → gunman     0.5
• NN → building   0.5
• VBD → sprayed   1.0
• NNS → bullets   1.0
• S → NP VP       1.0
• NP → DT NN      0.5
• NP → NNS        0.3
• NP → NP PP      0.2
• PP → P NP       1.0
• VP → VP PP      0.6
• VP → VBD NP     0.4

Example Parse t1
• The gunman sprayed the building with bullets.
[S1.0 [NP0.5 [DT1.0 The] [NN0.5 gunman]] [VP0.6 [VP0.4 [VBD1.0 sprayed] [NP0.5 [DT1.0 the] [NN0.5 building]]] [PP1.0 [P1.0 with] [NP0.3 [NNS1.0 bullets]]]]]
P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Another Parse t2
• The gunman sprayed the building with bullets.
[S1.0 [NP0.5 [DT1.0 The] [NN0.5 gunman]] [VP0.4 [VBD1.0 sprayed] [NP0.2 [NP0.5 [DT1.0 the] [NN0.5 building]] [PP1.0 [P1.0 with] [NP0.3 [NNS1.0 bullets]]]]]]
P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
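The probability of a parse is the product of the probabilities of all rules used in it. The sketch below checks the two products by walking each tree. The nested-tuple tree encoding and the names `RULES` and `tree_prob` are illustrative. The rule P → with (1.0) is not in the grammar listing; it is assumed here from the P1.0 node annotation in the trees. Multiplying the listed factors gives 0.0045 for t1 and 0.0015 for t2.

```python
# Rule probabilities keyed by (left-hand side, right-hand side tuple).
RULES = {
    ("DT", ("the",)): 1.0, ("NN", ("gunman",)): 0.5, ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0, ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,  # assumption: taken from the P1.0 node in the trees
    ("S", ("NP", "VP")): 1.0, ("NP", ("DT", "NN")): 0.5, ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2, ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6, ("VP", ("VBD", "NP")): 0.4,
}


def tree_prob(tree):
    """Product of rule probabilities over a tree given as (label, child, ...)."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return RULES[(label, (children[0],))]      # lexical rule: label -> word
    prob = RULES[(label, tuple(child[0] for child in children))]
    for child in children:
        prob *= tree_prob(child)
    return prob


SUBJECT = ("NP", ("DT", "the"), ("NN", "gunman"))
OBJECT = ("NP", ("DT", "the"), ("NN", "building"))
WITH_PP = ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))

# t1 attaches the PP to the verb phrase, t2 attaches it to the object NP.
T1 = ("S", SUBJECT, ("VP", ("VP", ("VBD", "sprayed"), OBJECT), WITH_PP))
T2 = ("S", SUBJECT, ("VP", ("VBD", "sprayed"), ("NP", OBJECT, WITH_PP)))

print(tree_prob(T1), tree_prob(T2))
```

The two parses differ only in the attachment rule (VP → VP PP at 0.6 versus NP → NP PP at 0.2), which is why t1 is three times as probable as t2.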

HMM ↔ PCFG
• O, observed sequence ↔ $w_{1m}$, sentence
• X, state sequence ↔ t, parse tree
• μ, model ↔ G, grammar
• Three fundamental questions

HMM ↔ PCFG
• How likely is a certain observation given the model? ↔ How likely is a sentence given the grammar?
  $P(O \mid \mu)$ ↔ $P(w_{1m} \mid G)$
• How to choose a state sequence which best explains the observations? ↔ How to choose a parse which best supports the sentence?
  $\arg\max_X P(X \mid O, \mu)$ ↔ $\arg\max_t P(t \mid w_{1m}, G)$

HMM ↔ PCFG
• How to choose the model parameters that best explain the observed data? ↔ How to choose rule probabilities which maximize the probabilities of the observed sentences?
  $\arg\max_\mu P(O \mid \mu)$ ↔ $\arg\max_G P(w_{1m} \mid G)$

Recap of HMM

HMM Definition
• Set of states: S, where |S| = N
• Start state: $S_0$ /* $P(S_0) = 1$ */
• Output alphabet: O, where |O| = M
• Transition probabilities: $A = \{a_{ij}\}$ /* state i to state j */
• Emission probabilities: $B = \{b_j(o_k)\}$ /* prob. of emitting or absorbing $o_k$ from state j */
• Initial state probabilities: $\Pi = \{p_1, p_2, p_3, \ldots, p_N\}$
• Each $p_i = P(o_0 = \varepsilon, S_i \mid S_0)$

Markov Processes
• Properties
  • Limited Horizon: given the previous k states, the state at time t is independent of the earlier states 0 to t-k-1
    $P(X_t = i \mid X_{t-1}, X_{t-2}, \ldots, X_0) = P(X_t = i \mid X_{t-1}, X_{t-2}, \ldots, X_{t-k})$
    This is an order-k Markov process
  • Time invariance (shown for k = 1):
    $P(X_t = i \mid X_{t-1} = j) = P(X_1 = i \mid X_0 = j) = \ldots = P(X_n = i \mid X_{n-1} = j)$

Three basic problems (contd.)
• Problem 1: Likelihood of a sequence
  • Forward Procedure
  • Backward Procedure
• Problem 2: Best state sequence
  • Viterbi Algorithm
• Problem 3: Re-estimation
  • Baum-Welch (Forward-Backward Algorithm)

Probabilistic Inference
• O: Observation Sequence
• S: State Sequence
• Given O, find S*, where $S^* = \arg\max_S P(S \mid O)$; this is called Probabilistic Inference
• Infer "Hidden" from "Observed"
• How is this inference different from logical inference based on propositional or predicate calculus?

Essentials of Hidden Markov Model
1. Markov + Naive Bayes
2. Uses both transition and observation probabilities
3. Effectively makes the Hidden Markov Model a Finite State Machine (FSM) with probabilities

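Probability of Observation Sequence
• Without any restriction, search space size = $|S|^{|O|}$, where |S| is the number of states and |O| is the length of the observation sequence; e.g., 3 urns and an 8-symbol observation give $3^8 = 6561$ candidate state sequences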

Continuing with the Urn example
Colored ball choosing
            Urn 1  Urn 2  Urn 3
# of Red      30     10     60
# of Green    50     40     10
# of Blue     20     50     30

Example (contd.)
• Given: the observation/output probability table and the transition probability table
• Observation: RRGGBRGR
• What is the corresponding state sequence?

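Diagrammatic representation (1/2)
[Figure: state-transition diagram over the urns U1, U2 and U3. Each state is labelled with its output probabilities (U1: R 0.3, G 0.5, B 0.2; U2: R 0.1, G 0.4, B 0.5; U3: R 0.6, G 0.1, B 0.3). Each of the nine arcs carries a transition probability; the arc values shown are 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5 and 0.6.]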

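Diagrammatic representation (2/2)
[Figure: the same diagram, with each arc labelled by the product of its transition probability and the output probabilities of its source urn.]
• Arcs leaving U1: (R 0.03, G 0.05, B 0.02), (R 0.12, G 0.20, B 0.08), (R 0.15, G 0.25, B 0.10)
• Arcs leaving U2: (R 0.02, G 0.08, B 0.10) on two arcs, (R 0.06, G 0.24, B 0.30)
• Arcs leaving U3: (R 0.18, G 0.03, B 0.09) on two arcs, (R 0.24, G 0.04, B 0.12)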

Probabilistic FSM
[Figure: two-state machine over S1 and S2; each arc is labelled (output symbol : probability).]
From  To   a1   a2
S1    S1   0.1  0.2
S1    S2   0.3  0.4
S2    S1   0.2  0.3
S2    S2   0.3  0.2
The question here is: "what is the most likely state sequence given the output sequence seen?"

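Developing the tree
• Start (ε): S1 = 1.0, S2 = 0.0
• After a1:
  • from S1 (1.0): to S1, 1.0 * 0.1 = 0.1; to S2, 1.0 * 0.3 = 0.3
  • from S2 (0.0): to S1, 0.0 * 0.2 = 0.0; to S2, 0.0 * 0.3 = 0.0
• After a2:
  • from S1 (0.1): to S1, 0.1 * 0.2 = 0.02; to S2, 0.1 * 0.4 = 0.04
  • from S2 (0.3): to S1, 0.3 * 0.3 = 0.09; to S2, 0.3 * 0.2 = 0.06
• Choose the winning sequence per state per iteration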

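Tree structure contd…
• Winners carried forward: S1 = 0.09, S2 = 0.06
• After a1:
  • from S1 (0.09): to S1, 0.09 * 0.1 = 0.009; to S2, 0.09 * 0.3 = 0.027
  • from S2 (0.06): to S1, 0.06 * 0.2 = 0.012; to S2, 0.06 * 0.3 = 0.018
• After a2:
  • from S1 (0.012): to S1, 0.012 * 0.2 = 0.0024; to S2, 0.012 * 0.4 = 0.0048
  • from S2 (0.027): to S1, 0.027 * 0.3 = 0.0081; to S2, 0.027 * 0.2 = 0.0054
• The problem being addressed by this tree is $S^* = \arg\max_S P(S \mid a_1 a_2 a_1 a_2, \mu)$, where a1-a2-a1-a2 is the output sequence and μ is the model or the machine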
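The tree above can be reproduced by keeping, for each state and each output symbol, only the best-scoring sequence that ends in that state. A minimal sketch follows; the arc probabilities are the ones used in the tree arithmetic, and the names `ARC` and `viterbi` are illustrative.

```python
STATES = ("S1", "S2")

# ARC[(source, target)][symbol] = P(output symbol and move source -> target)
ARC = {
    ("S1", "S1"): {"a1": 0.1, "a2": 0.2},
    ("S1", "S2"): {"a1": 0.3, "a2": 0.4},
    ("S2", "S1"): {"a1": 0.2, "a2": 0.3},
    ("S2", "S2"): {"a1": 0.3, "a2": 0.2},
}


def viterbi(outputs, start="S1"):
    """Return (probability, state path) of the most likely state sequence."""
    # best[s] = (probability, path) of the winning sequence ending in state s
    best = {s: (1.0 if s == start else 0.0, [s]) for s in STATES}
    for symbol in outputs:
        new_best = {}
        for target in STATES:
            candidates = [
                (prob * ARC[(source, target)][symbol], path + [target])
                for source, (prob, path) in best.items()
            ]
            # keep only the winner per state per iteration
            new_best[target] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])


prob, path = viterbi(["a1", "a2", "a1", "a2"])
print(path, prob)
```

For a1-a2-a1-a2 the winning leaf is the 0.0081 entry of the tree, reached through S1, S2, S1, S2, S1.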

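Viterbi Algorithm for the Urn problem (first two symbols)
• ε-transitions from S0: U1 = 0.5, U2 = 0.3, U3 = 0.2
• On R, each leaf is (probability of the parent urn) * (arc value for R):
Parent     to U1                  to U2                  to U3
U1 (0.5)   0.5 * 0.03 = 0.015     0.5 * 0.12 = 0.06*     0.5 * 0.15 = 0.075*
U2 (0.3)   0.3 * 0.06 = 0.018     0.3 * 0.02 = 0.006     0.3 * 0.02 = 0.006
U3 (0.2)   0.2 * 0.18 = 0.036*    0.2 * 0.24 = 0.048     0.2 * 0.18 = 0.036
*: winner sequences (the best-scoring leaf for each destination urn)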
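A minimal sketch of this first Viterbi step for the urn problem. The output probabilities follow from the urn counts, and the ε-transition probabilities are taken as 0.5, 0.3 and 0.2 for U1, U2 and U3. The transition matrix `TRANS` is an assumption: it is one assignment of the nine arc values to source and destination urns, since the transition table itself is not recoverable from the text. The name `viterbi_step` is illustrative. Under these assumptions the U1 to U2 arc on R carries 0.4 * 0.3 = 0.12, so its leaf is 0.5 * 0.12 = 0.06.

```python
STATES = ("U1", "U2", "U3")

PI = {"U1": 0.5, "U2": 0.3, "U3": 0.2}   # ε-transition from S0
EMIT = {                                  # from the ball counts in each urn
    "U1": {"R": 0.3, "G": 0.5, "B": 0.2},
    "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
    "U3": {"R": 0.6, "G": 0.1, "B": 0.3},
}
TRANS = {                                 # assumed assignment of the arc values
    "U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
    "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
    "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3},
}


def viterbi_step(delta, symbol):
    """One tree level: emit `symbol` from the source urn, then move to the target."""
    new_delta, back = {}, {}
    for target in STATES:
        scores = {
            source: delta[source] * EMIT[source][symbol] * TRANS[source][target]
            for source in STATES
        }
        back[target] = max(scores, key=scores.get)   # winning parent urn
        new_delta[target] = scores[back[target]]
    return new_delta, back


delta, back = viterbi_step(PI, "R")
for urn in STATES:
    print(urn, round(delta[urn], 3), "via", back[urn])
```

Repeating `viterbi_step` for each remaining symbol of RRGGBRGR, and following the stored winning parents backwards from the best final entry, yields the most likely urn sequence.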

Markov process of order > 1 (say 2)
         O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:     ε   R   R   G   G   B   R   G   R
State:   S0  S1  S2  S3  S4  S5  S6  S7  S8  S9
Same theory works:
$P(S) \cdot P(O \mid S) = P(O_0 \mid S_0)\, P(S_1 \mid S_0) \cdot [P(O_1 \mid S_1)\, P(S_2 \mid S_1 S_0)] \cdot [P(O_2 \mid S_2)\, P(S_3 \mid S_2 S_1)] \cdots [P(O_8 \mid S_8)\, P(S_9 \mid S_8 S_7)]$
• We introduce the states S0 and S9 as initial and final states respectively
• After S8 the next state is S9 with probability 1, i.e., $P(S_9 \mid S_8 S_7) = 1$
• O0 is the ε-transition

Adjustments
• The transition probability table will have tuples on rows and states on columns
• The output probability table will remain the same
• In the Viterbi tree, the Markov process will take effect from the 3rd input symbol (εRR)
• There will be 27 leaves, out of which only 9 will remain
• Sequences ending in the same tuples will be compared
  • Instead of U1, U2 and U3:
  • U1U1, U1U2, U1U3, U2U1, U2U2, U2U3, U3U1, U3U2, U3U3

Forward and Backward Probability Calculation

Forward probability F(k,i)
• Define F(k,i) = probability of being in state $S_i$ having seen $o_0 o_1 o_2 \ldots o_k$
• $F(k,i) = P(o_0 o_1 o_2 \ldots o_k, S_i)$
• With m as the length of the observed sequence:
  $P(\text{observed sequence}) = P(o_0 o_1 o_2 \ldots o_m) = \sum_{p=0}^{N} P(o_0 o_1 o_2 \ldots o_m, S_p) = \sum_{p=0}^{N} F(m, p)$

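Forward probability (contd.)
$F(k, q) = P(o_0 o_1 o_2 \ldots o_k, S_q)$
$= P(o_0 o_1 o_2 \ldots o_{k-1}, o_k, S_q)$
$= \sum_{p=0}^{N} P(o_0 o_1 o_2 \ldots o_{k-1}, S_p, o_k, S_q)$
$= \sum_{p=0}^{N} P(o_0 o_1 o_2 \ldots o_{k-1}, S_p) \cdot P(o_k, S_q \mid o_0 o_1 o_2 \ldots o_{k-1}, S_p)$
$= \sum_{p=0}^{N} F(k-1, p) \cdot P(o_k, S_q \mid S_p)$ (by the Markov assumption)
$= \sum_{p=0}^{N} F(k-1, p) \cdot P(S_p \xrightarrow{o_k} S_q)$
Observations:  O0  O1  O2  O3  …  Ok  Ok+1  …  Om-1  Om
States:        S0  S1  S2  S3  …  Sp  Sq    …  Sm    Sfinal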

Backward probability B(k,i)
• Define B(k,i) = probability of seeing $o_k o_{k+1} o_{k+2} \ldots o_m$ given that the state was $S_i$
• $B(k,i) = P(o_k o_{k+1} o_{k+2} \ldots o_m \mid S_i)$
• With m as the length of the observed sequence:
  $P(\text{observed sequence}) = P(o_0 o_1 o_2 \ldots o_m) = P(o_0 o_1 o_2 \ldots o_m \mid S_0) = B(0, 0)$

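Backward probability (contd.)
$B(k, p) = P(o_k o_{k+1} o_{k+2} \ldots o_m \mid S_p)$
$= P(o_{k+1} o_{k+2} \ldots o_m, o_k \mid S_p)$
$= \sum_{q=0}^{N} P(o_{k+1} o_{k+2} \ldots o_m, o_k, S_q \mid S_p)$
$= \sum_{q=0}^{N} P(o_k, S_q \mid S_p) \cdot P(o_{k+1} o_{k+2} \ldots o_m \mid o_k, S_q, S_p)$
$= \sum_{q=0}^{N} P(o_{k+1} o_{k+2} \ldots o_m \mid S_q) \cdot P(o_k, S_q \mid S_p)$
$= \sum_{q=0}^{N} B(k+1, q) \cdot P(S_p \xrightarrow{o_k} S_q)$
Observations:  O0  O1  O2  O3  …  Ok  Ok+1  …  Om-1  Om
States:        S0  S1  S2  S3  …  Sp  Sq    …  Sm    Sfinal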
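The two recurrences can be checked against each other on a small machine. The sketch below uses the two-state probabilistic FSM from the Viterbi example, with arc probabilities P(S_p → S_q on symbol) taken from that tree's arithmetic. It assumes the machine starts in S1 with probability 1, and the names `forward` and `backward` are illustrative. Both procedures should return the same likelihood for a1-a2-a1-a2.

```python
STATES = ("S1", "S2")

# ARC[(p, q)][symbol] = P(S_p -> S_q while producing symbol)
ARC = {
    ("S1", "S1"): {"a1": 0.1, "a2": 0.2},
    ("S1", "S2"): {"a1": 0.3, "a2": 0.4},
    ("S2", "S1"): {"a1": 0.2, "a2": 0.3},
    ("S2", "S2"): {"a1": 0.3, "a2": 0.2},
}


def forward(outputs, start="S1"):
    """Sum F(m, p) over p, with F(k, q) = sum_p F(k-1, p) * P(S_p -> S_q on o_k)."""
    f = {s: 1.0 if s == start else 0.0 for s in STATES}
    for symbol in outputs:
        f = {q: sum(f[p] * ARC[(p, q)][symbol] for p in STATES) for q in STATES}
    return sum(f.values())


def backward(outputs, start="S1"):
    """B at the start state, with B(k, p) = sum_q P(S_p -> S_q on o_k) * B(k+1, q)."""
    b = {s: 1.0 for s in STATES}          # nothing left to observe after the end
    for symbol in reversed(outputs):
        b = {p: sum(ARC[(p, q)][symbol] * b[q] for q in STATES) for p in STATES}
    return b[start]


sequence = ["a1", "a2", "a1", "a2"]
print(forward(sequence), backward(sequence))
```

Both calls give 0.0501, which is larger than the best single path (0.0081) because the likelihood sums over all state sequences instead of keeping only the winner.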

Back to PCFG

Interesting Probabilities
[Figure: N1 spans the sentence "The gunman sprayed the building with bullets" (word positions 1 to 7), with an NP over "the building" (positions 4 and 5).]
• Inside probabilities: what is the probability of having an NP at this position such that it will derive "the building"?
• Outside probabilities: what is the probability of starting from N1 and deriving "The gunman sprayed", an NP and "with bullets"?

Interesting Probabilities
• Random variables to be considered
  • The non-terminal being expanded, e.g., NP
  • The word-span covered by the non-terminal, e.g., (4,5) refers to the words "the building"
• While calculating probabilities, consider:
  • The rule to be used for expansion, e.g., NP → DT NN
  • The probabilities associated with the RHS non-terminals, e.g., the DT subtree's inside/outside probabilities and the NN subtree's inside/outside probabilities

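Outside Probability
• $\alpha_j(p,q)$: the probability of beginning with $N^1$ and generating the non-terminal $N^j_{pq}$ and all the words outside $w_p \ldots w_q$
• $\alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} \mid G)$
[Figure: $N^1$ dominates the whole sentence $w_1 \ldots w_{p-1}\; w_p \ldots w_q\; w_{q+1} \ldots w_m$; $N^j$ dominates $w_p \ldots w_q$; α covers the words outside that span.]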

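Inside Probabilities
• $\beta_j(p,q)$: the probability of generating the words $w_p \ldots w_q$ starting with the non-terminal $N^j_{pq}$
• $\beta_j(p,q) = P(w_{pq} \mid N^j_{pq}, G)$
[Figure: $N^1$ dominates the whole sentence $w_1 \ldots w_{p-1}\; w_p \ldots w_q\; w_{q+1} \ldots w_m$; $N^j$ dominates $w_p \ldots w_q$; β covers the words inside that span, α the words outside it.]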

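Outside & Inside Probabilities: example
[Figure: N1 over the sentence "The gunman sprayed the building with bullets" (word positions 1 to 7), with an NP over "the building" (positions 4 and 5). The inside probability of this NP covers "the building"; its outside probability covers "The gunman sprayed" and "with bullets".]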

Calculating Inside probabilities $\beta_j(p,q)$
Base case: $\beta_j(k,k) = P(N^j \to w_k)$
• The base case is used for rules which derive the words or terminals directly
• E.g., suppose $N^j$ = NN is being considered and NN → building is one of the rules, with probability 0.5; then $\beta_{NN}(5,5) = 0.5$

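Induction Step: Assuming Grammar in Chomsky Normal Form
Induction step: $\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s) \cdot \beta_r(p,d) \cdot \beta_s(d+1,q)$
• Consider different splits of the words, indicated by d. E.g., for "the huge building" the split can be made at d = 2 or d = 3
• Consider different non-terminals to be used in the rule: NP → DT NN and NP → DT NNS are available options
• Consider summation over all these
[Figure: $N^j$ expands to $N^r$ and $N^s$; $N^r$ covers $w_p \ldots w_d$ and $N^s$ covers $w_{d+1} \ldots w_q$.]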

The Bottom-Up Approach
• The idea of induction
• Consider "the gunman"
• Base cases: apply unary rules
  • DT → the, Prob = 1.0
  • NN → gunman, Prob = 0.5
• Induction: probability that an NP covers these 2 words
  = P(NP → DT NN) * P(DT deriving the word "the") * P(NN deriving the word "gunman")
  = 0.5 * 1.0 * 0.5 = 0.25
[NP0.5 [DT1.0 The] [NN0.5 gunman]]

Parse Triangle
• A parse triangle is constructed for calculating $\beta_j(p,q)$
• Probability of a sentence using $\beta_j(p,q)$: $P(w_{1m} \mid G) = P(N^1 \Rightarrow^{*} w_{1m} \mid G) = \beta_1(1,m)$
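The parse triangle can be filled bottom-up, span by span. The sketch below does this for the gunman grammar. Two points are assumptions rather than part of the slides: the lexical rule P → with (1.0) is taken from the P1.0 node in the example parses, and because NP → NNS is a unary rule (so the grammar is not strictly in Chomsky Normal Form), a single unary pass is applied after each cell is filled. The names `inside` and `apply_unary` are illustrative.

```python
from collections import defaultdict

LEXICAL = {  # (non-terminal, word) -> probability
    ("DT", "the"): 1.0, ("NN", "gunman"): 0.5, ("NN", "building"): 0.5,
    ("VBD", "sprayed"): 1.0, ("NNS", "bullets"): 1.0,
    ("P", "with"): 1.0,  # assumption: taken from the P1.0 node in the parses
}
BINARY = {   # (parent, left child, right child) -> probability
    ("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 0.5, ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0, ("VP", "VP", "PP"): 0.6, ("VP", "VBD", "NP"): 0.4,
}
UNARY = {("NP", "NNS"): 0.3}  # not CNF; handled by one extra pass per cell


def inside(words):
    """Return beta[(non-terminal, p, q)] for 1-based word positions p..q."""
    m = len(words)
    beta = defaultdict(float)

    def apply_unary(p, q):
        for (parent, child), prob in UNARY.items():
            beta[(parent, p, q)] += prob * beta[(child, p, q)]

    # Base case: rules that derive a word directly.
    for k, word in enumerate(words, start=1):
        for (nonterminal, terminal), prob in LEXICAL.items():
            if terminal == word.lower():
                beta[(nonterminal, k, k)] += prob
        apply_unary(k, k)

    # Induction: sum over rules and over split points d.
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (parent, left, right), prob in BINARY.items():
                for d in range(p, q):
                    beta[(parent, p, q)] += (
                        prob * beta[(left, p, d)] * beta[(right, d + 1, q)]
                    )
            apply_unary(p, q)
    return beta


beta = inside("The gunman sprayed the building with bullets".split())
print(beta[("NP", 4, 5)], beta[("S", 1, 7)])
```

The entry for S over positions 1 to 7 is 0.006, which is the sum of the probabilities of the two parses of the sentence (0.0045 + 0.0015), and the entry for NP over positions 4 and 5 is 0.25, by the same arithmetic as the "the gunman" example.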
