CS626-460: Language Technology for the Web/Natural Language Processing
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Beta probabilities; parser evaluation criteria
Inside and Outside probabilities and their usage
Inside Probability
$\beta_j(k,l) = P(w_{k,l} \mid N^j)$
$\beta_j(k,l)$ gives the probability that $N^j$ yields $w_{k,l}$.
[Figure: subtree rooted at $N^j$ spanning the words $w_k \dots w_l$]
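Outside Probability
$\alpha_j(k,l) = P(w_{1,k-1}, N^j_{k,l}, w_{l+1,m})$, where $w_{1,m}$ is the sentence.
This is the probability that $N^j$ spans $w_{k,l}$ and is surrounded by $w_{1,k-1}$ on the left and $w_{l+1,m}$ on the right.
To calculate the probability of a sentence: $P(w_{1,m}) = \beta_1(1,m)$, where $N^1$ is the start symbol.
[Figure: tree over $w_1 \dots w_m$ with $N^j$ spanning $w_k \dots w_l$]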
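Recursive calculation of β
Base case: $\beta_j(k,k) = P(w_{k,k} \mid N^j) = P(N^j \to w_k)$
Assume the grammar to be in Chomsky Normal Form (CNF).
Recursive case, by marginalization over the child non-terminals $N^p$, $N^q$ and the split point $m$ (here $m$ is a split position with $k \le m < l$, not the sentence length):
$\beta_j(k,l) = P(w_{k,l} \mid N^j) = \sum_{p,q,m} P(w_{k,m}, N^p_{k,m}, w_{m+1,l}, N^q_{m+1,l} \mid N^j)$
[Figure: $N^j$ expanding to $N^p$ over $w_k \dots w_m$ and $N^q$ over $w_{m+1} \dots w_l$]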
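$= \sum_{p,q,m} P(N^p_{k,m}, N^q_{m+1,l} \mid N^j) \cdot P(w_{k,m} \mid N^p_{k,m}, N^q_{m+1,l}, N^j) \cdot P(w_{m+1,l} \mid N^p_{k,m}, N^q_{m+1,l}, N^j, w_{k,m})$ (chain rule)
$= \sum_{p,q,m} P(N^j \to N^p N^q) \cdot P(w_{k,m} \mid N^p_{k,m}) \cdot P(w_{m+1,l} \mid N^q_{m+1,l})$ (independence assumptions of the PCFG)
$= \sum_{p,q,m} P(N^j \to N^p N^q) \cdot \beta_p(k,m) \cdot \beta_q(m+1,l)$ (definition of β)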
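The recursion above can be sketched as a bottom-up dynamic program over span lengths. This is a minimal illustration, not code from the lecture: the function name `inside_probabilities`, the dictionary encoding of rules, and the toy grammar and sentence are all assumptions made for the example. Positions are 1-based and inclusive, as on the slides, and the start symbol is written `S` in place of $N^1$.

```python
from collections import defaultdict


def inside_probabilities(words, unary, binary):
    """Return beta[(j, k, l)] = P(w_k..w_l | N^j) for a PCFG in CNF.

    unary:  dict (nonterminal, word) -> P(N^j -> word)
    binary: dict (parent, left, right) -> P(N^j -> N^p N^q)
    (This rule encoding is a hypothetical choice for the sketch.)
    """
    n = len(words)
    beta = defaultdict(float)

    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for k, word in enumerate(words, start=1):
        for (nt, w), prob in unary.items():
            if w == word:
                beta[(nt, k, k)] += prob

    # Recursive case: sum over rules N^j -> N^p N^q and split points m,
    # filling shorter spans before the longer spans that depend on them.
    for span in range(2, n + 1):
        for k in range(1, n - span + 2):
            l = k + span - 1
            for (parent, left, right), prob in binary.items():
                for m in range(k, l):  # k <= m < l
                    beta[(parent, k, l)] += (
                        prob * beta[(left, k, m)] * beta[(right, m + 1, l)]
                    )
    return beta


# Toy CNF grammar (illustrative assumption, not taken from the slides).
binary_rules = {
    ("S", "NP", "VP"): 1.0,
    ("PP", "P", "NP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
}
unary_rules = {
    ("P", "with"): 1.0,
    ("V", "saw"): 1.0,
    ("NP", "astronomers"): 0.1,
    ("NP", "ears"): 0.18,
    ("NP", "saw"): 0.04,
    ("NP", "stars"): 0.18,
    ("NP", "telescopes"): 0.1,
}

sentence = "astronomers saw stars with ears".split()
beta = inside_probabilities(sentence, unary_rules, binary_rules)

# Sentence probability: inside probability of the start symbol over the whole span.
sentence_prob = beta[("S", 1, len(sentence))]
print(f"{sentence_prob:.7f}")  # prints 0.0015876
```

The toy sentence has two parses under this grammar (the prepositional phrase attaches either to the noun phrase or to the verb phrase), with probabilities 0.0009072 and 0.0006804. The inside probability of `S` over the full span is their sum, which is why it equals the sentence probability.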
Assignment 3
Study and note the relative merits of the Charniak Parser, Collins Parser, Stanford Parser, and RASP (Robust, Accurate, Statistical Parser).
Criteria
• Robustness to ungrammaticality
• Ranking in case of multiple parses
• Time taken
• How efficiently embedding is handled
  • Example: The cat that killed the rat that stole the milk that spilled on the floor that was slippery escaped
• How effectively multiple POS tags are handled
  • i.e. if the words can take numerous POS tags, does the parser still work
  • Can it handle repeated words with changing POS
  • Example: Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes (compare: Black cows brown cows cow cow white cows)
• Length of the sentence
[Figure: parse trees for "Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes", with nodes labelled S, S', NP, VP, N and V]