Statistical NLP Course for Master in Computational Linguistics, 2nd Year, 2013-2014 Diana Trandabat
Intro to probabilities • Probability deals with prediction: • Which word will follow in this ....? • How can parses for a sentence be ordered? • Which meaning is more likely? • Which grammar is more linguistically plausible? • Given the phrase “more lies ahead”, how likely is it that “lies” is a noun? • Given “Le chien est noir”, how likely is it that the correct translation is “The dog is black”? • Any rational decision can be described probabilistically.
Notations • Experiment (or trial) – a repeatable process by which observations are made • e.g. tossing 3 coins • We observe a basic outcome from the sample space, Ω (the set of all possible basic outcomes) • Examples of sample spaces: • one coin toss, sample space Ω = { H, T } • three coin tosses, Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} • part-of-speech of a word, Ω = {N, V, Adj, etc.} • next word in a Shakespeare play, |Ω| = size of the vocabulary • number of words in your MSc thesis, Ω = { 0, 1, … ∞ }
Notation • An event A is a set of basic outcomes, i.e., a subset of the sample space Ω. Example: – Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} – e.g. basic outcome = THH – e.g. event “has exactly 2 H’s”: A = {THH, HHT, HTH} – A = Ω is the certain event, P(A=Ω) = 1 – A = ∅ is the impossible event, P(A=∅) = 0 – For “not A”, we write Ā
Intro to probabilities • The true probability of an event is hard to compute. • It is easy to compute an estimate of the probability, written p̂(x). • As the number of observations |X| → ∞, p̂(x) → P(x).
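For instance, a relative-frequency estimate p̂(x) = count(x) / N can be computed as below (a minimal Python sketch; the toy corpus is invented for illustration):

```python
from collections import Counter

# Relative-frequency estimation: p_hat(x) = count(x) / N.
# The corpus below is a hypothetical toy example, not from the lecture.
corpus = "the dog saw the cat the dog barked".split()

counts = Counter(corpus)
total = len(corpus)
p_hat = {word: count / total for word, count in counts.items()}

print(p_hat["the"])  # 3/8 = 0.375; the estimate approaches P(x) as the corpus grows
```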
Intro to probabilities • “A coin is tossed 3 times. What is the likelihood of 2 heads?” – Experiment: toss a coin three times – Sample space Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} – Event: basic outcomes that have exactly 2 H’s: A = {THH, HTH, HHT} – The likelihood of 2 heads is 3 out of 8 possible outcomes: P(A) = 3/8
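The slide’s computation can be verified by enumerating the sample space (a minimal sketch using only the standard library):

```python
from itertools import product

# Enumerate the sample space of three coin tosses and count the outcomes
# with exactly two heads.
omega = list(product("HT", repeat=3))          # 8 basic outcomes
event_a = [o for o in omega if o.count("H") == 2]

print(len(event_a), "/", len(omega))           # 3 / 8
print(len(event_a) / len(omega))               # P(A) = 0.375
```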
Probability distribution • A probability distribution is an assignment of probabilities to a set of outcomes. • A uniform distribution assigns the same probability to all outcomes (e.g. a fair coin). • A Gaussian distribution assigns a bell curve over outcomes. • Many others exist. • Uniform and Gaussian distributions are popular in SNLP.
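As a quick illustration, Python’s standard library can sample from both kinds of distribution (a sketch; the parameters are arbitrary):

```python
import random

# Uniform distribution over {H, T}: every outcome has probability 1/2 (a fair coin).
print(random.choice("HT"))

# Gaussian (normal) distribution: outcomes cluster around the mean in a bell curve.
print(random.gauss(0.0, 1.0))   # mean 0, standard deviation 1
```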
Independent events • Two events are independent if: p(a,b) = p(a)·p(b) • Consider a fair die. Intuitively, each side (1, 2, 3, 4, 5, 6) has a 1/6 chance of appearing. • Consider the event X “the number on the die is divisible by 2” and the event Y “the number is divisible by 3”. • X = {2, 4, 6}, Y = {3, 6} • p(X) = p(2)+p(4)+p(6) = 1/6+1/6+1/6 = 3/6 = 1/2 • p(Y) = p(3)+p(6) = 2/6 = 1/3 • p(X,Y) = p(6) = 1/6 = 1/2 · 1/3 = p(X)·p(Y) • ==> X and Y are independent (a quick check appears in the sketch below)
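The die example can be checked exactly with rational arithmetic (a minimal sketch using Python’s fractions module, with events modeled as sets of die faces):

```python
from fractions import Fraction

# Verify the slide's die example with exact fractions.
omega = range(1, 7)
X = {n for n in omega if n % 2 == 0}   # divisible by 2 -> {2, 4, 6}
Y = {n for n in omega if n % 3 == 0}   # divisible by 3 -> {3, 6}

def p(event):
    return Fraction(len(event), 6)     # each face has probability 1/6

print(p(X), p(Y), p(X & Y))            # 1/2, 1/3, 1/6
print(p(X & Y) == p(X) * p(Y))         # True -> X and Y are independent
```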
Conditioned events • Events that are not independent are called conditioned (dependent) events. • p(X|Y) is “the probability of X given that event Y occurred”. • p(X|Y) = p(X,Y) / p(Y) • p(X) is the a priori probability (the prior). • p(X|Y) is the posterior probability.
Are X and Y independent? p(X) = 1/2, p(Y) = 1/3, p(X,Y) = 1/6, so p(X|Y) = p(X,Y)/p(Y) = 1/2 = p(X) ==> independent. • Consider Z the event “the number on the die is divisible by 4”. Are X and Z independent? p(Z) = p(4) = 1/6, p(X,Z) = 1/6, p(X|Z) = p(X,Z)/p(Z) = (1/6)/(1/6) = 1 ≠ 1/2 = p(X) ==> not independent.
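The same die example, computing p(X|Z) directly from the definition (a sketch in the same style as above):

```python
from fractions import Fraction

# Conditional probability p(X|Z) = p(X,Z) / p(Z) for the slide's die example.
omega = range(1, 7)
X = {n for n in omega if n % 2 == 0}   # {2, 4, 6}
Z = {n for n in omega if n % 4 == 0}   # {4}

def p(event):
    return Fraction(len(event), 6)

p_X_given_Z = p(X & Z) / p(Z)
print(p_X_given_Z)   # 1, which differs from p(X) = 1/2, so X and Z are dependent
```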
Bayes’ Theorem • Bayes’ Theorem lets us swap the order of dependence between events. • We saw that p(X|Y) = p(X,Y) / p(Y). • Bayes’ Theorem: p(X|Y) = p(Y|X)·p(X) / p(Y)
Example • S: stiff neck, M: meningitis • P(S|M) = 0.5, P(M) = 1/50,000, P(S) = 1/20 • I have a stiff neck, should I worry? • P(M|S) = P(S|M)·P(M) / P(S) = 0.5 · (1/50,000) / (1/20) = 1/5,000 = 0.0002, so probably not.
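Plugging the slide’s numbers into Bayes’ Theorem (a minimal sketch):

```python
# Bayes' theorem applied to the meningitis example:
# P(M|S) = P(S|M) * P(M) / P(S)
p_s_given_m = 0.5
p_m = 1 / 50_000
p_s = 1 / 20

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)   # 0.0002, i.e. about 1 in 5,000 -- probably no need to worry
```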
Other useful relations • Law of total probability: p(x) = Σ_{y∈Y} p(x|y)·p(y), or equivalently p(x) = Σ_{y∈Y} p(x,y) • Chain rule: p(x1,x2,…,xn) = p(x1)·p(x2|x1)·p(x3|x1,x2)·…·p(xn|x1,x2,…,xn-1) • The proof is easy, by successive reductions. Consider an event y standing for the joint occurrence of the events x1,x2,…,xn-1: p(x1,x2,…,xn) = p(y,xn) = p(y)·p(xn|y) = p(x1,x2,…,xn-1)·p(xn|x1,x2,…,xn-1). Similarly, for an event z standing for x1,x2,…,xn-2: p(x1,x2,…,xn-1) = p(z,xn-1) = p(z)·p(xn-1|z) = p(x1,x2,…,xn-2)·p(xn-1|x1,x2,…,xn-2), and so on, giving p(x1,x2,…,xn) = p(x1)·p(x2|x1)·p(x3|x1,x2)·…·p(xn|x1,x2,…,xn-1). • Here p(x1) is the prior, and the conditional factors correspond to bigram, trigram, …, n-gram probabilities.
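A sketch of how the chain rule is used in practice: an n-gram model truncates each conditional factor to the preceding word(s). Below, a bigram (first-order Markov) version with hypothetical probability values, for illustration only:

```python
# Bigram approximation of the chain rule:
# p(w1..wn) ~= p(w1) * product of p(wi | w(i-1)).
# The probabilities below are made up for this example.
p_first = {"the": 0.2}
p_bigram = {("the", "dog"): 0.01, ("dog", "barked"): 0.05}

def sentence_prob(words):
    prob = p_first[words[0]]                 # prior on the first word
    for prev, cur in zip(words, words[1:]):  # one conditional factor per bigram
        prob *= p_bigram[(prev, cur)]
    return prob

print(sentence_prob(["the", "dog", "barked"]))  # 0.2 * 0.01 * 0.05 = 1e-04
```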
Objections • People don’t compute probabilities. • Why would computers? • Or do they? • “John went to …” (candidate continuations: the, market, go, red, if, number – some feel far more likely than others)
Objections • “Statistics only counts words and co-occurrences.” • These are two different concepts: a statistical model and a statistical method. • The first does not need the second. • A person who uses intuition to reason is using a statistical model without statistical methods. • The objections refer mainly to the accuracy of statistical models.