CSA4050: Advanced Topics in NLP Probability I • Experiments/Outcomes/Events • Independence/Dependence • Bayes’ Rule • Conditional Probability/Chain Rule CSA4050: Crash Concepts in Probability
Acknowledgement • Much of this material is based on material by Mary Dalrymple, King’s College London
Experiment, Basic Outcome, Sample Space • Probability theory is founded upon the notion of an experiment. • An experiment is a situation which can have one or more different basic outcomes. • Example: if we throw a die, there are six possible basic outcomes. • A Sample Space Ω is the set of all possible basic outcomes. For example, • If we toss a coin, Ω = {H,T} • If we toss a coin twice, Ω = {HT,TH,TT,HH} • If we throw a die, Ω = {1,2,3,4,5,6}
Event • An Event A ⊆ Ω is a set of basic outcomes, e.g. • tossing two heads, {HH} • throwing a 6, {6} • getting either a 2 or a 4, {2,4}. • Ω itself is the certain event, whilst the empty set { } is the impossible event. • The Event Space (the set of all subsets of Ω) ≠ the Sample Space
Probability distribution • A probability distribution for an experiment is a function that assigns a number (or probability) between 0 and 1 to each basic outcome, such that the sum of all the probabilities is 1. • The probability p(E) of an event E is the sum of the probabilities of all the basic outcomes in E. • A uniform distribution is one in which each basic outcome is equally likely.
Probability of an Event: die example • Sample space = set of basic outcomes = {1,2,3,4,5,6} • If the die is not loaded, the distribution is uniform. • Thus each basic outcome, e.g. {6} (throwing a six), is assigned the same probability, 1/6. • So p({3,6}) = p({3}) + p({6}) = 2/6 = 1/3
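The die example can be sketched in Python. This is a minimal illustration (the names `sample_space`, `dist`, and `p` are ours, not from the slides):

```python
from fractions import Fraction

# Sample space for one throw of a fair die
sample_space = {1, 2, 3, 4, 5, 6}

# Uniform distribution: each basic outcome gets probability 1/6
dist = {outcome: Fraction(1, 6) for outcome in sample_space}

def p(event):
    """Probability of an event = sum of the probabilities of its basic outcomes."""
    return sum(dist[outcome] for outcome in event)

print(p({3, 6}))  # prints 1/2 + ... no: 1/6 + 1/6, i.e. prints 1/3
```

Exact `Fraction` arithmetic avoids floating-point rounding; p(Ω) sums to exactly 1.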
Estimating Probability • Repeat the experiment T times and count the frequency of E. • Estimated p(E) = count(E)/T • This can be done over m runs, yielding estimates p1(E),...,pm(E). • The best estimate is a (possibly weighted) average of the individual pi(E)
Tossing a coin 3 times • Ω = {HHH,HHT,HTH,HTT,THH,THT,TTH,TTT} • Cases with exactly 2 tails: E = {HTT,THT,TTH} • Each experiment i = 1000 cases (3000 tosses). • c1(E) = 386, p1(E) = .386 • c2(E) = 375, p2(E) = .375 • pmean(E) = (.386+.375)/2 = .3805 • Assuming a uniform distribution, p(E) = 3/8 = .375
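The repeated experiment can be simulated (a sketch; the helper name `run_experiment` and the seeds are our assumptions, so the counts will differ from the 386/375 quoted above):

```python
import random

def run_experiment(n_cases=1000, seed=0):
    """Toss a fair coin 3 times per case; return the relative
    frequency of cases with exactly 2 tails."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_cases):
        tosses = [rng.choice("HT") for _ in range(3)]
        if tosses.count("T") == 2:
            count += 1
    return count / n_cases

# Two runs, then the (unweighted) average, as on the slide
estimates = [run_experiment(seed=s) for s in range(2)]
mean_estimate = sum(estimates) / len(estimates)
# mean_estimate should hover around the theoretical value 3/8 = .375
```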
Word Probability • General Problem: What is the probability of the next word/character/phoneme in a sequence, given the first N words/characters/phonemes? • To approach this problem we study an experiment whose sample space is the set of possible words. • N.B. The same approach could be used to study the probability of the next character or phoneme.
Word Probability • Approximation 1: all words are equally probable • Then the probability of each word = 1/N, where N is the number of word types. • But all words are not equally probable • Approximation 2: the probability of each word is its frequency of occurrence in a corpus.
Word Probability • Estimate p(w), the probability of word w: • Given corpus C, p(w) ≈ count(w)/size(C) • Example • Brown corpus: 1,000,000 tokens • the: 69,971 tokens • Probability of the: 69,971/1,000,000 ≈ .07 • rabbit: 11 tokens • Probability of rabbit: 11/1,000,000 ≈ .00001 • Conclusion: the next word is most likely to be the • Is this correct?
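Relative-frequency estimation can be sketched as follows; the toy corpus is a stand-in, not the Brown corpus, so the numbers differ from those above:

```python
from collections import Counter

def unigram_probs(tokens):
    """Estimate p(w) ≈ count(w) / size(C): relative frequency in the corpus."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}

# Toy 9-token corpus (hypothetical, for illustration only)
corpus = "the cat sat on the mat the rabbit ran".split()
probs = unigram_probs(corpus)
print(probs["the"])     # 3/9 ≈ 0.333
print(probs["rabbit"])  # 1/9 ≈ 0.111
```

As in the Brown figures above, the estimated probabilities of all word types sum to 1.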
A counter example • Given the context: Look at the cute ... • is the more likely than rabbit? • Context matters in determining what word comes next. • What is the probability of the next word in a sequence, given the first N words?
Independent Events • [Venn diagram: events A (eggs) and B (Monday) within the sample space]
Sample Space
(eggs,mon) (cereal,mon) (nothing,mon)
(eggs,tue) (cereal,tue) (nothing,tue)
(eggs,wed) (cereal,wed) (nothing,wed)
(eggs,thu) (cereal,thu) (nothing,thu)
(eggs,fri) (cereal,fri) (nothing,fri)
(eggs,sat) (cereal,sat) (nothing,sat)
(eggs,sun) (cereal,sun) (nothing,sun)
Independent Events • Two events, A and B, are independent if the fact that A occurs does not affect the probability of B occurring. • When two events, A and B, are independent, the probability of both occurring p(A,B) is the product of the prior probabilities of each, i.e. p(A,B) = p(A) · p(B)
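Using the breakfast-by-day sample space above, and assuming all 21 outcomes are equally likely (an assumption not stated on the slide), the product rule can be checked numerically:

```python
from fractions import Fraction
from itertools import product

meals = ["eggs", "cereal", "nothing"]
days = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
omega = list(product(meals, days))  # 21 outcomes, assumed equally likely

def p(event):
    """Uniform distribution: |event| / |omega|."""
    return Fraction(len(event), len(omega))

A = [o for o in omega if o[0] == "eggs"]  # event A: breakfast is eggs
B = [o for o in omega if o[1] == "mon"]   # event B: day is Monday
AB = [o for o in A if o in B]             # joint event: eggs on Monday

# p(A ∩ B) = 1/21 = 1/3 · 1/7 = p(A) · p(B), so A and B are independent
assert p(AB) == p(A) * p(B)
```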
Dependent Events • Two events, A and B, are dependent if the occurrence of one affects the probability of the occurrence of the other.
Dependent Events • [Venn diagram: overlapping events A and B within the sample space]
Conditional Probability • The conditional probability of an event A given that event B has already occurred is written p(A|B) • In general, p(A|B) ≠ p(B|A)
Dependent Events: p(A|B) ≠ p(B|A) • [Venn diagram: overlapping events A and B within the sample space]
Example Dependencies • Consider the fair die example with • A = outcome divisible by 2 • B = outcome divisible by 3 • C = outcome divisible by 4 • p(A|B) = p(A ∩ B)/p(B) = (1/6)/(1/3) = 1/2 • p(A|C) = p(A ∩ C)/p(C) = (1/6)/(1/6) = 1
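These conditional probabilities can be checked by enumerating the die's sample space (a sketch; the helper name `cond` is ours for p(A|B)):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair die, uniform distribution

def p(event):
    return Fraction(len(event & omega), len(omega))

def cond(a, b):
    """p(A|B) = p(A ∩ B) / p(B)."""
    return p(a & b) / p(b)

A = {n for n in omega if n % 2 == 0}  # divisible by 2 -> {2, 4, 6}
B = {n for n in omega if n % 3 == 0}  # divisible by 3 -> {3, 6}
C = {n for n in omega if n % 4 == 0}  # divisible by 4 -> {4}

print(cond(A, B))  # prints 1/2
print(cond(A, C))  # prints 1
```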
Conditional Probability • Intuitively, after B has occurred, event A is replaced by A ∩ B, the sample space Ω is replaced by B, and probabilities are renormalised accordingly • The conditional probability of an event A given that B has occurred (p(B) > 0) is thus given by p(A|B) = p(A ∩ B)/p(B). • If A and B are independent, p(A ∩ B) = p(A) · p(B), so p(A|B) = p(A) · p(B)/p(B) = p(A).
Bayesian Inversion • For both A and B to occur, either A must occur first, then B, or vice versa. We get the following possibilities: p(A|B) = p(A ∩ B)/p(B) and p(B|A) = p(A ∩ B)/p(A) • Hence p(A|B) p(B) = p(B|A) p(A) • We can thus express p(A|B) in terms of p(B|A): • p(A|B) = p(B|A) p(A)/p(B) • This equivalence, known as Bayes’ Theorem, is useful when one or other quantity is difficult to determine
Bayes’ Theorem • p(B|A) = p(B ∩ A)/p(A) = p(A|B) p(B)/p(A) • The denominator p(A) can be ignored if we are only interested in which event out of some set is most likely. • Typically we are interested in the value of B that best explains an observation A, i.e. • arg maxB p(A|B) p(B)/p(A) = arg maxB p(A|B) p(B)
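Bayes' Theorem can be verified on the die events A (divisible by 2) and B (divisible by 3) from the earlier slide (a minimal sketch):

```python
from fractions import Fraction

omega = frozenset({1, 2, 3, 4, 5, 6})  # fair die, uniform distribution

def p(e):
    return Fraction(len(e), len(omega))

A = frozenset({2, 4, 6})  # outcome divisible by 2
B = frozenset({3, 6})     # outcome divisible by 3

p_a_given_b = p(A & B) / p(B)            # direct definition: 1/2
# Bayes' Theorem: invert p(A|B) to obtain p(B|A)
p_b_given_a = p_a_given_b * p(B) / p(A)  # (1/2)(1/3)/(1/2) = 1/3

# The inverted value matches the direct definition p(B ∩ A)/p(A)
assert p_b_given_a == p(A & B) / p(A)
```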
The Chain Rule • We can extend the definition of conditional probability to more than two events: • p(A1 ∩ ... ∩ An) = p(A1) · p(A2|A1) · p(A3|A1 ∩ A2) · ... · p(An|A1 ∩ ... ∩ An−1) • The chain rule allows us to talk about the probability of sequences of events p(A1,...,An).
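The chain rule over a word sequence can be sketched with relative-frequency estimates from a toy corpus (the corpus and the helper name `seq_prob` are our illustrations, not from the slides):

```python
def seq_prob(words, corpus):
    """Chain rule estimate: p(w1..wn) = p(w1) · p(w2|w1) · ... · p(wn|w1..wn-1),
    with each factor estimated from prefix counts in the corpus."""
    def count(seq):
        n = len(seq)
        return sum(1 for i in range(len(corpus) - n + 1)
                   if corpus[i:i + n] == seq)

    prob = count(words[:1]) / len(corpus)        # p(w1)
    for i in range(1, len(words)):
        prefix, full = words[:i], words[:i + 1]
        prob *= count(full) / count(prefix)      # p(wi | w1..wi-1)
    return prob

# Toy 9-token corpus: "the" occurs 3 times, "the cat" occurs 2 times
corpus = "the cat sat on the mat the cat ran".split()
print(seq_prob(["the", "cat"], corpus))  # (3/9) · (2/3) ≈ 2/9
```

Note that the conditional factors telescope, so the chain-rule product equals the direct relative frequency of the whole sequence; the rule becomes useful once each factor is approximated independently (e.g. by n-gram models).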