Probability (Ch. 13. Uncertainty)
Sept. 6, 2004. Jahwan Kim, AIPR Lab, Div. of CS, KAIST.
Legend
• red words: keywords
• gray words: technically more advanced material. You need not pay attention to it unless you already know what it means. This does not mean it is less important.
Contents
We will review the basics of probability in detail.
• Why Probability?
  • Uncertainty, Desiderata, Dutch Book Argument
• Axioms of Probability
  • What’s wrong with uncountable sets?
  • For whom the probability tolls: sigma-algebra
  • Kolmogorov’s axioms
• Description of Probability
  • Probability Distribution Function
  • Random Variables
Uncertainty
• Logic does not fit everywhere:
  • Laziness: there may be too many cases and causal relations to list.
  • Theoretical ignorance: there may be no complete theory.
  • Practical ignorance: it may be practically impossible to obtain all the information needed for a conclusion.
• Example: Toothache $\Rightarrow$ cavity?
Probability: Measuring Uncertainty
• Example: “There is an 80% chance” that the patient has a cavity if he/she has a toothache.
• We want:
  • to represent the degree of our belief numerically,
  • and rules to manipulate these numbers.
• To be distinguished from fuzzy logic, which measures degree of truth.
Probability: Desiderata
• We’d like to assign a number $b(x)$ to a proposition $x$, so that
  • $0 \le b(x) \le 1$ (normalized value),
  • $b(x)=0$ when $x$ is definitely not true, and $b(x)=1$ when $x$ is definitely true,
  • $b(x|y)$ represents the degree of belief (plausibility) that $x$ is true when we know $y$ is true.
Probability: Desiderata
• Cox Axioms
  • Degrees of belief are represented by real numbers.
  • Qualitative correspondence with common sense.
  • Consistency:
    • If a conclusion can be reached in more than one way, every way should lead to the same answer.
    • All relevant evidence must be considered.
    • Equivalent states of knowledge are represented by equivalent plausibility assignments.
• Conclusion: The belief function $b(x)$ must satisfy the axioms of probability. (Dutch Book argument)
Dutch Book Argument
• Suppose you’re willing to accept bets with odds proportional to $b(x)$; that is, when $b(x)=0.9$, you accept the bet
  • Win at least 1 dollar if $x$ is true,
  • Lose 9 dollars if $x$ is false.
• If your belief function $b(x)$ does not satisfy the axioms of probability, there exists a set of simultaneous bets (called a Dutch Book) which you accept and on which you are bound to lose money, regardless of the outcome.
• The only way to guard against Dutch Books is to ensure that $b(x)$ follows the axioms of probability.
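A minimal sketch of a Dutch Book, under assumptions that are not in the slides: the agent holds the incoherent beliefs b(x) = 0.9 and b(not x) = 0.9, and every bet has a total stake of 10 (the `total` parameter), so that a belief of 0.9 gives the "win 1, lose 9" bet described above. The function names are illustrative only.

```python
from fractions import Fraction

def bet_payoff(belief, event_true, total=10):
    """Payoff of one bet accepted at odds proportional to `belief`.

    With belief 9/10 and total 10 (an assumed stake), the bettor wins 1
    if the event is true and loses 9 if it is false.
    """
    win = (1 - belief) * total
    loss = belief * total
    return win if event_true else -loss

def net_payoff(b_x, b_not_x, x_is_true):
    """Total payoff of holding one bet on x and one bet on not-x."""
    return bet_payoff(b_x, x_is_true) + bet_payoff(b_not_x, not x_is_true)

# Incoherent beliefs: b(x) + b(not x) = 9/5 instead of 1.
incoherent = [net_payoff(Fraction(9, 10), Fraction(9, 10), t) for t in (True, False)]
# Coherent beliefs: b(x) + b(not x) = 1.
coherent = [net_payoff(Fraction(9, 10), Fraction(1, 10), t) for t in (True, False)]

print(incoherent)  # a loss of 8 whether x is true or false
print(coherent)    # no guaranteed loss: the payoff is 0 in both cases
```

Exact fractions are used instead of floats so that the sure loss comes out as exactly 8 rather than a rounded value.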
Comments
• This part reproduces material from Ghahramani’s tutorial at ICML 2004.
• For philosophical insight, see Jaynes, Probability Theory: The Logic of Science (available on the web).
Axioms of Probability
• Like many great scientific discoveries, probability theory did not start out in axiomatic form.
• Kolmogorov first axiomatized probability theory, using set theory and measure theory.
• Kolmogorov’s axioms are sometimes criticized as being too abstract.
• We will not go into technical details, but we will look in detail at the axioms and at why such axioms are necessary.
Ingredients of Probability
• Probability is defined on events.
• Sometimes events are explicitly described, as in: “The coin toss is heads.” “I throw two 2’s with two dice.”
• Sometimes events are qualitatively stated, as in: “A number randomly picked from $[0,1]$ is a rational number.”
• In all cases, events must be concretely specified.
• In a rigorous approach, events are just sets.
Ingredients of Probability
• We usually restrict ourselves to a certain collection of events. So we assume:
  • There is a set $S$ (called the sample space), such that
  • any event $A$ is a subset of $S$.
• Examples:
  • $S=\{\text{Heads}, \text{Tails}\}$ for tossing 1 coin
  • $S=\{1,2,3,4,5,6\}^2$ for throwing 2 different dice
  • $S=[0,1]$ for randomly picking a real number from $[0,1]$
• Question: Is every subset of $S$ an event?
  • To be answered later.
Axioms of Probability, first try
• So probability can be defined as a function $P$:
  • with values in $[0,1]$;
  • there is a sample space $S$ such that $P(S)=1$;
  • $P$ is defined on events, i.e., subsets of $S$.
• What else?
  • For two mutually disjoint events $A$ and $B$, i.e., events with $A\cap B=\emptyset$, $P(A\cup B)=P(A)+P(B)$.
  • This condition is called additivity.
• Is this enough?
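The first-try axioms can be checked by hand on a finite sample space. The sketch below assumes fair dice, i.e. a uniform probability on the two-dice sample space; the uniform choice and the particular events are illustrative assumptions, not something the slides fix.

```python
from fractions import Fraction
from itertools import product

# Sample space for two distinguishable dice: S = {1,...,6}^2.
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Uniform probability of an event, i.e. of a subset of S (assumed fair dice)."""
    assert event <= S, "an event must be a subset of the sample space"
    return Fraction(len(event), len(S))

# Two mutually disjoint events.
double_two = {(2, 2)}                                # "I throw two 2's"
sum_is_seven = {s for s in S if s[0] + s[1] == 7}    # the two dice add up to 7

print(double_two.isdisjoint(sum_is_seven))   # the events share no outcome
print(P(S))                                  # normalization: P(S) = 1
# Additivity: P(A union B) equals P(A) + P(B) for disjoint A and B.
print(P(double_two | sum_is_seven) == P(double_two) + P(sum_is_seven))
```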
Additivity: Finite vs. Infinite
• The above axioms are enough if we’re only dealing with finite sample spaces.
• What about infinite unions?
  • $P(\bigcup_\alpha A_\alpha)=\sum_\alpha P(A_\alpha)$ when the $A_\alpha$’s are mutually disjoint?
• It seems very reasonable to allow this.
• It is also necessary in order to deal with limits and hence to use calculus.
• Really?
Additivity: Finite vs. Infinite
• Example (skipping some technical details): $S=[0,1]$. We want a uniform probability $P$ on $S$, so $P(\{x\})=\epsilon$ for any $x\in[0,1]$. But then
  $P([0,1]) = P(\bigcup_{x\in[0,1]}\{x\}) = \sum_{x\in[0,1]} P(\{x\}) = \sum_{x\in[0,1]} \epsilon$, which is $\infty$ if $\epsilon>0$, and $0$ if $\epsilon=0$.
  Either way $P([0,1])\neq 1$. This is a contradiction, so no such $P$ can exist.
• Where did it go wrong?
Additivity: Finite vs. Infinite
• Requiring additivity over arbitrary infinite unions, unfortunately, turns out to be requiring too much.
• This happens because uncountable infinity comes with many bad (hard-to-understand) properties.
• However, we can enforce countable additivity: $P(\bigcup_{i=1}^{\infty} A_i)=\sum_{i=1}^{\infty} P(A_i)$ when the $A_i$’s are mutually disjoint.
• Note that the left-hand side is just a single number, while the right-hand side is a limit.
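To see the right-hand side as a limit, here is a small sketch under an assumed example that is not in the slides: the sample space is S = {1, 2, 3, ...}, the disjoint events are the singletons A_k = {k}, and P(A_k) = 2^(-k). The left-hand side is then P(S) = 1, while the right-hand side is approached by partial sums.

```python
from fractions import Fraction

def p_singleton(k):
    """Assumed probability of the event A_k = {k}, for k = 1, 2, 3, ..."""
    return Fraction(1, 2 ** k)

def partial_sum(n):
    """P(A_1) + ... + P(A_n): a finite truncation of the infinite series."""
    return sum((p_singleton(k) for k in range(1, n + 1)), Fraction(0))

for n in (1, 2, 5, 10, 20):
    # The gap to P(S) = 1 is exactly 2^(-n), which shrinks to 0 as n grows.
    print(n, partial_sum(n), 1 - partial_sum(n))
```

No finite partial sum ever equals 1; countable additivity is the statement that the limit of these sums is the probability of the union.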
Axioms of Probability, second try
• So probability can be defined as a function $P$:
  • with values in $[0,1]$;
  • there is a sample space $S$ such that $P(S)=1$;
  • $P$ is defined on events, i.e., subsets of $S$;
  • for any countable collection of mutually disjoint events $A_i$, $i=1,2,\dots$, $P(\bigcup_{i=1}^{\infty} A_i)=\sum_{i=1}^{\infty} P(A_i)$.
Which subsets are events?
• Can any subset $A$ of $S$ be an event, i.e., can we define $P(A)$ for all subsets $A$ of $S$?
• For finite $S$, definitely yes.
• For uncountable $S$…
• Result from measure theory: there is no useful (translation-invariant) probability $P$ defined for all subsets of $[0,1]$.
• This is one of many bad things about uncountable sets.
• What can we do?
Events must be specified.
• Although it would be a wonderful world if we could give probabilities to all subsets of any $S$, this turns out to be impossible.
• Instead we need to specify for which subsets the probability can be defined.
• Consequences: Let $\mathcal{F}$ be the collection of all subsets of $S$ for which $P$ is defined, i.e., the set of all events.
  • If $A,B\in\mathcal{F}$, then $A^C, A\cup B, A\cap B\in\mathcal{F}$. More generally, for any countable collection $A_i\in\mathcal{F}$, $\bigcup_i A_i\in\mathcal{F}$ and $\bigcap_i A_i\in\mathcal{F}$.
  • $S\in\mathcal{F}$ and $\emptyset\in\mathcal{F}$.
• Such a collection of subsets is called a sigma-algebra, on which a measure can be defined.
More on Sigma-Algebra
• For finite $S$, the collection of events $\mathcal{F}$ is the set of all subsets, i.e., the power set, of $S$.
• The most important probability space is the Euclidean space $\mathbb{R}^n$, or some subset of it.
• In such cases, we usually take $\mathcal{F}$ to be the Borel sigma-algebra.
• This is to make a connection with the topology on $S$, i.e., to be able to use continuity and calculus.
• The Borel sigma-algebra contains almost all subsets of interest, in fact, any subset you can think of.
• Practically speaking, we can safely forget about which subsets $P$ is defined for: $P$ is defined for all interesting sets!
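For a finite sample space the sigma-algebra conditions can be verified by brute force. The sketch below is only an illustration: the helper names `power_set` and `is_sigma_algebra` are hypothetical, and the check relies on the fact that for finite S, closure under pairwise unions is already closure under countable unions.

```python
from itertools import chain, combinations

def power_set(S):
    """All subsets of the finite set S, as frozensets."""
    items = list(S)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

def is_sigma_algebra(S, F):
    """Check the closure properties for a collection F of subsets of a finite S.

    Intersections need no separate check: they follow from complements
    and unions by De Morgan's laws.
    """
    S = frozenset(S)
    if S not in F or frozenset() not in F:
        return False
    if any(S - A not in F for A in F):              # closed under complement
        return False
    return all(A | B in F for A in F for B in F)    # closed under union

coin = {"Heads", "Tails"}
print(is_sigma_algebra(coin, power_set(coin)))      # the power set qualifies

# This collection contains {1} but not its complement {2, 3}, so it fails.
not_closed = {frozenset(), frozenset({1}), frozenset({1, 2, 3})}
print(is_sigma_algebra({1, 2, 3}, not_closed))
```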
Axioms of Probability, finally
• Probability is a function $P$ defined on the collection of events, which are subsets of the sample space $S$,
  • with values in $[0,1]$,
  • with $P(S)=1$,
  • such that for any countable collection of mutually disjoint events $A_i$, $i=1,2,\dots$, $P(\bigcup_{i=1}^{\infty} A_i)=\sum_{i=1}^{\infty} P(A_i)$.
• These are Kolmogorov’s axioms of probability.
How do we describe probability?
• For finite $S$, prescribe $P(\{x\})$ for all $x\in S$. (Table)
  • $f(x)=P(\{x\})$ is called the probability mass function.
• For $S=\mathbb{R}$?
  • For any $A\subset\mathbb{R}$, define $P(A)=\mathrm{length}(A)=\int_{x\in A}dx$. Is this a probability?
  • Let $f(x)$ be any nonnegative function defined on $\mathbb{R}$ such that $\int_{x\in\mathbb{R}}f(x)\,dx=1$. Define $P(A)=\int_{x\in A}f(x)\,dx$. Is this a probability?
  • Such $f$ is called the probability distribution function (pdf for short); the more common name is probability density function.
  • Example: $f(x)=\exp(-x^2/2)/(2\pi)^{1/2}$, the Gaussian or normal distribution.
• Similar results hold for $\mathbb{R}^n$.
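As a numerical sanity check of the pdf construction, the sketch below integrates the standard Gaussian density with a simple midpoint rule. The integration routine, the truncation of the real line to [-10, 10], and the number of steps are all assumptions made for illustration; the function names are hypothetical.

```python
import math

def gaussian_pdf(x):
    """Standard normal density exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_interval(pdf, a, b, n=100_000):
    """Midpoint-rule approximation of P([a, b]), the integral of pdf over [a, b]."""
    h = (b - a) / n
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(n))

# The tails beyond |x| = 10 are negligible, so this approximates P(R) = 1.
total = prob_interval(gaussian_pdf, -10.0, 10.0)
# Probability of the event A = [-1, 1].
p_one_sigma = prob_interval(gaussian_pdf, -1.0, 1.0)

print(total)
print(p_one_sigma, math.erf(1 / math.sqrt(2)))  # numerical vs. closed form
```

The second line compares the numerical value with the closed form erf(1/sqrt(2)) for the interval [-1, 1].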
Probability Distribution Function
• Ignoring a lot of, really a lot of, technical details, any probability on $\mathbb{R}$ can be given as above.
• Consequence: Let $S$ be uncountable, e.g., $[0,1]$ or $\mathbb{R}$. Suppose $P$ has a pdf. Then for any $x\in S$, $P(\{x\})=0$. Thus $P(A)=0$ for any countable $A\subset S$.
• Counter-Example: $S=\mathbb{R}$. Suppose $P(A)=1$ if $0\in A$, and $P(A)=0$ otherwise. Is this a probability? Can $P$ have a pdf?
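The consequence stated above follows in two short steps, sketched here with $f$ denoting the pdf of $P$ and $a_1, a_2, \dots$ an assumed enumeration of the countable set $A$:

```latex
% A single point is an interval of length zero, so its integral vanishes.
P(\{x\}) = \int_{t \in \{x\}} f(t)\,dt = \int_{x}^{x} f(t)\,dt = 0 .

% A countable set is a countable disjoint union of points,
% so countable additivity applies.
P(A) = P\Big(\bigcup_{i=1}^{\infty} \{a_i\}\Big)
     = \sum_{i=1}^{\infty} P(\{a_i\})
     = \sum_{i=1}^{\infty} 0
     = 0 .
```

The second step uses countable additivity, which is why the same argument does not extend to uncountable sets such as $[0,1]$ itself.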
Random Variables
• Suppose a probability $P$ is defined on $S$.
• Suppose $X$ is a nice function on $S$ with values in $\mathbb{R}$.
  • Again, “nice function” in practical terms means “any function” you can think of.
• Then we may define a new probability $P_X$ on $\mathbb{R}$ as follows: $P_X(A)=P(X^{-1}(A))$ for any nice $A\subset\mathbb{R}$.
• That is, any function $X$ on a probability space defines a new probability $P_X$, defined on $\mathbb{R}$.
• This function $X$ is called a random variable.
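The definition of $P_X$ can be made concrete on a finite sample space. The sketch below assumes two fair dice with the uniform probability and takes X to be the sum of the two faces; both choices are illustrative assumptions, and the names `preimage` and `P_X` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

# Sample space for two distinguishable dice, with uniform P (assumed fair dice).
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Uniform probability of an event, i.e. of a subset of S."""
    return Fraction(len(event), len(S))

def X(s):
    """Random variable: the sum of the two dice."""
    return s[0] + s[1]

def preimage(A):
    """X^{-1}(A): all outcomes s in S whose value X(s) lies in A."""
    return {s for s in S if X(s) in A}

def P_X(A):
    """Probability induced by X on sets of real numbers: P_X(A) = P(X^{-1}(A))."""
    return P(preimage(A))

print(P_X({7}))                  # six of the 36 outcomes sum to 7
print(P_X({2, 12}))              # only (1, 1) and (6, 6)
print(P_X(set(range(2, 13))))    # every possible sum, so the whole of S
```

Here $A$ is a set of real numbers, while `preimage(A)` is an event in the original sample space; $P_X$ only ever evaluates $P$ on such preimages.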
Random Variables/Vectors/Processes
• In the above definition, we may use $\mathbb{R}^n$ instead of $\mathbb{R}$. In such cases, $X$ is called a random vector.
• A random vector may be considered as a collection of random variables.
• Since $P_X$ is a probability on $\mathbb{R}$ or $\mathbb{R}^n$, it will (most of the time) have a pdf, which is called the pdf of the random variable/vector $X$.
• Sometimes it is also useful to consider $X$ with values in $\mathbb{R}^\infty$. In such cases, $X$ is called a random process.
Questions
• Let $S=[0,1]$ and let $P$ be a probability on it. Can $P(\{0\})=1$?
• Is any pdf for a random variable necessarily continuous? Is it always differentiable?
• Let $f$ be a pdf for a random variable $X$. Must $f$ satisfy $0\le f(x)\le 1$ for all $x$?
• Let $X$ be a random variable with mean 0 and variance 1. What is the probability $P(X=1)$?