Probability (Ch. 13. Uncertainty)

Probability (Ch. 13. Uncertainty). Sept. 6, 2004. Jahwan Kim, AIPR Lab, Div. of CS, KAIST.

Presentation Transcript


  1. Probability (Ch. 13. Uncertainty). Sept. 6, 2004. Jahwan Kim, AIPR Lab, Div. of CS, KAIST

  2. Legend
     • red words: keywords
     • gray words: technically more advanced material. You need not pay attention to it unless you already know its meaning. This does not mean it is less important.
     Jahwan Kim, Probability

  3. Contents
     We will review the basics of probability in detail.
     • Why Probability?
       • Uncertainty, Desiderata, Dutch Book Argument
     • Axioms of Probability
       • What’s wrong with uncountable sets?
       • For whom the probability tolls: sigma-algebra
       • Kolmogorov’s axioms
     • Description of Probability
       • Probability Distribution Function
       • Random Variables

  4. Uncertainty
     • Logic does not fit everywhere:
       • Laziness: There may be too many cases and causal relations to enumerate.
       • Theoretical ignorance: There may be no complete theory.
       • Practical ignorance: It may be practically impossible to obtain all the information necessary for the conclusion.
     • Example: Toothache \(\Rightarrow\) cavity?

  5. Probability: Measuring Uncertainty
     • Example: “There is an 80% chance” that the patient has a cavity if he/she has a toothache.
     • We want:
       • to represent the degree of our belief numerically,
       • and rules to manipulate these numbers.
     • To be distinguished from fuzzy logic, which measures degree of truth.

  6. Probability: Desiderata
     • We’d like to assign a number \(b(x)\) to a proposition \(x\), so that
       • \(0 \le b(x) \le 1\) (normalized value),
       • \(b(x)=0\) when \(x\) is definitely not true, and \(b(x)=1\) when \(x\) is definitely true,
       • \(b(x \mid y)\) represents the degree of belief (plausibility) that \(x\) is true when we know \(y\) is true.

  7. Probability: Desiderata
     • Cox Axioms
       • Degrees of belief are represented by real numbers.
       • Qualitative correspondence with common sense.
       • Consistency:
         • If a conclusion can be reasoned in more than one way, then every way should lead to the same answer.
         • All relevant evidence must be considered.
         • Equivalent states of knowledge are represented by equivalent plausibility assignments.
     • Conclusion: The belief function \(b(x)\) must satisfy the axioms of probability. (Dutch Book argument)

  8. Dutch Book Argument
     • Suppose you’re willing to accept bets with odds proportional to \(b(x)\); that is, when \(b(x)=0.9\), you accept the bet:
       • Win at least $1 if \(x\) is true,
       • Lose $9 if \(x\) is false.
     • If your belief function \(b(x)\) does not satisfy the axioms of probability, there exists a set of simultaneous bets (called a Dutch Book) which you accept, and for which you’re bound to lose money, regardless of the outcome.
     • The only way to guard against Dutch Books is to ensure that \(b(x)\) follows the axioms of probability.
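
The Dutch Book idea can be sketched in a few lines of Python. This is only an illustration under an assumed betting model that is not in the slides: the agent treats a belief b as a fair price for a ticket paying 1 if the proposition is true, and will buy or sell at that price. The function name `agent_payoffs` and the example beliefs are hypothetical.

```python
def agent_payoffs(b_x: float, b_not_x: float) -> dict:
    """Agent's net payoff in each outcome against an exploiting bookie.

    Assumed model (not from the slides): a belief b is the price at which
    the agent will buy or sell a ticket that pays 1 if the proposition holds.
    """
    total = b_x + b_not_x
    # If the prices sum to more than 1, the bookie sells both tickets to the
    # agent (direction +1); otherwise the bookie buys both from the agent (-1).
    direction = 1 if total > 1 else -1
    payoffs = {}
    for label, x_is_true in (("x true", True), ("x false", False)):
        pays_x = 1.0 if x_is_true else 0.0
        pays_not_x = 1.0 - pays_x
        # Net gain from holding one ticket on x and one ticket on not-x.
        gain_if_buying = (pays_x - b_x) + (pays_not_x - b_not_x)
        payoffs[label] = direction * gain_if_buying
    return payoffs


incoherent = agent_payoffs(0.9, 0.3)  # b(x) + b(not x) = 1.2, violates the axioms
coherent = agent_payoffs(0.9, 0.1)    # b(x) + b(not x) = 1.0

print(incoherent)  # negative in both outcomes
print(coherent)    # zero (up to rounding) in both outcomes
```

With incoherent beliefs the agent loses the same amount whether x is true or false; with beliefs that sum to 1 this particular pair of bets cannot be exploited.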

  9. Comments
     • This part is a reproduction of Ghahramani’s tutorial at ICML 2004.
     • For philosophical insight, see Jaynes, Probability Theory: The Logic of Science (available on the web).

  10. Axioms of Probability
      • Like many great scientific discoveries, probability theory did not start in axiomatic form.
      • Kolmogorov first axiomatized probability theory, using set theory and measure theory.
      • Kolmogorov’s axioms are sometimes criticized as being too abstract.
      • We will not go into technical details, but we will look closely at the axioms and at why they are necessary.

  11. Ingredients of Probability
      • Probability is defined on events.
      • Sometimes events are explicitly described, as in: “The coin toss is heads.” “I throw two 2’s with two dice.”
      • Sometimes events are qualitatively stated, as in: “A number randomly picked from \([0,1]\) is a rational number.”
      • In all cases, events must be concretely specified.
      • In a rigorous approach, events are just sets.

  12. Ingredients of Probability
      • We usually restrict ourselves to a certain collection of events. So we assume:
        • There is a set \(S\) (called the sample space), so that
        • any event \(A\) is a subset of \(S\).
      • Examples:
        • \(S=\{\text{Heads}, \text{Tails}\}\) for tossing 1 coin
        • \(S=\{1,2,3,4,5,6\}^2\) for throwing 2 distinguishable dice
        • \(S=[0,1]\) for randomly picking a real number from \([0,1]\)
      • Question: Is every subset of \(S\) an event?
        • To be answered later.

  13. Axioms of Probability, first try
      • So probability can be defined as a function \(P\):
        • With values in \([0,1]\).
        • There is a sample space \(S\), so that \(P(S)=1\).
        • \(P\) is defined on events, i.e., subsets of \(S\).
      • What else?
        • For two mutually disjoint events \(A\) and \(B\), i.e., events with \(A \cap B = \emptyset\), \(P(A \cup B) = P(A) + P(B)\).
        • This condition is called additivity.
      • Is this enough?
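
For a finite sample space these first-try axioms are easy to check directly. The sketch below assumes the two-dice sample space from the earlier example with a uniform probability; the function name `P` and the two example events are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space for two distinguishable dice: all ordered pairs (i, j).
S = set(product(range(1, 7), repeat=2))


def P(event) -> Fraction:
    """Uniform probability on S (an assumption made for this illustration)."""
    event = set(event)
    if not event <= S:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(S))


double_twos = {(2, 2)}                                # "I throw two 2's"
sum_is_seven = {s for s in S if s[0] + s[1] == 7}     # disjoint from double_twos

print(P(S))                                           # normalization: P(S) = 1
print(P(double_twos), P(sum_is_seven))
# Additivity for disjoint events: both sides agree.
print(P(double_twos | sum_is_seven) == P(double_twos) + P(sum_is_seven))
```

Using `Fraction` keeps the arithmetic exact, so the additivity check is an equality rather than an approximate comparison.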

  14. Additivity: Finite vs. Infinite
      • The above axioms are enough, if we’re only dealing with finite sample spaces.
      • What about infinite unions?
        • \(P\big(\bigcup_{\alpha} A_\alpha\big) = \sum_{\alpha} P(A_\alpha)\), when the \(A_\alpha\)’s are mutually disjoint?
      • It seems very reasonable to allow this.
      • It is also necessary in order to deal with limits, and hence to use calculus.
      • Really?

  15. Additivity: Finite vs. Infinite
      • Example (skipping some technical details): Let \(S=[0,1]\). We want a uniform probability \(P\) on \(S\), so \(P(\{x\})=\epsilon\) for any \(x \in [0,1]\). But then
        \[ P([0,1]) = P\Big(\bigcup_{x \in [0,1]} \{x\}\Big) = \sum_{x \in [0,1]} P(\{x\}) = \sum_{x \in [0,1]} \epsilon = \begin{cases} \infty & \text{if } \epsilon > 0, \\ 0 & \text{if } \epsilon = 0. \end{cases} \]
        Either way this contradicts \(P([0,1])=1\), so such a \(P\) cannot exist.
      • Where did it go wrong?

  16. Additivity: Finite vs. Infinite
      • Requiring additivity over arbitrary infinite unions, unfortunately, turns out to be requiring too much.
      • This happens because uncountable infinities come with many bad (hard-to-understand) properties.
      • However, we can enforce countable additivity: \(P\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} P(A_i)\), when the \(A_i\)’s are mutually disjoint.
      • Note that the left-hand side is just a single number, while the right-hand side is a limit.
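
The remark that the right-hand side is a limit can be made concrete with a small example. The sketch below assumes a particular family of disjoint intervals, \(A_i = [2^{-i}, 2^{-(i-1)})\), whose union is \((0,1)\), and uses interval length as the probability; the helper names are hypothetical.

```python
from fractions import Fraction


def interval(i: int):
    """Endpoints of A_i = [2^-i, 2^-(i-1)), an assumed example family."""
    return Fraction(1, 2**i), Fraction(1, 2 ** (i - 1))


def length(iv) -> Fraction:
    """Length of a half-open interval given as (left, right)."""
    return iv[1] - iv[0]


def partial_sum(n: int) -> Fraction:
    """P(A_1) + ... + P(A_n), with P taken to be interval length."""
    return sum((length(interval(i)) for i in range(1, n + 1)), Fraction(0))


# The partial sums equal 1 - 2^-n, so they approach P((0,1)) = 1
# without reaching it for any finite n.
for n in (1, 2, 5, 10, 20):
    print(n, partial_sum(n), float(1 - partial_sum(n)))
```

Countable additivity is the statement that the limit of these partial sums equals the probability of the whole union.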

  17. Axioms of Probability, second try
      • So probability can be defined as a function \(P\):
        • With values in \([0,1]\).
        • There is a sample space \(S\), so that \(P(S)=1\).
        • \(P\) is defined on events, i.e., subsets of \(S\).
        • For any countable collection of mutually disjoint events \(A_i\), \(i=1,2,\ldots\), \(P\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} P(A_i)\).

  18. Which subsets are events?
      • Can any subset \(A\) of \(S\) be an event, i.e., can we define \(P(A)\) for all subsets \(A\) of \(S\)?
        • For finite \(S\), definitely yes.
        • For uncountable \(S\)…
      • Result from measure theory: There is no useful (translation-invariant) probability \(P\) defined for all subsets of \([0,1]\).
      • This is one of many bad things about uncountable sets.
      • What can we do?

  19. Events must be specified.
      • Although it would be a really wonderful world if we could give probabilities to all subsets of any \(S\), it turns out to be impossible.
      • Instead we need to specify for which subsets the probability can be defined.
      • Consequences: Let \(\mathcal{F}\) be the collection of all subsets of \(S\) for which \(P\) is defined, i.e., the set of all events.
        • If \(A, B \in \mathcal{F}\), then \(A^C,\ A \cup B,\ A \cap B \in \mathcal{F}\). More generally, for any countable collection \(A_i \in \mathcal{F}\), \(\bigcup_i A_i \in \mathcal{F}\) and \(\bigcap_i A_i \in \mathcal{F}\).
        • \(S \in \mathcal{F}\) and \(\emptyset \in \mathcal{F}\).
      • Such a collection of subsets is called a sigma-algebra, on which a measure can be defined.
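
On a finite sample space the closure properties of a sigma-algebra can be checked by brute force. The sketch below is a minimal illustration; the function name `is_sigma_algebra` and the example collections are made up here. It tests that the whole space is included and that the collection is closed under complements and pairwise unions, which for a finite collection also gives closure under intersections and membership of the empty set.

```python
from itertools import combinations


def is_sigma_algebra(S, collection) -> bool:
    """Check the sigma-algebra properties for a collection of subsets of a finite S."""
    S = frozenset(S)
    F = {frozenset(A) for A in collection}
    if S not in F:
        return False
    # Closed under complements.
    if any(S - A not in F for A in F):
        return False
    # Closed under pairwise unions; for a finite collection this already
    # covers countable unions, and intersections follow by De Morgan.
    return all(A | B in F for A, B in combinations(F, 2))


S = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, S]   # a valid, non-trivial sigma-algebra
broken = [set(), {1}, S]              # complement {2, 3, 4} is missing

print(is_sigma_algebra(S, coarse))    # True
print(is_sigma_algebra(S, broken))    # False
```

The `coarse` example shows that a sigma-algebra need not contain every subset: with it, the probability of `{1}` alone is simply not defined.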

  20. More on Sigma-Algebra
      • For finite \(S\), the collection of events \(\mathcal{F}\) can be taken to be the set of all subsets of \(S\), i.e., the power set of \(S\).
      • The most important probability space is the Euclidean space \(\mathbb{R}^n\), or some subset of it.
      • In such cases, we usually take \(\mathcal{F}\) to be the Borel sigma-algebra.
      • This makes the connection with the topology on \(S\), i.e., it lets us use continuity and calculus.
      • The Borel sigma-algebra contains almost all subsets of interest, in fact, any subset you can think of.
      • Practically speaking, we can safely forget about the question of for which subsets \(P\) is defined: \(P\) is defined for all interesting sets!

  21. Axioms of Probability, finally
      • Probability is a function \(P\) defined on the collection \(\mathcal{F}\) of events, which are subsets of the sample space \(S\),
        • with values in \([0,1]\),
        • with \(P(S)=1\),
        • and such that for any countable collection of mutually disjoint events \(A_i\), \(i=1,2,\ldots\), \(P\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} P(A_i)\).
      • These are Kolmogorov’s axioms of probability.

  22. How do we describe probability?
      • For finite \(S\), prescribe \(P(\{x\})\) for all \(x \in S\). (Table)
        • \(f(x)=P(\{x\})\) is called the probability mass function.
      • For \(S=\mathbb{R}\)?
        • For any \(A \subset \mathbb{R}\), define \(P(A)=\mathrm{length}(A)=\int_{x \in A} dx\). Is this a probability?
        • Let \(f(x)\) be any nonnegative function defined on \(\mathbb{R}\) such that \(\int_{x \in \mathbb{R}} f(x)\,dx = 1\). Define \(P(A)=\int_{x \in A} f(x)\,dx\). Is this a probability?
        • Such \(f\) is called the probability distribution function (pdf for short; more commonly called the probability density function).
        • Example: \(f(x)=\exp(-x^2/2)/(2\pi)^{1/2}\), the Gaussian or normal distribution.
      • Similar results hold for \(\mathbb{R}^n\).
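
The definition of \(P(A)\) as an integral of a pdf can be tried numerically. The sketch below assumes the standard normal density and approximates the integral over an interval with a simple midpoint rule; the function names and the choice of \([-10, 10]\) as a stand-in for the whole real line are assumptions made for this illustration.

```python
import math


def gaussian_pdf(x: float) -> float:
    """Standard normal density exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def prob(pdf, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of pdf over [a, b]."""
    h = (b - a) / n
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(n))


# Total mass: the tails beyond [-10, 10] are negligible, so this is close to 1.
print(prob(gaussian_pdf, -10, 10))
# Probability of the interval [-1, 1].
print(prob(gaussian_pdf, -1, 1))
# A single point is an interval of length zero, so its probability is 0.
print(prob(gaussian_pdf, 1, 1))
```

The last line previews a general consequence of having a pdf: every single point has probability zero, even where the density itself is positive.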

  23. Probability Distribution Function
      • Ignoring a lot of (really a lot of) technical details, any probability on \(\mathbb{R}\) can be given as above.
      • Consequence: Let \(S\) be uncountable, e.g., \([0,1]\) or \(\mathbb{R}\), and suppose \(P\) has a pdf. Then for any \(x \in S\), \(P(\{x\})=0\). Thus, by countable additivity, \(P(A)=0\) for any countable \(A \subset S\).
      • Counter-example: Let \(S=\mathbb{R}\). Suppose \(P(A)=1\) if \(0 \in A\), and \(P(A)=0\) otherwise. Is this a probability? Can \(P\) have a pdf?

  24. Random Variables
      • Suppose a probability \(P\) is defined on \(S\).
      • Suppose \(X\) is a nice function on \(S\) with values in \(\mathbb{R}\).
        • Again, “nice function” in practical terms is “any function” you can think of.
      • Then we may define a new probability \(P_X\) on \(\mathbb{R}\) as follows: \(P_X(A)=P(X^{-1}(A))\), for any nice \(A \subset \mathbb{R}\).
      • That is, any function \(X\) on a probability space defines a new probability \(P_X\), defined on \(\mathbb{R}\).
      • This function \(X\) is called a random variable.
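
The construction \(P_X(A)=P(X^{-1}(A))\) can be carried out explicitly on a finite sample space. The sketch below assumes two fair, distinguishable dice and takes \(X\) to be the sum of the two faces; the names `pushforward` and `P_X_of` are hypothetical helpers introduced for this example.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space and point probabilities for two fair dice (an assumed example).
S = list(product(range(1, 7), repeat=2))
P_point = {s: Fraction(1, 36) for s in S}


def X(s) -> int:
    """Random variable: the sum of the two faces."""
    return s[0] + s[1]


def pushforward(P_point, X) -> dict:
    """Point masses of P_X, obtained by summing P over each preimage X^{-1}({v})."""
    P_X = defaultdict(Fraction)
    for s, p in P_point.items():
        P_X[X(s)] += p
    return dict(P_X)


P_X = pushforward(P_point, X)


def P_X_of(A) -> Fraction:
    """P_X(A) for a set A of real values."""
    return sum((p for v, p in P_X.items() if v in A), Fraction(0))


print(P_X[7])            # the most likely sum
print(P_X_of({2, 12}))   # probability of the two extreme sums
```

Here \(P_X\) lives on the values of \(X\) (the numbers 2 through 12) rather than on pairs of dice faces, which is exactly the sense in which a random variable defines a new probability on \(\mathbb{R}\).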

  25. Random Variables/Vectors/Processes
      • In the above definition, we may use \(\mathbb{R}^n\) instead of \(\mathbb{R}\). In such cases, \(X\) is called a random vector.
      • A random vector may be considered as a collection of random variables.
      • Since \(P_X\) is a probability on \(\mathbb{R}\) or \(\mathbb{R}^n\), (most of the time) it will have a pdf, which is called the pdf of the random variable/vector \(X\).
      • Sometimes it is also useful to consider \(X\) with values in \(\mathbb{R}^\infty\). In such cases, \(X\) is called a random process.

  26. Questions
      • Let \(S=[0,1]\) and \(P\) be a probability on it. Can \(P(\{0\})=1\)?
      • Is any pdf for a random variable necessarily continuous? Is it always differentiable?
      • Let \(f\) be a pdf for a random variable \(X\). Must \(f\) satisfy \(0 \le f(x) \le 1\) for all \(x\)?
      • Let \(X\) be a random variable with mean 0 and variance 1. What is the probability \(P(X=1)\)?
