270 likes | 568 Views
CS145: Probability & Computing Lecture 10: Continuous Bayes’ Rule, Joint and Marginal Probability Densities. Instructor: Eli Upfal Brown University Computer Science. Figure credits: Bertsekas & Tsitsiklis , Introduction to Probability , 2008 Pitman, Probability , 1999. Midterm Exam.
E N D
CS145: Probability & ComputingLecture 10: Continuous Bayes’ Rule,Joint and Marginal Probability Densities Instructor: Eli Upfal Brown University Computer Science Figure credits:Bertsekas & Tsitsiklis, Introduction to Probability, 2008Pitman, Probability, 1999
Midterm Exam During regular class time - Thursday, October 24 • Material: Up to and including the lecture today.Format: Hand-written answers to probability problems.Similar to “medium” difficulty problems from homeworks.No programming, calculators not needed or allowed. • Formula Sheet: We will distribute in the exam – copy on the course web site.Included: Formulas for the pmf/pdf/cdf of particular distributions, mean/variance of distributions, integral and derivative identities, etc.Not included: Conceptual equations. Joint & marginal distributions, Bayes’ rule, checking independence, computing cdf from pdf, etc.
Classification from Continuous Data X=1 fire, X=0 no fire F_Y|X (y~|~x=0) = distribution of smoke (carbon monoxide gas) level when there is no fire. F_Y|X (y~|~x=1) = distribution of smoke (carbon monoxide gas) level when there is fire. Given that $Y=y$, is there a fire? How do we adapt Bayes’ Rule to continuous distributions?
CS145: Lecture 10 Outline • Bayes’ Rule: Classification from Continuous Data • Joint and Marginal Probability Densities
Continuous Random Variables • For any discrete random variable, the CDF is discontinuous and piecewise constant • If the CDF is monotonically increasing and continuous, have a continuous random variable: 1 • The probability that continuous random variable X lies in the interval (x1,x2] is then 0
Probability Density Function (PDF) • If the CDF is differentiable, its first derivative is called the probability density function (PDF): 1 0 • By the fundamental theorem of calculus: • For any valid PDF: 0
Classification Problems Athitsos et al., CVPR 2004 & PAMI 2008 • Which of the 10 digits did a person write by hand? • Which of the basic ASL phonemes did a person sign? • Is an email spam or not spam (ham)? • Is this image taken in an indoor our outdoor environment? • Is a pedestrian visible from a self-driving car’s camera? • What language is a webpage or document written in? • How many stars would a user rate a movie that they’ve never seen?
Continuous & Discrete Variables Example: Probability of each category in a classification problem. Example: Output of temperature sensor, motion sensor, camera, etc. For each possible X=x, the distribution of this sensor is different.
Marginal Distributions are Mixtures Marginal Distribution: Summing over all possible values of X, the marginal CDF (PDF) is a mixtureof the conditional CDFs (PDFs): • Then taking derivative of each term in sum:
Example: Hard Drive Lifetimes Exponential Distributions: • Suppose 90% of hard drives in some laptop computer model have exponentially distributed lifetime param • However, 10% of hard drives have a manufacturing defect that gives them a shorter lifetime • Recall mean of exponential distribution:
Example: Hard Drive Lifetimes Exponential Distributions: • Suppose 90% of hard drives in some laptop computer model have exponentially distributed lifetime param • However, 10% of hard drives have a manufacturing defect that gives them a shorter lifetime • If your hard drive has operated for t seconds and has not yet failed, what is the probability it is defective?
Example: Hard Drive Lifetimes Exponential Distributions: • Suppose 90% of hard drives in some laptop computer model have exponentially distributed lifetime param • However, 10% of hard drives have a manufacturing defect that gives them a shorter lifetime • If your hard drive fails after exactly t seconds of operation, what is the probability it is defective? ? Problem: For continuous variable Y,
Discrete Inference from Continuous Data • Y has a different PDF for each possible discrete X=x • Given Y=y, we want to find the probability of each possible X=x • But how can we condition on an event of probability zero? l'Hopital's Rule
Example: Hard Drive Lifetimes Exponential Distributions: • Suppose 90% of hard drives in some laptop computer model have exponentially distributed lifetime param • However, 10% of hard drives have a manufacturing defect that gives them a shorter lifetime • If your hard drive fails after exactly t seconds of operation, what is the probability it is defective?
Classification from Continuous Data Suppose X is discrete and Y is continuous, and we observe Y=y. Priorprobability mass function for X: Conditionalprobability density function for Y: Posteriorprobability mass function of X given Y=y:
Moments of Continuous Data Suppose X is discrete and Y is continuous, and we observe Y=y. Priorprobability mass function for X: Conditionalprobability density function for Y: Marginal expected value of random variable Y:
Mixed Continuous/Discrete Distributions • Representations of distribution of X: • CDF is (always) well defined • Formally, PDF does not exist in this case • Informally, can illustrate via a hybrid of aprobability density function (PDF) and aprobability mass function (PMF) • Related concepts in physics & engineering:Dirac delta function, impulse response • Distribution of random variable X: • With probability 0.5, X has acontinuous uniform distribution on [0,1] • With probability 0.5, X=0.5
CS145: Lecture 10 Outline • Bayes’ Rule: Classification from Continuous Data • Joint and Marginal Probability Densities
Example: Joint Distribution Pitman’s Probability, 1999
Marginal Distributions Example: Uniform Distributions
Classification Problems Athitsos et al., CVPR 2004 & PAMI 2008 • Which of the 10 digits did a person write by hand? • Which of the basic ASL phonemes did a person sign? • Is this image taken in an indoor our outdoor environment? • Is a pedestrian visible from a self-driving car’s camera? If raw data is a list of M continuous features (like pixel intensities),scale models to high-dimensional data via independence assumptions: