45-733: Lecture 6 (Chapter 5)
Continuous Random Variables
William B. Vogt, Carnegie Mellon
Joint continuous distributions
• The joint continuous distribution is a complete probabilistic description of a group of r.v.s
• Describes each r.v.
• Describes the relationship among r.v.s
Joint continuous distributions
• Cumulative distribution function
• The cdf of a group of r.v.s describes the probability that each of them will be less than specified values, simultaneously:
$F(x, y) = P(X \le x, Y \le y)$
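Not on the original slides: a minimal NumPy sketch of how a joint cdf value can be estimated from simulated draws. The distribution, its covariance matrix, and the evaluation point (1, 1) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate correlated draws of (X, Y); the covariance matrix is arbitrary,
# chosen only to illustrate the idea.
xy = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]], size=100_000)

# Empirical joint cdf at (x, y) = (1, 1): fraction of draws with X <= 1 AND Y <= 1.
x, y = 1.0, 1.0
F_hat = np.mean((xy[:, 0] <= x) & (xy[:, 1] <= y))
print(f"Estimated F({x}, {y}) = P(X <= {x}, Y <= {y}) ~ {F_hat:.3f}")
```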
Joint continuous distributions
• Cumulative distribution function
• Graphing joint cdfs is difficult, because such a graph requires at least 3 dimensions
• I will instead try to communicate joint distribution information through the use of scatterplots
Joint continuous distributions
• Cumulative distribution function
• Scatterplots
• No relationship between X and Y
[Scatterplot: Y plotted against X]
Joint continuous distributions
• Cumulative distribution function
• Scatterplots
• A relationship between X and Y
[Scatterplot: Y plotted against X]
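Plots like the ones these slides showed can be generated as follows (a sketch, assuming Matplotlib is installed; the data are simulated purely for illustration).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 500

fig, axes = plt.subplots(1, 2, figsize=(8, 4))

# Left: no relationship -- X and Y drawn independently.
axes[0].scatter(rng.normal(size=n), rng.normal(size=n), s=5)
axes[0].set(title="No relationship", xlabel="X", ylabel="Y")

# Right: a (linear) relationship -- Y depends on X plus noise.
x = rng.normal(size=n)
axes[1].scatter(x, 0.8 * x + rng.normal(scale=0.5, size=n), s=5)
axes[1].set(title="A relationship", xlabel="X", ylabel="Y")

plt.tight_layout()
plt.show()
```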
Joint continuous distributions
• Marginal distribution function
• The marginal distribution function of each of a group of r.v.s is just its cdf:
$F_X(x) = P(X \le x) = F(x, \infty)$
Joint continuous distributions
• Independence
• A group of continuous random variables is independent if their joint cdf factors into the individual cdfs:
$F(x, y) = F_X(x) \, F_Y(y)$
Joint continuous distributions
• Covariance
• Covariance measures the (linear) association between two random variables, just as in the discrete case
• Covariance is the expectation of the product of each variable's deviation from its mean:
$\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Joint continuous distributions
• Covariance
• Covariance is positive when the two variables tend to be above their means together and below their means together
• Covariance is negative when one variable tends to be above its mean when the other is below
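A sketch (assuming NumPy) of the sample analogue of this definition, checked against NumPy's built-in covariance; the simulated relationship Y = X + noise is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate positively related draws: Y = X + noise.
x = rng.normal(size=10_000)
y = x + rng.normal(scale=0.5, size=10_000)

# Sample analogue of Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)].
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))
cov_numpy = np.cov(x, y, ddof=0)[0, 1]  # off-diagonal of the covariance matrix
print(cov_manual, cov_numpy)  # both close to Var(X) = 1 for this construction
```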
Joint continuous distributions
• Covariance
• A positive covariance between X and Y
[Scatterplot: upward-sloping cloud of Y against X]
Joint continuous distributions
• Covariance
• A negative covariance between X and Y
[Scatterplot: downward-sloping cloud of Y against X]
Joint continuous distributions
• Covariance
• A zero covariance between X and Y
• X and Y are independent
[Scatterplot: patternless cloud of Y against X]
Joint continuous distributions
• Covariance
• A zero covariance between X and Y
• X and Y are not independent
[Scatterplot: Y against X with a clear nonlinear pattern]
Joint continuous distributions
• Covariance
• As before, if X and Y are independent, then Cov(X, Y) = 0
• As before, it is not the case that a zero covariance between X and Y implies that X and Y are independent
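The classic counterexample behind the slide above: take X symmetric about zero and Y = X². Then Y is completely determined by X, yet Cov(X, Y) = E[X³] = 0. A simulation sketch (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)

# X symmetric about 0 and Y = X**2: perfectly dependent, zero covariance.
x = rng.normal(size=100_000)
y = x**2

print(np.mean((x - x.mean()) * (y - y.mean())))  # ~ 0: sample covariance

# The dependence shows up elsewhere, e.g. whether Y > 1 is fully
# determined by whether |X| > 1:
print(np.mean(y[np.abs(x) > 1] > 1))   # = 1.0 by construction
print(np.mean(y[np.abs(x) <= 1] > 1))  # = 0.0 by construction
```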
Joint continuous distributions
• Sums
• The rules for expectations and variances of sums of continuous r.v.s are the same as for discrete r.v.s:
Joint Distributions
• Sums and differences
• The expectation of a sum is the sum of expectations:
$E(X + Y) = E(X) + E(Y), \qquad E(X - Y) = E(X) - E(Y)$
Joint Distributions
• Sums and differences
• The variance of a sum is the sum of variances, plus twice the covariance:
$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$
Joint Distributions
• Sums and differences
• If two variables are uncorrelated (covariance is 0), the variance of a sum is the sum of variances:
$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) \quad \text{when } \operatorname{Cov}(X, Y) = 0$
Joint Distributions
• Sums and differences
• The variance of a sum is the sum of variances plus twice the sum of every possible covariance:
$\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2 \sum_{i < j} \operatorname{Cov}(X_i, X_j)$
Joint Distributions
• Sums and differences
• The covariance of a sum with a third variable is the sum of covariances:
$\operatorname{Cov}(X + Y, Z) = \operatorname{Cov}(X, Z) + \operatorname{Cov}(Y, Z)$
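A quick numerical check of the variance-of-a-sum identity (a sketch assuming NumPy; the dependence between X and Y is an arbitrary illustrative choice). With population-style sample moments (ddof=0), the identity holds exactly for the sample quantities.

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # correlated with x by construction

# Var(X + Y) versus Var(X) + Var(Y) + 2 Cov(X, Y).
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
print(lhs, rhs)  # equal up to floating-point error
```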
Normal distribution
• The normal distribution is the most important distribution in statistics
• Many interesting and useful random variables follow the normal distribution
• Many interesting and useful random variables do not follow the normal distribution but may be approximated by it
An aside: parameters
• Recall that the distribution of a random variable contains all the information which statistics can discover about it
• The distribution may be expressed in a probability function or a cumulative distribution function for a discrete random variable
• The distribution may be expressed in a density function or a cumulative distribution function for a continuous random variable
An aside: parameters
• Often, the information in a distribution is too much to process easily
• When this is true, we want summary measures of the information:
  • Mean
  • Variance
  • Median
  • Percentiles
An aside: parameters
• These summary measures are often called parameters of the distribution
• The mean is often written μ when we are thinking of it as a parameter
• The standard deviation is often written σ when we are thinking of it as a parameter
An aside: parameters
• Also, recall that probability functions, density functions, and cdfs often depend on unspecified numbers
• These numbers are also called parameters
Normal distribution
• Density function
• Two parameters, μ and σ
• The formula for the density function:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Normal distribution
• Density function
• Two parameters, μ and σ
• It turns out that if X is distributed normal:
$E(X) = \mu, \qquad \operatorname{Var}(X) = \sigma^2$
Normal distribution
• Density function for μ = 0, σ = 1
[Plot: standard normal density]
Normal distribution
• Density functions for σ = 1, σ = 2, σ = 5
[Plot: normal densities with different spreads]
Normal distribution
• Density functions for μ = 1 and μ = 2, with σ = 1
[Plot: normal densities with different centers]
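Density curves like those on the slides above can be reproduced with SciPy and Matplotlib (a sketch; the specific (μ, σ) pairs here are illustrative, not the slides' exact values).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

xs = np.linspace(-6, 6, 400)

# Standard normal (mu = 0, sigma = 1) plus a few other parameter choices.
for mu, sigma in [(0, 1), (0, 2), (2, 1)]:
    plt.plot(xs, norm.pdf(xs, loc=mu, scale=sigma), label=f"mu={mu}, sigma={sigma}")

plt.legend()
plt.xlabel("x")
plt.ylabel("density")
plt.show()
```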
Normal distribution
• Standard normal
• The normal distribution with mean zero and variance one has a special name, the standard normal
Normal distribution
• Cumulative distribution function
• There is no nice formula for it
• The standard normal cdf is compiled in tables
• We write the standard normal cdf:
$\Phi(z) = P(Z \le z)$
Normal distribution
• The normal distribution is symmetric:
$\Phi(-a) = 1 - \Phi(a)$
[Plot: symmetric bell curve with −a and a marked]
Normal distribution
• The normal table
• Contains tabulations of the standard normal cdf
• Different tables tabulate the cdf differently
Normal distribution
• The normal table
• Our book's normal table provides literally the cdf, $\Phi(z) = P(Z \le z)$:
[Plot: shaded area under the density to the left of z]
Normal distribution
• The normal table
• Another common table type tabulates the area between 0 and z, $P(0 \le Z \le z)$:
[Plot: shaded area under the density between 0 and z]
Normal distribution
• Using the normal table
• If you want to know the probability that Z, distributed N(0,1), is less than some value, just look up the value in the table
• P(Z ≤ 1) = 0.8413
• P(Z ≤ 2.03) = 0.9788
• P(Z ≤ −2) = ??
Normal distribution
• Using the normal table
• Our table has no negative numbers!
• P(Z ≤ −2) = ??
• By symmetry, P(Z ≤ −2) = P(Z ≥ 2) = 1 − P(Z ≤ 2)
• P(Z ≤ −2) = 1 − 0.9772
• P(Z ≤ −2) = 0.0228
[Plot: density with the tails beyond −2 and 2 marked]
Normal distribution
• Using the normal table
• How about P(Z > 2)?
• P(Z > 2) = 1 − P(Z ≤ 2)
• P(Z > 2) = 1 − 0.9772
• P(Z > 2) = 0.0228
[Plot: density with the tail beyond 2 shaded]
Normal distribution
• Using the normal table
• Ranges
• P(1 ≤ Z ≤ 2) = P(Z ≤ 2) − P(Z ≤ 1)
• P(1 ≤ Z ≤ 2) = 0.9772 − 0.8413
• P(1 ≤ Z ≤ 2) = 0.1359
[Plot: density with the area between 1 and 2 shaded]
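Today the printed table can be replaced by software. A sketch, assuming SciPy is installed, reproducing the lookups from the last few slides:

```python
from scipy.stats import norm

# norm.cdf is the standard normal cdf, Phi(z) = P(Z <= z).
print(norm.cdf(1))                # 0.8413...
print(norm.cdf(2.03))             # 0.9788...
print(norm.cdf(-2))               # 0.0228..., same as 1 - norm.cdf(2)
print(norm.cdf(2) - norm.cdf(1))  # P(1 <= Z <= 2) = 0.1359...
```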
Normal distribution
• Using the normal table
• What if X is distributed N(1,4) and you need to know P(X < 5)?
• The table can only be used for random variables distributed N(0,1)
Normal distribution
• Using the normal table
• Must "standardize" X:
$Z = \frac{X - \mu}{\sigma}$
• Here μ = 1 and σ² = 4, so σ = 2:
$P(X < 5) = P\!\left(Z < \frac{5 - 1}{2}\right) = P(Z < 2) = 0.9772$
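The same standardization in code (a sketch assuming SciPy), checked against evaluating the N(1, 4) cdf directly:

```python
from scipy.stats import norm

# X ~ N(1, 4), i.e. mu = 1 and variance 4, so sigma = 2.
mu, sigma = 1, 2
x = 5

z = (x - mu) / sigma  # standardize: z = (5 - 1) / 2 = 2
print(norm.cdf(z))                        # P(X < 5) = P(Z < 2) = 0.9772...
print(norm.cdf(x, loc=mu, scale=sigma))   # same answer without standardizing
```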
Central limit theorem
• The central limit theorem says (in essence) that the standardized average of a group of independent random variables tends toward the normal distribution as the number of random variables grows large
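A simulation sketch of this claim (assuming NumPy and Matplotlib; the exponential distribution and the sample sizes are arbitrary illustrative choices). Averages of draws from a decidedly non-normal distribution, once standardized, pile up in the familiar bell shape.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)

# Average n draws from an exponential distribution (mean 1, variance 1),
# standardize the average, and repeat many times.
n, reps = 100, 20_000
draws = rng.exponential(scale=1.0, size=(reps, n))
means = draws.mean(axis=1)
z = (means - 1.0) / (1.0 / np.sqrt(n))  # standardized sample means

plt.hist(z, bins=60, density=True)  # histogram is close to the N(0,1) bell
plt.xlabel("standardized average")
plt.show()
```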