Statistics 111 - Lecture 9
Introduction to Inference
Sampling Distributions for Counts and Proportions
Stat 111 - Lecture 9 - Proportions
Administrative Notes
• Homework 3 is due on Monday, June 15th
• Covers chapters 1-5 in textbook
• Exam on Monday, June 15th
• Review session on Thursday
Last Class
• Focused on models for continuous data: using the sample mean as our estimate of the population mean
• Sampling distribution of the sample mean: how does the sample mean change over different samples?
[Figure: samples 1 through 6, each of size n, drawn from a population with parameter $\mu$; each sample gives its own sample mean $\bar{x}$. What is the distribution of these values?]
Today’s Class
• We will now focus on count data: categorical data that take on only two different values, “Success” ($Y_i = 1$) or “Failure” ($Y_i = 0$)
• Goal is to estimate the population proportion: $p$ = proportion of $Y_i = 1$ in the population
Examples
• Gender: our class has 83 women and 42 men
• What is the proportion of women in the Penn student population?
• Presidential election: out of 2000 people sampled, 1150 will vote for McCain in the upcoming election
• What proportion of the total population will vote for McCain?
• Quality control: inspection of a sample of 100 microchips from a large shipment shows 10 failures
• What is the proportion of failures in all shipments?
Inference for Count Data
• Goal for count data is to estimate the population proportion $p$
• From a sample of size $n$, we can calculate two statistics: 1. sample count $Y$ 2. sample proportion $\hat{p} = Y/n$
• Use the sample proportion $\hat{p}$ as our estimate of the population proportion $p$
• Sampling distribution of the sample proportion: how does the sample proportion change over different samples?
[Figure: samples 1 through 6, each of size n, drawn from a population with parameter $p$; each sample gives its own sample proportion $\hat{p}$. What is the distribution of these values?]
The Binomial Setting for Count Data
• Fixed number $n$ of observations (or trials)
• Each observation is independent
• Each observation falls into 1 of 2 categories: Success ($Y = 1$) or Failure ($Y = 0$)
• Each observation has the same probability of success: $p = P(Y = 1)$
Binomial Distribution for Sample Count
• Sample count $Y$ (number of $Y_i = 1$ in a sample of size $n$) has a Binomial distribution
• The Binomial distribution has two parameters: number of trials $n$ and population proportion $p$
$P(Y = k) = \binom{n}{k} \, p^k (1-p)^{n-k}$
• Binomial formula accounts for
• number of successes: $p^k$
• number of failures: $(1-p)^{n-k}$
• different orders of successes/failures: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
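The Binomial formula can be evaluated directly. The following is a minimal sketch, assuming Python 3.8+ (for `math.comb`); the function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative choices, not taken from the lecture.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) for a Binomial count Y with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example values (not from the slides): n = 10 trials, p = 0.3.
probabilities = [binomial_pmf(k, 10, 0.3) for k in range(11)]

for k, prob in enumerate(probabilities):
    print(k, round(prob, 4))

# The probabilities over k = 0..n cover every possible count, so they sum to 1.
print(sum(probabilities))
```

Each printed value corresponds to the height of one bar in a Binomial probability histogram.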
Binomial Probability Histogram
• Can make a histogram out of these probabilities
• Can add up bars of the histogram to get any probability we want, e.g. $P(Y < 4)$
• Different values of $n$ and $p$ have different histograms, but Table C in the book has probabilities for many values of $n$ and $p$
Binomial Table
Example: Genetics
• If a couple are both carriers of a certain disease, then each of their children has probability 0.25 of being born with the disease
• Suppose that the couple has 4 children
• P(none of their children have the disease)?
$P(Y = 0) = \frac{4!}{0!\,4!} \, (0.25)^0 \, (1 - 0.25)^4 = (0.75)^4 \approx 0.3164$
• P(at least two children have the disease)?
$P(Y \ge 2) = P(Y = 2) + P(Y = 3) + P(Y = 4) = 0.2109 + 0.0469 + 0.0039$ (from table) $= 0.2617$
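The two genetics probabilities can be checked without the table. This is a small sketch assuming Python 3.8+; the helper name `pmf` is illustrative.

```python
from math import comb

n, p = 4, 0.25  # four children, each affected with probability 0.25

def pmf(k: int) -> float:
    """P(Y = k) for Y ~ Binomial(n, p), with n and p fixed above."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_none = pmf(0)
p_at_least_two = sum(pmf(k) for k in range(2, n + 1))
# Same event via the complement: 1 - P(Y = 0) - P(Y = 1).
p_at_least_two_alt = 1 - pmf(0) - pmf(1)

print(round(p_none, 4))          # 0.3164
print(round(p_at_least_two, 4))  # 0.2617
```

The complement form is often shorter when the event covers most of the possible counts.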
Example: Quality Control
• A worker inspects a sample of $n = 20$ microchips from a large shipment
• The probability of a microchip being faulty is 10% ($p = 0.10$)
• What is the probability that there are fewer than three failures in the sample?
$P(Y < 3) = P(Y = 0) + P(Y = 1) + P(Y = 2) = 0.1216 + 0.2702 + 0.2852$ (from table) $= 0.677$
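Adding up table entries amounts to evaluating a cumulative probability. A minimal sketch, assuming Python 3.8+; `binomial_cdf` is an illustrative name, not something defined in the lecture.

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(Y <= k) for Y ~ Binomial(n, p), summing the individual probabilities."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

# "Fewer than three failures" means Y <= 2.
p_fewer_than_three = binomial_cdf(2, 20, 0.10)
print(round(p_fewer_than_three, 3))  # 0.677
```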
Sample Proportions
• Usually, we are more interested in the sample proportion $\hat{p} = Y/n$ than in the sample count: $P(\hat{p} < k) = P(Y < n \cdot k)$
• Example: a worker inspects a sample of 20 microchips from a large shipment in which the probability of a microchip being faulty is 0.1
• What is the probability that our sample proportion of faulty chips is less than 0.05?
• Since $n \cdot k = 20 \times 0.05 = 1$: $P(\hat{p} < 0.05) = P(Y < 1) = P(Y = 0) = 0.1216$
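A statement about the sample proportion can be turned into a statement about the count by keeping only those counts whose proportion satisfies the condition. The sketch below assumes Python 3.8+; the function name `prob_proportion_below` is hypothetical.

```python
from math import comb

def prob_proportion_below(threshold: float, n: int, p: float) -> float:
    """P(Y/n < threshold) for Y ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if k / n < threshold  # strict inequality, as in the slide
    )

# Only Y = 0 gives a proportion below 0.05 when n = 20.
result = prob_proportion_below(0.05, 20, 0.10)
print(round(result, 4))  # 0.1216
```

Comparing `k / n` against the threshold keeps the strict inequality explicit, which matters here because a count of 1 gives a proportion of exactly 0.05.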
Mean and Variance of Binomial Counts
• If our sample count $Y$ is a random variable with a Binomial distribution, what are the mean and variance of $Y$ across all samples?
• Useful since we only observe the value of $Y$ for our sample, but what are the values in other samples?
• We can calculate the mean and variance of a Binomial distribution with parameters $n$ and $p$:
$\mu_Y = n p$
$\sigma_Y^2 = n p (1-p)$
$\sigma_Y = \sqrt{n p (1-p)}$
Mean/Variance of Binomial Proportions
• Sample proportion is a linear transformation of the sample count ($\hat{p} = Y/n$)
$\mu_{\hat{p}} = \frac{1}{n} \mu_Y = \frac{1}{n} \cdot n p = p$
• Mean of the sample proportion is the true probability of success $p$
$\sigma_{\hat{p}}^2 = \frac{1}{n^2} \sigma_Y^2 = \frac{1}{n^2} \cdot n p (1-p) = \frac{p(1-p)}{n}$
• Variance of the sample proportion decreases as the sample size $n$ increases!
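The formulas for the mean and variance can be checked numerically by computing them from the Binomial probabilities themselves. This is a sketch assuming Python 3.8+, using n = 20 and p = 0.10 from the quality control example; all variable names are illustrative.

```python
from math import comb, isclose

n, p = 20, 0.10
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and variance of the count Y, computed from the definition.
mean_count = sum(k * pk for k, pk in enumerate(pmf))
var_count = sum((k - mean_count) ** 2 * pk for k, pk in enumerate(pmf))

# Mean and variance of the proportion Y/n, computed the same way.
mean_prop = sum((k / n) * pk for k, pk in enumerate(pmf))
var_prop = sum((k / n - mean_prop) ** 2 * pk for k, pk in enumerate(pmf))

print(mean_count, n * p)            # both close to 2
print(var_count, n * p * (1 - p))   # both close to 1.8
print(mean_prop, p)                 # both close to 0.1
print(var_prop, p * (1 - p) / n)    # both close to 0.0045

assert isclose(mean_count, n * p, rel_tol=1e-9)
assert isclose(var_prop, p * (1 - p) / n, rel_tol=1e-9)
```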
Variance over Long-Run
• Lower variance with larger sample size means that the sample proportion will tend to be closer to the population proportion $p$ in larger samples
• [Figure: long-run behaviour of two different coin tossing runs]
• Much less likely to get unexpected events in larger samples
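The long-run behaviour of a coin tossing run can be imitated with a short simulation. This is only a sketch under stated assumptions: a fair coin (p = 0.5), 10,000 tosses, and arbitrary seeds 1 and 2, none of which come from the slides; `running_proportions` is a hypothetical helper.

```python
import random

def running_proportions(n_tosses: int, p: float = 0.5, seed: int = 0) -> list:
    """Proportion of heads after each toss in one simulated run."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < p  # True counts as 1, False as 0
        proportions.append(heads / toss)
    return proportions

for seed in (1, 2):
    run = running_proportions(10_000, seed=seed)
    # Proportion of heads after 10, 100, and 10,000 tosses.
    print(seed, run[9], run[99], run[9999])
```

Early proportions can sit far from 0.5, while the proportion after many tosses is typically much closer, in line with the variance p(1-p)/n shrinking as n grows.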
Binomial Probabilities in Large Samples
• In large samples, it is often tedious to calculate probabilities using the Binomial distribution
• Example: Gallup poll for presidential election
• Bush has 49% of the vote in the population. What is the probability that Bush gets a count over 550 in a sample of 1000 people?
$P(Y > 550) = P(Y = 551) + P(Y = 552) + \dots + P(Y = 1000)$ = 450 terms to look up in the table!
• We can instead use the fact that for large samples, the Binomial distribution is closely approximated by the Normal distribution
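The 450-term sum is tedious by hand but straightforward for a computer. The sketch below works on the log scale with `math.lgamma`, which is one way (an implementation choice made here, not the lecture's method) to avoid very large and very small intermediate numbers; `log_binomial_pmf` is an illustrative name.

```python
from math import exp, lgamma, log

def log_binomial_pmf(k: int, n: int, p: float) -> float:
    """log P(Y = k) for Y ~ Binomial(n, p), using lgamma(m + 1) = log(m!)."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_comb + k * log(p) + (n - k) * log(1 - p)

n, p = 1000, 0.49
# P(Y > 550) = P(Y = 551) + ... + P(Y = 1000): 450 terms.
terms = [exp(log_binomial_pmf(k, n, p)) for k in range(551, n + 1)]
p_over_550 = sum(terms)

print(len(terms))   # 450
print(p_over_550)
```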
Normal Approximation to Binomial
• If count $Y$ follows a Binomial distribution with parameters $n$ and $p$, then $Y$ approximately follows a Normal distribution with mean and variance:
$\mu_Y = n p$, $\sigma_Y^2 = n p (1-p)$
• This approximation is only good if $n$ is “large enough”.
• Rule of thumb for “large enough”: $n p \ge 10$ and $n(1-p) \ge 10$
• Also works for the sample proportion: $\hat{p} = Y/n$ approximately follows a Normal distribution with mean and variance:
$\mu_{\hat{p}} = p$, $\sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$
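The rule of thumb is easy to apply mechanically. Below is a minimal sketch; the function name `normal_approximation` and its convention of returning `None` when the rule fails are assumptions made for illustration.

```python
from math import sqrt

def normal_approximation(n: int, p: float):
    """Return (mean, sd) of the approximating Normal for the count Y,
    or None if the rule of thumb n*p >= 10 and n*(1-p) >= 10 fails."""
    if n * p < 10 or n * (1 - p) < 10:
        return None
    return n * p, sqrt(n * p * (1 - p))

print(normal_approximation(20, 0.10))    # None: n*p = 2 is below 10
print(normal_approximation(100, 0.10))   # mean 10, sd 3 (up to rounding)
print(normal_approximation(1000, 0.49))
```

With n = 20 and p = 0.10 the approximation is not recommended, which is why the earlier microchip example with 20 chips used the Binomial table directly.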
Example: Quality Control
• Sample of 100 microchips (with the usual 10% of microchips faulty). What is the probability that there are at least 17 bad chips in our sample?
• Using the Binomial calculation/table is tedious. Instead use the Normal approximation:
• Mean $= n p = 100 \times 0.10 = 10$
• Var $= n p (1-p) = 100 \times 0.10 \times 0.90 = 9$
$P(Y \ge 17) = P\left(Z \ge \frac{17 - 10}{\sqrt{9}}\right) = P(Z \ge 2.33) = 1 - P(Z \le 2.33) = 0.01$ (from table)
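The Normal approximation for this example can be reproduced with the standard library and compared against the exact Binomial sum. A sketch assuming Python 3.8+ (for `statistics.NormalDist` and `math.comb`); the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.10
mean = n * p                   # 10
sd = sqrt(n * p * (1 - p))     # 3

# Normal approximation, standardizing 17 as in the slide (no continuity correction).
z = (17 - mean) / sd
approx = 1 - NormalDist().cdf(z)

# Exact Binomial probability P(Y >= 17), summing the individual terms.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(17, n + 1))

print(round(z, 2))   # 2.33
print(approx)        # about 0.01
print(exact)
```

Printing both values shows how far the approximation is from the exact answer for this particular n and p.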
Example: Gallup Poll
• Bush has 49% of the vote in the population
• What is the probability that Bush gets a sample proportion over 0.51 in a sample of size 1000?
• Use the Normal distribution with mean $= p = 0.49$ and variance $= p(1-p)/n = 0.49 \times 0.51 / 1000 = 0.0002499$
$P(\hat{p} > 0.51) = P\left(Z \ge \frac{0.51 - 0.49}{\sqrt{0.0002499}}\right) = P(Z \ge 1.27) = 1 - P(Z \le 1.27) = 0.102$
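The same calculation can be done for the sample proportion by giving the Normal distribution the mean p and standard deviation sqrt(p(1-p)/n). A sketch assuming Python 3.8+; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.49
variance = p * (1 - p) / n
sd = sqrt(variance)

# Approximate distribution of the sample proportion: Normal(p, sd).
p_hat_dist = NormalDist(mu=p, sigma=sd)
prob_over_51 = 1 - p_hat_dist.cdf(0.51)

z = (0.51 - p) / sd
print(variance)
print(round(z, 2))   # 1.27
print(prob_over_51)
```

The slide rounds z to 1.27 before looking it up in the table, so the value computed here may differ slightly from 0.102 in the third decimal place.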
Why does Normal Approximation work?
• Central Limit Theorem: in large samples, the distribution of the sample mean is approx. Normal
• Well, our count data takes on two different values: “Success” ($Y_i = 1$) or “Failure” ($Y_i = 0$)
• The sample proportion is the same as the sample mean for count data!
• So, the Central Limit Theorem works for sample proportions as well!
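Written out, the argument is a short chain of equalities. The block below is a sketch of that reasoning; it uses the standard facts that a single 0/1 observation with success probability p has mean p and variance p(1-p).

```latex
\hat{p} \;=\; \frac{Y}{n} \;=\; \frac{1}{n}\sum_{i=1}^{n} Y_i \;=\; \bar{Y},
\qquad
E(Y_i) = p, \quad \operatorname{Var}(Y_i) = p(1-p)
\;\;\Longrightarrow\;\;
\hat{p} \;\approx\; N\!\left(p,\; \frac{p(1-p)}{n}\right) \text{ for large } n .
```

The mean and variance on the right are the same ones obtained earlier from the Binomial distribution of the count.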
Next Class - Lecture 10
• Review session on Wednesday/Thursday
• Show up with questions!