Explore population sampling, distribution functions, parameters, and estimators to draw meaningful conclusions from samples. Learn the importance of random sampling and statistics in drawing accurate inferences about populations.
45-733: lecture 7 (chapter 6) Sampling Distributions William B. Vogt, Carnegie Mellon, 45-733
Samples from populations • There is some population we are interested in: • Families in the US • Products coming off our assembly line • Consumers in our product’s market segment • Employees
Samples from populations • We are interested in some quantitative information (called variables) about these populations: • Income of families in the US • Defects in products coming off our assembly line • Perception of consumers of our product • Productivity of our employees
Samples from populations • All the information (accessible to statistics) about a quantity in a population is contained in its distribution function • Real-world distribution functions are complicated things • In real life, we usually know little or nothing about the distribution functions of the variables we are interested in
Samples from populations • Because distribution functions are complex, we only try to find out about certain aspects of them (parameters): • Average income of families in the US • Rate of defects coming off our production line • % of customers who view our product favorably • Average pieces/hour finished by a worker
Samples from populations • Of course, we do not begin by knowing even these quantities • One possibility is to measure the whole population • Allows us to answer any question about the distribution or parameters, using the techniques of chapter 2 • However, this is almost always expensive and often infeasible
Samples from populations • Instead, we take a sample • Taking a sample • We select only a few of the members of the population • We measure the variables of interest for those members we select • Examples • Phone survey • Take 1 out of every 10,000 units off our production line
Samples from populations • The whole of statistics is figuring out what we can learn about the population from a sample: • What can we say about the distribution of a variable from the information in a sample? • What can we say about the parameters we are interested in from our sample? • How good is the information in our sample about the population?
Samples from populations • Example: • We are interested in how favorably our product is viewed by customers • We do a phone survey of our 5 good friends and ask them if they view our product favorably or unfavorably • All 5 say favorably • What can we conclude?
Samples from populations • Example: • We are interested in how favorably our product is viewed by customers • We do a phone survey of 500 people who have purchased our product before and ask them if they view our product favorably or unfavorably • 466 say they view our product favorably • What can we conclude?
Samples from populations • Example: • We are interested in how favorably our product is viewed by customers • We do a phone survey of 500 random adults and ask them if they view our product favorably or unfavorably • 351 say they view our product favorably • What can we conclude?
Samples and statistics • As a practical matter, we are usually interested in using our sample to say something about a parameter of the distribution we care about • To get at this parameter, we construct a variable called an estimator or statistic
Samples and statistics • Example: • If we want to know the average income of families in the US, we draw a sample from a random phone survey of 1000 families • We ask, among other things, for their family income • To estimate E(I), we calculate the estimator or statistic called the sample mean: $\bar{I} = \frac{1}{n}\sum_{i=1}^{n} I_i$, with $n = 1000$
Samples and statistics • Example: • But, what does the sample mean of income tell us about E(I)? • Answering this question is the subject of the rest of the course, and of statistics in general
Random sampling • There are different ways to sample a population, different sampling schemes • The simplest sampling scheme is called “simple random sampling” or just “random sampling” • If there is a population of size N from which we are to draw a sample of size n, random sampling says that on each draw every one of the N members of the population has probability 1/N of being selected, and that the draws are independent.
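The definition above (probability 1/N on each draw, independent draws) can be sketched in Python. This is a minimal illustration, not part of the lecture: the population of unit IDs, the function name `simple_random_sample`, and the seed are all hypothetical. Independence with probability 1/N on every draw corresponds to sampling with replacement, which is what `random.choices` does.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members independently; each member has probability 1/N on every draw."""
    rng = random.Random(seed)
    # choices() samples with replacement, so the draws are independent
    # and each of the N members is equally likely on each draw.
    return rng.choices(population, k=n)

# Hypothetical population: unit IDs 1..10,000 coming off a production line.
population = list(range(1, 10_001))
sample = simple_random_sample(population, n=5, seed=0)
print(sample)
```

For a very large population, drawing without replacement (for example with `random.sample`) gives nearly the same behavior, because the chance of drawing the same member twice is negligible.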
Statistic or estimator • A statistic (or estimator) is any function of a sample • It is an algorithm which tells us what we would do given a sample • Example: • Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ • Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
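Since a statistic is just a function of the sample, it can be written directly as code. The sketch below is illustrative only: the income figures are made up, and the sample variance assumes the common n-1 divisor convention.

```python
import statistics

def sample_mean(xs):
    """Sample mean: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the n-1 divisor (an assumed convention)."""
    n = len(xs)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Hypothetical sample of five family incomes (made-up numbers).
incomes = [42_000, 55_000, 61_000, 38_000, 74_000]
print(sample_mean(incomes), sample_variance(incomes))

# The standard library uses the same n-1 convention for statistics.variance.
print(statistics.mean(incomes), statistics.variance(incomes))
```

A different sample would give different values from the same two functions, which is the sense in which a statistic is an algorithm applied to whatever sample is drawn.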
Statistic as random variable • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!!
Statistic as random variable • A simple example • Consider the Bernoulli random variable X with parameter p • We are interested in p, the probability of a success • To estimate p, we will calculate the sample mean of X: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Statistic as random variable • A simple example • First, with a sample size of n=1: $\bar{X} = X_1$, so $P(\bar{X}=0) = 1-p$ and $P(\bar{X}=1) = p$
Statistic as random variable • A simple example • Next, with a sample size of n=2: $P(\bar{X}=0) = (1-p)^2$, $P(\bar{X}=\tfrac{1}{2}) = 2p(1-p)$, $P(\bar{X}=1) = p^2$
Statistic as random variable • A simple example • Next, with a sample size of n=3: $P(\bar{X}=0) = (1-p)^3$, $P(\bar{X}=\tfrac{1}{3}) = 3p(1-p)^2$, $P(\bar{X}=\tfrac{2}{3}) = 3p^2(1-p)$, $P(\bar{X}=1) = p^3$
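The probability function of the sample mean for small n can be checked by brute force: list every possible sample of n Bernoulli draws and add up the probabilities of the samples that give the same mean. The sketch below does this with exact fractions. The value p = 1/4 and the function name are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def sample_mean_distribution(p, n):
    """Exact probability function of the mean of n independent Bernoulli(p) draws."""
    dist = {}
    # Enumerate all 2**n possible samples (x1, ..., xn) of zeros and ones.
    for outcome in product([0, 1], repeat=n):
        successes = sum(outcome)
        prob = p ** successes * (1 - p) ** (n - successes)  # independence
        mean = Fraction(successes, n)
        dist[mean] = dist.get(mean, 0) + prob
    return dist

p = Fraction(1, 4)  # hypothetical success probability
for n in (1, 2, 3):
    print(n, sample_mean_distribution(p, n))
```

The number of distinct values the sample mean can take grows with n (n+1 of them), and the probabilities always sum to one.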
Statistic as random variable • The statistic is a random variable • It has a distribution • Probability function or density • Cumulative distribution function • It has an expectation • It has a variance / standard deviation
Statistic as random variable • For the Bernoulli example • Expectation, variance with n=1: $E(\bar{X}) = p$, $\mathrm{Var}(\bar{X}) = p(1-p)$
Statistic as random variable • For the Bernoulli example • Expectation, variance with n=2: $E(\bar{X}) = 0\cdot(1-p)^2 + \tfrac{1}{2}\cdot 2p(1-p) + 1\cdot p^2 = p$, $\mathrm{Var}(\bar{X}) = \frac{p(1-p)}{2}$
Statistic as random variable • For the Bernoulli example • Expectation, variance with n=3: $E(\bar{X}) = p$, $\mathrm{Var}(\bar{X}) = \frac{p(1-p)}{3}$
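The expectation and variance of the Bernoulli sample mean can be computed directly from its probability function. The sketch below assumes the standard binomial form: the sample mean takes the value k/n with probability C(n,k) p^k (1-p)^(n-k). It compares the results with p and p(1-p)/n; the value p = 1/4 is a hypothetical choice and exact fractions avoid rounding.

```python
from fractions import Fraction
from math import comb

def moments_of_sample_mean(p, n):
    """Exact expectation and variance of the mean of n independent Bernoulli(p) draws."""
    # Pairs (value, probability) for the sample mean.
    support = [
        (Fraction(k, n), comb(n, k) * p ** k * (1 - p) ** (n - k))
        for k in range(n + 1)
    ]
    expectation = sum(x * pr for x, pr in support)
    variance = sum((x - expectation) ** 2 * pr for x, pr in support)
    return expectation, variance

p = Fraction(1, 4)  # hypothetical success probability
for n in (1, 2, 3):
    e, v = moments_of_sample_mean(p, n)
    print(n, e, v, p * (1 - p) / n)
```

In each case the expectation stays at p while the variance equals p(1-p)/n, so it shrinks as n grows.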
Statistic as random variable • For the Bernoulli example • Probability function, n=1 [Figure: probability function of the sample mean; horizontal axis marked at 0, p, and 1; bar heights 1-p and p]
Statistic as random variable • For the Bernoulli example • Probability function, n=2 [Figure: probability function of the sample mean; horizontal axis marked at 0, p, 1/2, and 1]
Statistic as random variable • For the Bernoulli example • Probability function, n=3 [Figure: probability function of the sample mean; horizontal axis marked at 0, 1/3, 2/3, 1, and p]
Sample mean • As we have discussed before, the sample mean of a random variable X from a sample of size n is: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample mean • The sample mean is a random variable!! • Sample mean is made out of n random variables; therefore, it is a random variable: $\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$
Sample mean • Let’s suppose X is a random variable with mean $\mu_X$ and standard deviation $\sigma_X$, and let’s consider the sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample mean • Since the sample mean is a random variable, we can ask about its expectation: $E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\cdot n\,\mu_X = \mu_X$
Sample mean • The expectation of the sample mean is equal to the expectation of the underlying random variable • On average, the sample mean is equal to the mean of the underlying random variable
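Sample mean • We can also ask about the variance of the sample mean: $\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\left[\sum_{i=1}^{n} \mathrm{Var}(X_i) + \sum_{i=1}^{n}\sum_{j \neq i} \mathrm{Cov}(X_i, X_j)\right]$
Sample mean • If it is an independent, random sample then the covariances are all zero: $\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{1}{n^2}\cdot n\,\sigma_X^2 = \frac{\sigma_X^2}{n}$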
Sample mean • For n > 1, the variance of the sample mean is less than the variance of the underlying random variable • The variance of the sample mean gets smaller as the sample size increases • The variance of the sample mean goes to zero as the sample size goes to infinity
Sample mean • Our two results: $E(\bar{X}) = \mu_X$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma_X^2}{n}$
Sample mean • Say that: • On average, the sample mean is equal to the mean of the underlying random variable, regardless of sample size • As the sample size grows, the variance of the sample mean shrinks, eventually approaching zero
Sample mean • What would happen if the sample size “got to” infinity? • Then the sample mean would no longer be a random variable; it would literally equal the population mean, E(X): as $n \to \infty$, $\mathrm{Var}(\bar{X}) = \sigma_X^2/n \to 0$, so $\bar{X} \to E(X)$
Sample mean • Suppose X~N(1,1). [Figure: distribution of the sample mean for n=1 and n=100]
Sample mean • Suppose X~N(1,1). [Figure: distribution of the sample mean for n=1, n=100, and n=1000]
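The lost figures can be approximated by simulation. The sketch below draws many samples from N(1,1), computes each sample mean, and reports the average and variance of those means for n = 1, 100, and 1000. The seed, the number of repetitions, and the function name are arbitrary choices, and the printed values are simulation estimates that will vary with the seed.

```python
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Return `reps` sample means, each computed from n independent N(1, 1) draws."""
    return [
        statistics.fmean(rng.gauss(1.0, 1.0) for _ in range(n))
        for _ in range(reps)
    ]

rng = random.Random(45733)  # arbitrary seed for reproducibility
for n in (1, 100, 1000):
    means = simulate_sample_means(n, reps=500, rng=rng)
    # Columns: n, average of the sample means, their variance, theoretical 1/n.
    print(n, round(statistics.fmean(means), 3),
          round(statistics.variance(means), 5), 1 / n)
```

The average of the sample means should stay close to 1 for every n, while their variance should track 1/n, which is why the distribution concentrates around the population mean as n grows.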
Sample mean • Finite sample correction • What has gone before has assumed either that you sample with replacement or that the population you are sampling from is very large (infinite) • Just as we needed to use hypergeometric rather than binomial when sampling from a small population without replacement, so here we need a correction to the variance of the sample mean
Sample mean • Finite sample correction • For a population of size N, sampled without replacement by a sample of size n: $E(\bar{X}) = \mu_X$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma_X^2}{n}\cdot\frac{N-n}{N-1}$
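For a tiny population the correction can be verified exactly by listing every possible sample drawn without replacement. The sketch below assumes the standard correction factor (N-n)/(N-1), applied to the population variance computed with divisor N. The five population values are made up.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 4, 4, 7, 8]  # hypothetical population, N = 5
N = len(population)
n = 2                         # sample size, drawn without replacement

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

# Every without-replacement sample of size n is equally likely.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
exp_mean = sum(means) / len(means)
var_mean = sum((m - exp_mean) ** 2 for m in means) / len(means)

corrected = sigma2 / n * Fraction(N - n, N - 1)
print(exp_mean, var_mean, corrected, sigma2 / n)
```

The enumerated variance of the sample mean matches the corrected formula and is smaller than the uncorrected value, since sampling without replacement from a small population leaves less room for the sample mean to vary.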
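Sample mean • Normal variables and $\bar{X}$ • If X is normal, then so is $\bar{X}$ • If X is normal, then: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$, equivalently $Z = \frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0,1)$
Sample mean • Central limit theorem and $\bar{X}$ • As long as X comes from an independent random sample and n is large, approximately: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$, whatever the distribution of X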
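The central limit theorem can be illustrated by simulation with a clearly non-normal X. The sketch below uses an Exponential(1) variable, which has mean 1 and standard deviation 1. That distribution, the sample size n = 50, the seed, and the function name are all illustrative assumptions. If the standardized sample mean is approximately N(0,1), about 95% of its values should fall within ±1.96.

```python
import math
import random

def standardized_sample_mean(n, rng):
    """Z = (X-bar - mu) / (sigma / sqrt(n)) for n Exponential(1) draws (mu = sigma = 1)."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - 1.0) / (1.0 / math.sqrt(n))

rng = random.Random(7)  # arbitrary seed
zs = [standardized_sample_mean(50, rng) for _ in range(5000)]

# Share of standardized means within +/- 1.96, the central 95% of N(0, 1).
share = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(share)
```

The share is a simulation estimate, so it will not be exactly 0.95, and for a skewed X the approximation improves as n increases.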
Sample proportion • Consider W a Bernoulli and an independent random sample of size n • Observe that $X = W_1 + W_2 + \dots + W_n$ is distributed Binomial (and therefore approx normal)
Sample proportion • The sample mean (i.e., sample proportion) is: $\bar{W} = \frac{X}{n} = \frac{1}{n}\sum_{i=1}^{n} W_i$ • Just a binomial divided by n • Also approx normal
Sample proportion • To emphasize that we are estimating the p parameter of the Bernoulli, we may write: $\hat{p} = \frac{X}{n} = \frac{1}{n}\sum_{i=1}^{n} W_i$
Sample proportion • Just as before, the sample mean has the same expectation as the underlying Bernoulli random variable: $E(\hat{p}) = E(W) = p$
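Both claims about the sample proportion can be checked numerically: that its expectation is p, and that its distribution is close to normal. The sketch below computes exact binomial probabilities and compares them with a normal approximation whose variance is p(1-p)/n, the usual variance of a Bernoulli sample mean. The values n = 500 and p = 0.7, the cutoffs, and the function names are hypothetical, and no continuity correction is applied.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def expected_proportion(n, p):
    """E(p-hat), where p-hat = X / n."""
    return sum(k / n * binomial_pmf(k, n, p) for k in range(n + 1))

def exact_cdf(k_max, n, p):
    """P(p-hat <= k_max / n), computed from the binomial distribution."""
    return sum(binomial_pmf(k, n, p) for k in range(k_max + 1))

def normal_cdf(k_max, n, p):
    """Normal approximation to P(p-hat <= k_max / n)."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n)).cdf(k_max / n)

n, p = 500, 0.7  # hypothetical sample size and true proportion
print(expected_proportion(n, p))
for k_max in (330, 350, 370):
    print(k_max / n, round(exact_cdf(k_max, n, p), 4),
          round(normal_cdf(k_max, n, p), 4))
```

The two columns of probabilities should be close but not identical, because the binomial is discrete and the normal is continuous.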