Estimation in Sampling

Estimation in Sampling GTECH 201 Lecture 15

Conceptual Setting • How do we come to conclusions from empirical evidence? • Isn’t common sense enough? • Why? • Systematic methods for drawing conclusions from data • Statistical inference • Inductive versus Deductive Reasoning

Drawing Conclusions • Statistical inference • Based on the laws of probability • What would happen if? • You ran your experiment hundreds of times • You repeated your survey over and over again • Statistic and Parameter • The proportion of the population who are <disabled>, usually denoted by $p$ • In an SRS of 1000 people, the proportion of the people who are <disabled>, usually denoted by $\hat{p}$ (p-hat)

Estimating with Confidence • Say you are conducting an opinion poll… • SRS of 1000 adult television viewers • You ask these folks if they trust Walter Cronkite when he delivers the nightly news • Out of 1000, 570 say they trust him • 57% of the people trust Walter • $\hat{p}$ is 0.57 • If you collect another set of 1000 television viewers, what will the rating be?

Confidence Statement • We need to add a confidence statement • We need to say something about the margin of error • Confidence statements are based on the distribution of the values of the sample proportion that would occur if many independent SRS were taken from the same population • The sampling distribution of the statistic
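The idea of "many independent SRS from the same population" can be sketched with a small simulation. This is only an illustration: the true proportion is never known in practice, so `true_p = 0.57` is an assumption borrowed from the poll example, and the function name, repetition count, and seed are hypothetical.

```python
import random
import statistics


def simulate_sample_proportions(true_p, sample_size, repetitions, seed=0):
    """Draw `repetitions` independent SRSs and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(repetitions):
        # Each respondent answers "yes" with probability true_p (assumed value).
        successes = sum(1 for _ in range(sample_size) if rng.random() < true_p)
        proportions.append(successes / sample_size)
    return proportions


p_hats = simulate_sample_proportions(true_p=0.57, sample_size=1000, repetitions=500)
print("mean of p-hat values:     ", round(statistics.mean(p_hats), 3))
print("std. dev. of p-hat values:", round(statistics.stdev(p_hats), 4))
```

The spread of the simulated p-hat values is the sampling distribution of the statistic; its standard deviation is what a margin of error is built from.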

Terminology Review • Sample • Population • Statistic • a numerical characteristic associated with a sample • Parameter • A numerical characteristic associated with the population • Sampling error • The need for interval estimation

Point Estimation • Point estimation of a parameter is the value of a statistic that is used to estimate the parameter • Compute statistic (e.g., mean) • Use it to estimate corresponding population parameter • Point Estimators of Population Parameters(see next slide)

Point Estimators for Population Parameters • Table columns: Population parameter | Sample statistic | Calculating formula

Interval Estimation • Sample point estimators are usually not absolutely precise • How close or how distant is the calculated sample statistic from the population parameter • We can say that the sample statistic is within a certain range or interval of the population parameter. • The determination of this range is the basis for interval estimation

Interval Estimation (2) • A confidence interval (CI) represents the level of precision associated with a population estimate • Width of the interval is determined by • Sample size, • variability of the population, and • the probability level or the level of confidence selected

Sampling Distribution of the Mean • The distribution of all possible sample means for a sample of a given size • Use the mean of a sample to estimate and draw conclusions about the mean of that entire population • So we have samples of a particular size • We need formulas to determine the mean and the standard deviation of all possible sample means for samples of a given size from a population

Sample and Population Mean • For samples of size n, the mean of the variable $\bar{x}$ • Is equal to the mean of the variable under consideration: $\mu_{\bar{x}} = \mu$ • Mean of all possible sample means is equal to the population mean

Sample Standard Deviation • For samples of size n, the standard deviation of the variable $\bar{x}$ • Is equal to the standard deviation of the variable under consideration, divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ • For each sample size, the standard deviation of all possible sample means equals the population standard deviation divided by the square root of the sample size
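Both statements (the mean of all possible sample means equals the population mean, and their standard deviation equals the population standard deviation divided by the square root of n) can be checked by brute force on a tiny population. A minimal sketch, assuming a hypothetical six-value population and sampling with replacement, which mimics an infinitely large population:

```python
import itertools
import math
import statistics

population = [1, 2, 3, 4, 5, 6]  # hypothetical population (e.g., faces of a die)
n = 2                            # sample size

# Enumerate every possible ordered sample of size n, drawn with replacement.
sample_means = [statistics.mean(s) for s in itertools.product(population, repeat=n)]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print("mean of sample means:", mean_of_means, "| population mean:", mu)
print("sd of sample means:  ", sd_of_means, "| sigma / sqrt(n):", sigma / math.sqrt(n))
```

Changing `n` to 3 or 4 shows the standard deviation of the sample means shrinking as the sample size grows.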

Central Limit Theorem • Suppose all possible random samples of size n are drawn from an infinitely large, normally distributed population having a mean $\mu$ and a standard deviation $\sigma$ • The frequency distribution of these sample means will have: • A mean of $\mu$ (the population mean) • A normal distribution around this population mean • A standard deviation of $\sigma / \sqrt{n}$

Sampling Error • Standard Error of the mean (SEM) is a basic measure for the amount of sampling error • SEM indicates how much a typical sample mean is likely to differ from a true population mean • Sample size, and population standard deviation affect the sampling error

Sampling Error (2) • The larger the sample size, the smaller the amount of sampling error • The larger the standard deviation, the greater the amount of sampling error

Finite Population Correction Factor • The frequency distribution of the sample means is approximately normal if the sample size is large • n < 30 (small sample); n ≥ 30 (large sample) • If you have a finite population, then you need to introduce a correction, i.e., the fpc rule/factor in the estimation process • $fpc = \sqrt{(N - n) / (N - 1)}$ • where fpc = finite population correction; • n = sample size; • N = population size

Standard Error of the Mean for Finite Populations • When including the fpc, the standard error of the mean should be: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$ • In general, you include the fpc in the population estimates only when the ratio of sample size to population size exceeds 5%, i.e., when n / N > 0.05
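The 5% rule can be sketched as a small helper. The fpc is written here in the common textbook form sqrt((N - n) / (N - 1)), which is an assumption about the formula shown on the slide; the function name, its parameters, and the population size of 400 in the usage lines are hypothetical.

```python
import math


def standard_error_of_mean(sigma, n, N=None, fpc_threshold=0.05):
    """Standard error of the mean, applying the fpc only when n / N > threshold.

    sigma: population standard deviation
    n:     sample size
    N:     population size (None means "treat the population as infinite")
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > fpc_threshold:
        # Assumed textbook form of the finite population correction.
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Sigma and n from the commuter example; N = 400 is a made-up population size.
print(standard_error_of_mean(3, 50))         # no population size given: no correction
print(standard_error_of_mean(3, 50, N=400))  # n / N = 0.125 > 0.05: corrected
```

The correction always shrinks the standard error, because a sample that covers a sizeable share of the population leaves less room for sampling error.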

Constructing Confidence Intervals • A random sample of 50 commuters reveals that their average journey-to-work distance was 9.6 miles • A recent study has determined that the std. deviation of journey-to-work distance is approximately 3 miles • What is the CI around this sample mean of 9.6 that guarantees with 90 % certainty that the true population mean is enclosed within that interval?

Confidence Interval for the Mean • Z value associated with a 90% confidence level (Z = 1.65) • The sample mean is the best estimate of the true population mean • CI = $\bar{x} \pm Z \sigma_{\bar{x}} = \bar{x} \pm Z (\sigma / \sqrt{n})$ • 9.6 + 1.65 (3 / $\sqrt{50}$) = 10.30 miles • 9.6 - 1.65 (3 / $\sqrt{50}$) = 8.90 miles
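The commuter calculation can be reproduced in a few lines. A minimal sketch using the slide's own numbers (sample mean 9.6, standard deviation 3, n = 50, Z = 1.65); the function name is hypothetical, and no finite population correction is applied because no population size is given.

```python
import math


def confidence_interval_mean(sample_mean, sigma, n, z):
    """Return (lower, upper) bounds of sample_mean +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = confidence_interval_mean(sample_mean=9.6, sigma=3, n=50, z=1.65)
print(f"{lower:.2f} to {upper:.2f} miles")  # prints "8.90 to 10.30 miles"
```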

Confidence Interval • We say that the sample statistic is within a certain range or interval of the population parameter • e.g., in our sample, 57% of the viewers thought Walter Cronkite is trustworthy • In the general population, between 54% and 60% of viewers think that Walter Cronkite is trustworthy • Or, in our sample, the average commuting distance was 9.6 miles • In the population, we calculated that the average commute is likely to be somewhere between 8.9 miles and 10.3 miles
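The slides do not say how the 54% to 60% range for the poll was obtained. One reading that reproduces it, offered here as an assumption, is the usual large-sample interval for a proportion at a 95% confidence level (z = 1.96): p-hat plus or minus z times sqrt(p-hat (1 - p-hat) / n). The function name is hypothetical.

```python
import math


def confidence_interval_proportion(p_hat, n, z):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# z = 1.96 (95% confidence) is an assumed level, not stated in the slides.
low, high = confidence_interval_proportion(p_hat=0.57, n=1000, z=1.96)
print(f"{low:.1%} to {high:.1%}")
```

Under these assumptions the margin of error is about three percentage points either side of 57%.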

Confidence Level • Gives you an understanding of how reliable your previous statement regarding the confidence interval is • The probability that the interval actually includes the population parameter • For example, the confidence level refers to the probability that the interval (8.9 miles to 10.3 miles) actually encompasses the TRUE population mean (90%, 95%, 99.7%) • Confidence Level probability is $1 - \alpha$

Significance Level • $\alpha$ (alpha) • The probability that the interval that surrounds the sample statistic DOES NOT include the population parameter • E.g., the probability that the true average commuting distance does not fall between 8.9 miles and 10.3 miles • $\alpha$ = 0.10 (90%); 0.05 (95%); 0.01 (99%) • As $\alpha$ decreases (higher confidence level), the confidence interval width increases

Sampling Error • Total sampling error = $\alpha$ • Probability that the sample statistic will fall into either tail of the distribution is: $\alpha / 2$ • If you want 99.7% confidence (i.e., low error), then you have to settle for giving a less precise estimate (the CI is wider)
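The link between the confidence level, alpha, and the Z value can be sketched with Python's standard library: alpha is split evenly between the two tails, so the critical value is the normal quantile at 1 - alpha/2. The helper name is hypothetical; `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided critical Z value: alpha / 2 of the probability sits in each tail."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99, 0.997):
    print(f"confidence {level:.1%}: alpha = {1 - level:.3f}, Z = {z_critical(level):.3f}")
```

The Z value grows as the confidence level rises, which is why a more confident statement comes with a wider interval. The 90% value is approximately 1.645, which the slides round to 1.65.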

If the Standard Deviation is Unknown • If we don’t know the population mean, it’s likely we don’t know the population standard deviation either • What you are likely to have is the variance and standard deviation of your sample • Also, if you have a small population, you have to use the finite population correction factor that was discussed earlier • Once you have the formula for standard error, then you can proceed as before to determine the confidence interval

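Standard Error • Estimated from the sample standard deviation s: $s_{\bar{x}} = s / \sqrt{n}$, multiplied by the fpc when n / N > 0.05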

Student’s t Distribution • William Gosset (1876-1937) • Published his contributions to statistical theory under the pseudonym "Student" • Student’s t distribution is used in performing inferences for a population mean when: • The population being sampled is approximately normally distributed • The population standard deviation is unknown • And the sample size is small (n < 30)

Characteristics of the t - Distribution • A t curve is symmetric, bell shaped • Exact shape of distribution varies with sample size • When n nears 30, the value of t approaches the standard normal Z value • A particular distribution is identified by defining its degrees of freedom (df) • For a t distribution, df = (n -1)

Properties of t Curves • The total area under a t curve = 1 • A t curve extends indefinitely in both directions, approaching, but never touching the horizontal axis • A t-curve is symmetrical about 0 • As the degrees of freedom become larger, t curves look increasingly like the standard normal curve • We need to use a t-table and look for values of t, instead of Z to determine the confidence interval

Calculating various CIs • Sampling • SRS, systematic, or stratified • Parameters • Mean, total, or proportion • Six situations • Consider whether to use fpc • use it when n/N > 0.05 • Consider whether to use Z or t • use t when n < 30

If Random or Systematic Sample • Estimate of Population Mean • Best estimate is the sample mean $\bar{x}$ • Estimate of sampling error • Standard error of the mean (inc. fpc)

If Stratified Sample • Estimate of population mean • Still equal to sample mean but… • Std. Error of the mean (inc. fpc) • where m = number of strata; i refers to a particular stratum

Minimum Sample Size • Before going out to the field, you want to know how big the sample ought to be for your research problem • Sample must be large enough to achieve precision and CI width that you desire • Formulas to determine the three basic population parameters with random sampling

Sample Size Selection - Mean • Your goal is to determine the minimum sample size • You want to situate the estimated population mean in a specified CI • $n = (Z \sigma / E)^2$ • where E = amount of error you are willing to tolerate

Example 1 • We are looking at Neighborhood X • 3,500 households • Sample size = 25 households • Sample mean = 2.73 persons per household • Sample variance = 2.6 • Confidence level = 90% • Find the confidence interval for the mean number of people per household
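One way to work Example 1, sketched under stated assumptions: since n = 25 is below 30 and only the sample variance is known, a t value is used; since n/N = 25/3500 is well below 0.05, the fpc is skipped. The critical value 1.711 is the standard t-table entry for 24 degrees of freedom with 0.05 in each tail; it is hard-coded because Python's standard library has no t distribution. Variable names are hypothetical.

```python
import math

N = 3500                 # households in Neighborhood X
n = 25                   # sampled households
sample_mean = 2.73       # persons per household
sample_variance = 2.6
T_CRITICAL = 1.711       # t-table value: df = n - 1 = 24, 90% confidence (two-sided)

s = math.sqrt(sample_variance)      # sample standard deviation
standard_error = s / math.sqrt(n)   # estimated standard error of the mean

# Apply the fpc only when the sample is more than 5% of the population.
if n / N > 0.05:
    standard_error *= math.sqrt((N - n) / (N - 1))

margin = T_CRITICAL * standard_error
lower, upper = sample_mean - margin, sample_mean + margin
print(f"90% CI: {lower:.2f} to {upper:.2f} persons per household")
```

With these inputs the interval runs from roughly 2.18 to 3.28 persons per household.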

Example 2 • Sample of 30 households • Sample standard deviation is 1.25 • What sample size is needed to estimate the mean number of persons per household in neighborhood X • and be 90% confident that your estimate will be within 0.3 persons of the true population mean?
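Example 2 can be worked with the usual minimum-sample-size formula for a mean, n = (Z s / E)^2, rounded up to the next whole household. A minimal sketch, assuming the slides' rounded Z value of 1.65 for 90% confidence, the sample standard deviation 1.25 as the estimate of the population standard deviation, and no finite population correction; the function name is hypothetical.

```python
import math


def minimum_sample_size(z, std_dev, tolerable_error):
    """Smallest whole n satisfying n >= (z * std_dev / E)^2."""
    return math.ceil((z * std_dev / tolerable_error) ** 2)


n_required = minimum_sample_size(z=1.65, std_dev=1.25, tolerable_error=0.3)
print("minimum sample size:", n_required)  # prints "minimum sample size: 48"
```

Because the result is rounded up, it is sensitive to the rounding of Z: using 1.645 instead of 1.65 gives 47 rather than 48.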
