Estimation in Sampling!? Chapter 7 – Statistical Problem Solving in Geography
Goals • Basic Concepts in Estimation • Point Estimation and Interval Estimation • Sampling Distribution of a Statistic • Central Limit Theorem • Confidence Intervals and Estimation • Standard Normal and Z-Scores • General Procedure for Constructing a Confidence Interval • Geographic Examples of Confidence Intervals • Sample Size Selection • Mean, Total and Proportion in Sample Size Selection
Points in Estimation • Estimation: The goal of sampling is to estimate, and draw inferences about, population characteristics. • Point Estimation • A statistic is calculated from a sample to estimate the corresponding population parameter. • In probability sampling the “best” point estimate of a population parameter is the corresponding sample statistic. • For $\mu$ (the population mean) use $\bar{x}$ (the sample mean); for $\sigma$ (the population standard deviation) use $s$ (the sample standard deviation). • Calculating Point Estimates: • See Table 7.1, page 98 – McGrew and Monroe
Intervals in Estimation • Interval Estimation • Because sampling involves uncertainty, it is unlikely that a sample statistic will exactly equal the population parameter. • Interval estimation is used to gauge how far a sample statistic is likely to be from the population parameter. • Interval estimation uses a confidence interval to establish the likelihood that a sample statistic is within an interval or range from the population parameter. • Confidence Interval: Represents the level of precision associated with the population estimate. Its width is determined by 1) the sample size; 2) the amount of variability in the population; and 3) the probability level or level of confidence selected for the problem.
Sampling Distribution of a Statistic • A single sample of size n will lead to a distribution curve which could be any of the curves that we have discussed. • Examples are Poisson, Uniform, Normal, etc. • This single sample will produce a sample mean and standard deviation. • Sampling Distribution of Sample Means: If you take multiple, similar-sized independent samples from a population the set of sample means can be graphed. • The red curve is the Sampling Distribution of Sample Means • The black curve represents the frequency distribution of values within the population
Central Limit Theorem • Given the effect of randomness in drawing samples, some sample means will fall above the population mean and some below. • Provided the samples are independent, the mean of all of the sample means will be the population mean. • The distribution of sample means will also be approximately normal and centered on the population mean, regardless of the distribution of the population, provided that the sample size is larger than about 30. • When the sample size (n) is large, the sample means will tend to be closer to the population mean.
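The behaviour described on this slide can be checked with a small simulation. The sketch below is only an illustration: the exponential population (mean 1.0), the sample size of 40, the 2000 repetitions, and the function name `simulate_sample_means` are all assumptions chosen for the example, not part of the original slides.

```python
import random
import statistics

def simulate_sample_means(sample_size, num_samples, seed=0):
    """Draw repeated independent samples from a skewed population and
    return the list of their sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # Assumed population: exponential with mean 1.0 (strongly skewed).
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(sample_size=40, num_samples=2000)
# The mean of the sample means should sit close to the population mean (1.0),
# and their spread should be much narrower than the population's spread (1.0).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std. dev. of sample means:", round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the population itself is far from normal.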
Central Limit Theorem • One final component of the Central Limit Theorem is: • Standard Error of the Mean: According to this theorem the standard deviation of the sampling distribution can be determined by $\sigma_{\bar{x}} = \sigma/\sqrt{n}$; thus the standard error is a basic measure of sampling error. • http://www.youtube.com/watch?v=BvB1QqwurK0 • Sampling Error: The larger the sample size, the smaller the amount of sampling error. Thus, the larger the sample, the closer the sample mean tends to be to the population mean. In addition, the larger the standard deviation of the population, the larger the amount of sampling error, due to the larger variability in the population.
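As a quick numerical illustration of how sampling error shrinks with sample size, the sketch below assumes the standard error takes the usual form, population standard deviation divided by the square root of n. The population standard deviation of 12 and the function name `standard_error` are hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation of 12 units.
for n in (25, 100, 400):
    # Quadrupling the sample size halves the standard error.
    print(n, standard_error(12.0, n))
```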
Central Limit Theorem • The Central Limit Theorem is completely true only for infinitely large populations. • Within a finite population a correction process may be incorporated. • Finite Population Correction: Applied to the estimation process when the sampling fraction is large. Include the fpc in the population estimate equations only when the ratio of sample size to population exceeds 5% (> .05). • If it is determined that you should include the fpc, multiply the standard error by it: (fpc) = $\sqrt{(N - n)/(N - 1)}$, so that $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$, where N is the population size and n is the sample size.
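The 5% rule can be sketched in code. This example assumes the common form of the finite population correction, the square root of (N − n)/(N − 1); the function names and the numbers are hypothetical and only illustrate the decision rule.

```python
import math

def fpc(N, n):
    """Finite population correction, assumed here to be sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error_with_fpc(sigma, n, N, threshold=0.05):
    """Standard error of the mean, corrected only when n / N exceeds the threshold."""
    se = sigma / math.sqrt(n)
    if n / N > threshold:
        se *= fpc(N, n)
    return se

# Sampling fraction 100 / 1000 = .10 exceeds .05, so the correction is applied.
print(standard_error_with_fpc(12.0, 100, 1000))
# Sampling fraction 100 / 10000 = .01 does not, so the plain standard error is returned.
print(standard_error_with_fpc(12.0, 100, 10000))
```

The correction always shrinks the standard error, because sampling a large share of a finite population leaves less room for the sample mean to miss.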
Confidence Intervals and Estimation • A confidence interval expresses the likelihood that a sample mean is within a given range of the population mean. • A confidence interval is determined by $\bar{x} \pm Z\sigma_{\bar{x}}$, where: • Z = z-score from the standard normal table • $\bar{x}$ = sample mean • $\sigma_{\bar{x}}$ = standard error of the mean • A 90% confidence interval thus gives 90% certainty that the population mean lies within the interval defined. • The shaded area in the figure represents the 90% confidence interval. Notice that an area of .05 remains in each tail, beyond the upper and lower limits, where the true mean could still fall.
Z-Scores and Confidence Intervals • In order to establish a confidence interval we must determine a z-score. • This can be done by looking at a table to see z-scores of common confidence intervals! • More information on z-scores can be found at this website.
Using Interval Estimates • Confidence Level: Probability that the interval surrounding a sample mean encompasses the true population mean. Defined as $1 - \alpha$. • Significance Level: Probability that the interval surrounding a sample mean fails to encompass the true population mean. The significance level is denoted by $\alpha$ and equals the total sampling error. Since error goes in both directions, the probability of it falling into either tail is $\alpha/2$.
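The link between a confidence level, the significance level, and the z-score read from a table can be sketched as follows. The example uses Python's standard-library `statistics.NormalDist` in place of a printed table; the function name `z_for_confidence` is hypothetical, and the two-tailed split of the significance level is the assumption described on this slide.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-tailed z-score for a confidence level such as 0.90, 0.95 or 0.99."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1.0 - confidence          # significance level
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```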
Constructing a Confidence Interval • Establish the sample mean, population standard deviation, sample size and the z-value for the desired confidence level. • Plug the numbers into the confidence interval equation, $\bar{x} \pm Z\,\sigma/\sqrt{n}$. • This gives the interval as the sample mean ± Z standard errors. • Check whether the finite population correction (slide 8) is needed.
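The steps above can be sketched as a short function. This is a minimal illustration that assumes the interval is the sample mean plus or minus Z standard errors and that the finite population correction is not needed; the inputs (mean 50, standard deviation 10, n = 100) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for the mean, assuming sigma is known and no fpc."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-tailed z-value
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(50.0, 10.0, 100, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```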
What Level of Confidence? • .99, .95 and .90 are the most commonly used confidence levels for estimating the mean. • Higher confidence results in wider intervals and thus less precise estimates, but lower sampling error ($\alpha$). • Lower confidence results in narrower intervals but higher sampling error ($\alpha$). • Balance the acceptable level of error with the needed level of precision.
The Real World: Unknown Population Standard Deviation • Rarely do we know the parameters of a population, hence our attempts to estimate them! • $\sigma$ is generally unknown, so how do we estimate the standard error? • Using the sample variance $s^2$, which is the sample standard deviation squared, is an acceptable approach. • Standard Error Revisited: The standard error is the standard deviation of the sampling distribution of sample means, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ (multiplied by the fpc when it applies). So we put in the sample variance and take the square root to get the estimated standard error, $\sigma_{\bar{x}} \approx \sqrt{s^2/n} = s/\sqrt{n}$.
What if Sample Size is Small? • Z is valid only if the sample size is greater than 30, so our confidence interval equation must be altered if we have a smaller sample. • Instead we use a t-distribution, which approaches the standard normal distribution as the sample size grows and is already close to it by about n = 30. • In this instance the confidence interval formula is $\bar{x} \pm t\,s/\sqrt{n}$ • We can use the t-table to determine the value of t
The T-Table • The value read from the t-table depends on two values: • The significance level ($\alpha$), of which the common levels are .10, .05 and .01, as determined earlier by the common confidence levels. • Degrees of freedom, which are determined by taking the sample size and subtracting one: • df = n − 1
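A small-sample interval can be sketched as below. The ten data values are made up for illustration, and the critical values are the standard two-tailed t-table entries for 9 degrees of freedom, hard-coded here because the Python standard library has no t-distribution; the names `T_CRITICAL_DF9` and `t_confidence_interval` are hypothetical.

```python
import math
import statistics

# Two-tailed critical t-values for df = 9, keyed by significance level (alpha).
T_CRITICAL_DF9 = {0.10: 1.833, 0.05: 2.262, 0.01: 3.250}

def t_confidence_interval(sample, t_value):
    """Return (mean, margin) using the sample standard deviation and a t-value."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
    margin = t_value * s / math.sqrt(n)
    return mean, margin

sample = [12, 15, 11, 14, 13, 16, 10, 15, 14, 10]   # n = 10, so df = 9
mean, margin = t_confidence_interval(sample, T_CRITICAL_DF9[0.05])
print("95% interval:", round(mean - margin, 2), "to", round(mean + margin, 2))
```

For a different sample size the dictionary would need the row of the t-table that matches df = n − 1.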
But! To Calculate a Confidence Interval…. • Equation used depends on two factors • The equation used for a confidence interval depends on the sample type (random, systematic, stratified, etc.) • Different population parameters require different confidence interval equations.
Random or Systematic Samples • Random or Systematic Sample – Estimating Population Mean • Use the t equation for samples smaller than 30 and the z equation for larger ones. • Use the sample variance, as it is rare that we know the population variance. • Random or Systematic Sample – Population Total • The best estimate of the population total ($\tau$) is the sample total (T), which is $T = N\bar{x}$ • Once we know T (which is not a t-score) we plug it into the confidence interval equation: • $T \pm Z \cdot N \cdot s/\sqrt{n}$
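An interval for a population total can be sketched as follows. The example assumes the total is estimated as N times the sample mean and that its standard error is N times s over the square root of n, with no finite population correction; all numbers and the function name `total_confidence_interval` are hypothetical.

```python
import math

def total_confidence_interval(N, sample_mean, s, n, z=1.96):
    """Return (T, lower, upper) for a population total, without fpc."""
    T = N * sample_mean                      # estimated population total
    standard_error_total = N * s / math.sqrt(n)
    margin = z * standard_error_total
    return T, T - margin, T + margin

# Hypothetical study: 5000 units in the population, 64 sampled.
T, lower, upper = total_confidence_interval(N=5000, sample_mean=20.0, s=4.0, n=64)
print(T, lower, upper)
```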
Random or Systematic Samples Continued… • Random or Systematic Sample – Estimate of Population Proportion • The best estimate of the population proportion ($\rho$) is the sample proportion (p) • The sample proportion is the number of individuals in the sample having the specified characteristic (x) divided by the total sample size (n), which is: $p = x/n$ • The confidence interval around this estimate of the population proportion is: • $p \pm Z\sqrt{p(1 - p)/n}$
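A proportion interval can be sketched in the same way. This example assumes the standard error of a proportion is the square root of p(1 − p)/n and omits the finite population correction; the counts (40 of 100) and the function name `proportion_confidence_interval` are hypothetical.

```python
import math

def proportion_confidence_interval(x, n, z=1.96):
    """Return (p, lower, upper) for a population proportion, without fpc."""
    p = x / n                                    # sample proportion
    standard_error = math.sqrt(p * (1.0 - p) / n)
    margin = z * standard_error
    return p, p - margin, p + margin

# Hypothetical survey: 40 of 100 sampled households have the characteristic.
p, lower, upper = proportion_confidence_interval(x=40, n=100)
print(round(p, 3), round(lower, 3), round(upper, 3))
```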
Stratified Samples • A stratified sample is a little more complicated… • Stratified Sample – Estimate of Population Mean • You will be using different groups called strata. • These will be denoted by 1, 2, 3, etc. • Thus you will have $\bar{x}_1, s_1$ and $\bar{x}_2, s_2$, etc. for the values from each stratum. • The best estimate of the population mean is the stratified sample mean, $\bar{x} = \frac{1}{N}\sum_{i=1}^{m} N_i \bar{x}_i$ • m in this equation represents the number of strata • Subscript i identifies the stratum that each quantity belongs to. • $N_i$ is the population of stratum i • The confidence interval around the mean is $\bar{x} \pm Z\sqrt{\sum_{i=1}^{m} \left(\frac{N_i}{N}\right)^2 \frac{s_i^2}{n_i} \cdot \frac{N_i - n_i}{N_i - 1}}$ • Note the finite population correction, which may or may not be needed.
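The stratified estimate can be sketched as a weighted combination of the strata. This is an illustration under assumptions: each stratum is weighted by its share of the population, the per-stratum finite population correction is taken as (N_i − n_i)/(N_i − 1) inside the square root, and the two strata, their values, and the function name `stratified_mean` are all hypothetical.

```python
import math

def stratified_mean(strata, use_fpc=True):
    """strata: list of (N_i, n_i, mean_i, s_i). Return (estimate, standard_error)."""
    N = sum(N_i for N_i, _, _, _ in strata)
    # Weight each stratum mean by its share of the population.
    estimate = sum(N_i * mean_i for N_i, _, mean_i, _ in strata) / N
    variance = 0.0
    for N_i, n_i, _, s_i in strata:
        term = (N_i / N) ** 2 * s_i ** 2 / n_i
        if use_fpc:
            term *= (N_i - n_i) / (N_i - 1)   # per-stratum correction
        variance += term
    return estimate, math.sqrt(variance)

# Two hypothetical strata: (population size, sample size, sample mean, sample std. dev.)
strata = [(600, 30, 10.0, 2.0), (400, 20, 20.0, 3.0)]
estimate, se = stratified_mean(strata)
z = 1.96  # z-value for a 95% confidence level
print(estimate, round(estimate - z * se, 2), round(estimate + z * se, 2))
```

Setting `use_fpc=False` drops the correction, which is the appropriate choice when every stratum's sampling fraction is small.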
Stratified Samples • Stratified Sample – Estimate of Population Total • The best estimate of the population total ($\tau$) is the sample total (T), which in stratified samples is • $T = \sum_{i=1}^{m} N_i \bar{x}_i$ • We sum over the strata • Once we know T (which, again, is not to be confused with a t-score) we plug it into the equation to obtain the confidence interval • $T \pm Z\sqrt{\sum_{i=1}^{m} N_i^2 \frac{s_i^2}{n_i} \cdot \frac{N_i - n_i}{N_i - 1}}$ • Note that the equation has the finite population correction, which may not be needed.
Stratified Samples • Stratified Sample – Estimate of Population Proportion • The best estimate of the population proportion ($\rho$) is the sample proportion (p), which in stratified samples is: • $p = \frac{1}{N}\sum_{i=1}^{m} N_i p_i$ • Once again summing over the strata • Then the confidence interval can be obtained • $p \pm Z\sqrt{\sum_{i=1}^{m} \left(\frac{N_i}{N}\right)^2 \frac{p_i(1 - p_i)}{n_i} \cdot \frac{N_i - n_i}{N_i - 1}}$ • Note that the equation has the finite population correction, which may not be needed.
Sample Size Selection • Sample Size Selection – Using the Mean • For practicality, sometimes we would prefer to fix the width of our confidence interval in advance and then calculate the sample size needed. • Recall that the confidence interval of the mean is $\bar{x} \pm Z\,\sigma/\sqrt{n}$ • Let us designate E as the error that we are willing to tolerate. • $E = Z\sigma_{\bar{x}} = Z\,\sigma/\sqrt{n}$ • We then decide what error we can have around the population mean. • For example .10, .05, .01, etc. • Algebraically we can then obtain • $n = (Z\sigma/E)^2$ • Since in most instances we will not know $\sigma$, we substitute the sample standard deviation s. But how do we find this!? • s can be found by taking a preliminary sample of more than 30 observations and calculating its standard deviation; we then compute n and continue random sampling until n observations have been collected. • Taking a pre-sample and then continuing the sample in this way is called a two-stage sampling design.
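The sample-size calculation for a mean can be sketched as below. It assumes the rearranged form n = (Z × standard deviation / E) squared, rounded up to the next whole observation; the standard deviation of 10, the tolerance of 2, and the function name `sample_size_for_mean` are hypothetical.

```python
import math

def sample_size_for_mean(sigma, error, z=1.96):
    """Minimum n so that the margin of error z * sigma / sqrt(n) is at most `error`."""
    if error <= 0:
        raise ValueError("error tolerance must be positive")
    # Round up: a fractional observation cannot be sampled.
    return math.ceil((z * sigma / error) ** 2)

# Hypothetical pretest gives s = 10; we tolerate an error of 2 units at 95% confidence.
print(sample_size_for_mean(sigma=10.0, error=2.0))
# Halving the tolerated error roughly quadruples the required sample.
print(sample_size_for_mean(sigma=10.0, error=1.0))
```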
Sample Size Selection • Sample Size Selection – Total • The minimum sample needed to make an interval estimate of a population total within a tolerance level E can also be determined. • $E = Z\sigma_T = Z N\,\sigma/\sqrt{n}$ • We can then isolate n through algebra • $n = (Z N \sigma/E)^2$ or $n = (Z N s/E)^2$ • Recall that s is used when we do not know the population $\sigma$ • It is best to run a pretest or small sample to obtain s
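The same idea applies to a total. The sketch below assumes the rearranged form n = (Z × N × s / E) squared, where E is expressed in the units of the total; the population of 5000, the pretest standard deviation of 4, the tolerance of 5000, and the function name `sample_size_for_total` are hypothetical.

```python
import math

def sample_size_for_total(N, s, error, z=1.96):
    """Minimum n so that the margin z * N * s / sqrt(n) on the total is at most `error`."""
    if error <= 0:
        raise ValueError("error tolerance must be positive")
    return math.ceil((z * N * s / error) ** 2)

# Hypothetical: 5000 units, pretest s = 4, total wanted to within 5000 at 95% confidence.
print(sample_size_for_total(N=5000, s=4.0, error=5000.0))
```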
Sample Size Selection • Sample Size Selection – Proportion • To estimate a population proportion within a certain allowable level of error (E), the minimum sample size can also be calculated in advance of full sampling. • $E = Z\sigma_p = Z\sqrt{p(1 - p)/n}$ • n is isolated algebraically so that: $n = Z^2 p(1 - p)/E^2$ • The population proportion ($\rho$), or the sample proportion (p) if it is unknown, is used. These symbols look very similar. • The proportion case allows us to estimate n without first taking a pretest or preliminary sample. • This is related to p(1 − p) and the range of values that it can take. • The largest value of p(1 − p) is .25, as the values peak at p = .5 • Thus, we can use p(1 − p) = .25 as a worst-case scenario with any data. • We are, however, still able to do a pretest if needed, which may give a value of p(1 − p) smaller than .25 and therefore a smaller required sample.
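The worst-case reasoning can be sketched as follows. The example assumes the rearranged form n = Z squared × p(1 − p) / E squared and defaults to p = .5 when no pretest is available; the tolerance of .05, the pretest value of .2, and the function name `sample_size_for_proportion` are hypothetical.

```python
import math

def sample_size_for_proportion(error, p=0.5, z=1.96):
    """Minimum n for a proportion; p = 0.5 is the worst case since p(1 - p) peaks at .25."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be between 0 and 1")
    if error <= 0:
        raise ValueError("error tolerance must be positive")
    return math.ceil(z ** 2 * p * (1.0 - p) / error ** 2)

# No pretest: assume the worst case p = .5.
print(sample_size_for_proportion(error=0.05))
# A pretest suggesting p is near .2 lowers the required sample size.
print(sample_size_for_proportion(error=0.05, p=0.2))
```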