Introduction to Estimation. Notes by Prof. Ive Barreiros. Chapters of Statistical Inference. Estimation Point Estimators Interval Estimators Hypothesis testing. Important terms.
Introduction to Estimation Notes by Prof. Ive Barreiros
Chapters of Statistical Inference • Estimation: point estimators and interval estimators • Hypothesis testing
Important terms • A parameter is a characteristic measure of the population. Example: population mean, population standard deviation, population range, population median. • A statistic is a measure of a sample, computed using measurements from that sample of the population. Example: sample mean, sample standard deviation, sample range, sample median. Notice that a statistic depends on the sample used to compute it. Different samples may lead to different values of the statistic.
Example • Population: 2,5,8,5,4,6 • Population mean (parameter): 5 • A possible sample of size three: 2,5,5 • Sample mean (statistic) : 4 • Another possible sample of size three: 2,8,6 • Sample mean (statistic) : 5.33
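The arithmetic in this example can be checked with a short Python sketch. The enumeration of every possible sample of size three is an addition for illustration (it assumes sampling without replacement and treats the two 5s as distinct subjects); it shows how much the statistic can vary from sample to sample.

```python
from itertools import combinations
from statistics import mean

population = [2, 5, 8, 5, 4, 6]

# Parameter: computed from the whole population.
population_mean = mean(population)

# Statistics: computed from two particular samples of size three.
first_sample_mean = mean([2, 5, 5])
second_sample_mean = mean([2, 8, 6])

# Every possible sample of size three, drawn without replacement
# (illustrative assumption; the two 5s count as different subjects).
all_sample_means = [sum(s) / 3 for s in combinations(population, 3)]
average_of_sample_means = sum(all_sample_means) / len(all_sample_means)

print("population mean:", population_mean)
print("first sample mean:", first_sample_mean)
print("second sample mean:", round(second_sample_mean, 2))
print("number of possible samples:", len(all_sample_means))
print("smallest and largest sample mean:",
      round(min(all_sample_means), 2), round(max(all_sample_means), 2))
print("average of all sample means:", round(average_of_sample_means, 2))
```

The individual sample means differ from 5, but their average over all possible samples matches the population mean.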
Point Estimator • A formula that, applied to data, provides an estimate of a parameter. • Very often the formula coincides with the measure that is being estimated. Example: the sample mean is a point estimator of the population mean. In the previous example, the first sample estimated the population mean as 4.
Interval estimate • The point estimate is accompanied by a bound on the error. Example: According to the current survey, the unemployment rate in Unhappy Ville last month was 7.3% ± 0.2%. This means that the researcher is highly confident that the population unemployment rate is somewhere between 7.1% and 7.5%.
Confidence • The interval in the previous example should be accompanied by a specified “confidence” level, for example 95%. • It means that 95% of all possible samples would have produced intervals containing the population (true) unemployment rate. • The remaining 5% is the risk that users take by relying on the researcher’s estimate.
Purpose of Estimation • To provide an estimate of a population parameter. In theory, the parameter value could be computed if a measurement could be obtained from every subject in the population. In practice, that is too expensive, and sometimes operationally impossible. • To give an idea of the precision of the estimate and the reliability of the estimation procedure. This is possible only if the estimate is obtained from a “probabilistic sample”. In probabilistic sampling, every subject in the population has a known probability of being selected.
Estimating the population mean • The sample mean is the natural estimator of the population mean. We will use the mean of a sample (statistic) as an estimate of the mean of the population (parameter). • The difference between the sample mean and the population mean is called the “sampling error” of the estimate.
THE CENTRAL LIMIT THEOREM It states that, for large samples whose size is small with respect to the population, the following holds: a. The sample means are approximately normally distributed. b. The mean of this distribution equals the mean of the population from which the samples have been drawn. c. The standard deviation of the sample means equals the standard deviation of the population divided by the square root of the sample size. This last quantity is called the standard error of the sample mean when the sample mean is used as an estimator of the population mean.
THE CENTRAL LIMIT THEOREM • The previous statements are independent of the distribution of the population from which the samples have been selected. • If the population is normal, then in a) “approximately” can be removed.
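Statements a to c can be checked empirically. The following is a minimal simulation sketch, not part of the original notes: the skewed (exponential) population, its size, the seed, and the number of repeated samples are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev, stdev

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable

# Hypothetical, strongly skewed population (clearly not normal).
population = [rng.expovariate(1 / 50) for _ in range(100_000)]
mu = fmean(population)       # population mean (parameter)
sigma = pstdev(population)   # population standard deviation (parameter)

n = 100              # sample size: large, yet only 0.1% of the population
num_samples = 2_000  # how many repeated samples to draw

# Draw many samples without replacement and record each sample mean.
sample_means = [fmean(rng.sample(population, n)) for _ in range(num_samples)]

standard_error = sigma / sqrt(n)  # statement c

# Statement a, checked roughly: for a normal distribution about 95% of
# values fall within 2 standard deviations of the mean.
share_within_2_se = sum(
    abs(m - mu) < 2 * standard_error for m in sample_means
) / num_samples

print("population mean:", round(mu, 2))
print("mean of sample means:", round(fmean(sample_means), 2))
print("sigma / sqrt(n):", round(standard_error, 2))
print("std. dev. of sample means:", round(stdev(sample_means), 2))
print("share within 2 standard errors:", round(share_within_2_se, 3))
```

Even though the population is far from normal, the mean and standard deviation of the simulated sample means should land close to the values the theorem predicts.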
EXAMPLE If we select a sample of size 100 from a large population whose mean is 80 and whose standard deviation is 15, then • the sample means will be approximately normally distributed • around a mean of 80 • with a standard deviation of 15/√100 = 15/10 = 1.5 • Notice that 1.5 is the standard error of the sample mean. It summarizes the sampling errors of all possible sample means.
That means that: • about 95% of the samples will produce sample means between 77 and 83, or • about 95% of the samples will produce sample means that differ from the population mean by less than 3. If the population mean (80 in this case) were unknown and the researcher needed to estimate it by sampling, he would have a 95% chance of selecting one of those samples, in which case his error of estimation would be at most 3.
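The numbers in this example follow from the standard error and the normal approximation. Here is a small sketch using Python's standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

population_mean = 80
population_sd = 15
n = 100

# Standard error of the sample mean: 15 / sqrt(100) = 15 / 10.
standard_error = population_sd / sqrt(n)

# Range covered by "population mean ± 2 standard errors".
low = population_mean - 2 * standard_error
high = population_mean + 2 * standard_error

# Share of sample means expected inside that range, using the normal
# approximation from the Central Limit Theorem.
sampling_distribution = NormalDist(mu=population_mean, sigma=standard_error)
share_inside = sampling_distribution.cdf(high) - sampling_distribution.cdf(low)

print("standard error:", standard_error)
print("range:", low, "to", high)
print("share of sample means inside:", round(share_inside, 4))
```

The computed share is slightly above 95% because the multiplier 2 is a rounded-up version of the normal-table value.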
Estimating with Confidence That chance (95%) is what gives the researcher “confidence” in the method. He will present his sample mean as an estimate of the population mean in one of these equivalent forms: • Sample mean ± a bound on the error • Sample mean ± 2 × (standard error) • Sample mean ± 2 × (population standard deviation) / (square root of the sample size) Note: We learned in class that the value 2 used above for a confidence of 95% is an approximation. The value from the normal table is 1.96.
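The interval formula and the meaning of the 95% chance can be sketched as follows. The function name `interval_for_mean`, the sample mean of 78, the normal population used in the simulation, and the seed are assumptions made for illustration. The population standard deviation is treated as known, as in the notes.

```python
import random
from math import sqrt
from statistics import fmean


def interval_for_mean(sample_mean, population_sd, n, z=1.96):
    """Return (low, high) = sample mean ± z * population_sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    bound = z * population_sd / sqrt(n)  # bound on the error
    return sample_mean - bound, sample_mean + bound


# Hypothetical sample mean of 78, with sigma = 15 and n = 100.
low, high = interval_for_mean(78, 15, 100)
print("95% interval:", round(low, 2), "to", round(high, 2))

# What "95% confidence" refers to: repeat the sampling many times and
# count how often the interval contains the true mean.
rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable
true_mean, sigma, n, trials = 80, 15, 100, 2_000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    lo, hi = interval_for_mean(fmean(sample), sigma, n)
    hits += lo < true_mean < hi  # True counts as 1
coverage = hits / trials
print("share of intervals containing the true mean:", round(coverage, 3))
```

The confidence level describes the procedure: roughly 95 out of every 100 intervals built this way contain the population mean, although any single interval either does or does not.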
Important Note This procedure is valid when the sample is large (n ≥ 30) and small with respect to the population (no more than 5% of it). [Table: intervals for large samples with different confidence levels]
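The original table of intervals is not reproduced here. As a hedged illustration, the multiplier for any confidence level can be obtained from the standard normal distribution, and the two validity conditions can be checked directly. The helper names `z_multiplier` and `large_sample_conditions_met` are hypothetical, and the confidence levels shown are common choices, not necessarily those in the original table.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a confidence level in (0, 1)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def large_sample_conditions_met(n, population_size):
    """Check the two conditions: n >= 30 and n is at most 5% of the population."""
    return n >= 30 and n <= 0.05 * population_size


for confidence in (0.90, 0.95, 0.99):
    z = z_multiplier(confidence)
    print(f"{confidence:.0%}: sample mean ± {z:.3f} × (standard error)")

print(large_sample_conditions_met(100, 10_000))  # large enough, 1% of population
print(large_sample_conditions_met(20, 10_000))   # sample too small
print(large_sample_conditions_met(100, 1_000))   # 10% of the population
```

Higher confidence requires a larger multiplier, so the interval gets wider for the same sample.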
Presentation of the Estimate An interval estimate can be reported as • Sample mean ± bound on the error. Example: 78 ± 3 • Sample mean - bound on the error < μ < Sample mean + bound on the error. Example: 75 < μ < 81 • (Sample mean - bound on the error, Sample mean + bound on the error). Example: (75, 81)