STAT 111 Introductory Statistics

STAT 111 Introductory Statistics Lecture 9: Inference and Estimation June 2, 2004

Today’s Topics • Introduction to statistical inference • Point Estimation • Confidence Intervals

Introduction • The application of the methods of probability to the analysis and interpretation of empirical data is known as statistical inference. • More specifically, it is the process by which we generalize from a particular sample to the theoretical population from which the sample came.

Introduction • The precise form of the generalization can vary considerably from situation to situation. • Possible forms of statistical inference: • Single numerical estimate • Range of numerical estimates • Simple “yes” or “no”

Example • Suppose the chief programming executive at ABC is trying to decide which shows to cancel and which to renew. • Data might be the day-by-day logs of programs that are watched by a random sample of families. • Task: use sample information to estimate total number of viewers tuned to ABC programs.

Example • Suppose instead that a zoologist would like to know whether a particular species of vampire bat prefers blood at room temperature or at body temperature. • Equal numbers of similar bats are placed into 2 cages; cage A has blood at room temperature, cage B at body temperature. • He finds that the bats in A consumed 3% more blood.

Estimation and Hypothesis Testing • The two previous examples highlight the two broad areas into which statistical inference is traditionally divided. • In the first example, inference is numerical. This is the area referred to as estimation. • In the second example, the inference is instead a “yes” or “no” decision between two conflicting theories. This is what we call hypothesis testing. • Both areas have wide applicability.

Parameter Estimation • In many situations, the family of probability models describing a phenomenon may be known (or at least assumed to be known), but the particular member of the family that best describes that phenomenon may be unknown. • Hence, estimating the unknown parameter or parameters of a presumed data model is usually one of the steps we have to take in an inference problem.

Parameter Estimation • Our usual goal then is to estimate the value of the (unknown) population parameter based on an appropriate statistic we observe from a random sample of that population. • Two types of estimation exist: • Point estimation – This is what we meant by a single numerical estimate. • Confidence interval – This is what we meant by a range of numerical estimates.

Point Estimation • A point estimator draws inference about a population by estimating the value of an unknown parameter using a single value or a point. [Figure: the population distribution with its parameter, and the sampling distribution of the point estimator.]

Point Estimation • A point estimate summarizes the value of the population parameter using a single value. • Naturally, then, there are properties we would want a point estimate to have in order to feel comfortable using it. • What sort of properties should a good (point) estimator have?

Desirable Properties of Estimators • Certainly, it seems reasonable to ask as a first condition that the sampling distribution of our estimator be somehow “centered” with respect to the population parameter. • If this condition is not met, then our point estimator will tend to be consistently overestimating or underestimating the value of the parameter, something that we typically do not desire.

Desirable Properties of Estimators • This first condition is what we call unbiasedness. • In other words, on the average, a good estimator will be equal to the population parameter it is estimating. • Mathematically, if W is an estimator, and θ is the population parameter being estimated by W, then W is unbiased if $E(W) = \theta$.
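The idea of an estimator being right "on the average" can be checked by simulation. The sketch below is only an illustration under assumed values: the normal population with µ = 10 and σ = 2, the sample size of 5, and the use of the sample maximum as a deliberately poor estimator of µ are all hypothetical choices, not part of the lecture.

```python
import random
import statistics

def average_estimates(mu=10.0, sigma=2.0, n=5, reps=20000, seed=0):
    """Approximate E(W) for two estimators of mu by averaging over many samples."""
    rng = random.Random(seed)
    mean_estimates = []   # W = sample mean (unbiased for mu)
    max_estimates = []    # W = sample maximum (tends to overestimate mu)
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean_estimates.append(statistics.fmean(sample))
        max_estimates.append(max(sample))
    return statistics.fmean(mean_estimates), statistics.fmean(max_estimates)

avg_of_means, avg_of_maxima = average_estimates()
print(f"average of sample means:  {avg_of_means:.3f}")   # close to mu = 10
print(f"average of sample maxima: {avg_of_maxima:.3f}")  # noticeably above mu
```

The average of the sample means settles near µ, while the average of the sample maxima sits well above it, which is the consistent overestimation that unbiasedness rules out.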

Desirable Properties of Estimators • A second property of a good estimator is precision. An estimator is said to be precise if its distribution’s dispersion is small. • The idea of precision leads to the concept of efficiency. • Suppose we have multiple unbiased estimators for the population parameter. Which one should we use? Are they all equivalent, or are some better than others?

Desirable Properties of Estimators • Formally, let W1 and W2 be two unbiased estimators for a population parameter θ with variances Var(W1) and Var(W2), respectively. • Then W1 is said to be more efficient than W2 if Var(W1) is less than Var(W2). • We define the relative efficiency of W1 with respect to W2 as the ratio Var(W2) / Var(W1). • Which is the more efficient estimator if this ratio is less than 1? Greater than 1?
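Relative efficiency can be estimated by simulation when the variances are awkward to derive. The sketch below assumes a standard normal population and takes W1 to be the sample mean and W2 the sample median. Both are unbiased for the center of a symmetric population, and these choices are illustrative only, not the estimators used later in the lecture.

```python
import random
import statistics

def relative_efficiency(n=25, reps=5000, seed=1):
    """Estimate Var(W2) / Var(W1) with W1 = sample mean and W2 = sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    var_w1 = statistics.variance(means)
    var_w2 = statistics.variance(medians)
    return var_w2 / var_w1

ratio = relative_efficiency()
print(f"Var(W2) / Var(W1) is about {ratio:.2f}")
```

A ratio greater than 1 means Var(W1) is the smaller variance, so W1 is the more efficient estimator. A ratio less than 1 would favor W2.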

Desirable Properties of Estimators • Unbiasedness and efficiency lead to the most basic characterizations of point estimates, but there are other properties of a statistic and its sampling distribution that merit examination. • The first concerns the limiting behavior of the statistic as the sample size n gets large. • In some cases, it is possible that the sampling distribution has some very desirable properties in the limit that it fails to possess for any finite n.

Desirable Properties of Estimators • Consistency is one such property of the sampling distribution that appears in the limit. • Roughly speaking, an estimator is consistent if, as n gets large, the probability that our statistic W lies arbitrarily close to the parameter being estimated becomes arbitrarily close to 1. • Two conditions that together guarantee consistency: • W is asymptotically unbiased • Var(W) converges to 0
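The definition of consistency can be illustrated numerically. The sketch below estimates the probability that the sample mean lies within a fixed distance of µ for several sample sizes. The population (normal, µ = 5, σ = 3) and the tolerance of 0.5 are assumed values chosen only for the illustration.

```python
import random
import statistics

def prob_within(n, mu=5.0, sigma=3.0, eps=0.5, reps=1000, seed=2):
    """Estimate P(|sample mean - mu| < eps) for samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if abs(xbar - mu) < eps:
            hits += 1
    return hits / reps

sample_sizes = [10, 100, 400]
probs = [prob_within(n) for n in sample_sizes]
for n, p in zip(sample_sizes, probs):
    print(f"n = {n:4d}: estimated probability = {p:.3f}")
```

The estimated probability climbs toward 1 as n grows, which is the behavior the definition describes.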

Desirable Properties of Estimators • The last property we might desire from an estimator is sufficiency. • If we draw a sample of size n from some population with a given distribution, we know that the sample space is all possible n-tuples. • An estimator W, then, has the effect of partitioning this sample space into a set of mutually exclusive subsets.

Desirable Properties of Estimators • As an example, suppose we draw two observations from a discrete distribution on the non-negative integers, and we define our statistic W as the mean of these two observations. • Then, W is observed to be 3 for any one of the following pairs of observations: (0,6), (1,5), (2,4), (3,3), (4,2), (5,1), (6,0). And similarly, W will equal 2.5 if the outcome of our draws is (0,5), (1,4), (2,3), (3,2), (4,1), or (5,0).
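The subsets that W carves out of the sample space can be listed by brute force. The helper below is a small sketch: the name `outcomes_with_mean` and the cap `max_value` (needed only to keep the search finite) are assumptions made for the illustration.

```python
from itertools import product

def outcomes_with_mean(target_mean, max_value=10):
    """List ordered pairs of non-negative integers whose mean equals target_mean."""
    return [(a, b)
            for a, b in product(range(max_value + 1), repeat=2)
            if a + b == 2 * target_mean]

pairs_for_3 = outcomes_with_mean(3)
pairs_for_2_5 = outcomes_with_mean(2.5)
print(pairs_for_3)     # the 7 ordered pairs with W = 3
print(pairs_for_2_5)   # the 6 ordered pairs with W = 2.5
```

Every possible outcome falls into exactly one such subset, so the subsets are mutually exclusive and together cover the sample space.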

Desirable Properties of Estimators • So, in this example, W groups the possible outcomes into sets that share the same sample mean. • W is sufficient for the population parameter we are trying to estimate if knowing only W (that is, which set the outcome fell in) provides the same amount of information about the parameter as the actual outcome itself does. • In general, a statistic is sufficient if knowing its value gives us just as much information about the parameter of interest as knowing the actual sample itself does.

Example • Let X1, …, Xn be a simple random sample from a population with mean µ and variance σ². • Suppose the sample size is larger than 1, and let m be an integer between 1 and n (i.e., 1 < m < n). • Consider these three estimators for µ:

Example • Which of these estimators is unbiased for µ? • What are the relative efficiencies of the three estimators (pairwise comparisons)? • Based on these results, which estimator is the most efficient? The least?

Interval Estimation • An interval estimator draws inference about a population by estimating the value of an unknown parameter using an interval. [Figure: the population distribution with its parameter, and the sampling distribution of the interval estimator.]

Confidence Intervals • A confidence interval has the form estimate ± margin of error • The estimate is our guess for the value of the unknown population parameter. • The margin of error shows how accurate we believe our guess is, based on the variability of the estimate.

Example • The heights of American female students aged 18 to 24 are approximately normal with mean µ and standard deviation 2.5 inches. We repeatedly select 100 female students at random. The sample mean $\bar{x}$ follows the normal distribution with mean µ and standard deviation $\sigma/\sqrt{n} = 2.5/\sqrt{100} = 0.25$ inches.

Example • According to the 68-95-99.7 rule, the probability is about 0.95 that $\bar{x}$ will be within 0.5 inches (two standard deviations) of the population mean µ. • To say that $\bar{x}$ lies within 0.5 inches of µ is the same as saying that µ lies within 0.5 inches of $\bar{x}$. • So 95% of all samples we take will capture the true µ in the interval from $\bar{x} - 0.5$ to $\bar{x} + 0.5$.

Example • Suppose now we observe a sample with $\bar{x} = 63$. • Then, for the interval [63 – 0.5, 63 + 0.5] = [62.5, 63.5], we have two possibilities: • The interval between 62.5 and 63.5 contains the true µ. • Our SRS was one of the few samples for which $\bar{x}$ is not within 0.5 inches of the true µ. Only 5% of all samples will give such inaccurate results.

Example • We say that we are 95% confident that the unknown mean height of American female students lies between 62.5 and 63.5 inches. • This is shorthand for saying “we arrived at these numbers by a method that gives correct results 95% of the time.” • It is incorrect to say that there is probability 0.95 that the unknown mean height of American female students lies between 62.5 and 63.5 inches: µ is a fixed number, so this particular interval either contains it or does not.
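The statement that the method "gives correct results 95% of the time" can be checked by repeating the sampling many times. In the sketch below the true mean of 64 inches is a made-up value (the lecture leaves µ unknown). The standard deviation of 2.5, the sample size of 100, and the margin of 0.5 come from the example.

```python
import random
import statistics

def coverage(mu=64.0, sigma=2.5, n=100, margin=0.5, reps=2000, seed=3):
    """Fraction of repeated samples whose interval xbar +/- margin contains mu."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            captured += 1
    return captured / reps

coverage_rate = coverage()
print(f"fraction of intervals containing mu: {coverage_rate:.3f}")
```

Roughly 95% of the simulated intervals contain µ. Any single interval either does or does not, which is why the 95% describes the method rather than one particular interval.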

Confidence Intervals • Recall that the sampling distribution of the sample mean is, for large enough sample sizes, always at least approximately normal regardless of the actual probability distribution. • Suppose we choose an SRS of size n from a population with unknown mean µ and known standard deviation σ. A level C confidence interval for µ is $\bar{x} \pm z^* \sigma / \sqrt{n}$.

Confidence Intervals • Here, z* is the value on the standard normal curve with area C between –z* and z*. • The confidence interval will be exact when the population distribution is normal, and thanks to the Central Limit Theorem, it will be approximately correct for large n in other cases.

Example • Assume that the helium porosity (in percentage) of coal samples taken from any particular seam is normally distributed with true standard deviation σ = 0.75 • Compute a 90% confidence interval for the true average porosity of a certain seam if the average porosity for 20 specimens from the seam was 4.85 • Compute a 95% confidence interval for the true average porosity of that same seam using the information above.
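One way to work this example is to apply the interval $\bar{x} \pm z^* \sigma/\sqrt{n}$ directly. The sketch below uses Python's standard-library `NormalDist` to look up z*. The function name `z_interval` is a hypothetical helper, not something from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level):
    """Level-C confidence interval for a mean when sigma is known."""
    # z* leaves area `level` between -z* and z* under the standard normal curve.
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

ci90 = z_interval(xbar=4.85, sigma=0.75, n=20, level=0.90)
ci95 = z_interval(xbar=4.85, sigma=0.75, n=20, level=0.95)
print(f"90% CI: ({ci90[0]:.3f}, {ci90[1]:.3f})")  # roughly (4.574, 5.126)
print(f"95% CI: ({ci95[0]:.3f}, {ci95[1]:.3f})")  # roughly (4.521, 5.179)
```

The 95% interval is wider than the 90% interval: more confidence costs a larger margin of error when the data stay the same.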

Confidence Intervals • Generally speaking, the margin of error is determined by the choice of C for the confidence interval. • High confidence and small margin of error are desirable. • High confidence – method almost always gives correct answers. • Small margin of error – parameter is pinned down quite precisely.

Confidence Intervals • Suppose you calculate a margin of error and decide that it is too large. • How to reduce it: • Use a lower level of confidence (smaller C) • Increase the sample size (larger n) • Reduce σ • In our last example, how would the 95% confidence interval change if our sample consisted of 200 specimens instead of 20?
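The effect of sample size on the margin of error $z^* \sigma/\sqrt{n}$ can be seen by plugging in the porosity numbers (σ = 0.75, 95% confidence). This is a sketch only: `margin_of_error` is a hypothetical helper name, and it assumes the sample mean stays at 4.85 when the sample grows.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, level):
    """Margin of error z* * sigma / sqrt(n) for a known-sigma interval."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    return z_star * sigma / sqrt(n)

margin_20 = margin_of_error(sigma=0.75, n=20, level=0.95)
margin_200 = margin_of_error(sigma=0.75, n=200, level=0.95)
print(f"n = 20:  margin = {margin_20:.3f}")
print(f"n = 200: margin = {margin_200:.3f}")
print(f"ratio of margins = {margin_20 / margin_200:.3f}")  # sqrt(10), about 3.162
```

Multiplying the sample size by 10 divides the margin of error by √10, so the interval narrows but does not shrink tenfold.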

Confidence Intervals • The confidence interval for a population mean will have a specified margin of error m when the sample size is $n = (z^* \sigma / m)^2$, rounded up to the next whole number. • In surveys that estimate proportions, the same reasoning explains why a sample of about 1000 people gives a margin of error of approximately 0.03 at the 95% level.
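Solving the margin of error $m = z^* \sigma/\sqrt{n}$ for n gives the required sample size. The sketch below applies this to the porosity setting with an assumed target margin of 0.1 (a made-up value). It also checks the survey figure using the standard worst-case proportion margin $z^* \sqrt{0.25/n}$, which the slide does not state explicitly and is assumed here.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_sample_size(sigma, margin, level):
    """Smallest n for which z* * sigma / sqrt(n) is at most `margin`."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    return ceil((z_star * sigma / margin) ** 2)

def worst_case_proportion_margin(n, level):
    """Margin of error for a proportion, using p = 0.5 (the largest variance)."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    return z_star * sqrt(0.25 / n)

n_needed = required_sample_size(sigma=0.75, margin=0.1, level=0.95)
survey_margin = worst_case_proportion_margin(n=1000, level=0.95)
print(f"specimens needed for margin 0.1: {n_needed}")               # 217
print(f"survey margin with 1000 respondents: {survey_margin:.3f}")  # about 0.031
```

Rounding up rather than to the nearest integer keeps the achieved margin at or below the target.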

Confidence Intervals • Remember: • Data must be an SRS from the population. • Formula is incorrect for complex sampling designs. • No correct method for inference from data haphazardly collected with unknown bias. • Outliers can have a large effect on the interval. • For small sample size and non-normal populations, the true confidence level is different from the value C. • Standard deviation σ must be known. • Margin of error covers only random sampling errors.
