Introduction to Inference: Confidence Intervals and Hypothesis Testing

Presentation 4, First Part: Introduction to Inference: Confidence Intervals and Hypothesis Testing

What is inference? Inference is when we use a sample to draw conclusions about a population.
1. Draw a representative SAMPLE from the POPULATION.
2. Describe the SAMPLE.
3. Use the rules of probability and statistics to draw conclusions about the POPULATION from the SAMPLE.

Population Parameters
• p = population proportion
• µ = population mean
• σ = population standard deviation
• β1 = population slope (we will see later)
Sample Statistics
• $\hat{p}$ = sample proportion
• $\bar{x}$ = sample mean
• s = sample standard deviation
• b1 = sample slope (we will see later)

Two Types of Inference
1. Confidence Intervals:
• Confidence intervals give us a range in which the population parameter is likely to fall.
• We use confidence intervals whenever the research question calls for an estimate of a population parameter.
Example: Estimate the proportion of US adult women who would vote for Hillary Clinton as president.
Example: What is the mean age of trees in the forest?

Two Types of Inference, Cont.
2. Hypothesis Testing:
• Hypothesis tests are tests of claims about population parameters.
Example: Is the proportion of US adult women who would vote for Hillary Clinton greater than 50%?
• A hypothesis test can only give evidence that a population parameter is 'different' from our null value. It can never prove that a population parameter is equal to some value.
Example: Valid hypothesis: Is the mean age of trees in the forest greater than 50 years? Invalid hypothesis: Is the mean age of trees in the forest equal to 50 years?

Types of CI’s and Hypothesis Tests
For Hypothesis Tests and C.I.’s:
• 1-proportion (1 categorical variable)
• 1-mean (1 quantitative variable)
• Difference in 2 proportions (2 categorical variables, both with 2 possible outcomes)
• Difference in 2 means (1 quantitative and 1 categorical variable, or 2 quantitative variables, independent samples)
• Regression slope (2 quantitative variables)
For Hypothesis Tests only:
• Chi-Square Test (2 categorical variables, at least one with 3 or more levels!)

Some Examples
• Polina wants to estimate the mean high-school GPA of incoming freshmen at FIT. Solution: CI for one population mean.
• Pampos wants to know if the proportion of PSU students who engage in underage drinking is greater than 25%. Solution: hypothesis test of one proportion.
Null Hypothesis: H0: p ≤ .25
Alternative Hypothesis: Ha: p > .25
• Isaac wants to estimate the difference in the proportion of men and women who smoke. Solution: CI for difference in 2 proportions.

Interpreting Confidence Intervals
• Given the confidence level (90%, 95%, 99%, etc.), let L = the confidence level and conclude: "With L% confidence, the population parameter is within the confidence interval."
Example: Suppose the 90% CI for the age of trees in the forest is (32, 45) years. We are 90% confident that the true mean age of trees in the forest is between 32 and 45 years.

Interpreting Hypothesis Tests
• There are two hypotheses, the null (H0) and the alternative (Ha). The research aim is usually to find significant evidence for the alternative hypothesis.
• Use the p-value to determine whether we can reject the null hypothesis (H0).
• At this point we don't need to know the exact definition of the p-value or how to calculate it. Generally, the p-value measures how consistent the data are with the null hypothesis. A small p-value (< .05) indicates that the data we obtained would be UNLIKELY if the null hypothesis were true.
Decision Rule: If the p-value is < .05, we REJECT the null hypothesis in favor of the alternative. We have a statistically significant result! If the p-value is ≥ .05, we say that we do NOT have enough evidence in the data to reject the null hypothesis.
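The decision rule can be written as a small function. This is only a sketch: the name `decide` and the default cut-off `alpha = 0.05` are illustrative assumptions, the p-value is taken as given rather than computed, and a p-value of exactly .05 is treated as not significant, a common convention.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule to an already-computed p-value.

    `decide` and the default alpha = 0.05 are hypothetical, illustrative choices.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        # Data would be unlikely under H0: reject H0 in favor of Ha.
        return "reject H0: statistically significant"
    # Otherwise the data are not inconsistent enough with H0.
    return "do not reject H0: not enough evidence"


print(decide(0.01))
print(decide(0.20))
```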

Second Part: Confidence Intervals for 1-Proportion

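Sample Proportion
• Mean of $\hat{p}$: $E(\hat{p}) = p$
• Standard deviation of $\hat{p}$: $\text{s.d.}(\hat{p}) = \sqrt{p(1-p)/n}$
• Standard error of $\hat{p}$: $\text{s.e.}(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$
• If $np$ and $n(1-p)$ are both greater than or equal to 10, the sampling distribution of $\hat{p}$ is approximately normal with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$, i.e. $\hat{p} \approx N\left(p, \sqrt{p(1-p)/n}\right)$.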
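A quick simulation can make these formulas concrete. The sketch below assumes a hypothetical population with true proportion p = 0.3 and samples of size n = 100 (chosen so that np = 30 and n(1 − p) = 70 are both at least 10). It draws many samples, computes $\hat{p}$ for each, and compares the mean and standard deviation of those $\hat{p}$ values with p and $\sqrt{p(1-p)/n}$. The helper name `simulate_p_hats` is illustrative.

```python
import math
import random
import statistics


def simulate_p_hats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n from a population with true proportion p
    and return the sample proportion p-hat of each sample."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    p_hats = []
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats


p, n = 0.3, 100  # illustrative values, not from the slides
p_hats = simulate_p_hats(p, n, reps=5000)

print("mean of p-hats:", round(statistics.mean(p_hats), 4))   # close to p
print("s.d. of p-hats:", round(statistics.stdev(p_hats), 4))  # close to the formula below
print("sqrt(p(1-p)/n):", round(math.sqrt(p * (1 - p) / n), 4))
```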

From Sampling Distributions to Confidence Intervals…
• The sample proportion $\hat{p}$ will usually fall close to the true (unknown) proportion p.
• Thus, the true proportion is likely to be close to the observed sample proportion. How close?
• About 95% of sample proportions $\hat{p}$ would be expected to fall within ±2 standard deviations of the true proportion p.
• So if we were to construct intervals around each sample proportion with a width of ±2 standard deviations, these intervals would contain the TRUE population proportion about 95% of the time!
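The "95% of the time" claim can be checked by simulation. In this sketch the values p = 0.4, n = 200 and 4000 repetitions are assumptions chosen for illustration. Because a simulation knows the true p, each interval is built as $\hat{p}$ ± 2 true standard deviations, as described above; the fraction of intervals that capture p should come out close to 95%. The function name `coverage` is hypothetical.

```python
import math
import random


def coverage(p: float, n: int, reps: int, multiplier: float = 2.0, seed: int = 2) -> float:
    """Fraction of intervals p-hat +/- multiplier * s.d.(p-hat) that contain the true p."""
    rng = random.Random(seed)
    sd = math.sqrt(p * (1 - p) / n)  # true standard deviation of p-hat (p is known here)
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        if p_hat - multiplier * sd <= p <= p_hat + multiplier * sd:
            hits += 1
    return hits / reps


print(coverage(p=0.4, n=200, reps=4000))  # roughly 0.95
```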

Margin of Error & C.I.
• $\hat{p}$ is an estimator of p, but it is not exactly equal to p.
• But how far is $\hat{p}$ from p? Or, how far is p from $\hat{p}$?
• The Margin of Error is a measure of accuracy: it provides a likely upper limit for the difference between $\hat{p}$ and p.
• In other words, this difference is almost always less than the Margin of Error, i.e. $|\hat{p} - p| < \text{Margin of Error}$.
• "Almost always" translates to "with large probability".
• Usually we are talking about 90%, 95% or 99% probability.

Margin of Error & C.I., Cont.
• This probability is the confidence level.
• For example, if the confidence level is 95%, then 95% of the time the difference between $\hat{p}$ and p is less than the Margin of Error. (E.g., we expect 38 out of 40 samples to give a $\hat{p}$ whose difference from p is less than the Margin of Error.)
• Example: Based on a sample of 1000 voters, the proportion of voters who favor candidate A is 34%, with a 3% Margin of Error at a 95% confidence level. What does this tell us?
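The poll example can be unpacked with a little arithmetic: the 95% confidence interval is 34% ± 3%, so we are 95% confident that between 31% and 37% of all voters favor candidate A. The sketch below also checks that the reported 3% is consistent with a multiplier of about 2 times the standard error for n = 1000.

```python
import math

n = 1000       # voters sampled
p_hat = 0.34   # sample proportion favoring candidate A
moe = 0.03     # reported Margin of Error at 95% confidence

# The reported 95% confidence interval is p-hat +/- Margin of Error.
lower, upper = p_hat - moe, p_hat + moe
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # (0.31, 0.37)

# Check that the reported 3% matches multiplier * SE(p-hat) with M of about 2.
se = math.sqrt(p_hat * (1 - p_hat) / n)
print(f"SE = {se:.4f}, 2 * SE = {2 * se:.4f}")  # about 0.015 and 0.030
```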

Confidence Interval for 1-proportion
• Conditions: We need $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.
• Note that we are using $\hat{p}$ instead of p here!
• CI for p: $\hat{p} \pm M \times SE(\hat{p})$, where $M \times SE(\hat{p})$ is the Margin of Error.
• M = multiplier, which depends on the level of confidence desired. For a 95% CI the multiplier is about 2 (more precisely 1.96).
• $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error of the sample proportion.
• Margin of Error = the multiplier times the SE.
• Interpretation: If M = 2, we are 95% confident that the true population proportion is contained within the confidence interval.

Example 1: A sample of 1200 people is polled to determine the percentage that are in favor of candidate A. Suppose 580 say they are in favor. Construct a 95% CI for the true population proportion. Conclusion: We are 95% confident that the true population proportion of those who support candidate A is between 45.5% and 51.2%.
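Example 1 can be reproduced with a small helper. This is a sketch: `one_proportion_ci` is a hypothetical name, and it uses M = 1.96, the more precise 95% multiplier, which reproduces the endpoints 45.5% and 51.2%. With M = 2 exactly, the lower endpoint would round to 45.4% instead.

```python
import math


def one_proportion_ci(successes: int, n: int, multiplier: float) -> tuple[float, float]:
    """Return p-hat +/- multiplier * SE(p-hat) for a single proportion."""
    p_hat = successes / n
    # Conditions: n * p-hat and n * (1 - p-hat) should both be at least 10.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("normal approximation conditions not met")
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    moe = multiplier * se                    # Margin of Error
    return p_hat - moe, p_hat + moe


lower, upper = one_proportion_ci(580, 1200, multiplier=1.96)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # (0.455, 0.512)
```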

Example 2: 300 high-risk patients received an experimental AIDS vaccine. The patients were followed for a period of 5 years, and ultimately 53 came down with the virus. Assuming all patients were exposed to the virus, construct a 99% CI for the proportion of individuals protected.
99% CI = $\hat{p} \pm M \times SE(\hat{p})$
$\hat{p}$ = 247/300 = .823 (300 − 53 = 247 patients were protected)
$SE(\hat{p}) = \sqrt{.823(1-.823)/300} = .0220$
M = 2.58. Can you see why M = 2.58 using the Normal table?
So 99% CI = .823 ± 2.58 × .0220 ≈ (.767, .880).
We are 99% confident that the true proportion of those protected by the vaccine is between 76.7% and 88.0%.
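Python's standard library can stand in for the Normal table. `statistics.NormalDist().inv_cdf` returns the z-value with a given area to its left; a 99% CI leaves .005 in each tail, so M = inv_cdf(.995). The sketch then recomputes Example 2 from the unrounded $\hat{p}$ = 247/300.

```python
import math
from statistics import NormalDist

confidence = 0.99
# The multiplier leaves (1 - confidence) / 2 in each tail of the standard normal.
m = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
print(f"M = {m:.3f}")  # 2.576, which rounds to the 2.58 used above

protected, n = 300 - 53, 300
p_hat = protected / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - m * se, p_hat + m * se
print(f"99% CI: ({lower:.3f}, {upper:.3f})")  # (0.767, 0.880)
```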

Width of a Confidence Interval
The width of a confidence interval is affected by:
• n: as the sample size increases, the standard error of $\hat{p}$ decreases and the confidence interval gets narrower. So a larger sample size gives us a more precise estimate of p.
• M: as the confidence level increases, the multiplier M increases, leading to a wider confidence interval.
So, if we want to control the width of the C.I., we can adjust the confidence level or the sample size.
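To see both effects side by side, this sketch computes the full width (2 × Margin of Error) of a one-proportion CI for a fixed, illustrative $\hat{p}$ = 0.5 across several sample sizes and confidence levels. Because $SE(\hat{p})$ shrinks like $1/\sqrt{n}$, quadrupling n halves the width; raising the confidence level raises M and widens the interval. The function name `ci_width` is hypothetical.

```python
import math
from statistics import NormalDist


def ci_width(p_hat: float, n: int, confidence: float) -> float:
    """Full width (2 * Margin of Error) of the one-proportion confidence interval."""
    m = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # multiplier M
    return 2 * m * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.5  # illustrative value, held fixed so only n and the confidence level vary
for n in (100, 400, 1600):
    for confidence in (0.90, 0.95, 0.99):
        print(f"n={n:5d}  {confidence:.0%} CI width = {ci_width(p_hat, n, confidence):.3f}")
```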
