AP Statistics Linear Regression Inference
Hypothesis Tests: Slopes • Given: Observed slope relating Education to Job Prestige = 2.47 • Question: Can we generalize this to the population of all Americans? • How likely is it that this observed slope was actually drawn from a population with slope = 0? • Solution: Conduct a hypothesis test • Notation: sample slope = b, population slope = β • H0: Population slope β = 0 • H1: Population slope β ≠ 0 (two-tailed test)
Review: Slope Hypothesis Tests • What information lets us do a hypothesis test? • Answer: Estimates of a slope (b) have a sampling distribution, like any other statistic • It is the distribution of the values of b across all possible samples (of size N) • If certain assumptions are met, the sampling distribution approximates the t-distribution • Thus, we can assess the probability that a given value of b would be observed, if β = 0 • If that probability is low (below alpha), we reject H0
Review: Slope Hypothesis Tests • Visually: If the population slope (β) is zero, then the sampling distribution would center at zero • Since the sampling distribution is a probability distribution, we can identify the likely values of b if the population slope is zero • [Figure: sampling distribution of the slope b, centered at 0] • If β = 0, observed slopes should commonly fall near zero, too • If the observed slope falls very far from 0, it is improbable that β is really equal to zero. Thus, we can reject H0.
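The idea on this slide can be checked with a small simulation. The following is a minimal sketch, not part of the original slides: it draws many samples from a made-up population in which Y does not depend on X (so the population slope is 0) and looks at where the sample slopes fall. The sample size, the range of X, and the mean and spread of Y are all illustrative assumptions.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def simulate_null_slopes(n_samples=2000, n=50, seed=0):
    """Sample slopes from a hypothetical population whose true slope is 0."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        xs = [rng.uniform(0, 20) for _ in range(n)]   # assumed range of X
        ys = [rng.gauss(40, 10) for _ in range(n)]    # Y is unrelated to X
        slopes.append(ols_slope(xs, ys))
    return slopes

slopes = simulate_null_slopes()
mean_slope = statistics.fmean(slopes)
# Share of simulated slopes at least as far from 0 as the observed 2.47
share_extreme = sum(abs(b) >= 2.47 for b in slopes) / len(slopes)

print(f"mean of simulated slopes: {mean_slope:.3f}")
print(f"share with |b| >= 2.47:   {share_extreme:.4f}")
```

Under these assumptions the simulated slopes cluster around zero, and a slope as large as 2.47 essentially never appears, which is the logic behind rejecting H0 when the observed slope is far from 0.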
Bivariate Regression Assumptions • Assumptions for bivariate regression hypothesis tests: • 1. Random sample • Ideally N > 20 • But different rules of thumb exist (10, 30, etc.) • 2. Variables are linearly related • i.e., the mean of Y changes linearly with X • Check scatter plot for general linear trend • Watch out for non-linear relationships (e.g., U-shaped)
Bivariate Regression Assumptions • 3. Y is normally distributed at every value of X in the population • “Conditional normality” • Ex: Years of Education = X, Job Prestige = Y • Suppose we look only at a sub-sample: X = 12 years of education • Is a histogram of Job Prestige approximately normal? • What about for people with X = 4? X = 16? • If all are roughly normal, the assumption is met
Bivariate Regression Assumptions • Normality: Examine sub-samples at different values of X. Make histograms and check for normality. • [Figure: two example histograms, labeled “Good” and “Not very good”]
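The sub-sample check can be sketched in code. This is only an illustration with made-up prestige scores for two education levels; the data, the bin width, and the use of skewness as a rough symmetry indicator are assumptions, not something the slides prescribe.

```python
from collections import Counter, defaultdict
import statistics

def group_by_x(pairs):
    """Collect the Y values observed at each value of X."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return groups

def skewness(values):
    """Population skewness: near 0 for a roughly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def text_histogram(values, bin_width=10):
    """Crude text histogram: one row of '#' marks per bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return "\n".join(
        f"{start:>3}-{start + bin_width - 1:<3} {'#' * counts[start]}"
        for start in sorted(counts)
    )

# Hypothetical (education, prestige) observations
data = (
    [(12, y) for y in [30, 35, 38, 40, 41, 42, 44, 46, 50]]
    + [(16, y) for y in [45, 46, 47, 48, 50, 55, 62, 70, 85]]
)
groups = group_by_x(data)

for x in sorted(groups):
    print(f"X = {x}: skewness = {skewness(groups[x]):.2f}")
    print(text_histogram(groups[x]))
```

In this made-up data the X = 12 group is close to symmetric, while the X = 16 group has a long right tail, the kind of pattern the "Not very good" histogram would show.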
Bivariate Regression Assumptions • 4. The variances of prediction errors are identical at different values of X • Recall: Error is the deviation from the regression line • Is dispersion of error consistent across values of X? • Definition: “homoskedasticity” = error dispersion is consistent across values of X • Opposite: “heteroskedasticity” = error dispersion varies with X • Test: Compare errors for X=12 years of education with errors for X=2, X=8, etc. • Are the errors around the line similar? Or different?
Bivariate Regression Assumptions • Homoskedasticity: Equal Error Variance • Examine error at different values of X. Is it roughly equal? • [Figure: example with equal error variance] Here, things look pretty good.
Bivariate Regression Assumptions • Heteroskedasticity: Unequal Error Variance • At higher values of X, error variance increases a lot. • [Figure: example with unequal error variance] This looks pretty bad.
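One rough way to compare error dispersion across values of X is to fit the line, split the cases at the median of X, and compare the average squared error in the two halves. The sketch below uses simulated data; the median split, the data-generating numbers, and the function names are illustrative assumptions rather than a formal test.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept a and slope b."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def error_variance_ratio(xs, ys):
    """Mean squared error for high-X cases divided by that for low-X cases."""
    a, b = fit_line(xs, ys)
    cutoff = statistics.median(xs)
    low = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys) if x <= cutoff]
    high = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys) if x > cutoff]
    return statistics.fmean(high) / statistics.fmean(low)

rng = random.Random(1)
xs = [rng.uniform(0, 20) for _ in range(400)]
# Same error spread at every X (homoskedastic)
ys_equal = [10 + 2.5 * x + rng.gauss(0, 5) for x in xs]
# Error spread grows with X (heteroskedastic)
ys_unequal = [10 + 2.5 * x + rng.gauss(0, 1 + 0.5 * x) for x in xs]

equal_ratio = error_variance_ratio(xs, ys_equal)
unequal_ratio = error_variance_ratio(xs, ys_unequal)
print(f"equal-variance data:   ratio = {equal_ratio:.2f}")
print(f"unequal-variance data: ratio = {unequal_ratio:.2f}")
```

A ratio near 1 is what homoskedasticity looks like; a ratio well above 1 means the errors fan out at higher values of X.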
Bivariate Regression Assumptions • Notes/Comments: • 1. Overall, regression is robust to violations of assumptions • It often gives fairly reasonable results, even when assumptions aren’t perfectly met • 2. Variations of regression can handle situations where assumptions aren’t met • 3. But, there are also further diagnostics to help ensure that results are meaningful…
Regression Hypothesis Tests • If assumptions are met, the sampling distribution of the slope (b) approximates a t-distribution • The standard deviation of the sampling distribution is called the standard error of the slope ($\sigma_b$) • Population formula of standard error: $\sigma_b = \sqrt{\sigma_e^2 / \sum_{i=1}^{N} (X_i - \bar{X})^2}$ • Where $\sigma_e^2$ is the variance of the regression error
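Regression Hypothesis Tests • Estimating $\sigma_e^2$ lets us estimate the standard error: $s_e^2 = \sum_{i=1}^{N} e_i^2 / (N - 2)$, where $e_i$ is the prediction error for case i • Now we can estimate the S.E. of the slope: $s_b = \sqrt{s_e^2 / \sum_{i=1}^{N} (X_i - \bar{X})^2}$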
Regression Hypothesis Tests • Finally: A t-value can be calculated: $t_{N-2} = b / s_b$ • It is the slope divided by the standard error • Where $s_b$ is the sample point estimate of the standard error • The t-value is based on N-2 degrees of freedom
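The whole calculation can be traced on a tiny data set. This is a minimal sketch assuming the usual least-squares formulas (noted in the comments); the five data points and the function name `slope_t_test` are made up for illustration, and the critical value is taken from a standard t-table.

```python
import math

def slope_t_test(xs, ys):
    """Return the slope b, its estimated standard error s_b, the t-value, and df."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # sample slope
    a = y_bar - b * x_bar              # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e2 = sse / (n - 2)               # estimated error variance
    s_b = math.sqrt(s_e2 / sxx)        # estimated standard error of the slope
    return b, s_b, b / s_b, n - 2      # t = b / s_b with N - 2 d.f.

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, s_b, t, df = slope_t_test(xs, ys)
t_critical = 3.182  # two-tailed, alpha = .05, 3 d.f. (from a t-table)

print(f"b = {b:.2f}, s_b = {s_b:.3f}, t = {t:.2f}, df = {df}")
# b = 0.60, s_b = 0.283, t = 2.12, df = 3
print("reject H0" if abs(t) > t_critical else "fail to reject H0")
# fail to reject H0
```

With only five cases the standard error is large relative to the slope, so even a positive slope of 0.6 is not enough to reject H0 here.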
Regression Confidence Intervals • You can also use the standard error of the slope to estimate confidence intervals: $b \pm t_{N-2} \, s_b$ • Where $t_{N-2}$ is the t-value for a two-tailed test given a desired α-level • Example: Observed slope = 2.5, S.E. = .10 • 95% t-value for 102 d.f. is approximately 2 • 95% C.I. = 2.5 +/- 2(.10) • Confidence Interval: 2.3 to 2.7
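The example above amounts to a few lines of arithmetic. The sketch below reuses the slide's numbers (slope 2.5, S.E. .10, t-value of about 2); the function name `slope_confidence_interval` is a made-up helper for illustration.

```python
def slope_confidence_interval(b, s_b, t_crit):
    """Interval b +/- t_crit * s_b for the population slope."""
    margin = t_crit * s_b
    return b - margin, b + margin

low, high = slope_confidence_interval(b=2.5, s_b=0.10, t_crit=2.0)

print(f"95% C.I.: {low:.1f} to {high:.1f}")      # 95% C.I.: 2.3 to 2.7
print("0 inside interval:", low <= 0 <= high)    # 0 inside interval: False
```

Because 0 lies outside the interval, the interval agrees with a two-tailed test that rejects H0 at the same α-level.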