
Sociology 601 Class 19: November 3, 2008





  1. Sociology 601 Class 19: November 3, 2008 • Review of correlation and standardized coefficients • Statistical inference for the slope (9.5) • Violations of Model Assumptions, and their effects (9.6)

  2. 9.5 Inference for a slope. • Problem: we have measures of the strength of the linear association between two variables, but no measure of the statistical significance of that association. • We know the slope & intercept for our sample; what can we say about the slope & intercept for the population? • Solution: hypothesis tests for a slope and confidence intervals for a slope. • Need a standard error for the coefficients. • Difficulties: additional assumptions, and complications in estimating a standard error for a slope.

  3. Assumptions Needed to Make Population Inferences for Slopes • The sample is selected randomly. • X and Y are interval scale variables. • The mean of Y is related to X by the linear equation E{Y} = α + βX. • The conditional standard deviation of Y is identical at each X value (no heteroscedasticity). • The conditional distribution of Y at each value of X is normal. • There is no error in the measurement of X.

  4. Common Ways to Violate These Assumptions • The sample is selected randomly. • Cluster sampling (e.g., census tracts / neighborhoods) causes observations within a cluster to be more similar to each other than to observations outside the cluster. • Two or more siblings in the same family. • Sample = population (e.g., states in the U.S.) • X and Y are interval scale variables. • Ordinal-scale attitude measures • Nominal-scale categories (e.g., race/ethnicity, religion)

  5. Common Ways to Violate These Assumptions (2) • The mean of Y is related to X by the linear equation E{Y} = α + βX. • U-shape: e.g., Kuznets inverted-U curve (inequality <- GDP/capita) • Thresholds • Logarithmic (e.g., earnings <- education) • The conditional standard deviation of Y is identical at each X value (no heteroscedasticity). • earnings <- education • hours worked <- time • adult child occupational status <- parental occupational status

  6. Common Ways to Violate These Assumptions (3) • The conditional distribution of Y at each value of X is normal. • earnings (skewed) <- education • Y is binary, or a % • There is no error in the measurement of X. • almost everything • what is the effect of measurement error in x on b?

  7. The Null Hypothesis for Slopes • Null hypothesis: the variables are statistically independent. • H0: β = 0. The null hypothesis is that there is no linear relationship between X and Y. • Implication for α: E{Y} = α + 0·X = α • so α = μY, the mean of Y. • (Draw figure of the distribution of Y and X when H0 is true)

  8. Test Statistic for Slopes • What is the range of b’s we would get if we took repeated samples from a population and calculated b for each sample? • That is, what is the standard error of the sample slope b? • Test statistic: t = b / σ̂b • where σ̂b is the standard error of the sample slope b. • df for the t statistic (with one x-variable) is n − 2 • when n is large, the t statistic is asymptotically equivalent to a z-statistic • What would make σ̂b smaller?

  9. Calculating the s.e. of b • σ̂b = σ̂ / (sX · √(n − 1)) • where σ̂ = √(SSE / (n − 2)) (= root MSE) • the standard error of b is smaller when… • the sample size is large • the standard deviation of X is large (there is a wide range of X values) • the conditional standard deviation of Y is small.
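This formula can be checked numerically. A minimal Python sketch (the function name `se_slope` is my own), using the SSE, n, and sX values from the murder–poverty example on the slides that follow:

```python
import math

def se_slope(sse, n, s_x):
    """Standard error of the OLS slope b: sigma-hat / (s_x * sqrt(n - 1)),
    where sigma-hat = sqrt(SSE / (n - 2)) is the root MSE."""
    root_mse = math.sqrt(sse / (n - 2))
    return root_mse / (s_x * math.sqrt(n - 1))

# Values from the murder-poverty example: SSE = 3904.3, n = 51, sX = 4.584
print(round(se_slope(3904.3, 51, 4.584), 3))  # → 0.275
```

The result matches the Std. Err. column of the Stata output for this regression (0.2754).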

  10. Conclusions about the Population • P-value: • calculated as in any t-test, but remember df = n − 2 • a z-test is appropriate when n > 30 or so • Conclusions: • evaluate the p-value based on a previously selected alpha level • Rule of thumb: b should be at least 2× its standard error.
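A sketch of the calculation, using the slope and standard error from the murder–poverty example below. The standard library's `statistics.NormalDist` gives the normal (z) approximation, which is reasonable here since df = 49 > 30; `scipy.stats.t` would give the exact t-based p-value:

```python
from statistics import NormalDist

b, se_b = 1.323, 0.275             # slope and standard error (murder-poverty example)
t = b / se_b                       # test statistic, df = n - 2 = 49
# two-sided p-value via the normal approximation (fine for n > 30)
p = 2 * (1 - NormalDist().cdf(abs(t)))
print(round(t, 2))                 # → 4.81
print(p < 0.001)                   # → True
# rule of thumb: slope at least twice its standard error
print(abs(b) > 2 * se_b)           # → True
```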

  11. Example of Inference about a Slope • In an analysis of poverty and crime in the 50 states plus DC, a computer output provides the following: • E{Murder rate} = -10.14 + 1.322*{Poverty rate} • (Poverty rate in %, murder rate per 100,000) • SSE = 3904.3 SST = 5743.3 • N = 51 Sx = 4.584 • Do a hypothesis test to determine whether there is a linear relationship between crime rates and poverty rates.

  12. Stata Example of Inference about a Slope • In an analysis of poverty and crime in the 50 states plus DC, stata computer output provides the following: • regress murder poverty • Source | SS df MS Number of obs = 51 • -------------+------------------------------ F( 1, 49) = 23.08 • Model | 1839.06931 1 1839.06931 Prob > F = 0.0000 • Residual | 3904.25223 49 79.6786169 R-squared = 0.3202 • -------------+------------------------------ Adj R-squared = 0.3063 • Total | 5743.32154 50 114.866431 Root MSE = 8.9263 • ------------------------------------------------------------------------------ • murder | Coef. Std. Err. t P>|t| [95% Conf. Interval] • -------------+---------------------------------------------------------------- • poverty | 1.32296 .2753711 4.80 0.000 .7695805 1.876339 • _cons | -10.1364 4.120616 -2.46 0.017 -18.41708 -1.855707 • ----------------------------------------------------------------------------- • Interpret whether there is a linear relationship between crime rates and poverty rates.

  13. Example of Inference about a Slope • SSE = 3904.3 SST = 5743.3 • N = 51 Sx = 4.58 • b = 1.323

  14. Example of Inference about a Slope • SSE = 3904.3 SST = 5743.3 • N = 51 Sx = 4.58 • b = 1.323 • seb = sqrt(SSE / (n−2)) / (sx · sqrt(n−1)) • = sqrt(3904.3/49) / (4.585 · sqrt(50)) • = sqrt(79.68) / (4.585 · 7.071) • = 8.926 / 32.421 • = 0.275 • t = b / seb = 1.323 / 0.275 = 4.81 • p < .001 • 95% confidence interval for b = 0.770 to 1.876

  15. Confidence Interval for a Slope • Confidence interval for a slope: • c.i. = b ± t · σ̂b • the standard t-score for a 95% confidence interval is t.025, with df = n − 2 • An alternative to a confidence interval is to report both b and σ̂b.

  16. Example of a Confidence Interval for a Slope • SSE = 3904.3 SST = 5743.3 • N = 51 Sx = 4.58 • b = 1.323 • seb = 0.275 • 95% confidence interval for b • = 1.323 ± 2.010 · 0.275 • = 1.323 ± 0.553 • = 0.770 to 1.876
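The interval can be reproduced in a few lines of Python. The critical value 2.010 (t.025 with df = 49) is hard-coded from a t table because the standard library has no t quantile function; `scipy.stats.t.ppf(0.975, 49)` would compute it:

```python
b, se_b = 1.323, 0.275     # slope and standard error from the murder-poverty example
t_crit = 2.010             # t.025 with df = n - 2 = 49, from a t table
margin = t_crit * se_b
lo, hi = b - margin, b + margin
print(f"{lo:.3f} to {hi:.3f}")
```

This agrees, to rounding, with the 95% confidence interval in the Stata output for this regression (0.770 to 1.876).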

  17. Inference for a slope using STATA • . regress attend regul • Source | SS df MS Number of obs = 18 • -------------+------------------------------ F( 1, 16) = 9.65 • Model | 2240.05128 1 2240.05128 Prob > F = 0.0068 • Residual | 3715.94872 16 232.246795 R-squared = 0.3761 • -------------+------------------------------ Adj R-squared = 0.3371 • Total | 5956 17 350.352941 Root MSE = 15.24 • ------------------------------------------------------------------------------ • attend | Coef. Std. Err. t P>|t| [95% Conf. Interval] • -------------+---------------------------------------------------------------- • regul | -5.358974 1.72555 -3.11 0.007 -9.016977 -1.700972 • _cons | 36.83761 5.395698 6.83 0.000 25.39924 48.27598 • ------------------------------------------------------------------------------ • The significance test and confidence interval for b appear on the line with the name of the x-variable. • Can you find SSE and SST? df for the model? r?
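One way to answer the closing questions: SSE is the Residual SS, SST is the Total SS, the model df is in the df column, and r is the signed square root of R². A quick check in Python:

```python
import math

# read off the Stata output above
sse = 3715.94872          # Residual SS
sst = 5956.0              # Total SS
model_df = 1              # df on the Model line (one x-variable)

r_squared = 1 - sse / sst
print(round(r_squared, 4))            # → 0.3761 (matches "R-squared")
# r takes the sign of the slope; b = -5.359 is negative, so r is negative
r = -math.sqrt(r_squared)
print(round(r, 3))                    # → -0.613
```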

  18. Things to watch out for: extrapolation • Extrapolation beyond observed values of X is dangerous. • The pattern may be nonlinear. • Even if the pattern is linear, the standard errors of predictions grow increasingly wide. • Be especially careful interpreting the Y-intercept: it may lie outside the observed data. • e.g., year zero • e.g., zero education in the U.S. • e.g., zero parity

  19. Things to watch out for: outliers • Influential observations and outliers may unduly influence the fit of the model. • The slope and standard error of the slope may be affected by influential observations. • This is an inherent weakness of least squares regression. • You may wish to evaluate two models; one with and one without the influential observations.
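A tiny illustration with made-up data (all numbers here are hypothetical) of how a single influential observation can move, and here even flip the sign of, the least-squares slope:

```python
def ols_slope(xs, ys):
    """Least-squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# hypothetical points lying close to the line y = x
x = [1, 2, 3, 4, 5]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
print(round(ols_slope(x, y), 2))               # → 1.0

# add one influential observation far from the pattern, at (10, 0)
print(round(ols_slope(x + [10], y + [0]), 2))  # → -0.15: the slope flips sign
```

Comparing the fit with and without the suspect observation, as the slide suggests, makes its influence explicit.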

  20. Things to watch out for: truncated samples • Truncated samples cause the opposite problems of influential observations and outliers. • Truncation on the X axis reduces the correlation coefficient for the remaining data. • Truncation on the Y axis is a worse problem, because it violates the assumption of normally distributed errors. • Examples: Topcoded income data, health as measured by number of days spent in a hospital in a year.
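A small simulation (the data-generating values are my own, for illustration) of the first point: truncating the sample on X shrinks the correlation even though the underlying linear relationship is unchanged:

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(1)
x = [random.gauss(0, 1) for _ in range(2000)]
y = [xi + random.gauss(0, 1) for xi in x]      # linear relationship plus noise

r_full = pearson_r(x, y)
# truncate on the X axis: keep only cases below the mean of X
pairs = [(xi, yi) for xi, yi in zip(x, y) if xi < 0]
r_trunc = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
print(r_trunc < r_full)                        # → True: truncation weakens r
```

The truncated subsample has a smaller variance in X, which mechanically lowers r even though the slope of the underlying relationship is the same.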

  21. Things to watch out for: measurement error • Error in measurement of the X variable creates a bias that makes the correlation appear weaker. • This problem can be a measurement issue or an interpretation issue.
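The attenuation can be seen in a short simulation (the numbers are my own, for illustration): with classical measurement error in X, the expected slope shrinks by the reliability factor var(X) / (var(X) + var(error)), here 1 / (1 + 1) = 0.5:

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

random.seed(2)
true_x = [random.gauss(0, 1) for _ in range(5000)]
y = [2.0 * xi + random.gauss(0, 1) for xi in true_x]   # true slope is 2
noisy_x = [xi + random.gauss(0, 1) for xi in true_x]   # X observed with error

b_clean = ols_slope(true_x, y)   # close to the true slope, 2
b_noisy = ols_slope(noisy_x, y)  # attenuated toward 2 * 0.5 = 1
print(b_noisy < b_clean)         # → True: error in X biases b toward 0
```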
