Introduction to Statistics: Political Science (Class 9)

Presentation Transcript


  1. Introduction to Statistics: Political Science (Class 9) Review

  2. Probability of having cardiovascular disease • Purpose of statistics: • Inferences about populations using samples • We draw a random sample of 1,000 adults and 405 have some form of CVD • Based on our sample, if we randomly select one adult from the population: what is the probability that they have cardiovascular disease?

  3. Conditional Probability • Probability of exercising <3 days/week? • Probability of CVD among those who exercise <3 days/week? • Probability of CVD among those who exercise 3 or more days/week?

  4. Association between exercise and CVD? • p1 = 28.9/(30.3+28.9) = 0.488 • p2 = 10.6/(30.2+10.6) = 0.260 • Difference = 0.488 - 0.260 = 0.228 • Those who exercise less than 3 days/week are 0.228 (22.8 percentage points) more likely to have CVD
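The conditional probabilities on slides 3 and 4 can be sketched in a few lines of Python. The underlying table is not in the transcript, so the cell labels below are an assumption inferred from the arithmetic on slide 4 (each number is read as a percentage of the whole sample).

```python
# Joint percentages of the sample. The labels are inferred from slide 4's
# arithmetic (an assumption; the original table is not in the transcript).
joint = {
    ("under_3_days", "cvd"): 28.9,
    ("under_3_days", "no_cvd"): 30.3,
    ("3_plus_days", "cvd"): 10.6,
    ("3_plus_days", "no_cvd"): 30.2,
}

def conditional_cvd(group):
    """P(CVD | exercise group) = share with CVD in the group / total share of the group."""
    with_cvd = joint[(group, "cvd")]
    group_total = with_cvd + joint[(group, "no_cvd")]
    return with_cvd / group_total

# Marginal probability of exercising fewer than 3 days/week
p_under_3 = (joint[("under_3_days", "cvd")] + joint[("under_3_days", "no_cvd")]) / 100

p1 = conditional_cvd("under_3_days")   # P(CVD | exercise < 3 days/week)
p2 = conditional_cvd("3_plus_days")    # P(CVD | exercise 3+ days/week)
difference = p1 - p2

print(round(p_under_3, 3), round(p1, 3), round(p2, 3), round(difference, 3))
```

Dividing by the group total rather than by 100 is what makes the probability conditional: the denominator is restricted to people in that exercise group.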

  5. Specifying and testing hypotheses • Difference of proportions = .228 • What’s our null hypothesis? • Why a “null hypothesis”? Why not test whether the difference is .228? • Central limit theorem • In repeated sampling with a large enough sample, the distribution of our estimates of the mean (or difference of means or slope) will be approximately normal and centered over the true population value

  6. Central limit theorem [Figure: distribution centered on the proposed true value (0), with 1 standard error marked]
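The repeated-sampling idea behind the central limit theorem can be checked with a small simulation. This is a sketch under stated assumptions: the population proportion `TRUE_P`, the sample size, the number of repeated samples, and the seed are all hypothetical values chosen for illustration, not numbers from the slides.

```python
import random
import statistics

random.seed(9)  # arbitrary seed so the sketch is repeatable

TRUE_P = 0.40       # hypothetical true share of the population with the trait
SAMPLE_SIZE = 1000  # people per sample
N_SAMPLES = 1000    # number of repeated samples

def sample_proportion(p, n):
    """Draw one random sample of size n and return the share with the trait."""
    return sum(random.random() < p for _ in range(n)) / n

estimates = [sample_proportion(TRUE_P, SAMPLE_SIZE) for _ in range(N_SAMPLES)]

mean_of_estimates = statistics.mean(estimates)      # should sit near TRUE_P
empirical_se = statistics.stdev(estimates)          # spread of the estimates
theoretical_se = (TRUE_P * (1 - TRUE_P) / SAMPLE_SIZE) ** 0.5

# Share of estimates within 1 standard error of the true value
# (roughly two thirds if the estimates are approximately normal)
within_one_se = sum(abs(e - TRUE_P) <= theoretical_se for e in estimates) / N_SAMPLES

print(mean_of_estimates, empirical_se, theoretical_se, within_one_se)
```

Any single sample can miss the true value, but the estimates pile up around it, which is why a result many standard errors from the proposed true value is treated as evidence against that value.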

  7. Comparing proportions • Difference of proportions = .228 • p1 = 28.9/(30.3+28.9) = 0.488 (N=602) • p2 = 10.6/(30.2+10.6) = 0.260 (N=398) • Standard error of this difference: sqrt( p1*(1-p1)/N1 + p2*(1-p2)/N2 )

  8. Comparing proportions • So, standard error of difference is the square root of: (.488*(1-.488)/602)+(.260*(1-.260)/398) • Which is .0299 • Difference of proportions = .228
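The standard error calculation on this slide can be written as a short function. The function name is a hypothetical label; the inputs are the proportions and group sizes given on slide 7.

```python
from math import sqrt

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two independent sample proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Values from slide 7: p1 = 0.488 (N=602), p2 = 0.260 (N=398)
se = se_diff_proportions(0.488, 602, 0.260, 398)
print(round(se, 4))
```

The result is about 0.030, in line with the .0299 reported on the slide (the small gap is rounding).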

  9. Hypotheses • Null hypothesis: • There is no difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week • Alternative hypothesis: • There is a difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week • (i.e., the difference is not 0)

  10. If 0 were the true difference, it would be very unlikely that we would find a difference 7.63 (.228/.0299) standard errors from that value by chance [Figure: distribution centered on the proposed true value (0), with 1 standard error marked]
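A minimal sketch of the test itself, using p1 = 0.488 and p2 = 0.260 from slide 7 and the standard error of .0299 from slide 8. The two-sided p-value comes from the normal distribution via `math.erfc`; treating the statistic as normal is the central limit theorem assumption from slide 5.

```python
from math import erfc, sqrt

difference = 0.488 - 0.260   # p1 - p2 from slide 7
standard_error = 0.0299      # from slide 8
null_value = 0.0             # null hypothesis: no difference

# How many standard errors the estimate sits from the null value
z = (difference - null_value) / standard_error

# Two-sided p-value: chance of a result at least this far from 0, in either
# direction, if the null hypothesis were true
p_value = erfc(abs(z) / sqrt(2))

print(round(z, 2), p_value)
```

A p-value this small means a difference of this size would almost never arise by chance if the true difference were 0, so the null hypothesis is rejected.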

  11. Does exercise cause lower CVD? • Reverse causation? Might CVD cause exercise? • Failure to account for confounds • Typically leads to over-estimating the strength of a relationship (not always… but usually)

  12. Multivariate Regression: Specification and Interpretation

  13. Does exercise make CVD less likely? • Regression (predict CVD) • Estimated likelihood of CVD if exercise 4 days/week? • What might confound our estimate of the relationship between exercise and CVD?

                             Coef.    SE      T       P-value
      Days Exercise (0-7)    -0.06    .001    ?       0.000
      Constant                0.56    .002    ?       0.000
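The "4 days/week" question can be answered by plugging into the estimated equation. A small sketch, with a hypothetical function name and the coefficients from the table on this slide:

```python
def predict_cvd(days_exercise, constant=0.56, slope=-0.06):
    """Predicted probability of CVD = constant + slope * days of exercise per week."""
    return constant + slope * days_exercise

# Predicted probability for every possible value of the exercise variable (0-7)
predictions = {days: round(predict_cvd(days), 2) for days in range(8)}

print(predictions[4])  # estimated likelihood of CVD at 4 days/week
```

The constant is the prediction for someone who exercises 0 days/week, and each additional day lowers the predicted probability by 0.06.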

  14. Controlling for confounds

                             Coef.    SE      T       P-value
      Days Exercise (0-7)    -0.03    .001    -3.0    0.002
      Days Fast Food (0-7)    0.04    .002     2.0    0.048
      Constant                0.42    .002    21.0    0.000

  15. [Figure: % chance of CVD by days per week of exercise, shown separately for high and low fast food]

  16. Controlling for dichotomous confounds

                             Coef.    SE      T       P-value
      Days Exercise (0-7)    -0.03    .001    -3.0    0.002
      Days Fast Food (0-7)    0.04    .002     2.0    0.048
      Smoker (1=yes)          0.11    .001    11.0    0.000
      Constant                0.38    .002    19.0    0.000

      • Predicted probability of CVD for: 2 days exercise, 2 days fast food, smoker
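The predicted probability asked for on this slide is the constant plus each coefficient times that person's value. A sketch using the coefficients from the table above (the dictionary keys and function name are hypothetical labels):

```python
# Coefficients from slide 16
CONSTANT = 0.38
COEFFICIENTS = {
    "days_exercise": -0.03,
    "days_fast_food": 0.04,
    "smoker": 0.11,   # dichotomous: 1 = yes, 0 = no
}

def predict(profile):
    """Constant plus the sum of coefficient * value for each variable."""
    return CONSTANT + sum(COEFFICIENTS[name] * value for name, value in profile.items())

smoker = predict({"days_exercise": 2, "days_fast_food": 2, "smoker": 1})
nonsmoker = predict({"days_exercise": 2, "days_fast_food": 2, "smoker": 0})

print(round(smoker, 2), round(nonsmoker, 2))
```

Holding exercise and fast food fixed, the gap between the two predictions equals the smoker coefficient, which is how a dichotomous variable's coefficient is read.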

  17. Nominal Variables • Variable that does not have an “order” to it • Nothing is “higher” or “lower” • Create set of dichotomous variables • Always interpret coefficients with respect to the reference category

  18. Controlling for nominal confounds

                             Coef.    SE      T       P-value
      Days Exercise (0-7)    -0.03    .001    -3.0    0.002
      Days Fast Food (0-7)    0.03    .002     1.5    0.135
      Smoker (1=yes)          0.09    .001     9.0    0.000
      South (1=yes)           0.03    .002     1.5    0.137
      West (1=yes)           -0.01    .002    -0.5    0.642
      Northeast (1=yes)       0.02    .002     1.0    0.410
      Constant                0.34    .002    17.0    0.000

      (Midwest is excluded category)

      • What if we wanted to test whether including region indicators improves fit of the model?
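Turning a nominal variable such as region into a set of dichotomous indicators can be sketched as follows. The function and variable names are hypothetical; the regions, the excluded category, and the coefficients are taken from the table on this slide.

```python
REGIONS = ["Midwest", "South", "West", "Northeast"]
REFERENCE = "Midwest"  # excluded category on slide 18

# Region coefficients from slide 18, each relative to the Midwest
REGION_COEFS = {"South": 0.03, "West": -0.01, "Northeast": 0.02}

def region_indicators(region):
    """Return one 0/1 indicator per region, leaving out the reference category."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    return {name: int(region == name) for name in REGIONS if name != REFERENCE}

def region_shift(region):
    """Predicted difference in CVD probability relative to the Midwest, all else equal."""
    indicators = region_indicators(region)
    return sum(REGION_COEFS[name] * flag for name, flag in indicators.items())

print(region_indicators("West"), region_shift("West"))
print(region_indicators("Midwest"), region_shift("Midwest"))
```

A respondent in the reference category has every indicator equal to 0, so each region coefficient is read as a comparison against that category.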

  19. Non-linear relationships

  20. Logarithms Why use a logarithmic transformation? You think the relationship looks like this…

  21. Logarithms

  22. Squared term – U(or ∩)-shaped relationship Age and political ideology (-2=very conservative, 2=very liberal)

  23. Age and Political Ideology

  24. Create indicators from an ordered variable • Party Identification (-3 to 3) • Seven Variables: • Strong Republican (1=yes) • Weak Republican (1=yes) • Lean Republican (1=yes) • Pure Independent (1=yes) • Lean Democrat (1=yes) • Weak Democrat (1=yes) • Strong Democrat (1=yes)

  25. Predict Obama Favorability (1-4) Excluded category: Pure Independents

  26. Obama Favorability

  27. Predict Obama Favorability (1-4) New excluded category: Leaning Republicans

  28. Interactions • One variable moderates the effect of another – i.e., the relationship between one variable and an outcome depends on the value of another variable

  29. Regression estimates an equation… • 61.100 + 1.286*Party - 1.138*Voted + 3.575*Party*Voted + u • 61.100 + Party*1.286 + Party*Voted*3.575 - 1.138*Voted + u • OR • 61.100 + Party*1.286 + Voted*Party*3.575 - Voted*1.138 + u • Grouped by Party: 61.100 + Party(1.286 + Voted*3.575) - 1.138*Voted + u • Grouped by Voted: 61.100 + Party*1.286 + Voted(Party*3.575 - 1.138) + u
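The grouped forms of the equation show what "moderates" means: the slope on Party depends on the value of Voted. A sketch using the coefficients from this slide. The transcript does not say how Party and Voted are coded, so treating Voted as 0/1 is an assumption, and the function names are hypothetical.

```python
def predicted_outcome(party, voted):
    """Predicted value from the slide's equation (error term u omitted)."""
    return 61.100 + 1.286 * party - 1.138 * voted + 3.575 * party * voted

def party_slope(voted):
    """Effect of a one-unit increase in Party, which depends on Voted."""
    return 1.286 + 3.575 * voted

# Assuming Voted is coded 0 = did not vote, 1 = voted
slope_nonvoters = party_slope(0)
slope_voters = party_slope(1)

# Check: a one-unit change in Party among voters moves the prediction by that slope
change_among_voters = predicted_outcome(1, 1) - predicted_outcome(0, 1)

print(slope_nonvoters, slope_voters, change_among_voters)
```

Without the interaction term, the Party slope would be forced to be the same for both groups.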

  30. Establishing causality

  31. Dealing with confounds • Theory + multivariate regression • Experiments

  32. Dealing with reverse causation • Theory • Experiments

  33. Experiments • What is the key characteristic of an experiment? • How does this address reverse causality? • How does it address confounds? • Weaknesses/limitations of experiments?

  34. Exam Expectations • Describe probabilities / conditional probabilities • Write hypotheses • Demonstrate understanding of how null hypotheses relate to the central limit theorem • Test difference of proportions (formula for SE will be provided) • Interpreting multivariate regression • Relationships (slopes) • Predicted values • Sketch graphs of relationships • Discuss strengths and limitations of analyses • Why an estimated slope might be biased • Benefits and limitations of experiments

  35. Notes • Homework 3 graded • Homework 4 due Thursday 12/9 • Office hours next week – email to come • Exam December 14 at 2pm
