PSC 5940: Regression Review and Questions about “Causality”


  1. PSC 5940: Regression Review and Questions about “Causality” • Session 2, Fall 2009

  2. Data Discussion • EE09 & NS09 data: research ideas? • Fixing data in Excel (EE09): NA replacement; text to numeric (e28_gcc); getting rid of extraneous characters (the $ in “random_p”) • EE and partisanship: loading and attaching the data; examining party identification (“e216_par”); examining gender (“e3_gender”); dealing with awkward names and NA values
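The cleaning steps on this slide were done in Excel. For readers who prefer to script them, the sketch below shows the same three operations (NA replacement, text-to-numeric conversion, stripping a stray "$") in Python. It is a minimal illustration only: the cell values are made up, and the `clean_numeric` helper and its NA tokens are assumptions, not the actual EE09 coding.

```python
def clean_numeric(raw, na_tokens=("NA", "")):
    """Turn one raw spreadsheet cell into a float, or None if it is missing.

    Strips surrounding whitespace, dollar signs and thousands separators
    before converting the remaining text to a number.
    """
    text = raw.strip()
    if text in na_tokens:
        return None  # NA replacement: treat these tokens as missing
    text = text.replace("$", "").replace(",", "")  # drop extraneous characters
    return float(text)  # text-to-numeric conversion


# Hypothetical cells resembling a cost column such as "random_p"
random_p_raw = ["$25", "$100", "NA", " $1,000 "]
random_p = [clean_numeric(cell) for cell in random_p_raw]
print(random_p)  # [25.0, 100.0, None, 1000.0]
```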

  3. Deterministic Linear Models [Figure: a line relating Yi to Xi, with intercept b0] • Theoretical model: $Y_i = b_0 + b_1 X_i$ • b0 and b1 are constant terms • b0 is the intercept • b1 is the slope • Xi is a predictor of Yi

  4. Stochastic Linear Models • $E[Y_i] = b_0 + b_1 X_i$ • Variation in Y is caused by more than X: error ($e_i$) • So: $Y_i = b_0 + b_1 X_i + e_i$

  5. Assumptions Necessary for Estimating Linear Models [Figure: errors ei plotted against X around the line ei = 0] 1. Errors have identical distributions: zero mean and the same variance across the range of X. 2. Errors are independent of X and of the other errors. 3. Errors are normally distributed.

  6. Normal, Independent & Identical ei Distributions (“Normal iid”) [Figure: Y vs. X] • Problem: we don’t know (a) whether the error assumptions hold or (b) the values of b0 and b1. • Solution: estimate ’em!
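Because the true b0, b1 and ei are unknown in real data, simulating data where we choose them ourselves is one way to see what the “normal iid” assumptions mean. A minimal Python sketch; the parameter values (b0 = 2, b1 = 0.5, σ = 1) and the uniform distribution for X are arbitrary assumptions.

```python
import random


def simulate_linear(n, b0, b1, sigma, seed=0):
    """Draw n observations from Y = b0 + b1*X + e with e ~ Normal(0, sigma), iid.

    X is drawn uniformly on [0, 10] (an arbitrary choice); each error is drawn
    independently of X and of every other error, with the same distribution.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    errors = [rng.gauss(0, sigma) for _ in range(n)]
    ys = [b0 + b1 * x + e for x, e in zip(xs, errors)]
    return xs, ys, errors


xs, ys, errors = simulate_linear(n=5000, b0=2.0, b1=0.5, sigma=1.0)
mean_error = sum(errors) / len(errors)  # should sit close to the assumed zero mean
print(round(mean_error, 3))
```

In practice only `xs` and `ys` would be observed; the point of estimation is to recover b0 and b1 from them.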

  7. OLS Derivation of b0 Use partial differentiation in this step:

  8. Derivation of b0, Step 2

  9. Derivation of b1, Step 1: Multiply out e²

  10. Derivation of b1, Step 2: Differentiate w.r.t. b1

  11. Derivation of b1, Step 3: Substitute for b0

  12. Derivation of b1, Step 4: Simplify and isolate b1
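The equations for these derivation steps were not preserved in this transcript. For reference, the standard least-squares derivation that the step titles describe is written out below in the document's notation: the first two lines give b0; then the squared error is multiplied out, differentiated with respect to b1, b0 is substituted, and b1 is isolated.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_i (Y_i - b_0 - b_1 X_i) = 0
  \;\Rightarrow\; b_0 = \bar{Y} - b_1 \bar{X} \\
S &= \sum_i Y_i^2 - 2 b_0 \sum_i Y_i - 2 b_1 \sum_i X_i Y_i + n b_0^2
     + 2 b_0 b_1 \sum_i X_i + b_1^2 \sum_i X_i^2 \\
\frac{\partial S}{\partial b_1} &= -2 \sum_i X_i Y_i + 2 b_0 \sum_i X_i + 2 b_1 \sum_i X_i^2 = 0 \\
0 &= -\sum_i X_i Y_i + (\bar{Y} - b_1 \bar{X})\, n \bar{X} + b_1 \sum_i X_i^2 \\
b_1 &= \frac{\sum_i X_i Y_i - n \bar{X} \bar{Y}}{\sum_i X_i^2 - n \bar{X}^2}
     = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}
\end{aligned}
```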

  13. Calculating b0 and b1 • The formulas for b1 and b0 allow you (or preferably your computer) to calculate the error-minimizing slope and intercept for any data set representing a bivariate linear relationship. • No other line, using the same data, will result in a smaller sum of squared errors (Σe²): OLS gives the best fit in the least-squares sense.
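A minimal Python sketch of the calculation, using the standard deviation form $b_1 = \sum(X_i-\bar X)(Y_i-\bar Y)/\sum(X_i-\bar X)^2$ and $b_0 = \bar Y - b_1 \bar X$. The data are made up, and the two "nudge" checks only illustrate the best-fit claim numerically for these data; they are not a proof.

```python
def ols_fit(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for Y = b0 + b1*X."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared errors for a candidate line."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]                # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(xs, ys)
best = sse(xs, ys, b0, b1)
# Shifting either coefficient away from the OLS values increases the squared error.
assert sse(xs, ys, b0 + 0.1, b1) > best
assert sse(xs, ys, b0, b1 - 0.1) > best
```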

  14. Interpreting b1 and b0 • For each 1-unit increase in X, the predicted value of Y changes by b1 units. • When X is zero, the predicted value of Y equals b0. • Note that a regression model with no independent variables simply estimates the mean of Y.
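The claim that a model with no independent variables reduces to the mean follows from the same least-squares logic. With only an intercept, minimizing the squared error gives:

```latex
S(b_0) = \sum_{i=1}^{n} (Y_i - b_0)^2, \qquad
\frac{dS}{db_0} = -2 \sum_{i=1}^{n} (Y_i - b_0) = 0
\;\Rightarrow\; b_0 = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}
```

This is also what $b_0 = \bar{Y} - b_1 \bar{X}$ gives when the slope term is dropped ($b_1 = 0$).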

  15. Theoretical Specification of Multivariate Regression

  16. Regression in Matrix Form • Assume a model using n observations, with K-1 independent variables (Xi)

  17. Regression in Matrix Form • Note: $(X'X)^{-1}$ does not exist (X’X is singular) if any column in the X matrix is an exact linear function of other column(s) in X, so the coefficients cannot be uniquely estimated.
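With the X matrix including a leading column of 1s, the standard OLS estimator is $b = (X'X)^{-1}X'y$. The pure-Python sketch below (no libraries; all helper names are hypothetical) solves the equivalent normal equations $(X'X)b = X'y$ by Gauss-Jordan elimination and raises an error when X’X is singular, which is what happens when one column of X is a linear function of others. A real analysis would use a numerical library rather than this hand-rolled solver.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, rhs, tol=1e-12):
    """Solve a @ x = rhs by Gauss-Jordan elimination with partial pivoting.

    Raises ValueError when a is (numerically) singular; tol is an absolute
    threshold, which is adequate only for small, well-scaled examples.
    """
    n = len(a)
    m = [list(row) + [r] for row, r in zip(a, rhs)]  # augmented matrix [a | rhs]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("X'X is singular: a column of X is a linear function of others")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_matrix(X, y):
    """OLS coefficients b solving (X'X) b = X'y; X includes a leading column of 1s."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


X = [[1, 0, 1], [1, 1, 0], [1, 2, 2], [1, 3, 1], [1, 4, 3]]  # made-up data
y = [1 + 2 * row[1] - 3 * row[2] for row in X]                # exact model: b = (1, 2, -3)
b = ols_matrix(X, y)

collinear = [[1, 1, 2], [1, 2, 4], [1, 3, 6], [1, 4, 8]]      # third column = 2 * second
try:
    ols_matrix(collinear, [1, 2, 3, 4])
    singular_detected = False
except ValueError:
    singular_detected = True
```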

  18. The X’X Matrix • Note that the (X’X) matrix contains the sums, sums of squares and cross-products from which all the necessary means, variances and covariances among the Xs can be obtained
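To see what the (X’X) matrix holds, the short sketch below builds it for a made-up data set with an intercept column and two predictors, then recovers means, a variance and a covariance from its entries alone. The corner entry is n, the first row holds the sum of each X, and the remaining entries are sums of squares and cross-products.

```python
def xtx(X):
    """Compute X'X for a data matrix X given as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]


x1 = [2.0, 4.0, 6.0, 8.0]                  # made-up predictor values
x2 = [1.0, 3.0, 2.0, 6.0]
X = [[1.0, a, b] for a, b in zip(x1, x2)]  # leading column of 1s for the intercept
M = xtx(X)

n = M[0][0]                                # sum of 1*1 = number of observations
mean_x1 = M[0][1] / n                      # sum of x1 divided by n
mean_x2 = M[0][2] / n
var_x1 = (M[1][1] - n * mean_x1 ** 2) / (n - 1)               # sample variance of x1
cov_x1x2 = (M[1][2] - n * mean_x1 * mean_x2) / (n - 1)        # sample covariance
```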

  19. An Example of Matrix Regression Using a sample of 7 observations, where X has Elements {X0, X1, X2, X3}

  20. Summary of OLS Assumption Failures and their Implications

      Problem                   Biased b   Biased SE   Invalid t/F   High variance
      Non-linearity             Yes        Yes         Yes           ---
      Omitted relevant X        Yes        Yes         Yes           ---
      Irrelevant X              No         No          No            Yes
      Measurement error in X    Yes        Yes         Yes           ---
      Heteroscedasticity        No         Yes         Yes           Yes
      Autocorrelation           No         Yes         Yes           Yes
      X correlated with error   Yes        Yes         Yes           ---
      Non-normal errors         No         No          Yes           Yes
      Multicollinearity         No         No          No            Yes

  21. BREAK

  22. Causality and Experiments [Figure: Number of Fire Trucks (X2) and Number of Fire Deaths (Y)] • Question: What is the relationship between the number of fire trucks at the scene of a fire and the number of deaths caused by that fire? • Experimental approach: randomly assign fire incidents to different categories, which receive different numbers of trucks (the treatment).

  23. Causality and Observational Data [Figure: Size of Fire (X1), Number of Fire Trucks (X2), Number of Fire Deaths (Y)] • The problem of spurious relations... • In an experimental design, random assignment controls for spurious relationships by design. • With OLS we try to manage them statistically.
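A small simulation can make the fire-truck example concrete. The sketch below assumes a hypothetical data-generating process in which fire size drives both the number of trucks and the number of deaths, while trucks have no effect on deaths at all. The naive bivariate slope of deaths on trucks comes out clearly positive; when trucks are instead assigned at random (the experimental approach), the slope is near zero. All coefficients are invented for illustration.

```python
import random


def slope(xs, ys):
    """Bivariate OLS slope: sum of cross-deviations over sum of squared X deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(42)
n = 10_000
# Hypothetical process: size affects trucks and deaths; trucks do not affect deaths.
size = [rng.gauss(0, 1) for _ in range(n)]
trucks = [2.0 * s + rng.gauss(0, 1) for s in size]
deaths = [3.0 * s + rng.gauss(0, 1) for s in size]

# Observational data: the spurious relationship shows up as a positive slope.
naive = slope(trucks, deaths)

# Experimental version: trucks assigned at random, independent of fire size.
trucks_random = [rng.gauss(0, 2.2) for _ in range(n)]
experimental = slope(trucks_random, deaths)
```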

  24. Statistical Calculation of Partial Effects • In calculating the effect of X1 on Y, we remove the effect of the other Xs on both X1 and Y: • Y stripped of the effect of X2 (the residuals from regressing Y on X2) • X1 stripped of the effect of X2 (the residuals from regressing X1 on X2) • The use of residuals “cleans” both Y and X1 of their correlations with X2, permitting estimation of partial regression coefficients (PRCs).
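The residual-on-residual procedure described on this slide (known as the Frisch-Waugh-Lovell result) can be checked numerically. The sketch below simulates made-up data with correlated X1 and X2, computes the coefficient on X1 by regressing the residualized Y on the residualized X1, and compares it with the coefficient from solving the two-predictor normal equations directly. The two agree up to rounding error.

```python
import random


def fit(xs, ys):
    """Bivariate OLS with an intercept; returns (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1


def residuals(xs, ys):
    """Return ys stripped of their linear relationship with xs."""
    b0, b1 = fit(xs, ys)
    return [y - b0 - b1 * x for x, y in zip(xs, ys)]


def deviations(v):
    """Values minus their mean."""
    m = sum(v) / len(v)
    return [t - m for t in v]


rng = random.Random(7)
n = 2000
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * a + rng.gauss(0, 1) for a in x2]   # X1 is correlated with X2
y = [1.0 + 0.5 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Partial regression coefficient for X1 via residuals.
_, prc_x1 = fit(residuals(x2, x1), residuals(x2, y))

# The same coefficient from the two-predictor normal equations (Cramer's rule).
d1, d2, dy = deviations(x1), deviations(x2), deviations(y)
s11 = sum(a * a for a in d1)
s22 = sum(b * b for b in d2)
s12 = sum(a * b for a, b in zip(d1, d2))
s1y = sum(a * c for a, c in zip(d1, dy))
s2y = sum(b * c for b, c in zip(d2, dy))
b1_multiple = (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)
```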

  25. Intuition of PRCs • All overlapping variance is stripped • Highly correlated IVs are problematic • But what if the overlap is important? • What if X1 and X2 are really part of some larger construct? • The case of knowledge, efficacy and behavior (Kellstedt et al.) • How should we interpret the PRCs in this case?

  26. Workshop • Load the EE data • Run a simple model of willingness to pay for an alternative energy tax, using the randomly assigned cost as the IV • Plot the relationship (use jitter) • Now add income and ideology • Does the cost coefficient change? (Why?)
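The slide presumably refers to R’s `jitter()` function. As a language-neutral illustration, the sketch below shows what jittering does: it adds a little uniform noise to each value so that respondents stacked on the same discrete response become visible in a scatterplot. The `jitter` helper here is hypothetical, its noise amount (0.2) is an arbitrary assumption rather than R’s default, and jittered values should be used only for plotting, never for estimation.

```python
import random


def jitter(values, amount, seed=None):
    """Add uniform noise in [-amount, amount] to each value (for plotting only)."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


wtp = [1, 2, 2, 3, 3, 3, 5]                 # hypothetical 1-5 willingness-to-pay responses
wtp_plot = jitter(wtp, amount=0.2, seed=1)  # spread overlapping points apart
```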

  27. Homework • Generate and analyze the residuals • Add to the model: • Belief in anthropogenic climate change (will require recodes) • Understanding of GCC science (recode the “What scientists believe…” variables) • 1-page summary of findings for class next week • Next Extension: Modeling Dummies and Interactions
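As a starting point for the residual analysis, the Python sketch below fits a bivariate model to made-up data and computes the residuals. Two of the checks (residuals summing to zero and being uncorrelated with X) hold by construction whenever the model has an intercept, so they confirm the arithmetic rather than the assumptions. The comparison of residual spread for low vs. high X is only a rough, informal look at constant variance, not a formal test.

```python
def fit_with_residuals(xs, ys):
    """Fit Y = b0 + b1*X by OLS; return coefficients, fitted values and residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    return b0, b1, fitted, resid


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical data, sorted by X
ys = [1.5, 2.9, 4.8, 5.1, 7.2, 7.9]
b0, b1, fitted, resid = fit_with_residuals(xs, ys)

# Mechanical OLS properties (hold by construction with an intercept):
resid_sum = sum(resid)                                  # approximately 0
resid_x_cross = sum(x * e for x, e in zip(xs, resid))   # approximately 0

# Rough look at constant variance: mean squared residual for low vs. high X
# (valid split only because xs is sorted).
half = len(xs) // 2
msr_low = sum(e * e for e in resid[:half]) / half
msr_high = sum(e * e for e in resid[half:]) / (len(resid) - half)
```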

More Related