PSC 5940: Regression Review and Questions about “Causality”
Session 2, Fall 2009
Data Discussion
• EE09 & NS09 Data: research ideas?
• Fixing data in Excel: EE09
  • NA replacement
  • Text to numeric (e28_gcc)
  • Getting rid of extraneous characters
    • $ in “random_p”
• EE and partisanship
  • Loading and attaching the data
  • Examining party identification (“e216_par”)
  • Examining gender (“e3_gender”)
  • Dealing with awkward names and NA values
Deterministic Linear Models
• Theoretical Model: Yi = b0 + b1Xi
• b0 and b1 are constant terms
• b0 is the intercept
• b1 is the slope
• Xi is a predictor of Yi
[Figure: Yi plotted against Xi, with b0 marked]
Stochastic Linear Models
• E[Yi] = b0 + b1Xi
• Variation in Y is caused by more than X: error (ei)
• So: Yi = b0 + b1Xi + ei
Assumptions Necessary for Estimating Linear Models
1. Errors have identical distributions: zero mean and the same variance across the range of X
2. Errors are independent of X and of other ei
3. Errors are normally distributed
[Figure: errors ei plotted against X around ei = 0]
Normal, Independent & Identical ei Distributions (“Normal iid”)
[Figure: Y plotted against X]
Problem: We don’t know: a) if the error assumptions hold true; b) the values of b0 and b1
Solution: Estimate ‘em!
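To make the “Normal iid” assumptions concrete, the following Python sketch simulates the stochastic model Yi = b0 + b1Xi + ei with errors drawn independently from a single zero-mean normal distribution. The parameter values (b0 = 2, b1 = 0.5, error standard deviation 1), the sample size, and the function name `simulate` are illustrative assumptions, not values from the course data.

```python
import random
import statistics

def simulate(n, b0, b1, sigma, seed=0):
    """Draw n observations from Yi = b0 + b1*Xi + ei, with ei ~ Normal(0, sigma) iid."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Each error is drawn independently of X and of the other errors,
    # from the same zero-mean normal distribution (the "Normal iid" case).
    es = [rng.gauss(0, sigma) for _ in range(n)]
    ys = [b0 + b1 * x + e for x, e in zip(xs, es)]
    return xs, ys, es

# Assumed, made-up parameter values for illustration only.
xs, ys, es = simulate(n=1000, b0=2.0, b1=0.5, sigma=1.0)
print("mean of errors:", round(statistics.mean(es), 3))
print("std. dev. of errors:", round(statistics.stdev(es), 3))
```

In real data the errors are never observed directly, which is why the slide's "problem" arises: both the coefficients and the behavior of the errors have to be estimated.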
OLS Derivation of b0
Use partial differentiation in this step:
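The derivation itself did not survive in the slide text, so what follows is the standard textbook least-squares derivation written in the notation used above (b0, b1, ei). Treat it as a reconstruction rather than the original slide. OLS chooses b0 and b1 to minimize the sum of squared errors S; setting the partial derivative with respect to b0 to zero gives the intercept, and the same step applied to b1 gives the slope.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0 \\
\sum_{i=1}^{n} Y_i &= n b_0 + b_1 \sum_{i=1}^{n} X_i \\
b_0 &= \bar{Y} - b_1 \bar{X} \\
b_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{aligned}
```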
Calculating b0 and b1
• The formulas for b1 and b0 allow you (or preferably your computer) to calculate the error-minimizing slope and intercept for any data set representing a bivariate, linear relationship.
• No other line, using the same data, will result in a smaller sum of squared errors (Σei²). OLS gives the best fit.
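As a minimal sketch of what "your computer" does here, the Python below applies the standard closed-form bivariate OLS formulas and then checks that nudging the fitted line in any direction increases the sum of squared errors. The data values and the function names `ols_fit` and `sse` are made up for illustration.

```python
def ols_fit(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation in X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: forces the line through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared errors for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(xs, ys)
best = sse(xs, ys, b0, b1)

# Shifting the intercept or the slope away from the OLS values raises the SSE.
for db0, db1 in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sse(xs, ys, b0 + db0, b1 + db1) > best

print("b0 =", round(b0, 4), "b1 =", round(b1, 4), "SSE =", round(best, 4))
```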
Interpreting b1 and b0
For each 1-unit increase in X, the expected value of Y changes by b1 units.
When X is zero, the expected value of Y equals b0.
Note that a regression model with no independent variables simply estimates the mean of Y.
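The last point can be checked numerically. In the sketch below, a model with no independent variables predicts every Y with a single constant; scanning a grid of candidate constants shows that the error-minimizing one is the sample mean. The outcome values and the name `sse_constant` are hypothetical.

```python
def sse_constant(ys, c):
    """Sum of squared errors when every Y is predicted by the same constant c."""
    return sum((y - c) ** 2 for y in ys)

# Hypothetical outcome values, made up for illustration only.
ys = [3.0, 5.0, 4.0, 8.0, 10.0]
y_bar = sum(ys) / len(ys)

# Scan candidate constants 0.00, 0.01, ..., 12.00 and keep the one with the smallest SSE.
candidates = [k / 100 for k in range(0, 1201)]
best_c = min(candidates, key=lambda c: sse_constant(ys, c))

print("mean of Y:", y_bar)
print("error-minimizing constant:", best_c)
```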
Regression in Matrix Form
• Assume a model using n observations, with K-1 Xi (independent) variables
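The matrix expressions from this slide are not in the extracted text. The block below is the standard textbook statement of the model and of the OLS estimator, offered as a reconstruction in the slide's notation: y is n×1, X is n×K with a leading column of ones for the intercept, b is K×1, and e is n×1.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{e}, \qquad
\mathbf{X} =
\begin{bmatrix}
1 & X_{11} & \cdots & X_{1,K-1} \\
1 & X_{21} & \cdots & X_{2,K-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & X_{n1} & \cdots & X_{n,K-1}
\end{bmatrix} \\
\mathbf{b} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
\end{aligned}
```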
Regression in Matrix Form
Note: we can’t uniquely define (X’X)^-1 if any column in the X matrix is a linear function of any other column(s) in X.
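A small numerical check of this note, as a sketch: when one column of X is an exact linear function of the others, the determinant of X'X is zero, so no unique inverse exists. The predictor values and the helper names `xtx` and `det3` are made up for illustration, and integers are used so the arithmetic is exact.

```python
def xtx(rows):
    """Compute X'X for a design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical predictor values, made up for illustration only.
x1 = [1, 2, 3, 4, 5]
x2_ok = [2, 1, 4, 3, 6]            # not a linear function of the other columns
x2_bad = [2 * v + 1 for v in x1]   # exact linear function of the constant and x1

# Each row is one observation: [constant, X1, X2].
X_ok = [[1, a, b] for a, b in zip(x1, x2_ok)]
X_bad = [[1, a, b] for a, b in zip(x1, x2_bad)]

print("det(X'X), independent columns:", det3(xtx(X_ok)))
print("det(X'X), collinear columns:", det3(xtx(X_bad)))
```

With real data the dependence is usually approximate rather than exact, in which case the determinant is close to zero and the estimates become unstable instead of undefined.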
The X’X Matrix
Note that you can obtain the basis for all the necessary means, variances and covariances among the Xs from the (X’X) matrix.
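To illustrate, the sketch below builds X'X for a design matrix with a column of ones and two predictors, then recovers a mean, a variance and a covariance from its entries alone. The first row of X'X holds n and the column sums; the remaining entries hold sums of squares and cross-products. All data values are hypothetical.

```python
import statistics

# Hypothetical predictor values, made up for illustration only.
x1 = [2, 4, 4, 5, 7]
x2 = [1, 3, 2, 5, 4]
n = len(x1)

# Design matrix with a leading column of ones (X0).
X = [[1, a, b] for a, b in zip(x1, x2)]
k = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]

# Means come from the column sums in row 0.
mean_x1 = XtX[0][1] / n
mean_x2 = XtX[0][2] / n
# Sample variance and covariance come from the sums of squares and cross-products.
var_x1 = (XtX[1][1] - n * mean_x1 ** 2) / (n - 1)
cov_x1_x2 = (XtX[1][2] - n * mean_x1 * mean_x2) / (n - 1)

print("n =", XtX[0][0])
print("mean of X1:", mean_x1)
print("variance of X1:", var_x1, "(check:", statistics.variance(x1), ")")
print("covariance of X1 and X2:", cov_x1_x2)
```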
An Example of Matrix Regression
Using a sample of 7 observations, where X has elements {X0, X1, X2, X3}
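The slide's data are not in the extracted text, so the sketch below uses 7 hypothetical observations with the same layout (X0 is the column of ones). Y is constructed without error from assumed coefficients (1, 2, -1, 1/2) so that the result can be checked. The helpers `transpose`, `matmul` and `solve` are illustrative names; `solve` works through the normal equations (X'X)b = X'y by elimination instead of forming the inverse explicitly, which gives the same b.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, rhs):
    """Solve a * b = rhs by Gauss-Jordan elimination (a square, rhs a column matrix)."""
    n = len(a)
    aug = [list(row) + list(r) for row, r in zip(a, rhs)]
    for col in range(n):
        # Find a row with a non-zero pivot; none exists if X'X is singular.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Hypothetical data: 7 observations, made up for illustration only.
x1 = [1, 2, 3, 4, 5, 6, 7]
x2 = [2, 1, 4, 3, 6, 5, 8]
x3 = [1, 0, 0, 1, 1, 0, 2]
X = [[F(1), F(a), F(b), F(c)] for a, b, c in zip(x1, x2, x3)]

# Assumed coefficients used to construct Y exactly (no error term).
true_b = [F(1), F(2), F(-1), F(1, 2)]
y = [[sum(c * v for c, v in zip(true_b, row))] for row in X]

Xt = transpose(X)
b = solve(matmul(Xt, X), matmul(Xt, y))  # normal equations: (X'X) b = X'y
print("estimated b:", [str(v) for v in b])
```

Exact fractions are used only so the recovered coefficients match the assumed ones without rounding; with noisy real data the same steps return the least-squares estimates.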
Summary of OLS Assumption Failures and their Implications

Problem             Biased b   Biased SE   Invalid t/F   High variance
Non-linear          Yes        Yes         Yes           ---
Omit relev. X       Yes        Yes         Yes           ---
Irrel. X            No         No          No            Yes
X meas. error       Yes        Yes         Yes           ---
Heterosced.         No         Yes         Yes           Yes
Autocorr.           No         Yes         Yes           Yes
X corr. error       Yes        Yes         Yes           ---
Non-normal err.     No         No          Yes           Yes
Multicollinearity   No         No          No            Yes
Causality and Experiments
[Diagram: Number of Fire Trucks (X2) and Number of Fire Deaths (Y)]
Question: What is the relationship between the number of fire trucks at the scene of a fire and the number of deaths caused by that fire?
Experimental approach: Randomly assign fire incidents to different categories, which receive different numbers of trucks (treatment).
Causality and Observational Data
[Diagram: Size of Fire (X1), Number of Fire Trucks (X2), and Number of Fire Deaths (Y)]
The problem of spurious relations...
In an experimental design, we fully control for spurious relationships. With OLS we try to manage them statistically.
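A simulation can show what a spurious relation looks like. In the sketch below, fire size (X1) drives both the number of trucks (X2) and the number of deaths (Y), while trucks are given no effect on deaths at all. The data-generating process, its coefficients, and the function names are assumptions made up for illustration, not estimates from real fire data. The two-predictor coefficient uses the standard closed-form expression in centered sums of squares and cross-products.

```python
import random

def centered_sum(a, b):
    """Sum of cross-products of a and b after subtracting their means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    return centered_sum(x, y) / centered_sum(x, x)

def partial_slope(x1, x2, y):
    """Coefficient on x1 in the OLS regression of y on x1 and x2 (with intercept)."""
    s11, s22, s12 = centered_sum(x1, x1), centered_sum(x2, x2), centered_sum(x1, x2)
    s1y, s2y = centered_sum(x1, y), centered_sum(x2, y)
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)

rng = random.Random(42)
n = 2000
# Assumed process: size drives BOTH trucks and deaths; trucks have zero effect on deaths.
size = [rng.uniform(0, 10) for _ in range(n)]
trucks = [s + rng.gauss(0, 1) for s in size]
deaths = [2 * s + rng.gauss(0, 1) for s in size]

naive = slope(trucks, deaths)                      # ignores fire size
controlled = partial_slope(trucks, size, deaths)   # holds fire size constant

print("deaths on trucks, ignoring size:", round(naive, 3))
print("deaths on trucks, controlling for size:", round(controlled, 3))
```

Under these assumptions the naive slope is strongly positive even though trucks do nothing, and the coefficient shrinks toward zero once size enters the model.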
Statistical Calculation of Partial Effects
In calculating the effect of X1 on Y, we remove the effect of the other X’s on both X1 and Y:
• Residuals from regressing Y on X2: Y stripped of the effect of X2
• Residuals from regressing X1 on X2: X1 stripped of the effect of X2
The use of residuals “cleans” both Y and X1 of their correlations with X2, permitting estimation of partial regression coefficients (PRCs).
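The residual logic can be verified directly. The sketch below strips X2 out of both Y and X1, regresses the cleaned Y on the cleaned X1, and compares the result with the coefficient on X1 from the two-predictor regression computed by the standard closed-form expression. The data values and function names are hypothetical.

```python
def centered_sum(a, b):
    """Sum of cross-products of a and b after subtracting their means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def residuals(x, y):
    """Residuals from the bivariate OLS regression of y on x."""
    b1 = centered_sum(x, y) / centered_sum(x, x)
    b0 = sum(y) / len(y) - b1 * sum(x) / len(x)
    return [v - (b0 + b1 * u) for u, v in zip(x, y)]

def partial_slope(x1, x2, y):
    """Coefficient on x1 in the OLS regression of y on x1 and x2 (with intercept)."""
    s11, s22, s12 = centered_sum(x1, x1), centered_sum(x2, x2), centered_sum(x1, x2)
    s1y, s2y = centered_sum(x1, y), centered_sum(x2, y)
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)

# Hypothetical data, made up for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 7, 12, 13]

y_clean = residuals(x2, y)     # Y stripped of the effect of X2
x1_clean = residuals(x2, x1)   # X1 stripped of the effect of X2

prc_from_residuals = centered_sum(x1_clean, y_clean) / centered_sum(x1_clean, x1_clean)
prc_direct = partial_slope(x1, x2, y)

print("PRC from residuals:", prc_from_residuals)
print("PRC from the two-predictor regression:", prc_direct)
```

The two numbers agree up to floating-point rounding, which is the sense in which the PRC for X1 reflects only the variation in X1 that X2 does not share.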
Intuition of PRCs
• All overlapping variance is stripped
• Highly correlated IVs are problematic
• But what if the overlap is important?
  • What if X1 and X2 are really part of some larger construct?
• The case of knowledge, efficacy and behavior
  • Kellstedt et al.
• How should we interpret the PRCs in this case?
Workshop
• Load EE data
• Run a simple model:
  • Willingness to pay for an alternative energy tax
  • Use randomly assigned cost as IV
• Plot the relationship (use jitter)
• Now add: Income, Ideology
• Change in cost variable? (Why?)
Homework
• Generate and analyze the residuals
• Add to the model:
  • Belief in anthropogenic climate change
    • Will require recodes
  • Understanding of GCC science
    • Recode “What scientists believe…” variables
• 1-page summary of findings for class next week
• Next Extension: Modeling Dummies and Interactions