Multiple Independent Variables POLS 300 Butz
Multivariate Analysis • Problem with bivariate analysis in nonexperimental designs: • Spuriousness and Causality • Need for techniques that allow the researcher to control for other independent variables
Multivariate Analysis • Employed to see how large sets of variables are interrelated. • Idea is that if one can find a relationship between x and y after accounting for other variables (w and z) we may be able to make “causal inference”.
Multivariate Analysis • We know that X and Y may both be caused by Z, which makes the relationship between X and Y spurious. • Multivariate Analysis allows us to include other variables and test whether there is still a relationship between X and Y.
Multivariate Analysis • Must ask whether a third variable (and maybe others) is the “true” cause of both the IV and DV • Experimental analyses “prove” causation but only in Laboratory Setting…must use Multivariate Statistical Analyses in “real-world” • Need to “Control” or “hold constant” other variables to isolate the effect of IV on DV!
Controlling for Other Independent Variables • Multivariate Crosstabulation – evaluate bivariate relationship within subsets of sample defined by different categories of third variable (“control by grouping”) • At what level(s) of measurement would we use Multivariate Crosstabulation??
Multivariate Crosstabulation • Control by grouping: group the observations according to their values on the third variable and… • then observe the original relationship within each of these groups. • P. 407/506 – Spending Attitudes and Voting…controlling for Income! – spurious • Occupational Status and Voter Turnout • P. 411/510…control for “education”!
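Control by grouping can be sketched in a few lines of Python. The survey records below are invented purely for illustration (they are not the textbook tables cited on this slide); the point is only to show a bivariate relationship that disappears once the sample is split by the control variable.

```python
from collections import defaultdict

# Hypothetical records: (income, spending_attitude, vote). Made-up data.
records = (
    [("low", "increase", "Dem")] * 30 + [("low", "increase", "Rep")] * 10 +
    [("low", "decrease", "Dem")] * 6 + [("low", "decrease", "Rep")] * 2 +
    [("high", "increase", "Dem")] * 2 + [("high", "increase", "Rep")] * 6 +
    [("high", "decrease", "Dem")] * 10 + [("high", "decrease", "Rep")] * 30
)

def pct_dem(rows):
    """Percent voting Dem within each spending-attitude category."""
    counts = defaultdict(lambda: [0, 0])  # attitude -> [dem votes, total]
    for _, attitude, vote in rows:
        counts[attitude][1] += 1
        if vote == "Dem":
            counts[attitude][0] += 1
    return {a: round(100 * d / n, 1) for a, (d, n) in counts.items()}

# Bivariate table: attitude appears to be related to vote.
overall = pct_dem(records)

# Same table within each income group ("control by grouping").
by_income = {
    level: pct_dem([r for r in records if r[0] == level])
    for level in ("low", "high")
}

print("Overall:", overall)
for level, table in by_income.items():
    print(f"Income = {level}:", table)
```

In this invented sample the attitude categories differ in the overall table but show identical percentages inside each income group, which is the pattern we would call spurious.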
Quick Review: Regression • In general, the goal of linear regression is to find the line that best predicts Y from X. • Linear regression does this by estimating a line that minimizes the sum of the squared errors from the line • Minimizing the vertical distances of the data points from the line.
Regression vs. Correlation • The purpose of regression analysis is to determine exactly what the line is (i.e. to estimate the equation for the line) • The regression line represents predicted values of Y based on the values of X
Equation for a Line (Perfect Linear Relationship) Yi = a + BXi a = Intercept, or Constant = The value of Y when X = 0 B = Slope coefficient = The change (+ or ‑) in Y given a one unit change in X
Slope • Yi = a + BXi • B = Slope coefficient • If B is positive then you have a positive relationship. If it is negative you have a negative relationship. • The larger the absolute value of B the steeper the slope of the line…Greater (more dramatic) change in Y for a unit change in X • General Interpretation: For a one-unit change in X, we expect a change of B in Y on average.
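As a minimal sketch of how a and B are estimated, the standard least-squares formulas are B = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − B·X̄. The data points and the function name `fit_line` are made up for illustration.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Covariation of X and Y, and variation of X, around their means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in Y per one-unit change in X
    a = y_bar - b * x_bar    # intercept: value of Y when X = 0
    return a, b

# Hypothetical data, not from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"Y = {a:.2f} + {b:.2f} * X")
```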
Calculating the Regression Equation For “Threat Hypothesis” • The estimated regression equation is: E(welfare benefit1995) = 422.7879 + [(-6.292) * %black(1995)]
Number of obs = 50
F( 1, 64) = 76.651
Prob < = 0.001
R-squared = 0.3361
------------------------------------------------------------------------------
 welfare1995 |     Coef.   Std. Err.        t     P<     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Black1995(b) |  -6.29211    .771099    -8.162  0.001    -8.1173     -4.0746
    _cons(a) |  422.7879   12.63348    25.551  0.001  317.90407    336.6615
------------------------------------------------------------------------------
Regression Example: “Threat Hypothesis” • To generate a predicted value for various % of AA in 1995, we could simply plug in the appropriate X values and solve for Y.
10%: E(welfare benefit1995) = 422.7879 + [(-6.292) * 10] = $359.87
20%: E(welfare benefit1995) = 422.7879 + [(-6.292) * 20] = $296.95
30%: E(welfare benefit1995) = 422.7879 + [(-6.292) * 30] = $234.03
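The plug-in step can be written as a one-line function. This is only a sketch using the intercept and slope reported on the slide; the function name `predicted_benefit` is made up for illustration.

```python
# Coefficients as reported in the estimated equation on the slide.
INTERCEPT = 422.7879
SLOPE = -6.292

def predicted_benefit(pct_black):
    """Predicted 1995 welfare benefit for a given % black population."""
    return INTERCEPT + SLOPE * pct_black

for pct in (10, 20, 30):
    print(f"{pct}%: ${predicted_benefit(pct):.2f}")
```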
Regression Analysis and Statistical Significance • Testing for statistical significance for the slope • The p-value - probability of observing a sample slope value (Beta Coefficient) at least as far from zero as the one we are observing in our sample IF THE NULL HYPOTHESIS IS TRUE • P-values closer to 0 are stronger evidence against the null hypothesis (P < .05 usually the threshold for statistical significance) • Based on t-value…(Beta/S.E.) = t
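The t-value is just the coefficient divided by its standard error. The sketch below uses the slope and standard error from the “Threat Hypothesis” output; the p-value line is an assumption on my part, using a large-sample normal approximation because the Python standard library has no t distribution, so treat it as a rough check rather than the value a statistics package would report.

```python
import math

# Slope and standard error from the "Threat Hypothesis" output.
coef = -6.29211
std_err = 0.771099

# t = Beta / S.E.
t_value = coef / std_err

# Two-sided p-value under a normal approximation (assumption: large sample).
# erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal.
p_approx = math.erfc(abs(t_value) / math.sqrt(2))

print(f"t = {t_value:.2f}, approximate two-sided p = {p_approx:.2g}")
```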
Multiple Regression • At what level(s) of measurement would we employ multiple regression??? • Interval and Ratio DVs • Now working with a new model: • Yi = a + b1X1i + b2X2i + ... + bkXki + ei
Multiple Regression • Yi = a + b1X1i + b2X2i + ... + bkXki + ei • b are “Partial” slope coefficients. • a is the Y-Intercept. • e is the Error Term.
Slope Coefficients • Slope coefficients are now Partial Slope Coefficients, although we still refer to them generally as slope coefficients. They have a new interpretation: • “The expected change in Y given a one‑unit change in X1, holding all other variables constant”
Multiple Regression • By “holding constant” all other X’s, we are therefore “controlling for” all other X’s, and thus isolating the “independent effect” of the variable of interest. • “Holding Constant” – group observations according to levels of X2, X3, etc.…then look at impact of X1 on Y! • This is what Multiple Regression is doing in practice!!! • Make everyone “equal” in terms of “control” variable then examine the impact of X1 on Y!
“Holding Constant” other IVs • Income (Y) = Education (X1); Seniority (X2) • Look at relationship between Seniority and Income WITHIN different levels of education!!! “Holding Education Constant” • Look at relationship between Education and Income WITHIN different levels of Seniority!!! “Holding Seniority Constant”
The Intercept Yi = a + b1X1i + b2X2i + ... + bkXki + ei Y-Intercept (Constant) value…(a)…is now the expected value of Y when ALL the Independent Variables are set to 0.
Testing for Statistical Significance • Proceeds as before – a p-value is generated for each sample slope coefficient: the probability of observing a slope at least that far from zero IF THE NULL HYPOTHESIS IS TRUE • Based on “t-value” (Beta/ S.E.) • And Degrees of Freedom!
Fit of the Regression • R-squared value – the proportion of variation in the dependent variable explained by ALL of the independent variables combined • (TSS – ResSS) / TSS… “Explained Variation in DV divided by Total Variation in DV”
R-square • R-square ranges from 0 to 1. • 0 is no relationship. • 1 is a perfect relationship…IVs explain 100% of the variance in the DV.
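A small sketch of the R-squared calculation, with TSS as the total sum of squares around the mean of Y and ResSS as the sum of squared residuals. The observed and predicted values are invented for illustration, and `r_squared` is a hypothetical helper name.

```python
def r_squared(y, y_hat):
    """Proportion of variation in y explained by the predictions y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    res_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) # unexplained
    return (tss - res_ss) / tss

# Hypothetical observed values and predictions from a fitted line.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, y_hat)
print(f"R-squared = {r2:.2f}")
```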
R-square • Doesn’t tell us WHY the dependent variable varies or explain the results….This is why we need Theory!!! • Simply a measure of how well your model fits the dependent variable. • How well are the Xs predicting Y! • How much variation in Y is explained by Xs!
Multiple Regression • Y= Income in dollars • X1= Education in years • X2= Seniority in years • Y= a + b1(education)+ b2(Seniority)+ e
Example • Y= 5666 + 432X1 + 281X2 + e - Both Coefficients are statistically significant at the P < .05 Level… • Because of the positive Beta…expected change in Income (Y) given a one‑unit increase in Education is +$432, holding seniority in years constant.
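To see what “holding seniority constant” buys us, the sketch below fits a two-predictor regression by solving the least-squares normal equations directly and compares the partial slope on education with the bivariate slope. The education and seniority values are invented, and income is generated without error from the equation on this slide, so this illustrates the mechanics only and is not the data behind the example.

```python
def mean(values):
    return sum(values) / len(values)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def fit_two_predictors(x1, x2, y):
    """Least-squares a, b1, b2 for Y = a + b1*X1 + b2*X2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det   # partial slope for X1
    b2 = (s11 * s2y - s12 * s1y) / det   # partial slope for X2
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2

# Hypothetical cases; education and seniority are positively related.
education = [8, 10, 12, 12, 14, 16, 16, 18]
seniority = [2, 1, 5, 3, 8, 6, 10, 12]
income = [5666 + 432 * e + 281 * s for e, s in zip(education, seniority)]

a, b1, b2 = fit_two_predictors(education, seniority, income)

# Bivariate slope of income on education, ignoring seniority.
bivariate_b = cross(education, income) / cross(education, education)

print(f"Partial slope for education:   {b1:.1f}")
print(f"Bivariate slope for education: {bivariate_b:.1f}")
```

Because seniority rises with education in these made-up cases, the bivariate slope absorbs part of the seniority effect and comes out larger than the partial slope.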
Predicted Values • Let’s predict income for someone with 10 years of education and 5 years of seniority. • Y= 5666+432X1+281X2 • = 5666+432(10)+281(5) • = 5666+ 4320+1405 • Predicted value of Y for this case is $11,391.
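The same prediction as a short function, using the coefficients from the example equation; the name `predict_income` is made up for illustration.

```python
def predict_income(education_years, seniority_years):
    """Predicted income in dollars from the estimated equation."""
    return 5666 + 432 * education_years + 281 * seniority_years

prediction = predict_income(10, 5)
print(f"Predicted income: ${prediction:,}")

# One more year of education, seniority held constant, adds the
# education coefficient to the prediction.
difference = predict_income(11, 5) - predict_income(10, 5)
print(f"Effect of one extra year of education: ${difference}")
```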
R-squared • r-squared for this model is .56. • Education and Seniority explain 56% of the variation in income.