
Multiple Independent Variables



Presentation Transcript


  1. Multiple Independent Variables POLS 300 Butz

  2. Multivariate Analysis • Problem with bivariate analysis in nonexperimental designs: • Spuriousness and Causality • Need for techniques that allow the researcher to control for other independent variables

  3. Multivariate Analysis • Employed to see how large sets of variables are interrelated. • The idea is that if we can find a relationship between X and Y after accounting for other variables (W and Z), we may be able to make a “causal inference”.

  4. Multivariate Analysis • We know that X and Y may both be caused by Z — a spurious relationship. • Multivariate Analysis allows for the inclusion of other variables and tests whether there is still a relationship between X and Y.

  5. Multivariate Analysis • Must ask whether a third variable (and maybe others) is the “true” cause of both the IV and the DV • Experimental analyses can “prove” causation, but only in a Laboratory Setting…must use Multivariate Statistical Analyses in the “real world” • Need to “Control” or “hold constant” other variables to isolate the effect of the IV on the DV!

  6. Controlling for Other Independent Variables • Multivariate Crosstabulation – evaluate bivariate relationship within subsets of sample defined by different categories of third variable (“control by grouping”) • At what level(s) of measurement would we use Multivariate Crosstabulation??

  7. Multivariate Crosstabulation • Control by grouping: group the observations according to their values on the third variable and… • then observe the original relationship within each of these groups. • P. 407/506 – Spending Attitudes and Voting…controlling for Income! – spurious • Occupational Status and Voter Turnout • P. 411/510…control for “education”!
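The “control by grouping” idea above can be sketched in a few lines of Python. The survey records and category labels below are invented for illustration: we tabulate attitude against vote overall, then repeat the tabulation within each income group.

```python
from collections import Counter

# Hypothetical survey records: (spending attitude, vote, income group)
rows = [
    ("favor", "yes", "low"),  ("oppose", "no", "low"),
    ("favor", "yes", "low"),  ("oppose", "yes", "low"),
    ("favor", "no", "high"),  ("oppose", "no", "high"),
    ("favor", "yes", "high"), ("oppose", "no", "high"),
]

def crosstab(records):
    """Count each (attitude, vote) combination."""
    return Counter((a, v) for a, v, _ in records)

# Original bivariate relationship
print(crosstab(rows))

# Control by grouping: repeat the crosstab within each income level
for level in ("low", "high"):
    subset = [r for r in rows if r[2] == level]
    print(level, crosstab(subset))
```

If the attitude–vote relationship weakens or vanishes within each income group, income is a candidate “true” cause of both.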

  8. Quick Review: Regression • In general, the goal of linear regression is to find the line that best predicts Y from X. • Linear regression does this by estimating a line that minimizes the sum of the squared errors from the line • Minimizing the vertical distances of the data points from the line.
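The line-fitting described above has a closed form. A minimal sketch with invented toy data — the slope and intercept computed below are exactly the values that minimize the sum of squared vertical distances:

```python
# Toy data (invented for illustration)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# OLS: B = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²);  a = ȳ - B·x̄
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

print(f"Y = {a:.3f} + {b:.3f}X")
```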

  9. Regression vs. Correlation • The purpose of regression analysis is to determine exactly what the line is (i.e. to estimate the equation for the line) • The regression line represents predicted values of Y based on the values of X

  10. Equation for a Line (Perfect Linear Relationship) Yi = a + BXi a = Intercept, or Constant = The value of Y when X = 0 B = Slope coefficient = The change (+ or ‑) in Y given a one unit change in X

  11. Slope • Yi = a + BXi • B = Slope coefficient • If B is positive then you have a positive relationship. If it is negative you have a negative relationship. • The larger the value of B, the steeper the slope of the line…Greater (more dramatic) change in Y for a unit change in X • General Interpretation: For a one unit change in X, we expect a B change in Y on average.

  12. Calculating the Regression Equation for the “Threat Hypothesis” • The estimated regression equation is: E(welfare benefit 1995) = 422.7879 + [(-6.292) * %black(1995)]

      Number of obs = 50    F(1, 64) = 76.651    Prob > F = 0.001    R-squared = 0.3361
      ------------------------------------------------------------------------------
      welfare1995   |      Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
      --------------+---------------------------------------------------------------
      Black1995 (b) |   -6.29211    .771099    -8.16    0.001     -8.1173    -4.0746
      _cons (a)     |   422.7879   12.63348    25.55    0.001   317.90407   336.6615
      ------------------------------------------------------------------------------

  13. Regression Example: “Threat Hypothesis” • To generate a predicted value for various % of AA in 1995, we simply plug in the appropriate X values and solve for Y. 10%: E(welfare benefit 1995) = 422.7879 + [(-6.292) * 10] = $359.87 20%: E(welfare benefit 1995) = 422.7879 + [(-6.292) * 20] = $296.95 30%: E(welfare benefit 1995) = 422.7879 + [(-6.292) * 30] = $234.03
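The plug-in predictions above can be sketched as a one-line function using the estimated intercept and slope from the regression output:

```python
# Predicted 1995 welfare benefit from the estimated equation:
# E(welfare) = 422.7879 + (-6.292) * %black
def predicted_benefit(pct_black):
    return 422.7879 - 6.292 * pct_black

for pct in (10, 20, 30):
    print(f"{pct}% -> ${predicted_benefit(pct):.2f}")
```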

  14. Regression Analysis and Statistical Significance • Testing for statistical significance for the slope • The p-value - probability of observing a sample slope value (Beta Coefficient) at least as large as the one we are observing in our sample IF THE NULL HYPOTHESIS IS TRUE • P-values closer to 0 suggest the null hypothesis is less likely to be true (P < .05 is usually the threshold for statistical significance) • Based on the t-value…(Beta/S.E.) = t
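The t-value formula above (Beta/S.E.) can be checked against the slope and standard error from the slide-12 output. The p-value here uses a two-sided normal approximation via `math.erf`, which is reasonable at the degrees of freedom involved:

```python
import math

# Slope and standard error for Black1995 from the slide-12 regression output
beta, se = -6.29211, 0.771099
t = beta / se
print(f"t = {t:.2f}")  # about -8.16

# Two-sided p-value, normal approximation: p = 2 * (1 - Phi(|t|))
p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
print("significant at .05?", p < 0.05)
```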

  15. Multiple Regression • At what level(s) of measurement would we employ multiple regression??? • Interval and Ratio DVs • Now working with a new model: • Yi = a + b1X1i + b2X2i + ... + bkXki + ei

  16. Multiple Regression • Yi = a + b1X1i + b2X2i + ... + bkXki + ei • b are “Partial” slope coefficients. • a is the Y-Intercept. • e is the Error Term.
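A sketch of estimating the multiple-regression model above by least squares. The education/seniority data are invented (generated from income = 5000 + 400·educ + 300·seniority, so the fit recovers those coefficients exactly); numpy is assumed available:

```python
import numpy as np

# Invented data: income generated as 5000 + 400*educ + 300*seniority
educ      = np.array([8, 10, 12, 12, 14, 16, 16, 18], dtype=float)
seniority = np.array([1,  5,  2, 10,  4,  3, 12,  8], dtype=float)
income    = 5000 + 400 * educ + 300 * seniority

# Design matrix: a leading column of 1s estimates the intercept a
X = np.column_stack([np.ones_like(educ), educ, seniority])
coef, *_ = np.linalg.lstsq(X, income, rcond=None)
a, b1, b2 = coef
print(f"a = {a:.1f}, b1 = {b1:.1f}, b2 = {b2:.1f}")
```

Each bk is a partial slope: the expected change in income for a one-unit change in that X, holding the other X constant.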

  17. Slope Coefficients • Slope coefficients are now Partial Slope Coefficients, although we still refer to them generally as slope coefficients. They have a new interpretation: • “The expected change in Y given a one‑unit change in X1, holding all other variables constant”

  18. Multiple Regression • By “holding constant” all other X’s, we are therefore “controlling for” all other X’s, and thus isolating the “independent effect” of the variable of interest. • “Holding Constant” – group observations according to levels of X2, X3, etc.…then look at the impact of X1 on Y! • This is what Multiple Regression is doing in practice!!! • Make everyone “equal” in terms of the “control” variable, then examine the impact of X1 on Y!

  19. “Holding Constant” other IVs • Income (Y) = Education (X1); Seniority (X2) • Look at the relationship between Seniority and Income WITHIN different levels of Education!!! “Holding Education Constant” • Look at the relationship between Education and Income WITHIN different levels of Seniority!!! “Holding Seniority Constant”

  20. The Intercept Yi = a + b1X1i + b2X2i + ... + bkXki + ei Y-Intercept (Constant) value…(a)…is now the expected value of Y when ALL the Independent Variables are set to 0.

  21. Testing for Statistical Significance • Proceeds as before – a p-value (the probability of observing a slope that large if the null hypothesis is true) is generated for each sample slope coefficient • Based on the “t-value” (Beta/S.E.) • And Degrees of Freedom!

  22. Fit of the Regression • R-squared value – the proportion of variation in the dependent variable explained by ALL of the independent variables combined • (TSS – ResSS) / TSS… “Explained Variation in DV divided by Total Variation in DV”

  23. R-square • R-square ranges from 0 to 1. • 0 is no relationship. • 1 is a perfect relationship…IVs explain 100% of the variance in the DV.
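The fit measure above, (TSS − ResSS)/TSS, computed directly on invented observed and fitted values:

```python
# Invented observed Y and hypothetical fitted values from some model
y    = [3.0, 5.0, 7.0, 9.0]
yhat = [3.5, 4.5, 7.5, 8.5]

mean_y = sum(y) / len(y)
tss = sum((yi - mean_y) ** 2 for yi in y)             # total variation in DV
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))  # unexplained variation
r_squared = (tss - rss) / tss
print(r_squared)
```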

  24. R-square • Doesn’t tell us WHY the dependent variable varies or explain the results….This is why we need Theory!!! • Simply a measure of how well your model fits the dependent variable. • How well are the Xs predicting Y! • How much variation in Y is explained by the Xs!

  25. Multiple Regression • Y= Income in dollars • X1= Education in years • X2= Seniority in years • Y= a + b1(education)+ b2(Seniority)+ e

  26. Example • Y= 5666 + 432X1 + 281X2 + e - Both Coefficients are statistically significant at the P < .05 Level… • Because of the positive Beta…expected change in Income (Y) given a one‑unit increase in Education is +$432, holding seniority in years constant.

  27. Predicted Values • Let’s predict income for someone with 10 years of education and 5 years of seniority. • Y = 5666 + 432X1 + 281X2 + e • = 5666 + 432(10) + 281(5) • = 5666 + 4320 + 1405 • Predicted value of Y for this case is $11,391.
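The same plug-in calculation, as a small sketch using the slide-26 coefficients:

```python
# Estimated model from slide 26: Y = 5666 + 432*educ + 281*seniority
def predicted_income(educ, seniority):
    return 5666 + 432 * educ + 281 * seniority

print(predicted_income(10, 5))  # 11391
```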

  28. R-squared • r-squared for this model is .56. • Education and Seniority explain 56% of the variation in income.
