Regression & Correlation

luisa

Presentation Transcript


  1. Regression & Correlation • Interval & Ratio Level Association

  2. An Example • Explaining variation in % of state’s 2000 population receiving food stamps

  3. Dependent Variable • State-to-state variation in % of state’s 2000 population receiving food stamps

  4. Independent Variables • Such population characteristics as • Income • Education • Measures of need, e.g. • Unemployment rate • % living below poverty line

  5. Independent Variables, cont. • Other • Teen pregnancies • % covered by health insurance

  6. Interval/Ratio Data • Carries a lot of information • It may be multiplied & divided • It may (in theory) assume an infinite number of values (by going out to the right of the decimal) • It is also called “continuous” data

  7. Interval/Ratio Data, Continuous Data • Can be found in surveys (income, years of education, etc.) • Is more commonly found in data sets containing aggregate or ecological data (data which summarizes large numbers of individual cases)

  8. Interval/Ratio Data: Analyzing It • Can collapse (recode) it into categories • Can use regression and correlation to analyze it directly

  9. Regression: Explaining & Predicting • Case scores on independent variable (X) and dependent variable (Y) can be plotted onto a graph, creating a scattergram, or a scatterplot • A line (the regression line) can then be drawn through the points on the scattergram, in order to summarize them

  10. Regression: Explaining & Predicting • A regression equation describes a regression line • Simple regression equations have the form: • Y’ = a + bXi

  11. Y’ = a + bXi • Y’ • A predicted value of the dependent variable • Xi • A given value of the independent variable • b • The slope of the regression line • The change in Y’ for each one-unit increase in X (the steepness of the line) • a.k.a. the regression coefficient

  12. Y’ = a + bXi • a • The “Y intercept” • The point at which the regression line crosses the Y axis • The value of Y when X is zero

  13. Regression: Explaining & Predicting, cont. • The line which produces the least amount of error in predicting the dependent variable is the best line (the least squares criterion) • The computing formulas used to obtain slopes and intercepts are designed to satisfy this criterion
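
The least squares criterion above can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook computing formulas for a slope and intercept; the function name `least_squares` and the five data points are invented for the example and are not the state data set.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line Y' = a + b*X that minimizes squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the two means
    return a, b


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(round(a, 3), round(b, 3))
```

Any other line drawn through these points would give a larger sum of squared prediction errors than the one returned here.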

  14. Regression: Explaining & Predicting, cont. • They allow us to predict values of the dependent variable from given values of the independent variable • They show how the two variables are related (i.e. they explain the dependent variable’s behavior in terms of the independent variable)

  15. Example: Food Stamps & Teenage Mothers • Dependent variable (Y): • % of state’s 2000 population receiving food stamps • Independent variable (X): • % of births to mothers under 20 in 1997 • Equation: • Food stamps % = 1.238 + .396(% of births to mothers under 20)

  16. Food Stamps & Teenage Mothers, cont. • If 15% of a state’s births are to mothers under the age of 20, what percentage of that state’s population would you predict would be receiving Food Stamps? • Food Stamps % = 1.238 + .396(15%) • Food Stamps % = 1.238 + 5.94 • Food Stamps % = 7.178%
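
The prediction on this slide can be reproduced directly. The sketch below only plugs the slide's intercept and slope into Y’ = a + bX; the function name `predict_food_stamps` is made up for the example.

```python
INTERCEPT = 1.238  # a, from the slide's equation
SLOPE = 0.396      # b, from the slide's equation


def predict_food_stamps(pct_teen_births):
    """Predicted % of population on food stamps for a given % of births to mothers under 20."""
    return INTERCEPT + SLOPE * pct_teen_births


print(round(predict_food_stamps(15), 3))  # 7.178
```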

  17. Food Stamps & Teenage Mothers, cont. • If the % of births to mothers under the age of 20 in that state were to decline by 3 percentage points, what effect might that have on percentage of population receiving Food Stamps? • Change in Food Stamps % = .396(-3) • Change in Food Stamps % = -1.188 • Food Stamps % = Decrease by about 1.19 percentage points (the intercept drops out, because it is the same before and after the change)

  18. Food Stamps & Teenage Mothers, cont. • Food stamps % = 1.238 + .396(% of births to mothers under 20) • Food Stamps % and births to mothers under 20 are positively associated. As % of births to mothers under 20 decreases, percent of population receiving food stamps also decreases (the positive slope tells us that)

  19. Explaining Food Stamps & Teenage Mothers • How much the percent of population receiving food stamps decreases is indicated by the slope’s size (magnitude) • A one percentage point change in births to mothers under 20 is associated with a predicted change of (roughly) .396 percentage points in percent of population receiving food stamps
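
To see why the slope alone gives the size of the change, compare two predictions from the same equation. This sketch assumes the decline is 3 percentage points (for example, from 15% to 12% of births); the starting value of 15 is only an illustration, and the result is the same for any starting value.

```python
INTERCEPT = 1.238
SLOPE = 0.396


def predict(pct_teen_births):
    return INTERCEPT + SLOPE * pct_teen_births


before = predict(15)     # assumed starting value, for illustration
after = predict(15 - 3)  # after a 3 percentage point decline
change = after - before

# The intercept cancels, so the change equals slope * (change in X).
print(round(change, 3))      # -1.188
print(round(SLOPE * -3, 3))  # -1.188
```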

  20. Slopes • Are Key, But • Their magnitude is affected by both the strength of association between the two variables, and by the scale (units of measurement) of the independent variable • They are not standardized • Two slopes may not be easily compared

  21. Slopes, but • Sometimes we are interested in measuring strength of association, not in explaining &/or predicting • To deal with this, we use the correlation coefficient

  22. Correlation • Is a summary association measure for interval/ratio data (used like Cramér’s V, Somers’ d, etc.) • Is a standardized slope • Is easily calculated • Is routinely reported with regression equations

  23. Correlation • Lots Of Names, One Statistic • Pearson’s r • Correlation coefficient • Pearson’s Product Moment Correlation Coefficient

  24. Correlation, Cont. • Is often reported by itself, without bothering to first calculate slopes & intercepts • Ranges from -1.0 (perfect negative association) through 0.0 (no linear association) to +1.0 (perfect positive association) • When squared (the coefficient of determination), shows the amount (%) of variation explained

  25. Correlation • r² shows the amount (%) of explained variation: • r = .30, r² = .09 • r = .50, r² = .25 • r = .608, r² = .37
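
As a rough illustration of how r and r² are obtained, the sketch below computes Pearson's r from its usual definition and then squares it. The function name `pearson_r` and the data points are invented for the example; the last loop simply squares the three r values listed on the slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
r_squared = r ** 2  # coefficient of determination
print(round(r, 3), round(r_squared, 3))

# The r values from the slide, squared.
for slide_r in (0.30, 0.50, 0.608):
    print(slide_r, round(slide_r ** 2, 2))
```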

  26. Getting Correlations Without Scattergrams • There is a correlation function in many statistical software packages, and some spreadsheets • They will produce a correlation matrix, which shows the correlation of each selected variable with all other selected variables
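
A correlation matrix is just Pearson's r computed for every pair of selected variables. The sketch below builds one with plain Python to show the idea; statistical packages do the same thing with a single command. The variable names and all of the values are hypothetical, not taken from the state data set.

```python
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each variable name to its correlations with every selected variable."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for five imaginary states.
data = {
    "food_stamps": [5.1, 7.3, 6.0, 9.2, 4.4],
    "teen_births": [10.2, 14.8, 12.1, 18.0, 9.5],
    "hs_grads": [88.0, 80.5, 84.2, 76.9, 90.1],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

Each variable correlates 1.0 with itself, and the matrix is symmetric, so only the entries above (or below) the diagonal need to be read.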

  27. Standard Error of Estimate • A “goodness of fit” measure • Analogous to standard deviation • A range above & below the regression line within which roughly 68% of all actual cases fall (assuming the prediction errors are approximately normally distributed)
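
One common way to compute the standard error of estimate is the square root of the summed squared prediction errors divided by n - 2. The sketch below assumes that version of the formula (some texts divide by n instead); the data points and the fitted values a = 2.2, b = 0.6 are invented for the example.

```python
from math import sqrt


def standard_error_of_estimate(xs, ys, a, b):
    """Typical distance of actual Y values from the regression line Y' = a + b*X."""
    squared_errors = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(squared_errors / (len(xs) - 2))  # n - 2 is an assumption; see lead-in


# Hypothetical data and its least squares line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
see = standard_error_of_estimate(xs, ys, a=2.2, b=0.6)
print(round(see, 3))
```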

  28. Multiple Regression & Partial Correlation • Multivariate analysis for interval & ratio level data • Involves the introduction of additional independent variables (controls) into a bivariate association • Yields summary statistics that are comparable to those found in simple regression

  29. Multiple Regression: Results • Multiple regression equation • Y’ = a + b1X1 + b2X2 + … + bnXn • Each slope indicates the relationship between its corresponding independent variable and the dependent variable independent of the effect of all other independent variables in the equation

  30. Multiple Regression Equation • Size of slopes is affected by • Strength of association • Scale of independent variable(s) • Number of independent variables in the equation

  31. Multiple Regression: Results, cont. • Squared multiple correlation coefficient (coefficient of multiple determination): R² • Shows the % of variation in dependent variable explained by all independent variables acting together • Significance

  32. Example: Food Stamps • Criteria For Assessing Obtained Equation(s) • Do a good job of explaining variation in dependent variable (i.e. maximize R2) • Keep number of independent variables down to a reasonable minimum, a.k.a. • Parsimony • Elegance • Efficiency

  33. Example: Food Stamps • Selecting Independent Variables • Start with a set of interesting variables, then winnow down • Considerations: • Variables that are (large correlation coefficients) or should be (in theory) strongly associated with the dependent variable are good starting points

  34. Example: Food Stamps • Selecting Independent Variables, cont. • Avoid using several independent variables which measure the same concept (strongly correlated with each other, have important theoretic similarities) • Try to use independent variables which make significant contributions to the final equation • “t” of 2.0 or greater (in absolute value) indicates significance at roughly the .05 level • Remember, this will change as you add or delete variables

  35. Selecting Independent Variables, cont. • A beta (a standardized slope) • Shows the influence of its associated independent variable on the dependent variable, independent of the effects of all other independent variables in the equation • Is expressed in standard deviation units • Can drop independent variables with small betas (or add ones with large betas), then recompute. This is a form of stepwise regression
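
A beta can be obtained from an unstandardized slope by rescaling it with the standard deviations of the two variables: beta = b × (s_x / s_y). The sketch below shows this for the simple (one independent variable) case, where beta equals Pearson's r; the data and the slope value are invented for the example.

```python
from math import sqrt
from statistics import stdev


def beta_weight(b, xs, ys):
    """Standardized slope: change in Y (in SD units) per one SD change in X."""
    return b * stdev(xs) / stdev(ys)


# Hypothetical data; b = 0.6 is the least squares slope for these points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
beta = beta_weight(0.6, xs, ys)
print(round(beta, 3))
```

Because betas are expressed in standard deviation units, they can be compared across independent variables measured on different scales, which raw slopes cannot.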

  36. Resulting Equation • % Food Stamps = 16.7 + .343(Teen Moms) - .157(% HS) - .103(Health Insurance) • R² = .443 • Prob. = .000
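
The resulting equation can be used for prediction in the same way as the simple one. In the sketch below the coefficients come from the slide, but the function name, the parameter names, and the input values (15, 70, 60) are arbitrary placeholders chosen only to show the arithmetic; they do not describe any real state.

```python
def predict_food_stamps(teen_moms, pct_hs, health_insurance):
    """Predicted % on food stamps from the slide's three-variable equation."""
    return 16.7 + 0.343 * teen_moms - 0.157 * pct_hs - 0.103 * health_insurance


baseline = predict_food_stamps(15, 70, 60)  # placeholder inputs
one_more = predict_food_stamps(16, 70, 60)  # teen_moms raised by one unit

print(round(baseline, 3))
# Holding the other two variables constant, the change equals the teen_moms slope.
print(round(one_more - baseline, 3))
```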

  37. Multiple Correlation Coefficient • R² (the multiple correlation coefficient, R, squared) • Shows the % of variation in dependent variable explained by all independent variables acting together

  38. Partial Correlation Coefficient • r_xy.z • Shows correlation between dependent variable & a single independent variable, controlling for the effect of a third (fourth, etc.) variable

  39. Interpreting Partial Correlation • r_xy.z² shows the amount (%) of variation explained by the independent variable, independent of the controls: • r_xy.z = .30, r_xy.z² = .09 • r_xy.z = .50, r_xy.z² = .25 • r_xy.z = .43, r_xy.z² = .185
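
With a single control variable, a partial correlation can be computed from the three ordinary (zero-order) correlations using the standard first-order formula. The sketch below assumes that formula; the function name and the three input correlations are invented for the example.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of X and Y, controlling for a single third variable Z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical zero-order correlations.
r_xy_z = partial_correlation(r_xy=0.5, r_xz=0.3, r_yz=0.4)
print(round(r_xy_z, 3), round(r_xy_z ** 2, 3))

# If Z is unrelated to both X and Y, controlling for it changes nothing.
print(partial_correlation(0.5, 0.0, 0.0))
```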
