
Lecture 8 Relationships between Scale variables: Regression Analysis




  1. Lecture 8: Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk

  2. Notices: • Register

  3. Plan: • 1. Linear & Non-linear Relationships • 2. Fitting a line using OLS • 3. Inference in Regression • 4. Omitted Variables & R² • 5. Types of Regression Analysis • 6. Properties of OLS Estimators • 7. Assumptions of OLS • 8. Doing Regression in SPSS

  4. 1. Linear & Non-linear relationships between variables • Relationships between variables are often of greatest interest in social science: • is social class related to political perspective? • is income related to education? • is worker alienation related to job monotony? • We are also interested in the direction of causation, but this is more difficult to establish empirically: • our empirical models are usually structured assuming a particular theory of causation

  5. Relationships between scale variables • The most straightforward way to investigate evidence of a relationship is to look at scatter plots: • it is traditional to: • put the dependent variable (i.e. the “effect”) on the vertical axis • or “y axis” • put the explanatory variable (i.e. the “cause”) on the horizontal axis • or “x axis”
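
A minimal sketch of this convention, using Python with made-up IQ/income data (the variable names and numbers are purely illustrative; the course itself uses SPSS):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
iq = rng.normal(100, 15, 200)                  # explanatory variable: x axis
income = 300 * iq + rng.normal(0, 5000, 200)   # dependent variable: y axis

plt.scatter(iq, income, s=10)
plt.xlabel("IQ (explanatory variable)")
plt.ylabel("Income (dependent variable)")
plt.title("Scatter plot of IQ and Income")
plt.show()
```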

  6. Scatter plot of IQ and Income:

  7. We would like to find the line of best fit:

  8. What does the output mean?

  9. Sometimes the relationship appears non-linear:

  10. … and so a straight line of best fit is not always very satisfactory:

  11. Could try a quadratic line of best fit:

  12. But we can accommodate a non-linear relationship within a linear regression by first transforming one of the variables:

  13. … or a cubic line of best fit (overfitted?):

  14. Or could try two linear lines: a “structural break”
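
A sketch of these curve-fitting options on synthetic curved data (Python for illustration; degrees 1, 2 and 3 correspond to the linear, quadratic and cubic fits on the slides above):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2 + 0.5 * x**2 + rng.normal(0, 3, 100)   # a genuinely non-linear relationship

for deg, label in [(1, "linear"), (2, "quadratic"), (3, "cubic (overfitted?)")]:
    coefs = np.polyfit(x, y, deg)            # least-squares polynomial fit
    plt.plot(x, np.polyval(coefs, x), label=label)

plt.scatter(x, y, s=8, color="grey")
plt.legend()
plt.show()
```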

  15. 2. Fitting a line using OLS • The most popular algorithm for drawing the line of best fit is one that minimises the sum of squared deviations from the line to each observation:

\min_{a,b} \sum_i (y_i - \hat{y}_i)^2

Where: y_i = observed value of y; \hat{y}_i = predicted value of y_i = the value on the line of best fit corresponding to x_i

  16. Regression estimates of a, b: or Ordinary Least Squares (OLS) • This criterion yields estimates of the slope b and y-intercept a of the straight line:

b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\bar{x}
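
A minimal numeric check of these two formulas (a Python sketch; the floor-area/price numbers are invented for illustration):

```python
import numpy as np

def ols(x, y):
    xbar, ybar = x.mean(), y.mean()
    b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)  # slope
    a = ybar - b * xbar                                            # intercept
    return a, b

x = np.array([50.0, 60.0, 70.0, 85.0, 100.0])     # e.g. floor area (sq m)
y = np.array([90.0, 105.0, 120.0, 150.0, 180.0])  # e.g. house price (£000s)
print(ols(x, y))
print(np.polyfit(x, y, 1))  # numpy's least-squares fit: [slope, intercept]
```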

  17. 3. Inference in Regression: Hypothesis tests on the slope coefficient • Regressions are usually run on samples, so what can we say about the population relationship between x and y? • Repeated samples would yield a range of values for the slope estimate: b ~ N(β, σ_b) • i.e. b is normally distributed with mean = β = the population slope = the value of b if the regression were run on the whole population • If there is no relationship in the population between x and y, then β = 0, and this is our H0

  18. What does the standard error mean?

  19. Hypothesis test on b: • (1) H0: β = 0 (i.e. the slope coefficient, if the regression were run on the population, would be 0) H1: β ≠ 0 • (2) α = 0.05 or 0.01 etc. • (3) Reject H0 iff P < α • (N.B. Rule of thumb: P < 0.05 if |tc| ≥ 2, and P < 0.01 if |tc| ≥ 2.6) • (4) Calculate P and conclude.

  20. Example using SPSS output: • (1) H0: no relationship between house price and floor area. H1: there is a relationship • (2), (3), (4): • P = 2 × (1 − CDF.T(24.469, 554)) ≈ 0.000000, so reject H0 • (H1 is two-sided, so the P-value doubles the upper-tail area; with t = 24.469 it is zero to six decimal places either way)
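
The same computation outside SPSS, as a Python sketch (scipy's t distribution plays the role of SPSS's CDF.T; the t statistic of 24.469 on 554 degrees of freedom is taken from the slide):

```python
from scipy import stats

t_stat, df = 24.469, 554
p_one_tail = stats.t.sf(t_stat, df)   # = 1 - CDF.T(24.469, 554) in SPSS
p_two_tail = 2 * p_one_tail
print(p_two_tail)                     # ~0, so reject H0 at any conventional alpha
```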

  21. 4. Omitted Variables & R² • Q/ Is floor area the only factor? How much of the variation in Price does it explain?

  22. R-square • R² tells you how much of the variation in y is explained by the explanatory variable x • 0 ≤ R² ≤ 1 (NB: you want R² to be near 1) • If there is more than one explanatory variable, use Adjusted R²
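
A sketch of where the number comes from, reusing the invented floor-area/price data: R² = 1 − (residual sum of squares) / (total sum of squares).

```python
import numpy as np

x = np.array([50.0, 60.0, 70.0, 85.0, 100.0])
y = np.array([90.0, 105.0, 120.0, 150.0, 180.0])

b, a = np.polyfit(x, y, 1)             # slope, intercept of line of best fit
y_hat = a + b * x                      # predicted values
ss_res = np.sum((y - y_hat) ** 2)      # variation left unexplained
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
print(1 - ss_res / ss_tot)             # R-square, between 0 and 1
```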

  23. Example: 2 explanatory variables

  24. Scatter plot (with floor spikes)

  25. 3D Surface Plots: Construction, Price & Unemployment • Q = −246 + 27P − 0.2P² − 73U + 3U²

  26. Construction Equation in a Slump • Q = 315 + 4P − 73U + 5U²

  27. 5. Types of regression analysis: • Univariate regression: one explanatory variable • what we’ve looked at so far in the above equations • Multivariate regression: >1 explanatory variable • more than one variable on the RHS • Log-linear regression & log-log regression: • taking logs of variables can deal with certain types of non-linearity and has useful properties, e.g. slopes become elasticities (see the sketch below) • Categorical dependent variable regression: • dependent variable is dichotomous -- observation has an attribute or not • e.g. MPPI take-up, unemployed or not etc.
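
A sketch of the log-log property (synthetic data; the true elasticity is set to 0.8, so the estimated slope should come out close to that):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 100, 500)
y = 5 * x**0.8 * np.exp(rng.normal(0, 0.1, 500))   # multiplicative errors

slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(slope)   # ~0.8: the % change in y for a 1% change in x
```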

  28. 6. Properties of OLS estimators • OLS estimates of the slope and intercept parameters have been shown to be BLUE (provided certain assumptions are met): • Best • Linear • Unbiased • Estimator

  29. “Best” in that they have the minimum variance compared with other linear unbiased estimators (i.e. given repeated samples, the OLS estimates for α and β vary less between samples than any other such sample estimates for α and β). • “Linear” in that the estimator is a linear function of the observed y values (and the fitted relationship is a straight line). • “Unbiased” because, in repeated samples, the mean of all the estimates achieved will tend towards the population values for α and β. • “Estimator” in that the true values of α and β cannot be known, and so we are using statistical techniques to arrive at the best possible assessment of their values, given the information available.

  30. 7. Assumptions of OLS: For estimation of a and b to be BLUE and for regression inference to be correct: • 1. Equation is correctly specified: • Linear in parameters (can still transform variables) • Contains all relevant variables • Contains no irrelevant variables • Contains no variables with measurement errors • 2. Error Term has zero mean • 3. Error Term has constant variance

  31. 4. Error Term is not autocorrelated • i.e. not correlated with the error term from previous time periods • 5. Explanatory variables are fixed • we observe the normal distribution of y for repeated fixed values of x • 6. No exact linear relationship between RHS variables • i.e. no “multicollinearity”
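
Two of these assumptions can be eyeballed from the residuals. A sketch (the residuals here are simulated; with real output you would use the residuals saved from the regression): the mean should be near zero, and the Durbin-Watson statistic, DW = Σ(e_t − e_{t−1})² / Σe_t², should be near 2 if there is no first-order autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(3)
e = rng.normal(0, 1, 200)        # stand-in for regression residuals
print(e.mean())                  # assumption 2: mean should be near 0

dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(dw)                        # near 2 => little first-order autocorrelation
```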

  32. 8. Doing Regression analysis in SPSS • To run regression analysis in SPSS, click on Analyse, Regression, Linear:

  33. Select your dependent (i.e. ‘explained’) variable and independent (i.e. ‘explanatory’) variables:

  34. e.g. Floor area and bathrooms: Floor area = a + b × Number of bathrooms + e

  35. Confidence Intervals for regression coefficients • Population slope coefficient CI: b ± t* × SE(b), where t* is the critical t value for the chosen confidence level (df = n − 2) • Rule of thumb: 95% CI ≈ b ± 2 × SE(b)

  36. e.g. regression of floor area on number of bathrooms, CI on slope: b = 64.6 ± 2 × 3.8 = 64.6 ± 7.6, so the 95% CI ≈ (57, 72)
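
The same interval in a short Python sketch, alongside the exact version using the t critical value (b = 64.6 and SE = 3.8 are from the slide; df = 554 is assumed here purely for illustration):

```python
from scipy import stats

b, se, df = 64.6, 3.8, 554
print(b - 2 * se, b + 2 * se)             # rule of thumb: (57.0, 72.2)

t_crit = stats.t.ppf(0.975, df)           # ~1.964 for a 95% CI
print(b - t_crit * se, b + t_crit * se)   # exact CI: very close to the rule of thumb
```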

  37. Confidence Intervals in SPSS: Analyse, Regression, Linear, click on Statistics and select Confidence intervals:

  38. Our rule of thumb said the 95% CI for the slope = (57, 72). How does this compare?

  39. Past Paper: (C2) Relationships (30%) • Suppose you have a theory that suggests that time watching TV is determined by gregariousness • the less gregarious, the more time spent watching TV • Use a random sample of 60 observations from the TV watching data to run a statistical test for this relationship that also controls for the effects of age and gender. • Carefully interpret the output from this model and discuss the statistical robustness of the results.

  40. Reading: • Regression Analysis: • *Field, A., chapters on regression. • *Moore, D. and McCabe, G., chapters on regression. • Kennedy, P., A Guide to Econometrics. • Bryman, A. and Cramer, D. (1999) Quantitative Data Analysis with SPSS for Windows: A Guide for Social Scientists, chapters 9 and 10. • Achen, C. H. (1982) Interpreting and Using Regression. London: Sage.
