
Class 4 Ordinary Least Squares

CERAM February-March-April 2008. Class 4 Ordinary Least Squares. Lionel Nesta Observatoire Français des Conjonctures Economiques Lionel.nesta@ofce.sciences-po.fr. Introduction to Regression.


Presentation Transcript


  1. CERAM, February-March-April 2008 • Class 4: Ordinary Least Squares • Lionel Nesta, Observatoire Français des Conjonctures Economiques • Lionel.nesta@ofce.sciences-po.fr

  2. Introduction to Regression • Ideally, the social scientist is interested not only in knowing the intensity of a relationship, but also in quantifying how much one variable changes when another variable changes by one unit. • Regression analysis is a technique that examines the relation of a dependent variable to independent, or explanatory, variables. • Simple regression: y = f(X) • Multiple regression: y = f(X, Z) • Let us start with simple regressions.

  3. Scatter Plot of Fertilizer and Production

  4. Scatter Plot of Fertilizer and Production

  5. Scatter Plot of Fertilizer and Production

  6. Scatter Plot of Fertilizer and Production

  7. Scatter Plot of Fertilizer and Production

  8. Objective of Regression • It is time to ask: “What is a good fit?” • “A good fit is what makes the error small” • “The best fit is what makes the error smallest” • Three candidates • To minimize the sum of all errors • To minimize the sum of absolute values of errors • To minimize the sum of squared errors

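  9. Problem of sign • To minimize the sum of all errors [Figures: two candidate lines on Y-X axes, with residual signs +, -, + and -, +, -]

  10. Problem of middle point • To minimize the sum of absolute values of errors [Figures: two candidate lines on Y-X axes, one with a single residual of +3, the other with residuals -1, +2, -1]

  11. Solve both problems • To minimize the sum of squared errors [Figure: fitted line on Y-X axes, with residual signs -, +, -]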

  12. To minimize the sum of squared errors [Figure: ε² plotted against ε] • Overcomes the sign problem • Goes through the middle point • Squaring emphasizes large errors • Easily manageable • Has a unique minimum • Has a unique, and best, solution
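
The three candidate criteria can be compared numerically. The sketch below is only an illustration: the three data points and the three candidate lines are hypothetical and do not come from the slides. It evaluates the sum of errors, the sum of absolute errors, and the sum of squared errors for each line.

```python
# Hypothetical three-point data set (not from the slides).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0]


def residuals(alpha, beta):
    """Errors y - (alpha + beta * x) of the candidate line."""
    return [y - (alpha + beta * x) for x, y in zip(xs, ys)]


def criteria(alpha, beta):
    """Evaluate the three candidate objectives for one line."""
    res = residuals(alpha, beta)
    return {
        "sum": sum(res),                  # sum of all errors
        "abs": sum(abs(e) for e in res),  # sum of absolute errors
        "sq": sum(e * e for e in res),    # sum of squared errors
    }


flat = criteria(2.0, 0.0)    # horizontal line through the mean of y
steep = criteria(-8.0, 5.0)  # steep line through (mean x, mean y)
ols = criteria(1.0, 0.5)     # least-squares line for these points

for name, c in [("flat", flat), ("steep", steep), ("ols", ols)]:
    print(name, c)
```

All three lines have a sum of errors equal to zero, which is the sign problem: positive and negative errors cancel, so this criterion cannot rank them. On this toy data the sum of absolute errors ties the flat line with the least-squares line, whereas the sum of squared errors singles out one line.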

  13. Scatter Plot of Fertilizer and Production

  14. Scatter Plot of R&D and Patents (log)

  15. Scatter Plot of R&D and Patents (log)

  16. Scatter Plot of R&D and Patents (log)

  17. Scatter Plot of R&D and Patents (log)

  18. The Simple Regression Model • yi = α + β·xi + εi • yi: dependent variable (to be explained) • xi: independent variable (explanatory) • α: first parameter of interest • β: second parameter of interest • εi: error term

  19. The Simple Regression Model

  20. To minimize the sum of squared errors [Figure: ε² plotted against ε]

  21. To minimize the sum of squared errors [Figure: ε² plotted against ε]
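
The formulas on these slides were not preserved in the transcript. As a sketch, assuming the usual simple-regression setting, minimizing the sum of squared errors gives the textbook closed-form solution: β is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and α is the mean of y minus β times the mean of x. The function name `ols_fit` and the data are hypothetical, chosen only for illustration.

```python
def ols_fit(xs, ys):
    """Return (alpha, beta) minimizing the sum of squared errors
    of the line y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical data (not the CERAM_BIO data set).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha, beta = ols_fit(xs, ys)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Because the fitted line always passes through the point (mean of x, mean of y), the intercept follows directly from the slope.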

  22. Application to CERAM_BIO Data using Excel

  23. Application to CERAM_BIO Data using Excel

  24. Interpretation • When the log of R&D (per asset) increases by one unit, the log of patents per asset increases by 1.748. • Remember! A change in the log of x is a relative change of x itself. • A 1% increase in R&D (per asset) is associated with an increase of about 1.748% in the number of patents (per asset).
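
The percentage reading of a log-log slope is an approximation that works well for small changes. The sketch below, assuming the model log(y) = a + β·log(x) with the slope 1.748 from the slide, computes the exact percentage change in y for a given percentage change in x. The helper name `pct_change_in_y` is hypothetical.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in y when log(y) = a + elasticity * log(x)
    and x changes by pct_change_in_x percent."""
    ratio = (1.0 + pct_change_in_x / 100.0) ** elasticity
    return (ratio - 1.0) * 100.0


beta = 1.748  # slope reported on the slide

small = pct_change_in_y(beta, 1.0)   # effect of a 1% increase in x
large = pct_change_in_y(beta, 10.0)  # effect of a 10% increase in x

print(f"1% increase in x  -> {small:.3f}% change in y")
print(f"10% increase in x -> {large:.3f}% change in y")
```

For a 1% change the exact figure is very close to 1.748%. For a 10% change it is noticeably larger than 17.48%, so the "β percent per one percent" rule should be reserved for small relative changes.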

  25. Application to Data using SPSS • Analyse → Régression → Linéaire (Analyze → Regression → Linear)

  26. Assessing the Goodness of Fit • It is important to ask whether a specification provides a good prediction of the dependent variable, given values of the independent variable. • Ideally, we want an indicator of the proportion of variance of the dependent variable that is accounted for, or explained, by the statistical model. • To build it, we compare the variance of the predictions (ŷ) with the variance of the residuals (ε): by construction, the two sum to the overall variance of the dependent variable (y).

  27. Overall Variance

  28. Decomposing the overall variance (1)

  29. Decomposing the overall variance (2)

  30. Coefficient of determination R² • R² is a statistic that provides information on the goodness of fit of the model: the share of the overall variance of y that is explained by the model.
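
The decomposition formulas were not preserved in the transcript. The sketch below assumes the standard sums of squares: SStotal (deviations of y from its mean), SSfit (deviations of the predictions from the mean of y) and SSres (squared residuals), with R² = SSfit / SStotal. The function name and the data are hypothetical.

```python
def variance_decomposition(xs, ys):
    """Fit y = alpha + beta * x by least squares and return
    (ss_total, ss_fit, ss_res)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    fitted = [alpha + beta * x for x in xs]

    ss_total = sum((y - y_bar) ** 2 for y in ys)           # overall
    ss_fit = sum((f - y_bar) ** 2 for f in fitted)         # explained
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted)) # residual
    return ss_total, ss_fit, ss_res


# Hypothetical data (not the CERAM_BIO data set).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ss_total, ss_fit, ss_res = variance_decomposition(xs, ys)
r_squared = ss_fit / ss_total

print(f"SStotal = {ss_total:.4f}")
print(f"SSfit + SSres = {ss_fit + ss_res:.4f}")
print(f"R2 = {r_squared:.4f}")
```

The identity SStotal = SSfit + SSres holds for a least-squares fit that includes an intercept, which is why R² lies between 0 and 1 in that case.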

  31. Fisher's F Statistic • Fisher's statistic is relevant as a form of ANOVA on SSfit, which tells us whether the regression model brings significant (in a statistical sense) information. • p: number of parameters • N: number of observations
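
The slide's formula is missing from the transcript. A common textbook form, assumed here, is F = (SSfit / (p - 1)) / (SSres / (N - p)), where p counts all estimated parameters including the intercept. Whether the slide used exactly this convention for p is an assumption. The input values below are hypothetical.

```python
def f_statistic(ss_fit, ss_res, n_obs, n_params):
    """F ratio of explained to residual mean squares.

    Assumes n_params counts every estimated parameter,
    intercept included (p = 2 in a simple regression).
    """
    df_fit = n_params - 1      # degrees of freedom of the model
    df_res = n_obs - n_params  # degrees of freedom of the residuals
    return (ss_fit / df_fit) / (ss_res / df_res)


# Hypothetical sums of squares for a simple regression on 5 points.
f_value = f_statistic(ss_fit=39.601, ss_res=0.107, n_obs=5, n_params=2)
print(f"F = {f_value:.1f}")
```

The computed value is then compared with the critical value of the F distribution with (p - 1, N - p) degrees of freedom, or read together with the p-value reported by the software.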

  32. Application to Data using SPSS • Analyse → Régression → Linéaire (Analyze → Regression → Linear)

  33. What the R² is not • R² does not tell us whether: • the independent variables are a true cause of the changes in the dependent variable • the correct regression was used • the most appropriate set of independent variables has been chosen • there is collinearity present in the data • the model could be improved by using transformed versions of the existing set of independent variables

  34. Inference on β • We have estimated β from a sample. • Therefore we must test whether the estimated parameter is significantly different from 0 and, as a consequence, we must say something about the distribution (the mean and variance) of the estimator of the true but unobserved β*.

  35. The mean and variance of β • It is possible to show that the estimated β is a good approximation, i.e. an unbiased estimator, of the true parameter β*. • The variance of β is defined as the ratio of the mean square of errors to the sum of squared deviations of the explanatory variable from its mean.
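
The variance of β described above can be sketched as follows. The code assumes the usual convention that the mean square of errors divides the residual sum of squares by N - 2 in a simple regression (two estimated parameters). The function name and the data are hypothetical.

```python
import math


def slope_variance(xs, ys):
    """Return (beta, var_beta, se_beta) for y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar

    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    mse = ss_res / (n - 2)  # mean square of errors (assumed N - 2)
    var_beta = mse / s_xx   # variance of the slope estimate
    return beta, var_beta, math.sqrt(var_beta)


# Hypothetical data (not the CERAM_BIO data set).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

beta, var_beta, se_beta = slope_variance(xs, ys)
print(f"beta = {beta:.3f}, var = {var_beta:.5f}, se = {se_beta:.4f}")
```

The square root of this variance is the standard error that statistical packages report next to each coefficient.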

  36. The confidence interval of β • We must now define the confidence interval of β at 95%. To do so, we use the mean and variance of β and define the t value as t = β / sβ, where sβ is the standard error (the square root of the variance) of β. • Therefore, the 95% confidence interval of β is β ± tα/2 × sβ. • If the 95% CI does not include 0, then β is significantly different from 0.

  37. Student t Test for β • We are also in a position to make inferences about β • H0: β* = 0 • H1: β* ≠ 0 • Rule of decision: accept H0 if |t| < tα/2; reject H0 if |t| ≥ tα/2
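
The confidence interval and the decision rule can be sketched together. The code assumes the standard form t = β / sβ, where sβ is the standard error of β. The slope, the standard error, and the function name are hypothetical. The critical value 3.182 is the two-sided 5% value of Student's t for 3 degrees of freedom, taken from a t table, and would need to be replaced for other sample sizes.

```python
def t_test_slope(beta, se_beta, t_crit):
    """Return (t_value, confidence_interval, decision) for H0: beta* = 0."""
    t_value = beta / se_beta
    ci = (beta - t_crit * se_beta, beta + t_crit * se_beta)
    # Decision rule from the slide: reject H0 when |t| >= t_crit.
    decision = "reject H0" if abs(t_value) >= t_crit else "accept H0"
    return t_value, ci, decision


# Two-sided 5% critical value for 3 degrees of freedom (t table).
T_CRIT_DF3 = 3.182

# Hypothetical estimate and standard error from a 5-point regression.
t_value, ci, decision = t_test_slope(beta=1.99, se_beta=0.0597,
                                     t_crit=T_CRIT_DF3)

print(f"t = {t_value:.2f}")
print(f"95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
print(decision)
```

The two views agree by construction: |t| exceeds the critical value exactly when the confidence interval excludes 0.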

  38. Application to Data using SPSS • Analyse → Régression → Linéaire (Analyze → Regression → Linear)

  39. Assignments on CERAM_BIO • Regress the number of patents on R&D expenses and consider: (1) the quality of the fit; (2) the significance and direction of R&D expenses; (3) the interpretation of the result in an economic sense. • Repeat steps 1 to 3 using: R&D expenses divided by one million (you need to generate a new variable for that); the log of R&D expenses. • What do you observe? Why?
