
Fitting the Data


ray-byrd

Presentation Transcript


  1. Fitting the Data (Lecture 2)

  2. Today’s Plan
     • Finishing off the examples from Lecture 1
     • Introducing different types of data
     • Fitting the data
     • One of the most important lectures of the course
     • There will be a question on this on a midterm and the final! (Almost guaranteed!)
     • You can find this material in Appendix 4.2

  3. Experimental vs. Observational Data
     • Because of financial, practical, and ethical concerns, experiments in economics are rare (examples: the Seattle/Denver Income Maintenance Experiments, SIME/DIME, and the Tennessee STAR class-size experiment).
     • Economists tend to use observational data, obtained from real-world behavior and collected using surveys or administrative records.
     • Observational data pose problems: without random assignment it is hard to estimate causal effects, and the data definitions are often not quite what economic theory requires.
     • Much of econometrics is devoted to dealing with the estimation problems that arise with observational data.

  4. Cross-Section Data
     • We have already seen two examples of cross-section data:
       • Wages and years of education
       • Voting polls in Florida
     • Cross-section data sets provide information about individual/agent behavior at a moment in time.
     • The Current Population Survey (CPS) is a cross-section survey that generates monthly detail about the US workforce.
     • Data on counties, states, or even countries at a moment in time are also cross-section data.

  5. Time Series Data Sets (1)
     • Time series data sets provide information about individual/agent behavior over time.
     • A time unit of observation (day, week, month, year) defines a time series.
     • We hear about time series data every day:
       • Nasdaq
       • Financial Times Stock Exchange Index (FTSE)
       • Dow Jones
       • Government data: GDP, unemployment, inflation

  6. Time Series Data Sets (2)
     • The composition of the unit can change:
       • The FTSE gives information on the top 100 stocks each day, but not necessarily the same 100 stocks every day.
       • The CPS gives monthly data on the number of people who are unemployed, but not the same people (we hope!) from month to month.
     • Characteristics of time series data sets:
       • a set of observations over time
       • the composition of the unit can change
       • compositional changes are dealt with using weighting schemes (Lecture 3)

  7. Longitudinal Data Sets
     • Longitudinal (panel) data sets provide information on a particular group of individuals/agents over time.
     • For example: following Econ140, Fall 2002, over time; alternatively, following a set of firms over time.
     • Example we will use: production functions (Cobb-Douglas), following firms over time.
     • Book example: Traffic Deaths and Alcohol Taxes, following states over time.

  8. Ordinary Least Squares (OLS)
     • Learning how to calculate a straight line (Appendix 4.2)
     • Recall the scatter plot of earnings vs. years of education: the data were a mess!
     • We can use Ordinary Least Squares (OLS) to fit a straight line through these data points.
     • This line is called the least squares line, or the line of best fit.
     • Why is it called the "least squares" line?
       • OLS chooses the line that minimizes the sum of the squared errors, i.e. the sum of the squared vertical distances between the data points and the line.
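To make the "least squares" criterion concrete, here is a minimal Python sketch, assuming a hypothetical five-point data set (not the course's data, which are not reproduced in this transcript). The helper `sse` (a name chosen for illustration) adds up the squared vertical distances from each point to a candidate line $a + bX$, and `grid_search_line` brute-forces a grid of intercepts and slopes to find the pair with the smallest total. A grid search only illustrates the criterion; the formulas derived later in the lecture give the answer directly.

```python
# Hypothetical five-point data set (NOT the course data, which are not reproduced here)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]


def sse(a, b, xs, ys):
    """Sum of squared vertical distances between each point and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def grid_search_line(xs, ys, a_range, b_range):
    """Brute force: try every (a, b) pair on the grid and keep the smallest SSE."""
    best = None
    for a in a_range:
        for b in b_range:
            value = sse(a, b, xs, ys)
            if best is None or value < best[2]:
                best = (a, b, value)
    return best


# Candidate intercepts 0.0..4.0 and slopes 0.0..2.0, in steps of 0.1
a_grid = [i / 10 for i in range(41)]
b_grid = [j / 10 for j in range(21)]

best_a, best_b, best_sse = grid_search_line(X, Y, a_grid, b_grid)
print(best_a, best_b, round(best_sse, 6))  # 2.2 0.6 2.4
```

On these hypothetical points the search lands on $a = 2.2$, $b = 0.6$ with a sum of squared errors of 2.4; any other line on the grid, such as $a = 2.0$, $b = 0.7$, leaves a larger sum.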

  9. Two Parts to OLS
     1. Derive estimators for a (intercept) and b (slope coefficient)
        • this means using differential calculus!
     2. Calculate values for a and b from the data
        • this means mechanically applying the derived formulas for a and b
     • How do we calculate a regression line through a mass of data points that do not necessarily lie on a straight line?
     • Each data point is a pair of values $(X_i, Y_i)$.

  10. OLS Line
     • We'll call the regression line $\hat{Y}_i$
       • this is an estimate of the true $Y_i$
     • The errors are the differences between $Y_i$ and $\hat{Y}_i$
       • errors can be positive or negative
     • We can write the following general equations:
       $Y_i = a + bX_i + e_i$
       $\hat{Y}_i = a + bX_i$
       $e_i = Y_i - \hat{Y}_i$
       where $i = 1, \dots, n$.
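A minimal Python sketch of these definitions, assuming hypothetical five-point data and a hypothetical candidate line with $a = 2.2$ and $b = 0.6$: it computes each fitted value $\hat{Y}_i = a + bX_i$ and each error $e_i = Y_i - \hat{Y}_i$, so you can see that some errors are positive and some are negative.

```python
# Hypothetical data and a hypothetical candidate line (not from the course data set)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# Fitted values: Y_hat_i = a + b * X_i, for i = 1..n
Y_hat = [a + b * x for x in X]

# Errors: e_i = Y_i - Y_hat_i (can be positive or negative)
e = [y - y_hat for y, y_hat in zip(Y, Y_hat)]

for x, y, y_hat, err in zip(X, Y, Y_hat, e):
    print(f"X={x}  Y={y}  Y_hat={y_hat:.1f}  e={err:+.1f}")
```

By construction each observation splits exactly into a fitted part and an error, $Y_i = \hat{Y}_i + e_i$.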

  11. OLS Line
     • A data set example is available at the course web site. It consists of five points. Using that output, I can calculate the regression equation.
     • Keeping this equation in mind, we can find estimates of a and b given our general formulas for $Y_i$ and $\hat{Y}_i$.
     • We derive a and b from two different types of regression equations:
       • a from the intercept-only regression $Y_i = a + e_i$
       • b from the zero-intercept (slope-only) regression $Y_i = bX_i + e_i$

  12. OLS Line: Deriving a (1)
     • We can rewrite $Y_i = a + e_i$ as $e_i = Y_i - a$
     • We could write the objective function for a as: $\min_a \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (Y_i - a)$
     • Go back to the regression analysis example: notice that the sum of the errors is zero!
       • Why? The positive and negative errors from the line of best fit always cancel out.
     • For a minimum you need a first-order condition (FOC) set to zero.
       • We need an FOC for OLS that is set to zero, not an objective that is zero to start with!

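  13. OLS Line: Deriving a (2)
     • We can't just minimize the sum of the errors, because at the line of best fit $\sum_{i=1}^{n} e_i = 0$ already.
     • Instead, we minimize the sum of the squared errors (hence "least squares"):
       $\min_a \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - a)^2$, where $e_i = Y_i - a$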
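The following Python sketch, assuming hypothetical $Y$ values and the intercept-only model $Y_i = a + e_i$, shows why the plain sum of errors is a poor objective: $\sum_i (Y_i - a)$ just keeps falling as $a$ grows, so it has no minimum, and hitting zero at the mean is not a minimum. The sum of squared errors, by contrast, is U-shaped in $a$. The function names are illustrative only.

```python
# Hypothetical data (not the course data set)
Y = [2, 4, 5, 4, 5]


def sum_errors(a, ys):
    """Sum of e_i = Y_i - a for the intercept-only model."""
    return sum(y - a for y in ys)


def sum_squared_errors(a, ys):
    """Sum of e_i**2 = (Y_i - a)**2 for the intercept-only model."""
    return sum((y - a) ** 2 for y in ys)


for a in [0, 2, 4, 6, 8]:
    print(a, sum_errors(a, Y), sum_squared_errors(a, Y))
# Output:
# 0 20 86
# 2 10 26
# 4 0 6
# 6 -10 26
# 8 -20 86
```

The plain sum drops by $n = 5$ for every unit increase in $a$ and never bottoms out, while the squared version is smallest at $a = 4$, the sample mean of these hypothetical values.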

  14. OLS Line: Deriving a (3)
     • Differentiate with respect to a to find the formula for the OLS estimator a:
       $\frac{d}{da} \sum_{i=1}^{n} (Y_i - a)^2 = -2 \sum_{i=1}^{n} (Y_i - a) = -2 \sum_{i=1}^{n} e_i$
     • Set the first-order condition to zero to find a minimum: $-2 \sum_{i=1}^{n} e_i = 0$ (don't worry about the second-order derivative, $2n$, which is positive).
     • Remember that $e_i = Y_i - a$
     • Solve for a: $a = \sum_{i=1}^{n} Y_i / n = \bar{Y}$
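A quick numerical check of this result, assuming hypothetical data: with $a = \sum_i Y_i / n$ the first-order condition $-2\sum_i e_i = 0$ holds, the second derivative $2n$ is positive, and nudging $a$ in either direction raises the sum of squared errors. The helper name `sse` is illustrative.

```python
# Hypothetical data (not the course data set)
Y = [2, 4, 5, 4, 5]
n = len(Y)

# OLS estimator for the intercept-only model: a = sum(Y_i) / n, the sample mean
a = sum(Y) / n

# First-order condition: -2 * sum(e_i) with e_i = Y_i - a; should be (numerically) zero
e = [y - a for y in Y]
foc = -2 * sum(e)

# Second derivative of sum((Y_i - a)**2) with respect to a is 2n > 0, so this is a minimum
second_derivative = 2 * n


def sse(a_value):
    """Sum of squared errors for the intercept-only model at a given a."""
    return sum((y - a_value) ** 2 for y in Y)


print(a, abs(foc) < 1e-12, second_derivative)  # 4.0 True 10
print(sse(a - 0.5), sse(a), sse(a + 0.5))      # 7.25 6.0 7.25
```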

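  15. OLS Line: Deriving b (1)
     • Now consider the slope regression $Y_i = bX_i + e_i$, where $e_i = Y_i - bX_i$
     • We use the same principles as before:
       $\min_b \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - bX_i)^2$
       FOC: $\frac{d}{db} \sum_{i=1}^{n} (Y_i - bX_i)^2 = -2 \sum_{i=1}^{n} X_i (Y_i - bX_i) = -2 \sum_{i=1}^{n} X_i e_i = 0$
     • Note: this condition says the errors are orthogonal to X, the sample counterpart of there being no correlation between X and the errors.
     • So: $b = \sum_{i=1}^{n} X_i Y_i / \sum_{i=1}^{n} X_i^2$ (keep in mind that this expression only holds for the regression with a zero intercept and a non-zero slope)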
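A minimal Python sketch of the zero-intercept slope formula, assuming hypothetical data: it computes $b = \sum_i X_i Y_i / \sum_i X_i^2$ and confirms that the first-order condition $\sum_i X_i e_i = 0$ holds. Because this regression has no intercept, nothing forces the plain sum of the errors to zero, which is one reason the formula applies only to the zero-intercept case.

```python
# Hypothetical data (not the course data set)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

# Slope for the zero-intercept regression Y_i = b * X_i + e_i:
# b = sum(X_i * Y_i) / sum(X_i ** 2)
b = sum(x * y for x, y in zip(X, Y)) / sum(x ** 2 for x in X)

# Errors e_i = Y_i - b * X_i
e = [y - b * x for x, y in zip(X, Y)]

# First-order condition: sum(X_i * e_i) should be (numerically) zero
foc = sum(x * err for x, err in zip(X, e))

print(round(b, 6), abs(foc) < 1e-9)  # 1.2 True
# Without an intercept there is no FOC forcing sum(e_i) to zero
print(round(sum(e), 6))              # 2.0
```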

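  16. OLS Line: Collect a & b
     • We know a regression line with a non-zero intercept and a non-zero slope coefficient looks like: $\hat{Y}_i = a + bX_i$
     • We also know: $e_i = Y_i - \hat{Y}_i = Y_i - a - bX_i$
     • From the derivations of a and b we have the necessary first-order conditions:
       $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} X_i e_i = 0$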

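  17. OLS Line: Collect a & b (2)
     • Plug the new equation into the FOC from our derivation of a:
       $\sum_{i=1}^{n} (Y_i - a - bX_i) = 0 \;\Rightarrow\; a = \bar{Y} - b\bar{X}$
     • Plugging into the FOC from the derivation of b, and substituting for a:
       $\sum_{i=1}^{n} X_i \left[(Y_i - \bar{Y}) - b(X_i - \bar{X})\right] = 0$
     • Since $\sum_{i=1}^{n} \bar{X}(Y_i - \bar{Y}) = 0$ and $\sum_{i=1}^{n} \bar{X}(X_i - \bar{X}) = 0$, we can replace $X_i$ by $X_i - \bar{X}$ in front of the bracket, which gives:
       $b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$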
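The algebra above can be checked numerically. This Python sketch (function names are hypothetical, and the data are randomly generated for illustration) computes $a$ and $b$ from the collected formulas and confirms that both first-order conditions, $\sum_i e_i = 0$ and $\sum_i X_i e_i = 0$, hold up to floating-point rounding. It also refuses to fit when every $X_i$ is the same, since the denominator of $b$ is then zero.

```python
import random


def ols_intercept_slope(xs, ys):
    """a and b from the formulas obtained by plugging e_i = Y_i - a - b*X_i into both FOCs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("X has no variation, so the slope is not identified")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # b = sum((X - X_bar)(Y - Y_bar)) / sum((X - X_bar)^2)
    a = y_bar - b * x_bar    # a = Y_bar - b * X_bar
    return a, b


def first_order_conditions(a, b, xs, ys):
    """Return (sum of e_i, sum of X_i * e_i); both should be ~0 at the OLS solution."""
    e = [y - a - b * x for x, y in zip(xs, ys)]
    return sum(e), sum(x * err for x, err in zip(xs, e))


# Hypothetical check on a few randomly generated data sets
random.seed(0)
for _ in range(3):
    xs = [random.uniform(0, 20) for _ in range(50)]
    ys = [3.0 + 1.5 * x + random.gauss(0, 2) for x in xs]
    a, b = ols_intercept_slope(xs, ys)
    foc_a, foc_b = first_order_conditions(a, b, xs, ys)
    print(f"a={a:.3f}  b={b:.3f}  sum(e)={foc_a:.1e}  sum(X*e)={foc_b:.1e}")
```

Both printed FOC values come out at rounding-error size for every data set, which is what the derivation predicts.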

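  18. Example
     • Use the five-point data set posted on the web.
     • To calculate the regression line you need: $\bar{X}$, $\bar{Y}$, $\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$, and $\sum_{i=1}^{n} (X_i - \bar{X})^2$
     • Solve for a & b given the formulas:
       $b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$ and $a = \bar{Y} - b\bar{X}$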
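A worked version of this calculation in Python, assuming a hypothetical five-point data set (the course's own data are not reproduced in this transcript). It builds the usual hand-calculation table of deviations, sums the columns, and plugs the totals into the formulas; `fractions.Fraction` keeps every step exact, as it would be on paper.

```python
from fractions import Fraction

# Hypothetical five-point data set (stand-in for the course data)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

# Step 1: sample means (Fractions keep the arithmetic exact)
x_bar = Fraction(sum(X), n)
y_bar = Fraction(sum(Y), n)

# Step 2: deviations, their products, and squared X deviations, row by row
rows = []
for x, y in zip(X, Y):
    dx, dy = x - x_bar, y - y_bar
    rows.append((x, y, dx, dy, dx * dy, dx * dx))

print(f"{'X':>3} {'Y':>3} {'X-Xbar':>7} {'Y-Ybar':>7} {'prod':>5} {'sq':>4}")
for x, y, dx, dy, prod, sq in rows:
    print(f"{x:>3} {y:>3} {str(dx):>7} {str(dy):>7} {str(prod):>5} {str(sq):>4}")

# Step 3: column totals
s_xy = sum(r[4] for r in rows)
s_xx = sum(r[5] for r in rows)

# Step 4: plug into b = s_xy / s_xx and a = Y_bar - b * X_bar
b = s_xy / s_xx
a = y_bar - b * x_bar
print("X_bar =", x_bar, " Y_bar =", y_bar, " sum_xy =", s_xy, " sum_xx =", s_xx)
print("b =", b, "=", float(b), "  a =", a, "=", float(a))
```

For these hypothetical points the totals are $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 6$ and $\sum (X_i - \bar{X})^2 = 10$, giving $b = 3/5 = 0.6$ and $a = 11/5 = 2.2$.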

  19. Example (2)

  20. Wrap Up
     • Introduced three data types: cross-section, time series, and longitudinal
     • Used the OLS technique to derive formulas for an intercept and a slope coefficient
       • We estimated the regression lines
       • We found the FOCs and set them to zero
       • Then we put everything together to estimate the regression line $\hat{Y}_i = a + bX_i$
