Linear Regression Models

Presentation Transcript


  1. Linear Regression Models • Powerful modeling technique • Tease out relationships between “independent” variables and 1 “dependent” variable • Models not perfect…need an error term • Measurement errors, wrong model, omitted variables, inherent randomness • Linear models often misused.

  2. Example: Lake Water Quality • Chlorophyll-a (C) widely used indicator – measure of eutrophication • Nitrogen (N) associated with eutrophication • Q: Golf Course Development. Nitrogen expected to increase. By how much will C increase/decrease? • How should we proceed?

  3. Plot C vs. N

  4. A “Better” Model • Explain (single) regression line (model?). • Neg. relationship suggests a problem. • Omitted variable: Phosphorus (P) • Want to tease out effect of N, P separately. • Write a Multiple Linear Regression Model: • Model designed to “tease out” effect of N and effect of P, separately, on C. • (**) Define and interpret variables, parameters.

  5. Estimation • Use data to estimate parameter values that give “best fit”: b0=-9.4, b1=0.3, b2=1.2 • Answer: A one-unit increase in N results in about a 1.2-unit increase in C. • Importance: Omitting phosphorus from model introduced significant bias!!!
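
The bias from omitting a variable can be illustrated with a small simulation. This is only a sketch on synthetic data: the coefficients, the negative link between N and P, and the helper name `ols` are all assumptions chosen for illustration, not the lake data or the estimates on the slide.

```python
import numpy as np

# Synthetic data only: every number below is a made-up assumption, not lake data.
rng = np.random.default_rng(0)
n = 2000
P = rng.normal(10.0, 2.0, n)                            # phosphorus
N = 20.0 - 1.5 * P + rng.normal(0.0, 1.0, n)            # nitrogen, negatively related to P
C = 1.0 + 0.3 * N + 1.2 * P + rng.normal(0.0, 1.0, n)   # assumed "true" model

def ols(y, *regressors):
    """Least-squares coefficients for y on an intercept plus the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short = ols(C, N)     # omits P
full = ols(C, N, P)   # includes P

print("C on N only:  slope on N =", round(short[1], 2))
print("C on N and P: slope on N =", round(full[1], 2), " slope on P =", round(full[2], 2))
```

In this setup the short regression gives N a negative slope even though its assumed true effect is positive, because N is standing in for the omitted P.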

  6. Question: US Gas Consumption • Gasoline consumption produces many negative byproducts. • Policy may be directed at increasing the price of gas to reduce consumption. • But what is effect of price change? • Question: What is the price elasticity of demand for gasoline in the U.S.?

  7. Some Gasoline Data

  8. Gas Data Cont’d • Gas consumption increases through time. But no info here about price. • Next plot shows (+) relationship between gas price and gas consumption. • Note opposite of demand curve. • Something is wrong here… • Just as in Eutrophication problem, may have omitted important variables. • May have other problems, too.

  9. The OLS “Estimator” • Estimator: A rule or strategy for using data to estimate an unknown parameter. Defined before the data are drawn. • Ordinary Least Squares (OLS) estimator finds value of parameter that minimizes sum of squared deviations (see C vs. N plot) • Several assumptions for OLS estimator to apply to a model
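
A minimal sketch of what "minimizes sum of squared deviations" means for a model with one regressor. The data are made up, and the names `sse`, `b0`, and `b1` are illustrative choices.

```python
import numpy as np

# Small made-up data set, purely illustrative.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sse(b0, b1):
    """Sum of squared deviations of y from the line b0 + b1*x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

# Closed-form OLS estimates for a one-regressor model.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("SSE at OLS estimates:", round(sse(b0, b1), 4))
print("SSE at a nearby line:", round(sse(b0 + 0.1, b1), 4))
```

Any other line, such as the nearby one in the last print, gives a larger sum of squared deviations than the OLS line.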

  10. Linear Model • The model must be linear • Linear in parameters, not in variables. • Difference between parameter, variable. • Examples:

  11. Transforming Models • Previous “Ricker” model is non-linear (in the parameter). • Sometimes, can transform model so linear. • When plot, graph is nonlinear. • Take log of both sides, giving:
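
The log transform can be sketched as follows, assuming the standard Ricker stock-recruitment form R = a·S·exp(−b·S), which may differ from the form shown on the earlier slide. The parameter values and stock sizes are made up, and the data are noise-free so the recovery is exact.

```python
import numpy as np

# Assumed Ricker form: R = a * S * exp(-b * S); a and b are made-up values.
a_true, b_true = 3.0, 0.02
S = np.linspace(5.0, 100.0, 20)        # hypothetical stock sizes
R = a_true * S * np.exp(-b_true * S)   # noise-free recruits, for clarity

# Taking logs: ln(R/S) = ln(a) - b*S, which is linear in ln(a) and b.
slope, intercept = np.polyfit(S, np.log(R / S), 1)
a_hat, b_hat = np.exp(intercept), -slope

print("a_hat =", round(a_hat, 4), "b_hat =", round(b_hat, 4))
```

The transformed model is linear in the parameters ln(a) and b, so OLS applies even though a plot of R against S is curved.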

  12. CLRM: Assumption 1 • Dependent variable (Y) is function of specific set of independent variables (X’s). • Linear in parameters • Additive error • Coefficients are constant but unknown • Violations called “specification errors”, e.g. • Wrong regressors (a.k.a. indep. vars; X’s) • Nonlinearity • Changing parameters (e.g. through time)

  13. CLRM: Assumption 2 • Disturbances (ei’s) are independently and identically distributed ~ (0, σ²) • Typically we assume ei ~ N(0, σ²) • Mean = 0 • Constant variance, σ² (but unknown) • Errors uncorrelated with one another • Example of violations: • Measurement Bias (seep gas flux) • Heteroskedasticity (variance differs). • Autocorrelated Errors (disturbances correlated)

  14. CLRM: Assumption 3 • It is possible to repeat the sample with same independent variables. • If had same levels of explanatory vars, would it be possible to generate same value of Y? • Common Violations: • Errors in variables – measurement error in X. • Autoregression – when lagged dependent variable should be independent variable • Simultaneous Equations – several relationships act jointly.

  15. Properties of Estimators • Estimators have many properties. • “6” is an estimator, but not a very good one. • Two main properties we care about: • Unbiased: The expected value of the estimator equals the thing it is estimating (expected difference is 0). • Efficient: Small variance (spread) • “6” is biased, but has a very small variance (zero). • Under the CLRM assumptions, the OLS estimator is unbiased and has minimum variance of all linear unbiased estimators.
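
Bias and variance can be checked by simulation. The sketch below compares the sample mean with the estimator “6” on repeated synthetic samples; the true mean of 10, the spread, and the sample sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean = 10.0   # assumed value of the unknown parameter
reps, n = 5000, 25

# Draw many samples and apply each estimator to every sample.
samples = rng.normal(true_mean, 2.0, size=(reps, n))
sample_mean = samples.mean(axis=1)   # estimator 1: the sample mean
always_six = np.full(reps, 6.0)      # estimator 2: "6", ignoring the data

for name, est in [("sample mean", sample_mean), ("always 6", always_six)]:
    bias = est.mean() - true_mean
    print(f"{name}: bias = {bias:.3f}, variance = {est.var():.3f}")
```

The sample mean shows a bias near zero with positive variance, while “6” has zero variance but is off by the full distance between 6 and the true value.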

  16. Correlation vs. Causation • Now we know just enough to be dangerous! • Can estimate how any set of variables affects some other variable….Very Powerful. • Problem is: Correlation doesn’t imply Causation! …. Why Data Mining is bad. • Chicken production, Global CO2. • May be “spurious” (no underlying relationship) • Difficult to tease out statistically. • “Granger Causality”
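
A spurious relationship is easy to produce. The sketch below uses two made-up series that are generated independently but both trend upward over time; the series are not the chicken production or CO2 data, and the trend and noise sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(200)

# Two unrelated made-up series that both happen to trend upward over time.
x = 1.0 * t + rng.normal(0.0, 3.0, t.size)
y = 0.5 * t + rng.normal(0.0, 3.0, t.size)

corr_levels = np.corrcoef(x, y)[0, 1]
# First differences remove the shared time trend.
corr_diffs = np.corrcoef(np.diff(x), np.diff(y))[0, 1]

print("correlation of levels:     ", round(corr_levels, 2))
print("correlation of differences:", round(corr_diffs, 2))
```

The levels are strongly correlated only because both series share a time trend; once the trend is differenced away, little correlation remains.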

  17. Violations & Consequences

  18. Guide to Model Specification • Start with theory to generate model • Check assumptions of CLRM • Collect and plot data • Estimate model, test restrictions • Possibly perform Box-Cox transform • Check R², and “Adjusted R²” • Plot residuals – look for patterns • Seek explanations for patterns
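
The R² and adjusted R² checks in the list above can be sketched as follows, using the usual textbook definitions. The data are made up and `k` counts regressors excluding the intercept.

```python
import numpy as np

# Made-up data; k = 1 regressor plus an intercept.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n, k = len(y), 1

b1, b0 = np.polyfit(x, y, 1)   # slope first, then intercept
residuals = y - (b0 + b1 * x)

sse = np.sum(residuals ** 2)        # unexplained variation
sst = np.sum((y - y.mean()) ** 2)   # total variation
r2 = 1.0 - sse / sst
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)   # penalizes extra regressors

print("R2 =", round(r2, 4), "adjusted R2 =", round(adj_r2, 4))
```

Adjusted R² never exceeds R², and the gap widens as regressors are added without improving the fit.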

  19. What’s a Residual? • General form of linear model: • Graphically on board.
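
In code, a residual is the observed value minus the fitted value. A minimal sketch on made-up data, assuming a model with an intercept and one regressor:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up regressor values
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # made-up observations

X = np.column_stack([np.ones_like(x), x])  # intercept column plus x
b, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ b
residuals = y - fitted   # observed minus fitted, one per data point

print("residuals:", np.round(residuals, 3))
print("sum of residuals:", round(float(residuals.sum()), 10))
```

When the model includes an intercept, OLS residuals sum to zero and are uncorrelated with each regressor in the sample, so patterns have to be looked for in plots rather than in these sums.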

  20. Residual Plots • Residuals vs. Fit • Normal Quantile Plot

  21. Back to Gasoline Consumption • Recall, interested in how gas consumption is affected by price increase (say $0.10/gal.) • Variables: • Gas consumption per capita (G) • Gas price (Pg) • Income (Y) • New car price (Pnc) • Used car price (Puc)

  22. 2 Alternative Specifications • Linear specification: • Log-log specification (often used with economic data) • One way to test specification is Box-Cox Transform (see 3 lectures back)
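
One reason the log-log form is popular with economic data is that its slope coefficients are elasticities. The sketch below fits a log-log specification with only price and income as regressors. The data are made up and noise-free, with an assumed price elasticity of -0.5 and income elasticity of 1.0; none of this comes from the gasoline data.

```python
import numpy as np

# Made-up, noise-free data with a constant price elasticity of -0.5
# and income elasticity of 1.0 (assumed values, not estimates from the gas data).
Pg = np.array([1.0, 1.2, 1.5, 1.1, 2.0, 1.8, 2.5, 1.4])
Y = np.array([10.0, 11.0, 10.5, 12.0, 12.5, 14.0, 13.0, 15.0])
G = 2.0 * Pg ** -0.5 * Y ** 1.0

# Log-log specification: ln G = b0 + b1 ln Pg + b2 ln Y.
X = np.column_stack([np.ones_like(Pg), np.log(Pg), np.log(Y)])
b, *_ = np.linalg.lstsq(X, np.log(G), rcond=None)

print("price elasticity b1 =", round(b[1], 4))
print("income elasticity b2 =", round(b[2], 4))
```

In the linear specification the price coefficient is instead a change in G per unit change in Pg, so it depends on the units of both variables.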

  23. Results of Linear Model • Parameter estimate, (p-value of t-test). • Low p-value: “statistically significant” • R² measures goodness of fit of model. • Low p-value of F statistic means model has explanatory power.

  24. Answer to Question • A 1-unit increase in price leads to a 0.04-unit decrease in gas consumption. • Units are: G (1000 gallons), Pg ($). • So, a $0.10 increase in gas price leads to, on average, a 4 gallon decrease in gas consumption…not much!
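
The unit conversion behind the 4 gallon figure, written out step by step. The numbers are the ones stated on the slide; the variable names are illustrative.

```python
# Figures taken from the slide: coefficient on price and the units of G.
coef_price = -0.04          # change in G per $1 increase in Pg
gallons_per_unit_G = 1000   # G is measured in thousands of gallons
price_increase = 0.10       # dollars per gallon

change_in_G = coef_price * price_increase              # in thousands of gallons
change_in_gallons = change_in_G * gallons_per_unit_G   # in gallons

print(round(change_in_gallons, 6))
```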
