Linear Regression

Linear Regression • A method of calculating a linear equation for the relationship between two or more variables using multiple data points.

Linear Regression • General form: Y = a + bX + e • “Y” is the dependent variable • “X” is the explanatory variable • “a” is the intercept parameter • “b” is the slope parameter • “e” is an error term or residual

Regression Results • Y and X come from data • A computer program calculates estimates of a and b • e is the difference between the actual value of Y and the fitted value a + bX for the same X, so e = Y – (a + bX) • OLS estimates of a and b minimize the sum of the squared residuals, ∑e² • “OLS” is Ordinary Least Squares
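As a rough illustration of what "a computer program calculates estimates of a and b" does with one explanatory variable, here is a minimal sketch using the standard closed-form OLS formulas b = ∑(X – X̄)(Y – Ȳ) / ∑(X – X̄)² and a = Ȳ – bX̄. The function names `ols_simple` and `residuals` are hypothetical; in this course Excel's regression tool does the calculation.

```python
def ols_simple(xs, ys):
    """Return (a, b) that minimize the sum of squared residuals for Y = a + bX + e."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # variation in X around its mean
    if sxx == 0:
        raise ValueError("X must vary across observations")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope estimate
    a = y_bar - b * x_bar  # intercept estimate: the line passes through the means
    return a, b


def residuals(xs, ys, a, b):
    """e = actual Y minus fitted value a + bX, one per observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Usage on a tiny made-up data set (not the electricity data):
xs, ys = [1, 2, 3, 4], [2, 4, 5, 8]
a, b = ols_simple(xs, ys)
e = residuals(xs, ys, a, b)
```

Any other choice of a or b gives a larger ∑e² than the OLS pair, which is exactly the property the slide describes.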

The Regression Line and the Residual

Electricity Demand Example • Data for residential customers in U.S. states • Y is millions of kilowatt-hours sold • X is population • Other data include per capita income, the price of electricity (cents/kWh), and the price of natural gas

Variable Means 2004

Variable Means 2012

Excel Regression: Milkwh

Actual vs. Predicted Milkwh

Useful Statistics and Tests • t-statistic: is the estimated coefficient significantly different from zero? • Coefficient of determination or R-square: the percentage of the variation in Y explained by the regression • F-statistic: is the regression equation as a whole statistically significant? Equivalently, is the R-square significantly different from zero? • Find these on the Excel regression output.
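To show where these numbers come from, the sketch below computes R-square, the t-statistic on the slope, and the F-statistic for a regression with a single explanatory variable, using textbook formulas (residual degrees of freedom n – 2). It does not compute p-values, which need the t and F distributions; Excel reports those alongside the statistics. The function name `simple_regression_stats` is hypothetical, and with more explanatory variables the degrees of freedom change to n – k – 1.

```python
import math


def simple_regression_stats(xs, ys):
    """OLS fit of Y = a + bX + e, plus R-square, slope t-statistic, and F-statistic.

    Assumes one explanatory variable, so residual degrees of freedom are n - 2.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total variation in Y
    r2 = 1 - ssr / sst                  # share of the variation in Y explained
    df = n - 2
    se_b = math.sqrt((ssr / df) / sxx)  # standard error of the slope estimate
    t_b = b / se_b                      # tests H0: b = 0
    f = (r2 / 1) / ((1 - r2) / df)      # tests H0: R-square = 0 (one slope here)
    return {"a": a, "b": b, "r2": r2, "t_b": t_b, "f": f}


# Usage on a tiny made-up data set:
stats = simple_regression_stats([1, 2, 3, 4], [2, 4, 5, 8])
```

With only one explanatory variable, F equals the square of the slope's t-statistic, so the two tests agree; they diverge once more variables are added.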

Confidence and Significance Levels • 99% Confidence = 1% Significance • P-value of 0.01 or less • 95% Confidence = 5% Significance • P-value of 0.05 or less • 90% Confidence = 10% Significance • P-value of 0.10 or less • A Smaller Significance Level Means Stronger Evidence That the Coefficient (or Equation) Is Not Zero • Find P-values for t and F statistics on Excel regression output
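The lookup on this slide can be written as a small rule: a result is significant at a given level when its p-value is at or below that level, and you report the smallest such level. The helper names `strictest_significance` and `confidence_label` are hypothetical.

```python
def strictest_significance(p_value, levels=(0.01, 0.05, 0.10)):
    """Smallest conventional significance level with p_value <= level, else None."""
    for level in sorted(levels):
        if p_value <= level:
            return level
    return None


def confidence_label(p_value):
    """Describe a p-value using the slide's confidence/significance pairs."""
    level = strictest_significance(p_value)
    if level is None:
        return "not significant at 10%"
    return f"{round((1 - level) * 100)}% confidence ({round(level * 100)}% significance)"


print(confidence_label(0.03))  # prints 95% confidence (5% significance)
```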

Multiple & Nonlinear Regression • Multiple Regression • Y = a + bX + cW + dZ • Nonlinear Regression • Quadratic: Y = a + bX + cX² • Log-Linear: Y = aXᵇZᶜ • Or ln Y = (ln a) + b(ln X) + c(ln Z)
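Both nonlinear forms can be estimated with ordinary multiple regression: the quadratic adds X² as an extra explanatory variable, and the log-linear form is fit on ln Y, ln X, and ln Z, with a recovered as the exponential of the estimated constant. The sketch below assumes that approach; the helper names (`ols`, `fit_quadratic`, `fit_log_linear`) are hypothetical, and solving the normal equations by Gauss-Jordan elimination is a swapped-in textbook method, not how Excel computes its output.

```python
import math


def ols(rows, ys):
    """OLS coefficients [constant, b1, b2, ...] for y = b0 + b1*x1 + b2*x2 + ...

    rows[i] holds the explanatory values for observation i. Solves the normal
    equations (X'X) beta = X'y by Gauss-Jordan elimination with partial pivoting.
    """
    if len(rows) != len(ys):
        raise ValueError("rows and ys must have the same length")
    X = [[1.0] + list(r) for r in rows]  # constant column gives the intercept
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         + [sum(X[i][p] * ys[i] for i in range(n))]
         for p in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [m - factor * c for m, c in zip(M[r], M[col])]
    return [M[r][k] / M[r][r] for r in range(k)]


def fit_quadratic(xs, ys):
    """Y = a + bX + cX^2: treat X^2 as a second explanatory variable. Returns [a, b, c]."""
    return ols([[x, x * x] for x in xs], ys)


def fit_log_linear(xs, zs, ys):
    """Y = a * X**b * Z**c, fit as ln Y = ln a + b ln X + c ln Z. Needs positive data."""
    rows = [[math.log(x), math.log(z)] for x, z in zip(xs, zs)]
    ln_a, b, c = ols(rows, [math.log(y) for y in ys])
    return math.exp(ln_a), b, c
```

In the log-linear form, b and c are elasticities: a 1% change in X changes Y by about b%.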

Multiple Regression: Milkwh

Quadratic Regression: Milkwh

Log Linear Regression: LnMilkwh

Demand Regression Project • Data: ElecDemandData2012.xls under Project Materials on D2L • 1. Using the data file above, run a linear regression with Dependent Variable: Milkwh and Explanatory Variables: Pop, Pkwh, PGas, Income. Which coefficients (including the constant) are statistically significant at the 10% level or better? Which are not significant? How much of the variation in the dependent variable is explained by the estimated equation? Is the equation as a whole statistically significant? At what level?

Finding the Marginal Revenue Equation: Overview • Evaluate estimated demand at means of all explanatory variables except price • Calculate average effect of non-price variables to get demand equation in this form • Q = A - b(P) • Rearrange to find Inverse Demand equation • P = (A/b) - (1/b)Q • MR has twice the slope of inverse demand • MR = (A/b) – (2/b)Q • The end result is an equation, not a number
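The rule "MR has twice the slope of inverse demand" follows from writing total revenue as price times quantity and differentiating with respect to Q. A short derivation in the slide's own symbols:

```latex
\begin{aligned}
Q &= A - bP \quad\Longrightarrow\quad P = \frac{A}{b} - \frac{1}{b}Q \\
TR &= P \cdot Q = \frac{A}{b}Q - \frac{1}{b}Q^{2} \\
MR &= \frac{d\,TR}{dQ} = \frac{A}{b} - \frac{2}{b}Q
\end{aligned}
```

The intercept A/b is unchanged; only the slope doubles, because the Q² term in total revenue contributes 2Q when differentiated.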

Finding the Marginal Revenue Equation: Example • Write your regression equation in this form • Milkwh = 11,000 – 3600Pkwh + 0.0041Pop + 2150PGas • 11,000 is the intercept or constant coefficient • 3600, 0.0041, and 2150 are estimated coefficients • These are made-up numbers for this example • Use the mean values of the variables other than the electricity price • Pop = 5,756,577, PGas = 11.4 • Substitute into your regression equation and simplify • Milkwh = 11,000 – 3600Pkwh + 0.0041(5,756,577) + 2150(11.4) • Milkwh = [11,000 + 23,601.97 + 24,510] – 3600(Pkwh) • Milkwh = 59,111.97 – 3600(Pkwh) • This is Q = A – bP from the previous slide
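The substitution step can be checked with a few lines of arithmetic. The sketch below collapses a linear demand regression to Q = A – bP by plugging in the means of every non-price variable; the helper name `collapse_to_price_only` is hypothetical, and the inputs are the slide's made-up coefficients and means.

```python
def collapse_to_price_only(intercept, price_coef, other_coefs, means):
    """Return (A, b) for Q = A - b*P after evaluating non-price variables at their means.

    price_coef is the estimated coefficient on price as it appears in the regression
    (negative for downward-sloping demand), so b = -price_coef is positive.
    """
    A = intercept + sum(other_coefs[name] * means[name] for name in other_coefs)
    b = -price_coef
    return A, b


# The slide's made-up example: Milkwh = 11,000 - 3600Pkwh + 0.0041Pop + 2150PGas
A, b = collapse_to_price_only(
    intercept=11_000,
    price_coef=-3600,
    other_coefs={"Pop": 0.0041, "PGas": 2150},
    means={"Pop": 5_756_577, "PGas": 11.4},
)
print(round(A, 2), b)  # prints 59111.97 3600
```

For the project regression, Income would be added to both dictionaries with its own estimated coefficient and mean.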

Finding Marginal Revenue Example, Continued • Milkwh = 59,111.97 – 3600(Pkwh) • From end of previous slide • Rearrange to find Inverse Demand • P = (A/b) - (1/b)Q = (A – Q)/b • Pkwh = (59,111.97 – Milkwh)/(3600) • Pkwh = 16.42 – 0.00028(Milkwh) • This is the inverse demand equation • Marginal Revenue has twice the slope • MR = 16.42 – 0.00056(Milkwh)
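The rearrangement can also be checked numerically. This sketch takes A and b from Q = A – bP and returns the inverse-demand intercept A/b, the inverse-demand slope –1/b, and the MR slope –2/b; the helper name `inverse_demand_and_mr` is hypothetical.

```python
def inverse_demand_and_mr(A, b):
    """Given Q = A - b*P, return (A/b, inverse-demand slope, MR slope).

    Inverse demand: P = A/b - (1/b)Q.  Marginal revenue: MR = A/b - (2/b)Q.
    """
    if b <= 0:
        raise ValueError("b must be positive for downward-sloping demand")
    return A / b, -1 / b, -2 / b


intercept, inv_slope, mr_slope = inverse_demand_and_mr(59_111.97, 3600)
print(round(intercept, 2), round(inv_slope, 5), round(mr_slope, 5))
# prints 16.42 -0.00028 -0.00056
```

Rounding the slopes to five decimal places reproduces the slide's 0.00028 and 0.00056; keeping full precision (1/3600 and 2/3600) avoids rounding error if the MR equation is used in later calculations.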
