Multiple Regression Analysis: Estimation


Multiple Regression Model
y = β0 + β1x1 + β2x2 + … + βkxk + u
• β0 is still the intercept
• β1 to βk are all called slope parameters
• u is still the error term (or disturbance term)
• Zero mean assumption: E(u) = 0
• Estimation still minimizes the sum of squared residuals
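A minimal sketch of what "minimize the sum of squared residuals" means in practice. The data are simulated, and the true coefficients (1, 2, -3), the sample size, and all variable names are assumptions made for illustration only. The check at the end shows that moving one slope away from the least-squares value raises the sum of squared residuals.

```python
import numpy as np

# Simulated data (illustrative assumption, not from the slides)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)          # error term with E(u) = 0
y = 1.0 + 2.0 * x1 - 3.0 * x2 + u          # y = b0 + b1*x1 + b2*x2 + u

# Design matrix: a column of ones for the intercept, then the regressors
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares coefficients
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def sum_sq_resid(b):
    """Sum of squared residuals for a candidate coefficient vector b."""
    resid = y - X @ b
    return float(resid @ resid)

at_ols = sum_sq_resid(beta_hat)
perturbed = sum_sq_resid(beta_hat + np.array([0.0, 0.1, 0.0]))

print("OLS estimates:", beta_hat)
print("Sum of squared residuals at OLS:", at_ols)
print("Sum of squared residuals after perturbing b1:", perturbed)
```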

Multiple Regression Model: Example
• Demand estimation:
  • Dependent variable: Q, tile cases sold (in thousands of cases)
  • Right-hand-side variables: tile price per case P, income per capita I (in thousands of dollars), and advertising expenditure A (in thousands of dollars)
• Regression: Q = β0 + β1P + β2I + β3A + u
• Interpretation:
  • β1 measures the effect of the tile price on tile consumption, holding all other factors fixed
  • β2 measures the effect of income, holding all other factors fixed
  • β3 measures the effect of advertising, holding all other factors fixed

Q = 17.513 – 0.296P + 0.066I + 0.036A
• What is the impact of a price change on tile sales?
• What is the impact of a change in income on tile sales?
• What is the impact of a change in advertising expenditures on tile sales?
• How is the own-price elasticity calculated?
• How is the income elasticity calculated?
• How is the advertising elasticity calculated?
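One way to approach these questions, sketched in Python. Each slope is a marginal effect: with Q in thousands of cases, a one-unit rise in P changes predicted sales by -0.296 thousand cases, holding I and A fixed. A point elasticity rescales the slope by the ratio of the variable to the predicted quantity. The evaluation point used here (P = 50, I = 40, A = 100) is a hypothetical choice, not data from the slides; in practice the sample means are a common choice.

```python
# Coefficients taken from the fitted equation in the text
b0, b_price, b_income, b_adv = 17.513, -0.296, 0.066, 0.036

# HYPOTHETICAL evaluation point (assumed for illustration only)
P, I, A = 50.0, 40.0, 100.0

# Predicted quantity (thousands of cases) at the evaluation point
q_hat = b0 + b_price * P + b_income * I + b_adv * A

def point_elasticity(slope, x, q):
    """Elasticity of Q with respect to x at a point: (dQ/dx) * (x / Q)."""
    return slope * x / q

e_price = point_elasticity(b_price, P, q_hat)
e_income = point_elasticity(b_income, I, q_hat)
e_adv = point_elasticity(b_adv, A, q_hat)

print("Predicted Q:", q_hat)
print("Own-price elasticity:", e_price)
print("Income elasticity:", e_income)
print("Advertising elasticity:", e_adv)
```

Because the model is linear, the elasticities change with the evaluation point even though the slopes do not.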

Random Sampling
• Sales data were collected from 23 tile stores in the market in 2002
• For each observation i, Qi = β0 + β1Pi + β2Ii + β3Ai + ui
• Goal: estimate β0, β1, β2, β3 by OLS, that is, choose the coefficients that minimize the sum of squared residuals

The Generic Multiple Regression Model
• Estimation of regression parameters:
  • Least squares (no knowledge of the distribution of the error or disturbance terms is required).
• Matrix notation gives a view of how the data are stored in software programs.
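A small sketch of the matrix view, assuming the usual layout in which each row of X is one observation and each column is one variable, with a leading column of ones for the intercept. The five observations below are made-up numbers. The least-squares coefficients solve the normal equations (X'X) b = X'y, and the residuals are then orthogonal to every column of X.

```python
import numpy as np

# HYPOTHETICAL data: 5 observations, 2 explanatory variables
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.1])

# Design matrix X: n rows (observations) by k + 1 columns (intercept + regressors)
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal equations: (X'X) b = X'y
XtX = X.T @ X
Xty = X.T @ y
b = np.linalg.solve(XtX, Xty)

# Residuals are orthogonal to each column of X at the least-squares solution
resid = y - X @ b

print("X =\n", X)
print("b =", b)
print("X'e =", X.T @ resid)
```

Solving the normal equations directly is fine for a small illustration; a routine such as `np.linalg.lstsq` is generally the more numerically stable choice.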

Components of the Model
• Endogenous variables: dependent variables, whose values are determined within the system.
• Exogenous variables: determined outside the system, but they influence the system by affecting the values of the endogenous variables.
• Structural parameters: estimated using statistical techniques and relevant data.
• Lagged endogenous variables
• Lagged exogenous variables
• Predetermined variables

The Disturbance (or Error) Term
Captures:
• Omission of the influence of other variables.
• Measurement error.
Stochastic: a random variable whose statistical distribution is often taken to be normal.
Recognition that any regression model is a parsimonious, stochastic representation of reality, not a deterministic one.

OLS Estimates Associated with the Multiple Regression Model

The Gauss-Markov Theorem
Given the assumptions below, it can be shown that the OLS estimator is “BLUE”:
- Best (smallest variance among linear unbiased estimators)
- Linear
- Unbiased
- Estimator
Assumptions:
- Linear in parameters
- Corr(ui, uj) = 0 for i ≠ j
- Zero mean of the error term
- No perfect collinearity
- Homoscedasticity

Communication and Aims for the Analyst

Communication
• A technician can run a program and get output.
• An analyst must interpret the findings from this output.
• No bonus points go to a terrific hacker who is a poor analyst.
Aims
• Improve your ability to develop models for structural analysis and for forecasting with some accuracy.
• Enhance your ability to interpret and communicate the results, so as to improve your decision-making.
Bottom Line
• The analyst transforms the economic model or idea into a mathematical/statistical one.
• The technician estimates the model and obtains a mathematical/statistical answer.
• The analyst transforms the mathematical/statistical answer into an economic one.

Goodness-of-Fit

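Goodness-of-Fit (continued . . .)
How well does our sample regression line fit our sample data?
The R-squared of a regression is the fraction of the total sum of squares (SST) that is explained by the model:
R² = SSR/SST = 1 – SSE/SST
Here SSR is the regression (explained) sum of squares and SSE is the error (residual) sum of squares, so SST = SSR + SSE.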

More about R-Squared
R² can never decrease when another explanatory or predetermined variable is added to a regression; usually it increases.
For that reason, R² is not necessarily a good way to compare alternative models that have the same dependent variable but different numbers of right-hand-side variables.
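The claim can be checked numerically. The sketch below uses simulated data (all numbers and names are assumptions for illustration) and adds a regressor that is pure noise, unrelated to y by construction. R² still does not fall. The helper also returns the three sums of squares so that the decomposition SST = SSR + SSE can be verified; that decomposition relies on the model including an intercept.

```python
import numpy as np

def sums_of_squares(X, y):
    """Return (SST, SSR, SSE) for an OLS fit of y on X (X includes an intercept)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b
    sst = float(((y - y.mean()) ** 2).sum())       # total
    ssr = float(((fitted - y.mean()) ** 2).sum())  # explained by the regression
    sse = float(((y - fitted) ** 2).sum())         # residual
    return sst, ssr, sse

# Simulated data (illustrative assumption)
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                         # irrelevant regressor

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise])

sst_s, ssr_s, sse_s = sums_of_squares(X_small, y)
sst_b, ssr_b, sse_b = sums_of_squares(X_big, y)

r2_small = 1.0 - sse_s / sst_s
r2_big = 1.0 - sse_b / sst_b

print("R-squared without the noise variable:", r2_small)
print("R-squared with the noise variable:   ", r2_big)
```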

R² and Adjusted R²
R² = 1 – SSE/SST
Adjusted R² = 1 – [SSE/(n – k – 1)] / [SST/(n – 1)]
• Questions:
  • Why do we care about the adjusted R²?
  • Is adjusted R² always better than R²?
  • What is the relationship between R² and adjusted R²?
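A short sketch of the relationship, using the standard textbook form adjusted R² = 1 – (1 – R²)(n – 1)/(n – k – 1), where n is the number of observations and k the number of slope parameters. The R² values below are hypothetical; n = 23 matches the tile-store sample size mentioned earlier. The example shows a case where adding a fourth regressor raises R² slightly but lowers adjusted R², because the degrees-of-freedom penalty outweighs the small gain in fit.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared from R-squared, n observations, and k slope parameters."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

n = 23

# HYPOTHETICAL R-squared values for two competing models
adj_k3 = adjusted_r2(0.900, n, k=3)   # three regressors
adj_k4 = adjusted_r2(0.901, n, k=4)   # one more regressor, tiny gain in R-squared

print("Adjusted R-squared, k = 3:", adj_k3)
print("Adjusted R-squared, k = 4:", adj_k4)
```

With at least one regressor and R² below 1, adjusted R² is always smaller than R², and it can even be negative when the fit is very poor.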

Model Selection Criteria

Model Selection Criteria Example
Which model to choose?

Estimate of Error Variance
• df = n – (k + 1), or df = n – k – 1
• df (degrees of freedom) = (number of observations) – (number of estimated parameters)
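A minimal sketch of the calculation, assuming the usual unbiased estimator of the error variance: the residual sum of squares divided by the degrees of freedom, SSE/(n – k – 1). The data are made-up numbers and the function name is hypothetical.

```python
import numpy as np

def error_variance(X, y):
    """Return (sigma2_hat, df) where sigma2_hat = SSE / (n - k - 1).

    X must already contain the intercept column, so it has k + 1 columns.
    """
    n, n_params = X.shape              # n_params = k + 1
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sse = float(resid @ resid)
    df = n - n_params                  # observations minus estimated parameters
    return sse / df, df

# HYPOTHETICAL data: n = 5 observations, k = 2 regressors
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x1), x1, x2])

sigma2_hat, df = error_variance(X, y)
print("Degrees of freedom:", df)
print("Estimated error variance:", sigma2_hat)
```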

Variance of OLS Parameter Estimates
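Under homoscedastic, uncorrelated errors, the textbook estimate of the variance-covariance matrix of the OLS coefficients is the estimated error variance times the inverse of X'X; the standard errors are the square roots of its diagonal. The sketch below illustrates this on simulated data. The sample size of 23 echoes the tile example, but the data, coefficients, and variable names are assumptions for illustration only.

```python
import numpy as np

# Simulated data (illustrative assumption): n = 23 observations, k = 3 regressors
rng = np.random.default_rng(2)
n, k = 23, 3
regressors = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), regressors])
true_beta = np.array([17.5, -0.3, 0.07, 0.04])     # hypothetical values
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# OLS coefficients and residuals
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Estimated error variance with df = n - k - 1
sigma2_hat = float(resid @ resid) / (n - k - 1)

# Estimated variance-covariance matrix of b, and the standard errors
cov_b = sigma2_hat * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov_b))

print("Coefficients:   ", b)
print("Standard errors:", std_err)
```

The standard errors shrink when the error variance is small, when the sample is large, or when a regressor varies a lot and is not closely related to the others.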

Example: SAS Output of the Demand Function for Shrimp
Dependent variable:
• Quantity sold of shrimp
Explanatory variables:
• Price of shrimp
• Price of finfish
• Price of other shellfish
• Advertising for shrimp
• Advertising for finfish
• Advertising for other shellfish

Model Selection Criteria for the QSHRIMP Problem
