Multiple Regression. Model Error Term Assumptions Example 1: Locating a motor inn Goodness of Fit (R-square) Validity of estimates (t-stats & F-stats) Interpreting the regression coefficients & R-Square. Introduction.
Multiple Regression • Model • Error Term Assumptions • Example 1: Locating a motor inn • Goodness of Fit (R-square) • Validity of estimates (t-stats & F-stats) • Interpreting the regression coefficients & R-Square
Introduction • In this chapter we extend the simple linear regression model to allow for any number of independent variables. • We will also learn to detect econometric problems.
Model and Required Conditions • We allow for k independent variables to potentially be related to the dependent variable: y = b0 + b1x1 + b2x2 + … + bkxk + e • Here y is the dependent variable, x1, …, xk are the independent variables, b0, b1, …, bk are the coefficients, and e is the random error variable.
The simple linear regression model allows for one independent variable, x: y = b0 + b1x + e • The multiple linear regression model allows for more than one independent variable: y = b0 + b1x1 + b2x2 + e • Note how the straight line y = b0 + b1x becomes a plane, y = b0 + b1x1 + b2x2. [Figure: the regression line plotted against x, and the regression plane plotted over the x1 and x2 axes.]
Required conditions for the error variable e • The error e is normally distributed with mean equal to zero and a constant standard deviation se (independent of the value of y). se is unknown. • The errors are independent. • These conditions are required in order to estimate the model coefficients and to assess the resulting model.
Example 1 Where to locate a new motor inn? • La Quinta Motor Inns is planning an expansion. • Management wishes to predict which sites are likely to be profitable. • Several areas where predictors of profitability can be identified are: • Competition • Market awareness • Demand generators • Demographics • Physical quality
Variables used in the model • Profitability: Margin (the dependent variable). • Competition: Rooms, the number of hotel/motel rooms within 3 miles of the site. • Market awareness: Nearest, the distance to the nearest La Quinta inn. • Customers: Office (office space) and College (college enrollment). • Community: Income, the median household income. • Physical: Disttwn, the distance to downtown.
Data were collected from 100 randomly selected inns that belong to La Quinta, and the following suggested model was estimated: Margin = b0 + b1Rooms + b2Nearest + b3Office + b4College + b5Income + b6Disttwn + e
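The estimation step can be illustrated with a minimal sketch. The La Quinta data are not reproduced in these slides, so the numbers below are made up, and the helper name `fit_ols` is hypothetical. The sketch solves the least-squares normal equations directly rather than calling Excel, which is what the slides actually use.

```python
def fit_ols(X, y):
    """Return least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    # Prepend a column of ones so that b0 (the intercept) is estimated too.
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("independent variables are linearly dependent")
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Made-up, noise-free data generated from y = 2 + 3*x1 - 1*x2 (not the La Quinta data).
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in X]
coefficients = fit_ols(X, y)
print([round(b, 6) for b in coefficients])
```

Because the made-up data contain no error term, the fit recovers the generating coefficients up to rounding. With real data the estimates would differ from the true coefficients by sampling error.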
This is the sample regression equation (sometimes called the prediction equation), taken from the Excel output: MARGIN = 72.455 - 0.008 ROOMS - 1.646 NEAREST + 0.02 OFFICE + 0.212 COLLEGE - 0.413 INCOME + 0.225 DISTTWN • Let us assess this equation.
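As a rough illustration of how the prediction equation is used, the sketch below plugs one site into it. The coefficients are the rounded values from the equation above. The site values are hypothetical, and the units (OFFICE in thousands of square feet, COLLEGE in thousands of students, INCOME in thousands of dollars) are an assumption based on how the coefficients are interpreted later in the slides.

```python
# Rounded coefficients from the sample regression equation.
INTERCEPT = 72.455
COEFFICIENTS = {
    "ROOMS": -0.008,
    "NEAREST": -1.646,
    "OFFICE": 0.02,
    "COLLEGE": 0.212,
    "INCOME": -0.413,
    "DISTTWN": 0.225,
}

def predict_margin(site):
    """Predicted operating margin (in percent) for one site."""
    return INTERCEPT + sum(COEFFICIENTS[name] * site[name] for name in COEFFICIENTS)

# Hypothetical site, for illustration only (not from the La Quinta sample).
hypothetical_site = {
    "ROOMS": 3000,    # hotel/motel rooms within 3 miles
    "NEAREST": 1.5,   # miles
    "OFFICE": 500,    # assumed unit: thousands of square feet
    "COLLEGE": 10,    # assumed unit: thousands of students
    "INCOME": 35,     # assumed unit: thousands of dollars
    "DISTTWN": 5,     # miles to downtown
}
predicted = predict_margin(hypothetical_site)
print(f"Predicted operating margin: {predicted:.3f}%")
```

A prediction like this is only as trustworthy as the fitted model, which is why the following slides assess the equation before it is used.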
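Testing the coefficients • The hypotheses for each bi: H0: bi = 0, H1: bi ≠ 0 • Test statistic: t = (estimate of bi) / (standard error of the estimate), with d.f. = n - k - 1 • The estimates, their standard errors, and the t statistics appear in the Excel printout.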
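The t statistic for a single coefficient can be sketched as follows. The printout with the standard errors is not reproduced here, so the estimate and standard error below are made-up placeholders; only n = 100 and k = 6 come from the example. The function names are hypothetical.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error of the estimate."""
    return (estimate - hypothesized) / std_error

def degrees_of_freedom(n, k):
    """Degrees of freedom for the coefficient t-test: n - k - 1."""
    return n - k - 1

# Placeholder values for illustration; they are NOT taken from the La Quinta printout.
t = t_statistic(estimate=2.0, std_error=0.5)
df = degrees_of_freedom(n=100, k=6)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The computed t would then be compared with the critical value of the t distribution for the chosen significance level, or equivalently its p-value would be read off the printout.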
Standard error of estimate • We need to estimate the standard error of estimate, se = sqrt(SSE/(n-k-1)) • Compare se to the mean value of y • From the printout, Standard Error = 5.5121 • Calculating the mean value of y we have • It seems se is not particularly small. • Can we conclude the model does not fit the data well?
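A minimal sketch of this comparison is given below, assuming the observed and fitted values are available. The numbers are made up for illustration and are not the La Quinta data; `standard_error_of_estimate` is a hypothetical helper name.

```python
import math

def standard_error_of_estimate(observed, fitted, k):
    """se = sqrt(SSE / (n - k - 1)), where SSE is the sum of squared residuals."""
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, fitted))
    n = len(observed)
    return math.sqrt(sse / (n - k - 1))

# Made-up observed and fitted values from a model with k = 1 independent variable.
observed = [10.0, 12.0, 15.0, 19.0, 24.0]
fitted = [10.5, 12.5, 14.0, 19.5, 23.5]

se = standard_error_of_estimate(observed, fitted, k=1)
mean_y = sum(observed) / len(observed)
print(f"se = {se:.4f}, mean of y = {mean_y:.1f}, ratio = {se / mean_y:.3f}")
```

The ratio of se to the mean of y is only an informal yardstick; the coefficient of determination and the F test give a more direct assessment of fit.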
Coefficient of determination • The definition is R2 = 1 - SSE/SS(Total) • From the printout, R2 = 0.5251 • 52.51% of the variation in the measure of profitability is explained by the linear regression model formulated above. • When adjusted for degrees of freedom, Adjusted R2 = 1 - [SSE/(n-k-1)] / [SS(Total)/(n-1)] = 49.44%
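The adjusted figure can be checked from the reported R2 alone, since SSE/SS(Total) = 1 - R2 and so Adjusted R2 = 1 - (1 - R2)(n-1)/(n-k-1). The sketch below uses R2 = 0.5251, n = 100, and k = 6 from the example; the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

adj = adjusted_r_squared(0.5251, n=100, k=6)
print(f"Adjusted R2 = {adj:.4f}")
```

The adjustment penalizes the model for each additional independent variable, so the adjusted value is always at most R2.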
Testing the validity of the model • We pose the question: Is there at least one independent variable linearly related to the dependent variable? • To answer the question we test the hypothesis H0: b1 = b2 = … = bk = 0 H1: At least one bi is not equal to zero. • If at least one bi is not equal to zero, the model is valid.
To test these hypotheses we perform an analysis of variance procedure. • The F test • Construct the F statistic: F = MSR/MSE, where MSR = SSR/k and MSE = SSE/(n-k-1) • Rejection region: F > Fα,k,n-k-1 • [Variation in y] = SSR + SSE. A large F results from a large SSR. Then much of the variation in y is explained by the regression model, the null hypothesis should be rejected, and the model is valid. • Required conditions must be satisfied.
[Figure: two data points, (x1, y1) and (x2, y2), of a certain sample, plotted together with the regression line.] Total variation in y = Variation explained by the regression line + Unexplained variation (error)
Example 1 - continued • Excel provides the following ANOVA results. [ANOVA table: it reports SSR, SSE, MSR, MSE, and F = MSR/MSE.]
Example 1 - continued • Excel provides the following ANOVA results • Fα,k,n-k-1 = F0.05,6,100-6-1 = 2.17 • F = 17.14 > 2.17 • Also, the p-value (Significance F) = 3.03382 × 10^-13. Clearly, α = 0.05 > 3.03382 × 10^-13, and the null hypothesis is rejected. • Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the bi is not equal to zero. Thus, at least one independent variable is linearly related to y. This linear regression model is valid.
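The ANOVA quantities can be approximately reconstructed from figures quoted elsewhere in the example. The sketch below assumes se = 5.5121, R2 = 0.5251, n = 100, and k = 6, and uses SSE = se² (n-k-1) and SS(Total) = SSE/(1 - R2). Because those inputs are rounded, the sums of squares are approximations rather than the printout's exact values. The critical value 2.17 is the one quoted above.

```python
n, k = 100, 6
se = 5.5121          # standard error of estimate from the printout
r_squared = 0.5251   # coefficient of determination from the printout
f_critical = 2.17    # F(0.05; 6, 93) as quoted in the slides

df_error = n - k - 1
sse = se ** 2 * df_error              # unexplained variation
ss_total = sse / (1 - r_squared)      # total variation in y
ssr = ss_total - sse                  # variation explained by the regression

msr = ssr / k
mse = sse / df_error
f_statistic = msr / mse

print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, MSR = {msr:.2f}, MSE = {mse:.2f}")
print(f"F = {f_statistic:.2f}, reject H0: {f_statistic > f_critical}")
```

The reconstructed F agrees with the reported 17.14 to two decimals, which is a useful consistency check between R2, se, and the ANOVA table.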
Let us interpret the coefficients • b0 = 72.455 is the intercept, the value of y when all the variables take the value zero. Since the data ranges of the independent variables do not cover the value zero, do not interpret the intercept. • In this model, for each additional 1000 rooms within 3 miles of the La Quinta inn, the operating margin decreases on average by 7.6% (assuming the other variables are held constant).
In this model, for each additional mile between the nearest competitor and the La Quinta inn, the average operating margin decreases by 1.65%. • For each additional 1000 sq-ft of office space, the average increase in operating margin will be 0.02%. • For each additional thousand students, MARGIN increases by 0.21%. • For each additional $1000 in median household income, MARGIN decreases by 0.41%. • For each additional mile to the downtown center, MARGIN increases by 0.23% on average.