Lecture 11: Multiple Regression February 19th, 2014
Question Assuming there is no curve on the exam, how do you think you scored? (it was out of 100 excluding the bonus) • 90 or higher • 80-89 • 70-79 • 60-69 • 59 or lower
Administrative • Exam 1 • Results soon… • Homework 5 • Due Monday • Homework 4 – not so good. Lots of mistakes. • Don’t write "raw" answers, i.e. out of context, with no measurement units or relevant vocabulary used. • Quiz 3 – next Wednesday. • Multiple Regression
Multiple Regression Model • Direct and Indirect effects: • Income has a positive direct effect on sales but a negative indirect effect, via the number of competitors • Collinearity: correlation between the explanatory variables • High collinearity is a problem (we’ll talk about it later). • It can make your results hard to interpret. • Think about the “controlling for” aspect of interpreting the slopes. • In our example: income and competition are collinear
Multiple Regression Model • Compare the Simple Regression models to the Multivariate model: • Predict Sales by Median Household Income (in thousands of $) • r² = 0.501 • se = 74.87 • Predict Sales by Number of Competing stores in the same mall. • r² = 0.004 • se = 105.79 • Predict Sales by both Income and # of Competitors: • Why the change in Competitors?
Question The partial slope for an explanatory variable has to be smaller in absolute value than its marginal slope • True. • False. False: it might be smaller but doesn’t have to be; it depends on the size and sign of any indirect effects.
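The relationship between marginal and partial slopes can be checked numerically. The sketch below uses a small invented data set (the numbers are assumptions for illustration, not the lecture's sales data) and plain-Python least squares. It shows that the marginal slope of income equals its direct effect plus the indirect effect that runs through competitors, and that here the partial slope is the larger of the two in absolute value.

```python
# Hypothetical data, invented for illustration only (not the lecture's data).
income = [40, 50, 60, 70, 80, 90]        # median household income, $000s
competitors = [1, 2, 2, 4, 4, 5]         # competing stores in the same mall
noise = [3, -2, 1, -4, 2, 0]             # made-up disturbances
sales = [100 + 5 * i - 30 * c + e
         for i, c, e in zip(income, competitors, noise)]


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_slope(x, y):
    """Least-squares slope of y on x alone (a marginal slope)."""
    return cross(x, y) / cross(x, x)


def partial_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 jointly (partial slopes)."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


marginal = simple_slope(income, sales)             # sales on income only
b1, b2 = partial_slopes(income, competitors, sales)
d = simple_slope(income, competitors)              # competitors on income

print(f"marginal slope of income : {marginal:.3f}")
print(f"partial slope of income  : {b1:.3f}  (direct effect)")
print(f"indirect effect b2 * d   : {b2 * d:.3f}")
print(f"direct + indirect        : {b1 + b2 * d:.3f}")
```

Because the indirect effect is negative in this made-up example, the marginal slope comes out smaller than the partial slope; with an indirect effect of the same sign as the direct one, the ordering would reverse.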
Inference • Why the hypothesis testing of the intercept and slopes? • Recall that bi is our estimate of βi • This estimate is approximately normally distributed around βi. We never observe βi itself, only the estimate, so we ask whether the X variable really has an effect on (or some predictive power for) Y. In other words, we run a hypothesis test, typically of H0: βi = 0.
Be able to read regression output

Call:
lm(formula = growth ~ rev_coups + tradeshare + yearsschool, data = growth)

Residuals:
    Min      1Q  Median      3Q     Max
-3.5887 -0.9873 -0.0719  0.8364  5.7051

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0001517  0.6929193  ______   0.9998
rev_coups   -0.9637983  1.0245684  ______   0.3506
tradeshare   2.1642518  0.7501064  ______   0.0054 **
yearsschool  0.2213497  0.0883593  ______   0.0149 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.686 on 61 degrees of freedom
Multiple R-squared:  0.2468,    Adjusted R-squared:  0.2098
F-statistic: 6.664 on 3 and 61 DF,  p-value: 0.0005768

What is the t-statistic for the estimate of tradeshare?
Be able to read regression output What is the t-statistic for the estimate of tradeshare? 2.885
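The blanked "t value" column can be recovered from the two columns next to it: each t-statistic is the estimate divided by its standard error. A minimal sketch in Python (used here only as a calculator; the numbers are copied from the regression output, and the dictionary name `coefs` is just an illustrative choice):

```python
# (estimate, standard error) pairs copied from the coefficient table.
coefs = {
    "(Intercept)": (-0.0001517, 0.6929193),
    "rev_coups": (-0.9637983, 1.0245684),
    "tradeshare": (2.1642518, 0.7501064),
    "yearsschool": (0.2213497, 0.0883593),
}

# t value = estimate / standard error, testing H0: coefficient = 0.
t_values = {name: est / se for name, (est, se) in coefs.items()}

for name, t in t_values.items():
    print(f"{name:12s} t = {t:7.3f}")
```

The tradeshare row reproduces the 2.885 given as the answer; the same division fills in the other three blanks.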
Inference in MRM: F-test • Tests the explanatory power of the multivariate model as a whole • Conceptually = a test of the size of the R² • F-statistic = ratio of the variation explained by the fitted values (per explanatory variable) to the variance of the residuals. • Why do we not just look at the individual βs? • With lots of covariates, some will look significant by chance. • Still look at the t-statistics/p-values to test the individual βs
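The "test of the size of the R²" idea can be made concrete with the growth regression output shown earlier. A small sketch, assuming the standard formulas F = (R²/k) / ((1 − R²)/(n − k − 1)) and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), with R², k and the residual degrees of freedom read off that output:

```python
r2 = 0.2468        # Multiple R-squared from the output
k = 3              # explanatory variables: rev_coups, tradeshare, yearsschool
df_resid = 61      # residual degrees of freedom, n - k - 1
n = df_resid + k + 1

# Explained variation per explanatory variable over residual variation per df.
f_stat = (r2 / k) / ((1 - r2) / df_resid)

# Adjusted R-squared penalizes R-squared for the number of slopes estimated.
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid

print(f"n = {n}")
print(f"F = {f_stat:.3f} on {k} and {df_resid} DF")
print(f"adjusted R-squared = {adj_r2:.4f}")
```

The F value matches the reported 6.664 up to the rounding of R² in the printout, and the adjusted R² matches 0.2098.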
Question The partial slope in a multiple regression corresponds to the direct effect in a path diagram • True • False True.
Multiple Regression Example Example: Subprime mortgages. Subprime.csv A banking regulator would like to verify how lenders use credit scores to determine the interest rate paid by subprime borrowers. The regulator would like to separate its effect from other variables such as loan-to-value (LTV) ratio, income of the borrower and value of the home. • Use multiple regression on data obtained for 372 mortgages from a credit bureau. Predict the annual percentage rate of interest on the loan (APR) by the LTV, credit score, income of the borrower, and home value.
Multiple Regression Example Example: Subprime mortgages. Subprime.csv MRM checklist: • Look at the data (scatterplot / correlation matrix) • Transform any variables? • Fit the regression model • Examine residuals and fitted values from the regression • Look at calibration plot (residual by fitted, or the y-hats). • Look at residual plots by the various explanatory variables • Are the residuals normally distributed? • Examine the F-statistic • Test / interpret the partial slopes.
Multiple Regression Example Example: Subprime mortgages. Subprime.csv MRM checklist: • Look at the data (scatterplot / correlation matrix) Construct a correlation matrix. What is the correlation between LTV and APR? • -0.3512 • -0.4265 • 0.2513 • No idea
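For reference, a correlation matrix is just the pairwise Pearson correlations of the columns. The sketch below builds one in plain Python. The column names and values are invented placeholders, since the contents of Subprime.csv are not reproduced here, so the printed numbers will not match any of the answer choices above.

```python
import math

# Hypothetical columns, invented for illustration (not from Subprime.csv).
columns = {
    "LTV": [0.60, 0.75, 0.80, 0.90, 0.95],
    "CreditScore": [700, 640, 620, 580, 560],
    "APR": [7.5, 9.0, 9.8, 11.2, 12.0],
}


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


names = list(columns)
corr = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Print the matrix with one row and one column per variable.
print(" " * 12 + "".join(f"{b:>12s}" for b in names))
for a in names:
    print(f"{a:12s}" + "".join(f"{corr[a][b]:12.4f}" for b in names))
```

The matrix is symmetric with ones on the diagonal, so only the entries below (or above) the diagonal need to be read.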
Multiple Regression Example Example: Subprime mortgages. Subprime.csv MRM checklist: • Look at the data (scatterplot / correlation matrix) • Fairly linear and no obvious variables to include • Transform any variables? • None really needed. Proceed.
Multiple Regression Example Example: Subprime mortgages. Subprime.csv MRM checklist: • Fit the regression model The partial slope on Credit Score is? • -0.0184 • -0.1936 • 0.0635 • -1.5888
Multiple Regression Example Example: Subprime mortgages. Subprime.csv MRM checklist: • Examine the residuals. Is the variance of the residuals similar across APR estimates? • Yes • No • I have no idea.
Multiple Regression Example Example: Subprime mortgages. Subprime.csv MRM checklist: • Examine the residuals. Are the residuals distributed normally? • Yes • No • I have no idea.
Multiple Regression Example Example: Subprime mortgages. Subprime.csv MRM checklist: • F-statistic / F-Ratio • Test / interpret the partial slopes. • Know the range and units of the variables!
Question If we reject H0: β1 = β2 = 0 using the F-test, then we should conclude that both slopes are different from zero. • True • False False: we should only conclude that there is some deviation from the hypothesis. Both slopes could be different from zero, but it could also be that only one of them is.
Next time • Collinearity • Model building. • After Spring Break: categorical variables.