
Welcome to BUAD 310


Presentation Transcript


  1. Welcome to BUAD 310 Instructor: Kam Hamidieh Lecture 25, Monday April 28, 2014

  2. Agenda & Announcements Today: • Go over exam 2 briefly • Continue with multiple regression • Extra office hours this week: Thursday May 1, 3-5 PM. Reminders: • Case study due on Wednesday April 30 by 5 PM. • Homework 7, now posted, is due on Friday, May 2, 2014. • Final exam on Thursday May 8, 11 AM – 1:00 PM, in room THH 101. See http://web-app.usc.edu/maps/

  3. From Last Time Checking Assumptions: • Y is linear in each predictor: look at the plot of Y against each X, and plot the residuals versus each X and versus the fitted values. • Constant variance assumption: plot the residuals versus each X and versus the fitted values. • Normality assumption: look at the histogram & Q-Q plot of the residuals. • Independence assumption: if the residuals could have time or spatial dependency, plot them in order. Also: watch for unusual points.
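These checks are easy to script. Below is a minimal sketch in Python (statsmodels and matplotlib); the data are synthetic stand-ins, so the variable names are purely illustrative:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Synthetic stand-in data (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3 + 2 * x + rng.normal(0, 1, 100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
resid = fit.resid

# Linearity / constant variance: residuals vs. fitted values and vs. x
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(fit.fittedvalues, resid)
axes[0].set(xlabel="Fitted values", ylabel="Residuals")
axes[1].scatter(x, resid)
axes[1].set(xlabel="x", ylabel="Residuals")

# Normality: histogram and Q-Q plot of the residuals
plt.figure(); plt.hist(resid, bins=15)
sm.qqplot(resid, line="s")

# Independence: plot the residuals in collection (time) order
plt.figure(); plt.plot(resid, marker="o")
plt.show()
```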

  4. Transformations • Transformation: re-expression of a variable by applying a function to each observation. • Transformations allow the use of regression analysis to describe a curved pattern and can improve your model (better residual behavior). • You can transform Y or X or both, but the interpretation becomes more difficult. • Looking at the plots of Y vs X and the distributions of your variables can help you pick the right transformations. • A nonlinear transformation useful in business applications: logarithms.
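As a small illustration of the log idea (synthetic data, illustrative names only), compare a fit on the original scale with a fit on the log scale:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative curved relationship: y grows multiplicatively with x
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 100)
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.2, 100))

X = sm.add_constant(x)
raw_fit = sm.OLS(y, X).fit()          # linear fit on the original scale
log_fit = sm.OLS(np.log(y), X).fit()  # log-transform the response

print(raw_fit.rsquared, log_fit.rsquared)
# Interpretation changes on the log scale: a one-unit increase in x is
# associated with roughly a 100*b1 percent change in y (for small b1).
```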

  5. Collinearity - Example The file “testdata.txt” on our website contains the data shown on the right. The data were synthetically generated from: Y = 25 – 5 X1 + N(0, sd = 10), with X1 = 1, 2, …, 20, and X2 the same as X1 except for the last point.
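A sketch of how data like this could be generated in Python; the random draw and the tweak to the last point are illustrative, so the values will not match testdata.txt exactly:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(310)
x1 = np.arange(1, 21)                     # X1 = 1, 2, ..., 20
y = 25 - 5 * x1 + rng.normal(0, 10, 20)   # Y = 25 - 5*X1 + N(0, sd=10)

x2 = x1.astype(float)                     # X2 identical to X1 ...
x2[-1] = x1[-1] + 1                       # ... except the last point (illustrative change)

df = pd.DataFrame({"Y": y, "X1": x1, "X2": x2})
print(df.corr().round(3))                 # X1 and X2 are almost perfectly correlated
```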

  6. Pairs Plot • Comments? • Do you think there is a linear relationship between Y and X1? • How about Y and X2? • Will the R2 be small or large? • Will the p-value for the F-stat be small or large? • Do you expect each of the predictors X1 and X2 to be statistically significant?

  7. Results of Simple Linear Regressions The results of the simple linear regressions are “good”.

  8. Multiple Regression Results What ?!?!?! Do the results make sense?

  9. Multicollinearity • The problem of multicollinearity: if you have two or more predictor variables that are highly correlated with each other, all of the regression results can become very unreliable. • How to detect it: • Examine the pairs plot and correlation matrix; if you see correlations of 0.9 or higher, you should suspect multicollinearity. • A high F-stat value, but none of the predictors is significant. • Standard errors seem very large. • A more quantitative approach: compute the variance inflation factor, or VIF. • Others…
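A brief sketch of the first two quick checks in Python, reusing synthetic data in the spirit of the example above (statsmodels' variance_inflation_factor is shown as a convenience; the names and data are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic data along the lines of the testdata.txt example
rng = np.random.default_rng(310)
x1 = np.arange(1, 21)
df = pd.DataFrame({"Y": 25 - 5 * x1 + rng.normal(0, 10, 20),
                   "X1": x1.astype(float),
                   "X2": np.r_[x1[:-1], x1[-1] + 1].astype(float)})

# 1) Correlation matrix: predictor correlations of 0.9+ are a warning sign
print(df[["X1", "X2"]].corr())

# 2) VIFs, one per predictor (column 0 is the constant, so skip it)
X = sm.add_constant(df[["X1", "X2"]])
vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
print(dict(zip(["X1", "X2"], np.round(vifs, 1))))
```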

  10. Variance Inflation Factor • Suppose you have a multiple regression model with k predictors. • The variance inflation factor (VIF) for Xj, j = 1, 2, …, k, is defined as VIFj = 1/(1 − Rj2), where Rj2 is the R2 in the regression of Xj on all of the other predictor variables. (No Y involved.) • Why is this a good idea?
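To make the definition concrete, here is a small sketch that computes a VIF exactly as defined: regress Xj on the other predictors (no Y involved) and take 1/(1 − Rj2). The helper function and data are illustrative, not from the course materials:

```python
import numpy as np
import statsmodels.api as sm

def vif_by_definition(X, j):
    """VIF for column j of predictor matrix X: regress X[:, j] on the other columns."""
    others = np.delete(X, j, axis=1)
    r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
    return 1.0 / (1.0 - r2)

# Illustrative predictors: x2 is nearly a copy of x1, x3 is unrelated
rng = np.random.default_rng(2)
x1 = np.arange(1.0, 21.0)
x2 = x1 + rng.normal(0, 0.1, 20)
x3 = rng.uniform(0, 1, 20)
X = np.column_stack([x1, x2, x3])

print([round(vif_by_definition(X, j), 1) for j in range(X.shape[1])])
# x1 and x2 get very large VIFs; x3 stays near 1.
```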

  11. More on VIF • It can be shown that se(bj) = [ se / ( sxj √(n − 1) ) ] × √VIFj, where se is the regression's residual standard error and sxj is the standard deviation of Xj. • As Rj2 gets close to 1, se(bj) gets bigger and bigger… gets inflated! • Don’t need to put this formula on your cheat sheet. • …hence the name variance inflation factor.

  12. Guidelines on VIF • What is the range of values for VIF? • What does a VIF near 1 mean? • Cutoffs of VIF = 5 or 10 are most often used to identify danger.

  13. Our Example Here R2 = 0.9988 (from regressing one of the two predictors on the other). What is the VIF? VIF = 1/(1 – 0.9988) ≈ 833. This is extremely large!

  14. What to do then? Here are some solutions: • Amputation! Remove the redundant variables. Variable selection methods can help a lot. • Re-express the predictors. For example: if it makes sense, you can create a new predictor by averaging two highly correlated predictors. Example?
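A minimal pandas sketch of the averaging idea, assuming two predictors on the same scale (the column names and numbers are made up):

```python
import pandas as pd

# Hypothetical data with two nearly redundant predictors on the same scale,
# e.g. two very similar test scores
df = pd.DataFrame({
    "Y":      [10, 12, 15, 18, 20],
    "score1": [55, 60, 66, 71, 75],
    "score2": [54, 61, 65, 72, 74],
})

# Replace the correlated pair with a single averaged predictor
df["score_avg"] = df[["score1", "score2"]].mean(axis=1)
df = df.drop(columns=["score1", "score2"])
print(df)
```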

  15. From Our Real Estate Example VIF for LTV: 1/(1 − 0.2369) ≈ 1.3 VIF for HomeValue: 1/(1 − 0.0558) ≈ 1.1 VIF for StatedIncome: 1/(1 − 0.0451) ≈ 1.0 VIF for CreditScore: 1/(1 − 0.2446) ≈ 1.3 Multicollinearity is not a problem here.

  16. In Class Exercise 1 • Comment on the following statements. Do you agree or disagree, and why? • The presence of multicollinearity violates an assumption of the multiple regression model. • In order to calculate the VIF for a predictor, we need to use the values of the response. • An analyst would like to build a regression model to predict Y from X1, X2, X3, and X4. She looks at the correlation matrix below: • Do you see a pair of variables that could potentially cause a problem in her regression? Why? • What is the VIF for X2?

  17. Confidence and Prediction Intervals • The fitted value of the response corresponding to a particular combination of values of the independent variables X1, …, Xk is ŷ = b0 + b1x1 + … + bkxk. • We use this value as an estimate for the mean (or for a future value) of y when X1 = x1, …, Xk = xk, but our estimate will not be exactly right. • Therefore, we need to place bounds on how far this guess might be from the truth. • We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y.

  18. Which to Choose? • Use the prediction interval (PI) when you want to predict an individual value of the response variable. • Use the confidence interval (CI) when you want to estimate the mean value of the response. • Note: the prediction interval will always be wider than the confidence interval (given the same values of the explanatory variables).

  19. Example: Women’s Clothing Stores Our variables of interest: Y = annual sales at the stores of a chain of women’s apparel (dollars per square foot of retail space) X1 = median household income in the area (thousands of dollars) X2 = number of competing apparel stores in the same mall. Goal: Predict sales at the stores of this chain. Data: “23_mall_sales.txt”, n = 65

  20. Regression Results Estimated Mean Sales = 60 + 7.96 (Income) – 24.17 (Competitors), with residual standard error se = 68.03
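If the data file is at hand, a hedged sketch of fitting this model in Python with statsmodels (the column names Sales, Income, Competitors and the tab-delimited format are assumptions about “23_mall_sales.txt”):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Column names and the delimiter are assumptions; adjust to the actual file.
mall = pd.read_csv("23_mall_sales.txt", sep="\t")

fit = smf.ols("Sales ~ Income + Competitors", data=mall).fit()
print(fit.params)             # should be near 60, 7.96, -24.17 per the slide
print(fit.mse_resid ** 0.5)   # residual standard error, about 68 per the slide
print(fit.summary())
```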

  21. Prediction Intervals • Suppose you want to create a prediction interval at a location with a median income of $70,000 and 3 competitors nearby. • Your best point estimate for the mean sales and for an individual value will be the same: 60 + 7.96 (70) – 24.17 (3) ≈ 545 $/(square foot). • However, the widths of the two intervals will be different.

  22. Using Software to Get CI & PI • Your book gives an approximate (95%) formula for the prediction interval (see page 614; you can use the result on that page for your project as well). • However, in practice, let reliable software do it. • Software gives: • 95% CI for the mean response: (527, 564) • 95% PI for an individual response: (409, 683)
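Continuing the earlier sketch (same assumed column names), statsmodels' get_prediction returns both intervals at once:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same assumptions about the file layout as before
mall = pd.read_csv("23_mall_sales.txt", sep="\t")
fit = smf.ols("Sales ~ Income + Competitors", data=mall).fit()

# Median income of $70,000 (coded as 70, since income is in thousands) and 3 competitors
new = pd.DataFrame({"Income": [70], "Competitors": [3]})
pred = fit.get_prediction(new).summary_frame(alpha=0.05)

print(pred["mean"])                              # point estimate, about 545 $/sq ft
print(pred[["mean_ci_lower", "mean_ci_upper"]])  # 95% CI for the mean response
print(pred[["obs_ci_lower", "obs_ci_upper"]])    # 95% PI for an individual store
```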
