
Welcome to BUAD 310



Presentation Transcript


  1. Welcome to BUAD 310 Instructor: Kam Hamidieh Lecture 23, Monday April 21, 2014

  2. Agenda & Announcement • Today: • Continue with multiple regression • Talk about the Case Study due on Wednesday April 30th • Pass back the exams & talk about the exam (time permitting) • Homework 7 will be posted soon. It is due Friday May 2, 5 PM. • Reading: • Read all of Chapter 23 carefully, but you can skip the path diagram material. • Read all of Chapter 24, but you can lightly read the topic of VIF (Variance Inflation Factor). BUAD 310 - Kam Hamidieh

  3. Some Important Dates • Case Study due on Wednesday April 30 • Homework 7 due on Friday, May 2, 2014 • Final Exam on Thursday May 8th, 11:00 AM – 1:00 PM, in room THH 101. See http://web-app.usc.edu/maps/ (I recommend you scope out the location before the exam.) BUAD 310 - Kam Hamidieh

  4. Some Fun Stuff http://blogs.wsj.com/atwork/2014/04/15/best-jobs-of-2014-congratulations-mathematicians/?mod=e2fb (Jake S. and William C.) http://fivethirtyeight.com/features/the-toolsiest-player-of-them-all/ (Joshua C.) BUAD 310 - Kam Hamidieh

  5. Multiple Regression Model • The observed response Y is linearly related to k explanatory variables X1, X2, …, Xk by the equation: Y = B0 + B1X1 + B2X2 + … + BkXk + ε • The values of B0, B1, …, Bk are estimated via the least squares method: pick b0, b1, …, bk so the quantity below is as small as possible: Σi [yi – (b0 + b1xi1 + b2xi2 + … + bkxik)]² BUAD 310 - Kam Hamidieh
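
A minimal sketch of this fit in Python, using made-up data (the subprime data set itself is not reproduced here); np.linalg.lstsq picks the b's that minimize the sum of squared deviations:

```python
import numpy as np

# Hypothetical data standing in for the real predictors and response.
rng = np.random.default_rng(0)
n, k = 372, 4                        # sample size and number of predictors
X = rng.normal(size=(n, k))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(scale=1.2, size=n)

# Prepend a column of ones so b[0] plays the role of the intercept b0.
X1 = np.column_stack([np.ones(n), X])

# lstsq returns the b that minimizes sum((y - X1 @ b)**2).
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("b0..b4:", np.round(b, 3))
```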

  6. Model Residuals • Residuals are defined just like in the simple linear regression case: residual = observed – fitted. • The official formula: ei = yi – ŷi, where ŷi = b0 + b1xi1 + … + bkxik is the fitted value. BUAD 310 - Kam Hamidieh
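
In code, the residuals drop straight out of the observed and fitted values; a tiny sketch with made-up numbers:

```python
import numpy as np

# Made-up observed responses and their fitted values from some regression.
y     = np.array([10.1,  9.8, 11.2, 10.5,  9.9, 10.7])
y_hat = np.array([10.0, 10.0, 11.0, 10.4, 10.1, 10.5])

residuals = y - y_hat    # residual = observed - fitted
print(np.round(residuals, 2))
```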

  7. Previous Example b0 ≈ 23.73, b1 ≈ -1.59, b2 ≈ -0.018, b3 ≈ 0.0004, b4 ≈ -0.00075 n – k – 1 = 372 – 4 – 1 = 367 (k = # of predictors) MSE = 1.55, Se = 1.24 (estimate of σε), SSE = 567.80 APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue) BUAD 310 - Kam Hamidieh

  8. Solution to In Class Exercise 1 from Lecture 21 Part (1) (1) Response: Y = 10.07; Predictors: LTV = 0.942, CreditScore = 640, StatedIncome = 100000, HomeValue = 305000 (2) Y10 = 12.87, X2,4 = 450000, X11,3 = 70000 Part (2) When stated income goes up by $1000, while holding all other predictors fixed, on average APR goes up by 0.0004 percentage points. APR = 23.73 – 1.59(1/2) – 0.018(600) + 0.0004(10) – 0.00075(200) ≈ 12% BUAD 310 - Kam Hamidieh
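
The plug-in arithmetic in Part (2) is easy to check directly. The helper function below is purely illustrative, using the units the slide implies (LTV as a fraction, StatedIncome and HomeValue in thousands of dollars):

```python
# Fitted coefficients from the subprime example.
b0, b1, b2, b3, b4 = 23.73, -1.59, -0.018, 0.0004, -0.00075

def apr(ltv, credit_score, stated_income, home_value):
    # stated_income and home_value are in thousands of dollars
    return (b0 + b1 * ltv + b2 * credit_score
            + b3 * stated_income + b4 * home_value)

# Part (2): LTV = 1/2, CreditScore = 600, income $10k, home value $200k.
print(round(apr(0.5, 600, 10, 200), 2))   # about 12 (%)
```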

  9. Solution to In Class Exercise 2 from Lecture 21 (1) Observed Y = 10.07 Fitted Y (APR) = 23.73 – 1.59(0.942) – 0.018(640) + 0.0004(100) – 0.00075(305) = 10.52 Residual = 10.07 – 10.52 = -0.45 (2) Same units as APR, so in % (3) -0.45/1.242 ≈ -0.36, i.e. about 0.36 standard deviation units below the estimated equation BUAD 310 - Kam Hamidieh
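
A quick check of the residual arithmetic, with the values copied from the slide:

```python
# Observed APR, fitted APR, and Se for the exercise observation.
y_obs, y_fit, s_e = 10.07, 10.52, 1.242

residual = y_obs - y_fit
print(round(residual, 2))          # -0.45
print(round(residual / s_e, 2))    # about -0.36, i.e. 0.36 Se units below the fit
```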

  10. Partition of the Total Variability • Y values have variability. • One way to measure this variability is to see how your Y values vary around the overall mean of the Y's. • It can be shown – not at all obvious! – that the total variation in the Y's is accounted for by the regression (AKA the model) plus the leftovers, or residuals ("errors"): Σ(yi – ȳ)² = Σ(ŷi – ȳ)² + Σ(yi – ŷi)² BUAD 310 - Kam Hamidieh

  11. Partition of the Total Variability SST = SSR + SSE • SST = Sum of Squares Total: total variation in the Y values, Σ(yi – ȳ)² • SSR = Sum of Squares Regression: variation accounted for by the regression, Σ(ŷi – ȳ)² (SSM is used too!) • SSE = Sum of Squares Error: leftover variation, Σ(yi – ŷi)² BUAD 310 - Kam Hamidieh
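
A numeric check on simulated data that SST = SSR + SSE; the identity holds because the fitted values come from a least squares fit with an intercept (it would not hold for arbitrary fitted values):

```python
import numpy as np

# Simulate a small data set and fit it by least squares.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

SST = np.sum((y - y.mean()) ** 2)      # total variation
SSR = np.sum((y_hat - y.mean()) ** 2)  # explained by the regression
SSE = np.sum((y - y_hat) ** 2)         # leftover (residual) variation
print(np.allclose(SST, SSR + SSE))     # True
```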

  12. Summarizing Results in a Table The ANOVA table organizes the pieces:

Source      df         SS    MS
Regression  k          SSR   MSR = SSR/k
Error       n – k – 1  SSE   MSE = SSE/(n – k – 1)
Total       n – 1      SST

MSR = Mean Square (due to) Regression = SSR/k BUAD 310 - Kam Hamidieh

  13. (Multiple) Coefficient of Determination • The coefficient of determination R² is defined as: R² = SSR/SST = 1 – SSE/SST • Its value tells us the percentage of variation in your response values accounted for (or explained) by the regression onto your predictor values. • What is the difference between r² from simple linear regression and R² from multiple regression? BUAD 310 - Kam Hamidieh

  14. Summarizing Results in a Table About 46% of the variation in the APR values is accounted for (or explained) by the regression onto the predictor variables LTV, …, HomeValue. BUAD 310 - Kam Hamidieh
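
With the sums of squares in hand, R² is one line of arithmetic. In the sketch below, SSE = 567.80 is the value from the output, but SST was shown only in an image, so the value here is an assumption back-solved to match the reported 46%:

```python
SSE = 567.80     # from the regression output
SST = 1050.0     # assumed, chosen to be consistent with R^2 of about 0.46
SSR = SST - SSE

R2 = SSR / SST   # equivalently: 1 - SSE / SST
print(round(R2, 2))   # 0.46
```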

  15. Issues! • It can be shown that adding more variables to the model will always inflate R². (See page 621 of your book for an intuitive discussion.) • Remedy: use the adjusted R²: Adjusted R² = 1 – [SSE/(n – k – 1)] / [SST/(n – 1)] = 1 – (1 – R²)(n – 1)/(n – k – 1) • The adjusted R² compensates for this issue. HOW/WHY? • The adjusted R² also makes it easier to compare models. (More on this later.) • However, the "% variation accounted for" interpretation does not apply to the adjusted R². BUAD 310 - Kam Hamidieh

  16. Adjusted R Squared [Regression output highlighting the adjusted R² for the subprime model.] Verify the formula! BUAD 310 - Kam Hamidieh
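
One way to verify it is to compute both algebraic forms of the adjusted R² and confirm they agree; SST is again the assumed value from above:

```python
n, k = 372, 4
SSE, SST = 567.80, 1050.0    # SSE from the output; SST assumed as before

R2 = 1 - SSE / SST
adj_a = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))
adj_b = 1 - (1 - R2) * (n - 1) / (n - k - 1)
print(round(adj_a, 4), round(adj_b, 4))   # the two forms match
```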

  17. The F-Test • If the multiple regression seems reasonable, one of the first "tests" you usually carry out is the "F-Test": H0: B1 = B2 = … = Bk = 0 Ha: At least one of the Bi's ≠ 0 • Informally, the null says "the predictors are useless" vs. the alternative "at least one of the predictors is useful." • The test statistic is F = MSR/MSE. BUAD 310 - Kam Hamidieh

  18. Regression ANOVA Table The ANOVA table reports the F statistic and its p-value. Since the p-value < 0.05, we see that at least one of the predictors is significant. BUAD 310 - Kam Hamidieh
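
The F statistic and its p-value can be reproduced from the sums of squares; SST is again the assumed value from above, and note that MSE = 567.80/367 ≈ 1.55 matches the earlier output:

```python
from scipy import stats

n, k = 372, 4
SSE, SST = 567.80, 1050.0    # SSE from the output; SST assumed
SSR = SST - SSE

MSR = SSR / k                # mean square for regression
MSE = SSE / (n - k - 1)      # mean squared error, about 1.55
F = MSR / MSE
p_value = stats.f.sf(F, k, n - k - 1)   # upper-tail area under F(k, n-k-1)
print(round(F, 1), p_value)  # a large F and a tiny p-value
```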

  19. Many Thanks to… Ronald Fisher, one of the "giants" of statistics. Many things are named after him. From Wikipedia: Anders Hald called him "a genius who almost single-handedly created the foundations for modern statistical science", while Richard Dawkins named him "the greatest biologist since Darwin". BUAD 310 - Kam Hamidieh

  20. In Class Exercise 1 • This will be handed out in class. BUAD 310 - Kam Hamidieh

  21. Looking at Individual Coefficients • We want to determine the statistical significance of a single predictor in the model. Why? • For the jth predictor we want to test: H0: Bj = 0 Ha: Bj ≠ 0 • We have two options: • Get a p-value • Get a confidence interval for Bj BUAD 310 - Kam Hamidieh

  22. Looking at Individual Coefficients For testing H0: Bj = 0 versus Ha: Bj ≠ 0: • Use the output to get the test statistic t = bj / se(bj), compute the p-value from a t-distribution with df = n – k – 1, and compare with your α • Create a 100(1 – α)% CI: bj ± tα/2 se(bj), where tα/2 comes from a t-distribution with df = n – k – 1 BUAD 310 - Kam Hamidieh
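
A sketch of both options for a single coefficient. Here bj = -1.59 is the LTV estimate from the earlier output, but its standard error appeared only in an image, so se(bj) = 0.30 is an assumption for illustration:

```python
from scipy import stats

n, k = 372, 4
df = n - k - 1                   # 367
b_j, se_bj = -1.59, 0.30         # se(b_j) assumed for illustration

t_stat = b_j / se_bj                        # test statistic for H0: B_j = 0
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value
t_crit = stats.t.ppf(0.975, df)             # multiplier for a 95% CI
ci = (b_j - t_crit * se_bj, b_j + t_crit * se_bj)
print(round(t_stat, 2), p_value, ci)
```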

  23. Our Example, P-Values The output shows se(bj), the t statistic, and the p-value for each of the LTV, CreditScore, StatedIncome, and HomeValue variables. How about 95% confidence intervals? BUAD 310 - Kam Hamidieh

  24. Looking at Individual Coefficients • Looking at the previous slide, we see that LTV and CreditScore are statistically significant predictors. • Should we throw away the non-significant predictors? • Important: The tests for the individual regression coefficients (or predictors) assess the statistical significance of each predictor variable assuming that all other predictors are included in the regression. • It’s possible that you throw away a non-significant predictor, and your results for other predictors change! BUAD 310 - Kam Hamidieh

  25. Variable Selection • Variable selection is intended to select the “best” subset of predictors. • Motivation: • We want to select the simplest model that gets the job done. • We can avoid “multicollinearity”. More on this later. • Practical matters! Like what? • Can we simplify our subprime model? BUAD 310 - Kam Hamidieh

  26. Variable Selection Methods • Entire books are written on variable selection! • Here's the simplest method, called backward elimination (a code sketch follows below): • Start with the largest model (it has all the predictors) • Remove the predictor with the largest p-value greater than αcrit, which is usually set around 0.10 to 0.20. (Why not 0.05?) • Stop when all non-significant predictors have been removed. • What happens in our example? BUAD 310 - Kam Hamidieh
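
A minimal sketch of backward elimination, assuming the statsmodels package and predictors in a pandas DataFrame (the subprime data are not reproduced, so a tiny synthetic demo stands in):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(X, y, alpha_crit=0.10):
    """Repeatedly drop the predictor with the largest p-value above
    alpha_crit, refitting the model after each removal."""
    cols = list(X.columns)
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = fit.pvalues.drop("const")    # ignore the intercept
        worst = pvals.idxmax()
        if pvals[worst] <= alpha_crit:
            break                            # everything left is significant
        cols.remove(worst)
    return cols

# Synthetic demo: x3 is pure noise and should be the one eliminated.
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
y = 1.0 + 2.0 * X["x1"] - 1.5 * X["x2"] + rng.normal(size=200)
print(backward_eliminate(X, y))              # typically ['x1', 'x2']
```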

  27. Backward Elimination StatedIncome & HomeValue are removed. BUAD 310 - Kam Hamidieh

  28. Full Model (Left) vs. New Model (Right) APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue) APR = 23.69 - 1.58(LTV) - 0.019(CreditScore) In Summary: The remaining coefficients in the new model do not change much. Se and R² go down only slightly. BUAD 310 - Kam Hamidieh

  29. Other Variable Selection • Forward selection: add in the variable with the lowest p-value first (the opposite of backward) • Criterion based: pick the model with the best "criterion", such as adjusted R squared. • All subsets!!! Try out every single combination and pick the model with the best "criterion". You can use adjusted R squared as an example. • The cutting edge seems to be the LASSO = Least Absolute Shrinkage and Selection Operator (take more stats!); a tiny sketch follows below. BUAD 310 - Kam Hamidieh
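
For a taste of the LASSO, a minimal scikit-learn sketch (the library choice is an assumption; the course itself does not cover this). The L1 penalty shrinks some coefficients exactly to zero, so estimation and variable selection happen at once:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=200)   # x3, x4 irrelevant

lasso = Lasso(alpha=0.1).fit(X, y)   # alpha sets the penalty strength
print(np.round(lasso.coef_, 2))      # the last two coefficients land at/near 0
```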

  30. In Class Exercise 2 This is just the continuation of in class exercise 1. BUAD 310 - Kam Hamidieh
