Welcome to BUAD 310

Welcome to BUAD 310. Instructor: Kam Hamidieh. Lecture 21, Wednesday, April 9, 2014. Agenda & Announcement. Today: finish up the problem from last time and finish off Simple Linear Regression; start Multiple Regression, Chapter 23. Homework 6 is due today at 5 PM. About Exam II.

Presentation Transcript


  1. Welcome to BUAD 310 Instructor: Kam Hamidieh Lecture 21, Wednesday April 9, 2014

  2. Agenda & Announcement • Today: • Finish up the problem from last time & finish off Simple Linear Regression • Start Multiple Regression, Chapter 23 • Homework 6 is due today at 5 PM. BUAD 310 - Kam Hamidieh

  3. About Exam II • NO CELL PHONES ARE ALLOWED. • Two cheat sheets allowed, both sides, hand written. • In class this Wednesday April 16. • Coversheet will be posted by Monday, 33 questions • Print z, and t tables and bring them with you. • Coverage: Lecture 12, March 3 to the end of lecture 21 (minus multiple regression), April 9, and HW 4, 5, & 6 • All Exam II relevant material will be posted by tomorrow morning. • Scantrons passed out Monday, fill out before the exam, do not bend it! • We will review all of Monday. • Extended office hours: • Monday April 14: 4-6 PM • Tuesday April 15: 2-6 PM BUAD 310 - Kam Hamidieh

  4. CI and Tests for B1 To test H0: B1 = 0 vs. Ha: B1 ≠ 0: (1) The 100(1-α)% confidence interval for B1 is $b_1 \pm t_{\alpha/2}\, se(b_1)$, where $t_{\alpha/2}$ comes from a t distribution with df = n-2. Or (2) Compute the test statistic $t = b_1 / se(b_1)$, then get the p-value from a t distribution with df = n-2. BUAD 310 - Kam Hamidieh
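As a rough illustration of slide 4, the sketch below computes b1, se(b1), the t statistic, and a 95% confidence interval for a tiny made-up data set. The data, the helper name `fit_simple`, and the standard-error formula se(b1) = Se / sqrt(Σ(xi − x̄)²) are assumptions for illustration and do not come from the lecture. The critical value 3.182 is the t-table entry for df = 3 and α/2 = 0.025.

```python
from math import sqrt

def fit_simple(x, y):
    """Least-squares line for one predictor; returns (b0, b1, se_b1, s_e)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))   # residual standard deviation, df = n - 2
    se_b1 = s_e / sqrt(sxx)     # standard error of the slope
    return b0, b1, se_b1, s_e

# Made-up data (hypothetical): n = 5, so df = 3.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, se_b1, s_e = fit_simple(x, y)
t_stat = b1 / se_b1             # test statistic for H0: B1 = 0
t_crit = 3.182                  # from a t table: df = 3, two-sided 95%
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(f"b1 = {b1:.3f}, se(b1) = {se_b1:.4f}, t = {t_stat:.2f}")
print(f"95% CI for B1: ({ci[0]:.3f}, {ci[1]:.3f})")
```

If the interval excludes 0 (equivalently, if |t| exceeds the critical value), H0 is rejected at the 5% level. In practice the p-value would come from software or a t table rather than from this sketch.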

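  5. CI for Mean Response The 100(1-α)% confidence interval for the mean response at xnew is $\hat{y}_{new} \pm t_{\alpha/2}\, se(\hat{y}_{new})$, where $t_{\alpha/2}$ comes from a t distribution with df = n-2, and $se(\hat{y}_{new}) = s_e \sqrt{\dfrac{1}{n} + \dfrac{(x_{new} - \bar{x})^2}{(n-1)\, s_x^2}}$. We will generally use software. BUAD 310 - Kam Hamidieh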
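The interval on slide 5 can be sketched in a few lines of Python. This is a minimal example on made-up data, using the standard textbook formula se(ŷ) = Se · sqrt(1/n + (xnew − x̄)² / Σ(xi − x̄)²), stated here as an assumption. The function name `mean_response_ci` is hypothetical, and the critical value 3.182 is the t-table entry for df = 3.

```python
from math import sqrt

def mean_response_ci(x, y, x_new, t_crit):
    """Return (y_hat, lower, upper) for the mean of Y at x_new."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))                        # df = n - 2
    y_hat = b0 + b1 * x_new
    se_fit = s_e * sqrt(1 / n + (x_new - xbar) ** 2 / sxx)
    return y_hat, y_hat - t_crit * se_fit, y_hat + t_crit * se_fit

# Made-up data (hypothetical): n = 5, so df = 3 and t_crit = 3.182 for 95%.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

center = mean_response_ci(x, y, x_new=3, t_crit=3.182)  # at the mean of x
edge = mean_response_ci(x, y, x_new=5, t_crit=3.182)    # at the edge of the data

print("at x = 3:", tuple(round(v, 3) for v in center))
print("at x = 5:", tuple(round(v, 3) for v in edge))
```

The interval is narrowest at x̄ and widens as xnew moves away from it, which is why the second interval is wider than the first.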

  6. Outliers • “Outliers are observations that stand away from the rest of the data and appear distinct in a plot.” Imprecise! • They can have very strong influence in your final results. BUAD 310 - Kam Hamidieh

  7. Outliers r2 = 0.80, Se = 3.28 X = 1,2,…,20 BUAD 310 - Kam Hamidieh

  8. r2 = 0.80, Se = 3.28; r2 = 0.25, Se = 10; r2 = 0.29, Se = 9.7; r2 = 0.92, Se = 3.2; r2 = 0.26, Se = 6.1 BUAD 310 - Kam Hamidieh

  9. How to Deal with Outliers • There are NO hard and fast rules on how to deal with outliers except: you should not just throw out outliers without SOLID justification. • Check for data entry errors. (Not always possible!) • Examine the physical context. • Report your results with and without outliers. • Standardized residuals can help identify outliers too. • Transformations can help. (This will be discussed when we cover multiple regression.) BUAD 310 - Kam Hamidieh
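To make "report your results with and without outliers" concrete, here is a small sketch that computes r2 for a simple regression twice: once on clean data and once with a single outlier planted at the largest x. All numbers are made up for illustration (only X = 1, 2, …, 20 echoes the slides), and the function name `r_squared` is hypothetical.

```python
def r_squared(x, y):
    """r^2 for simple linear regression (the squared correlation)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Hypothetical data: X = 1, 2, ..., 20 with a nearly linear response.
x = list(range(1, 21))
y = [2 * xi + (1 if xi % 2 == 0 else -1) for xi in x]

# Same data, but the last response is replaced by a made-up outlier.
y_out = y[:-1] + [0]

r2_without = r_squared(x, y)
r2_with = r_squared(x, y_out)
print(f"r2 without outlier: {r2_without:.3f}")
print(f"r2 with outlier:    {r2_with:.3f}")
```

A single point far from the pattern at an extreme x value pulls r2 down sharply, which is the kind of influence the slides warn about.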

  10. Multiple Regression • Simple Linear Regression: • One Y and one X, fit a line that gives the mean of Y’s for a given X • Multiple regression: • One Y and multiple X’s, you have multiple predictors BUAD 310 - Kam Hamidieh

  11. Multiple Regression Model The observed response Y is linearly related to k explanatory variables X1, X2, …, Xk by the equation $y = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_k x_k + \varepsilon$. That is, a single value of the response comes from a linear combination of the k variables plus an error, where the errors are iid normal: $\varepsilon \sim N(0, \sigma_\varepsilon)$. Given fixed values of the X's, the mean of Y is equal to a linear combination of the X's at those fixed values: $E(Y \mid x_1, \dots, x_k) = B_0 + B_1 x_1 + \dots + B_k x_k$. BUAD 310 - Kam Hamidieh

  12. Assumption (Redundant Slide?) • Constant Variance Assumption: The variance of the error terms, σε², is the same for every combination of values of x1, x2,…, xk • Normality Assumption: The error terms follow a normal distribution for every combination of values of x1, x2,…, xk • Independence Assumption: The values of the error terms are statistically independent of each other BUAD 310 - Kam Hamidieh

  13. Simple versus Multiple • Simple regression: Data: (x1,y1), (x2,y2), …, (xn,yn). Assumed model: yi = B0 + B1 xi + εi, with εi ~ iid N(0,σε). Parameters: B0, B1, σε. • Multiple regression: Data: (y1, x11,x12,…,x1k), (y2, x21,x22,…,x2k), …, (yn, xn1,xn2,…,xnk). Assumed model: yi = B0 + B1 xi1 + B2 xi2 + … + Bk xik + εi, with εi ~ iid N(0,σε). Parameters: B0, B1, B2, …, Bk, σε. BUAD 310 - Kam Hamidieh

  14. Example (Page 615) • Defaults from the subprime housing market brought down several financial institutions in 2008 (Lehman, Bear Stearns, and AIG) and led to a massive bailout of the financial system. • Goal: A bank regulator wants to know how lenders are using credit scores to determine the rate of interest paid by subprime borrowers. • The variables of interest are: Y = APR, annual % rate on the loan; X1 = LTV, loan to value ratio, how much of the loan covers the value of the property (values near 0 are “good”, near 1 are “bad”); X2 = Credit Score (the higher the better); X3 = Income in 1000’s of dollars; X4 = Home value in 1000’s of dollars. • The data are n = 372 mortgages obtained from a credit bureau. • There are 4 predictors: k = 4. BUAD 310 - Kam Hamidieh

  15. Example [Data table image; annotations: Variable Names; y7, x71, x72, x73, x74; A row is one observation] BUAD 310 - Kam Hamidieh

  16. “Pairs Plot” BUAD 310 - Kam Hamidieh

  17. “Pairs Plot” APR seems linearly dependent on LTV and Credit Score and not so much on the other two. Looking at the relationship between predictors is a good idea too. BUAD 310 - Kam Hamidieh

  18. Pairwise Correlations BUAD 310 - Kam Hamidieh

  19. Pairwise Correlations Highest correlations are APR with LTV and Credit score. Why are some of the boxes empty? BUAD 310 - Kam Hamidieh

  20. Least Squares The values for B0, B1, …, Bk are estimated via the least squares method: pick b0, b1, …, bk so that $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - \dots - b_k x_{ik})^2$ is as small as possible. But where is the line? BUAD 310 - Kam Hamidieh

  21. Least Squares Method One response Y, two predictors X1 & X2. The method of least squares minimizes the sum of squared vertical distances between the points and a plane. (Picture from An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, Tibshirani) BUAD 310 - Kam Hamidieh
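For readers who want to see least squares with two predictors in action, the following is a minimal pure-Python sketch. It builds the normal equations (X'X) b = X'y and solves them by Gaussian elimination, which is one common way to get the minimizer and is not necessarily what the course software does. The function names and the data are made up for illustration.

```python
def fit_least_squares(xs, ys):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    xs: list of n rows, each a list of k predictor values.
    ys: list of n responses.
    """
    rows = [[1.0] + [float(v) for v in row] for row in xs]  # intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y] of the normal equations.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        s = sum(a[i][j] * b[j] for j in range(i + 1, p))
        b[i] = (a[i][p] - s) / a[i][i]
    return b

def sse(xs, ys, b):
    """Sum of squared vertical distances from the points to the fitted plane."""
    return sum((y - (b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)))) ** 2
               for row, y in zip(xs, ys))

# Made-up data lying exactly on the plane y = 1 + 2*x1 - 3*x2.
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in xs]

b = fit_least_squares(xs, ys)
print("coefficients:", [round(v, 6) for v in b])
print("SSE:", round(sse(xs, ys, b), 6))
```

Because the made-up points lie exactly on a plane, the fit recovers that plane and the sum of squared residuals is essentially zero. With real data the residuals would not vanish, and the fitted plane is simply the one with the smallest such sum.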

  22. Higher Dimensions? Ask him! He may know!!! BUAD 310 - Kam Hamidieh

  23. b0 ≈ 23.73, b1 ≈ -1.59, b2 ≈ -0.018, b3 ≈ 0.0004, b4 ≈ -0.00075 BUAD 310 - Kam Hamidieh

  24. Example Continued The estimated regression model now is: Estimated APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue). Note: y-hat gives the mean APR for a given set of predictor values. BUAD 310 - Kam Hamidieh

  25. Interpretation APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue) • b0 = 23.73: When LTV = Credit Score = Stated Income = Home Value = 0, the mean APR = 23.73%. • b1 = -1.59: Holding all other x variables fixed, when LTV goes up by 0.1, then on average APR goes down by 0.159% (1.59 × 0.1). • b2 = -0.018: Holding all other x variables fixed, when Credit Score goes up by 1 unit, then on average APR goes down by 0.018%. • etc. BUAD 310 - Kam Hamidieh

  26. Example Suppose we observe a subprime borrower with the following characteristics: LTV = 0.90, Credit Score = 650, Stated Income = $45,000, Home Value = $400,000. Our estimated model says that on average such a customer gets: APR = 23.73 - 1.59(0.90) - 0.018(650) + 0.0004(45) - 0.00075(400) ≈ 10.32% BUAD 310 - Kam Hamidieh
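The arithmetic on slide 26 can be checked with a short Python sketch. The coefficients are the rounded estimates reported on the slides. The function and key names are hypothetical, and income and home value are entered in thousands of dollars, as in the variable definitions.

```python
# Rounded coefficient estimates b0..b4 as reported on the slides.
B = {
    "intercept": 23.73,
    "ltv": -1.59,
    "credit_score": -0.018,
    "income_k": 0.0004,        # stated income, in $1000s
    "home_value_k": -0.00075,  # home value, in $1000s
}

def predicted_apr(ltv, credit_score, income_k, home_value_k):
    """Estimated mean APR (in %) for the given predictor values."""
    return (B["intercept"]
            + B["ltv"] * ltv
            + B["credit_score"] * credit_score
            + B["income_k"] * income_k
            + B["home_value_k"] * home_value_k)

# Borrower from slide 26: $45,000 income -> 45, $400,000 home value -> 400.
apr = predicted_apr(ltv=0.90, credit_score=650, income_k=45, home_value_k=400)
print(f"{apr:.2f}")  # prints 10.32
```

The most common slip here is entering income or home value in dollars instead of thousands of dollars, which would give a wildly different (and meaningless) APR.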

  27. In Class Exercise 1 Part (1): Refer to slide 15. • What are the predictor and response values for the 9th observation? • What are the values of y10, x24, x11,3? Part (2) Refer to slide 25. • Interpret the slope term for stated income variable. • What is the estimated mean APR for customer with LTV = 0.50, Credit Score = 600, Stated Income = $10,000, Home Value = $200,000? BUAD 310 - Kam Hamidieh

  28. Model Residuals • Residuals are defined just like the simple linear regression case: residual = observed – fitted. • The official formula: $e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_{i1} + \dots + b_k x_{ik})$ • What is the “picture” for residuals? BUAD 310 - Kam Hamidieh

  29. Standard Deviation of Residuals • Compute the standard deviation of the residuals: $S_e = \sqrt{\dfrac{\sum_{i=1}^{n} e_i^2}{n-k-1}} = \sqrt{\dfrac{SSE}{n-k-1}} = \sqrt{MSE}$ • It has the same interpretation as before: it tells how far away your observed points are from the “plane” on average. • Se estimates σε. • The value n – k – 1 is called the residual degrees of freedom. • SSE = Sums of Squared (due to) Error • MSE = Mean squared (due to) Error BUAD 310 - Kam Hamidieh

  30. Summarizing Results in a Table • n – k – 1 = 372 – 4 – 1 = 367 • SSE = 567.80 • MSE = 1.55 • Se = 1.24 BUAD 310 - Kam Hamidieh
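The numbers on slide 30 are linked by MSE = SSE / (n − k − 1) and Se = sqrt(MSE). The sketch below reproduces them from the reported SSE, n, and k, and then expresses a residual in units of Se. The function name and the residual value of 1.68 are hypothetical and are not taken from the data.

```python
from math import sqrt

def residual_summary(sse, n, k):
    """Return (residual df, MSE, Se) for a regression with k predictors."""
    df = n - k - 1
    mse = sse / df
    return df, mse, sqrt(mse)

# Values reported for the subprime example: SSE = 567.80, n = 372, k = 4.
df, mse, s_e = residual_summary(sse=567.80, n=372, k=4)
print(df, round(mse, 2), round(s_e, 2))  # prints 367 1.55 1.24

# Hypothetical residual of 1.68 percentage points, in units of Se.
z = 1.68 / s_e
print(f"{z:.2f} standard deviations above the fitted value")
```

Dividing a residual by Se gives a rough sense of how unusual an observation is relative to the typical scatter around the fitted equation.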

  31. In Class Exercise 2 Again, refer to the subprime example. • What is the residual for the 9th observation? • What are the units of Se? • Referring to question 1, how many standard deviations does this observed value fall below or above the estimated equation? (This is relative to Se.) BUAD 310 - Kam Hamidieh
