Welcome to BUAD 310. Instructor: Kam Hamidieh. Lecture 21, Wednesday, April 9, 2014. Agenda & Announcement. Today: finish up the problem from last time and finish off Simple Linear Regression; start Multiple Regression, Chapter 23. Homework 6 is due today at 5 PM. About Exam II.
Welcome to BUAD 310 Instructor: Kam Hamidieh Lecture 21, Wednesday April 9, 2014
Agenda & Announcement • Today: • Finish up the problem from last time & finish off Simple Linear Regression • Start Multiple Regression, Chapter 23 • Homework 6 is due today at 5 PM. BUAD 310 - Kam Hamidieh
About Exam II • NO CELL PHONES ARE ALLOWED. • Two cheat sheets allowed, both sides, handwritten. • In class on Wednesday, April 16. • Coversheet will be posted by Monday; 33 questions. • Print the z and t tables and bring them with you. • Coverage: Lecture 12 (March 3) to the end of Lecture 21 (April 9), minus multiple regression, and HW 4, 5, & 6. • All Exam II relevant material will be posted by tomorrow morning. • Scantrons will be passed out Monday; fill yours out before the exam and do not bend it! • We will spend all of Monday reviewing. • Extended office hours: • Monday April 14: 4-6 PM • Tuesday April 15: 2-6 PM
CI and Tests for B1 To test H0: B1 = 0 vs. Ha: B1 ≠ 0, either: (1) Form the 100(1-α)% confidence interval for B1, $b_1 \pm t_{\alpha/2}\, se(b_1)$, where $t_{\alpha/2}$ comes from a t distribution with df = n-2, and reject H0 if the interval does not contain 0. Or (2) Compute the test statistic $t = b_1 / se(b_1)$, then get the p-value from a t distribution with df = n-2.
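As a rough illustration of both routes, the sketch below fits a simple regression to a tiny made-up data set and computes b1, se(b1), the t statistic, and the confidence interval. The data, the function name `slope_inference`, and the critical value 3.182 (read from a t table for α = 0.05, df = 3) are assumptions for this example, not part of the lecture.

```python
import math

def slope_inference(x, y, t_crit):
    """Return b1, se(b1), the t statistic, and the CI for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least squares slope
    b0 = y_bar - b1 * x_bar             # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))      # residual standard deviation, df = n - 2
    se_b1 = s_e / math.sqrt(sxx)        # standard error of the slope
    t_stat = b1 / se_b1                 # test statistic for H0: B1 = 0
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_stat, ci

# Hypothetical data, n = 5, so df = 3.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # t table value for alpha/2 = 0.025, df = 3

b1, se_b1, t_stat, ci = slope_inference(x, y, T_CRIT)
print("b1 =", round(b1, 3), "se(b1) =", round(se_b1, 4))
print("t =", round(t_stat, 2), "CI =", tuple(round(c, 3) for c in ci))
```

Here the interval excludes 0 and |t| exceeds the critical value, so both routes lead to rejecting H0 for this toy data. The p-value itself would still come from software or a t table.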
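CI for Mean Response The 100(1-α)% confidence interval for the mean response $\mu_{y|x}$ at $x_{new}$ is: $\hat{y} \pm t_{\alpha/2}\, se(\hat{y})$, where $t_{\alpha/2}$ comes from a t distribution with df = n-2, $\hat{y} = b_0 + b_1 x_{new}$, and $se(\hat{y}) = S_e \sqrt{\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$. We will generally use software.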
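Since software normally does this, the following is only a sketch of what it computes, using the standard textbook formula for se(ŷ) in simple regression. The data, the function name `mean_response_ci`, and the t table value 3.182 (α = 0.05, df = 3) are made up for illustration.

```python
import math

def mean_response_ci(x, y, x_new, t_crit):
    """Return the fitted mean at x_new and its confidence interval."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_new
    # Standard error of the estimated mean response at x_new.
    se_fit = s_e * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat, (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)

# Hypothetical data, n = 5, so df = 3.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # t table value for alpha/2 = 0.025, df = 3

fit_mid, ci_mid = mean_response_ci(x, y, 3, T_CRIT)   # x_new equals x-bar
fit_end, ci_end = mean_response_ci(x, y, 5, T_CRIT)   # x_new at the edge of the data
width_mid = ci_mid[1] - ci_mid[0]
width_end = ci_end[1] - ci_end[0]
print("width at x = 3:", round(width_mid, 3))
print("width at x = 5:", round(width_end, 3))
```

The interval is narrowest at x̄ and widens as x_new moves away from it, because of the (x_new - x̄)² term.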
Outliers • “Outliers are observations that stand away from the rest of the data and appear distinct in a plot.” Imprecise! • They can have a very strong influence on your final results.
Outliers [Figure: scatterplot with X = 1, 2, …, 20; r2 = 0.80, Se = 3.28]
[Figure: five scatterplot panels labeled r2 = 0.80, Se = 3.28; r2 = 0.25, Se = 10; r2 = 0.29, Se = 9.7; r2 = 0.92, Se = 3.2; r2 = 0.26, Se = 6.1]
How to Deal with Outliers • There are NO hard and fast rules on how to deal with outliers except: you should not just throw out outliers without SOLID justification. • Check for data entry errors. (Not always possible!) • Examine the physical context. • Report your results with and without outliers. • Standardized residuals can help identify outliers too. • Transformations can help. (This will be discussed when we cover multiple regression.)
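One way to picture the standardized-residual check is the sketch below, which divides each residual by Se and flags large values. This is a simplified version (software usually also adjusts for leverage). The data and the cutoff of 2 are assumptions chosen for illustration, not a rule from the lecture.

```python
import math

def standardized_residuals(x, y):
    """Residuals from a simple regression, each divided by Se."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / s_e for e in residuals]

# Hypothetical data: y = 2x except for one unusual point at x = 5.
x = list(range(1, 11))
y = [2, 4, 6, 8, 25, 12, 14, 16, 18, 20]

CUTOFF = 2.0  # rule-of-thumb threshold, an assumption for this sketch
z = standardized_residuals(x, y)
flagged = [xi for xi, zi in zip(x, z) if abs(zi) > CUTOFF]
print("flagged x values:", flagged)
```

A flagged point is only a candidate for a closer look; the advice above about justification and reporting results both ways still applies.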
Multiple Regression • Simple Linear Regression: • One Y and one X; fit a line that gives the mean of Y for a given X • Multiple regression: • One Y and multiple X’s, that is, multiple predictors
Multiple Regression Model The observed response Y is linearly related to k explanatory variables X1, X2, …, Xk by the equation: $y = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_k x_k + \varepsilon$. A single value of the response comes from a linear combination of the k variables plus an error, where the errors are iid normal: $\varepsilon \sim N(0, \sigma_\varepsilon)$. Given fixed values of the X’s, the mean of Y is equal to a linear combination of the X’s at those fixed values: $E(Y \mid x_1, \dots, x_k) = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_k x_k$.
Assumption (Redundant Slide?) • Constant Variance Assumption: The variance of the error terms, $\sigma_\varepsilon^2$, is the same for every combination of values of x1, x2, …, xk. • Normality Assumption: The error terms follow a normal distribution for every combination of values of x1, x2, …, xk. • Independence Assumption: The values of the error terms are statistically independent of each other.
Simple versus Multiple
Simple regression • Data: (x1, y1), (x2, y2), …, (xn, yn) • Assumed model: $y_i = B_0 + B_1 x_i + \varepsilon_i$, with $\varepsilon_i \sim$ iid $N(0, \sigma_\varepsilon)$ • Parameters: B0, B1, σε
Multiple regression • Data: (y1, x11, x12, …, x1k), (y2, x21, x22, …, x2k), …, (yn, xn1, xn2, …, xnk) • Assumed model: $y_i = B_0 + B_1 x_{i1} + B_2 x_{i2} + \dots + B_k x_{ik} + \varepsilon_i$, with $\varepsilon_i \sim$ iid $N(0, \sigma_\varepsilon)$ • Parameters: B0, B1, B2, …, Bk, σε
Example (Page 615) • Defaults in the subprime housing market brought down several financial institutions in 2008 (Lehman, Bear Stearns, and AIG) and led to a massive bailout of the financial system. • Goal: A bank regulator wants to know how lenders are using credit scores to determine the rate of interest paid by subprime borrowers. • The variables of interest are: Y = APR, annual % rate on the loan; X1 = LTV, loan-to-value ratio, how much of the value of the property the loan covers (values near 0 are “good”, near 1 are “bad”); X2 = Credit Score (the higher the better); X3 = Income in 1000’s of dollars; X4 = Home value in 1000’s of dollars. • The data are n = 372 mortgages obtained from a credit bureau. • There are 4 predictors: k = 4.
Example [Data table screenshot: the column headers are the variable names and each row is one observation. Annotated entries for the 7th row: y7, x71, x72, x73, x74.]
“Pairs Plot”
“Pairs Plot” APR seems linearly dependent on LTV and Credit Score and not so much on the other two. Looking at the relationship between predictors is a good idea too.
Pairwise Correlations
Pairwise Correlations Highest correlations are APR with LTV and Credit score. Why are some of the boxes empty?
Least Squares The values for B0, B1, …, Bk are estimated via the least squares method: pick b0, b1, …, bk so that $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - \dots - b_k x_{ik})^2$ is as small as possible. But where is the line?
Least Squares Method One response Y, two predictors X1 & X2. The method of least squares minimizes the sum of squared vertical distances between the points and a plane. (Picture from An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, Tibshirani)
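To make the "as small as possible" criterion concrete, here is a minimal sketch that fits a plane to two predictors by solving the normal equations with plain Gaussian elimination. In practice statistical software does this; the function names and the noise-free toy data (generated from y = 1 + 2·x1 − 3·x2) are assumptions for this example only.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]          # add the intercept column
    p = len(rows[0])
    # Normal equations: (A'A) b = A'y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, p):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][j] * b[j] for j in range(i + 1, p))
        b[i] = (aug[i][p] - tail) / aug[i][i]
    return b

def sse(b, X, y):
    """Sum of squared vertical distances between the points and the plane."""
    return sum((yi - (b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)))) ** 2
               for r, yi in zip(X, y))

# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [-3, 2, -5, 0, -7, -2]

b = fit_least_squares(X, y)
shifted = [b[0] + 0.1] + b[1:]                   # any other plane fits worse
print([round(v, 6) for v in b])
print(sse(b, X, y) < sse(shifted, X, y))
```

With more predictors the same code applies unchanged; the "plane" simply becomes a higher-dimensional surface that cannot be drawn.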
Higher Dimensions? Ask him! He may know!!!
[Software output] b0 ≈ 23.73, b1 ≈ -1.59, b2 ≈ -0.018, b3 ≈ 0.0004, b4 ≈ -0.00075
Example Continued The estimated regression model now is: APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue) Note: y-hat gives the estimated mean APR for a given set of predictor values.
Interpretation APR = 23.73 - 1.59(LTV) - 0.018(CreditScore) + 0.0004(StatedIncome) - 0.00075(HomeValue) b0 = 23.73: When LTV = Credit Score = Stated Income = Home Value = 0, the mean APR = 23.73%. b1 = -1.59: Holding all other x variables fixed, when LTV goes up by 0.1, then on average APR goes down by 0.159% (1.59 × 0.1). b2 = -0.018: Holding all other x variables fixed, when Credit Score goes up by 1 unit, then on average APR goes down by 0.018%. etc.
Example Suppose we observe a subprime borrower with the following characteristics: LTV = 0.90, Credit Score = 650, Stated Income = $45,000, Home Value = $400,000. Our estimated model says that on average such a customer gets: APR = 23.73 - 1.59(0.90) - 0.018(650) + 0.0004(45) - 0.00075(400), so APR ≈ 10.32%.
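The arithmetic in this example, and the "holding all other x variables fixed" reading of the slopes from the interpretation slide, can be checked with a few lines of code. The coefficients are the rounded estimates from the slides; the function and variable names are made up for this sketch. Income and home value are entered in thousands of dollars, matching the variable definitions.

```python
# Rounded coefficient estimates from the fitted model on the slides.
B0, B_LTV, B_SCORE, B_INCOME, B_HOME = 23.73, -1.59, -0.018, 0.0004, -0.00075

def predicted_apr(ltv, credit_score, income_k, home_value_k):
    """Estimated mean APR (%); income and home value are in $1000s."""
    return (B0 + B_LTV * ltv + B_SCORE * credit_score
            + B_INCOME * income_k + B_HOME * home_value_k)

# The borrower from the example slide.
apr = predicted_apr(0.90, 650, 45, 400)
print(round(apr, 2))            # 10.32

# Raise LTV by 0.1 with everything else held fixed.
ltv_effect = predicted_apr(0.90, 650, 45, 400) - predicted_apr(0.80, 650, 45, 400)
print(round(ltv_effect, 3))     # -0.159

# Raise Credit Score by 1 with everything else held fixed.
score_effect = predicted_apr(0.90, 651, 45, 400) - predicted_apr(0.90, 650, 45, 400)
print(round(score_effect, 3))   # -0.018
```

The two differences equal the slope times the change in that one predictor, whatever values the other predictors are held at.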
In Class Exercise 1 Part (1): Refer to slide 15. • What are the predictor and response values for the 9th observation? • What are the values of y10, x24, x11,3? Part (2): Refer to slide 25. • Interpret the slope term for the stated income variable. • What is the estimated mean APR for a customer with LTV = 0.50, Credit Score = 600, Stated Income = $10,000, Home Value = $200,000?
Model Residuals • Residuals are defined just like in the simple linear regression case: residual = observed – fitted. • The official formula: $e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_k x_{ik})$ • What is the “picture” for residuals?
Standard Deviation of Residuals • Compute the standard deviation of the residuals: $S_e = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - k - 1}} = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{MSE}$ • It has the same interpretation as before: it tells how far away your observed points are from the “plane” on average. • Se estimates σε. • The value n – k – 1 is called the residual degrees of freedom. • SSE = Sum of Squares (due to) Error • MSE = Mean Square (due to) Error
Summarizing Results in a Table [Software output table, annotated:] residual df = n – k – 1 = 372 – 4 – 1 = 367, SSE = 567.80, MSE = 1.55, Se = 1.24
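The annotated values are linked by MSE = SSE / (n – k – 1) and Se = √MSE, which a short check confirms. Only n, k, and SSE are taken from the slide; the variable names are illustrative.

```python
import math

n, k = 372, 4          # observations and predictors in the subprime example
sse = 567.80           # sum of squared residuals from the table

resid_df = n - k - 1   # residual degrees of freedom
mse = sse / resid_df   # mean squared error
s_e = math.sqrt(mse)   # standard deviation of the residuals, in APR % points

print(resid_df)        # 367
print(round(mse, 2))   # 1.55
print(round(s_e, 2))   # 1.24
```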
In Class Exercise 2 Again, refer to the subprime example. • What is the residual for the 9th observation? • What are the units of Se? • Referring to question 1, how many standard deviations does this observed value fall below or above the estimated equation? (This is relative to Se.)