Lecture 6 Notes • Note: I will e-mail homework 2 tonight. It will be due next Thursday. • The Multiple Linear Regression model (Chapter 4.1) • Inferences from multiple regression analysis (Chapter 4.2) • In multiple regression analysis, we consider more than one independent variable x1,…,xK . We are interested in the conditional mean of y given x1,…,xK .
Automobile Example • A team charged with designing a new automobile is concerned about the gas mileage that can be achieved. The design team is interested in two things: (1) Which characteristics of the design are likely to affect mileage? (2) A new car is planned to have the following characteristics: weight – 4000 lbs, horsepower – 200, cargo – 18 cubic feet, seating – 5 adults. Predict the new car’s gas mileage. • The team has available information about gallons per 1000 miles and four design characteristics (weight, horsepower, cargo, seating) for a sample of cars made in 1989. Data is in car89.JMP.
Best Single Predictor • To obtain the correlation matrix and pairwise scatterplots, click Analyze, Multivariate Methods, Multivariate. • If we use simple linear regression with each of the four independent variables, which provides the best predictions?
Best Single Predictor • Answer: The simple linear regression with the highest $R^2$ gives the best predictions. Recall that $R^2 = 1 - SSE/SST$, and SST is the same for every candidate predictor, so the highest $R^2$ corresponds to the smallest sum of squared prediction errors. • Weight gives the best predictions of GPM1000Hwy based on simple linear regression. • But we can obtain better predictions by using more than one of the independent variables.
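The comparison of single predictors by $R^2$ can be sketched in a few lines of Python. This is only an illustration: the numbers below are made up and are not the car89.JMP data, and the helper name `r_squared` is an assumption introduced here.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE/SST for the simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx                 # least squares slope
    b0 = ybar - b1 * xbar          # least squares intercept
    sse = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    sst = sum((b - ybar) ** 2 for b in y)
    return 1 - sse / sst

# Hypothetical data (NOT from car89.JMP): gallons per 1000 miles,
# weight in hundreds of pounds, and number of seats for six cars.
gallons = [40, 45, 52, 58, 61, 70]
candidates = {
    "weight": [20, 24, 28, 31, 33, 38],
    "seating": [4, 5, 4, 5, 5, 6],
}

scores = {name: r_squared(x, gallons) for name, x in candidates.items()}
best = max(scores, key=scores.get)   # predictor with the highest R^2
print(scores, best)
```

Because SST depends only on y, ranking the predictors by $R^2$ is the same as ranking them by smallest SSE.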
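Multiple Linear Regression Model • $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} + e_i$ • Assumptions about the disturbances $e_i$: • The expected value of the disturbances is zero for each $i$: $E(e_i) = 0$. • The variance of each $e_i$ is equal to $\sigma_e^2$, i.e., $Var(e_i) = \sigma_e^2$. • The $e_i$ are normally distributed. • The $e_i$ are independent.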
Point Estimates for Multiple Linear Regression Model • We use the same least squares procedure as for simple linear regression. • Our estimates of $\beta_0, \ldots, \beta_K$ are the coefficients $b_0, \ldots, b_K$ that minimize the sum of squared prediction errors: $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - \cdots - b_K x_{iK})^2$ • Least Squares in JMP: Click Analyze, Fit Model, put the dependent variable into Y and add the independent variables to the Construct Model Effects box.
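For readers without JMP, the least squares step can be sketched in plain Python by solving the normal equations. This is a minimal sketch, not what JMP does internally: the function name `fit_least_squares` and the data are assumptions made for illustration, and the data are synthetic (generated exactly from y = 1 + 2*x1 - 3*x2) so that the answer is known.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bK] minimizing the sum of squared prediction errors."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend intercept column
    p = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(p):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * b for a, b in zip(M[k], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

# Synthetic data (an assumption for illustration): y = 1 + 2*x1 - 3*x2 exactly.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 7), (6, 5)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]

coefs = fit_least_squares(X, y)
predictions = [coefs[0] + coefs[1] * x1 + coefs[2] * x2 for x1, x2 in X]
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
print(coefs, sse)
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one; statistical software generally uses more numerically stable decompositions.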
Root Mean Square Error • Estimate of $\sigma_e$: $s_e = \sqrt{SSE/(n-K-1)}$ • $s_e$ = Root Mean Square Error in JMP • For simple linear regression of GP1000MHWY on Weight, $s_e$ = [value missing]. For multiple linear regression of GP1000MHWY on weight, horsepower, cargo, seating, $s_e = 3.54$.
Residuals and Root Mean Square Errors • Residual for observation i = prediction error for observation i = $y_i - \hat{y}_i$ • Root mean square error = typical size of the absolute value of the prediction error • As with the simple linear regression model, if the multiple linear regression model holds: • About 95% of the observations will be within two RMSEs of their predicted value • For the car data, about 95% of the time, the actual GP1000M will be within 2*3.54 = 7.08 GP1000M of the predicted GP1000M of the car based on the car’s weight, horsepower, cargo and seating.
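The RMSE calculation and the two-RMSE rule of thumb can be written out as a short Python sketch. The residuals and the predicted value of 40 below are hypothetical; only the RMSE of 3.54 comes from these notes. The function names are assumptions introduced for illustration.

```python
import math

def root_mean_square_error(residuals, num_predictors):
    """s_e = sqrt(SSE / (n - K - 1)), where K is the number of predictors."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (n - num_predictors - 1))

def two_rmse_band(predicted, rmse):
    """Range expected to contain about 95% of actual values for this prediction."""
    return (predicted - 2 * rmse, predicted + 2 * rmse)

# Hypothetical residuals from a regression with K = 1 predictor.
example_residuals = [1.0, -2.0, 0.5, 1.5, -1.0, 0.0]
example_rmse = root_mean_square_error(example_residuals, num_predictors=1)

# RMSE of 3.54 is the value quoted for the car data; the predicted
# value of 40 gallons per 1000 miles is a made-up number.
band = two_rmse_band(predicted=40.0, rmse=3.54)
print(example_rmse, band)
```

The band is only an approximate 95% range: it ignores the uncertainty in the estimated coefficients, which is small when n is large relative to K.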
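Inferences about Regression Coefficients • Confidence intervals: a $100(1-\alpha)\%$ confidence interval for $\beta_j$ is $b_j \pm t_{\alpha/2,\,n-(K+1)}\, s_{b_j}$. Degrees of freedom for t equals n-(K+1). The standard error of $b_j$, $s_{b_j}$, is found on the JMP output. • Hypothesis Test: $H_0: \beta_j = \beta_j^*$ vs. $H_a: \beta_j \neq \beta_j^*$, with test statistic $t = (b_j - \beta_j^*)/s_{b_j}$. Decision rule for the test: Reject $H_0$ if $t > t_{\alpha/2,\,n-(K+1)}$ or $t < -t_{\alpha/2,\,n-(K+1)}$. The p-value for testing $H_0: \beta_j = 0$ is printed in the JMP output under Prob>|t|.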
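As a worked illustration of the interval and the test, here is a small Python sketch. All numbers are hypothetical and do not come from the car data: a coefficient estimate of 0.5 with standard error 0.2 and 60 degrees of freedom, for which the two-sided 5% critical value is approximately 2.000 (taken from a t table and passed in by hand, since the Python standard library has no t distribution).

```python
def confidence_interval(b_j, se_b_j, t_crit):
    """b_j +/- t_crit * s_{b_j}; t_crit is t_{alpha/2, n-(K+1)} from a t table."""
    half_width = t_crit * se_b_j
    return (b_j - half_width, b_j + half_width)

def t_statistic(b_j, se_b_j, beta_null=0.0):
    """Test statistic for H0: beta_j = beta_null."""
    return (b_j - beta_null) / se_b_j

def reject_null(t_stat, t_crit):
    """Two-sided decision rule: reject if t > t_crit or t < -t_crit."""
    return abs(t_stat) > t_crit

# Hypothetical inputs (assumptions, not JMP output).
b_j, se_b_j = 0.5, 0.2
t_crit = 2.000          # approximate t_{0.025, 60}

ci = confidence_interval(b_j, se_b_j, t_crit)
t_stat = t_statistic(b_j, se_b_j)
decision = reject_null(t_stat, t_crit)
print(ci, t_stat, decision)
```

With these made-up numbers the interval excludes zero and the test rejects $H_0: \beta_j = 0$; the two always agree for a two-sided test at the matching level.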
Inference Examples • Find a 95% confidence interval for ? • Is seating of any help in predicting gas mileage once horsepower, weight and cargo have been taken into account? Carry out a test at the 0.05 significance level.
Partial Slopes vs. Marginal Slopes • Multiple Linear Regression Model: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + e$ • The coefficient $\beta_j$ is a partial slope. It indicates the change in the mean of y that is associated with a one unit increase in $x_j$ while holding all other variables fixed. • A marginal slope is obtained when we perform a simple regression with only one X, ignoring all other variables. Consequently the other variables are not held fixed.
Partial Slopes vs. Marginal Slopes Example • In order to evaluate the benefits of a proposed irrigation scheme in a certain region, suppose that the relation of yield Y to rainfall R is investigated over several years. • Data is in rainfall.JMP.
Rainfall is estimated to be beneficial once temperature is held fixed. Multiple regression provides a better picture of the benefits of an irrigation scheme because temperature would be held fixed in an irrigation scheme.
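The difference between a marginal and a partial slope can be seen in a small Python sketch. The numbers are synthetic and are not the rainfall.JMP data: yield is generated exactly as 50 + 2*rain + 3*temp, and rain and temperature are constructed to be negatively related. The function names are assumptions introduced for illustration.

```python
def simple_slope(x, y):
    """Marginal slope: simple regression of y on x alone."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

def partial_slopes(x1, x2, y):
    """Partial slopes (b1, b2) from the regression of y on x1 and x2 together."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)

# Synthetic data (an assumption): wetter years tend to be cooler.
rain = [1, 2, 3, 4, 5, 6]
temp = [12, 9, 10, 6, 7, 3]
crop_yield = [50 + 2 * r + 3 * t for r, t in zip(rain, temp)]

marginal_rain = simple_slope(rain, crop_yield)
partial_rain, partial_temp = partial_slopes(rain, temp, crop_yield)
print(marginal_rain, partial_rain, partial_temp)
```

In this constructed example the marginal slope for rain is negative, because rainy years are also cool years, while the partial slope for rain is positive once temperature is held fixed.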