Lecture 22 – Thurs., Nov. 25. Nominal explanatory variables (Chapter 9.3) Inference for multiple regression (Chapter 10.1-10.2). Nominal Variables. To incorporate nominal variables in multiple regression analysis, we use indicator variables.
Lecture 22 – Thurs., Nov. 25 • Nominal explanatory variables (Chapter 9.3) • Inference for multiple regression (Chapter 10.1-10.2)
Nominal Variables • To incorporate nominal variables in multiple regression analysis, we use indicator variables. • An indicator variable distinguishes between two groups. • The time of onset (early vs. late) is a nominal variable. To incorporate it into multiple regression analysis, we used the indicator variable early, which equals 1 if early and 0 if late.
Nominal Variables with More than Two Categories • To incorporate nominal variables with more than two categories, we use multiple indicator variables. If there are k categories, we need k-1 indicator variables.
Nominal Explanatory Variables Example: Auction Car Prices • A car dealer wants to predict the auction price of a car. • The dealer believes that odometer reading and the car color are variables that affect a car’s price (data from sample of cars in auctionprice.JMP) • Three color categories are considered: • White • Silver • Other colors • Note: Color is a nominal variable.
Indicator Variables in Auction Car Prices • I1 = 1 if the color is white, 0 if the color is not white • I2 = 1 if the color is silver, 0 if the color is not silver • The category “Other colors” is defined by: I1 = 0; I2 = 0
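The k-1 coding can be sketched in a few lines of Python. This is only an illustration, not what JMP does internally: the helper name `indicator_columns` and the four sample colors are made up for the example, and "other" is taken as the left-out category, as in the slide above.

```python
def indicator_columns(values, reference):
    """Code a nominal variable with k categories as k-1 indicator (0/1) columns.

    `reference` is the category that gets no column of its own; a row in
    that category has 0 in every indicator column.
    """
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Made-up colors for illustration only (not rows from the auction data set).
colors = ["white", "silver", "other", "white"]
indicators = indicator_columns(colors, reference="other")
print(indicators)
```

With three color categories the function returns two columns, and the "other" car is the row where both indicators are 0.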
Auction Car Price Model • Solution • The proposed model is Price = β0 + β1(Odometer) + β2(I1) + β3(I2) + ε • The data [table of sample rows: a white car, an other-color car, a silver car]
Example: Auction Car Price The Regression Equation • From JMP we get the regression equation PRICE = 16701 - .0555(Odometer) + 90.48(I1) + 295.48(I2) • The equation for a white color car: Price = 16701 - .0555(Odometer) + 90.48(1) + 295.48(0) = 16791.48 - .0555(Odometer) • The equation for a silver color car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(1) = 16996.48 - .0555(Odometer) • The equation for an “other color” car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(0) = 16701 - .0555(Odometer) • [Figure: Price against Odometer, three parallel lines, one per color category]
Example: Auction Car Price The Regression Equation • From JMP we get the regression equation PRICE = 16701 - .0555(Odometer) + 90.48(I1) + 295.48(I2) • For one additional mile on the odometer, the auction price decreases by 5.55 cents on average, for a car of a given color. • A white car sells, on average, for $90.48 more than a car of the “Other color” category with the same odometer reading. • A silver car sells, on average, for $295.48 more than a car of the “Other color” category with the same odometer reading.
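As a quick check on these interpretations, the fitted equation can be evaluated directly. The sketch below copies the coefficients from the regression equation above; the function name `predicted_price` and the odometer reading of 36,000 miles are assumptions chosen only for illustration.

```python
# Coefficients copied from the fitted regression equation in the slides.
INTERCEPT = 16701.0
ODOMETER_SLOPE = -0.0555
# Shift in the intercept for each color; "other" is the baseline category.
COLOR_SHIFT = {"white": 90.48, "silver": 295.48, "other": 0.0}


def predicted_price(odometer, color):
    """Predicted auction price for a given odometer reading and color."""
    return INTERCEPT + ODOMETER_SLOPE * odometer + COLOR_SHIFT[color]


miles = 36000  # hypothetical odometer reading, not taken from the data
for color in ("other", "white", "silver"):
    print(color, round(predicted_price(miles, color), 2))
```

At any fixed odometer reading the three predictions differ only by the color shifts, which is why the fitted lines are parallel.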
Example: Auction Car Price The Regression Equation (data file Xm18-02b) • There is insufficient evidence to infer that a white car and a car of “other color” sell for different auction prices. • There is sufficient evidence to infer that a silver car sells for a higher price than a car of the “other color” category.
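Shorthand Notation for Nominal Variables • Shorthand notation for a regression model with nominal variables: use all capital letters for nominal variables. • Parallel regression lines model: μ{price | odometer, COLOR} = odometer + COLOR • Separate regression lines model: μ{price | odometer, COLOR} = odometer + COLOR + odometer × COLOR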
Nominal Variables in JMP It is not necessary to create indicator variables yourself to represent a nominal variable. Nominal variables in JMP: • Make sure that the nominal variable’s modeling type is in fact nominal. • Include the nominal variable in the Construct Model Effects box in Fit Model • JMP will create indicator variables. The brackets indicate the category of the nominal variable for which the indicator variable is 1. • JMP will leave out the level which is highest alphabetically or numerically.
Specially Constructed Explanatory Variables • Types of specially constructed explanatory variables: • Powers of variables • Products of variables (interactions) • Indicator variables to represent nominal variables • Transformations of variables (e.g., log) • Use matrix of pairwise scatterplots to initially examine the data and look for needed transformations, powers of variables.
Inference for Multiple Regression • Chapter 10.2 • Tests for single coefficients • Confidence intervals for single coefficients • Confidence intervals for the mean response at given values of the explanatory variables • Prediction intervals for a future response at given values of the explanatory variables • Chapter 10.3 • F-test for overall significance of regression • F-test for joint significance of several terms (will not cover)
Case Study 10.1.2 • Question: Do echolocating bats expend more energy than nonecholocating bats after accounting for body size? • Data: Body mass and flight energy expenditure for 4 nonecholocating bats, 12 nonecholocating birds and 4 echolocating bats. • Strategy: Build a multiple regression model for mean energy expended as a function of type of flying vertebrate (echolocating bat, nonecholocating bat, nonecholocating bird) and body size. • Explore (resolve need for transformation) • Test for interaction • If no interaction, answer question with the three parallel lines model
Coded Scatterplots • To construct a coded scatterplot, create columns energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat. The column energy nonecholocating bat should contain only the energies for nonecholocating bats and a blank for all other species. • Click graph, overlay plot, put energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat in Y and mass in X.
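Separate/Parallel Regression Lines Model • Separate regression lines model: μ{log energy | log mass, TYPE} = log mass + TYPE + log mass × TYPE • Parallel regression lines model: μ{log energy | log mass, TYPE} = log mass + TYPE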
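One way to picture the difference between the two models is to write out the explanatory-variable row each one uses for a single animal. The sketch below is an illustration under stated assumptions, not the lecture's own parameterization: it takes the nonecholocating bat as the left-out category, uses natural-log mass as the body size variable, and the function name `design_row` and the column order are made up for the example.

```python
import math

TYPES = ("nonecholocating bat", "nonecholocating bird", "echolocating bat")


def design_row(mass, vertebrate_type, separate_lines=False):
    """Explanatory-variable row for one animal.

    Parallel lines model: [1, log mass, bird, ebat]
    Separate lines model: the same columns plus log mass x bird and
    log mass x ebat, which let each type have its own slope.
    """
    if vertebrate_type not in TYPES:
        raise ValueError(f"unknown type: {vertebrate_type!r}")
    log_mass = math.log(mass)
    # Indicators; the nonecholocating bat is the (assumed) baseline.
    bird = 1 if vertebrate_type == "nonecholocating bird" else 0
    ebat = 1 if vertebrate_type == "echolocating bat" else 0
    row = [1.0, log_mass, bird, ebat]
    if separate_lines:
        row += [log_mass * bird, log_mass * ebat]
    return row


# Hypothetical mass, for illustration only.
print(design_row(100.0, "echolocating bat"))
print(design_row(100.0, "echolocating bat", separate_lines=True))
```

The parallel lines model is the separate lines model with the two product columns dropped, so testing for interaction amounts to asking whether those two columns are needed.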
Inferences for Echolocating Bats • Is the parallel regression lines model appropriate? Test whether the two interaction coefficients (body size × type) in the separate regression lines model are zero. • There is no evidence against the parallel regression lines model, so we go ahead and use it to answer the question of interest: do echolocating bats use less energy than nonecholocating bats of the same body size and than nonecholocating birds of the same body size?
Inferences for Echolocating Bats Cont. • No strong evidence that echolocating bats use less energy than either nonecholocating bats (p-value = 0.35) or nonecholocating birds (p-value = 0.77) of the same body size. • 95% confidence interval for the difference in mean log energy between nonecholocating bats and echolocating bats of the same body size: (-0.51, 0.35). • Exponentiating the endpoints, a 95% confidence interval for the ratio of median energy for nonecholocating bats to echolocating bats of the same body size is (e^-0.51, e^0.35) = (0.60, 1.42). • Summary of findings: Although there is no strong evidence that echolocating bats use less energy than nonecholocating bats of the same body size, it is still plausible that they use quite a bit less energy (60% as much at the median). The study is inconclusive.
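The step from the log scale back to a ratio of medians is just exponentiation of the interval endpoints. A minimal sketch, assuming the logs are natural logs; the function name `ratio_interval` is made up for the example.

```python
import math


def ratio_interval(lower_log_diff, upper_log_diff):
    """Turn a confidence interval for a difference in mean log response
    into an interval for the ratio of medians by exponentiating each end."""
    return math.exp(lower_log_diff), math.exp(upper_log_diff)


# Endpoints of the 95% interval on the log scale, from the slide above.
low, high = ratio_interval(-0.51, 0.35)
print(f"{low:.2f} to {high:.2f}")
```

Because exponentiation is increasing, the endpoints keep their order, and an interval that contains 0 on the log scale becomes an interval that contains 1 on the ratio scale.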
Prediction Intervals • To find a 95% prediction interval for the log energy of an individual flying vertebrate of a given type and mass: • Fit the multiple regression model. • Click the red triangle next to the response log energy, click save columns, click predicted values and also click indiv confid interval. This saves the predicted value, lower 95% prediction interval endpoint and upper 95% prediction interval endpoint for each observation in the data set. • To get a prediction interval for X’s that are not in the data set, enter a row with those X’s and then exclude the observation, so that it is not used in fitting the model.