This lecture covers multiple regression analysis in the context of case studies and interpretation of regression coefficients. Topics include the estimation of multiple linear regression models, JMP output, and the application of regression analysis in various scenarios.
Lecture 20 – Tues., Nov. 18th • Multiple Regression: • Case Studies: Chapter 9.1 • Regression Coefficients in the Multiple Linear Regression Model: Chapter 9.2 • JMP Output: Chapter 9.6.1 • Office Hours: Today and Thursday after class, tomorrow (Wednesday) 11-12 instead of 1:30-2:30 or by appointment.
Multiple Regression • Multiple Regression: Seeks to estimate the mean of Y given multiple explanatory variables X1,…,Xp, denoted by $\mu\{Y \mid X_1,\ldots,X_p\}$ • Examples: • Y=1st year GPA, X1=Math SAT, X2=Verbal SAT, X3=High School GPA • Y=Sales price of house, X1=Square Footage, X2=Number of Rooms • Uses of Regression Analysis • Description: Describe the association between the mean of Y and X1,…,Xp; describe the association between the mean of Y and X1 after taking into account X2,…,Xp. • Passive Prediction: Predict Y based on X1,…,Xp. • Control: Predict what Y will be if you change X1,…,Xp.
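Multiple Linear Regression Model • There is a normally distributed subpopulation of responses for each combination of the explanatory variables, with mean $\mu\{Y \mid X_1,\ldots,X_p\} = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$ and standard deviation $\mathrm{SD}(Y \mid X_1,\ldots,X_p) = \sigma$, the same for every combination. • The observations are independent of one another.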
Case Study 9.1.1 • Meadowfoam is a small plant found growing in moist meadows in the Pacific Northwest. • Researchers conducted a randomized experiment to find out how to elevate meadowfoam production. • In a controlled growth chamber, they focused on the effects of two light-related factors: light intensity and the timing of the onset of the light treatment. • Light intensity levels: 150, 300, 450, 600, 750, 900 • Timing of onset: Early, Late
Case Study 9.1.1, Cont. • Variables: • Y = average number of flowers per meadowfoam plant • X1 = light intensity • X2 = 1 if late timing, 0 if early timing • Multiple Linear Regression Model: $\mu\{Y \mid X_1, X_2\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
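To make the variable coding concrete, here is a minimal sketch of the matrix of explanatory variables for the meadowfoam setup. The six intensity levels and the Early/Late coding come from the slides; crossing every intensity with both timings exactly once is an assumption made only for illustration, and the names `intensities`, `timings`, and `X` are hypothetical.

```python
import numpy as np

# Intensity levels and timing labels as listed on the slides.
intensities = [150, 300, 450, 600, 750, 900]
timings = ["Early", "Late"]

# Assumption for illustration: each intensity appears once with each timing.
rows = []
for timing in timings:
    for intensity in intensities:
        late = 1.0 if timing == "Late" else 0.0  # X2 = 1 if late, 0 if early
        rows.append([1.0, float(intensity), late])  # columns: intercept, X1, X2

X = np.array(rows)
print(X.shape)  # (number of rows, number of columns)
print(X[:3])    # first three rows: early timing, so the X2 column is 0
```

Each row corresponds to one combination of explanatory variables, and the model mean for that row is the row multiplied by the coefficient vector (β0, β1, β2).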
Interpretation of Coefficients • $\beta_1$ = the change in the mean of y that is associated with a one unit increase in $x_1$ where $x_2$ is held fixed. • $\beta_2$ = the change in the mean of y that is associated with a one unit increase in $x_2$ where $x_1$ is held fixed. • $\beta_0$ = mean of y when $x_1 = x_2 = 0$
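The interpretation of a slope follows from subtracting two model means. As a short worked step, assuming the two-variable model $\mu\{Y \mid x_1, x_2\} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, increase $x_1$ by one unit while keeping $x_2$ the same:

```latex
\begin{aligned}
\mu\{Y \mid x_1 + 1,\, x_2\} - \mu\{Y \mid x_1,\, x_2\}
  &= \bigl[\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2\bigr]
   - \bigl[\beta_0 + \beta_1 x_1 + \beta_2 x_2\bigr] \\
  &= \beta_1 .
\end{aligned}
```

The difference does not depend on the value at which $x_2$ is fixed, which is why a single number describes the change. The same argument with the roles of $x_1$ and $x_2$ swapped gives $\beta_2$.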
Coefficients in Meadowfoam Study • For meadowfoam study: • $\beta_1$ = change in mean flowers per plant associated with a 1-unit increase in light intensity for fixed time of onset. • $\beta_2$ = change in mean flowers per plant associated with switching from early to late onset (X2 going from 0 to 1) for fixed light intensity.
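Estimation of Multiple Linear Regression Model • The coefficients $\beta_0,\ldots,\beta_p$ are estimated by choosing $b_0,\ldots,b_p$ to make the sum of squared prediction errors as small as possible, i.e., choose $b_0,\ldots,b_p$ to minimize $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - \cdots - b_p x_{ip})^2$; the minimizing values are the estimates $\hat{\beta}_0,\ldots,\hat{\beta}_p$. • Predicted value of y given x1,…,xp: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$ • $\sigma$ = SD(Y|X1,…,Xp), estimated by $\hat{\sigma}$ = root mean square error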
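The least squares recipe above can be sketched in a few lines of NumPy. This is an illustration on synthetic data, not the meadowfoam measurements: the coefficients in `true_beta`, the noise level, and the sample layout are all made up, and `np.linalg.lstsq` is used here as one convenient way to minimize the sum of squared prediction errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical values, for illustration only).
n = 24
x1 = np.tile([150, 300, 450, 600, 750, 900], 4).astype(float)  # intensity
x2 = np.repeat([0.0, 1.0], 12)                                  # 0/1 indicator
true_beta = np.array([70.0, -0.04, -12.0])                      # made-up coefficients
X = np.column_stack([np.ones(n), x1, x2])                       # intercept, X1, X2
y = X @ true_beta + rng.normal(0.0, 6.0, size=n)

# Choose the coefficients that minimize the sum of squared prediction errors.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals and root mean square error with n - (p + 1) degrees of freedom.
resid = y - X @ beta_hat
p = X.shape[1] - 1
rmse = np.sqrt(np.sum(resid**2) / (n - p - 1))

# Predicted value at x1 = 450, x2 = 1.
y_hat_new = beta_hat @ np.array([1.0, 450.0, 1.0])

print("estimated coefficients:", beta_hat)
print("root mean square error:", rmse)
print("prediction at (450, late):", y_hat_new)
```

Dividing by n - (p + 1) rather than n accounts for the p + 1 coefficients estimated from the data.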
Multiple Linear Regression in JMP • Analyze, Fit Model • Put response variable in Y • Click on explanatory variables and then click Add under Construct Model Effects • Click Run Model.
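Reading JMP Output • Estimated multiple linear regression model: $\hat{\mu}\{Y \mid X_1, X_2\} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2$, with the estimates taken from the Parameter Estimates table. • $\hat{\sigma}$ = root mean square error = 6.44. Approximately 95% of flowers-per-plant values will lie within 2*6.44 = 12.88 flowers per plant of their predicted value $\hat{y}$. • p-values for the coefficients indicate strong evidence that higher light intensity is associated with fewer flowers per plant on average for fixed time of onset, and that early onset is associated with more flowers per plant on average for fixed light intensity.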
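The "within 2 root mean square errors" rule is simple arithmetic, sketched below. The value 6.44 is the root mean square error quoted on the slide; the predicted value of 50 flowers per plant and the helper name `approx_95_range` are hypothetical, chosen only to show the calculation.

```python
def approx_95_range(predicted, rmse):
    """Return (lower, upper) = predicted -/+ 2 * rmse."""
    half_width = 2 * rmse
    return predicted - half_width, predicted + half_width


rmse = 6.44       # root mean square error quoted on the slide
predicted = 50.0  # hypothetical predicted flowers per plant, for illustration only

lower, upper = approx_95_range(predicted, rmse)
print(f"half-width = {2 * rmse:.2f}, range = ({lower:.2f}, {upper:.2f})")
```

This is the rough rule of thumb from the slide, not an exact prediction interval, which would also account for uncertainty in the estimated coefficients.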
Case Study 9.1.2 • What characteristics are associated with bigger brain size after accounting for body size, i.e., what characteristics are associated with bigger brain size holding body size fixed? • Y=brain weight, X1=body weight, X2=gestation period, X3=litter size • Multiple Linear Regression Model: $\mu\{Y \mid X_1, X_2, X_3\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$
Interpretation in Randomized Experiments vs. Obs. Studies • Randomized Experiments: Interpretation of an “effect” of an explanatory variable is straightforward and causation is implied. Example: “A 1-unit increase in light intensity causes the mean number of flowers to change by $\beta_1$.” • Observational Studies: Cannot make causal conclusions from statistical association. “For any subpopulation of mammal species with the same body weight and litter size, a 1-day increase in the species’ gestation length is associated with a $\beta_2$-gram increase in mean brain weight.” The interpretation is only useful if a subpopulation of mammals with fixed values of body weight and litter size, but varying gestation lengths, exists.
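Interpreting Coefficients • Interpretation depends on what other X’s are included. • In $\mu\{\text{brain} \mid \text{gestation}\} = \beta_0 + \beta_1\,\text{gestation}$, $\beta_1$ measures the rate of change in mean brain weight with changes in gestation length in the population of all mammal species, where body size is variable. • In $\mu\{\text{brain} \mid \text{gestation}, \text{body}\} = \beta_0 + \beta_1\,\text{gestation} + \beta_2\,\text{body}$, $\beta_1$ measures the rate of change in mean brain weight with changes in gestation length within subpopulations of fixed body size.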
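A small simulation can illustrate why the gestation coefficient depends on whether body size is in the model. Everything below is synthetic: the variables are standardized stand-ins named after the case study, the generating coefficients (1.0, 0.3, 0.8) are made up, and `fit` is a hypothetical helper wrapping `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic, standardized stand-ins (not real mammal data).
body = rng.normal(0.0, 1.0, n)
gestation = 0.8 * body + rng.normal(0.0, 0.6, n)  # gestation correlated with body
brain = 1.0 * body + 0.3 * gestation + rng.normal(0.0, 0.5, n)


def fit(y, *columns):
    """Least squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_simple = fit(brain, gestation)       # brain on gestation only
b_multi = fit(brain, gestation, body)  # brain on gestation and body

print("gestation slope, body size not in model:", b_simple[1])
print("gestation slope, body size held fixed:  ", b_multi[1])
```

In this setup the slope from the gestation-only model absorbs part of the body size association, so it comes out noticeably larger than the slope from the model that also includes body size.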