
Stat E-150 Statistical Methods

Stat E-150 Statistical Methods. Section 3, Feb 26, 2014. Outline for today: comments on Homework 2 (write down the model; write down H0 and H1; give the test statistics; give the p-values); multiple linear regression; interpretation of coefficients in multiple linear regression.

loman

Presentation Transcript


  1. Stat E-150 Statistical Methods Section 3 Feb 26, 2014

  2. Outline for today • Comments on Homework 2 • Write down the model • Write down H0 and H1; give the test statistics; give the p-values • Multiple Linear Regression • Interpretation of coefficients in multiple linear regression • Quantitative variables • Categorical (binary) variables • Interactions • With quantitative variables • With categorical (binary) variables • Higher order terms (polynomial) • Multicollinearity • How to detect? • Model Building • Homework 3 Q & A

  3. Multiple regression • Y = β0 + β1x1 + β2x2 + … + βkxk + ε • Essentially the same as simple linear regression, but with more than one predictor. • Can include categorical predictors • Can look at several different quantitative predictors • Can also have higher order terms and interactions
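As a minimal sketch of fitting the model above by least squares, the following uses NumPy on synthetic data. The data, the true coefficients (1, 2, -3), and the variable names are assumptions made for illustration and do not come from the course materials.

```python
import numpy as np

# Synthetic data (an assumption for illustration; not course data).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(n), x1, x2])

# Least squares estimates of (beta0, beta1, beta2).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

The estimates should land close to the coefficients used to generate the data, since the noise here is small.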

  4. Multiple Linear Regression – Quantitative Predictors • The overall hypothesis test now differs from the individual beta test: H0: β1 = β2 = β3 = … = βk = 0; Ha: the slopes are not all zero (at least one slope is not zero) • Generically, for an individual coefficient: • H0: βi = 0 • Ha: βi ≠ 0 • Interpretation: • βi is interpreted given that the other covariates are in the model (a conditional interpretation).
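To make the two kinds of test concrete, here is a hedged sketch that computes the overall F statistic and the individual t statistics directly from the least squares fit. The data are synthetic (the second slope is set to zero on purpose), and p-values are left out because they would need an F or t distribution function, for example from SciPy.

```python
import numpy as np

# Synthetic data: two predictors, but only the first has a nonzero slope.
rng = np.random.default_rng(1)
n, k = 100, 2
X_pred = rng.normal(size=(n, k))
y = 0.5 + 1.5 * X_pred[:, 0] + rng.normal(size=n)

X = np.column_stack([np.ones(n), X_pred])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

sse = float(resid @ resid)                     # error sum of squares
sst = float(((y - y.mean()) ** 2).sum())       # total sum of squares
ssm = sst - sse                                # model sum of squares

# Overall F statistic: H0 says all k slopes are zero.
f_stat = (ssm / k) / (sse / (n - k - 1))

# Individual t statistics: H0 says beta_i = 0 given the other predictors.
mse = sse / (n - k - 1)
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se
print(f_stat, t_stats)
```

The F statistic answers "is anything in the model useful?", while each t statistic asks whether one predictor adds something once the others are already included.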

  5. Multiple Linear Regression – Categorical predictors • Coded as indicator (dummy) variables. • They are coded using 0 and 1, where 0 = "no" and 1 = "yes"
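The 0/1 coding can be sketched in a couple of lines of Python. The list of answers below is hypothetical.

```python
# Hypothetical responses to a yes/no question.
answers = ["yes", "no", "no", "yes", "yes"]

# Indicator (dummy) variable: 1 = "yes", 0 = "no".
indicator = [1 if a == "yes" else 0 for a in answers]
print(indicator)
```

With this coding, the coefficient on the indicator is the estimated difference in mean response between the "yes" and "no" groups, given the other predictors in the model.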

  6. Multiple Linear Regression – Interactions and Quadratic forms • Interaction with quantitative variables • Rarely seen • Hard to interpret • Interaction between quantitative and categorical variables • Interaction between categorical variables • Quadratic forms
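One way to picture interaction and quadratic terms is as extra columns in the design matrix, each built from columns already there. The sketch below uses made-up values for a quantitative predictor `x` and a binary indicator `d`; both names are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # quantitative predictor (made-up values)
d = np.array([0, 1, 0, 1])           # binary indicator (made-up values)

# Columns: intercept, x, d, interaction x*d, quadratic term x^2.
X = np.column_stack([np.ones_like(x), x, d, x * d, x ** 2])
print(X)
```

The interaction column equals `x` where `d` is 1 and zero elsewhere, which is what lets the slope on `x` differ between the two groups.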

  7. Multiple Linear Regression – Multicollinearity • When some of your predictor variables are correlated with each other • It’s a problem because then they are ‘fighting’ over similar variability in the response variable • Redundant • Can make results weird: coefficient estimates become unstable and hard to interpret • How to detect? • Scatter plot matrix • Correlation matrix • Variance Inflation Factor (VIF)

  8. Multiple Linear Regression – Multicollinearity detection (correlation matrix) • Puts numbers to the scatter plot matrix graphs; only shows the predictors

  9. Multiple Linear Regression – Multicollinearity detection (VIF) • Helpful for more subtle dependence among predictors, such as when some combination of predictors taken together is strongly related to another predictor. • VIF reflects the association between a predictor and ALL the other predictors • VIF > 5 can be an indicator of an issue; VIF > 10 is very concerning • Tolerance is the inverse of VIF
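A hedged sketch of how VIF can be computed by hand: regress each predictor on all the others, take that regression's R², and form 1 / (1 - R²). The helper `vif` and the synthetic predictors `a`, `b`, `c` are illustrative names, not part of the course materials or of any SPSS output.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (n rows, k predictor columns, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        # Regress predictor j on an intercept plus all the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))   # tolerance is 1 - r2, the inverse of VIF
    return np.array(out)

# Synthetic predictors: c is nearly a + b, a "subtle" dependence that
# pairwise correlations alone understate.
rng = np.random.default_rng(2)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + b + rng.normal(scale=0.1, size=300)

vifs = vif(np.column_stack([a, b, c]))
vifs_independent = vif(np.column_stack([a, b]))
print(vifs, vifs_independent)
```

In this setup all three VIFs come out far above 10 even though no single pairwise correlation is extreme, while `a` and `b` on their own have VIFs near 1.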

  10. Model building: how to choose predictors • Sometimes you might have many variables to choose from in building a model. • Stepwise regression does pretty much the same process as we did by hand • Finds the variable with the largest correlation with the response (R²) • Then tries each additional predictor, looking for the one that results in the largest increase in R². • If that variable isn’t significant (α = .05), it stops • Otherwise it keeps going • One caveat: sometimes when additional predictors are added, ones that were significant in the beginning might become nonsignificant (redundant), so it keeps an eye out for this • Will delete any variables that become nonsignificant
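The procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not SPSS's actual algorithm: the helper names `fit` and `stepwise` are hypothetical, the data are synthetic, and the rule |t| > 2 is used as a rough stand-in for "significant at α = .05" so that the example needs only NumPy.

```python
import numpy as np

def fit(X, y):
    """Return (R^2, slope t statistics) for least squares with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    mse = sse / (n - A.shape[1])
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
    return 1.0 - sse / sst, (beta / se)[1:]    # drop the intercept's t

def stepwise(X, y, t_crit=2.0):
    """Forward selection with a backward check after each addition."""
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining:
        # Candidate giving the largest R^2 when added to the current model.
        best = max(remaining, key=lambda j: fit(X[:, chosen + [j]], y)[0])
        _, t = fit(X[:, chosen + [best]], y)
        if abs(t[-1]) <= t_crit:
            break                              # not significant: stop
        chosen.append(best)
        remaining.remove(best)
        # Delete any earlier variable that has become nonsignificant.
        _, t = fit(X[:, chosen], y)
        chosen = [j for j, tj in zip(chosen, t) if abs(tj) > t_crit]
    return chosen

# Synthetic data: only columns 0 and 2 truly affect y.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=200)

selected = stepwise(X, y)
print(selected)
```

The two real predictors should be picked up, but a noise column can occasionally sneak in by chance.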

  11. Use model building with caution! • Based on overall model goodness • Doesn’t know your question of interest • Isn’t checking residual plots and such • Usually only done with first order terms – no interactions or quadratic terms • Need to think about whether you should include some of these after you run stepwise • SPSS can’t tell what makes ‘sense’

  12. 3.18 Researchers collected samples of female lake trout from Lake Ontario in September and November 2002 – 2004. A goal of the study was to investigate the fertility of fish that had been stocked in the lake. One measure of the viability of fish eggs is percent dry mass (PctDM), which reflects the energy potential stored in the eggs by recording the percentage of the total egg material that is solid. Values of the PctDM for a sample of 35 lake trout (14 in September and 21 in November) are given in the file FishEggs.

  13. (a) Write down an equation for the least squares line and comment on what it appears to indicate about the relationship between PctDM and Age. • (b) What percentage of the variability in PctDM does Age explain for these fish? • (c) Is there evidence that the relationship in (a) is statistically significant? Explain. • (d) Produce a plot of the residuals versus the fits for the simple linear model. Does there appear to be any regular pattern?

  14. (e) Now fit a multiple regression model, using an indicator (Sept) for the month and an interaction product, to compare the regression lines for September and November. • (f) Do you need both terms for a difference in intercepts and slopes? If not, delete any terms that aren’t needed and run the new model. • (g) What percentage of the variability in PctDM does the model in (f) explain for these fish?
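The model with an indicator and an interaction product can be written out to show how it yields two separate lines. This is a sketch assuming Sept is coded 1 for September fish and 0 for November fish; the coefficients are left symbolic because no fitted values appear in these slides.

```latex
\begin{align*}
\text{PctDM} &= \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Sept}
              + \beta_3\,(\text{Age}\times\text{Sept}) + \varepsilon \\[4pt]
\text{November } (\text{Sept}=0):\quad
\widehat{\text{PctDM}} &= \hat\beta_0 + \hat\beta_1\,\text{Age} \\
\text{September } (\text{Sept}=1):\quad
\widehat{\text{PctDM}} &= (\hat\beta_0 + \hat\beta_2) + (\hat\beta_1 + \hat\beta_3)\,\text{Age}
\end{align*}
```

Under this coding, β2 is the difference in intercepts and β3 is the difference in slopes between the two months, so testing whether each is zero addresses whether both terms are needed.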

  15. 3.28 This question is about a novel approach to examine calcium-binding proteins. The data from one experiment are provided in Fluorescence. The variable Calcium is the log of the free calcium concentration and ProteinProp is the proportion of protein bound to calcium. • Fit a quadratic regression model for predicting ProteinProp from Calcium. Write down the fitted regression equation. • Add the quadratic curve to a scatterplot of ProteinProp versus Calcium.

  16. Are the conditions for inference reasonably satisfied for this model? • Is the parameter for the quadratic term significantly different from zero? Explain. • Identify the coefficient of multiple determination and interpret this value.
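For readers working outside SPSS, a quadratic fit and its coefficient of multiple determination (R²) might be sketched as below. The numbers are synthetic stand-ins, NOT the Fluorescence data, and the generating coefficients (0.5, 0.3, -0.2) are arbitrary assumptions.

```python
import numpy as np

# Synthetic stand-in values; NOT the Fluorescence data.
rng = np.random.default_rng(4)
x = np.linspace(-1.0, 1.0, 50)
y = 0.5 + 0.3 * x - 0.2 * x**2 + rng.normal(scale=0.01, size=x.size)

# polyfit returns coefficients from the highest power down: [b2, b1, b0].
b2, b1, b0 = np.polyfit(x, y, deg=2)

# Fitted values and R^2 = 1 - SSE / SST.
fitted = b0 + b1 * x + b2 * x**2
r_squared = 1.0 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(b0, b1, b2, r_squared)
```

The fitted equation is then ŷ = b0 + b1·x + b2·x², and plotting `fitted` against `x` over a scatterplot of the data gives the quadratic curve the exercise asks for.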
