Multiple Linear Regression: Further issues and anova results (Session 07)
Learning Objectives
At the end of this session, you will be able to
• appreciate requirements and limitations of variables used in a multiple regression
• recognise the dependence of anova results on the order of fitting variables
• interpret anova results when terms are fitted sequentially
• understand the difference between interpretation of t-probabilities and anova F-probabilities when there are 2 or more x’s.
The “crimes” example again! Recall that in the example relating the number of acts regarded as crimes to age, college years and parents’ income, the college variable was non-significant. Although a quantitative variable, college had only 3 possible values! This is NOT a problem since “college” is an x variable, and there were many observations at each of these values. It would be a problem if the y-variable had only a few distinct values, since the normality assumption would then be violated.
Points to note about the variables
In the regression analyses so far considered,
• the y-variable is a quantitative measurement, assumed to have an approximately normal distribution.
• the x-variables are quantitative variates, each contributing 1 d.f. to the model.
However, some x’s could be categorical factors, each contributing d.f. = (number of levels - 1) to the model. The latter case will be discussed later!
But – care is sometimes needed… If an x-variable has only a few values, pay attention to the number of observations for each. In practical 6, variable empl was highly significant (p=0.006). The residual plot looked OK, apart from one outlier (where just 1 household had 3 employed members). But… will empl remain significant if the outlier is removed?
Results after deleting outlier
------------------------------------------------
lnexpdf |    Coef.   Std. Err.       t    P>|t|
--------+---------------------------------------
 hhsize |  -.06194     .03031    -2.04    0.047
   empl |   .23483     .28690     0.82    0.418
 const. |   9.2177     .16843    54.73    0.000
------------------------------------------------
Note that empl is now non-significant! Dangerous to use a model where conclusions depend on just 1 observation!
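As a quick check on how the t column is obtained, the sketch below recomputes each t-value as the estimate divided by its standard error, using only the figures in the table above. The names `estimates` and `t_statistic` are illustrative, not part of the practical.

```python
# Coefficients and standard errors copied from the table above.
estimates = {
    "hhsize": (-0.06194, 0.03031),
    "empl": (0.23483, 0.28690),
    "const.": (9.2177, 0.16843),
}


def t_statistic(coef, std_err):
    """t-value for testing that a coefficient is zero: estimate / standard error."""
    return coef / std_err


# Round to 2 decimals to match the precision shown in the table.
t_values = {name: round(t_statistic(c, se), 2) for name, (c, se) in estimates.items()}

for name, t in t_values.items():
    print(f"{name:>7}: t = {t:6.2f}")
```

The small t-value for empl reflects a standard error that is larger than the estimate itself, which is why empl is no longer significant once the outlier is dropped.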
ANOVA for 2-variables (sequential)
We return again to the “crimes” example to show the effect of the order of fitting terms.
---------+------------------------------------------
Source   |  df     Seq.SS        MS       F   Prob>F
---------+------------------------------------------
age      |   1     92.676    92.676    3.20   0.0808
college  |   1    263.387   263.387    9.10   0.0043
Residual |  42   1216.248    28.958
---------+------------------------------------------
Total    |  44   1572.311    35.734
---------+------------------------------------------
Here, age is fitted first, then college, hence F-probs need to be interpreted accordingly.
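The arithmetic behind this table can be sketched as follows: each mean square is the sum of squares divided by its d.f., and each F is the term's mean square divided by the residual mean square. The numbers are copied from the table with age fitted first. The variable names are illustrative only.

```python
# Sequential sums of squares from the anova with age fitted first, then college.
terms = [("age", 1, 92.676), ("college", 1, 263.387)]
resid_df, resid_ss = 42, 1216.248

# Residual mean square: the denominator of every F-ratio in the table.
resid_ms = resid_ss / resid_df

f_ratios = {}
for name, df, ss in terms:
    ms = ss / df                      # mean square for this term
    f_ratios[name] = round(ms / resid_ms, 2)

# Sequential sums of squares (plus residual) add up to the total S.S.
total_ss = sum(ss for _, _, ss in terms) + resid_ss
total_df = sum(df for _, df, _ in terms) + resid_df

print(f"Residual MS = {resid_ms:.3f}")
print(f"F-ratios    = {f_ratios}")
print(f"Total       = {total_ss:.3f} on {total_df} d.f.")
```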
ANOVA for 2-variables (sequential)
Consider now the anova with the order of fitting terms changed…
---------+------------------------------------------
Source   |  df     Seq.SS        MS       F   Prob>F
---------+------------------------------------------
college  |   1      2.780     2.781    0.10   0.7582
age      |   1    353.282   353.282   12.20   0.0011
Residual |  42   1216.248    28.958
---------+------------------------------------------
Total    |  44   1572.311    35.734
---------+------------------------------------------
Here, college is fitted first, then age. Note change in F-probs from previous slide. Why is this?
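One way to see why the order matters is to compute sequential sums of squares by hand on a small data set. The sketch below does this with plain least squares: the sequential S.S. for a term is the drop in residual S.S. when that term is added to the terms already fitted. The data (`x1`, `x2`, `y`) are made up purely for illustration and are NOT the crimes data. The function names are likewise hypothetical.

```python
def residual_ss(y, columns):
    """Residual S.S. from least-squares regression of y on a constant plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def sequential_ss(y, ordered_terms):
    """Sequential S.S. for each (name, column) in the order given, plus final residual S.S."""
    ss, fitted_so_far = {}, []
    previous = residual_ss(y, fitted_so_far)   # constant only: total S.S. about the mean
    for name, col in ordered_terms:
        fitted_so_far.append(col)
        current = residual_ss(y, fitted_so_far)
        ss[name] = previous - current          # reduction due to adding this term
        previous = current
    return ss, previous


# Hypothetical data: x1 and x2 are strongly correlated with each other.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 2, 6, 4, 9, 7, 12, 10]

ss_a, resid_a = sequential_ss(y, [("x1", x1), ("x2", x2)])
ss_b, resid_b = sequential_ss(y, [("x2", x2), ("x1", x1)])
print("x1 then x2:", ss_a, "residual", resid_a)
print("x2 then x1:", ss_b, "residual", resid_b)
```

In this made-up example the residual S.S. and the combined S.S. for the two terms are the same in both orders, but the split between the terms changes: a term fitted first takes credit for variation it shares with the other term, while a term fitted last gets only what the other cannot explain.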
Discussion…
What is the same and what is different across slides 7 and 8 above? Order of fitting seems to matter! What do the results mean? How do the F-probs from above and the t-probs below for model estimates compare?
-------------------------------
crimes  |     Coef.      P>|t|
--------+----------------------
age     |   1.30876      0.001
college | -6.448684      0.004
const.  |  2.324681      0.590
-------------------------------
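A small sketch to support the comparison: for a term with 1 d.f., the F-ratio obtained when that term is fitted last equals the square of its t-value, so the two probabilities should agree. The code below takes the F-prob for each variable from the anova in which it was fitted last and compares it, rounded to 3 decimals, with the t-prob in the table of estimates. The dictionary names are illustrative only.

```python
# F-probabilities for each term when it was fitted LAST in the sequential anova:
# college from the "age, then college" table, age from the "college, then age" table.
f_prob_fitted_last = {"college": 0.0043, "age": 0.0011}

# t-probabilities from the table of model estimates.
t_prob = {"age": 0.001, "college": 0.004}

# The t-probs are reported to 3 decimals, so round the F-probs to match.
agree = {name: round(f_prob_fitted_last[name], 3) == t_prob[name] for name in t_prob}

for name, same in agree.items():
    print(f"{name:>7}: F-prob (fitted last) = {f_prob_fitted_last[name]:.4f}, "
          f"t-prob = {t_prob[name]:.3f}, agree = {same}")
```

The F-probs for terms fitted first (0.0808 for age, 0.7582 for college) have no counterpart among the t-probs, because they ignore the other variable.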
Exercise: 2nd example: Q2, Pract. 6 Open penrain.dta from Q2 of previous practical. Note down anova results below from a regression of rain on elevation, then altitude. Interpretation of F-probs:
Changing order of fitting: Now fit altitude, then elevation. Note down the results below. Interpretation of F-probs:
Model parameter estimates: Finally, note down the parameter estimates and the corresponding t-probabilities: Overall conclusions:
Adjusted sums of squares Some software packages present adjusted sums of squares, combining the results from the anova tables in slides 10 and 11 into one single anova. Note that the sums of squares now do not add to the total S.S. What do the F-probabilities now represent?
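As an illustration, and since the exercise results are not recorded here, the sketch below builds an adjusted anova from the two sequential tables of the “crimes” example instead. It assumes the usual definition: the adjusted S.S. for a term is its sequential S.S. when fitted last. Variable names are illustrative only.

```python
# Residual and total lines, common to both sequential anova tables for "crimes".
resid_df, resid_ss, total_ss = 42, 1216.248, 1572.311

# Adjusted S.S. = sequential S.S. of each term when it is fitted last.
adjusted_ss = {
    "age": 353.282,      # from the table with college fitted first, then age
    "college": 263.387,  # from the table with age fitted first, then college
}

resid_ms = resid_ss / resid_df
adjusted_f = {name: round(ss / resid_ms, 2) for name, ss in adjusted_ss.items()}

# Unlike sequential S.S., adjusted S.S. plus residual need not equal the total.
sum_of_parts = sum(adjusted_ss.values()) + resid_ss

print(f"Adjusted F-ratios: {adjusted_f}")
print(f"Adjusted S.S. + residual = {sum_of_parts:.3f}, total S.S. = {total_ss:.3f}")
```

Each F-ratio here tests a term after allowing for the other term in the model, which is the same question the t-probabilities answer.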
Key Points
• Recognise the type of variable (y) being modelled. Methods discussed apply when y is quantitative
• The explanatory variables (the x’s) can be variables of any type – but so far we have only considered quantitative x’s
• Take care when interpreting anova F-probs to check whether the sums of squares are sequential or adjusted
• Note that all t-probabilities (associated with the parameter estimates) are adjusted for all other terms in the model
Practical work follows to ensure learning objectives are achieved…