GG313 Lecture 19 Nov 1 2005 Regression Summary and Anova Test


What are the major linear regression (line-fitting) algorithms and their properties?

LSY: least-squares y on x
  minimizes: the sum of squared vertical residuals
  use for: normal data, slopes that are not steep
  poor results with: outliers, steep slopes

χ²: LSY with errors in y
  minimizes: the sum of squared vertical residuals, each weighted by 1/σ_i² (σ_i is the known error in y_i)
  use for: data with known errors
  poor results with: (none listed)

LSXY: complete orthogonal
  minimizes: the sum of squared perpendicular distances from the points to the line
  use for: normal data, all slopes
  poor results with: outliers

RMA: reduced major axis
  minimizes: the sum of the products of the horizontal and vertical deviations (the areas of the triangles formed by each point and the line)
  use for: normal data, all slopes
  poor results with: outliers

LMS: least median of squares (robust)
  minimizes: the median of the squared residuals
  use for: data with bad outliers
  poor results with: heavy groupings
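A minimal Python sketch of the closed-form fits for the first four algorithms, assuming plain Python lists of x and y values. The function names (fit_lsy, fit_chi2, fit_rma, fit_lsxy) are hypothetical, and the slopes use the standard formulas in terms of the centered sums Sxx, Syy and Sxy; this is an illustration, not the course's Matlab code.

```python
import math

def _sums(x, y):
    """Means and centered sums of squares/cross-products: Sxx, Syy, Sxy."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    return xm, ym, sxx, syy, sxy

def fit_lsy(x, y):
    """LSY: minimize the sum of squared vertical residuals. Returns (a, b)."""
    xm, ym, sxx, _, sxy = _sums(x, y)
    b = sxy / sxx
    return ym - b * xm, b

def fit_chi2(x, y, sigma):
    """Chi-square fit: LSY with each residual weighted by 1/sigma_i**2."""
    w = [1.0 / s ** 2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    return a, b

def fit_rma(x, y):
    """RMA: slope is sign(Sxy) * sqrt(Syy / Sxx)."""
    xm, ym, sxx, syy, sxy = _sums(x, y)
    b = math.copysign(math.sqrt(syy / sxx), sxy)
    return ym - b * xm, b

def fit_lsxy(x, y):
    """LSXY: minimize squared perpendicular distances (major axis)."""
    xm, ym, sxx, syy, sxy = _sums(x, y)
    if sxy == 0:
        raise ValueError("slope is undefined when Sxy == 0")
    d = syy - sxx
    b = (d + math.sqrt(d * d + 4 * sxy * sxy)) / (2 * sxy)
    return ym - b * xm, b
```

For points lying exactly on a line all four fits agree. For scattered data the LSY slope is never steeper than the RMA slope, because |b_LSY| = |r| |b_RMA| with |r| ≤ 1 the correlation coefficient.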

[Figures: LMS and LSY lines fitted to data with groupings; LMS and LSY lines fitted to data with outliers]
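The outlier behaviour in these figures can be reproduced with a short Python sketch. Exact LMS is expensive to compute, so this uses a common approximation (an assumption here, not necessarily the method used for the figures): try the line through every pair of points and keep the one whose median squared residual is smallest. The function names and the test data are hypothetical.

```python
import itertools
import statistics

def fit_lsy(x, y):
    """Least-squares y on x: minimizes the sum of squared vertical residuals."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    b = sxy / sxx
    return ym - b * xm, b

def fit_lms(x, y):
    """Approximate least median of squares using all two-point candidate lines."""
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # skip vertical candidate lines
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        med = statistics.median([(yk - a - b * xk) ** 2 for xk, yk in zip(x, y)])
        if best is None or med < best[0]:
            best = (med, a, b)
    return best[1], best[2]

# Points on y = 2x + 1 with one bad outlier at x = 9
x = list(range(10))
y = [2 * xi + 1 for xi in x]
y[9] = 60
print("LSY (a, b):", fit_lsy(x, y))
print("LMS (a, b):", fit_lms(x, y))
```

The single outlier pulls the LSY slope well above 2, while LMS ignores it because the median of the squared residuals is unaffected by one extreme value.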

ANOVA TEST
The anova (analysis of variance) test is used to determine whether MANY samples come from the same population. Earlier we used the t-test to decide whether two samples came from the same population; anova does the same for the means of many samples, using the F-test. We place the sample values in the columns of a matrix, so that each of the k columns holds one sample of n observations:
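Written out, the n × k data matrix looks like this, with $x_{ij}$ (notation assumed here) denoting observation i of sample j:

```latex
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1k} \\
x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}
```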

What might these values be?
• densities at different depths in wells
• fossil measurements at different sites
• color values in a photograph
• manganese crust thickness vs. water depth at different sites

As before, we set up a hypothesis that the values at the sites are different, and a null hypothesis that they are the same (all sample means are equal). At our standard 95% confidence level, we check whether the null hypothesis can be rejected; rejecting it implies that the values at the sites are not from the same population.

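The anova test ASSUMES that
1) the populations are normally distributed
2) the populations have the same variance (σ²)

The test involves the calculation of several parameters (4.65, 4.66). With $x_{ij}$ the i-th observation in sample j, $\bar{x}_{\cdot j}$ the mean of sample j, and $\bar{x}$ the grand mean of all kn observations:

$$SS(Tr) = n \sum_{j=1}^{k} \left(\bar{x}_{\cdot j} - \bar{x}\right)^2$$

$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n} \left(x_{ij} - \bar{x}_{\cdot j}\right)^2$$

SS(Tr) (the treatment sum of squares) measures the variation of the sample means, and SSE (the error sum of squares) measures the variation within the samples. Together they make up the total sum of squares, $SST = \sum_{j}\sum_{i} (x_{ij} - \bar{x})^2 = SS(Tr) + SSE$.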
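A minimal Python sketch of these sums of squares, assuming the data are a list of n rows with k values each, where column j holds sample j as described above. The function name oneway_sums is hypothetical; the returned SST lets you check that SST = SS(Tr) + SSE.

```python
def oneway_sums(data):
    """Return (SS(Tr), SSE, SST) for an n x k matrix given as a list of rows."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    # Mean of each sample (column).
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Variation of the sample means about the grand mean.
    ss_tr = n * sum((m - grand) ** 2 for m in col_means)
    # Variation of the observations about their own sample mean.
    sse = sum((data[i][j] - col_means[j]) ** 2
              for i in range(n) for j in range(k))
    # Total variation about the grand mean.
    sst = sum((v - grand) ** 2 for row in data for v in row)
    return ss_tr, sse, sst

print(oneway_sums([[1, 4, 7],
                   [2, 5, 8],
                   [3, 6, 9]]))
```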

The F-test statistic (4.68) is the ratio of the two mean squares:

$$F = \frac{MS(Tr)}{MSE} = \frac{SS(Tr)/(k-1)}{SSE/[k(n-1)]}$$

F ranges from zero to large values. If F is small, there is no evidence against the null hypothesis; if F is large, the null hypothesis is unlikely. We obtain the critical F value from an F-table or from Matlab, using the chosen confidence level (written as 95% or as 5%, depending on the table) and the degrees of freedom k-1 (the number of samples minus 1) and k(n-1) (the total number of observations minus the number of samples). If the observed F exceeds the critical value, we reject the null hypothesis. EXAMPLE: ANOVAEX.html
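A minimal Python sketch of the complete 1-way test, assuming the same row-list layout (column j is sample j). The function name oneway_anova is hypothetical, and the critical value f_crit is passed in by the caller rather than computed, to keep the sketch dependency-free.

```python
def oneway_anova(data, f_crit):
    """1-way anova on an n x k matrix (list of rows; column j = sample j).

    f_crit is the critical F for (k-1, k(n-1)) degrees of freedom.
    Returns (F, df1, df2, reject_null).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_tr = n * sum((m - grand) ** 2 for m in col_means)
    sse = sum((data[i][j] - col_means[j]) ** 2
              for i in range(n) for j in range(k))
    df1, df2 = k - 1, k * (n - 1)
    f = (ss_tr / df1) / (sse / df2)  # MS(Tr) / MSE
    # Reject the null hypothesis (equal means) when F exceeds the critical value.
    return f, df1, df2, f > f_crit

data = [[1, 4, 7],
        [2, 5, 8],
        [3, 6, 9]]
print(oneway_anova(data, f_crit=5.14))
```

For these data F = 27 with (2, 6) degrees of freedom. The 5.14 passed in is the 95% critical value of F(2, 6) from a standard F-table; Matlab's finv(0.95, 2, 6) or SciPy's scipy.stats.f.ppf(0.95, 2, 6) return the same value. Since 27 > 5.14, the null hypothesis is rejected.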

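2-WAY anova
2-way anova is the same as the 1-way anova explained above, except that not only the columns (the means of the samples) are compared but also the rows. For example, the means of the samples may be different while the means of the rows are statistically the same, or vice versa; with well densities, the rows show whether density changes in a similar way with depth in all wells. The Matlab function anova2 generates some new statistics: SSB (the sum of squares for the rows, or blocks) and a new SSE. With $\bar{x}_{i\cdot}$ the mean of row i, $\bar{x}_{\cdot j}$ the mean of column (sample) j, and $\bar{x}$ the grand mean:

$$SSB = k \sum_{i=1}^{n} \left(\bar{x}_{i\cdot} - \bar{x}\right)^2$$

$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n} \left(x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}\right)^2$$

SS(Tr) is computed as in the 1-way test, and the total sum of squares now splits into three parts: SST = SS(Tr) + SSB + SSE.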
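A minimal Python sketch of the 2-way sums of squares, assuming one observation per cell and the same row-list layout (n rows, k sample columns). The function name twoway_sums is hypothetical; returning SST makes it easy to confirm the three-way split.

```python
def twoway_sums(data):
    """Return (SS(Tr), SSB, SSE, SST) for an n x k matrix given as a list of rows."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    row_means = [sum(row) / k for row in data]
    # Variation of the column (sample) means.
    ss_tr = n * sum((m - grand) ** 2 for m in col_means)
    # Variation of the row means.
    ssb = k * sum((m - grand) ** 2 for m in row_means)
    # What is left after removing both row and column effects.
    sse = sum((data[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    sst = sum((v - grand) ** 2 for row in data for v in row)
    return ss_tr, ssb, sse, sst

print(twoway_sums([[2, 4, 9],
                   [3, 7, 8],
                   [4, 7, 10]]))
```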

We then get two F-values, one for the samples (treatments, the columns) and one for the rows:

$$F_{\text{columns}} = \frac{MS(Tr)}{MSE}, \qquad F_{\text{rows}} = \frac{MSB}{MSE}$$

where MS(Tr) = SS(Tr)/(k-1), MSB = SSB/(n-1) and MSE = SSE/[(k-1)(n-1)]. The critical F-values use k-1 and (k-1)(n-1) degrees of freedom for F_columns, and n-1 and (k-1)(n-1) degrees of freedom for F_rows. We compare each observed F-value with its critical value and reject the corresponding null hypothesis if the observed F-value is larger than the critical value. EXAMPLE: ANOVA2EX.html
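A minimal Python sketch of the 2-way decision, assuming one observation per cell and critical values supplied by the caller from an F-table. The function name twoway_anova and its parameters are hypothetical.

```python
def twoway_anova(data, f_crit_cols, f_crit_rows):
    """2-way anova on an n x k matrix (list of rows; column j = sample j).

    f_crit_cols: critical F for (k-1, (k-1)(n-1)) degrees of freedom.
    f_crit_rows: critical F for (n-1, (k-1)(n-1)) degrees of freedom.
    Returns (F_columns, F_rows, reject_columns, reject_rows).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    row_means = [sum(row) / k for row in data]
    ss_tr = n * sum((m - grand) ** 2 for m in col_means)
    ssb = k * sum((m - grand) ** 2 for m in row_means)
    sse = sum((data[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((k - 1) * (n - 1))
    f_cols = (ss_tr / (k - 1)) / mse  # MS(Tr) / MSE
    f_rows = (ssb / (n - 1)) / mse    # MSB / MSE
    # Reject a null hypothesis when the observed F exceeds its critical value.
    return f_cols, f_rows, f_cols > f_crit_cols, f_rows > f_crit_rows

data = [[2, 4, 9],
        [3, 7, 8],
        [4, 7, 10]]
print(twoway_anova(data, f_crit_cols=6.94, f_crit_rows=6.94))
```

Here both tests have (2, 4) degrees of freedom, and 6.94 is the 95% critical value of F(2, 4) from a standard F-table. F_columns = 27 exceeds it, so the sample means differ; F_rows = 3 does not, so the row means are statistically the same, which is the situation described in the 2-way anova introduction.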
