N-way ANOVA
3-way ANOVA
• H0: The mean respiratory rate is the same for all species
• H0: The mean respiratory rate is the same for all temperatures
• H0: The mean respiratory rate is the same for both sexes
• H0: There is no interaction between species and temperature across both sexes
• H0: There is no interaction between species and sex across all temperatures
• H0: There is no interaction between sex and temperature across all species
• H0: There is no interaction between species, temperature, and sex
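The list of null hypotheses follows a pattern: one per main effect and one per interaction of every order. A minimal sketch that enumerates them for any number of factors (the function name `anova_terms` is made up for this illustration, and a full factorial design is assumed):

```python
from itertools import combinations

def anova_terms(factors):
    """Return every main effect and interaction term of a full factorial design."""
    terms = []
    # order 1 = main effects, order 2 = two-way interactions, and so on
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(" x ".join(combo))
    return terms

terms = anova_terms(["species", "temperature", "sex"])
for term in terms:
    print("H0: no effect of", term)
```

With k factors this gives 2^k − 1 hypotheses, so 7 for the 3-way design and 15 for a 4-way design.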
What is what?
• Regression: One variable is considered dependent on the other(s)
• Correlation: No variable is considered dependent on the other(s)
• Multiple regression: More than one independent variable
• Linear regression: The dependent variable is scalar (continuous) and linearly dependent on the independent variable(s)
• Logistic regression: The dependent variable is categorical (hopefully with only two levels) and follows an S-shaped relation.
Remember the simple linear regression?
If Y is linearly dependent on X, simple linear regression is used:
$Y = \alpha + \beta X$
• $\alpha$ is the intercept, the value of Y when X = 0
• $\beta$ is the slope, the rate at which Y increases when X increases
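As a reminder of how the intercept and slope are estimated from data, here is a minimal least-squares sketch in plain Python. The function name `fit_line` and the data are invented for illustration; `a` and `b` denote the sample estimates of the intercept and slope.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # sum of cross products and sum of squares of x around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the fitted line passes through the means
    return a, b

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]  # made-up data lying exactly on y = 1 + 2x
a, b = fit_line(x, y)
print(a, b)
```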
Multiple linear regression
If Y is linearly dependent on more than one independent variable:
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2$
• $\alpha$ is the intercept, the value of Y when $X_1$ and $X_2$ = 0
• $\beta_1$ and $\beta_2$ are termed partial regression coefficients
• $\beta_1$ expresses the change in Y per unit change in $X_1$ when $X_2$ is kept constant
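Multiple linear regression – residual error and estimations
As the collected data are not expected to fall exactly in a plane, an error term must be added:
$Y_j = \alpha + \beta_1 X_{1j} + \beta_2 X_{2j} + \varepsilon_j$
The error terms sum to zero: $\sum_j \varepsilon_j = 0$
Estimating the dependent variable and the population parameters:
$\hat{Y}_j = a + b_1 X_{1j} + b_2 X_{2j}$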
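Multiple linear regression – general equations
In general, any finite number (m) of independent variables may be used to estimate the hyperplane:
$Y_j = \alpha + \sum_{i=1}^{m} \beta_i X_{ij} + \varepsilon_j$
The number of sample points must be at least two more than the number of independent variables: $n \ge m + 2$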
Multiple linear regression – least sum of squares
The principle of the least sum of squares is usually used to perform the fit, i.e. the coefficients are chosen to minimize
$\sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2$
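One common way to carry out the least-squares fit is to solve the normal equations (X'X)b = X'Y. The sketch below does this in plain Python with a small Gaussian-elimination helper. The names `solve` and `fit_multiple` and the data are made up for illustration, and there is no check for singular (perfectly correlated) predictors; in practice a statistics package would be used.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_multiple(X, y):
    """Least-squares fit of y = a + b1*x1 + ... + bm*xm; returns [a, b1, ..., bm]."""
    rows = [[1.0] + list(xs) for xs in X]  # leading 1 carries the intercept
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

# Made-up data lying exactly on the plane y = 1 + 2*x1 + 3*x2
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coef = fit_multiple(X, y)
print(coef)
```

Because the example data contain no error term, the fit recovers the intercept and both partial regression coefficients up to rounding.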
Multiple linear regression – Are any of the coefficients significant?
H0: $\beta_1 = \beta_2 = \dots = \beta_m = 0$
F = regression MS / residual MS
Multiple linear regression – Is it a good fit?
• $R^2$ = regression SS / total SS = 1 − residual SS / total SS
• $R^2$ expresses how much of the variation can be described by the model
• When comparing models with different numbers of variables, the adjusted R-square should be used:
• $R_a^2$ = 1 − residual MS / total MS
• The multiple correlation coefficient:
• $R = \sqrt{R^2}$
• The standard error of the estimate = sqrt(residual MS)
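The F statistic and the goodness-of-fit measures all follow from the sums of squares and their degrees of freedom. The sketch below uses the standard definitions $R^2$ = regression SS / total SS and adjusted $R^2$ = 1 − residual MS / total MS, with regression DF = m, residual DF = n − m − 1 and total DF = n − 1. The function name `fit_statistics` and the numbers are invented for illustration.

```python
import math

def fit_statistics(regression_ss, residual_ss, n, m):
    """Summary statistics for a regression with m independent variables and n points."""
    total_ss = regression_ss + residual_ss
    regression_ms = regression_ss / m
    residual_ms = residual_ss / (n - m - 1)
    total_ms = total_ss / (n - 1)
    r2 = regression_ss / total_ss
    return {
        "F": regression_ms / residual_ms,
        "R2": r2,
        "R2_adj": 1 - residual_ms / total_ms,
        "R": math.sqrt(r2),
        "SE_estimate": math.sqrt(residual_ms),
    }

# Made-up sums of squares, 13 data points, 2 independent variables
stats = fit_statistics(regression_ss=80.0, residual_ss=20.0, n=13, m=2)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Note how the adjusted value (0.76 here) is lower than $R^2$ (0.80): it penalizes each extra variable through the residual degrees of freedom.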
Multiple linear regression – Which of the coefficients are significant?
• $s_{b_i}$ is the standard error of the regression parameter $b_i$
• A t-test tests whether $\beta_i$ is different from 0
• $t = b_i / s_{b_i}$
• $\nu$ is the residual DF, $n - m - 1$
• p values can be found in a table
Multiple linear regression – Which of the coefficients are most important?
• The standardized regression coefficient, b’, is a normalized version of b: $b'_i = b_i \, s_{X_i} / s_Y$
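A common normalization multiplies each coefficient by the ratio of the standard deviation of its X to the standard deviation of Y, which makes coefficients measured in different units comparable. A minimal sketch under that assumption (the function name and data are made up):

```python
from statistics import stdev

def standardized_coefficient(b, x, y):
    """Scale b by s_X / s_Y so that it is expressed in standard-deviation units."""
    return b * stdev(x) / stdev(y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]  # made-up data on the line y = 1 + 2x, so b = 2
b_std = standardized_coefficient(2.0, x, y)
print(b_std)
```

With a single independent variable the standardized coefficient equals the correlation coefficient r, which is 1 for these perfectly linear data.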
Multiple linear regression - multicollinearity
• If two factors are well correlated, the estimated b’s become inaccurate.
• Also called collinearity, intercorrelation, nonorthogonality, or ill-conditioning
• Tolerance or variance inflation factors can be computed
• Extreme correlation is called singularity, and one of the correlated variables must be removed.
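For the special case of exactly two independent variables, tolerance and the variance inflation factor (VIF) follow directly from their pairwise correlation r: tolerance = 1 − r² and VIF = 1 / tolerance. The sketch below assumes that two-variable case; with more variables, r² is replaced by the R² from regressing each X on all the others. Function names and data are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two independent variables."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r ** 2
    # tolerance == 0 means singularity: one variable must be removed first
    return tolerance, 1 / tolerance

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]  # made-up values, fairly well correlated with x1
tolerance, vif = tolerance_and_vif(x1, x2)
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
```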
Multiple linear regression – Pairwise correlation coefficients
Multiple linear regression – Assumptions
The same as for simple linear regression:
• The Y’s are randomly sampled
• The residuals are normally distributed
• The residuals have equal variance
• The X’s are fixed factors (their errors are small)
• The X’s are not perfectly correlated
Logistic Regression
• What if the dependent variable is categorical, and especially if it is binary?
• Use some interpolation method
• Linear regression cannot help us.
The sigmoidal curve
$p = \dfrac{1}{1 + e^{-z}}$, where z is the intercept plus the sum of each regression coefficient times its input variable
• The intercept basically just shifts the curve along the input variable
• Large regression coefficient → risk factor strongly influences the probability
• Positive regression coefficient → risk factor increases the probability
• Logistic regression uses maximum likelihood estimation, not least squares estimation
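The roles of the intercept and the regression coefficient are easy to check numerically. A minimal sketch for a single input variable (the function name `logistic` and all parameter values are made up for illustration):

```python
import math

def logistic(x, intercept, coefficient):
    """Probability from the sigmoidal curve for one input variable x."""
    z = intercept + coefficient * x
    return 1.0 / (1.0 + math.exp(-z))

# Both curves cross p = 0.5 at x = -intercept / coefficient = 50
shallow = [logistic(x, intercept=-5.0, coefficient=0.1) for x in (40, 50, 60)]
steep = [logistic(x, intercept=-25.0, coefficient=0.5) for x in (40, 50, 60)]
print([round(p, 3) for p in shallow])
print([round(p, 3) for p in steep])
```

Both coefficients are positive, so the probability rises with x; the larger coefficient makes the rise much sharper around the same midpoint.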
Does age influence the diagnosis? Continuous independent variable
Does previous intake of OCP influence the diagnosis? Categorical independent variable
Predicting the diagnosis by logistic regression
What is the probability that the tumor of a 50-year-old woman who has been using OCP and has a BMI of 26 is malignant?
$z = -6.974 + 0.123 \cdot 50 + 0.083 \cdot 26 + 0.28 \cdot 1 = 1.6140$
$p = 1 / (1 + e^{-1.6140}) = 0.8340$
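The same calculation can be reproduced in a few lines of Python. The coefficients are the ones from the slide; the dictionary keys and the function name `predict_malignant` are made up for this sketch, and OCP use is coded as 1 = yes, 0 = no.

```python
import math

# Coefficients as given on the slide
coefficients = {"intercept": -6.974, "age": 0.123, "bmi": 0.083, "ocp": 0.28}

def predict_malignant(age, bmi, ocp):
    """Return (z, p): the linear predictor and the predicted probability."""
    z = (coefficients["intercept"]
         + coefficients["age"] * age
         + coefficients["bmi"] * bmi
         + coefficients["ocp"] * ocp)
    p = 1.0 / (1.0 + math.exp(-z))
    return z, p

z, p = predict_malignant(age=50, bmi=26, ocp=1)
print(f"z = {z:.4f}, p = {p:.4f}")  # z = 1.6140, p = 0.8340
```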
Exercises 20.1, 20.2
Exercises 14.1, 14.2