ARE OBSERVATIONS OBTAINED DIFFERENT?
• You use different statistical tests for different problems.
• We will examine some basic tests (χ², t-test, regression, ANOVA, ANCOVA).
• We expect you to use these basic tests in your research.
• Your research project should not be so complicated that more advanced tests are required.
• Always state your hypothesis: what you are testing.
BASIC PREMISE OF STATISTICAL TESTING: THE NULL HYPOTHESIS
[Figure: frequency distribution of the proportion of heads]
Toss a coin 100 times.
Null hypothesis: the coin is fair.
A fair coin: mean $\bar{x} = 50$ heads, $sd = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times 100} = 5$ heads.
You observe 60 heads. Is the coin fair?
Distance from the mean: $(60 - 50)/5 = 2$ sd.
A result at least 2 sd from the mean has about a 5% chance under the null hypothesis; counting only one direction, the chance is about 2.5% (5%/2).
• What if you set the probability for claiming the coin is unfair at 5%?
• What if you set the probability for claiming the coin is unfair at 25%?
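The coin-toss arithmetic above can be checked with a short Python sketch. It uses only the standard library; the function names `normal_tail` and `binomial_upper_tail` are illustrative and not part of the slides, and the exact binomial tail is included only as a cross-check on the normal approximation.

```python
import math

def normal_tail(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n, p, observed = 100, 0.5, 60           # values from the slide
mean = n * p                            # expected number of heads
sd = math.sqrt(n * p * (1 - p))         # sqrt(1/2 x 1/2 x 100)
z = (observed - mean) / sd              # distance from the mean in sd units

one_sided = normal_tail(z)              # chance of this many heads or more
two_sided = 2 * one_sided               # chance of being this far off in either direction
exact_one_sided = binomial_upper_tail(observed, n, p)

print(f"mean = {mean:.0f}, sd = {sd:.0f}, z = {z:.1f}")
print(f"normal approximation: one-sided = {one_sided:.4f}, two-sided = {two_sided:.4f}")
print(f"exact binomial one-sided = {exact_one_sided:.4f}")
```

With a 5% threshold the one-sided probability falls below the cutoff, so the fair-coin hypothesis would be rejected; a 25% threshold would reject it as well, but at the cost of calling many fair coins unfair.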
NONPARAMETRIC TESTS (data do not have to be normally distributed)
χ² CONTINGENCY TABLE:
Data must be counts, and you test the proportional distribution of counts.
Null hypothesis: no difference in the proportion of red among strata.

Observed counts (expected counts in parentheses):
SPECIES    STRATA #1    STRATA #2    STRATA #3    TOTAL
RED         8 (2.94)     1 (3.24)     1 (3.82)     10
NOT RED     2 (7.06)    10 (7.76)    12 (9.18)     24
TOTAL      10           11           13            34

Proportion red: strata #1 = 8/10; strata #2 = 1/11; strata #3 = 1/13
Expected for each cell = (row total × column total)/grand total
$\chi^2 = \sum (O - E)^2 / E = 8.71 + 3.63 + 1.55 + 0.65 + 2.08 + 0.87 = 17.49$
df = (r − 1)(c − 1) = 2; P < 0.001
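As a cross-check on the hand calculation above, here is a minimal Python sketch that recomputes the expected counts and the χ² statistic from the observed counts. The function name `chi_square` is illustrative. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the χ² upper-tail probability is exp(−χ²/2); it is not a general formula.

```python
import math

# Observed counts from the slide: rows = RED, NOT RED; columns = strata #1, #2, #3.
observed = [
    [8, 1, 1],
    [2, 10, 12],
]

def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # (R x C)/TOTAL
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

chi2, df = chi_square(observed)

# Closed form valid only because df == 2 for this 2 x 3 table.
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.5f}")
```

The statistic comes out slightly below the slide's 17.49 because the slide rounds each cell's contribution before summing.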
χ² CONTINGENCY TABLE: Make a spreadsheet with the table categories and the count in each, then have MYSTAT treat the counts as frequencies (Data … Case weighting … By frequencies). Depending on the table, use One-way frequency tables (one category, e.g., tree type) or Tables (more than one category, e.g., tree type and strata) under Analyze in MYSTAT.
PARAMETRIC TESTS (data are normally distributed)
[Figure: frequency distributions of proportion red for strata #1, #2, #3]
Proportion red (mean ± sd): strata #1 = 0.79 ± 0.25; strata #2 = 0.08 ± 0.17; strata #3 = 0.08 ± 0.17
Data do not have to be counts. It is easier to detect differences (more powerful) than with nonparametric statistics.
T-TEST:
Null hypothesis: no difference in proportion of red between strata #1 and #2.
$t = \dfrac{(\bar{x}_1 - \bar{x}_2)\sqrt{n_1 n_2/(n_1 + n_2)}}{\sqrt{[(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)}}$
$t = [(0.71)(1.41)]/0.214 = 4.68$
P < 0.005, degrees of freedom = $n_1 + n_2 - 2 = 6$
T-TEST: Use Hypothesis testing under Analyze in MYSTAT to compare means.
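The pooled two-sample t formula from the t-test slide can be sketched in Python from summary statistics alone. The function name `pooled_t` is illustrative. The first call takes strata #1 as 0.79 ± 0.25 and strata #2 as 0.08 ± 0.17 with n = 4 plots each, which is consistent with the mean difference of 0.71 and the 6 degrees of freedom used on the slide; the second call uses the absolute-abundance summary (means 2.0 and 0.25, sd 0.82 and 0.5) given later in the deck.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled variance, from summary statistics."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    scale = math.sqrt(n1 * n2 / (n1 + n2))
    t = (mean1 - mean2) * scale / math.sqrt(pooled_var)
    df = n1 + n2 - 2
    return t, df

# Proportion of red, strata #1 vs. strata #2.
t_prop, df_prop = pooled_t(0.79, 0.25, 4, 0.08, 0.17, 4)

# Absolute abundance of red, strata #1 vs. strata #2.
t_count, df_count = pooled_t(2.0, 0.82, 4, 0.25, 0.5, 4)

print(f"proportions: t = {t_prop:.2f}, df = {df_prop}")
print(f"counts:      t = {t_count:.2f}, df = {df_count}")
```

Small differences from the slide values (4.68 and 3.63) come from rounding the intermediate terms by hand.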
PARAMETRIC TESTS (data are normally distributed)
PAIRED T-TEST: even more powerful if there is an a priori basis for pairing observations.
Null hypothesis: no difference in relative abundance of red between strata #1 and #2 for plots matched on similarity.
Pairs (strata #1 − strata #2): 0.5 − 0 = 0.5; 1.0 − 0 = 1.0; 1.0 − 0.33 = 0.67; 0.67 − 0 = 0.67
Mean difference $\bar{d} = 0.71$, $s_d = 0.21$
$t = \bar{d}/(s_d/\sqrt{n}) = 0.71/(0.21/\sqrt{4}) = 6.76$
P < 0.01, degrees of freedom = n − 1 = 3
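The paired calculation above can be reproduced from the four plot pairs listed on the slide. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import math
import statistics

# Paired plots from the slide: (strata #1 proportion, strata #2 proportion).
pairs = [(0.5, 0.0), (1.0, 0.0), (1.0, 0.33), (0.67, 0.0)]

differences = [a - b for a, b in pairs]
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)      # sample sd (n - 1 in the denominator)
t = mean_diff / (sd_diff / math.sqrt(n))     # paired t statistic
df = n - 1

print(f"mean difference = {mean_diff:.2f}, sd = {sd_diff:.2f}")
print(f"t = {t:.2f}, df = {df}")
```

The t statistic differs slightly from the slide's 6.76 because the slide rounds the standard deviation to 0.21 before dividing.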
PARAMETRIC TESTS (data are normally distributed)
T-TEST: Now use numbers, not proportions.
Null hypothesis: no difference in absolute abundance of red between strata #1 and #2.
Strata #1: mean = 2.0, sd = 0.82, n = 4
Strata #2: mean = 0.25, sd = 0.5, n = 4
$t = [(2 - 0.25)(1.41)]/0.68 = 3.63$
P < 0.01, degrees of freedom = 6
REGRESSION ANALYSIS:
Null hypothesis: there is no relationship between red and blue + green in plots.
RED = 2.33 − 0.75 (BLUE or GREEN)
r² = 0.75, r = −0.88
Degrees of freedom = 12 − 2 = 10
P < 0.001
REGRESSION ANALYSIS: Use Regression … Linear … Least squares under Analyze in MYSTAT. Select the dependent (y) and independent (x) variables.
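One way to see where the regression P value comes from is to convert the correlation coefficient into a t statistic with n − 2 degrees of freedom. The sketch below uses r = −0.88 and 12 plots from the regression slide; the function name `correlation_t` is illustrative, and the critical value 4.587 (two-tailed, α = 0.001, 10 df) is taken from a standard t table rather than from the slides.

```python
import math

def correlation_t(r: float, n: int):
    """t statistic and degrees of freedom for testing whether a correlation is zero."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r**2))
    return t, df

r, n_plots = -0.88, 12                 # values from the regression slide
t, df = correlation_t(r, n_plots)

T_CRIT_0001_DF10 = 4.587               # two-tailed critical t, alpha = 0.001, df = 10 (table value)
significant = abs(t) > T_CRIT_0001_DF10

print(f"t = {t:.2f}, df = {df}, exceeds the 0.001 critical value: {significant}")
```

Because |t| exceeds the tabled critical value, the result agrees with the slide's P < 0.001.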
PARAMETRIC TESTS (data are normally distributed)
WHAT IF THERE ARE MULTIPLE COMPARISONS OF A CATEGORY? (ANOVA)
Null hypothesis: no difference in relative abundance of red among all strata.
Three possible t-test comparisons: #1 vs. #2, #1 vs. #3, #2 vs. #3
PROBLEM: As the number of comparisons increases, the likelihood of finding at least one significant difference by chance increases. ANOVA takes this into account when comparing mean values.
1-WAY ANOVA: F = 19.75; df = 2, 9 (strata − 1, samples − strata); p < 0.001
ANOVA: Use Analysis of variance … Estimate model under Analyze in MYSTAT. Select the continuous dependent (y) variable and the categorical independent (x) variables.
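The multiple-comparisons problem that motivates ANOVA can be made concrete with a small calculation. The sketch below assumes the comparisons are independent and each is tested at α = 0.05; both are simplifying assumptions, and the function name `familywise_error` is illustrative.

```python
import math

def familywise_error(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent comparisons."""
    return 1.0 - (1.0 - alpha) ** comparisons

strata = 3
pairwise = math.comb(strata, 2)            # #1 vs. #2, #1 vs. #3, #2 vs. #3
fwer = familywise_error(0.05, pairwise)

print(f"{pairwise} comparisons at alpha = 0.05 -> "
      f"chance of at least one false positive = {fwer:.3f}")
```

With three strata the chance of at least one spurious "significant" t-test is already close to three times the nominal 5%, and it grows quickly as more groups are added.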
MULTIPLE COMPARISONS (ANOVA): Which specific differences are significant?
Post-hoc analysis: must compensate for the number of comparisons and for the fact that a difference is already known to be significant.
Bonferroni test (t-test adjusted for the number of comparisons):
#1 vs. #2: p < 0.001
#1 vs. #3: p < 0.001
#2 vs. #3: p < 1.0
ANOVA – POST HOC (not available in MYSTAT; use SYSTAT): Use Analysis of variance … Estimate model … Hypothesis test under Analyze in SYSTAT.
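The idea behind the Bonferroni adjustment can be sketched in a few lines of Python: each raw p-value is multiplied by the number of comparisons and capped at 1. The raw p-values below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not the values behind the slide, and SYSTAT's own post-hoc output should be used for real analyses.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of comparisons, cap at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]

# Hypothetical raw p-values for the three pairwise comparisons (illustration only).
labels = ["#1 vs. #2", "#1 vs. #3", "#2 vs. #3"]
raw = [0.0002, 0.0003, 0.60]

adjusted = bonferroni(raw)
per_test_alpha = 0.05 / len(raw)   # equivalent view: a stricter threshold per comparison

for label, p_raw, p_adj in zip(labels, raw, adjusted):
    print(f"{label}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f}")
print(f"per-comparison alpha = {per_test_alpha:.4f}")
```

The cap at 1 is why an adjusted p-value for a clearly non-significant comparison can be reported as 1.0.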
MULTIPLE COMPARISONS (ANOVA): several independent categorical variables
[Figure: strata #1, #2, #3 at near and far DISTANCE FROM EDGE]
Null hypothesis: no difference in relative abundance of red between strata and with distance into the woods.
TWO-WAY ANOVA:
Strata: F = 15.65; df = 2, 6; p < 0.005
Distance: F = 0.12; df = 1, 6; p = 0.74
Strata × Distance interaction: F = 0.51; df = 2, 6; p = 0.63
COULD HAVE N-WAY ANOVA, BUT YOUR PROJECT SHOULD NOT EXCEED A 2-WAY.
THE INTERACTION TERM'S MEANING (no variety)
[Figure: x-axis LOCATION]
NO MAIN EFFECTS (SEASON or LOCATION: no differences)
INTERACTION IS SIGNIFICANT (greatest at A:III and C:I)
THE INTERACTION TERM'S MEANING (wider variety)
[Figure: x-axis LOCATION]
MAIN EFFECTS (SEASON or LOCATION: differences)
NO INTERACTION (highest always in C and III)
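For F statistics with exactly 2 numerator degrees of freedom, as in the strata and interaction terms above, the upper-tail p-value has a simple closed form, p = (1 + 2F/df₂)^(−df₂/2). The sketch below applies it to two F values reported in the ANOVA slides; the function name `f_pvalue_df1_2` is illustrative, and the formula does not apply to other numerator degrees of freedom.

```python
def f_pvalue_df1_2(f_stat: float, df2: int) -> float:
    """Upper-tail p-value for an F statistic with 2 numerator df (closed form)."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)

# 1-way ANOVA across three strata: F = 19.75, df = 2, 9.
p_one_way = f_pvalue_df1_2(19.75, 9)

# 2-way ANOVA, Strata x Distance interaction: F = 0.51, df = 2, 6.
p_interaction = f_pvalue_df1_2(0.51, 6)

print(f"1-way ANOVA:  p = {p_one_way:.5f}")
print(f"interaction:  p = {p_interaction:.2f}")
```

A large interaction p-value like this one means the strata pattern looks the same near and far from the edge, which is the "no interaction" case illustrated above.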
MULTIPLE COMPARISONS (ANCOVA): several independent variables, one categorical and one continuous
[Figure: near and far DISTANCE FROM EDGE]
Null hypothesis: no difference in relative abundance of red with blue + green and distance into the woods (assume equal slopes).
ANCOVA:
Blue + Green: F = 36.10; df = 1, 9; p < 0.0002
Distance: F = 0.78; df = 1, 9; p = 0.40
Interaction (slope): F = 0.08; df = 1, 8; p = 0.78
COULD HAVE N-WAY ANCOVA,
ANCOVA: Use Analysis of variance … Estimate model under Analyze in MYSTAT. In SYSTAT, use General linear model … Estimate model under Analyze. Select the continuous dependent (y) variable, the categorical independent (x1) variable, and the covariate (x2). In SYSTAT, create an interaction term to test the slope.
DATA TRANSFORMATIONS (can normalize data or make them continuous so parametric statistics can be used, or make data linear for regression)
• Data are not always normally distributed, but a transformation (e.g., log) may make them normal. If the data cannot be normalized, you must use nonparametric statistics (less powerful).
• Data are not always continuous. Percentages and proportions are not continuous because they cannot be less than 0 or greater than 100 or 1. To make them continuous from 0 to infinity or from −infinity to +infinity, you can use transforms:
  arcsine transform = $\arcsin\sqrt{\text{proportion}}$
  logarithmic transform = $\log(\text{proportion})$ *
  logit transform = $\log[\text{proportion}/(1 - \text{proportion})]$ *
  This stretches both tails and compresses the peak to approximate a continuous normal distribution.
  * If some proportions = 0 or 1, then add a small constant to all values (e.g., 0.001).
• Data for regression are not always linear. Various transformations, especially log x, log y, or both, can transform a curve into a straight line. What do logarithmic transforms imply about the linear function?
DATA TRANSFORMATIONS: Use Data … Transform … Let in MYSTAT.
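The three proportion transforms described above can be sketched in Python as follows. The function names and the default constant 0.001 are illustrative. One deliberate departure is labeled in the code: for the logit, the sketch keeps proportions inside [0.001, 0.999] instead of adding a constant to every value, because adding a constant to a proportion of 1 would make 1 − proportion negative and the logarithm undefined. The arcsine version assumes the usual square-root form of the transform.

```python
import math

def arcsine_transform(p: float) -> float:
    """Arcsine square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))

def log_transform(p: float, constant: float = 0.001) -> float:
    """Log transform; a small constant is added so that p = 0 is defined."""
    return math.log(p + constant)

def logit_transform(p: float, eps: float = 0.001) -> float:
    """Logit transform; assumption: clamp p to [eps, 1 - eps] so 0 and 1 stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

# Plot proportions of red used in the t-test examples (strata #1 and #2).
proportions = [0.5, 1.0, 1.0, 0.67, 0.0, 0.0, 0.33, 0.0]

for p in proportions:
    print(f"p = {p:.2f}  arcsine = {arcsine_transform(p):.3f}  "
          f"log = {log_transform(p):.3f}  logit = {logit_transform(p):.3f}")
```

The logit output shows the stretching of the tails: proportions near 0 and 1 move far from the center while 0.5 maps to 0.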
ARE OBSERVATIONS OBTAINED DIFFERENT?
• Different statistical tests for different problems.
• You will use these basic tests in your research (χ², t-test, regression, ANOVA, ANCOVA).
• Your research project should not be so complicated that more advanced tests are required.
• Always graph your data and state your hypothesis.
UNDERC-WEST (National Bison Range)
Meadow vole (Microtus pennsylvanicus)
Yellow-bellied marmot (Marmota flaviventris)
USE MYSTAT WITH THE DATA FILES CREATED LAST WEEK (be sure to set 6 decimal places under Edit … Options … Output in MYSTAT so p values are exact)
WITH MYSTAT, ANSWER THESE QUESTIONS (you will use χ², regression, t-test, 2-way ANOVA, ANCOVA):
• Does snap-trapping lead to a sex bias in Microtus?
• What is the relationship between length and mass for Microtus? (hint: need to use Data … Transform … Let)
• Do Microtus and Marmota exhibit similar length and mass growth relationships? (hint: think about the question above)
• Does Marmota mass vary with month? Explain ecologically what you see.
• Does reproductive status of female Microtus differ with mass? Why do you observe this? (hint: need to use Data … Select cases)
• Does the relationship between reproductive status and mass differ between male and female Microtus?
Due in two weeks!