Summarizing and Presenting Data
Statistics
I’m a biologist! Why am I doing math?!
We need math and statistics to answer questions like:
• Does this medicine work?
• Do no-fishing areas increase overall reef fish populations?
• Are Drosophila on the Big Island genetically distinct from Drosophila on O’ahu?
Different ways of answering a question: which is better?
• Does this medicine work?
  • Mrs. Hayashi said she felt better after taking drug A.
  • 90% of patients who took drug A lived 10 years longer than those on drug B.
• Do no-fishing areas increase overall reef fish populations?
  • My uncle caught nothing when he went fishing next to the reserve yesterday.
  • Average size and abundance of commercial fish increased in the years following establishment of a reserve.
• Are Drosophila on the Big Island genetically distinct from Drosophila on O’ahu?
  • They look kind of different.
  • There are significant base pair differences in the cytochrome oxidase A gene locus.
Statistics
The scientific study of numerical data based upon variation in nature.
Two types of Statistics
Descriptive Statistics:
• Used to describe or summarize a data set.
• E.g., mean, median, range, variance, standard deviation, confidence intervals, frequency distribution, etc.
Inferential Statistics:
• Allows you to make inferences about a population from a sample.
• Used to test the null hypothesis (H0).
• E.g., t-test, ANOVA, ANCOVA.
Some Definitions
• Raw Data: the data you collect directly from the organisms or environment you are studying.
• Population (N): the total number of individuals in the population of interest.
• Sample (n): the number of observations or individuals measured (e.g., n = 3).
• Variable: any factor that can be controlled, changed, or measured in an experiment.
Descriptive Statistics
• Statistics of Location
  • Average (Mean)
  • Median
  • Mode
• Statistics of Dispersion
  • Range
  • Standard Deviation
Hand calculations
Fern Height (m): 1.2, 3.0, 0.5, 2.3, 1.5
• Mean: $\bar{X} = \frac{1.2 + 3.0 + 0.5 + 2.3 + 1.5}{5} = 1.7$ m
• Median: the middle number in a data set. Order from smallest to largest: 0.5, 1.2, 1.5, 2.3, 3.0. The median is 1.5 m.
• Range: the difference between the largest and smallest data points in a sample: 3.0 - 0.5 = 2.5 m
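The hand calculations above can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics` module and the fern heights from the slide; the variable names are chosen here for illustration only.

```python
import statistics

# Fern heights in meters, as listed on the slide.
fern_heights_m = [1.2, 3.0, 0.5, 2.3, 1.5]

# Mean: sum of the values divided by the number of values.
mean_height = statistics.mean(fern_heights_m)

# Median: middle value after sorting (statistics.median sorts internally).
median_height = statistics.median(fern_heights_m)

# Range: largest value minus smallest value.
height_range = max(fern_heights_m) - min(fern_heights_m)

print(f"Mean:   {mean_height:.1f} m")
print(f"Median: {median_height:.1f} m")
print(f"Range:  {height_range:.1f} m")
```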
Sample Variance
n = 5, $\bar{X} = 1.7$

$X$     $X - \bar{X}$     $(X - \bar{X})^2$
0.5     -1.2              1.44
1.2     -0.5              0.25
1.5     -0.2              0.04
2.3      0.6              0.36
3.0      1.3              1.69
Sum                       3.78

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{3.78}{4} = 0.945$

Standard Deviation
$S = \sqrt{0.945} = 0.972$
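The same steps (deviations from the mean, squared deviations, division by n - 1, square root) can be sketched in Python. The data are the fern heights from the slide; the variable names are illustrative, and the final comparison with `statistics.variance` is included only as a cross-check.

```python
import math
import statistics

# Fern heights in meters, sorted as in the slide's table.
fern_heights_m = [0.5, 1.2, 1.5, 2.3, 3.0]

n = len(fern_heights_m)
mean_height = sum(fern_heights_m) / n

# Squared deviation of each observation from the mean.
squared_deviations = [(x - mean_height) ** 2 for x in fern_heights_m]
sum_of_squares = sum(squared_deviations)

# Sample variance divides by n - 1, not n.
sample_variance = sum_of_squares / (n - 1)

# Standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print(f"Sum of squares:     {sum_of_squares:.2f}")
print(f"Sample variance:    {sample_variance:.3f}")
print(f"Standard deviation: {sample_sd:.3f}")

# Cross-check against the standard library's sample variance.
print(f"statistics.variance: {statistics.variance(fern_heights_m):.3f}")
```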
Presenting Tables
TABLE I. The Effects of High Fat Versus Low Fat Diets on Rat Weights and Lengths

DIET TYPE     AVERAGE WEIGHT (g)     AVERAGE LENGTH (cm)
high fat      545.5                  55.3
low fat       346.7                  53.2
Correlation & Regression
Correlation: describes the strength of association between two variables. The correlation coefficient is designated as r, and r can range from -1 to +1.
• +1 = a perfect positive correlation
• 0 = no linear correlation
• -1 = a perfect negative correlation
[Figure: three scatterplots illustrating r = +1, r = -1, and r = 0]
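To make the range of r concrete, here is a minimal sketch that computes Pearson's correlation coefficient from its definition. The function name `pearson_r` and the three small data sets are hypothetical, made up to show the three cases; they are not data from the lab.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y rises exactly with x, falls exactly with x,
# or shows no linear trend at all.
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])
r_negative = pearson_r(x, [10, 8, 6, 4, 2])
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])

print(r_positive, r_negative, r_none)
```

Real measurements rarely give exactly +1, -1, or 0; values in between indicate weaker or stronger linear association.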
Presenting Graphs
• The independent variable is always plotted on the x-axis.
• Does the weight cause the length, or does the length cause the weight?
Simple Linear Regression: used to detect a linear relationship between variables when we think x affects y (but not the reverse).
This relationship can be represented by a regression equation.
Equation for a straight line: y = bx + a
• y = DV (dependent variable)
• b = slope
• x = IV (independent variable)
• a = intercept
Simple Linear Regression: detects a linear relationship between variables when x affects y.
Regression equation for a straight line: y = bx + a
• y = dependent variable
• b = slope
• x = independent variable
• a = intercept
E.g., y = height, x = age, so tree height = b * (tree age) + a
How do we calculate a and b?
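One standard way to calculate a and b is ordinary least squares, where the slope is the sum of cross-products divided by the sum of squares of x, and the intercept follows from the two means. The sketch below assumes that method; the function name `fit_line` and the tree ages and heights are hypothetical values invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much y changes, on average, per unit change in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return b, a

# Hypothetical tree data: age in years (x) and height in meters (y).
tree_age_years = [1, 2, 3, 4, 5]
tree_height_m = [1.1, 1.9, 3.2, 3.8, 5.0]

b, a = fit_line(tree_age_years, tree_height_m)
print(f"Tree height = {b:.2f} * (tree age) + {a:.2f}")
```

Spreadsheet regression tools report the same two numbers as the slope and intercept of the fitted line.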
[Figure: Excel regression output, with the intercept (a), the slope (b), and the p-value labeled]
Because the p-value is less than 0.05, the slope of our line is significantly different from zero. Therefore, there is a significant effect of age on height.
Tasks for Today’s Lab
• Calculate averages and standard deviations of your seed pod lengths, weights, and number of seeds per pod (use only the units originally measured, not converted units).
• Do the same for pooled class data (will need Excel files of raw data from you).
• Create a single table that presents the values (both your data & the pooled class data - include only the averages and SDs in this table).
Tasks for Today’s Lab
• Using only the class pooled data, prepare a frequency histogram for the number of seeds per pod.
• Graph the relationship between pod weight and pod length.
• Graph the relationship between number of seeds per pod and pod length.
• Interpret the data.
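For the frequency histogram task, the underlying step is counting how many pods have each seed number. The sketch below shows that counting step with a simple text histogram; the seed counts are hypothetical placeholders (not class data), and the function names are illustrative.

```python
from collections import Counter

def frequency_table(values):
    """Map each distinct value to how often it occurs, in sorted order."""
    counts = Counter(values)
    return {value: counts[value] for value in sorted(counts)}

def text_histogram(values):
    """One row per value, with one '#' per observation."""
    lines = []
    for value, count in frequency_table(values).items():
        lines.append(f"{value:>3} | {'#' * count} ({count})")
    return "\n".join(lines)

# Hypothetical seeds-per-pod counts; replace with the pooled class data.
seeds_per_pod = [3, 4, 4, 5, 5, 5, 6, 6, 7]

print(text_histogram(seeds_per_pod))
```

The same frequency table can be pasted into a spreadsheet and plotted as a column chart, with seeds per pod on the x-axis and the number of pods on the y-axis.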