Learn about measures of central tendency (mean, median, mode) and measures of spread (variance, standard deviation, coefficient of variation) in summarizing and describing data. Understand how to use box plots to display data and when to use mean versus median. Also, explore the difference between standard deviation and interquartile range. Discover the practical applications of these statistical measures through examples in snake undulation rates and stickleback armoring.
Ch 3: Summarizing / describing data
Statistics to summarize data • Measures of central tendency • Measures of spread
Measures of location (or central tendency) • Mean • Median • Mode
Measures of location (or central tendency) • Mean • Example: $Y_1 = 56$, $Y_2 = 72$, $Y_3 = 18$, $Y_4 = 42$, so $\bar{Y} = (56 + 72 + 18 + 42) / 4 = 47$ • Median • Mode
Median • (= middle measurement) • Odd number of values: take some data: 18 28 24 25 36 14 34 • Rank it: 14 18 24 25 28 34 36 • Middle number is the 4th of 7 • Median is 25 • Even number of values: take some data: 18 28 24 25 36 14 • Rank it: 14 18 24 25 28 36 • Average the two middle numbers, 24 and 25 • Median is 49/2 = 24.5
Mode • most frequent number
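The three measures of location can be checked with Python's standard `statistics` module. This is only a sketch: the mean and median inputs are the numbers from the examples above, while `hypothetical_counts` is a made-up list (not from the slides) used purely to show what the mode returns.

```python
from statistics import mean, median, mode

# Values from the mean example above.
mean_example = [56, 72, 18, 42]
print(mean(mean_example))  # 47

# Values from the median examples above (odd and even sample sizes).
odd_sample = [18, 28, 24, 25, 36, 14, 34]
even_sample = [18, 28, 24, 25, 36, 14]
print(median(odd_sample))   # 25
print(median(even_sample))  # 24.5

# Hypothetical data, invented only to illustrate the mode.
hypothetical_counts = [2, 3, 3, 5, 7, 3, 5]
print(mode(hypothetical_counts))  # 3
```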
Comparing mean and median • median is middle measurement • mean is center of gravity
Measures of spread • Variance • Standard deviation • Coefficient of variation • Interquartile range
Paradise tree snakes • Climb trees • Move from tree to tree by gliding • involves flinging self from top of tree • undulating to slow fall • undulation moves snake from initial tree • faster undulation rate should increase dispersal ability
Researchers measured undulation rates: Data: 0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6
Variance and standard deviation • measures that depend on knowing the mean • going to measure deviation from sample mean • $Y_i - \bar{Y}$
Variance & std dev • $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$ is the variance • $s = \sqrt{s^2}$ is the standard deviation
Standard deviation • measures spread of the data • just how far (on average) the data points are from the sample mean
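As a worked check, the snake undulation data can be run through the calculation by hand. The sketch below assumes the usual sample formula with n - 1 in the denominator; that assumption reproduces the mean of 1.375 and standard deviation of 0.324 quoted for these data in the coefficient of variation example. The variable names are illustrative.

```python
from statistics import mean, stdev, variance

# Undulation rates from the paradise tree snake example.
snake_rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]
n = len(snake_rates)
y_bar = mean(snake_rates)

# Sum of squared deviations from the sample mean, divided by n - 1.
s_squared = sum((y - y_bar) ** 2 for y in snake_rates) / (n - 1)
s = s_squared ** 0.5

print(round(y_bar, 3), round(s_squared, 3), round(s, 3))  # 1.375 0.105 0.324

# The library functions use the same n - 1 definition.
print(round(variance(snake_rates), 3), round(stdev(snake_rates), 3))  # 0.105 0.324
```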
Related: Coefficient of variation • is just s expressed relative to the mean: $CV = \frac{s}{\bar{Y}} \times 100\%$ • Example: for snake undulation data, CV = 0.324/1.375 * 100% = 24% • higher CV => more variability
Related: Coefficient of variation • is just s expressed relative to the mean • useful when you want to compare variability between datasets that a) are measured in different units, or b) have vastly different ranges, e.g., variability of mouse weights vs. variability of elephant weights
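One way to see why the coefficient of variation suits comparisons across units: rescaling the data changes s but leaves CV untouched. The sketch below uses the snake undulation data; the factor of 1000 is a hypothetical change of units chosen for illustration, and `coefficient_of_variation` is an illustrative helper, not a library function.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the sample standard deviation relative to the mean, in percent."""
    return stdev(values) / mean(values) * 100

snake_rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]

# Hypothetical change of units: the same measurements multiplied by 1000.
rescaled_rates = [y * 1000 for y in snake_rates]

cv_original = coefficient_of_variation(snake_rates)
cv_rescaled = coefficient_of_variation(rescaled_rates)

print(round(cv_original), round(cv_rescaled))  # 24 24
print(round(stdev(snake_rates), 3), round(stdev(rescaled_rates), 1))  # 0.324 324.0
```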
Standard deviation is useful when: • Data are normally distributed (i.e., follow a bell curve) • If this is true: • about 68% (roughly two-thirds) of observations will fall between $\bar{Y} - s$ and $\bar{Y} + s$ • about 95% of observations will fall between $\bar{Y} - 2s$ and $\bar{Y} + 2s$ • If data are not normally distributed: • will want to calculate interquartile range
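These percentages are properties of the normal curve itself, so they can be checked numerically. The sketch below uses `statistics.NormalDist` (Python 3.8+) with a standard normal distribution; the exact figures come out near 68.3% and 95.4%, which the slide rounds.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within one and two standard deviations of the mean.
within_1s = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2s = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(f"{within_1s:.1%}  {within_2s:.1%}")  # 68.3%  95.4%
```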
Example for interquartile range: spider running speed • Tidarren spiders have large sexual dimorphism: males are tiny relative to females (about 1% of the female's size) • Also: males have very large pedipalps (copulatory organs, derived from feet), for carrying sperm • sometimes, males voluntarily amputate one of the pedipalps
Researchers hypothesized that having only 1 pedipalp increased spider running speed
For interquartile range, must rank data, find median, then find midpoint of lower half of data, and of upper half of data • First quartile (midpoint of lower half): (2.31 + 2.37) / 2 = 2.34 • Third quartile (midpoint of upper half): (3.00 + 3.09) / 2 = 3.045 • Interquartile range: 3.045 – 2.34 = 0.705
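The rank, split, and take-the-midpoint procedure can be sketched as a small function. The full spider running-speed data are not reproduced in the text, so the example reuses the snake undulation rates instead. One assumption to flag: for an odd number of values this sketch leaves the median out of both halves, which is one of several conventions in use, so library routines may give slightly different quartiles. The function name is illustrative.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: with an odd number of values, the overall median is
    excluded from both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ranked = sorted(values)
    half = len(ranked) // 2
    lower_half = ranked[:half]
    upper_half = ranked[len(ranked) - half:]
    return median(lower_half), median(upper_half)

snake_rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]
q1, q3 = quartiles_by_halves(snake_rates)
iqr = q3 - q1

print(round(q1, 3), round(q3, 3), round(iqr, 3))  # 1.2 1.5 0.3
```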
Box plot* to display median, interquartile range • What is the difference between a histogram and a box and whiskers plot? * or, box and whiskers plot
When to use mean vs. median? When to use standard deviation vs. interquartile range? • Motivating example: threespine sticklebacks • armoring varies between marine and freshwater forms • if predation is low, useful to not have as much armor, because it is heavy • freshwater systems: armoring is less (commensurate with less buoyancy) • Researchers crossed marine and freshwater fish to see effect on armoring of offspring
One gene responsible for armoring • if fish receive 2 copies of gene from marine grandparent – call these MM fish – have heavy plating • if fish receive 2 copies of gene from freshwater grandparent – mm fish – have light plating • if fish receive 1 copy of gene each from marine and freshwater grandparents – Mm fish – have quite variable plating
Comparing mean and median • median is middle measurement • mean is center of gravity • The mean is quite sensitive to extreme values: with heavily skewed data, the mean can fall far from where most observations lie.
Effect of extreme values on the mean • Do thought experiment: take the smallest 4 values from the MM genotype, and move them far to the left.
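The thought experiment can be tried numerically. The MM plating counts are not reproduced in the text, so this sketch substitutes the snake undulation data, and because that sample has only 8 values it moves the smallest 2 rather than 4. The shift of 10 units is arbitrary and hypothetical. The point is the contrast: the mean is dragged toward the moved values while the median stays put.

```python
from statistics import mean, median

# Stand-in data (snake undulation rates), since the MM values are not in the text.
snake_rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]
ranked = sorted(snake_rates)

# Hypothetical manipulation: move the smallest 2 values far to the left.
shift = 10.0
n_moved = 2
shifted = [y - shift for y in ranked[:n_moved]] + ranked[n_moved:]

print(round(mean(ranked), 3), round(mean(shifted), 3))  # 1.375 -1.125
print(median(ranked) == median(shifted))  # True
```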