Measures of Summary and Dispersion in Data Analysis

Learn about measures of central tendency (mean, median, mode) and measures of spread (variance, standard deviation, coefficient of variation) in summarizing and describing data. Understand how to use box plots to display data and when to use mean versus median. Also, explore the difference between standard deviation and interquartile range. Discover the practical applications of these statistical measures through examples in snake undulation rates and stickleback armoring.

mariafoster

Presentation Transcript


  1. Ch 3: Summarizing / describing data

  2. Statistics to summarize data • Measures of central tendency • Measures of spread

  3. Measures of location (or central tendency) • Mean • Median • Mode

  5. Measures of location (or central tendency) • Mean • Median • Mode • Example (mean): Y1 = 56, Y2 = 72, Y3 = 18, Y4 = 42, so $\bar{Y} = (56 + 72 + 18 + 42) / 4 = 47$
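The mean in the example above can be reproduced in a few lines. This is a minimal sketch; the variable names `y` and `y_bar` are illustrative and not from the slides.

```python
# Measurements from the slide's example: Y1..Y4
y = [56, 72, 18, 42]

# Sample mean: sum of the observations divided by the number of observations
y_bar = sum(y) / len(y)

print(y_bar)  # 47.0
```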

  6. Median • (= middle measurement) • Take some data: 18 28 24 25 36 14 34 • Rank it: 14 18 24 25 28 34 36 • Middle number: 14 18 24 [25] 28 34 36 • Median is 25

  7. Median • (= middle measurement) • Odd number of values: • Take some data: 18 28 24 25 36 14 34 • Rank it: 14 18 24 25 28 34 36 • Middle number: 14 18 24 [25] 28 34 36 • Median is 25 • Even number of values: • Take some data: 18 28 24 25 36 14 • Rank it: 14 18 24 25 28 36 • Average the two middle numbers: 14 18 [24 25] 28 36 • Median is (24 + 25) / 2 = 49/2 = 24.5
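The two cases on this slide (odd and even number of values) can be sketched as one small function. The function name `median` and the variable names are illustrative choices, not from the slides.

```python
def median(values):
    """Return the middle measurement of the ranked data."""
    ranked = sorted(values)  # rank the data from smallest to largest
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the single middle value
        return ranked[mid]
    # Even number of values: average the two middle values
    return (ranked[mid - 1] + ranked[mid]) / 2


odd_data = [18, 28, 24, 25, 36, 14, 34]
even_data = [18, 28, 24, 25, 36, 14]

print(median(odd_data))   # 25
print(median(even_data))  # 24.5
```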

  8. Mode • most frequent number
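As a rough illustration of "most frequent number", the sketch below counts occurrences and returns every value tied for the highest count. The helper name `modes` and the sample values are made up for this example; the slides give no data for the mode.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often (there may be ties)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


sample = [3, 5, 5, 7, 8, 5, 3]  # made-up values, not from the slides
print(modes(sample))            # [5]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]
```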

  9. Comparing mean and median • median is middle measurement • mean is center of gravity

  10. Measures of spread • Variance • Standard deviation • Coefficient of variation • Interquartile range

  11. Paradise tree snakes • Climb trees • Move from tree to tree by gliding • involves flinging self from top of tree • undulating to slow fall • undulation moves snake from initial tree • faster undulation rate should increase dispersal ability

  12. Researchers measured undulation rates: Data: 0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6

  13. Variance and standard deviation • measures that depend on knowing the mean • going to measure deviation from sample mean • $Y_i - \bar{Y}$

  14. Variance & std dev • $s^2$ is the variance: $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$ • $s$ is the standard deviation: $s = \sqrt{s^2}$
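A minimal sketch of the calculation for the snake undulation data, assuming the usual sample formulas (sum of squared deviations from the mean, divided by n − 1, then the square root). That assumption reproduces the s = 0.324 and mean = 1.375 used in the coefficient-of-variation example; the variable names are illustrative.

```python
import math

# Snake undulation rates from the slides
undulation = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]

n = len(undulation)
y_bar = sum(undulation) / n                    # sample mean
deviations = [y - y_bar for y in undulation]   # deviation of each point from the mean

# Sample variance: squared deviations summed and divided by n - 1 (assumed convention)
variance = sum(d ** 2 for d in deviations) / (n - 1)
std_dev = math.sqrt(variance)                  # standard deviation

print(round(y_bar, 3))     # 1.375
print(round(variance, 3))  # 0.105
print(round(std_dev, 3))   # 0.324
```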

  15. Standard deviation • measures spread of the data • just how far (on average) the data points are from the sample mean

  16. Related: Coefficient of variation • is just s expressed relative to the mean: $CV = (s / \bar{Y}) \times 100\%$ • Example: for snake undulation data, CV = 0.324/1.375 * 100% = 24% • higher CV => more variability

  17. Related: Coefficient of variation • is just s expressed relative to the mean: $CV = (s / \bar{Y}) \times 100\%$ • useful when you want to compare variability between datasets that a) are measured in different units, or b) have vastly different ranges • e.g., variability of mouse weights vs. variability of elephant weights
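The coefficient of variation for the snake data can be checked with Python's standard `statistics` module, whose `stdev` uses the sample (n − 1) formula. The helper name `coefficient_of_variation` is an illustrative choice.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


# Snake undulation rates from the slides
undulation = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]

cv = coefficient_of_variation(undulation)
print(round(cv))  # 24
```

Because the result is a unitless percentage, the same function can be applied to two datasets in different units and the results compared directly.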

  18. Standard deviation is useful when: • Data are normally distributed (i.e., follow a bell curve) • If this is true: • about 68% (roughly two-thirds) of observations will fall between $\bar{Y} - s$ and $\bar{Y} + s$ • about 95% of observations will fall between $\bar{Y} - 2s$ and $\bar{Y} + 2s$

  19. Standard deviation is useful when: • Data are normally distributed (i.e., follow a bell curve) • If this is true: • about 68% (roughly two-thirds) of observations will fall between $\bar{Y} - s$ and $\bar{Y} + s$ • about 95% of observations will fall between $\bar{Y} - 2s$ and $\bar{Y} + 2s$ • If data are not normally distributed: • will want to calculate interquartile range
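The fractions quoted for a bell curve can be checked against the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(fraction_within(1), 3))  # 0.683
print(round(fraction_within(2), 3))  # 0.954
```

These figures hold only for normally distributed data, which is why the interquartile range is preferred when the distribution is skewed.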

  20. Example for interquartile range: spider running speed • Tidarren spiders have large sexual dimorphism: males are tiny, relative to females (about 1% of female’s size) • Also: males have very large pedipalps (copulatory organs, derived from feet), for carrying sperm • sometimes, males voluntarily amputate one of the pedipalps

  21. Researchers hypothesized that having only 1 pedipalp increased spider running speed

  22. For interquartile range, must rank data and find median

  23. For interquartile range, must rank data, find median, then find midpoint of lower half of data, and of upper half of data

  25. For interquartile range, must rank data, find median, then find midpoint of lower half of data, and of upper half of data • First quartile: (2.31 + 2.37) / 2 = 2.34 • Third quartile: (3.00 + 3.09) / 2 = 3.045

  26. For interquartile range, must rank data, find median, then find midpoint of lower half of data, and of upper half of data • First quartile: (2.31 + 2.37) / 2 = 2.34 • Third quartile: (3.00 + 3.09) / 2 = 3.045 • Interquartile range: 3.045 – 2.34 = 0.705
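The rank, split, and take-midpoints procedure can be sketched as follows. The full spider dataset is not reproduced in this transcript, so the example runs on the snake undulation data instead. One assumption to flag: when the number of values is odd, this sketch leaves the median itself out of both halves, which is one common convention but not the only one. The function names are illustrative.

```python
def median(ranked):
    """Median of a list that is already ranked (sorted)."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (first quartile, median, third quartile) by splitting the ranked data in half."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    # Assumption: for odd n, the median is excluded from both halves
    upper = ranked[half:] if n % 2 == 0 else ranked[half + 1:]
    return median(lower), median(ranked), median(upper)


# Snake undulation rates from the slides
undulation = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]

q1, med, q3 = quartiles(undulation)
iqr = q3 - q1

print(round(q1, 3))   # 1.2
print(round(q3, 3))   # 1.5
print(round(iqr, 3))  # 0.3
```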

  27. Box plot (or box-and-whiskers plot) to display median, interquartile range

  28. Box plot (or box-and-whiskers plot) to display median, interquartile range • What is the difference between a histogram and a box-and-whiskers plot?
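The numbers a box plot draws can be computed without any plotting library. The sketch below is a rough illustration under two assumptions that do not come from the slides: quartiles are the midpoints of the lower and upper halves of the ranked data, and whiskers extend to the most extreme points within 1.5 × IQR of the box (a widely used convention, with points beyond that treated as outliers). The function name `box_plot_summary` and the parameter `whisker_factor` are hypothetical.

```python
def median(ranked):
    """Median of a list that is already ranked (sorted)."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2


def box_plot_summary(values, whisker_factor=1.5):
    """Compute the values a box plot displays: box edges, median line, whiskers, outliers."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    q1 = median(ranked[:half])
    q3 = median(ranked[half:] if n % 2 == 0 else ranked[half + 1:])
    iqr = q3 - q1
    # Assumed convention: whiskers reach the furthest points within 1.5 * IQR of the box
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [y for y in ranked if low_fence <= y <= high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": median(ranked),
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": [y for y in ranked if y < low_fence or y > high_fence],
    }


# Snake undulation rates from the slides
undulation = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]

summary = box_plot_summary(undulation)
for name, value in summary.items():
    print(name, value)
```

Unlike a histogram, which shows how many observations fall in each bin, this summary reduces the data to a handful of positions along the measurement axis.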

  30. When to use mean vs. median? When to use standard deviation vs. interquartile range? • Motivating example: threespine sticklebacks • armoring varies between marine and freshwater forms • if predation is low, it is useful to carry less armor, because armor is heavy • in freshwater systems, armoring is reduced (commensurate with less buoyancy) • Researchers crossed marine and freshwater fish to see the effect on armoring of offspring

  31. One gene responsible for armoring • if fish receive 2 copies of gene from marine grandparent – call these MM fish – have heavy plating • if fish receive 2 copies of gene from freshwater grandparent – mm fish – have light plating • if fish receive 1 copy of gene each from marine and freshwater grandparents – Mm fish – have quite variable plating

  32. Armoring of different genotypes

  33. Comparing mean and median • median is middle measurement • mean is center of gravity • The mean is quite sensitive to extreme values: with heavily skewed data, the mean can sit far from where most observations lie, so it may not represent a typical value.

  34. Effect of extreme values on the mean • Do thought experiment: take the smallest 4 values from the MM genotype, and move them far to the left.

  35. Effect of extreme values on the mean
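The thought experiment can be tried numerically. The stickleback plate counts are not reproduced in this transcript, so the values below are made up purely for illustration; only the procedure (shift the four smallest values far to the left, then compare mean and median) follows the slide.

```python
from statistics import mean, median

# Hypothetical plate counts, NOT the real MM genotype data
plates = [58, 60, 61, 62, 62, 63, 63, 64, 64, 65, 66]

# Move the four smallest values far to the left (here: subtract 40 from each)
ranked = sorted(plates)
shifted = [y - 40 for y in ranked[:4]] + ranked[4:]

print(round(mean(plates), 2), median(plates))    # 62.55 63
print(round(mean(shifted), 2), median(shifted))  # 48 63
```

In this made-up example the mean drops by about 14.5 while the median stays at 63, because the median depends only on the rank order of the middle value.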