
Statistical Analysis of Sample Data - Describing Plant Height in Hypericum cumulicola

This text discusses the basic output of scientific investigations, including observations and data. It also explains how sample data can be used to estimate unknown population parameters, such as the population mean. The text then focuses on measures of location, including the arithmetic mean, median, and mode, and provides examples of how to calculate these measures. Finally, it introduces measures of variability, such as the sample variance and standard deviation, and explains how to interpret them. The text is based on chapter 3 of Gotelli & Ellison (2004) and chapter 4 of D. Heath (1995).

jbrophy


Presentation Transcript


  1. Describing Samples. Based on Chapter 3 of Gotelli & Ellison (2004) and Chapter 4 of D. Heath (1995), An Introduction to Experimental Design and Statistics for Biology, CRC Press.

  2. The basic output of any scientific investigation is a collection of observations or data. (Ex. If Y is a random variable, then we use $Y_i$ to denote the $i$th observation in our sample.) • Often, we will use our sample data to estimate unknown population parameters (Ex. We can use the sample mean, $\bar{Y}$, to estimate the population mean, μ) • The construction of frequency distributions is usually the first step in summarizing data

  3. Hypericum cumulicola: • Small, short-lived perennial herb • Narrowly endemic and endangered • Flowers are small and bisexual

  4. Histogram of plant height (1995)

  5. Measures of location • It is useful to identify a “typical value” to summarize our observations (i.e., an “average”) • Examples include: • Mean • Median • Mode

  6. The Arithmetic Mean The arithmetic mean (or simply the mean) of a list of numbers is the sum of all the observations ($Y_i$) in the list divided by the number of observations ($n$): $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$

  7. The Arithmetic Mean • Remember the formula for the expected value of a discrete random variable? • Since we assume, for our sample, that the $Y_i$ are the values of a random variable and that $p_i = 1/n$ for all $Y_i$, we get: $E(Y) = \sum_{i=1}^{n} Y_i p_i = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}$
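The two views of the mean on the slides above (sum divided by n, and expected value with equal weights 1/n) can be checked against each other with a minimal sketch. The height values are hypothetical and are not taken from the Hypericum cumulicola data; the function names are illustrative only.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n


def expected_value(values, probs):
    """Expected value of a discrete random variable: sum of Y_i * p_i."""
    return sum(y * p for y, p in zip(values, probs))


# Hypothetical plant heights in cm (illustrative only).
heights = [12.0, 15.5, 9.0, 20.5, 18.0]

mean_height = arithmetic_mean(heights)

# Giving every observation the same probability p_i = 1/n
# reproduces the arithmetic mean (up to floating-point rounding).
n = len(heights)
ev = expected_value(heights, [1 / n] * n)

print(mean_height, ev)
```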

  8. The Arithmetic Mean • The arithmetic mean of the observations in our sample ($\bar{Y}$) is an unbiased estimator of the population mean (μ) if 3 conditions are met: • Observations are made on randomly selected individuals • Observations in the sample are independent • Observations are drawn from a larger population that is distributed as a normal random variable

  9. The Law of Large Numbers • As the sample size n increases, the arithmetic mean of Yi approaches the expected value of Y
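A small simulation can illustrate the Law of Large Numbers. This is a sketch under stated assumptions: the random variable is a fair six-sided die (expected value 3.5), chosen only because its expected value is known exactly, and the seed and sample size are arbitrary.

```python
import random


def running_means(draws):
    """Return the arithmetic mean of the first 1, 2, ..., n draws."""
    total = 0.0
    means = []
    for i, y in enumerate(draws, start=1):
        total += y
        means.append(total / i)
    return means


# Assumed example: a fair die, for which E[Y] = 3.5.
rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [rng.randint(1, 6) for _ in range(100_000)]
means = running_means(draws)

# The running mean wanders for small n and settles near 3.5 as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, means[n - 1])
```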

  10. The Median • The value of a set of ordered observations that has an equal number of observations above and below it.

  11. The Median • Estimation: • For an odd number of observations, the median is the middle observation of the set. • Ex. Median of {1, 2, 3, 4, 5} = 3 • For an even number of observations, the median is the average of the two middle observations of the set. • Ex. Median of {1, 2, 3, 4, 5, 6} = (3+4)/2 = 3.5
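The odd and even cases above can be written as one short function. This is a minimal sketch, not a replacement for a library routine such as `statistics.median`; it reuses the two examples from the slide.

```python
def median(values):
    """Middle value of the ordered observations.

    For an even number of observations, return the average of the
    two middle observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5]))     # 3
print(median([1, 2, 3, 4, 5, 6]))  # 3.5
```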

  12. The sample mean and the median height of Hypericum cumulicola (adults only). The normal distribution with the observed sample mean and variance.

  13. The Mode • The value of the observations that occurs most frequently in the sample. • This will be the peak of the frequency distribution in a histogram
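One way to find the mode is to count how often each value occurs and keep the most frequent one(s). The sketch below returns every tied value, so a bimodal sample yields two modes. The data are hypothetical; for continuous measurements such as plant height, values would normally be grouped into histogram bins first, which this sketch does not do.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most frequently, in sorted order."""
    if not values:
        raise ValueError("the mode of an empty sample is undefined")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))           # [2]
print(modes([2, 3, 3, 5, 7, 7, 8]))  # [3, 7], a bimodal sample
```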

  14. The distribution of height of Hypericum cumulicola is bimodal. Could you suggest why?

  15. Plotting seedlings and adults separately

  16. Final Comments on Measures of Location • When the underlying distribution is symmetrical (or nearly so), the mean, median, and mode are all similar in value, BUT… • …when there are extreme observations, the median or mode may better describe the location of the data
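The effect of an extreme observation on the mean and the median can be seen in a small example. The numbers are made up for illustration; only the standard-library `statistics` module is used.

```python
import statistics

# Hypothetical symmetric sample: mean and median coincide.
symmetric = [10.0, 11.0, 12.0, 13.0, 14.0]

# Same sample with the largest value replaced by an extreme observation.
with_extreme = [10.0, 11.0, 12.0, 13.0, 104.0]

sym_mean = statistics.mean(symmetric)       # 12.0
sym_median = statistics.median(symmetric)   # 12.0

ext_mean = statistics.mean(with_extreme)      # 30.0, pulled toward the extreme value
ext_median = statistics.median(with_extreme)  # 12.0, unchanged

print(sym_mean, sym_median, ext_mean, ext_median)
```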

  17. Measures of variability • It is never sufficient to just state the mean or other measure of location of our data! • Because there is variability in nature, variability due to our sampling, etc., we also need to estimate the spread of our observations around the average value • Examples include: The range, the variance, and the standard deviation

  18. The sample variance An individual value minus the sample mean, $(Y_i - \bar{Y})$, is called a deviation from the mean. The sum of the squared deviations, $SS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, is called the sum of squares (SS). We divide SS by one less than the sample size to get the sample variance, $s^2 = \frac{SS}{n-1}$, which is an unbiased estimator of the population variance ($\sigma^2$).

  19. The sample standard deviation The units in which the variance is expressed are (original units)$^2$, which is conceptually awkward. To get around this, the sample variance is converted to the sample standard deviation ($s$) by simply taking the square root: $s = \sqrt{s^2}$
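The steps from deviations to sum of squares to sample variance and standard deviation can be sketched as follows. The data are hypothetical and chosen so the arithmetic is easy to follow by hand (mean 5, sum of squares 32, n = 8).

```python
import math


def sum_of_squares(values):
    """Sum of squared deviations from the sample mean (SS)."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)


def sample_variance(values):
    """SS divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    return sum_of_squares(values) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance, in the original units."""
    return math.sqrt(sample_variance(values))


# Hypothetical measurements (illustrative only).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

ss = sum_of_squares(data)    # 32.0
var = sample_variance(data)  # 32 / 7
sd = sample_sd(data)         # sqrt(32 / 7)

print(ss, var, sd)
```

The standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and can serve as a cross-check.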

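  20. Mean ± one standard deviation: for a normal distribution, 68.26 % of the observations fall within one standard deviation of the mean, with 15.87 % in each tail.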
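The percentages on this slide can be recovered from the cumulative distribution function of the standard normal distribution. This is a quick check using `statistics.NormalDist` (Python 3.8 or later); the small difference from the slide's 68.26 % is rounding.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution

# Probability of falling within one standard deviation of the mean.
within_one_sd = z.cdf(1.0) - z.cdf(-1.0)

# Probability in the lower tail; the upper tail is equal by symmetry.
lower_tail = z.cdf(-1.0)

print(f"within one SD: {within_one_sd:.4%}")
print(f"each tail:     {lower_tail:.4%}")
```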

  21. The Standard Error of the Mean • Remember the Central Limit Theorem: if the $Y_i$ are independent random observations and the sample size is “reasonably large”, the sample mean ($\bar{Y}$) is approximately normally distributed with mean $E[Y]$ and variance $\sigma^2(Y)/n$ • Thus, we can calculate the standard error of the mean as follows: $s_{\bar{Y}} = \frac{s}{\sqrt{n}}$
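The standard error of the mean is the sample standard deviation divided by the square root of the sample size. Below is a minimal sketch using the standard library; the data are hypothetical, and `standard_error` is an illustrative helper name rather than a library function.

```python
import math
import statistics


def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    n = len(values)
    if n < 2:
        raise ValueError("the standard error needs at least two observations")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical measurements (illustrative only): s^2 = 32/7 and n = 8,
# so the standard error is sqrt((32/7) / 8) = sqrt(4/7).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

se = standard_error(data)
print(se)
```

Because n appears under a square root, quadrupling the sample size only halves the standard error.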
