Chapter 12, Part 2 STA 291 Summer I 2011
Mean and Standard Deviation • The five-number summary is not the most common way to describe a distribution numerically. • The most common way is to use the mean to measure center and the standard deviation to measure spread.
Mean • The mean of a set of observations is their average. • So to find the mean of n observations, add the values up and divide by n. • Symbolically, the mean is written as x̄ (pronounced “x bar”).
Example • The following are the number of jobs a sample of six graduating students applied for: 17 15 23 7 9 13 Find the mean.
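The computation for this example can be sketched in a few lines of Python. This is only an illustration of "add the values up and divide by n"; the variable names are not from the course materials.

```python
# Number of jobs each of the six graduating students applied for.
jobs = [17, 15, 23, 7, 9, 13]

# Mean: add the values up and divide by the number of observations n.
n = len(jobs)
mean = sum(jobs) / n

print(mean)  # 14.0
```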
Standard Deviation • If you are using the mean to describe the center of a distribution, you should use the standard deviation to describe spread. • The standard deviation measures the average distance of the observations from the mean. • Symbolically, standard deviation is denoted by s.
Standard Deviation (cont.) • In order to find the standard deviation of n observations: • Find the distance of each observation from the mean, and square each of these distances. • Average the squared distances by dividing their sum by n − 1. This average squared distance is called the variance. • The standard deviation is the square root of the variance.
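The steps above can be sketched in Python as follows. The function name `sample_std` is a hypothetical helper, the data are the six job counts from the earlier example, and the divisor n − 1 is assumed, as is usual for the sample standard deviation s. The sketch needs at least two observations.

```python
import math

def sample_std(values):
    """Sample standard deviation, following the three steps on the slide."""
    n = len(values)
    mean = sum(values) / n
    # Step 1: distance of each observation from the mean, squared.
    squared_distances = [(x - mean) ** 2 for x in values]
    # Step 2: the variance is the sum of squared distances divided by n - 1.
    variance = sum(squared_distances) / (n - 1)
    # Step 3: the standard deviation is the square root of the variance.
    return math.sqrt(variance)

jobs = [17, 15, 23, 7, 9, 13]
s = sample_std(jobs)
print(round(s, 3))  # 5.762
```

For these data the mean is 14, the squared distances sum to 166, and the variance is 166 / 5 = 33.2.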
Standard Deviation Properties • Standard deviation measures spread about the mean. • We use s to describe spread when we use x̄ to describe center. • s cannot be negative. • s = 0 only when there is no spread. • As the observations become more spread out about the mean, s increases.
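These properties can be checked on a few small data sets. The three lists below are made-up values chosen only for illustration (all have mean 5), and `statistics.stdev` from the Python standard library computes the sample standard deviation.

```python
import statistics

# Made-up data sets, all with mean 5, ordered from no spread to wide spread.
no_spread = [5, 5, 5, 5]
tight = [4, 5, 5, 6]
wide = [1, 5, 5, 9]

s_none = statistics.stdev(no_spread)
s_tight = statistics.stdev(tight)
s_wide = statistics.stdev(wide)

print(s_none)            # 0.0, because every observation equals the mean
print(s_tight < s_wide)  # True, s grows as the values spread out
```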
Relating the Two Methods • Which method is preferable: the five-number summary, or the mean and standard deviation? • The preferable method is largely determined by the shape of the distribution. • If we have a symmetric distribution, reporting the mean and standard deviation is preferable. • If we have a skewed distribution, reporting the five-number summary is preferable.
Relating the Two Methods (cont.) Consider the following data sets: • Data Set 1: 10 64 71 73 77 85 89 92 • Data Set 2: 60 64 71 73 77 85 89 92 • mean of Data Set 1 = 70.125 • mean of Data Set 2 = 76.375
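A short Python sketch makes the comparison concrete. The two data sets differ only in their smallest value (10 versus 60). The medians are not listed on the slide; they are computed here as an added illustration.

```python
import statistics

data_set_1 = [10, 64, 71, 73, 77, 85, 89, 92]
data_set_2 = [60, 64, 71, 73, 77, 85, 89, 92]

mean_1 = statistics.mean(data_set_1)
mean_2 = statistics.mean(data_set_2)
median_1 = statistics.median(data_set_1)
median_2 = statistics.median(data_set_2)

print(mean_1, median_1)  # 70.125 75.0
print(mean_2, median_2)  # 76.375 75.0
```

Changing the single low value from 60 to 10 lowers the mean by 6.25 but leaves the median unchanged.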
Relating the Two Methods (cont.) • If a distribution is symmetric, the mean and median will be close to each other. • If you have a symmetric distribution, it is preferable to use the mean, since the mean uses all the data points. • If the distribution is skewed, the mean is pulled toward the long tail. In this situation, the median is a better measure of center, since it is less affected by skewness.
Relating the Two Methods (cont.) • In the computation of the mean, all of the values are weighted equally. • As a result, the mean is strongly influenced by extreme observations. • On the other hand, the median is simply the middle value, so extreme observations have a much smaller effect on it. • Thus, if a data set has outliers, the median is often a better measure of the center of the distribution.
Mean, Median, and Shape • If the mean and median are close together, the distribution is roughly symmetric. • If the mean is noticeably larger than the median, the distribution is right-skewed. • If the mean is noticeably smaller than the median, the distribution is left-skewed.
Examples • Suppose we are told that the mean score for an exam was 79.2 and the median score was 83.5. What is the shape of the distribution of exam scores? • Now, suppose the mean score was 81.2 and the median score was 81.1. What is the shape of the distribution of exam scores?
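The rule of thumb relating mean, median, and shape can be sketched as a small Python function. The function name `describe_shape` and the `tolerance` parameter are hypothetical: the slides do not say how close "close together" is, so the cutoff of 1 point is an assumption made only for this illustration.

```python
def describe_shape(mean, median, tolerance=1.0):
    """Guess the shape of a distribution from its mean and median.

    `tolerance` is an assumed cutoff for "close together"; it is not
    part of the course material and should be chosen for the data at hand.
    """
    if abs(mean - median) <= tolerance:
        return "roughly symmetric"
    if mean > median:
        return "right-skewed"
    return "left-skewed"

first_exam = describe_shape(mean=79.2, median=83.5)
second_exam = describe_shape(mean=81.2, median=81.1)

print(first_exam)   # left-skewed
print(second_exam)  # roughly symmetric
```

In the first case the mean sits well below the median, which suggests a few low scores pulling it down; in the second the two values nearly coincide.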