Chapter 3 – Section 2: Measures of Dispersion
Chapter 3 – Section 2
• Comparing two sets of data
• The measures of central tendency (mean, median, mode) describe the “average” or “typical” value of a data set, so they capture how two sets of data differ in their typical values
• The measures of dispersion in this section describe how far “spread out” the data values are, so they capture how two sets of data differ in their spread
• The range of a variable is the largest data value minus the smallest data value
• Compute the range of 6, 1, 2, 6, 11, 7, 3, 3
• The largest value is 11
• The smallest value is 1
• Subtracting the two, 11 – 1 = 10, so the range is 10
• The range only uses two values in the data set: the largest value and the smallest value
• The range is not resistant
• Suppose we made a mistake and recorded the first value, 6, as 6000 (so the data begin 6000, 1, 2 instead of 6, 1, 2)
• The range is now (6000 – 1) = 5999
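The range calculation and its sensitivity to a single bad value can be sketched in a few lines of Python. This is only an illustration: the function name `value_range` and the variable names are assumptions made for this sketch, and the recording error is simulated by replacing the first value, 6, with 6000.

```python
def value_range(values):
    """Return the largest data value minus the smallest data value."""
    return max(values) - min(values)


data = [6, 1, 2, 6, 11, 7, 3, 3]
print(value_range(data))  # 10

# Simulate the recording error: the first value, 6, is entered as 6000.
bad_data = [6000] + data[1:]
print(value_range(bad_data))  # 5999
```

One wrong entry moves the range from 10 to 5999, which is what "not resistant" means here.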
• The variance is based on the deviations from the mean
• $(x_i - \mu)$ for populations
• $(x_i - \bar{x})$ for samples
• To treat positive differences and negative differences alike, we square the deviations
• $(x_i - \mu)^2$ for populations
• $(x_i - \bar{x})^2$ for samples
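One way to see why the deviations are squared is to look at what happens when they are simply added up: the positive and negative deviations cancel. The sketch below uses the values 6, 1, 2, 11 purely as an illustration; the variable names are assumptions of this example.

```python
data = [6, 1, 2, 11]  # illustrative values
mean = sum(data) / len(data)

# Raw deviations from the mean: some are positive, some negative.
deviations = [x - mean for x in data]
print(deviations)       # [1.0, -4.0, -3.0, 6.0]
print(sum(deviations))  # 0.0 (the signs cancel)

# Squaring makes every deviation count as a non-negative amount.
squared = [d ** 2 for d in deviations]
print(squared)          # [1.0, 16.0, 9.0, 36.0]
```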
• The population variance of a variable is the sum of these squared deviations divided by the number in the population
• The population variance is represented by $\sigma^2$, where $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
• Note: For accuracy, use as many decimal places as allowed by your calculator
• Compute the population variance of 6, 1, 2, 11
• Compute the population mean first: $\mu = (6 + 1 + 2 + 11) / 4 = 5$
• Now compute the squared deviations: $(1 - 5)^2 = 16$, $(2 - 5)^2 = 9$, $(6 - 5)^2 = 1$, $(11 - 5)^2 = 36$
• Average the squared deviations: $(16 + 9 + 1 + 36) / 4 = 15.5$
• The population variance $\sigma^2$ is 15.5
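The same steps can be checked in Python. The helper `population_variance` below is a hypothetical name written for this sketch; `statistics.pvariance` from the Python standard library is shown alongside it as a cross-check.

```python
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


population = [6, 1, 2, 11]
print(population_variance(population))   # 15.5
print(statistics.pvariance(population))  # 15.5
```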
• The sample variance of a variable is the sum of these squared deviations divided by one less than the number in the sample
• The sample variance is represented by $s^2$, where $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
• We say that this statistic has $n - 1$ degrees of freedom
• Compute the sample variance of 6, 1, 2, 11
• Compute the sample mean first: $\bar{x} = (6 + 1 + 2 + 11) / 4 = 5$
• Now compute the squared deviations: $(1 - 5)^2 = 16$, $(2 - 5)^2 = 9$, $(6 - 5)^2 = 1$, $(11 - 5)^2 = 36$
• Divide the sum of the squared deviations by $n - 1 = 3$: $(16 + 9 + 1 + 36) / 3 \approx 20.7$
• The sample variance $s^2$ is approximately 20.7
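A matching sketch for the sample case follows. The only change from a population calculation is the denominator, $n - 1$ instead of $N$. The helper name `sample_variance` is an assumption of this example; `statistics.variance` is the standard-library function that divides by $n - 1$.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


sample = [6, 1, 2, 11]
print(round(sample_variance(sample), 1))      # 20.7
print(round(statistics.variance(sample), 1))  # 20.7
```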
• Why are the population variance (15.5) and the sample variance (20.7) different for the same set of numbers?
• In the first case, { 6, 1, 2, 11 } was the entire population (divide by $N$)
• In the second case, { 6, 1, 2, 11 } was just a sample from the population (divide by $n - 1$)
• These are two different situations
• Why do we use different formulas?
• The reason is that the sample mean is only an estimate of the population mean, so deviations measured from the sample mean tend to come out a little too small
• If we used $n$ in the denominator for the sample variance calculation, we would get a “biased” result
• Bias here means that we would tend to underestimate the true variance
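The bias can be made concrete with a small enumeration. The setup is an assumption made for illustration and is not part of the slides: treat { 6, 1, 2, 11 } as the whole population, list every ordered sample of size 2 drawn with replacement, and average the two candidate variance formulas over all of those samples.

```python
from itertools import product

population = [6, 1, 2, 11]
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)


def squared_deviation_sum(sample):
    """Sum of squared deviations from the sample's own mean."""
    x_bar = sum(sample) / len(sample)
    return sum((x - x_bar) ** 2 for x in sample)


n = 2
# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_divide_by_n = sum(squared_deviation_sum(s) / n for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(
    squared_deviation_sum(s) / (n - 1) for s in samples
) / len(samples)

print(sigma2)                   # 15.5 (true population variance)
print(avg_divide_by_n)          # 7.75 (too small on average)
print(avg_divide_by_n_minus_1)  # 15.5 (matches on average)
```

Under these assumptions, dividing by $n$ gives an average of 7.75, well below the true variance of 15.5, while dividing by $n - 1$ recovers 15.5 on average.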
• The standard deviation is the square root of the variance
• The population standard deviation is the square root of the population variance ($\sigma^2$) and is represented by $\sigma$
• The sample standard deviation is the square root of the sample variance ($s^2$) and is represented by $s$
• If the population is { 6, 1, 2, 11 }, the population variance is $\sigma^2 = 15.5$ and the population standard deviation is $\sigma = \sqrt{15.5} \approx 3.9$
• If the sample is { 6, 1, 2, 11 }, the sample variance is $s^2 \approx 20.7$ and the sample standard deviation is $s = \sqrt{20.7} \approx 4.5$
• The population standard deviation and the sample standard deviation apply in different situations
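Both standard deviations can be computed directly with the Python standard library. This is a quick check rather than a prescribed method: `statistics.pstdev` divides by $N$ and `statistics.stdev` divides by $n - 1$ before taking the square root.

```python
import statistics

data = [6, 1, 2, 11]

sigma = statistics.pstdev(data)  # treats data as the whole population
s = statistics.stdev(data)       # treats data as a sample

print(round(sigma, 2))  # 3.94
print(round(s, 2))      # 4.55
```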
• The standard deviation is very useful for estimating probabilities
• The empirical rule: if the distribution is roughly bell shaped, then
• Approximately 68% of the data will lie within 1 standard deviation of the mean
• Approximately 95% of the data will lie within 2 standard deviations of the mean
• Approximately 99.7% of the data (i.e. almost all) will lie within 3 standard deviations of the mean
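The empirical rule can be checked informally by simulation. The sketch below is an illustration under stated assumptions: it draws 100,000 values from a normal (bell-shaped) distribution with Python's `random.gauss`, using an arbitrary fixed seed, and counts the fraction of values within 1, 2, and 3 standard deviations of the mean. The helper name `fraction_within` is hypothetical.

```python
import math
import random

random.seed(0)  # arbitrary fixed seed so the run is repeatable
data = [random.gauss(0, 1) for _ in range(100_000)]

mean = sum(data) / len(data)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))


def fraction_within(values, center, spread, k):
    """Fraction of values lying within k * spread of center."""
    inside = sum(1 for x in values if abs(x - center) <= k * spread)
    return inside / len(values)


for k in (1, 2, 3):
    print(k, round(fraction_within(data, mean, sd, k), 3))
```

The printed fractions should come out close to 0.68, 0.95, and 0.997, though the exact digits depend on the random draw.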
• For a variable with mean 17 and standard deviation 3.4
• Approximately 68% of the values will lie between $17 - 3.4$ and $17 + 3.4$, i.e. 13.6 and 20.4
• Approximately 95% of the values will lie between $17 - 2 \times 3.4$ and $17 + 2 \times 3.4$, i.e. 10.2 and 23.8
• Approximately 99.7% of the values will lie between $17 - 3 \times 3.4$ and $17 + 3 \times 3.4$, i.e. 6.8 and 27.2
• A value of 2.1 and a value of 33.2 would both be very unusual
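The interval arithmetic above follows one pattern, mean ± k × standard deviation, which a short function can capture. The function name `empirical_rule_intervals` and the choice to round to one decimal place are assumptions of this sketch.

```python
def empirical_rule_intervals(mean, sd):
    """Return (low, high) bounds for 1, 2, and 3 standard deviations."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (round(mean - k * sd, 1), round(mean + k * sd, 1))
        for k, label in coverage.items()
    }


intervals = empirical_rule_intervals(17, 3.4)
for label, (low, high) in intervals.items():
    print(label, low, high)

# Values outside the 99.7% interval would be very unusual.
low, high = intervals["99.7%"]
for value in (2.1, 33.2):
    print(value, "unusual" if value < low or value > high else "not unusual")
```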
• Chebyshev’s inequality gives a lower bound on the percentage of observations that lie within $k$ standard deviations of the mean (where $k > 1$)
• This lower bound is a conservative estimate: the actual percentage for any variable cannot be lower than this number
• Therefore the actual percentage must be this value or higher
• Chebyshev’s inequality
• For any data set, at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\%$ of the observations will lie within $k$ standard deviations of the mean, where $k$ is any number greater than 1
• How much of the data lies within 1.5 standard deviations of the mean?
• From Chebyshev’s inequality, $\left(1 - \frac{1}{1.5^2}\right) \cdot 100\% = \left(1 - \frac{1}{2.25}\right) \cdot 100\% \approx 55.6\%$, so that at least 55.6% of the data will lie within 1.5 standard deviations of the mean
• If the mean is equal to 20 and the standard deviation is equal to 4, how much of the data lies between 14 and 26?
• 14 and 26 are each 1.5 standard deviations from 20 (since $6 / 4 = 1.5$), so by Chebyshev’s inequality at least 55.6% of the data will lie between 14 and 26
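Both Chebyshev examples reduce to evaluating $\left(1 - \frac{1}{k^2}\right) \cdot 100\%$ for a given $k$. A minimal sketch follows; the function name `chebyshev_lower_bound` is hypothetical, and the second part assumes an interval that is symmetric about the mean.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return (1 - 1 / k ** 2) * 100


# Within 1.5 standard deviations of the mean.
print(round(chebyshev_lower_bound(1.5), 1))  # 55.6

# Mean 20, standard deviation 4, interval from 14 to 26.
mean, sd = 20, 4
low, high = 14, 26
k = min(mean - low, high - mean) / sd  # 6 / 4 = 1.5
print(round(chebyshev_lower_bound(k), 1))  # 55.6
```

For an interval that is not symmetric about the mean, taking the smaller of the two distances gives a bound that still holds for the wider interval.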
Summary: Chapter 3 – Section 2
• Range: the maximum minus the minimum; not a resistant measurement
• Variance and standard deviation: measure deviations from the mean; not resistant measurements
• Empirical rule: about 68% of the data is within 1 standard deviation of the mean, and about 95% of the data is within 2 standard deviations