12.3 – Measures of Dispersion

Measuring dispersion, the amount of spread in a data set, is another way to analyze data. A main use of dispersion is to compare the amounts of spread in two (or more) data sets. A common technique in inferential statistics is to draw comparisons between populations by analyzing samples that come from those populations. Two of the most common measures of dispersion are the range and the standard deviation.

Range
For any set of data, the range of the set is given by the following formula:
Range = (greatest value in set) – (least value in set).

Example: Range
Two sets, A and B, have the same mean and median (7). Find the range of each set.
Range of Set A: 13 – 1 = 12
Range of Set B: 9 – 5 = 4

Standard Deviation
One of the most useful measures of dispersion is the standard deviation. It is based on deviations from the mean of the data.

Find the deviations from the mean for all data values of the sample 1, 2, 8, 11, 13. The mean is 7. To find each deviation, subtract the mean from each data value.

Data value:    1    2    8   11   13
Deviation:    –6   –5    1    4    6

The sum of the deviations is always equal to zero.

Calculating the Sample Standard Deviation
The sample standard deviation is the square root of the variance. The variance is found by summing the squares of the deviations and dividing that sum by n – 1 (since the data are a sample rather than a population). The sample standard deviation is denoted by the letter s. The standard deviation of a population is denoted by σ.

To calculate the sample standard deviation:
1. Calculate the mean of the numbers.
2. Find the deviations from the mean.
3. Square each deviation.
4. Sum the squared deviations.
5. Divide the sum in Step 4 by n – 1.
6. Take the square root of the quotient in Step 5.
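The six steps above can be sketched in Python as follows. This is a minimal illustration rather than part of the original slides: the function name `sample_standard_deviation` is a hypothetical choice, and the data are the sample 1, 2, 8, 11, 13 used in this section.

```python
import math

def sample_standard_deviation(values):
    """Sample standard deviation, following the six steps one at a time.

    Assumes at least two data values, since the sum is divided by n - 1.
    """
    n = len(values)
    mean = sum(values) / n                       # Step 1: mean
    deviations = [x - mean for x in values]      # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]       # Step 3: square each deviation
    total = sum(squared)                         # Step 4: sum the squares
    variance = total / (n - 1)                   # Step 5: divide by n - 1
    return math.sqrt(variance)                   # Step 6: square root

sample = [1, 2, 8, 11, 13]
print(round(sample_standard_deviation(sample), 2))  # 5.34
```

Python's standard library offers `statistics.stdev`, which also uses the n – 1 divisor; the explicit version is shown only to mirror the steps.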

Example: Find the standard deviation of the sample set {1, 2, 8, 11, 13}.
The mean is 7.

Data value    Deviation    (Deviation)²
 1            –6           36
 2            –5           25
 8             1            1
11             4           16
13             6           36

Sum of the (Deviations)² = 36 + 25 + 1 + 16 + 36 = 114

The sum of the squared deviations is 114.
Divide 114 by n – 1 with n = 5: 114 / (5 – 1) = 28.5
Take the square root of 28.5: √28.5 ≈ 5.34
The sample standard deviation of the data is approximately 5.34.

Example: Interpreting Measures
Two companies, A and B, sell small packs of sugar for coffee. The mean and standard deviation are given for a sample from each company. Which company consistently provides more sugar in its packs? Which company fills its packs more consistently?

Which company consistently provides more sugar in its packs?
The sample mean for Company A is greater than the sample mean for Company B. The inference can be made that Company A provides more sugar in its packs.

Which company fills its packs more consistently?
The standard deviation for Company B is less than the standard deviation for Company A. The inference can be made that Company B fills its packs closer to its mean than Company A does, that is, more consistently.

Chebyshev’s Theorem
For any set of numbers, regardless of how they are distributed, the fraction of them that lie within k standard deviations of their mean (where k > 1) is at least
1 – 1/k².

What is the minimum percentage of the items in a data set that lie within 2 and within 3 standard deviations of the mean?
Within 2 standard deviations: 1 – 1/2² = 3/4 = 75%
Within 3 standard deviations: 1 – 1/3² = 8/9 ≈ 88.9%
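As a small sketch of the bound in Chebyshev’s theorem, the following Python function evaluates 1 – 1/k² for a given k. The function name `chebyshev_lower_bound` is hypothetical and chosen only for this illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of the data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the theorem is stated here for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, f"{chebyshev_lower_bound(k):.1%}")
# 2 75.0%
# 3 88.9%
```

These are guaranteed minimums for any distribution; a particular data set may have a larger share of its values within the same distance of the mean.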

Coefficient of Variation
The coefficient of variation expresses the standard deviation as a percentage of the mean. It is not strictly a measure of dispersion, as it combines central tendency and dispersion. For any set of data, the coefficient of variation is given by
(s / x̄) · 100% for a sample, or
(σ / μ) · 100% for a population.

Example: Comparing Samples
Compare the dispersions in the two samples A and B.
A: 12, 13, 16, 18, 18, 20
B: 125, 131, 144, 158, 168, 193
Sample B has the larger dispersion (standard deviation), but sample A has the larger relative dispersion (coefficient of variation).
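The comparison can be checked with a short Python sketch. It relies on `statistics.stdev` and `statistics.mean` from the standard library; the helper name `coefficient_of_variation` is hypothetical, and the printed values are computed here rather than taken from the original slide.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

sample_a = [12, 13, 16, 18, 18, 20]
sample_b = [125, 131, 144, 158, 168, 193]

for name, data in (("A", sample_a), ("B", sample_b)):
    s = statistics.stdev(data)
    v = coefficient_of_variation(data)
    print(f"{name}: s = {s:.2f}, coefficient of variation = {v:.1f}%")
# A: s = 3.13, coefficient of variation = 19.3%
# B: s = 25.29, coefficient of variation = 16.5%
```

Sample B has the larger standard deviation, while sample A has the larger coefficient of variation, which matches the conclusion of the example.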

12.4 – Measures of Position

In some cases, the analysis of certain individual items in the data set is of more interest than the entire set. At times it is necessary to measure how an item fits into the data, how it compares to other items of the data, or even how it compares to an item in another data set. Measures of position are common ways of making such comparisons.

The z-Score
The z-score measures how many standard deviations a single data item is from the mean:
z = (data item – mean) / (standard deviation).

Example: Comparing with z-Scores
Two students, who take different history classes, had exams on the same day. Jen’s score was 83 while Joy’s score was 78. Which student did relatively better, given the class data shown below?

Class          Mean    Standard deviation
Jen’s class    78      4
Joy’s class    70      5

Joy’s z-score: (78 – 70) / 5 = 1.6
Jen’s z-score: (83 – 78) / 4 = 1.25
Joy’s z-score is higher, so she was positioned relatively higher within her class than Jen was within hers.
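The same comparison as a minimal Python sketch. The function name `z_score` is a hypothetical choice; the class means and standard deviations are the ones used in the calculation above.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

joy = z_score(78, mean=70, std_dev=5)
jen = z_score(83, mean=78, std_dev=4)
print(joy, jen)  # 1.6 1.25
print("Joy" if joy > jen else "Jen", "did relatively better")
```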

Percentiles
A percentile measures the position of a single data item based on the percentage of data items below that single data item. Standardized tests taken by large numbers of students often convert raw scores to percentile scores. If approximately n percent of the items in a distribution are less than the number x, then x is the nth percentile of the distribution, denoted Pn.

Example: Percentiles
The following are test scores (out of 100) for a particular math class.
44 56 58 62 64 64 70 72 72 72
74 74 75 78 78 79 80 82 82 84
86 87 88 90 92 95 96 96 98 100
Find the fortieth percentile.
40% = 0.4, and 0.4(30) = 12.
The average of the 12th and 13th items represents the 40th percentile (P40): (74 + 75) / 2 = 74.5.
40% of the scores were below 74.5.
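The procedure used in this example can be sketched in Python. The rule coded here is inferred from the worked examples in this section: multiply the number of items by the percentage; if the result is a whole number, average that item and the next one; otherwise round up and take that item. The function name `percentile` is hypothetical, and other conventions exist (statistical libraries often interpolate), so results may differ from library output.

```python
def percentile(sorted_values, percent):
    """Percentile by the positional rule used in the worked examples.

    percent is a whole-number percentage strictly between 0 and 100.
    Integer arithmetic avoids floating-point surprises in the position.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100, exclusive")
    n = len(sorted_values)
    position, remainder = divmod(n * percent, 100)
    if remainder == 0:
        # Whole-number position: average that item and the next one
        # (positions are 1-based, list indexes are 0-based).
        return (sorted_values[position - 1] + sorted_values[position]) / 2
    # Fractional position: round up to the next item.
    return sorted_values[position]

scores = [44, 56, 58, 62, 64, 64, 70, 72, 72, 72,
          74, 74, 75, 78, 78, 79, 80, 82, 82, 84,
          86, 87, 88, 90, 92, 95, 96, 96, 98, 100]

print(percentile(scores, 40))  # 74.5
```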

Other Percentiles: Deciles and Quartiles
Deciles are the nine values (denoted D1, D2, …, D9) along the scale that divide a data set into ten (approximately) equal parts. They correspond to 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%.
Quartiles are the three values (Q1, Q2, Q3) that divide the data set into four (approximately) equal parts. They correspond to 25%, 50%, and 75%.

Example: Deciles
Using the 30 test scores from the percentile example above, find the sixth decile.
Sixth decile = 60% = 0.6, and 0.6(30) = 18.
The average of the 18th and 19th items represents the 6th decile (D6): (82 + 82) / 2 = 82.
60% of the scores were at or below 82.

Quartiles
For any set of data (ranked in order from least to greatest):
The second quartile, Q2 (50%), is the median.
The first quartile, Q1 (25%), is the median of the items below Q2.
The third quartile, Q3 (75%), is the median of the items above Q2.

Example: Quartiles
Using the same 30 test scores, find the three quartiles.

Q1 = 25% = 0.25, and 0.25(30) = 7.5.
The 8th item represents the 1st quartile (Q1), so Q1 = 72.
25% of the scores were below 72.

Q2 = 50% = 0.5 (the median), and 0.5(30) = 15.
The average of the 15th and 16th items represents the 2nd quartile (Q2), or the median: (78 + 79) / 2 = 78.5.
50% of the scores were below 78.5.

Q3 = 75% = 0.75, and 0.75(30) = 22.5.
The 23rd item represents the 3rd quartile (Q3), so Q3 = 88.
75% of the scores were below 88.
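The quartile rule stated earlier (Q2 is the median, Q1 is the median of the items below Q2, Q3 is the median of the items above Q2) can also be sketched directly. The function name `quartiles` is hypothetical, and "items below Q2" is interpreted by position in the ranked list, which is an assumption; with this data it gives the same values as the positional calculation above.

```python
import statistics

def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # items positioned below the median
    upper = data[(n + 1) // 2:]    # items positioned above the median
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))

scores = [44, 56, 58, 62, 64, 64, 70, 72, 72, 72,
          74, 74, 75, 78, 78, 79, 80, 82, 82, 84,
          86, 87, 88, 90, 92, 95, 96, 96, 98, 100]

print(quartiles(scores))  # (72, 78.5, 88)
```

Together with the lowest and largest values, these three numbers form the five measures displayed in a box plot.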

Box Plots
A box plot, or box-and-whisker plot, is a visual display of five statistical measures: the lowest value, the first quartile, the median, the third quartile, and the largest value.

Box Plots Example: For the 30 test scores used in the preceding examples, the five measures are:
Lowest = 44
Q1 (25%) = 72
Q2 (50%, the median) = 78.5
Q3 (75%) = 88
Largest = 100

12.5 – The Normal Distribution

Discrete and Continuous Random Variables
Discrete random variable: a random variable that can take on only certain fixed values. Examples:
The number of even values of a single die.
The number of heads in three tosses of a fair coin.
Continuous random variable: a random variable whose values are not restricted to certain fixed values, so it can take on any value in an interval. Examples:
The diameter of a growing tree.
The height of third graders.

Definition and Properties of a Normal Curve
A normal curve is a symmetric, bell-shaped curve. Any continuous random variable whose graph has this characteristic shape is said to have a normal distribution. On a normal curve, the horizontal axis is labeled with the mean and the specific data values that lie whole numbers of standard deviations from it. If the horizontal axis is labeled using the number of standard deviations from the mean, rather than the specific data values, then the curve is the standard normal curve.

[Figure: a Normal Curve for a set of sample statistics beside the corresponding Standard Normal Curve, whose horizontal axis is labeled –2, –1, 0, 1, 2]

Normal Curves
[Figure: four normal curves labeled S, A, B, and C on a common horizontal axis]
S is standard, with mean = 0, standard deviation = 1
A has mean < 0, standard deviation = 1
B has mean = 0, standard deviation < 1
C has mean > 0, standard deviation > 1

Properties of Normal Curves
The graph of a normal curve is bell-shaped and symmetric about a vertical line through its center.
The mean, median, and mode of a normal curve are all equal and occur at the center of the distribution.
Empirical Rule: the approximate percentage of all data lying within 1, 2, and 3 standard deviations of the mean is as follows.
Within 1 standard deviation: 68%
Within 2 standard deviations: 95%
Within 3 standard deviations: 99.7%

[Figure: Empirical Rule, with 68%, 95%, and 99.7% of the area within 1, 2, and 3 standard deviations of the mean]

Example: Applying the Empirical Rule
A sociology class of 280 students takes an exam. The distribution of their scores can be treated as normal. Find the number of scores falling within 2 standard deviations of the mean.
About 95% of all scores lie within 2 standard deviations of the mean.
(0.95)(280) = 266 scores
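The Empirical Rule percentages can be compared with the areas under the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); it is an illustration added here, not part of the original slides.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Share of the area within k standard deviations of the mean.
for k in (1, 2, 3):
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(k, f"{share:.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%

# The example above uses the rounded figure of 95%.
print(round(0.95 * 280))  # 266
```

The rule's 68%, 95%, and 99.7% are rounded versions of these areas.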

Normal Curve Areas
In a normal curve and a standard normal curve, the total area under the curve is equal to 1. The area under the curve is presented as one of the following:
• Percentage (of total items that lie in an interval),
• Probability (of a randomly chosen item lying in an interval),
• Area (under the normal curve along an interval).

A Table of Standard Normal Curve Areas
To answer questions that involve regions other than 1, 2, or 3 standard deviations, a Table of Standard Normal Curve Areas is necessary. The table shows the area under the curve between the mean and a value z standard deviations from the mean. The most common uses of the table are finding the percentage of values within a certain range of z-scores and finding the probability of a value occurring within that range. Because of the symmetry of the normal curve, the table can be used for values above the mean or below the mean.

Example: Applying the Normal Curve Table
Use the table to find the percent of all scores that lie between the mean and 1.5 standard deviations above the mean.
z = 1.5
Find 1.50 in the z column. The table entry is 0.4332.
Therefore, 43.32% of all values lie between the mean and 1.5 standard deviations above the mean.
Equivalently, there is a 0.4332 probability that a randomly selected value will lie between the mean and 1.5 standard deviations above the mean.

Use the table to find the percent of all scores that lie between the mean and 2.62 standard deviations below the mean.
z = –2.62
Find 2.62 in the z column. The table entry is 0.4956.
Therefore, 49.56% of all values lie between the mean and 2.62 standard deviations below the mean.
Equivalently, there is a 0.4956 probability that a randomly selected value will lie between the mean and 2.62 standard deviations below the mean.
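A table entry can be reproduced in Python as a check on the two lookups above. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the helper name `table_area` is hypothetical and stands for "area between the mean and z".

```python
from statistics import NormalDist

def table_area(z):
    """Area under the standard normal curve between the mean and z.

    The absolute value reflects the symmetry of the curve: the area
    is the same for z and for -z.
    """
    return abs(NormalDist().cdf(z) - 0.5)

print(round(table_area(1.5), 4))    # 0.4332
print(round(table_area(-2.62), 4))  # 0.4956
```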

Find the percent of all scores that lie between z = –1.7 and z = 2.55.
For z = –1.7, the table entry is 0.4554.
For z = 2.55, the table entry is 0.4946.
The two z-scores lie on opposite sides of the mean, so the areas are added: 0.4554 + 0.4946 = 0.95
Therefore, 95% of all values lie between 1.7 standard deviations below the mean and 2.55 standard deviations above the mean.

Find the probability that a randomly selected value will lie between z = 0.61 and z = 2.63.
For z = 0.61, the table entry is 0.2291.
For z = 2.63, the table entry is 0.4957.
Both z-scores lie on the same side of the mean, so the smaller area is subtracted from the larger: 0.4957 – 0.2291 = 0.2666
There is a 0.2666 probability that a randomly selected value will lie between 0.61 and 2.63 standard deviations above the mean.

Find the probability that a randomly selected value will lie above z = 2.14.
For z = 2.14, the table entry is 0.4838.
Half of the area under the curve is 0.5000.
0.5000 – 0.4838 = 0.0162
There is a 0.0162 probability that a randomly selected value will lie more than 2.14 standard deviations above the mean.
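The three preceding calculations (adding areas, subtracting areas, and subtracting from 0.5) all reduce to differences of cumulative areas. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper names `area_between` and `area_above` are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    return Z.cdf(z_high) - Z.cdf(z_low)

def area_above(z):
    """Area under the standard normal curve to the right of z."""
    return 1 - Z.cdf(z)

print(round(area_between(-1.7, 2.55), 4))  # 0.95
print(round(area_above(2.14), 4))          # 0.0162

# Agrees with the table-based 0.2666 up to rounding of the table entries.
print(area_between(0.61, 2.63))
```

Because table entries are rounded to four decimal places, a result built from two entries can differ from the directly computed area in the last digit.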

The volumes of soda in bottles from a small company are distributed normally with a mean of 12 ounces and a standard deviation of 0.15 ounces. If 1 bottle is randomly selected, what is the probability that it will have more than 12.33 ounces?
z = (12.33 – 12) / 0.15 = 2.2
The table entry is 0.4861.
Half of the area under the curve is 0.5000.
0.5000 – 0.4861 = 0.0139
There is a 0.0139 probability that a randomly selected bottle will contain more than 12.33 ounces.
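The soda example can be checked without converting to a z-score by hand, since `statistics.NormalDist` (Python 3.8 or later) accepts the mean and standard deviation directly. The variable names below are illustrative only.

```python
from statistics import NormalDist

volumes = NormalDist(mu=12, sigma=0.15)  # ounces

z = (12.33 - 12) / 0.15          # z-score of 12.33 ounces
p_more = 1 - volumes.cdf(12.33)  # area to the right of 12.33

print(round(z, 2), round(p_more, 4))  # 2.2 0.0139
```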

Example: Finding z-scores for Given Areas
Assuming a normal distribution, find the z-score meeting the condition that 39% of the area is to the right of z.
50% of the area lies to the right of the mean, and 39% = 0.39 of the area lies to the right of z.
The areas from the Normal Curve Table are based on the area between the mean and the z-score.
Area between the mean and the z-score = 0.50 – 0.39 = 0.11 (11%)
From the table, find the area of 0.1100 or the closest value and read the z-score.
z-score = 0.28

Assuming a normal distribution, find the z-score meeting the condition that 76% of the area is to the left of z.
50% of the area lies to the left of the mean.
The areas from the Normal Curve Table are based on the area between the mean and the z-score.
Area between the mean and the z-score = 0.76 – 0.50 = 0.26 (26%)
From the table, find the area of 0.2600 or the closest value and read the z-score.
z-score = 0.71
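Reading the table in reverse corresponds to the inverse of the cumulative area. As a hedged sketch, `NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later) takes the area to the left of z and returns z; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# 39% of the area to the right of z means 61% lies to the left.
z_right_39 = Z.inv_cdf(1 - 0.39)

# 76% of the area to the left of z.
z_left_76 = Z.inv_cdf(0.76)

print(round(z_right_39, 2), round(z_left_76, 2))  # 0.28 0.71
```

Rounded to two decimal places, both values agree with the z-scores read from the table.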
