Descriptive Statistics: Presenting and Describing Data


  1. Descriptive Statistics: Presenting and Describing Data

  2. Frequency Distribution • A table or graph describing the number of observations in each category or class of a data set.

  3. Example: Consider the number of bottles of soda sold in a snack bar during lunch hour on 40 days. (The numbers have been arranged in increasing order, reading down each column.)

     63  71  76  81  85
     66  73  76  82  85
     67  73  76  82  86
     68  74  77  84  86
     68  74  78  84  89
     70  75  79  84  90
     71  75  79  85  92
     71  75  79  85  94

  4. In order to get a better grasp of this distribution of numbers, we’ll organize them into categories or classes. • We’ll look at • absolute frequency, • relative frequency, • cumulative absolute frequency, & • cumulative relative frequency

  5. Notation • [20, 30] denotes all real numbers between 20 & 30, including the 20 & the 30. • (20, 30) denotes all real numbers between 20 & 30, including neither the 20 nor the 30. • [20, 30) denotes all real numbers between 20 & 30, including the 20 but not the 30. • (20, 30] denotes all real numbers between 20 & 30, including the 30 but not the 20. • So the square bracket means include that endpoint & the round parenthesis means do not include that endpoint.

  6. Absolute Frequency

     class       abs. freq.
     [60, 65)     1
     [65, 70)     4
     [70, 75)     8
     [75, 80)    11
     [80, 85)     6
     [85, 90)     7
     [90, 95)     3
     Total       40
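The tally above can be reproduced with a short Python sketch. The names `sales`, `class_of`, and `abs_freq` are illustrative and do not come from the slides; the only assumptions are that classes start at 60, are 5 wide, and are half-open as in the notation slide.

```python
from collections import Counter

# The 40 daily soda counts from the example, in increasing order.
sales = [
    63, 66, 67, 68, 68, 70, 71, 71,
    71, 73, 73, 74, 74, 75, 75, 75,
    76, 76, 76, 77, 78, 79, 79, 79,
    81, 82, 82, 84, 84, 84, 85, 85,
    85, 85, 86, 86, 89, 90, 92, 94,
]

LOW, WIDTH = 60, 5  # first class starts at 60; every class is 5 wide


def class_of(x):
    """Return the half-open class [lo, lo + WIDTH) that contains x."""
    lo = LOW + ((x - LOW) // WIDTH) * WIDTH
    return (lo, lo + WIDTH)


# Count how many observations fall in each class.
abs_freq = Counter(class_of(x) for x in sales)

for (lo, hi), count in sorted(abs_freq.items()):
    print(f"[{lo}, {hi})  {count}")
```

Because the classes are half-open, a value such as 85 is counted in [85, 90) and not in [80, 85).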

  7. Histogram or Bar Graph of Absolute Frequency [Figure: histogram of absolute frequency (vertical axis, 0 to 12) against bottles of soda (horizontal axis, 60 to 95).]

  8. Relative Frequency

     class       abs. freq.   rel. freq.
     [60, 65)     1           0.025
     [65, 70)     4           0.100
     [70, 75)     8           0.200
     [75, 80)    11           0.275
     [80, 85)     6           0.150
     [85, 90)     7           0.175
     [90, 95)     3           0.075
     Total       40           1.000

  9. Relative Frequency: This graph looks the same as the last one, except the numbers on the vertical axis are percentages (in decimal form) instead of integers. [Figure: histogram of relative frequency (vertical axis, 0.000 to 0.300) against bottles of soda (horizontal axis, 60 to 95).]

  10. Frequency Polygon: a line connecting the midpoints of the tops of the bars. [Figure: frequency polygon drawn over the histogram of absolute frequency (vertical axis, 0 to 12) against bottles of soda (horizontal axis, 60 to 95).]

  11. Cumulative Absolute Frequency

      class       abs. freq.   rel. freq.   cum. abs. freq.
      [60, 65)     1           0.025         1
      [65, 70)     4           0.100         5
      [70, 75)     8           0.200        13
      [75, 80)    11           0.275        24
      [80, 85)     6           0.150        30
      [85, 90)     7           0.175        37
      [90, 95)     3           0.075        40
      Total       40           1.000

  12. Cumulative Absolute Frequency: Notice that the graph of the cumulative absolute frequency looks like a set of stairs going up from left to right. [Figure: cumulative absolute frequency (vertical axis, 0 to 40) against bottles of soda (horizontal axis, 60 to 95).]

  13. Cumulative Relative Frequency

      class       abs. freq.   rel. freq.   cum. abs. freq.   cum. rel. freq.
      [60, 65)     1           0.025         1                0.025
      [65, 70)     4           0.100         5                0.125
      [70, 75)     8           0.200        13                0.325
      [75, 80)    11           0.275        24                0.600
      [80, 85)     6           0.150        30                0.750
      [85, 90)     7           0.175        37                0.925
      [90, 95)     3           0.075        40                1.000
      Total       40           1.000
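As a rough illustration, the three derived columns of the table above can be computed from the absolute frequencies alone. This is a sketch, not the slides' own method; the variable names are hypothetical.

```python
from itertools import accumulate

classes = ["[60, 65)", "[65, 70)", "[70, 75)", "[75, 80)",
           "[80, 85)", "[85, 90)", "[90, 95)"]
abs_freq = [1, 4, 8, 11, 6, 7, 3]

n = sum(abs_freq)                        # total number of observations
rel_freq = [f / n for f in abs_freq]     # share of observations in each class
cum_abs = list(accumulate(abs_freq))     # running total of the counts
cum_rel = [c / n for c in cum_abs]       # running total as a share of n

for row in zip(classes, abs_freq, rel_freq, cum_abs, cum_rel):
    print("{:<9} {:>3} {:>6.3f} {:>3} {:>6.3f}".format(*row))
```

The cumulative relative frequency is computed here as each cumulative count divided by n, rather than by adding up the relative frequencies, which keeps floating-point rounding from accumulating.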

  14. Cumulative Relative Frequency: Again we have our stairs, but the numbers on the vertical axis are percentages (in decimal form), and the height of the last bar is always 1 (or 100%). [Figure: cumulative relative frequency (vertical axis, 0.00 to 1.00) against bottles of soda (horizontal axis, 60 to 95).]

  15. Cumulative Relative Frequency Ogive: a line connecting the points at the back of the steps. [Figure: ogive of cumulative relative frequency (vertical axis, 0.00 to 1.00) against bottles of soda (horizontal axis, 60 to 95).]

  16. Next we will consider two types of summary measures: • 1. Measures of the center of the distribution. • 2. Measures of the spread of the distribution.

  17. Measures of the center of the distribution, or typical value, or average.

  18. Measures of the Center of the Distribution • Mean or Arithmetic Mean: add up the values of the observations; then divide by the number of observations. • Median: the value for which half of the observations are above that value & half are below it. • Mode: Most common, most frequent, or most probable value.

  19. Example 1 • Observations 2, 2, 3, 4, 8, 10, 13 • Mean 6 • Median 4 • Mode 2

  20. Example 2 • Observations -5, 8, 8, 9, 10, 12 • Mean 7 • Median 8.5 • Mode 8

  21. Example 3 • Observations 2, 3, 4, 4, 4, 7 • Mean 4 • Median 4 • Mode 4

  22. Example 4 • Observations 11, 9, 26, 11, 10, 11 • To calculate the median, we will want to have the observations in order: • 9, 10, 11, 11, 11, 26 • Mean 13 • Median 11 • Mode 11
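The four examples can be checked with Python's standard `statistics` module. This is only a verification sketch; the dictionary `examples` is an illustrative name, not something from the slides.

```python
from statistics import mean, median, mode

examples = {
    "Example 1": [2, 2, 3, 4, 8, 10, 13],
    "Example 2": [-5, 8, 8, 9, 10, 12],
    "Example 3": [2, 3, 4, 4, 4, 7],
    "Example 4": [11, 9, 26, 11, 10, 11],  # unsorted on purpose
}

for name, obs in examples.items():
    # median() sorts internally, so the observations need not be in order.
    print(name, "mean:", mean(obs), "median:", median(obs), "mode:", mode(obs))
```

With an even number of observations, as in Example 2, `median` averages the two middle values, which matches the 8.5 on the slide.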

  23. Computing the Mean for a Frequency Distribution of a Population

      Salary x_i   Freq. f_i
         700           8
         800          23
         900          75
        1000          90
        1100          43
        1200          11
      Total          250

      We will denote the number of observations in our population as N. In this example, it’s 250.

  24. Computing the Mean for a Frequency Distribution of a Population

      Salary x_i   Freq. f_i
         700           8
         800          23
         900          75
        1000          90
        1100          43
        1200          11
      Total          250

      First we need the sum of all the observations: (700 + 700 + 700 + … + 700) + (800 + 800 + 800 + … + 800) + … + (1200 + 1200 + 1200 + … + 1200)

  25. Computing the Mean for a Frequency Distribution of a Population

      Salary x_i   Freq. f_i
         700           8
         800          23
         900          75
        1000          90
        1100          43
        1200          11
      Total          250

      First we need the sum of all the observations: (700 + 700 + 700 + … + 700) + (800 + 800 + 800 + … + 800) + … + (1200 + 1200 + 1200 + … + 1200) = (700 × 8) + (800 × 23) + (900 × 75) + (1000 × 90) + (1100 × 43) + (1200 × 11)

  26. Computing the Mean for a Frequency Distribution of a Population

      Salary x_i   Freq. f_i   x_i f_i
         700           8         5,600
         800          23        18,400
         900          75        67,500
        1000          90        90,000
        1100          43        47,300
        1200          11        13,200
      Total          250       242,000

  27. Computing the Mean for a Frequency Distribution of a Population

      Salary x_i   Freq. f_i   x_i f_i
         700           8         5,600
         800          23        18,400
         900          75        67,500
        1000          90        90,000
        1100          43        47,300
        1200          11        13,200
      Total          250       242,000

      Then to get the mean, we will divide that sum by the number of observations.

  28. Computing the Mean for a Frequency Distribution of a Population

      Salary x_i   Freq. f_i   x_i f_i
         700           8         5,600
         800          23        18,400
         900          75        67,500
        1000          90        90,000
        1100          43        47,300
        1200          11        13,200
      Total          250       242,000

      So the mean equals 242,000 / 250 = 968.0.
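The same calculation, written as a small Python sketch: multiply each salary by its frequency, add up the products, and divide by the number of observations. The variable names are illustrative only.

```python
salaries = [700, 800, 900, 1000, 1100, 1200]   # x_i
freqs = [8, 23, 75, 90, 43, 11]                # f_i

N = sum(freqs)                                          # number of observations
total = sum(x * f for x, f in zip(salaries, freqs))     # sum of x_i * f_i
mu = total / N                                          # population mean

print(N, total, mu)
```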

  29. Notation • We denote the mean of a population by the Greek letter mu: $\mu$. • For a simple list of numbers, we computed $\mu$ as: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ • If c is the number of categories or classes in our frequency distribution, then we computed $\mu$ for a frequency distribution as: $\mu = \frac{1}{N}\sum_{i=1}^{c} x_i f_i$

  30. What is the mode of this frequency distribution?

      Salary x_i   Freq. f_i
         700           8
         800          23
         900          75
        1000          90
        1100          43
        1200          11
      Total          250

      The mode is the most frequent or most common value, which in this example is 1000.

  31. What is the median of this frequency distribution?

      Salary x_i   Freq. f_i
         700           8
         800          23
         900          75
        1000          90
        1100          43
        1200          11
      Total          250

      Remember, the median is the middle value, or, when there is an even number of observations (as there is here), the average of the two middle values.

  32. Where is the median? • The middle is between the salaries in the 125th and 126th positions, where there are 125 values below and 125 above. • So we need to determine what salaries are in the 125th and 126th positions.

      Salary value:  x  x  x  …  x    x    x    x    …  x    x    x
      Position:      1  2  3  …  124  125  126  127  …  248  249  250

  33. What is the median of this frequency distribution?

      Salary x_i   Freq. f_i
         700           8
         800          23
         900          75
        1000          90
        1100          43
        1200          11
      Total          250

      • In the $700 category, we have observations 1 through 8.
      • In the $800 category, we have observations 9 through 31 (= 8+23).
      • In the $900 category, we have observations 32 through 106 (= 8+23+75).
      • In the $1000 category, we have observations 107 through 196 (= 8+23+75+90).

      So the 125th & 126th observations are in the $1000 category. Averaging the values of the two middle observations together, we get (1000+1000)/2 = 1000. So our median is 1000.
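The position-counting argument above can be sketched in Python as follows. The helper names `value_at` and `freq_median` are hypothetical; the approach simply walks the running total of frequencies until it reaches the requested position.

```python
def value_at(position, values, freqs):
    """Return the value at a 1-based position in the sorted observations."""
    running = 0
    for x, f in zip(values, freqs):
        running += f              # last position covered by this category
        if position <= running:
            return x
    raise ValueError("position exceeds the number of observations")


def freq_median(values, freqs):
    """Median of a frequency distribution whose values are sorted ascending."""
    n = sum(freqs)
    if n % 2 == 1:
        return value_at((n + 1) // 2, values, freqs)
    # Even n: average the two middle observations.
    lower = value_at(n // 2, values, freqs)
    upper = value_at(n // 2 + 1, values, freqs)
    return (lower + upper) / 2


salaries = [700, 800, 900, 1000, 1100, 1200]
freqs = [8, 23, 75, 90, 43, 11]
print(freq_median(salaries, freqs))
```

For the salary data, positions 125 and 126 both fall in the $1000 category, so the function averages 1000 and 1000.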

  34. Calculating mean & median for interval data. Suppose we have the following population data.

  35. We will compute the mean first. We have 35 observations.

  36. We need a representative element from each interval. For that we’ll use the midpoint.

  37. Now we continue as we did before to calculate the mean for a frequency distribution.

  38. Add up.

  39. Divide by the number of observations, and we have the mean.

  40. Now let’s calculate the median.

  41. To calculate the median of interval data, we need to make an assumption. • We know the number of observations in each interval, but not exactly what they are. • We’re going to assume that the observations are evenly distributed in the intervals.

  42. First, we need to figure out in which category the median is. There are 35 observations, so the middle one is the 18th one. (There are 17 observations below the 18th and 17 above it.) The first 10 observations are in the first category. The 11th to the 20th observations are in the second category. So the median must be in the second category.

  43. The formula for calculating the median for interval data looks quite different from what we did before: $\text{Median} = L_{md} + \frac{N/2 - \Sigma f_p}{f_{md}} \times \text{width}$ • $L_{md}$ is the lower limit on the category containing the median. • $N$ is the population size. • $\Sigma f_p$ is the sum of the frequencies of the categories preceding the category containing the median. • $f_{md}$ is the frequency of the category containing the median. • width is the width of the interval containing the median.

  44. Let’s go through the parts of the formula, keeping in mind that the median is in the second category. • $L_{md}$, the lower limit on the category containing the median, is 15. • $N$, the population size, is 35. • $\Sigma f_p$, the sum of the frequencies of the categories preceding the category containing the median, is 10. • $f_{md}$, the frequency of the category containing the median, is 10. • width, the width of the interval containing the median, is 15.

  45. Now we just assemble the pieces: $\text{Median} = 15 + \frac{35/2 - 10}{10} \times 15 = 15 + 0.75 \times 15 = 26.25$

  46. What does this mean? Remember that the median is the 18th observation. That means it’s the 8th observation of 10 in the second category. So it is closer to the end of that interval than the beginning. What the formula is telling us is that the median is 0.75 or ¾ of the way through the distance of 15 units, in the interval starting at 15.
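A minimal Python sketch of the interval-data median, using the five quantities listed on the slides (lower limit 15, population size 35, preceding frequency 10, median-class frequency 10, width 15). The function and parameter names are illustrative, and the sketch relies on the same assumption as the slides: observations are evenly spread within the interval.

```python
def grouped_median(lower_limit, n, freq_before, freq_median_class, width):
    """Median for interval data, assuming observations are evenly spread
    within the interval that contains the median."""
    # Fraction of the way through the median's interval.
    fraction = (n / 2 - freq_before) / freq_median_class
    return lower_limit + fraction * width


fraction = (35 / 2 - 10) / 10
median = grouped_median(lower_limit=15, n=35, freq_before=10,
                        freq_median_class=10, width=15)
print(fraction, median)
```

The intermediate `fraction` is the 0.75 discussed above: the median sits three quarters of the way through the 15-unit interval that starts at 15.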

  47. Measures of dispersion or the spread of the distribution

  48. Measures of Dispersion • Range • Mean Absolute Deviation (MAD) • Mean Squared Deviation (MSD) • Coefficient of Variation (CV) • As we shall see, the first three are measures of absolute dispersion, while the CV is a measure of relative dispersion.

  49. Range • largest value minus smallest value

  50. Example 1 • Observations: 1 2 2 2 3 4 4 5 6 • The range is 6 - 1 = 5
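In Python the range is a one-line computation. The variable names below are illustrative.

```python
observations = [1, 2, 2, 2, 3, 4, 4, 5, 6]

# Range: largest value minus smallest value.
data_range = max(observations) - min(observations)
print(data_range)
```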
