Section 6A Characterizing a Data Distribution

Section 6A Characterizing a Data Distribution. Pages 380-388. Definition: The distribution of a variable (or data set) describes the values taken on by the variable and the frequency (or relative frequency) of these values. Example: Lengths of words in the Gettysburg Address.

twila

Presentation Transcript


  1. Section 6A Characterizing a Data Distribution. Pages 380-388

  2. Definition: The distribution of a variable (or data set) describes the values taken on by the variable and the frequency (or relative frequency) of these values. Example: Lengths of words in the Gettysburg Address

  3. How do we characterize a data distribution? • Center (average): mean, median, mode, effect of an outlier, confusion about “average” • Shape of a distribution: number of peaks, symmetry or skewness, variation (more in Section 6B)

  4. 6-A Averages. The word “average” has several meanings. Generally, an average is the center of a distribution or a typical representative value.

  5. 6-A Measures of Center in a Distribution • The mean is what we most commonly call the average value. It is defined as follows: mean = (sum of all values) / (total number of values) • The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even). • The mode is the most common value (or group of values) in a distribution.

  6. 6-A Mean Distance

  7. 6-A Mean distance: 36 + 67 + 93 + 142 + 484 + 887 + 1,765 + 2,791 + 3,654 = 9,919. Mean distance = 9,919 / 9 ≈ 1,102.1 million miles
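The mean calculation on this slide can be rechecked in a few lines of Python. This is a minimal sketch: the list holds the nine distances (in millions of miles) as they are listed on the slide, and the helper name `mean_of` is made up for illustration.

```python
# Distances from the Sun in millions of miles, copied from the slide.
distances = [36, 67, 93, 142, 484, 887, 1765, 2791, 3654]

def mean_of(values):
    """Arithmetic mean: the sum of all values divided by how many there are."""
    return sum(values) / len(values)

total = sum(distances)
mean_distance = mean_of(distances)

print("sum of distances:", total)
print("mean distance (million miles):", round(mean_distance, 1))
```

Recomputing the sum this way is a quick guard against addition slips when working by hand.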

  8. 6-A Median Distance • The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even).

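  9. 6-A Median Distance: in the sorted list of nine distances, the median is the fifth value, 484 million miles (4 values below, 4 values above).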

  10. 6-A Steps for Finding the Median • Sort the data (put it in order)! • Count the data (n values) and decide whether n is odd or even. • If n is odd, the median is the value in position (n+1)/2. • If n is even, the median is halfway between the values in positions n/2 and (n/2)+1.
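The steps for finding the median translate directly into code. The sketch below is one possible implementation, not the textbook's: it sorts first, then picks the middle value for odd n, or averages the two middle values (1-based positions n/2 and n/2 + 1) for even n. The function name `find_median` and the small even-length list are illustrative assumptions.

```python
def find_median(values):
    """Median by the sort-then-pick-the-middle procedure."""
    ordered = sorted(values)          # Step 1: sort the data
    n = len(ordered)                  # Step 2: count the data
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: 1-based position (n+1)/2 is 0-based index (n+1)//2 - 1.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

# Nine planet distances from the mean-distance slide (millions of miles).
nine = [36, 67, 93, 142, 484, 887, 1765, 2791, 3654]
print(find_median(nine))

# A made-up even-length list, given unsorted on purpose.
print(find_median([4, 1, 3, 2]))
```

Sorting inside the function means callers cannot forget the first step.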

  11. 6-A Median Distance – ‘Real’ Planet List

  12. 6-A Median Distance Median is halfway between 142 and 484, so median = (142+484)/2 = 313

  13. 6-A Comment about the Median • The median splits the data into two equal-sized pieces. • Half the data (50%) will be below the median. • Half the data (50%) will be above the median.

  14. 6-A Mode Distance • The mode is the most common value (or group of values) in a distribution.

  15. 6-A Mode Examples a. 5 5 5 3 1 5 1 4 3 5 b. 1 2 2 2 3 4 5 6 6 6 7 9 c. 1 2 2 6 6 8 9 10 8

  16. 6-A Mode Examples a. 5 5 5 3 1 5 1 4 3 5 (mode is 5) b. 1 2 2 2 3 4 5 6 6 6 7 9 c. 1 2 2 6 6 8 9 10 8

  17. 6-A Mode Examples a. 5 5 5 3 1 5 1 4 3 5 (mode is 5) b. 1 2 2 2 3 4 5 6 6 6 7 9 (bimodal) c. 1 2 2 6 6 8 9 10 8

  18. 6-A Mode Examples a. 5 5 5 3 1 5 1 4 3 5 (mode is 5) b. 1 2 2 2 3 4 5 6 6 6 7 9 (bimodal) c. 1 2 2 6 6 8 9 10 8 (trimodal)
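The three mode examples can be checked by counting how often each value occurs. This is a minimal sketch using `collections.Counter` from the Python standard library. The function name `find_modes` is made up, and returning an empty list when no value repeats is an assumed convention for "no mode".

```python
from collections import Counter

def find_modes(values):
    """Return every value tied for the highest count (empty list if nothing repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)

example_a = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]
example_b = [1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]
example_c = [1, 2, 2, 6, 6, 8, 9, 10, 8]

for name, data in [("a", example_a), ("b", example_b), ("c", example_c)]:
    print(name, find_modes(data))
```

A result with one entry is a single mode, two entries is bimodal, and three is trimodal.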

  19. 6-A The Mode • You may not have one! • Could have multiple modes! • The mode is easy to spot in a graph – it occurs at the peak. • The mode is the only measure of “center” available for categorical data – e.g. gender

  20. How do we characterize a data distribution? • Center (average): mean, median, mode, effect of an outlier, confusion about “average” • Shape of a distribution: number of peaks, symmetry or skewness, variation

  21. Outliers • An outlier is an observation that is much higher (or much lower) than all the other values in your list, i.e., an extremely unusual observation. • Note: not every set of data has outliers. The minimum and maximum values are not necessarily outliers!

  22. The Effect of an Outlier. Definition: An outlier is a data value that is much higher or much lower than almost all other values. Five graduating seniors on a college basketball team receive the following first-year contract offers to play in the National Basketball Association: $0, $0, $0, $0, $3,500,000. • mean: ($0 + $0 + $0 + $0 + $3,500,000) / 5 = $700,000 • median: $0 (the middle value of $0, $0, $0, $0, $3,500,000) • mode: $0 • Including an outlier can pull the mean significantly upward or downward. • Including an outlier does not significantly affect the median. • Including an outlier does not affect the mode.

  23. The Effect of an Outlier. A track coach wants to determine an appropriate heart rate for her athletes during their workouts. In the middle of the workout, she reads the following heart rates (beats/min) from five athletes: 130, 135, 140, 145, 325. • mean: 875 / 5 = 175 bpm • median: 140 bpm • mode: none. Clearly 325 is an outlier, and clearly a mistake (faulty heart monitor?). Throw out the outlier? With 130, 135, 140, 145: • mean: 550 / 4 = 137.5 bpm • median: 137.5 bpm • mode: none
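The effect of an outlier on the mean and the median can be seen by computing both with and without the suspect value. This is a small sketch using the standard-library `statistics` module and the data from the two outlier slides. The variable names are illustrative.

```python
from statistics import mean, median

# Contract offers from the basketball example (dollars).
offers = [0, 0, 0, 0, 3_500_000]
print("offers:", mean(offers), median(offers))

# Heart rates from the track example (beats per minute).
heart_rates = [130, 135, 140, 145, 325]
without_outlier = [rate for rate in heart_rates if rate != 325]

print("with 325:   ", mean(heart_rates), median(heart_rates))
print("without 325:", mean(without_outlier), median(without_outlier))
```

Dropping 325 moves the mean a long way but the median only slightly, which is the point of the slide.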

  24. How do we characterize a data distribution? • Center (average): mean, median, mode, effect of an outlier, confusion about “average” • Shape of a distribution: number of peaks, symmetry or skewness, variation

  25. 6-A Mean vs. Median. A news article reports that of the 411 players on the NBA roster in February, 1988, only 139 “made more than the league average salary of $2.36 million.” Recall that the word “average” can have several interpretations. In this case, is $2.36 million the mean or the median salary for 1988 NBA players? Explain.

  26. Confusion about “Average”. A newspaper surveys wages for assembly workers and reports an average of $22 per hour. The workers at one large firm immediately request a pay raise, claiming that they work as hard as workers at other companies but their average wage is only $19. The management rejects their request, telling them that they are overpaid because their average wage is, in fact, $23 per hour. Can they both be right? One possibility: salaries $19, $19, $19, $19, $39. median: $19; mean: $115 / 5 = $23.

  27. Confusion about “Average”. A newspaper surveys wages for assembly workers and reports an average of $22 per hour. The workers at one large firm immediately request a pay raise, claiming that they work as hard as workers at other companies but their average wage is only $19. The management rejects their request, telling them that they are overpaid because their average wage is, in fact, $23 per hour. Can they both be right? Another possibility: salaries $6, $20, $23, $23, $23. median: $23; mean: $95 / 5 = $19.
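Both wage scenarios can be verified by computing the mean and the median of each salary list. This is a short sketch with the standard-library `statistics` module. The list names are made up, and the values are the hourly wages shown on the two slides.

```python
from statistics import mean, median

# Hourly wages (dollars) from the two "Can they both be right?" slides.
scenario_one = [19, 19, 19, 19, 39]
scenario_two = [6, 20, 23, 23, 23]

for label, wages in [("scenario one", scenario_one), ("scenario two", scenario_two)]:
    print(label, "mean:", mean(wages), "median:", median(wages))
```

In each scenario one side is quoting the mean and the other the median, so both claims of $19 and $23 can be true at once.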

  28. Confusion about “Average”. All 100 first-year students at a small college take three courses in the Core Studies Program. The first two courses are taught in large lectures, with all 100 students in a single class. The third course is taught in ten classes of 10 students each. The students claim that the mean size of their Core Studies classes is 70. The administrators claim that the mean class size is only 25 students. Explain. • Students describe the average class size they experience (mean class size per student): (100 + 100 + 10) / 3 = 70. • Administrators describe the average size of a Core Studies class (mean number of students per class): 300 enrollments / 12 classes = 25.
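The two class-size figures come from averaging over different things: classes versus student enrollments. The sketch below illustrates this under the setup on the slide (two classes of 100 and ten classes of 10). The variable names are assumptions made for the example.

```python
# Two lectures of 100 students and ten sections of 10 students.
class_sizes = [100, 100] + [10] * 10

# Administrators' view: average over the 12 classes.
mean_per_class = sum(class_sizes) / len(class_sizes)

# Students' view: average over the 300 enrollments. A class of size s is
# experienced by s students, so each size is weighted by itself.
mean_per_student = sum(s * s for s in class_sizes) / sum(class_sizes)

print("mean students per class:", mean_per_class)
print("mean class size per student:", mean_per_student)
```

The student-side figure is larger because big classes are counted once for every student sitting in them.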

  29. How do we characterize a data distribution? • Center (average): mean, median, mode, effect of an outlier, confusion about “average” • Shape of a distribution: number of peaks, symmetry or skewness, variation

  30. 6-A Describing a distribution

  31. Shape of a Distribution: Symmetry and Skewness. A distribution is symmetric if its left half is a mirror image of its right half. [Figure: symmetric distribution, with mode = mean = median.]

  32. Shape of a Distribution: Symmetry and Skewness. A distribution is left-skewed if its ‘tail’ is on the left. [Figure: distribution skewed left (negatively), with the mean, median, and mode marked.]

  33. Shape of a Distribution: Symmetry and Skewness. A distribution is right-skewed if its ‘tail’ is on the right. [Figure: distribution skewed right (positively), with the mean, median, and mode marked.]

  34. 6-A Symmetric and Skewed Distributions. [Figure: three curves. Symmetric: mode = mean = median. Skewed left (negatively) and skewed right (positively): mean, median, and mode marked separately.] Use the mean to describe the center of a symmetric distribution; use the median to describe the center of a skewed distribution.
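One rough way to see the link between skewness and the measures of center is to compare the mean with the median: a long tail tends to drag the mean toward it. The sketch below is only a rule-of-thumb heuristic, not a formal skewness test, and it can be wrong for some data sets. The function name `skew_hint` and all three data lists are made up for illustration.

```python
from statistics import mean, median

def skew_hint(values):
    """Guess the skew direction by comparing mean and median (heuristic only)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical data sets, invented for this example.
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
left_tail = [1, 17, 18, 18, 19, 19, 19, 20]
balanced = [1, 2, 3, 4, 5]

for data in (right_tail, left_tail, balanced):
    print(skew_hint(data), "mean:", mean(data), "median:", median(data))
```

When the hint reports a skew, the median is usually the safer description of the center.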

  35. Shape of a Distribution: Symmetry and Skewness. Do you expect the distribution of heights of 100 women to be symmetric, left-skewed, or right-skewed? Explain. Do you expect the distribution of speeds of cars on a road where a visible patrol car is using radar to be symmetric, left-skewed, or right-skewed? Explain.

  36. 6-A Variation = horizontal spread. [Figure: three distributions showing low, moderate, and high variation.] How would you expect the variation to differ between times in the Olympic marathon and times in the New York Marathon? Explain.

  37. 6-A Describing a distribution • Shape: number of peaks, symmetry/skewness, outliers? • Center: use the mean if the data are symmetric; use the median if there is a strong skew or there are outliers • Spread (horizontal spread): are the data tightly clustered around the center (low or high variation)?

  38. 6-A Homework Pages 388-390 # 10, 14, 18, 21, 22, 27, 28, 30, 35, 38* * It is not necessary to draw the sketch for this one.
