
Overview of Today’s lecture


Presentation Transcript


  1. Overview of Today’s lecture Today’s topic: Describing distributions with numbers • Introduction • Measuring center • Measuring spread

  2. Describing distributions with numbers

  3. We saw several graphical ways to describe data. For example: a histogram (vertical axis: frequency).

  4. What is the difference between the blue and the green histograms?

  5. They both have the same center, but the blue one is more spread out.

  6. Now the blue one is more spread out, and the green one has a lower center.

  7. Now the blue one is more spread out, and the green one has a higher center.

  8. We would like to find numbers that represent these distributions: • One number would represent the center of the distribution • Another number would represent the spread of the distribution.

  9. Describing distributions with numbers: Remember!!! Begin by looking at the data (graphical representation). Only then use numbers to summarize the data.

  10. Measuring center • Mean • Median • Mode

  11. The mean. The mean is the arithmetic average. Example: the money a student spends at a certain university cafeteria: • MON TUE WED THU FRI • $2.50 $4.65 $5.35 $1.50 $2.50 • Add them up and divide by 5: (2.50 + 4.65 + 5.35 + 1.50 + 2.50) / 5 = 16.50 / 5 = 3.30, so the mean is $3.30

  12. The mean is denoted by \(\bar{x}\) (x bar) • The number of observations is denoted by n • The observations are denoted by \(x_1, x_2, \dots, x_n\) • For example, with n = 5:
      MON    TUE    WED    THU    FRI
      $2.50  $4.65  $5.35  $1.50  $2.50
      x_1    x_2    x_3    x_4    x_5

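  13. For n observations \(x_1, x_2, \dots, x_n\), the mean is given by: \(\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}\). For example, for the cafeteria data: \(\bar{x} = \frac{2.50 + 4.65 + 5.35 + 1.50 + 2.50}{5} = \frac{16.50}{5} = 3.30\)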

  14. We can also write the formula of the mean as: \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\) (Note: \(\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n\))
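
The formula for the mean can be sketched in a few lines of Python, using the cafeteria spending from the earlier slide. The names `spending`, `mean`, and `x_bar` are illustrative choices, not part of the lecture.

```python
# Daily cafeteria spending (MON..FRI) in dollars, taken from the slide.
spending = [2.50, 4.65, 5.35, 1.50, 2.50]

def mean(values):
    """Arithmetic average: sum of the observations divided by n."""
    return sum(values) / len(values)

x_bar = mean(spending)
print(f"n = {len(spending)}, mean = ${x_bar:.2f}")
```

With these five values the sum is 16.50, so the mean comes out at $3.30.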

  15. Questions: • What is the average of 1,2,3,4? ______ • What is the average of 100,200,300,400? ______ • What is the average of 101,102,103,104? ______
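
One way to check the three questions above is the short Python sketch below. The helper `mean` and the variable names are our own; the point it illustrates is that multiplying every observation by 100 multiplies the mean by 100, and adding 100 to every observation adds 100 to the mean.

```python
def mean(values):
    """Arithmetic average: add the observations up and divide by n."""
    return sum(values) / len(values)

base = [1, 2, 3, 4]
scaled = [100 * x for x in base]    # 100, 200, 300, 400
shifted = [100 + x for x in base]   # 101, 102, 103, 104

print(mean(base))     # 2.5
print(mean(scaled))   # 250.0
print(mean(shifted))  # 102.5
```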

  16. Mean is not resistant to outliers: • 1, 18, 1, 4 • 1, 180, 1, 4
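
To see how strongly a single extreme value moves the mean, the two data sets above can be compared directly. This is a minimal sketch; the variable names are illustrative.

```python
def mean(values):
    """Arithmetic average: add the observations up and divide by n."""
    return sum(values) / len(values)

typical = [1, 18, 1, 4]
with_outlier = [1, 180, 1, 4]   # same data, but 18 replaced by 180

print(mean(typical))       # 6.0
print(mean(with_outlier))  # 46.5
```

Changing one observation out of four moves the mean from 6 to 46.5, which is larger than three of the four values in the second data set.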

  17. Question: The average weight of 10 men is 200 lbs. A 150 lb man walks into the room. What is the average weight of the 11 men?

  18. Answer: \(\frac{10 \times 200 + 150}{11} = \frac{2150}{11} \approx 195.5\) lbs

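  19. The median • The median (M) is “the middle value” • Sort the data: 1, 8, 3, 11 → 1, 3, 8, 11 • The median is in the middle of the sorted list, here between 3 and 8

  20. Resistant to an extreme observation: • 1, 8, 3, 11 → sorted: 1, 3, 8, 11 • 1, 8, 3, 110 → sorted: 1, 3, 8, 110 • The two middle values (3 and 8) are the same in both lists, so the median does not change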

  21. How to calculate the median: First, arrange all observations in order of size. • For an odd number of observations, M is the center observation: M = the observation in the (n+1)/2 place. Example: 1 3 5 8 100, M = 5 • For an even number of observations, M is the mean of the two center observations: M = the value halfway between the observations in places n/2 and n/2+1. Example: 4 8 16 18 50 100, M = (16 + 18)/2 = 17
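
The two rules above can be sketched as a small Python function. The function name `median` and the test lists are illustrative; the lists are the two examples from the slide.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)          # step 1: arrange in order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the observation in place (n+1)/2, which is index mid (0-based).
        return ordered[mid]
    # Even n: mean of the observations in places n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 3, 5, 8, 100]))         # 5
print(median([4, 8, 16, 18, 50, 100]))   # 17.0
```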

  22. Question: The annual salaries (in thousands) of 20 employees in a firm are: • 30 30 60 60 60 60 60 60 60 • 60 60 60 60 60 60 60 60 120 500 What is the median salary in the firm?

  23. Question: Where are the mean and median in the following distributions: • Long left-hand tail • Long right-hand tail • Symmetric

  24. Example ? ?

  25. Properties of mean and median: • Outliers affect the _______________ • Use __________ when interested in a typical value • In a ________ distribution mean and median are close together • In a skewed distribution ________ is farther out in the long tail

  26. Question: In a class of 25 students, 22 students had grades between 71 and 80 (both endpoints included), and three students had grades between 91 and 100 (both endpoints included). For these data: a. the median must be between 71 and 80. b. the mean must be between 71 and 80. c. both of the above.

  27. Mode: • The most frequent score • 30 30 60 60 60 60 60 60 60 • 60 60 60 60 60 60 60 60 120 500 • The value that appears more times than any other value is 60, so Mode = 60
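
Finding the mode amounts to counting how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the salary list is typed in as it reads in the slide text, so the exact number of 60s should be treated as an assumption.

```python
from collections import Counter

# Salaries in thousands; the count of 60s is assumed from the slide text.
salaries = [30] * 2 + [60] * 15 + [120, 500]

counts = Counter(salaries)                      # value -> number of occurrences
mode_value, mode_count = counts.most_common(1)[0]

print(f"mode = {mode_value} (appears {mode_count} times)")
```

The two extreme salaries (120 and 500) each appear only once, so they have no effect on the mode.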

  28. Mode in interval data Mode = middle of interval [10,20)

  29. Measuring spread • Variance (and standard deviation) • The interquartile range

  30. Why measure spread? Example: • Salaries - A (in thousands): 50 50 50 50 50 50 50 50, mean = 50 • Salaries - B (in thousands): 10 50 50 50 50 50 50 90, mean = 50 • Salaries - C (in thousands): 10 10 10 10 90 90 90 90, mean = 50 → A measure of center alone can be misleading!!!

  31. Measuring spread using the variance • Salaries - A (in thousands): 50 50 50 50 50 50 50 50, mean = 50 • Salaries - B (in thousands): 10 50 50 50 50 50 50 90, mean = 50 • Salaries - C (in thousands): 10 10 10 10 90 90 90 90, mean = 50 • How variable are the observations? Or: how far, on average, are the observations from the mean? • Deviations from the mean in A: 0 0 0 0 0 0 0 0. No observation in A deviates from the mean, so the variance = 0

  32. Measuring spread using the variance. Deviations in C: • Values: 10 10 10 10 90 90 90 90 • Deviations from the mean: -40 -40 -40 -40 40 40 40 40 • If we take the regular average of the deviations, we obtain 0, because (-40) + (-40) + (-40) + (-40) + 40 + 40 + 40 + 40 = 0 → So square the deviations and then average them: \((-40)^2 + (-40)^2 + (-40)^2 + (-40)^2 + 40^2 + 40^2 + 40^2 + 40^2 = 12{,}800\), and dividing by n - 1 = 7 gives 12,800 / 7 ≈ 1828.6 • Variance = 1828.6
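
The steps on this slide can be traced in Python. This is a sketch with illustrative variable names; it divides by n - 1, matching the division by 7 used above.

```python
salaries_c = [10, 10, 10, 10, 90, 90, 90, 90]   # Salaries - C, in thousands

n = len(salaries_c)
x_bar = sum(salaries_c) / n                      # mean: 50.0

deviations = [x - x_bar for x in salaries_c]     # -40 four times, then 40 four times
sum_dev = sum(deviations)                        # deviations always cancel: 0.0

sum_sq = sum(d ** 2 for d in deviations)         # 8 * 1600 = 12800.0
variance = sum_sq / (n - 1)                      # divide by n - 1 = 7
std_dev = variance ** 0.5                        # back to the original units

print(round(variance, 1), round(std_dev, 1))     # 1828.6 42.8
```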

  33. Measuring spread using the variance. Back to the units of measurement: Standard deviation = \(\sqrt{1828.6} \approx 42.8\)

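  34. Variance, for observations \(x_1, x_2, \dots, x_n\): 1. Compute the deviations about the mean: \(x_i - \bar{x}\) 2. Square these deviations: \((x_i - \bar{x})^2\) 3. Add them up and average (dividing by n - 1): \(s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\)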

  35. Standard deviation: 4. Take the square root of the variance: \(s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\)

  36. Compare the standard deviations of the 3 sets of salaries: • Salaries - A (in thousands): 50 50 50 50 50 50 50 50, mean = 50, SD = 0 • Salaries - B (in thousands): 10 50 50 50 50 50 50 90, mean = 50, SD = 21.4 • Salaries - C (in thousands): 10 10 10 10 90 90 90 90, mean = 50, SD = 42.8
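
The three standard deviations above can be reproduced with Python's standard `statistics` module, whose `stdev` function uses the n - 1 divisor. The dictionary name `groups` is an illustrative choice.

```python
from statistics import mean, stdev

# Three sets of salaries (in thousands) with the same mean but different spread.
groups = {
    "A": [50, 50, 50, 50, 50, 50, 50, 50],
    "B": [10, 50, 50, 50, 50, 50, 50, 90],
    "C": [10, 10, 10, 10, 90, 90, 90, 90],
}

for name, values in groups.items():
    print(name, "mean =", mean(values), "SD =", round(stdev(values), 1))
```

All three sets have mean 50, while the rounded standard deviations are 0.0, 21.4, and 42.8.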

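  37. Questions: • What is the standard deviation (SD) of 5, 5, 5, 5? ________ • If the SD of 1, 2, 3, 4 is 1.29, what is the SD of 11, 12, 13, 14? ______

  38. Question: If the SD of 1, 2, 3, 4 is 1.29, what is the SD of 11, 12, 13, 14? • Answer: 1.29. Adding 10 to every observation also adds 10 to the mean, so the deviations from the mean, and therefore the SD, are unchanged.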
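
A quick numerical check of these questions, as a sketch using `statistics.stdev` (n - 1 divisor) with illustrative variable names:

```python
from statistics import stdev

constant = [5, 5, 5, 5]
original = [1, 2, 3, 4]
shifted = [x + 10 for x in original]   # 11, 12, 13, 14

sd_constant = stdev(constant)          # no spread at all
sd_original = stdev(original)
sd_shifted = stdev(shifted)            # shifting does not change the deviations

print(sd_constant)
print(round(sd_original, 2), round(sd_shifted, 2))
```

Identical observations have an SD of 0, and shifting every observation by the same amount leaves the SD at about 1.29.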

  39. Measuring spread using the interquartile range. First, some notation: the pth percentile is the value such that p percent of the observations fall below it • Median = 50th percentile = Q2 • 1st quartile = 25th percentile = Q1 • 3rd quartile = 75th percentile = Q3

  40. Measuring spread using the interquartile range: IQR = Q3 - Q1

  41. Salaries - A (in thousands): 50 50 50 50 50 50 50 50 • Q1 = 50, Q3 = 50, IQR = 0 • The interquartile range (IQR) is the range of the middle 50% of the observations • Calculating Q1: 25% of 8 observations are the first 2 observations, so Q1 is right after the 2nd observation, between the 2nd and 3rd observations: (50+50)/2 = 50 • Calculating Q3: 75% of 8 observations are the first 6 observations, so Q3 is right after the 6th observation, between the 6th and 7th observations: (50+50)/2 = 50

  42. The interquartile range (IQR) is the range of the middle 50% of the observations: IQR = Q3 - Q1 • Salaries - B (in thousands): 10 50 50 50 50 50 50 90, Q1 = 50, Q3 = 50, IQR = 0 • Salaries - C (in thousands): 10 10 10 10 90 90 90 90, Q1 = 10, Q3 = 90, IQR = 80
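
The quartile calculation for the salary sets can be sketched in Python. The slides only show the case of 8 observations, so this sketch assumes one common textbook rule: Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half, with the middle observation left out of both halves when n is odd. Other conventions exist and can give slightly different quartiles. The function names are illustrative.

```python
def median(values):
    """Middle value of sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves rule (an assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle observation so both halves have equal size.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

salaries_a = [50, 50, 50, 50, 50, 50, 50, 50]
salaries_b = [10, 50, 50, 50, 50, 50, 50, 90]
salaries_c = [10, 10, 10, 10, 90, 90, 90, 90]

for name, data in [("A", salaries_a), ("B", salaries_b), ("C", salaries_c)]:
    print(name, quartiles(data), iqr(data))
```

For 8 observations this rule places Q1 between the 2nd and 3rd observations and Q3 between the 6th and 7th, as on the slides. Note that B has an IQR of 0 even though its SD is not 0, because its two extreme salaries lie outside the middle 50%.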

  43. Summary of the three sets of salaries (in thousands):
      Set  Observations              Mean  SD    IQR
      A    50 50 50 50 50 50 50 50   50    0     0
      B    10 50 50 50 50 50 50 90   50    21.4  0
      C    10 10 10 10 90 90 90 90   50    42.8  80

  44. Question: A sample was taken of the salaries of four employees from a large company. The following are their salaries (in thousands of dollars) for this year: 33 31 24 36. The variance of their salaries is: a. 5.1 b. 26 c. 31
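
The variance for this question can be worked through step by step. This is a sketch with illustrative variable names, using the n - 1 divisor from the earlier variance slides.

```python
salaries = [33, 31, 24, 36]          # in thousands of dollars

n = len(salaries)
x_bar = sum(salaries) / n            # 124 / 4 = 31.0

squared_deviations = [(x - x_bar) ** 2 for x in salaries]   # 4, 0, 49, 25
variance = sum(squared_deviations) / (n - 1)                # 78 / 3

print(x_bar, variance)               # 31.0 26.0
```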

  45. The accompanying two histograms represent the distribution of acceptance rates (percent accepted) among 25 business schools in 1995. The histograms use different class intervals, but are based on the same data. In each class interval, the left endpoint is included but not the right. Which statement is true? a. The median must be less than 30. b. The interquartile range exceeds 30. c. Neither of the above. d. Both of the above.
