Overview of Today’s lecture Today’s topic: Describing distributions with numbers • Introduction • Measuring center • Measuring spread
Describing distributions with numbers
We saw several graphical ways to describe data. For example: a histogram. [Figure: histogram, vertical axis = frequency]
What is the difference between the blue and the green histograms? [Figure: two histograms, vertical axis = frequency]
They both have the same center, but the blue one is more spread out.
Now the blue one is more spread out, and the green one has a lower center.
Now the blue one is more spread out, and the green one has a higher center.
We would like to find numbers that represent these distributions: • One number would represent the center of the distribution • Another number would represent the spread of the distribution.
Describing distributions with numbers: Remember!!! Begin by looking at the data – graphical representation Only then – use numbers to summarize the data
Measuring center • Mean • Median • Mode
The mean
The mean is the arithmetic average. Example: the money a student spends at a certain university cafeteria:
MON TUE WED THU FRI
$2.50 $4.65 $5.35 $1.50 $2.50
Add them up and divide by 5.
• The mean is denoted by: \(\bar{x}\) (x bar) • The number of observations is denoted by: n • The observations are denoted by: \(x_1, x_2, \dots, x_n\) • For example:
MON TUE WED THU FRI
$2.50 $4.65 $5.35 $1.50 $2.50
\(x_1\) \(x_2\) \(x_3\) \(x_4\) \(x_5\) (n = 5)
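For n observations \(x_1, x_2, \dots, x_n\), the mean is given by:
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
For example, for the five cafeteria amounts:
\[ \bar{x} = \frac{2.50 + 4.65 + 5.35 + 1.50 + 2.50}{5} = \frac{16.50}{5} = 3.30 \]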
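The same calculation can be sketched in a few lines of Python. This is only an illustration of the definition; the names `spending` and `mean` are chosen here and do not come from the lecture.

```python
# Cafeteria spending from the example (MON..FRI), in dollars.
spending = [2.50, 4.65, 5.35, 1.50, 2.50]

def mean(values):
    """Arithmetic average: sum of the observations divided by their count."""
    return sum(values) / len(values)

x_bar = mean(spending)
print(f"n = {len(spending)}, mean = ${x_bar:.2f}")
```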
Questions: • What is the average of 1,2,3,4? ______ • What is the average of 100,200,300,400? ______ • What is the average of 101,102,103,104? ______
The mean is not resistant to outliers. Compare: • 1, 18, 1, 4 (mean = 6) • 1, 180, 1, 4 (mean = 46.5)
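A quick way to see the effect is to compute both means. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean

small = [1, 18, 1, 4]    # largest value is 18
large = [1, 180, 1, 4]   # same data, but 18 replaced by the outlier 180

# Changing a single observation moves the mean a long way.
print(mean(small))  # 6
print(mean(large))  # 46.5
```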
Question: Average of 10 men is 200 lbs. A 150 lbs man walks into the room. What is the average of the 11 men?
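One way to approach this kind of question is to recover the total from the mean and then re-average. The helper below is a hypothetical sketch of that reasoning, not part of the lecture.

```python
def mean_after_adding(old_mean, old_n, new_value):
    """Mean of old_n + 1 observations, given the old mean and one new value."""
    total = old_mean * old_n + new_value   # recover the sum from the mean
    return total / (old_n + 1)

new_mean = mean_after_adding(200, 10, 150)
print(f"{new_mean:.1f} lbs")
```

The new mean is pulled toward 150, but only slightly, because one observation is averaged with ten others.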
The median • The median (M) is “the middle value” of the sorted data. Example: 1, 8, 3, 11. Sort the data: 1, 3, 8, 11. The median is in the middle, between 3 and 8.
Resistant to an extreme observation: compare 1, 8, 3, 11 with 1, 8, 3, 110. Sorted: 1, 3, 8, 11 and 1, 3, 8, 110. The middle of the sorted data is the same in both cases (M = 5.5).
How to calculate the median: First: arrange all observations in order of size • For an odd number of observations, M is the center observation: M = the observation in the (n+1)/2 place. Example: 1 3 5 8 100, M = 5 • For an even number of observations, M is the mean of the two center observations: M = the value halfway between the observations in places n/2 and n/2+1. Example: 4 8 16 18 50 100, M = (16 + 18)/2 = 17
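The rule above can be written as a short function. This is a minimal sketch; the function name `median` is chosen here, and Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)          # first: arrange observations in order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the (n+1)/2-th place, counting from 1
    # even n: halfway between places n/2 and n/2 + 1 (counting from 1)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 3, 5, 8, 100]))         # 5
print(median([4, 8, 16, 18, 50, 100]))   # 17.0
print(median([1, 8, 3, 11]), median([1, 8, 3, 110]))  # 5.5 5.5
```

The last line shows the resistance property: replacing 11 by 110 leaves the median unchanged.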
Question: The annual salaries (in thousands) of 20 employees in a firm are: • 30 30 60 60 60 60 60 60 60 • 60 60 60 60 60 60 60 60 120 500 What is the median salary in the firm?
Question: Where are the mean and median in the following distributions? • Long left-hand tail • Long right-hand tail • Symmetric
Properties of mean and median: • Outliers affect the _______________ • Use __________ when interested in a typical value • In a ________ distribution mean and median are close together • In a skewed distribution ________ is farther out in the long tail
Question: In a class of 25 students, 22 students had grades between 71 and 80 (both endpoints included), and three students had grades between 91 and 100 (both endpoints included). For these data, a. the median must be between 71 and 80. b. the mean must be between 71 and 80. c. both of the above.
Mode: • The most frequent score: the value that appears more times than any other value. • 30 30 60 60 60 60 60 60 60 • 60 60 60 60 60 60 60 60 120 500 • In the salary data, the value that appears most often is 60, so Mode = 60
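Counting how often each value occurs gives the mode directly. The sketch below uses the salary values as they survive in this transcript (19 values are legible, although the slide mentions 20 employees), so the exact counts should be treated as an assumption.

```python
from collections import Counter

# Salary values as transcribed (in thousands); the count of 60s is an
# assumption based on the values legible in the transcript.
salaries = [30, 30] + [60] * 15 + [120, 500]

counts = Counter(salaries)                       # value -> number of occurrences
mode_value, mode_count = counts.most_common(1)[0]
print(f"mode = {mode_value} (appears {mode_count} times)")
```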
Mode in interval data: Mode = middle of the interval with the highest frequency, here [10,20)
Measuring spread • Variance (and standard deviation) • The interquartile range
Why measure spread? Example:
Salaries - A (in thousands): 50 50 50 50 50 50 50 50 mean=50
Salaries - B (in thousands): 10 50 50 50 50 50 50 90 mean=50
Salaries - C (in thousands): 10 10 10 10 90 90 90 90 mean=50
A measure of center alone can be misleading!!!
Measuring spread using the variance
Salaries - A (in thousands): 50 50 50 50 50 50 50 50 mean=50
Salaries - B (in thousands): 10 50 50 50 50 50 50 90 mean=50
Salaries - C (in thousands): 10 10 10 10 90 90 90 90 mean=50
How variable are the observations? Or: how far, on average, are the observations from the mean?
Deviations in A: 0 0 0 0 0 0 0 0
No observation in A deviates from the mean, so the variance = 0
Measuring spread using the variance
Deviations in C:
values: 10 10 10 10 90 90 90 90
deviations from the mean: -40 -40 -40 -40 40 40 40 40
If we take the regular average of the deviations we obtain 0: (-40)+(-40)+(-40)+(-40)+(40)+(40)+(40)+(40) = 0
Instead, square the deviations and then average: (-40)^2+(-40)^2+(-40)^2+(-40)^2+(40)^2+(40)^2+(40)^2+(40)^2 = 12,800
12,800/7 = 1828.6 (the sum is divided by n-1 = 7)
Variance = 1828.6
Measuring spread using the variance
Back to the units of measurement: take the square root of the variance. Standard deviation = √1828.6 ≈ 42.8
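Variance: • Observations: \(x_1, x_2, \dots, x_n\) • 1. Compute the deviations about the mean: \(x_i - \bar{x}\) • 2. Square these deviations: \((x_i - \bar{x})^2\) • 3. Add them up and average, dividing by \(n - 1\):
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Standard deviation: 4. Take the square root of the variance:
\[ s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \]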
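The four steps can be followed literally in code. This is a minimal sketch using the Salaries C data from the earlier slide; the function names are chosen here, and the division by n - 1 matches the 12,800/7 calculation in the lecture.

```python
import math

def sample_variance(values):
    """Steps 1-3: deviations about the mean, squared, summed, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

def sample_sd(values):
    """Step 4: square root of the variance (back to the original units)."""
    return math.sqrt(sample_variance(values))

salaries_c = [10, 10, 10, 10, 90, 90, 90, 90]   # in thousands
print(round(sample_variance(salaries_c), 1))     # 1828.6
print(round(sample_sd(salaries_c), 1))           # 42.8
```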
Compare the standard deviations of the 3 sets of salaries:
Salaries - A (in thousands): 50 50 50 50 50 50 50 50 mean=50 SD=0
Salaries - B (in thousands): 10 50 50 50 50 50 50 90 mean=50 SD=21.4
Salaries - C (in thousands): 10 10 10 10 90 90 90 90 mean=50 SD=42.8
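These three values can be cross-checked with `statistics.stdev` from Python's standard library, which also divides by n - 1. The dictionary layout below is just one convenient way to organize the comparison.

```python
from statistics import mean, stdev

salary_sets = {
    "A": [50, 50, 50, 50, 50, 50, 50, 50],
    "B": [10, 50, 50, 50, 50, 50, 50, 90],
    "C": [10, 10, 10, 10, 90, 90, 90, 90],
}

# Same mean in every set, very different spread.
for name, values in salary_sets.items():
    print(f"{name}: mean={mean(values)}, SD={stdev(values):.1f}")
```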
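Question: • What is the standard deviation (SD) of 5, 5, 5, 5? ________ • Question: if the SD of 1, 2, 3, 4 is 1.29, what is the SD of 11, 12, 13, 14? ______
Question: if the SD of 1, 2, 3, 4 is 1.29, what is the SD of 11, 12, 13, 14? _________ • Answer: 1.29. Adding 10 to every observation also adds 10 to the mean, so each deviation from the mean, and therefore the SD, stays the same.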
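A numerical check of this question, as a small sketch with `statistics.stdev`: shifting every observation by the same constant should leave the standard deviation unchanged.

```python
from statistics import stdev

base = [1, 2, 3, 4]
shifted = [x + 10 for x in base]   # 11, 12, 13, 14

# The deviations from the mean are identical, so the SDs agree.
print(round(stdev(base), 2), round(stdev(shifted), 2))
```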
Measuring spread using the interquartile range
First, some notation: the pth percentile is the value such that p percent of the observations fall below it • Median = 50th percentile = Q2 • 1st quartile = 25th percentile = Q1 • 3rd quartile = 75th percentile = Q3
Measuring spread using the interquartile range: IQR = Q3 - Q1
The interquartile range (IQR) is the range of the middle 50% of the observations.
Salaries - A (in thousands): 50 50 50 50 50 50 50 50 Q1=50 Q3=50 IQR=0
Calculating Q1: 25% of 8 observations are the first 2 observations, so Q1 is right after the 2nd observation: between the 2nd and 3rd observations, (50+50)/2=50
Calculating Q3: 75% of 8 observations are the first 6 observations, so Q3 is right after the 6th observation: between the 6th and 7th observations, (50+50)/2=50
Salaries - B (in thousands): 10 50 50 50 50 50 50 90 Q1=50 Q3=50 IQR=0
Salaries - C (in thousands): 10 10 10 10 90 90 90 90 Q1=10 Q3=90 IQR=80
The interquartile range (IQR) is the range of the middle 50% of the observations: IQR = Q3 - Q1
Salaries (in thousands)            Mean  SD    IQR
A: 50 50 50 50 50 50 50 50         50    0     0
B: 10 50 50 50 50 50 50 90         50    21.4  0
C: 10 10 10 10 90 90 90 90         50    42.8  80
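The quartile rule used for these salary sets can be sketched as follows. The function name is hypothetical, and the sketch only covers the case shown in the lecture, where the number of observations is divisible by 4. Statistical software often uses other interpolation conventions, so its quartiles may differ slightly on other data.

```python
def quartiles_between_observations(values):
    """Q1 and Q3 as the midpoint just after the first 25% / 75% of the sorted data.

    Assumes len(values) is a multiple of 4, as in the salary examples.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n % 4 != 0:
        raise ValueError("this sketch only handles n divisible by 4")
    k1 = n // 4        # number of observations in the lowest 25%
    k3 = 3 * n // 4    # number of observations in the lowest 75%
    q1 = (ordered[k1 - 1] + ordered[k1]) / 2
    q3 = (ordered[k3 - 1] + ordered[k3]) / 2
    return q1, q3

for name, data in [("A", [50] * 8),
                   ("B", [10, 50, 50, 50, 50, 50, 50, 90]),
                   ("C", [10, 10, 10, 10, 90, 90, 90, 90])]:
    q1, q3 = quartiles_between_observations(data)
    print(f"{name}: Q1={q1}, Q3={q3}, IQR={q3 - q1}")
```

Note that B has the same IQR as A even though its SD is much larger: the two extreme salaries fall outside the middle 50% and do not affect the IQR.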
Question: A sample was taken of the salaries of four employees from a large company. The following are their salaries (in thousands of dollars) for this year: 33 31 24 36. The variance of their salaries is a. 5.1 b. 26 c. 31
The accompanying two histograms represent the distribution of acceptance rates (percent accepted) among 25 business schools in 1995. The histograms use different class intervals, but are based on the same data. In each class interval, the left endpoint is included but not the right. Which statement is true? a. The median must be less than 30. b. The interquartile range exceeds 30. c. Neither of the above. d. Both of the above.