Learn how to compute and interpret various measures such as range, mean deviation, variance, standard deviation, and coefficient of variation for both ungrouped and grouped data. Understand the characteristics, uses, advantages, and disadvantages of each measure. Also, explore Chebyshev's theorem, percentiles, quartiles, interquartile range, box plots, and coefficient of skewness and kurtosis.
Other Descriptive Measures Chapter 4
Chapter Goals
When you have completed this chapter, you will be able to:
1. Compute and interpret the range, the mean deviation, the variance, the standard deviation, and the coefficient of variation of ungrouped data
2. Compute and interpret the range, the variance, and the standard deviation from grouped data
3. Explain the characteristics, uses, advantages, and disadvantages of each measure and...
4. Understand Chebyshev’s theorem and the normal or empirical rule, as it relates to a set of observations
5. Compute and interpret percentiles, quartiles and the interquartile range
6. Construct and interpret box plots
7. Compute and describe the coefficient of skewness and kurtosis of a data distribution
Terminology
Range …is the difference between the largest and the smallest value.
• Only two values are used in its calculation.
• It is influenced by an extreme value.
• It is easy to compute and understand.
Terminology
Mean Deviation …is the arithmetic mean of the absolute values of the deviations from the arithmetic mean.
• All values are used in the calculation.
• It is not unduly influenced by large or small values.
• The absolute values are difficult to manipulate.
Question
Solve: The weights of a sample of crates containing books for the bookstore (in kg) are: 103 97 101 106 103. Find the range and the mean deviation.
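Question
1. Find the mean weight: \( \bar{x} = \frac{103 + 97 + 101 + 106 + 103}{5} = \frac{510}{5} = 102 \)
2. Find the range: 106 – 97 = 9
3. Find the mean deviation: \( MD = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = \frac{12}{5} = 2.4 \)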
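The crate-weight calculation above can be checked with a short Python sketch. The function names `value_range` and `mean_deviation` are illustrative choices, not part of the original slides.

```python
from statistics import mean

weights = [103, 97, 101, 106, 103]  # crate weights in kg, from the example


def value_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)


def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    centre = mean(values)
    return sum(abs(x - centre) for x in values) / len(values)


print(value_range(weights))     # range of the five weights
print(mean_deviation(weights))  # mean deviation of the five weights
```

Only two of the five values enter the range, while every value contributes to the mean deviation, which mirrors the characteristics listed under Terminology.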
Terminology
Variance …is the arithmetic mean of the squared deviations from the arithmetic mean.
• All values are used in the calculation.
• Because the deviations are squared, it is sensitive to extreme values.
• The units are awkward…the square of the original units.
Computing the Variance
Formula … for a Population: \( \sigma^2 = \frac{\sum (x - \mu)^2}{N} \)
Formula … for a Sample: \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \)
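The ages of the Dunn family are: 2, 18, 34, 42. What is the population mean and variance?
\( \mu = \frac{2 + 18 + 34 + 42}{4} = \frac{96}{4} = 24 \)
\( \sigma^2 = \frac{(2 - 24)^2 + (18 - 24)^2 + (34 - 24)^2 + (42 - 24)^2}{4} = \frac{944}{4} = 236 \)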
Population Standard Deviation
…is the square root of the population variance.
From the previous example: \( \sigma^2 = 236 \), so \( \sigma = \sqrt{236} = 15.36 \)
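As a quick check of the Dunn family figures, here is a minimal sketch using Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N as the population formula does. The variable names are illustrative.

```python
from statistics import mean, pvariance, pstdev

ages = [2, 18, 34, 42]  # the Dunn family ages from the example

mu = mean(ages)             # population mean
sigma_sq = pvariance(ages)  # squared deviations divided by N
sigma = pstdev(ages)        # square root of the population variance

print(mu, sigma_sq, round(sigma, 2))
```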
EXAMPLE
The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6. Find the mean, variance, and standard deviation.
\( \bar{x} = \frac{37}{5} = 7.40 \)
\( s^2 = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{4} = 5.30 \)
\( s = \sqrt{5.30} = 2.30 \)
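The hourly-wage example can be reproduced step by step. The sketch below spells out the sample formula with its n - 1 divisor and then cross-checks the result against `statistics.variance`; the variable names are illustrative.

```python
from math import sqrt
from statistics import variance

wages = [7, 5, 11, 8, 6]  # hourly wages in dollars, from the example

n = len(wages)
x_bar = sum(wages) / n                                      # sample mean
squared_deviations = sum((x - x_bar) ** 2 for x in wages)   # numerator of s^2
s_squared = squared_deviations / (n - 1)                    # sample variance
s = sqrt(s_squared)                                         # sample standard deviation

# statistics.variance also uses the n - 1 divisor, so the two should agree.
assert abs(s_squared - variance(wages)) < 1e-9

print(round(x_bar, 2), round(s_squared, 2), round(s, 2))
```

Dividing by n - 1 instead of n is what separates this sample calculation from the population calculation used for the Dunn family ages.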
From Chapter 3… The Mean of Grouped Data
Example
A sample of ten movie theatres in a metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing per theatre.
The Mean of Grouped Data
Example Continued…
\( \bar{x} = \frac{\sum fx}{N} \)

Movies Showing    Frequency f    Class Midpoint x    (f)(x)
1 to under 3      1              2                   2
3 to under 5      2              4                   8
5 to under 7      3              6                   18
7 to under 9      1              8                   8
9 to under 11     3              10                  30
Total             10                                 66
The Mean of Grouped Data
Example Continued…
With the totals \( N = 10 \) and \( \sum fx = 66 \):
\( \bar{x} = \frac{\sum fx}{N} = \frac{66}{10} = 6.6 \)
Now: Compute the variance and standard deviation.
Sample Variance for Grouped Data
The formula for the sample variance for grouped data is:
\( s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} \)
where f is the class frequency and x is the class midpoint.
Sample Variance for Grouped Data

Movies Showing    Frequency f    Class Midpoint x    (f)(x)    (f)(x²)
1 to under 3      1              2                   2         4
3 to under 5      2              4                   8         32
5 to under 7      3              6                   18        108
7 to under 9      1              8                   8         64
9 to under 11     3              10                  30        300
Total             10                                 66        508
Sample Variance for Grouped Data
With the totals \( n = 10 \), \( \sum fx = 66 \), and \( \sum fx^2 = 508 \):
\( s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{508 - \frac{66^2}{10}}{9} = 8.04 \)
The variance is 8.04.
The standard deviation is \( \sqrt{8.04} = 2.8 \).
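The grouped-data calculation for the movie theatres can be sketched in Python as follows. The lists hold the class midpoints and frequencies from the table; the variable names are illustrative.

```python
from math import sqrt

midpoints = [2, 4, 6, 8, 10]    # class midpoints x
frequencies = [1, 2, 3, 1, 3]   # class frequencies f

n = sum(frequencies)                                               # number of theatres
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))        # sum of f*x
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints))  # sum of f*x^2

grouped_mean = sum_fx / n
grouped_variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)  # sample variance
grouped_sd = sqrt(grouped_variance)

print(grouped_mean, round(grouped_variance, 2), round(grouped_sd, 1))
```

Because each observation is represented by its class midpoint, these results approximate the values that the raw data would give.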
Interpretation and Uses of the Standard Deviation
Chebyshev’s Theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least
\( 1 - \frac{1}{k^2} \)
where k is any constant greater than 1.
Suppose that a wholesale plumbing supply company has a group of 50 sales vouchers from a particular day. The amounts of these vouchers are: (data not shown). How well does this data set fit Chebyshev’s Theorem?
Solution (continued)
Step 1: Determine the mean and standard deviation of the sample: Mean = $319, SD = $101.78
Step 2: Input k = 2 into Chebyshev’s theorem: \( 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} \)
i.e. At least 0.75 of the observations will fall within 2 SD of the mean.
Solution (continued)
Mean = $319, SD = $101.78
Step 3: Using the mean and SD, find the range of data values within 2 SD of the mean:
\( (\bar{x} - 2s, \bar{x} + 2s) = (319 - 2(101.78), 319 + 2(101.78)) = (115.44, 522.56) \)
Now, go back to the sample data, and see what proportion of the values fall between 115.44 and 522.56.
Solution (continued)
Proportion of the values that fall between 115.44 and 522.56: We find that 48 of the 50 data values, or 96%, are in this range, which is certainly at least 75%, as the theorem suggests!
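The three steps of the Chebyshev check can be sketched as below. The raw voucher amounts are not available here, so the sketch works only from the reported mean, standard deviation, and the count of 48 values inside the interval; the function names are illustrative.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def k_sd_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return (mean - k * sd, mean + k * sd)


bound = chebyshev_min_proportion(2)          # Step 2: guaranteed minimum for k = 2
low, high = k_sd_interval(319, 101.78, 2)    # Step 3: interval around the mean
observed = 48 / 50                           # proportion reported in the example

print(bound, round(low, 2), round(high, 2), observed >= bound)
```

The theorem gives only a lower bound, so an observed proportion well above 0.75 is consistent with it.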
Interpretation and Uses of the Standard Deviation
Empirical Rule: For any symmetrical, bell-shaped distribution:
…About 68% of the observations will lie within 1σ of the mean
…About 95% of the observations will lie within 2σ of the mean
…Virtually all the observations will be within 3σ of the mean
Bell-Shaped Curve
…showing the relationship between σ and μ
[Figure: bell-shaped curve with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ]
Suppose that a wholesale plumbing supply company has a group of 50 sales vouchers from a particular day. The amounts of these vouchers are: (data not shown). How well does this data set fit the Empirical Rule?
Solution
First, check whether the histogram has an approximate mound shape. Not bad…so we’ll proceed! We need to calculate the mean and standard deviation.
Mean: $319, Standard Deviation: $101.78
Calculate the intervals:
\( (\bar{x} - s, \bar{x} + s) = (319 - 101.78, 319 + 101.78) = (217.22, 420.78) \)
\( (\bar{x} - 2s, \bar{x} + 2s) = (319 - 2(101.78), 319 + 2(101.78)) = (115.44, 522.56) \)
\( (\bar{x} - 3s, \bar{x} + 3s) = (319 - 3(101.78), 319 + 3(101.78)) = (13.66, 624.34) \)

Interval            Empirical Rule    Actual # values    Actual percentage
(217.22, 420.78)    68%               31/50              62%
(115.44, 522.56)    95%               48/50              96%
(13.66, 624.34)     100%              49/50              98%
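The interval table above can be rebuilt with a few lines of Python. Since the voucher amounts themselves are not reproduced, the counts inside each interval (31, 48, and 49) are taken from the table rather than computed from data.

```python
mean, sd, n = 319, 101.78, 50

# Counts of vouchers inside each interval, as reported in the example table.
counts_within = {1: 31, 2: 48, 3: 49}

rows = []
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd      # mean plus or minus k SDs
    actual_pct = 100 * counts_within[k] / n       # observed percentage
    rows.append((k, round(low, 2), round(high, 2), actual_pct))
    print(f"within {k} SD: ({low:.2f}, {high:.2f})  actual {actual_pct:.0f}%")
```

Comparing the printed percentages with the rule's 68%, 95%, and "virtually all" shows how closely the sample follows a bell-shaped pattern.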
Skewness
…is the measurement of the lack of symmetry of the distribution
…The coefficient of skewness can range from -3.00 up to +3.00
…A value of 0 indicates a symmetric distribution.
It is computed as follows:
\( SK_1 = \frac{3(\text{Mean} - \text{Median})}{\sigma} \)
Skewness
\( SK_1 = \frac{3(\text{Mean} - \text{Median})}{\sigma} \)
Following are the earnings per share for a sample of 15 software companies for the year 2000. The earnings per share are arranged from smallest to largest.
$0.09 0.13 0.41 0.51 1.12 1.20 1.49 3.18 3.50 6.36 7.83 8.92 10.13 12.99 16.40
Find the coefficient of skewness.
Mean = 4.95, Median = 3.18, SD = 5.22
\( SK_1 = \frac{3(4.95 - 3.18)}{5.22} = 1.017 \)
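The earnings-per-share example can be verified with the sketch below. It uses the sample standard deviation (`statistics.stdev`, with the n - 1 divisor), which is the choice that reproduces the SD of 5.22 quoted in the example; the function name `pearson_skewness` is an illustrative label.

```python
from statistics import mean, median, stdev

# Earnings per share for the 15 software companies, from the example.
eps = [0.09, 0.13, 0.41, 0.51, 1.12, 1.20, 1.49, 3.18,
       3.50, 6.36, 7.83, 8.92, 10.13, 12.99, 16.40]


def pearson_skewness(values):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)


sk = pearson_skewness(eps)
print(round(mean(eps), 2), median(eps), round(stdev(eps), 2), round(sk, 3))
```

A positive result indicates that the mean lies above the median, so the distribution is skewed to the right.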
Skewed Right
Positively Skewed Distribution: Mean and Median are to the right of the Mode
Mode < Median < Mean
Skewed Left
Negatively Skewed Distribution: Mean and Median are to the left of the Mode
Mean < Median < Mode
Interquartile Range
…is the distance between the third quartile Q3 and the first quartile Q1. This distance will include the middle 50 percent of the observations.
Interquartile Range = Q3 - Q1
Example For a set of observations the third quartile is 24 and the first quartile is 10. What is the interquartile range? The interquartile range is 24 - 10 = 14. Fifty percent of the observations will occur between 10 and 24.
Box Plots
A box plot is a graphical display, based on quartiles, that helps to picture a set of data.
Five pieces of data are needed to construct a box plot:
… the Minimum Value,
… the First Quartile,
… the Median,
… the Third Quartile, and
… the Maximum Value
Example
Based on a sample of 20 deliveries, Buddy’s Pizza determined the following information:
…the minimum delivery time was 13 minutes
…the maximum 30 minutes
…the first quartile was 15 minutes
…the median 18 minutes, and
…the third quartile 22 minutes
Develop a box plot for the delivery times.
Solution
[Figure: box plot on an axis running from 12 to 32 minutes, with the Min., Q1, Median, Q3, and Max. marked]
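The five values behind the delivery-time box plot also give its key distances directly. The sketch below is a small illustration with hypothetical variable names; it computes the width of the box and the length of each whisker from the reported summary.

```python
# Five-number summary for the delivery times, in minutes, from the example.
summary = {"min": 13, "q1": 15, "median": 18, "q3": 22, "max": 30}

iqr = summary["q3"] - summary["q1"]             # width of the box
left_whisker = summary["q1"] - summary["min"]   # minimum to first quartile
right_whisker = summary["max"] - summary["q3"]  # third quartile to maximum
full_range = summary["max"] - summary["min"]

print(iqr, left_whisker, right_whisker, full_range)
```

A right whisker that is noticeably longer than the left one, as here, is usually read as a sign of positive skew in the delivery times.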
Investment Decision
The following are the average rates of return for Stocks A and B over a six-year period. In which of the following stocks would you prefer to invest? Why?
Stock A: 7 6 8 5 7 3
Stock B: 15 -10 18 10 -5 8
Investment Decision
First, find the mean rate of return for each of the two stocks:
Stock A: 7 6 8 5 7 3, Mean = 36/6 = 6
Stock B: 15 -10 18 10 -5 8, Mean = 36/6 = 6
Investment Decision
Next, find the range of values of each stock:
Stock A: 7 6 8 5 7 3, Range = 8 – 3 = 5
Stock B: 15 -10 18 10 -5 8, Range = 18 – (-10) = 28
Therefore, Stock B is riskier.
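The comparison of the two stocks can be sketched in a few lines. The dictionary layout and names are illustrative; the returns are those given in the example.

```python
from statistics import mean

returns = {
    "Stock A": [7, 6, 8, 5, 7, 3],
    "Stock B": [15, -10, 18, 10, -5, 8],
}

# For each stock: (mean rate of return, range of the returns).
summary = {name: (mean(values), max(values) - min(values))
           for name, values in returns.items()}

for name, (avg, spread) in summary.items():
    print(f"{name}: mean = {avg}, range = {spread}")
```

Both stocks have the same mean, so the difference in spread is what separates them in this example.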
Relative Dispersion
The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:
\( CV = \frac{s}{\bar{x}} (100\%) \)
A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500!
Co-efficient of Variation
• Which one has a higher level of risk?
• Example
• Rates of return over the past 6 years for two mutual funds are shown below.
• Fund A: 8.3, -6.0, 18.9, -5.7, 23.6, 20
• Fund B: 12, -4.8, 6.4, 10.2, 25.3, 1.4
Co-efficient of Variation
Solution
Let us use the Excel printout that is run from the “Descriptive Statistics” sub-menu:

Statistic             Fund A    Fund B
Mean                  9.85      8.42
Standard Error        5.38      4.20
Median                13.60     8.30
Mode                  #N/A      #N/A
Standard Deviation    13.19     10.29
Sample Variance       173.88    105.81
Kurtosis              -2.21     0.90
Skewness              -0.44     0.61
Range                 29.60     30.1
Minimum               -6        -4.8
Maximum               23.6      25.3
Sum                   59.1      50.5
Count                 6         6
Co-efficient of Variation
Solution
Is Fund A riskier because its standard deviation is larger (13.19 versus 10.29)?
But the means of the two funds are different. Fund A has a higher rate of return (9.85 versus 8.42), but it also has a larger standard deviation. Therefore we need to compare the relative variability using the coefficient of variation.
Co-efficient of Variation
\( CV = \frac{s}{\bar{x}} (100\%) \)
Solution
Fund A: CV = 13.19 / 9.85 = 1.34, or 134%
Fund B: CV = 10.29 / 8.42 = 1.22, or 122%
So now we say that there is more relative variability in Fund A than in Fund B. Therefore, Fund A is riskier.
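The fund comparison can be reproduced from the raw returns rather than from the Excel printout. This is a minimal sketch using the sample standard deviation, which matches the printout; the function name `coefficient_of_variation` is an illustrative choice.

```python
from statistics import mean, stdev

fund_a = [8.3, -6.0, 18.9, -5.7, 23.6, 20]
fund_b = [12, -4.8, 6.4, 10.2, 25.3, 1.4]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)

print(f"Fund A: CV = {cv_a:.0f}%")
print(f"Fund B: CV = {cv_b:.0f}%")
```

Because the coefficient of variation divides by the mean, it is only meaningful when the mean is positive and not close to zero, as is the case for both funds here.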
Test your learning…
Visit the Online Learning Centre at www.mcgrawhill.ca/college/lind for quizzes, extra content, data sets, a searchable glossary, access to Statistics Canada’s E-Stat data …and much more!