Cambodian Mekong University. MB102. Introduction to Statistics. Chapter 4 Measures of Dispersion. Learning Objectives. Calculate common measures of variation (including the range, interquartile range, mean deviation and standard deviation) from grouped and ungrouped data
Cambodian Mekong University MB102 Introduction to Statistics Chapter 4 Measures of Dispersion
Learning Objectives • Calculate common measures of variation (including the range, interquartile range, mean deviation and standard deviation) from grouped and ungrouped data • Calculate and interpret the coefficient of variation
1. Introduction • A measure of central tendency in itself is not sufficient to describe a set of data adequately • A measure of dispersion (or spread) of the data is usually required • This measure gives an indication of the internal variation of the data—that is, the extent to which data items vary from one another or from a central point • Some reasons for requiring a measure of dispersion of a set of data: • As an indication of the reliability of the average value • To assist in controlling unwanted variation
2. The range • The simplest measure of dispersion is the range • It is the difference between the largest and smallest values in a set of data Range = largest observation – smallest observation • Examples of uses of range include • Temperature fluctuations on a given day • Movement of share prices…
2. The range • The range is considered primitive because it considers only the extreme values, which may not be useful indicators of the bulk of the population • Extreme values, called outliers, may often result from errors of measurement • Outliers are defined as values that are inconsistent with the rest of the data • Although the range is the quickest and easiest measure of dispersion to calculate, it should be interpreted with some caution
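The effect of a single outlier on the range can be illustrated with a short sketch. The temperature readings below are hypothetical values chosen for illustration, and `data_range` is a helper name introduced here, not part of the course material.

```python
def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)


# Hypothetical temperature readings in degrees Celsius (illustrative only).
temperatures = [24.5, 27.0, 31.2, 29.8, 26.1]
print("Range without outlier:", data_range(temperatures))

# One extreme reading (for example, a measurement error) changes the range
# sharply even though the bulk of the data is unchanged.
with_outlier = temperatures + [52.0]
print("Range with outlier:", data_range(with_outlier))
```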
3. The interquartile range (midspread) • Measures the range of the middle 50% of the values only • Is defined as the difference between the upper and lower quartiles Interquartile range = upper quartile – lower quartile = Q3 – Q1 • May be calculated from grouped frequency distributions that contain open-ended class intervals • It is usually only used with a large number of observations
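A minimal sketch of the calculation for ungrouped data follows. Several conventions exist for locating quartiles, so the values may differ slightly from those obtained by another textbook method; this sketch assumes the default ("exclusive") method of Python's `statistics.quantiles`. The data values and the helper name `interquartile_range` are illustrative assumptions.

```python
import statistics


def interquartile_range(values):
    """Return Q3 - Q1 using the default method of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical observations (illustrative only).
observations = [20, 25, 27, 28, 29, 32, 32, 34, 38, 40]
print("Interquartile range:", interquartile_range(observations))
```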
4. The mean deviation • The mean deviation takes into account the actual value of each observation • It measures the ‘average’ distance of each observation away from the mean of the data • It gives an equal weight to each observation • It is generally more sensitive than the range or interquartile range, since a change in any value will affect it
4. The mean deviation • The residual measures the actual deviation (or distance) of each observation from the mean • A set of x values has a mean of $\bar{x}$ • The residual of a particular x-value is: residual $= x - \bar{x}$ Example If the mean for a set of data is 3.22, find the residual for an observation of 4.38 Solution The residual of 4.38 is 4.38 – 3.22 = 1.16 Note: A residual can be negative, which shows that the observation is below the mean
4. The mean deviation • The mean deviation is defined as the mean of the absolute values of the residuals (the absolute deviations): $\text{mean deviation} = \dfrac{\sum |x - \bar{x}|}{n}$ • To calculate the mean deviation Step 1: Calculate the mean of the data Step 2: Subtract the mean from each observation and record the resulting differences Step 3: Write down the absolute value of each of the differences found in Step 2 (ignore their signs) Step 4: Calculate the mean of the absolute values of the differences found in Step 3
4. The mean deviation Example The batting scores of a cricketer were recorded over his 10 completed innings to date. His scores were: 32, 27, 38, 25, 20, 32, 34, 28, 40, 29 Calculate the mean deviation of the cricketer's scores Solution Step 1 The cricketer's average number of runs is 305 / 10 = 30.5
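4. The mean deviation • Steps 2 and 3 (mean = 30.5): • Residuals $x - \bar{x}$: 1.5, –3.5, 7.5, –5.5, –10.5, 1.5, 3.5, –2.5, 9.5, –1.5 • Absolute values $|x - \bar{x}|$: 1.5, 3.5, 7.5, 5.5, 10.5, 1.5, 3.5, 2.5, 9.5, 1.5 (sum = 47) • Step 4 Mean deviation = 47 / 10 = 4.7 runs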
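The four steps can be checked with a short sketch applied to the cricketer's scores from the example. The function name `mean_deviation` is introduced here for illustration.

```python
def mean_deviation(values):
    """Mean of the absolute deviations of the observations from their mean."""
    n = len(values)
    mean = sum(values) / n                       # Step 1: the mean
    residuals = [x - mean for x in values]       # Step 2: subtract the mean
    absolute = [abs(r) for r in residuals]       # Step 3: ignore the signs
    return sum(absolute) / n                     # Step 4: mean of absolute values


scores = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]
print("Mean:", sum(scores) / len(scores))        # 30.5
print("Mean deviation:", mean_deviation(scores)) # 4.7
```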
4. The mean deviation • Calculation of the mean deviation from a frequency distribution • If the data are in the form of a frequency distribution, the mean deviation can be calculated using $\text{mean deviation} = \dfrac{\sum f\,|x - \bar{x}|}{\sum f}$ where f = the frequency of an observation x, and $\sum f$ = the sum of the frequencies = n
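A sketch of the frequency-distribution version follows, assuming the usual weighted form in which each absolute deviation is multiplied by its frequency f and the total is divided by the sum of the frequencies. The frequency table is hypothetical, and `mean_deviation_freq` is a name introduced here.

```python
def mean_deviation_freq(freq):
    """Mean deviation from a mapping {observation x: frequency f}."""
    n = sum(freq.values())                               # sum of frequencies
    mean = sum(f * x for x, f in freq.items()) / n       # weighted mean
    return sum(f * abs(x - mean) for x, f in freq.items()) / n


# Hypothetical frequency distribution (illustrative only):
# observation value -> number of times it occurred.
freq_table = {0: 3, 1: 5, 2: 8, 3: 4}
print("Mean deviation:", round(mean_deviation_freq(freq_table), 4))
```

Expanding the table into its 20 individual observations and averaging the absolute deviations directly gives the same value.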
5. The standard deviation • The most commonly used measure of dispersion is the standard deviation • It takes into account every observation and measures the ‘average deviation’ of observations from the mean • It works with squares of residuals, not absolute values, and is therefore easier to use in further calculations • The values of the mean deviation and standard deviation should be reasonably close, since they are both measuring the variation of the observations from their mean
5. The standard deviation • Population standard deviation • Uses squares of the residuals, which will eliminate the effect of the signs, since squares of numbers cannot be negative Step 1: find the sum of the squares of the residuals Step 2: find their mean. Step 3: take the square root of this mean. $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$ where $\mu$ = the mean of the population and N = the size of the population. The square of the population standard deviation is called the variance.
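The three steps can be sketched as follows and compared with `statistics.pstdev` from the Python standard library. The population values are hypothetical, and `population_std` is a name introduced here.

```python
import math
import statistics


def population_std(values):
    """Population standard deviation: divide the sum of squares by N."""
    n = len(values)
    mu = sum(values) / n
    sum_of_squares = sum((x - mu) ** 2 for x in values)  # Step 1
    variance = sum_of_squares / n                        # Step 2 (the variance)
    return math.sqrt(variance)                           # Step 3


# Hypothetical population (illustrative only).
population = [2, 4, 4, 4, 5, 5, 7, 9]
print("By the three steps:", population_std(population))
print("statistics.pstdev: ", statistics.pstdev(population))
```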
5. The standard deviation • Sample standard deviation • It is rare to calculate the value of $\sigma$, since populations are usually very large • It is far more likely that the sample standard deviation (denoted by s) will be needed: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ • where n is the number of observations in the sample
5. The standard deviation • A note on the use of (n − 1) in formulae • If the value of n is large, it will make only a slight difference to the answer whether you divide by n or (n − 1) • To calculate the value of s from a sample, the calculator button will usually be indicated by one of σn−1, xσn−1, sx or s, written either on it or near it • To calculate the value of σ from a population, the calculator key will usually be indicated by one of σn, xσn, σx or σ, written either on it or near it
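How slight the difference is can be seen from the ratio of the two results. For the same sum of squared residuals, dividing by n instead of (n − 1) scales the answer by $\sqrt{(n-1)/n}$. The sketch below prints this ratio for a few sample sizes; the function name is introduced here for illustration.

```python
import math


def n_to_n_minus_1_ratio(n):
    """Ratio of the divide-by-n result to the divide-by-(n - 1) result."""
    return math.sqrt((n - 1) / n)


# The ratio approaches 1 as the number of observations grows.
for n in (5, 10, 100, 1000):
    print(n, round(n_to_n_minus_1_ratio(n), 4))
```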
5. The standard deviation • Important points about the standard deviation • The standard deviation cannot be negative • The standard deviation of a set of data is zero if, and only if, the observations are of equal value • The standard deviation can never exceed the range of the data • The more scattered the data, the greater the standard deviation • The square of the standard deviation is called the variance
5. The standard deviation • Calculation of the sample standard deviation Step 1: Calculate the mean Step 2: For each x-value, find the value of the residual Step 3: Square the residuals Step 4: Calculate the sum of the squares of the residuals Step 5: Divide the sum found in step 4 by (n – 1) Step 6: Take the square root of the quantity found in step 5: this is the sample standard deviation
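The six steps can be sketched as below, applied to the cricketer's ten batting scores from the mean deviation example and checked against `statistics.stdev`, which also divides by (n − 1). The function name `sample_std` is introduced here.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation following the six steps."""
    n = len(values)
    mean = sum(values) / n                      # Step 1: the mean
    residuals = [x - mean for x in values]      # Step 2: the residuals
    squares = [r ** 2 for r in residuals]       # Step 3: square them
    total = sum(squares)                        # Step 4: sum of squares
    quotient = total / (n - 1)                  # Step 5: divide by (n - 1)
    return math.sqrt(quotient)                  # Step 6: square root


scores = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]
print("By the six steps: ", round(sample_std(scores), 3))
print("statistics.stdev: ", round(statistics.stdev(scores), 3))
```

For these scores the sum of squared residuals is 324.5, so s is the square root of 324.5 / 9, roughly 6.0 runs, compared with a mean deviation of 4.7 runs.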
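5. The standard deviation • Calculation of the standard deviation from a frequency distribution • If the data are in the form of a frequency distribution, calculate the standard deviation using: $s = \sqrt{\dfrac{\sum f\,(x - \bar{x})^2}{n - 1}}$ where f = the frequency of an observation x and $n = \sum f$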
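A sketch of the frequency-distribution calculation follows, assuming the sample form in which each squared residual is weighted by its frequency f and the total is divided by (n − 1), with n the sum of the frequencies. The frequency table is hypothetical, and `sample_std_freq` is a name introduced here.

```python
import math
import statistics


def sample_std_freq(freq):
    """Sample standard deviation from a mapping {observation x: frequency f}."""
    n = sum(freq.values())
    mean = sum(f * x for x, f in freq.items()) / n
    sum_of_squares = sum(f * (x - mean) ** 2 for x, f in freq.items())
    return math.sqrt(sum_of_squares / (n - 1))


# Hypothetical frequency distribution (illustrative only).
freq_table = {0: 3, 1: 5, 2: 8, 3: 4}

# Cross-check: expand the table into individual observations.
expanded = [x for x, f in freq_table.items() for _ in range(f)]
print("From frequencies:  ", round(sample_std_freq(freq_table), 4))
print("From expanded data:", round(statistics.stdev(expanded), 4))
```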
5. The standard deviation • Calculation of the standard deviation from a grouped frequency distribution • When calculating s from a grouped frequency distribution, we assume that the observations in each class interval are concentrated at the midpoint of the interval: $s = \sqrt{\dfrac{\sum f\,(m - \bar{x})^2}{n - 1}}$ • where $\bar{x}$ = the estimated mean of the sample, m = the midpoint of the class interval, f = the frequency of the class interval, and $n = \sum f$
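The midpoint assumption can be sketched as follows. The class intervals and frequencies are hypothetical, the divisor (n − 1) is assumed as for any sample standard deviation, and `grouped_mean_and_std` is a name introduced here. Because every observation is treated as if it sat at its class midpoint, both results are estimates.

```python
import math


def grouped_mean_and_std(classes):
    """Estimate the mean and sample standard deviation from grouped data.

    classes: list of ((lower, upper), frequency) pairs.
    """
    midpoints = [((lo + hi) / 2, f) for (lo, hi), f in classes]
    n = sum(f for _, f in midpoints)
    mean = sum(f * m for m, f in midpoints) / n          # estimated mean
    sum_of_squares = sum(f * (m - mean) ** 2 for m, f in midpoints)
    return mean, math.sqrt(sum_of_squares / (n - 1))


# Hypothetical grouped frequency distribution (illustrative only).
grouped = [((0, 10), 4), ((10, 20), 7), ((20, 30), 6), ((30, 40), 3)]
est_mean, est_s = grouped_mean_and_std(grouped)
print("Estimated mean:", est_mean)
print("Estimated s:   ", round(est_s, 3))
```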
6. The coefficient of variation • This is a measure of relative variability used to: • measure changes that have occurred in a population over time • compare variability of two populations that are expressed in different units of measurement • It is expressed as a percentage rather than in terms of the units of the particular data
6. The coefficient of variation • The formula for the coefficient of variation (V) is: $V = \dfrac{s}{\bar{x}} \times 100\%$ • where $\bar{x}$ = the mean of the sample • s = the standard deviation of the sample
6. The coefficient of variation Example Calculate the coefficient of variation for the price of 400 g cans of pet food, given that the mean is 81 cents and s = 6.77 cents. Interpret the result. Solution $V = \dfrac{6.77}{81} \times 100\% \approx 8.36\%$ This means that the standard deviation of the price of a 400 g can of pet food is 8.36% of the mean price.
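The pet food example can be reproduced with a short sketch, assuming the standard definition of the coefficient of variation as the standard deviation expressed as a percentage of the mean. The function name is introduced here for illustration.

```python
def coefficient_of_variation(mean, std):
    """Standard deviation expressed as a percentage of the mean."""
    return std / mean * 100


# Values from the example: mean price 81 cents, s = 6.77 cents.
v = coefficient_of_variation(81, 6.77)
print(f"V = {v:.2f}%")  # V = 8.36%
```

Because V is a percentage, it can be compared across data sets measured in different units, which the standard deviation alone cannot.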
7. Remarks • Among the more important characteristics of the standard deviation are: • It is the most frequently used measure of dispersion, and because of its mathematical properties it has widespread use in problems involving statistical inference • If the mean cannot be calculated, neither can the standard deviation • Its value is affected by the value of every observation in the data • If the data have a number of extreme values, the value of the standard deviation may be distorted so as not to be a good ‘representative’ measure of dispersion