SESSION 19 & 20 Last Update 16th March 2011 Measures of Dispersion Measures of Variability - Grouped Data -
Learning Objectives All measures for grouped data: • Measures of relative standing: Median, Quartiles, Deciles and Percentiles • Measures of dispersion: Range • Measures of variability: Variance and Standard Deviation • Empirical Rule and Chebycheff’s Theorem • Coefficient of Variation
Percentiles We can determine any percentile for grouped data using the following formula: For quartiles, the formula ‘simplifies’ to: Where m = 1, 2 , 3 or 4 for the first, second, third and fourth quartile
Calculation of Percentile • Calculate the less-than cumulative frequencies f(<) from the observed frequencies f • Use the following formula to determine the location of the Pth percentile: Lp = (n + 1) * (P / 100) • Locate the interval that Lp falls into
Calculation of Percentile • Determine the following parameters • Apply formula for Pth Percentile
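The steps above can be sketched in Python. This is a minimal sketch, not the formula from the slides: it uses the location formula Lp = (n + 1) * (P / 100) as given, and then assumes the usual linear interpolation inside the located interval (lower limit plus the fraction of the interval's frequency needed to reach Lp, times the class width). The function name, parameter names and the sample frequencies are hypothetical.

```python
def grouped_percentile(lower_bounds, width, freqs, p):
    """Estimate the p-th percentile (0 < p < 100) from grouped data.

    lower_bounds: lower limit of each interval, in ascending order
    width:        common class width C
    freqs:        observed frequency f of each interval
    """
    n = sum(freqs)
    location = (n + 1) * p / 100  # Lp = (n + 1) * (P / 100)
    cumulative = 0  # less-than cumulative frequency f(<) before this interval
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and location <= cumulative + f:
            # Assumed interpolation: move proportionally into the interval
            return lower + (location - cumulative) / f * width
        cumulative += f
    # Lp lies beyond the last observation: return the end of the last interval
    return lower_bounds[-1] + width


# Hypothetical data: intervals 40 to < 50, 50 to < 60, 60 to < 70
lower_bounds = [40, 50, 60]
freqs = [4, 10, 5]
median = grouped_percentile(lower_bounds, 10, freqs, 50)
q1 = grouped_percentile(lower_bounds, 10, freqs, 25)
print(median, q1)
```

For these made-up frequencies n = 19, so the median location is 10, which falls into the second interval (4 < 10 <= 14), and the estimate works out to 56.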
Percentile: An example Let us assume the following grouped data is to be assessed: Class width: C = Upper + 1 – Lower = 49 + 1 – 40 = 10
Percentile: An example If the data are interval-scaled (student marks approximately are), intervals defined with inequalities may be more appropriate. This example comes from your student manual. The intervals that use inequalities may be somewhat more intuitive. With inclusive limits (40 to 49): C = Upper + 1 – Lower = 49 + 1 – 40 = 10. With inequality limits (40 to < 50): C = Upper – Lower = 50 – 40 = 10.
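The two ways of computing the class width can be compared in a short Python sketch. The function name and the `inclusive` flag are hypothetical, and the inclusive variant assumes whole-number data such as marks.

```python
def class_width(lower, upper, inclusive=True):
    """Class width C of a single interval.

    inclusive=True  -> limits such as 40 to 49    (C = Upper + 1 - Lower)
    inclusive=False -> limits such as 40 to < 50  (C = Upper - Lower)
    """
    if inclusive:
        return upper + 1 - lower
    return upper - lower


print(class_width(40, 49))                   # 10
print(class_width(40, 50, inclusive=False))  # 10
```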
Solution – Step 1 Use the location formula to determine which interval the median falls into. Since 6 < 9.75 < 20, the median interval is 50 to < 60. Note that the median interval must be looked up in the cumulative frequency column, not the interval column!
Solution – Step 2 Read off the parameters required for the median formula for grouped data. The formula: Now yields: It is left as an exercise to confirm that the quartile formula (with m = 2) yields the same result.
Variance Using the midpoints allows us to calculate the variance of grouped data as well. In the case of interval data, as with the mean, the original data is to be preferred to the grouped data. For ordinal or nominal data the variance has no probabilistic meaning! Measures of relative standing (i.e. percentiles) may be used for ordinal data. There are no measures of variability for nominal data (Example: 1 = married, 2 = single, 3 = divorced, 4 = widowed).
Calculation of Variance 1. Determine the interval midpoints x 2. Multiply the observed frequencies f by the interval midpoints (fx) 3. Sum the results from step 2 and divide by n (steps 1 to 3 are identical to calculating the mean for grouped data) 4. Square x and multiply by f, yielding fx²
Calculation of Variance • Use the following formula to determine the variance for grouped data (sample): And for the population: Note that x denotes the midpoints here and not the actual observations.
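The calculation can be sketched in Python as follows. The sketch assumes the common shortcut form, variance = (Σfx² – (Σfx)² / n) / (n – 1) for a sample and the same numerator divided by n for a population, which matches the fx and fx² columns built in the steps above but may differ in notation from the formula on the slide. All names and the sample data are hypothetical.

```python
import math


def grouped_mean_and_variance(midpoints, freqs, sample=True):
    """Mean and variance of grouped data, using interval midpoints x.

    midpoints: midpoint x of each interval
    freqs:     observed frequency f of each interval
    sample:    divide by n - 1 (sample) or by n (population)
    """
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))       # sum of fx
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))  # sum of fx^2
    mean = sum_fx / n
    # Assumed shortcut form of the sum of squared deviations
    sum_sq_dev = sum_fx2 - sum_fx ** 2 / n
    divisor = n - 1 if sample else n
    return mean, sum_sq_dev / divisor


# Hypothetical data: midpoints of 40 to < 50, 50 to < 60, 60 to < 70
mean, variance = grouped_mean_and_variance([45, 55, 65], [4, 10, 5])
std_dev = math.sqrt(variance)  # the square root yields the standard deviation
print(mean, variance, std_dev)
```

Because every observation is replaced by its interval midpoint, the result is an approximation of the variance of the original data.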
Variance: An example Let us assume the following grouped data is to be assessed:
Solution – Step 2 Using the formula yields: As before, the square root of the variance yields the standard deviation.
Empirical Rule For bell-shaped (approximately normal) frequency distributions, the following holds: • Approx. 68.3% of all observations fall within one standard deviation of the mean • Approx. 95.4% of all observations fall within two standard deviations of the mean • Approx. 99.7% of all observations fall within three standard deviations of the mean [Figure: bell curve centred on the mean x̄, marking 68.26% of observations between –1s and +1s and 95.44% between –2s and +2s]
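The three percentages can be checked numerically. The sketch below assumes an exactly normal distribution, for which the share of observations within k standard deviations of the mean equals erf(k / √2); the function name is hypothetical.

```python
import math


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_share_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

Real data that are only roughly bell-shaped will match these values only approximately.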
Chebycheff’s Theorem Chebycheff’s Theorem is a more general alternative to the Empirical Rule and applies to histograms of any shape. The proportion of observations that lie within k standard deviations of the mean is at least: 1 – 1/k² for k > 1, where k denotes the number of standard deviations from the mean.
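As a small illustration, the lower bound 1 – 1/k² can be evaluated for a few values of k. The function name is hypothetical, and the check on k simply mirrors the condition k > 1 stated above.

```python
def chebycheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound only applies for k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebycheff_lower_bound(k):.1%}")
# k = 1.5: at least 55.6%
# k = 2: at least 75.0%
# k = 3: at least 88.9%
```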
Chebycheff’s Theorem - Example The Empirical Rule provides approximate proportions under the assumption of a bell-shaped normal distribution, whereas Chebycheff’s Theorem provides lower bounds on those proportions for any type of distribution. Consequently, its bounds are more conservative: for k = 2 it guarantees at least 75% of the observations, compared with approx. 95.4% under the Empirical Rule. Chebycheff is not relevant to your examination!
Coefficient of Variation The coefficient of variation of a set of observations is their standard deviation divided by their mean: CV = s / x̄. Relating the standard deviation to the mean allows a statement about the relative variability of the data. Compare a standard deviation of 10 with a mean of 100 (CV = 0.1) to the same standard deviation with a mean of 1,000,000 (CV = 0.00001)!
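The comparison can be reproduced with a few lines of Python. The function name is hypothetical, and the guard against a zero mean is an added assumption, since the ratio is undefined in that case.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed relative to the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a mean of 0")
    return std_dev / mean


print(coefficient_of_variation(10, 100))        # 0.1
print(coefficient_of_variation(10, 1_000_000))  # 1e-05
```

The same standard deviation of 10 is sizeable next to a mean of 100 but negligible next to a mean of 1,000,000.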