Why statisticians were created . Measure of dispersion FETP India. Competency to be gained from this lecture. Calculate a measure of variation that is adapted to the sample studied. Key issues. Range Inter-quartile variation Standard deviation.
Why statisticians were created: Measure of dispersion (FETP India)
Competency to be gained from this lecture Calculate a measure of variation that is adapted to the sample studied
Key issues • Range • Inter-quartile range • Standard deviation
Measures of spread, dispersion or variability • The measure of central tendency provides important information about the distribution • However, it does not provide information about the position of the other data points relative to the centre • Measures of spread, dispersion or variability are therefore needed
Every concept comes from a failure of the previous concept • The mean is distorted by outliers • The median is resistant to outliers
The range: A simple measure of dispersion • Take the difference between the highest value and the lowest value • Limitations: • The range says nothing about the values between the two extremes • The range is not stable: as the sample size increases, the range can change dramatically • The range does not lend itself to statistical analysis
Example of a range • Take a sample of 10 heights: 70, 95, 100, 103, 105, 107, 110, 112, 115 and 140 cm • Lowest (minimum) value: 70 cm • Highest (maximum) value: 140 cm • Range: 140 – 70 = 70 cm
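The calculation above can be sketched in a few lines of Python. The variable and function names (`heights_cm`, `value_range`) are illustrative choices, not part of the lecture.

```python
# Heights (cm) from the example above
heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]


def value_range(values):
    """Return the difference between the highest and the lowest value."""
    return max(values) - min(values)


height_range = value_range(heights_cm)
print(height_range)  # 70
```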
Three different distributions with the same range (35 kg) • Even • Uneven • Clumped [Figure: three dot plots drawn on the same 30 to 70 kg axis]
The range tends to increase with the sample size • Two ranges based on different sample sizes are not comparable
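A small simulation can illustrate why ranges from samples of different sizes are not comparable. This is only a sketch: the simulated "heights" (normal, mean 105, SD 15), the seed and the sample sizes are all assumptions made for illustration. Each larger sample contains the smaller ones, so its range can only stay the same or grow.

```python
import random


def value_range(values):
    """Return the difference between the highest and the lowest value."""
    return max(values) - min(values)


rng = random.Random(0)  # fixed seed so the sketch is repeatable (arbitrary choice)
# Hypothetical heights in cm; the distribution is an assumption, not lecture data
simulated_heights = [rng.gauss(105, 15) for _ in range(1000)]

sample_sizes = [10, 50, 200, 1000]
# Nested samples: the first n simulated values for each n
ranges = [value_range(simulated_heights[:n]) for n in sample_sizes]

for n, r in zip(sample_sizes, ranges):
    print(f"n = {n:4d}  range = {r:.1f} cm")
```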
Percentiles and quartiles • Percentiles • Those values in a series of observations, arranged in ascending order of magnitude, which divide the distribution into 100 equal parts • The median is the 50th percentile • Quartiles • The values which divide a series of observations, arranged in ascending order, into 4 equal parts • The median is the 2nd quartile
Sorting the data in increasing order • Median: the middle value (if n is odd) or the average of the two middle values (if n is even); a measure of the “centre” of the data • Quartiles divide the set of ordered values into 4 equal parts: first 25% | Q1 | 2nd 25% | Q2 (median) | 3rd 25% | Q3 | 4th 25%
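The median rule for odd and even n can be sketched as follows. The two samples are borrowed from other examples in this lecture; the function name `median` is an illustrative choice, and the result is cross-checked against Python's standard `statistics.median`.

```python
import statistics


def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [10, 20, 30, 40, 50, 60, 70]   # n = 7
even_sample = [29, 31, 24, 29, 30, 25]      # n = 6

print(median(odd_sample))   # 40
print(median(even_sample))  # 29.0
```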
The inter-quartile range • The central portion of the distribution • Calculated as the difference between the third quartile and the first quartile • Includes about one-half of the observations • Leaves out one quarter of the observations at each end • Limitations: • Only takes into account two values • Not a mathematical concept upon which theories can be developed
The inter-quartile range: Example • Values: 29, 31, 24, 29, 30, 25 • Arranged in ascending order: 24, 25, 29, 29, 30, 31 • Q1: position (n+1)/4 = 7/4 = 1.75, so Q1 = 24 + 0.75 × (25 – 24) = 24.75 • Q3: position 3(n+1)/4 = 21/4 = 5.25, so Q3 = 30 + 0.25 × (31 – 30) = 30.25 • Inter-quartile range = Q3 – Q1 = 30.25 – 24.75 = 5.5
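The position rule used in this example, k(n+1)/4 with linear interpolation between neighbouring ordered values, can be sketched as below. The function name `quartile_by_position` is hypothetical. Other quartile conventions exist and give slightly different values, so this is one method, not the only one. With n = 6 the Q3 position is 3 × 7/4 = 5.25, which gives Q3 = 30.25.

```python
import math


def quartile_by_position(values, k):
    """k-th quartile (k = 1, 2, 3) at position k*(n+1)/4, linearly interpolated."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4          # 1-based position, may be fractional
    lower = math.floor(position)        # rank of the value just below the position
    fraction = position - lower         # how far to move towards the next value
    if lower < 1:                       # position falls before the first value
        return ordered[0]
    if lower >= n:                      # position falls at or after the last value
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


values = [29, 31, 24, 29, 30, 25]
q1 = quartile_by_position(values, 1)
q3 = quartile_by_position(values, 3)
iqr = q3 - q1
print(q1, q3, iqr)  # 24.75 30.25 5.5
```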
Graphic representation of the inter-quartile range (figure not reproduced)
The mean deviation from the mean • Calculate the mean of all values • Calculate the difference between each value and the mean • Calculate the average difference between each value and the mean • Limitations: • Negative and positive deviations cancel each other out, so the average is 0 even when there is substantial variation
The mean deviation from the mean: Example • Data: 10, 20, 30, 40, 50, 60, 70 • Mean = 280/7 = 40 • Deviations from the mean (10 – 40, 20 – 40, …): -30, -20, -10, 0, 10, 20, 30 • Sum of the deviations = 0
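A short Python check of this example shows the cancellation directly. Variable names are illustrative.

```python
data = [10, 20, 30, 40, 50, 60, 70]

mean = sum(data) / len(data)                      # 280 / 7 = 40.0
deviations = [x - mean for x in data]             # signed distance of each value from the mean
mean_deviation = sum(deviations) / len(deviations)

print(deviations)      # [-30.0, -20.0, -10.0, 0.0, 10.0, 20.0, 30.0]
print(mean_deviation)  # 0.0: negative and positive deviations cancel out
```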
Absolute mean deviation from the mean • Calculate the mean of all values • Calculate the difference between each value and the mean and take the absolute value • Calculate the average of these absolute differences • Limitations: • The absolute value is awkward to handle mathematically
Absolute mean deviation from the mean: Example • Data: 10, 20, 30, 40, 50, 60, 70 • Mean = 280/7 = 40 • Deviations from the mean (10 – 40, 20 – 40, …): -30, -20, -10, 0, 10, 20, 30 • Absolute values: 30, 20, 10, 0, 10, 20, 30 • Absolute mean deviation from the mean = 120/7 = 17.1
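The same data can be run through a minimal sketch of the absolute mean deviation. Names such as `absolute_mean_deviation` are illustrative.

```python
data = [10, 20, 30, 40, 50, 60, 70]

mean = sum(data) / len(data)                           # 40.0
absolute_deviations = [abs(x - mean) for x in data]    # the sign is dropped
absolute_mean_deviation = sum(absolute_deviations) / len(data)

print(sum(absolute_deviations))            # 120.0
print(round(absolute_mean_deviation, 1))   # 17.1
```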
Calculating the variance (1/2) • Calculate the mean as a measure of central location (MEAN) • Calculate the difference between each observation and the mean (DEVIATION) • Square the differences (SQUARED DEVIATION) • Negative and positive deviations will not cancel each other out • Values further from the mean have a bigger impact
Calculating the variance (2/2) • Sum up these squared deviations (SUM OF THE SQUARED DEVIATIONS) • Divide this SUM OF THE SQUARED DEVIATIONS by the total number of observations minus 1 (n-1) to give the VARIANCE • Why divide by n-1? • Adjustment for the fact that the sample mean is just an estimate of the true population mean • Makes the variance slightly larger than dividing by n would
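The steps for the variance can be sketched as follows, reusing the seven values from the mean deviation example (the choice of data is ours, for illustration only). The result is compared with Python's standard `statistics.variance`, which also divides by n-1.

```python
import statistics

data = [10, 20, 30, 40, 50, 60, 70]
n = len(data)

mean = sum(data) / n                                   # MEAN
squared_deviations = [(x - mean) ** 2 for x in data]   # SQUARED DEVIATIONS
sum_of_squares = sum(squared_deviations)               # SUM OF THE SQUARED DEVIATIONS
variance = sum_of_squares / (n - 1)                    # VARIANCE (divide by n - 1)
variance_divided_by_n = sum_of_squares / n             # smaller, shown for comparison

print(sum_of_squares)                 # 2800.0
print(round(variance, 1))             # 466.7
print(variance_divided_by_n)          # 400.0
```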
The standard deviation • Take the square root of the variance • Limitations: • Sensitive to outliers
Example • Mean = 45/5 = 9 x-rays • Mean deviation = 8/5 = 1.6 x-rays • Variance = 20/(5-1) = 20/4 = 5 x-rays² • Standard deviation = √5 = 2.2 x-rays
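The individual observations behind this example are not shown here. As a sketch only, the five values below are a hypothetical data set chosen because it reproduces the totals on the slide (sum 45, sum of absolute deviations 8, sum of squared deviations 20); they are not the lecture's actual data.

```python
import math

# HYPOTHETICAL data: picked only to match the totals quoted in the example
xrays = [6, 8, 9, 10, 12]
n = len(xrays)

mean = sum(xrays) / n                                        # 45 / 5 = 9.0
mean_abs_deviation = sum(abs(x - mean) for x in xrays) / n   # 8 / 5 = 1.6
variance = sum((x - mean) ** 2 for x in xrays) / (n - 1)     # 20 / 4 = 5.0
standard_deviation = math.sqrt(variance)                     # square root of 5

print(mean, mean_abs_deviation, variance, round(standard_deviation, 1))
# 9.0 1.6 5.0 2.2
```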
Properties of the standard deviation • Unaffected if the same constant is added to (or subtracted from) every observation • If each value is multiplied (or divided) by a constant, the standard deviation is also multiplied (or divided) by the same constant
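Both properties can be checked numerically with a quick sketch. The data, the shift of 5 and the scale factor of 3 are arbitrary illustrative choices; a positive factor is assumed.

```python
import math
import statistics

data = [10, 20, 30, 40, 50, 60, 70]
sd = statistics.stdev(data)          # sample standard deviation (n - 1 denominator)

shift = 5                            # arbitrary constant added to every value
scale = 3                            # arbitrary positive constant multiplying every value

sd_shifted = statistics.stdev([x + shift for x in data])
sd_scaled = statistics.stdev([x * scale for x in data])

print(math.isclose(sd_shifted, sd))          # True: adding a constant changes nothing
print(math.isclose(sd_scaled, scale * sd))   # True: the SD is multiplied by the same constant
```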
Need for a measure of variation that is independent of the measurement unit • The standard deviation is expressed in the same unit as the mean: • e.g., 3 cm for height, 1.4 kg for weight • Sometimes, it is useful to express variability as a percentage of the mean • e.g., in the case of laboratory tests, the experimental variation is ± 5% of the mean
The coefficient of variation • Calculate the standard deviation • Divide by the mean • The standard deviation becomes “unit free” • Coefficient of variation (%) = [S.D. / Mean] × 100 (a pure number)
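A minimal sketch of the coefficient of variation, using the ten heights from the range example. The function name `coefficient_of_variation` is illustrative. Expressing the same heights in metres instead of centimetres leaves the result unchanged, which is what "unit free" means in practice.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]
heights_m = [h / 100 for h in heights_cm]    # same heights, different unit

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(f"CV in cm: {cv_cm:.1f}%  CV in m: {cv_m:.1f}%")  # the two values agree
```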
Uses of the coefficient of variation • Compare the variability of two variables measured in different units • Height (cm) and weight (kg) • Compare the variability in two groups with widely different mean values • Incomes of persons in different socio-economic groups
Choosing a measure of central tendency and a measure of dispersion
Key messages • Report the range but be aware of its limitations • Report the inter-quartile range when you use the median • Report the standard deviation when you use the mean