
Skewness & Kurtosis: Reference


jesimae

Presentation Transcript


  1. Skewness & Kurtosis: Reference Source: http://mathworld.wolfram.com/NormalDistribution.html

  2. Further Moments – Skewness • Skewness measures the degree of asymmetry exhibited by the data • If skewness equals zero, the histogram is symmetric about the mean • Positive skewness vs negative skewness • Skewness measured in this way is sometimes referred to as “Fisher’s skewness”

  3. Further Moments – Skewness Source: http://library.thinkquest.org/10030/3smodsas.htm

  4. [Figure: curves A and B with mode, median, and mean marked]

  5. [Figure: distribution with median and mean marked] • n = 26 • mean = 4.23 • median = 3.5 • mode = 3 (8 occurrences)

  6. Value   Occurrences   Deviation                Cubed deviation         Occur*Cubed
       1         1         (1 – 4.23) = -3.23       (-3.23)^3 = -33.70        -33.70
       2         4         (2 – 4.23) = -2.23       (-2.23)^3 = -11.09        -44.36
       3         8         (3 – 4.23) = -1.23       (-1.23)^3 = -1.86         -14.89
       4         4         (4 – 4.23) = -0.23       (-0.23)^3 = -0.01          -0.05
       5         3         (5 – 4.23) = 0.77        (+0.77)^3 = 0.46            1.37
       6         2         (6 – 4.23) = 1.77        (+1.77)^3 = 5.54           11.09
       7         1         (7 – 4.23) = 2.77        (+2.77)^3 = 21.25          21.25
       8         1         (8 – 4.23) = 3.77        (+3.77)^3 = 53.58          53.58
       9         1         (9 – 4.23) = 4.77        (+4.77)^3 = 108.53        108.53
      10         1         (10 – 4.23) = 5.77       (+5.77)^3 = 192.10        192.10
     Sum = 294.94 • Mean = 4.23 • s = 2.27 • Skewness = Sum / (n × s^3) = 294.94 / (26 × 2.27^3) = 0.97
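The table on slide 6 can be reproduced with a short Python sketch. The formula used here, the sum of cubed deviations divided by n times the cube of the sample standard deviation, is an assumption: the slide does not state it, but it reproduces the slide's result of 0.97. The variable names are illustrative.

```python
from math import sqrt

# Frequency table from slide 6: value -> number of occurrences
freq = {1: 1, 2: 4, 3: 8, 4: 4, 5: 3, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1}

n = sum(freq.values())
mean = sum(v * c for v, c in freq.items()) / n

# Sample standard deviation (n - 1 in the denominator)
s = sqrt(sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1))

# Sum of the occurrence-weighted cubed deviations (the "Occur*Cubed" column)
sum_cubed = sum(c * (v - mean) ** 3 for v, c in freq.items())

# Assumed formula: skewness = sum of cubed deviations / (n * s^3)
skewness = sum_cubed / (n * s ** 3)

print(n, round(mean, 2), round(s, 2), round(skewness, 2))
```

Because the code keeps the unrounded mean, its sum of cubed deviations differs slightly from the slide's 294.94, which was built from deviations rounded to two decimals. The skewness still rounds to 0.97.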

  7. Skewness > 0 (Positively skewed) [Figure: curve with mode, median, and mean marked]

  8. Skewness < 0 (Negatively skewed) [Figure: curves A and B with mode, median, and mean marked]

  9. Skewness = 0 (symmetric distribution) Source: http://mathworld.wolfram.com/NormalDistribution.html

  10. Skewness – Review • Positive skewness • There are more observations below the mean than above it • When the mean is greater than the median • Negative skewness • There are a small number of low observations and a large number of high ones • When the median is greater than the mean

  11. Kurtosis – Review • Kurtosis measures how peaked the histogram is (Karl Pearson, 1905) • The kurtosis of a normal distribution is 0 when it is reported as excess kurtosis, i.e., the standardized fourth moment minus 3 • Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution

  12. Kurtosis – Review • Platykurtic – When the kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider) • Thus, negative kurtosis indicates a relatively flat distribution • Leptokurtic – When the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked) • Thus, positive kurtosis indicates a relatively peaked distribution
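The sign convention above can be illustrated with a minimal Python sketch. It assumes the population form of excess kurtosis (fourth central moment divided by the squared second central moment, minus 3); the slides do not give a formula, and the two small data sets are made up purely for illustration.

```python
def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data: every value occurs equally often (flat)
flat = [1, 2, 3, 4, 5]
# Hypothetical data: most values sit at the center, two lie far out (peaked)
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]

print(excess_kurtosis(flat))    # negative, so platykurtic
print(excess_kurtosis(peaked))  # positive, so leptokurtic
```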

  13. Source: http://espse.ed.psu.edu/Statistics/Chapters/Chapter3/Chap3.html

  14. Measures of central tendency – Review • Measures of the location of the middle or the center of a distribution • Mean • Median • Mode

  15. Mean – Review • Mean – Average value of a distribution; Most commonly used measure of central tendency • Median – This is the value of a variable such that half of the observations are above and half are below this value, i.e., this value divides the distribution into two groups of equal size • Mode - This is the most frequently occurring value in the distribution

  16. An Example Data Set • Daily low temperatures recorded in Chapel Hill (01/18-01/31, 2005, °F)
      Jan. 18 – 11    Jan. 25 – 25
      Jan. 19 – 11    Jan. 26 – 33
      Jan. 20 – 25    Jan. 27 – 22
      Jan. 21 – 29    Jan. 28 – 18
      Jan. 22 – 27    Jan. 29 – 19
      Jan. 23 – 14    Jan. 30 – 30
      Jan. 24 – 11    Jan. 31 – 27
      • For these 14 values, we will calculate all three measures of central tendency - the mean, median, and mode

  17. Mean – Review • Mean – Most commonly used measure of central tendency • Procedures • (1) Sum all the values in the data set • (2) Divide the sum by the number of values in the data set • Watch for outliers

  18. Mean – Review • (1) Sum all the values in the data set → 11 + 11 + 11 + 14 + 18 + 19 + 22 + 25 + 25 + 27 + 27 + 29 + 30 + 33 = 302 • (2) Divide the sum by the number of values in the data set → Mean = 302/14 = 21.57 • Is this a good measure of central tendency for this data set?
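The two steps for the mean translate directly into Python. This is a minimal sketch using the 14 Chapel Hill temperatures from the example data set; the variable names are illustrative.

```python
# Daily low temperatures (°F), Jan. 18-31, in date order
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

total = sum(temps)          # step (1): sum all the values
mean = total / len(temps)   # step (2): divide by the number of values

print(total, round(mean, 2))  # 302 21.57
```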

  19. Median – Review • Median – 1/2 of the values are above it & 1/2 below • (1) Sort the data in ascending order • (2) Find the value with an equal number of values above and below it • (3) Odd number of observations → [(n-1)/2]+1 value from the lowest • (4) Even number of observations → average (n/2) and [(n/2)+1] values • (5) Use the median with asymmetric distributions, particularly with outliers

  20. Median – Review • (1) Sort the data in ascending order: → 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33 • (2) Find the value with an equal number of values above and below it → Even number of observations → average the (n/2) and [(n/2)+1] values → (14/2) = 7; [(14/2)+1] = 8 → 7th value = 22, 8th value = 25 → (22+25)/2 = 23.5 (°F) • Is this a good measure of central tendency for this data?
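The odd/even rule for the median can be sketched as a small Python function. The function name and layout are illustrative; the positions in the comments are the 1-based positions used on the slides, shifted by one for Python's 0-based indexing.

```python
def median(values):
    data = sorted(values)  # step (1): sort ascending
    n = len(data)
    if n % 2 == 1:
        # Odd n: the [(n-1)/2]+1-th value from the lowest (1-based)
        return data[(n - 1) // 2]
    # Even n: average the (n/2)-th and [(n/2)+1]-th values (1-based)
    return (data[n // 2 - 1] + data[n // 2]) / 2

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]
print(median(temps))  # 23.5
```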

  21. Mode – Review • Mode – This is the most frequently occurring value in the distribution • (1) Sort the data in ascending order • (2) Count the instances of each value • (3) Find the value that has the most occurrences • If more than one value occurs an equal number of times and these exceed all other counts, we have multiple modes • Use the mode for multi-modal data

  22. Mode – Review • (1) Sort the data in ascending order: → 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33 • (2) Count the instances of each value: → 11 (3x), 14 (1x), 18 (1x), 19 (1x), 22 (1x), 25 (2x), 27 (2x), 29 (1x), 30 (1x), 33 (1x) • (3) Find the value that has the most occurrences → mode = 11 (°F) • Is this a good measure of the central tendency of this data set?
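Counting occurrences and picking the most frequent value can be sketched with `collections.Counter` from the Python standard library. Returning a list of modes, so that ties yield multiple modes, is a design choice made here for illustration.

```python
from collections import Counter

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

counts = Counter(temps)        # step (2): count the instances of each value
top = max(counts.values())     # the highest count
# Step (3): every value that reaches the highest count is a mode
modes = sorted(v for v, c in counts.items() if c == top)

print(modes, top)  # [11] 3
```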

  23. Measures of Dispersion – Review • In addition to measures of central tendency, we can also summarize data by characterizing its variability • Measures of dispersion are concerned with the distribution of values around the mean in data: • Range • Interquartile range • Variance • Standard deviation • z-scores • Coefficient of Variation (CV)

  24. An Example Data Set • Daily low temperatures recorded in Chapel Hill (01/18-01/31, 2005, °F)
      Jan. 18 – 11    Jan. 25 – 25
      Jan. 19 – 11    Jan. 26 – 33
      Jan. 20 – 25    Jan. 27 – 22
      Jan. 21 – 29    Jan. 28 – 18
      Jan. 22 – 27    Jan. 29 – 19
      Jan. 23 – 14    Jan. 30 – 30
      Jan. 24 – 11    Jan. 31 – 27
      • For these 14 values, we will calculate all measures of dispersion

  25. Range – Review • Range – The difference between the largest and the smallest values • (1) Sort the data in ascending order • (2) Find the largest value → max • (3) Find the smallest value → min • (4) Calculate the range → range = max - min • Vulnerable to the influence of outliers

  26. Range – Review • Range – The difference between the largest and the smallest values • (1) Sort the data in ascending order → 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33 • (2) Find the largest value → max = 33 • (3) Find the smallest value → min = 11 • (4) Calculate the range → range = 33 – 11 = 22
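As a minimal Python sketch of the range calculation (sorting is not strictly needed here, since `max` and `min` scan the data directly):

```python
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

largest = max(temps)     # step (2)
smallest = min(temps)    # step (3)
value_range = largest - smallest  # step (4)

print(largest, smallest, value_range)  # 33 11 22
```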

  27. Interquartile Range – Review • Interquartile range – The difference between the 25th and 75th percentiles • (1) Sort the data in ascending order • (2) Find the 25th percentile – (n+1)/4 observation • (3) Find the 75th percentile – 3(n+1)/4 observation • (4) Interquartile range is the difference between these two percentiles

  28. Interquartile Range – Review • (1) Sort the data in ascending order → 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33 • (2) Find the 25th percentile – (n+1)/4 observation → (14+1)/4 = 3.75 → 11+(14-11)*0.75 = 13.25 • (3) Find the 75th percentile – 3(n+1)/4 observation → 3(14+1)/4 = 11.25 → 27+(29-27)*0.25 = 27.5 • (4) Interquartile range is the difference between these two percentiles → 27.5 – 13.25 = 14.25
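The (n+1)/4 rule with linear interpolation between neighboring observations can be sketched as follows. The helper name `value_at_position` is hypothetical, and the sketch assumes the fractional position falls between 1 and n, so it does no bounds checking.

```python
def value_at_position(sorted_data, position):
    """Value at a 1-based, possibly fractional, position via linear interpolation."""
    lower = int(position)          # 1-based index of the lower neighbor
    fraction = position - lower    # how far to move toward the upper neighbor
    low_value = sorted_data[lower - 1]
    if fraction == 0:
        return low_value
    high_value = sorted_data[lower]
    return low_value + (high_value - low_value) * fraction

temps = sorted([11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27])
n = len(temps)

q1 = value_at_position(temps, (n + 1) / 4)      # position 3.75
q3 = value_at_position(temps, 3 * (n + 1) / 4)  # position 11.25
iqr = q3 - q1

print(q1, q3, iqr)  # 13.25 27.5 14.25
```

Other percentile conventions exist and can give slightly different quartiles for the same data, so results from statistical packages may not match these values exactly.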

  29. Variance – Review • Variance is formulated as the sum of squared statistical distances (deviations) from the mean, divided by the population size (for a population) or by the sample size minus one (for a sample)
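The formula image from this slide did not survive in the transcript. The standard definitions that match the wording above are sketched here; the symbols ($\mu$, $\bar{x}$, $N$, $n$) are conventional notation and are not taken from the slide.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2
\quad\text{(population)}
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
\quad\text{(sample)}
```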

  30. Variance – Review • (1) Calculate the mean • (2) Calculate the deviation for each value • (3) Square each of the deviations • (4) Sum the squared deviations • (5) Divide the sum of squares by (n-1) for a sample

  31. Variance – Review • (1) Calculate the mean → mean = 21.57 • (2) Calculate the deviation for each value →
      Jan. 18 (11 – 21.57) = -10.57    Jan. 25 (25 – 21.57) = 3.43
      Jan. 19 (11 – 21.57) = -10.57    Jan. 26 (33 – 21.57) = 11.43
      Jan. 20 (25 – 21.57) = 3.43      Jan. 27 (22 – 21.57) = 0.43
      Jan. 21 (29 – 21.57) = 7.43      Jan. 28 (18 – 21.57) = -3.57
      Jan. 22 (27 – 21.57) = 5.43      Jan. 29 (19 – 21.57) = -2.57
      Jan. 23 (14 – 21.57) = -7.57     Jan. 30 (30 – 21.57) = 8.43
      Jan. 24 (11 – 21.57) = -10.57    Jan. 31 (27 – 21.57) = 5.43

  32. Variance – Review • (3) Square each of the deviations →
      Jan. 18 (-10.57)^2 = 111.76    Jan. 25 (3.43)^2 = 11.76
      Jan. 19 (-10.57)^2 = 111.76    Jan. 26 (11.43)^2 = 130.61
      Jan. 20 (3.43)^2 = 11.76       Jan. 27 (0.43)^2 = 0.18
      Jan. 21 (7.43)^2 = 55.18       Jan. 28 (-3.57)^2 = 12.76
      Jan. 22 (5.43)^2 = 29.47       Jan. 29 (-2.57)^2 = 6.61
      Jan. 23 (-7.57)^2 = 57.33      Jan. 30 (8.43)^2 = 71.04
      Jan. 24 (-10.57)^2 = 111.76    Jan. 31 (5.43)^2 = 29.47
      • (4) Sum the squared deviations → 751.43

  33. Variance – Review • (5) Divide the sum of squares by (n-1) for a sample → s^2 = 751.43 / (14-1) = 57.8 • The variance of the Tmin data set (Chapel Hill) is 57.8
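The five variance steps can be followed one by one in Python. This is a sketch with illustrative variable names; it keeps the unrounded mean, so individual squared deviations can differ in the last digit from the hand-rounded slide values.

```python
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

n = len(temps)
mean = sum(temps) / n                      # (1) calculate the mean
deviations = [x - mean for x in temps]     # (2) deviation for each value
squared = [d ** 2 for d in deviations]     # (3) square each deviation
sum_sq = sum(squared)                      # (4) sum the squared deviations
variance = sum_sq / (n - 1)                # (5) divide by n - 1 for a sample

print(round(sum_sq, 2), round(variance, 1))  # 751.43 57.8
```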

  34. Standard Deviation – Review • Standard deviation is equal to the square root of the variance • Compared with variance, standard deviation has a scale closer to that used for the mean and the original data

  35. Standard Deviation – Review • (1) Calculate the mean • (2) Calculate the deviation for each value • (3) Square each of the deviations • (4) Sum the squared deviations • (5) Divide the sum of squares by (n-1) for a sample • (6) Take the square root of the resulting variance

  36. Standard Deviation – Review • (1) – (5) → s^2 = 57.8 • (6) Take the square root of the variance → s = √57.8 = 7.6 • The standard deviation (s) of the Tmin data set (Chapel Hill) is 7.6 (°F)
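As a cross-check, the sample variance and standard deviation are also available in Python's standard `statistics` module. The sketch below takes the square root by hand and compares it with `statistics.stdev`, which uses the same n - 1 denominator.

```python
from math import sqrt
from statistics import stdev, variance

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

s2 = variance(temps)  # sample variance, steps (1) to (5)
s = sqrt(s2)          # step (6): square root of the variance

print(round(s2, 1), round(s, 1))  # 57.8 7.6
```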

  37. z-score – Review • Since data come from distributions with different means and different degrees of variability, it is common to standardize observations • One way to do this is to transform each observation into a z-score • May be interpreted as the number of standard deviations an observation is away from the mean

  38. z-scores – Review • z-score is the number of standard deviations an observation is away from the mean • (1) Calculate the mean • (2) Calculate the deviation • (3) Calculate the standard deviation • (4) Divide the deviation by standard deviation

  39. z-scores – Review • Z-score for maximum Tmin value (33 °F) • (1) Calculate the mean → mean = 21.57 • (2) Calculate the deviation → 33 – 21.57 = 11.43 • (3) Calculate the standard deviation (SD) → s = 7.6 • (4) Divide the deviation by standard deviation → z = 11.43 / 7.6 = 1.50
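The four z-score steps for the maximum Tmin value can be sketched in Python using the sample standard deviation from the earlier slides. The variable names are illustrative.

```python
from statistics import mean, stdev

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

m = mean(temps)             # (1) mean
deviation = max(temps) - m  # (2) deviation of the maximum value (33 °F)
s = stdev(temps)            # (3) sample standard deviation
z_max = deviation / s       # (4) deviation divided by standard deviation

print(round(deviation, 2), round(z_max, 2))
```

The result is about 1.5, meaning the warmest night was roughly one and a half standard deviations above the mean.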

  40. Coefficient of Variation – Review • Coefficient of variation (CV) measures the spread of a set of data as a proportion of its mean. • It is the ratio of the sample standard deviation to the sample mean • It is sometimes expressed as a percentage • There is an equivalent definition for the coefficient of variation of a population

  41. Coefficient of Variation – Review • (1) Calculate mean • (2) Calculate standard deviation • (3) Divide standard deviation by mean → CV = s / mean

  42. Coefficient of Variation – Review • (1) Calculate mean → mean = 21.57 • (2) Calculate standard deviation → s = 7.6 • (3) Divide standard deviation by mean → CV = 7.6 / 21.57 = 0.35
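Applied to the Chapel Hill data, the coefficient of variation can be sketched in Python as the sample standard deviation divided by the sample mean, shown both as a proportion and as a percentage.

```python
from statistics import mean, stdev

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

cv = stdev(temps) / mean(temps)  # ratio of sample SD to sample mean

print(round(cv, 3), f"{cv:.1%}")  # 0.352 35.2%
```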

  43. Histograms – Review • We may also summarize our data by constructing histograms, which are vertical bar graphs • A histogram is used to graphically summarize the distribution of a data set • A histogram divides the range of values in a data set into intervals • Over each interval is placed a bar whose height represents the percentage of data values in the interval.

  44. Building a Histogram – Review • (1) Develop an ungrouped frequency table → 11, 11, 11, 14, 18, 19, 22, 25, 25, 27, 27, 29, 30, 33

  45. Building a Histogram – Review • (2) Construct a grouped frequency table → Select a set of classes

  46. Building a Histogram – Review • (3) Plot the frequencies of each class
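The three histogram steps can be sketched in Python with a text-based plot. The class boundaries used on the original slides are not preserved in the transcript, so the class width of 5 °F starting at 10 °F is an assumption made purely for illustration.

```python
temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

width = 5    # assumed class width (not from the slides)
start = 10   # assumed lower bound of the first class (not from the slides)
stop = 35    # assumed upper bound, chosen to cover the maximum value

# Step (2): grouped frequency table, classes are [lower, lower + width)
classes = {}
for lower in range(start, stop, width):
    upper = lower + width
    classes[(lower, upper)] = sum(lower <= t < upper for t in temps)

# Step (3): plot the frequency of each class as a row of '#' marks
for (lower, upper), count in classes.items():
    print(f"{lower}-{upper - 1} °F | {'#' * count} ({count})")
```

A different choice of classes would change the shape of the histogram, which is why the class selection step matters.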

  47. Box Plots – Review • We can also use a box plot to graphically summarize a data set • A box plot represents a graphical summary of what is sometimes called a “five-number summary” of the distribution • Minimum • Maximum • 25th percentile • 75th percentile • Median • Interquartile Range (IQR) • [Figure: box plot labeled min., 25th %-ile, median, 75th %-ile, max.] • Rogerson, p. 8.

  48. Boxplot – Review
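The five-number summary behind a box plot can be sketched with Python's standard `statistics.quantiles` function (available from Python 3.8). Its default "exclusive" method places the quartiles at positions based on n + 1, which matches the (n+1)/4 rule used for the interquartile range earlier; that correspondence is an observation about the library, not something stated on the slides.

```python
from statistics import quantiles

temps = [11, 11, 25, 29, 27, 14, 11, 25, 33, 22, 18, 19, 30, 27]

# Quartile cut points: 25th percentile, median, 75th percentile
q1, med, q3 = quantiles(temps, n=4)

five_number_summary = (min(temps), q1, med, q3, max(temps))
print(five_number_summary)  # (11, 13.25, 23.5, 27.5, 33)
```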

  49. Further Moments of the Distribution • While measures of dispersion are useful for helping us describe the width of the distribution, they tell us nothing about the shape of the distribution • Source: Earickson, RJ, and Harlin, JM. 1994. Geographic Measurement and Quantitative Analysis. USA: Macmillan College Publishing Co., p. 91.
