
Summarizing Data

Summarizing Data. Graphical Methods. Histogram. Grouped Freq Table. Stem-Leaf Diagram. Box-whisker Plot. Summary Numerical Measures. Measure of Central Location. Mean Median. Measure of Non-Central Location. Percentiles Quartiles

ottowalker

Presentation Transcript


  1. Summarizing Data Graphical Methods

  2. Histogram Grouped Freq Table Stem-Leaf Diagram Box-whisker Plot

  3. Summary Numerical Measures

  4. Measure of Central Location • Mean • Median
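As a small illustration of the two measures of central location, the sketch below computes both on a made-up sample (the numbers are hypothetical, not from the slides). It uses Python's standard `statistics` module and shows how one extreme value pulls the mean but not the median.

```python
from statistics import mean, median

# Hypothetical sample with one extreme value, chosen only for illustration.
data = [2, 3, 5, 7, 100]

sample_mean = mean(data)      # sum of the values divided by their count
sample_median = median(data)  # middle value of the sorted data

print(sample_mean)    # 23.4
print(sample_median)  # 5
```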

  5. Measure of Non-Central Location • Percentiles • Quartiles • Lower quartile (Q1) (25th percentile) (lower mid-hinge) • Median (Q2) (50th percentile) (hinge) • Upper quartile (Q3) (75th percentile) (upper mid-hinge)
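Quartiles and percentiles can be computed in several slightly different ways, and the slides do not say which rule they use. The sketch below assumes the "inclusive" interpolation rule of Python's `statistics.quantiles` and a hypothetical sample; other conventions can give slightly different values.

```python
from statistics import quantiles

# Hypothetical sample, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # 3.0 5.0 7.0

# n=100 returns the 99 percentile cut points; index 89 is the 90th percentile.
p90 = quantiles(data, n=100, method="inclusive")[89]
print(p90)
```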

  6. Measure of Variability (Dispersion, Spread) • Range • Inter-Quartile Range • Variance, standard deviation • Pseudo-standard deviation

  7. Range: $R = \max - \min$ • Inter-Quartile Range (IQR): $\text{IQR} = Q_3 - Q_1$
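The two definitions translate directly into code. This is a minimal sketch on a hypothetical sample; the helper names `data_range` and `iqr` are made up for illustration, and the quartiles again assume the "inclusive" rule of `statistics.quantiles`.

```python
from statistics import quantiles

def data_range(data):
    """Range = max - min."""
    return max(data) - min(data)

def iqr(data):
    """Inter-quartile range = Q3 - Q1 (inclusive quartile rule assumed)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical sample, chosen only for illustration.
data = [4, 8, 15, 16, 23, 42, 50, 61, 99]

print(data_range(data))  # 95
print(iqr(data))         # 35.0
```

The range depends only on the two extreme observations, while the IQR ignores the outer quarters of the data.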

  8. The Sample Variance is defined as the quantity $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ and is denoted by the symbol $s^2$.

  9. The Sample Standard Deviation s Definition: The Sample Standard Deviation is defined by: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ Hence the Sample Standard Deviation, s, is the square root of the sample variance.
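A minimal sketch of the definitional calculation, assuming the usual divisor n - 1 for the sample variance and using a hypothetical sample. The result is cross-checked against `statistics.variance`, which uses the same divisor.

```python
from math import sqrt
from statistics import variance

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n

# Sum of squared deviations from the mean.
ss = sum((x - x_bar) ** 2 for x in data)

s2 = ss / (n - 1)  # sample variance
s = sqrt(s2)       # sample standard deviation

print(s2, s)
print(abs(s2 - variance(data)) < 1e-12)  # agrees with the library routine
```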

  10. Interpretations of s • In Normal distributions • Approximately 2/3 of the observations will lie within one standard deviation of the mean • Approximately 95% of the observations lie within two standard deviations of the mean • In a histogram of the Normal distribution, the standard deviation is approximately the distance from the mode to the inflection point
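These three statements can be checked against the theoretical Normal curve. The sketch below uses `statistics.NormalDist` for the standard Normal distribution; the finite-difference helper `second_diff` is a made-up name used only to show that the curvature of the density changes sign one standard deviation from the mode.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

within_one = z.cdf(1) - z.cdf(-1)  # about 0.68, roughly 2/3
within_two = z.cdf(2) - z.cdf(-2)  # about 0.95

def second_diff(x, h=1e-3):
    """Finite-difference estimate of the curvature of the density at x."""
    return z.pdf(x + h) - 2 * z.pdf(x) + z.pdf(x - h)

# Curvature is negative just inside one standard deviation and positive
# just outside, so the inflection point lies one s from the mode.
print(within_one, within_two)
print(second_diff(0.9) < 0, second_diff(1.1) > 0)
```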

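  11. [Figure: Normal curve marking the mode, the inflection point, and the distance s between them]

  12. [Figure: Normal curve with about 2/3 of the area within one s on either side of the mean]

  13. [Figure: Normal curve marking a distance of 2s from the mean]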

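  14. Computing formulae for s and s² The sum of squares of deviations from the mean can also be computed using the following identity: $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$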

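  15. Then: $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$ and $s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}}$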
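The computing formula can be checked numerically. This sketch assumes the identity is the usual one, Σ(x − x̄)² = Σx² − (Σx)²/n, and compares it with the definitional sum on a hypothetical sample.

```python
from math import sqrt

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

sum_x = sum(data)                  # sum of the observations
sum_x2 = sum(x * x for x in data)  # sum of the squared observations

# Shortcut: sum of squares about the mean without computing deviations.
ss_shortcut = sum_x2 - sum_x ** 2 / n

# Definitional form, for comparison.
x_bar = sum_x / n
ss_definition = sum((x - x_bar) ** 2 for x in data)

s2 = ss_shortcut / (n - 1)
s = sqrt(s2)
print(ss_shortcut, ss_definition)  # 32.0 32.0
```

The shortcut needs only two running totals, which suits hand calculation. In floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is often preferred in software.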

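  16. A quick (rough) calculation of s: $s \approx \frac{\text{Range}}{4}$ The reason for this is that approximately all (95%) of the observations are between $\bar{x} - 2s$ and $\bar{x} + 2s$. Thus $\text{Range} \approx (\bar{x} + 2s) - (\bar{x} - 2s) = 4s$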
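A rough numerical check of this shortcut, assuming the rule is s ≈ Range/4. The sample here is artificial: 20 evenly spaced quantiles of a Normal distribution with mean 50 and standard deviation 10 (both values hypothetical), so the data are Normal-shaped by construction. Real samples, especially very small or very large ones, can deviate more.

```python
from statistics import NormalDist, stdev

# Artificial Normal-shaped sample: 20 evenly spaced quantiles.
# The mean 50 and standard deviation 10 are hypothetical values.
dist = NormalDist(mu=50, sigma=10)
n = 20
data = [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

rough_s = (max(data) - min(data)) / 4  # quick estimate: Range / 4
exact_s = stdev(data)                  # sample standard deviation (n - 1 divisor)

print(rough_s, exact_s)
```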

  17. The Pseudo Standard Deviation (PSD) Definition: The Pseudo Standard Deviation (PSD) is defined by: $\text{PSD} = \frac{\text{IQR}}{1.35}$

  18. Properties • For Normal distributions the magnitude of the pseudo standard deviation (PSD) and the standard deviation (s) will be approximately the same value • For leptokurtic distributions the standard deviation (s) will be larger than the pseudo standard deviation (PSD) • For platykurtic distributions the standard deviation (s) will be smaller than the pseudo standard deviation (PSD)
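The first two properties can be illustrated with a short sketch. It assumes PSD = IQR / 1.35 (the divisor is chosen because the IQR of a Normal distribution is about 1.35 standard deviations) and uses the "inclusive" quartile rule of `statistics.quantiles`. The helper name `pseudo_sd` and the sample `heavy` are hypothetical.

```python
from statistics import NormalDist, quantiles, stdev

def pseudo_sd(data):
    """Pseudo standard deviation, assumed here to be IQR / 1.35."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / 1.35

# Theoretical Normal distribution with standard deviation 10:
# its IQR divided by 1.35 is very close to 10.
normal = NormalDist(mu=0, sigma=10)
theoretical_psd = (normal.inv_cdf(0.75) - normal.inv_cdf(0.25)) / 1.35
print(theoretical_psd)

# Hypothetical heavy-tailed sample: one extreme value inflates s,
# but barely affects the quartiles, so s exceeds the PSD.
heavy = [10, 12, 13, 14, 15, 15, 16, 17, 18, 60]
print(stdev(heavy), pseudo_sd(heavy))
```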

  19. Measures of Shape

  20. Measures of Shape • Skewness • Kurtosis

  21. Skewness – based on the sum of cubes • Kurtosis – based on the sum of 4th powers

  22. The Measure of Skewness

  23. The Measure of Kurtosis

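  24. Interpretations of Measures of Shape • Skewness: g1 > 0 (positively skewed, longer right tail), g1 = 0 (symmetric), g1 < 0 (negatively skewed, longer left tail) • Kurtosis: g2 < 0 (platykurtic), g2 = 0 (Normal-like), g2 > 0 (leptokurtic)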
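The transcript does not preserve the slides' formulas for g1 and g2. The sketch below therefore assumes one common moment-based convention: g1 = m3 / m2^(3/2) and g2 = m4 / m2² − 3, where m_k is the average of the k-th powers of the deviations from the mean. Other conventions apply small-sample corrections and give slightly different values. The function names and samples are hypothetical.

```python
def central_moment(data, k):
    """Average of the k-th powers of the deviations from the mean."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** k for x in data) / n

def skewness(data):
    """g1 under the assumed convention: based on the sum of cubes."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """g2 under the assumed convention: based on the sum of 4th powers,
    shifted so that a Normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical samples, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), kurtosis(symmetric))  # g1 = 0, g2 < 0 (flat shape)
print(skewness(right_skewed))                    # g1 > 0 (long right tail)
```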

  25. Example • The Baby Boom

  26. Age Distribution for Canada 1921 - 2006

  27. Median Age in Canada by Gender and Year

  28. Total Population in Canada by Year

  29. The Globe and Mail Report Baby Boom

  30. Multivariate Data
