Summarizing Data. Graphical Methods. Histogram. Grouped Freq Table. Stem-Leaf Diagram. Box-whisker Plot. Summary Numerical Measures. Measure of Central Location. Mean Median. Measure of Non-Central Location. Percentiles Quartiles
Summarizing Data Graphical Methods
Histogram Grouped Freq Table Stem-Leaf Diagram Box-whisker Plot
Measure of Central Location • Mean • Median
Measure of Non-Central Location • Percentiles • Quartiles • Lower quartile (Q1) (25th percentile) (lower mid-hinge) • median (Q2) (50th percentile) (hinge) • Upper quartile (Q3) (75th percentile) (upper mid-hinge)
Measure of Variability (Dispersion, Spread) • Range • Inter-Quartile Range • Variance, standard deviation • Pseudo-standard deviation
Range R = Range = max - min • Inter-Quartile Range (IQR) Inter-Quartile Range = IQR = Q3 - Q1
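The two definitions above can be sketched in a few lines of Python. This is only an illustration: the data values are made up, the function names are hypothetical, and the quartile convention (the "inclusive" method of the standard library) is an assumption, since textbooks and software compute Q1 and Q3 in slightly different ways.

```python
import statistics


def data_range(xs):
    """Range R = max - min."""
    return max(xs) - min(xs)


def iqr(xs):
    """Inter-quartile range IQR = Q3 - Q1.

    Assumption: quartiles follow the 'inclusive' convention of
    statistics.quantiles; other conventions give slightly different values.
    """
    q1, _q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return q3 - q1


# Made-up illustrative values, not taken from the slides.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

print(data_range(data))  # 13
print(iqr(data))         # 6.0
```

The range depends only on the two most extreme observations, while the IQR ignores the outer quarters of the data, which is why the IQR is much less sensitive to outliers.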
The Sample Variance is defined as the quantity: $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ and is denoted by the symbol $s^2$
The Sample Standard Deviation s Definition: The Sample Standard Deviation is defined by: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ Hence the Sample Standard Deviation, s, is the square root of the sample variance.
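A minimal sketch of the sample variance and sample standard deviation, assuming the usual definition with divisor n - 1. The function names and the data are illustrative only; the standard library's `statistics.variance` and `statistics.stdev` use the same divisor and serve as a cross-check.

```python
import math
import statistics


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_sd(xs):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


# Made-up illustrative values, not taken from the slides.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

print(sample_variance(data), statistics.variance(data))
print(sample_sd(data), statistics.stdev(data))
```

Each printed pair should agree up to floating-point rounding.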
Interpretations of s • In Normal distributions • Approximately 2/3 of the observations will lie within one standard deviation of the mean • Approximately 95% of the observations lie within two standard deviations of the mean • In a histogram of the Normal distribution, the standard deviation is approximately the distance from the mode to the inflection point
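[Figure: Normal curve with the mode and the inflection point marked; the horizontal distance between them is s. About 2/3 of the area lies within one s on either side of the mean.]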
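The "approximately 2/3" and "approximately 95%" figures can be checked directly from the Normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; working with the standard Normal (mean 0, standard deviation 1) is a convenience, since the proportions are the same for any Normal distribution.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Probability of falling within one and two standard deviations of the mean.
within_one = z.cdf(1) - z.cdf(-1)
within_two = z.cdf(2) - z.cdf(-2)

print(round(within_one, 4))  # 0.6827
print(round(within_two, 4))  # 0.9545
```

So "2/3" is a rounding of about 68%, and "95%" is a rounding of about 95.4%.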
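Computing formulae for s and s² The sum of squares of deviations from the mean can also be computed using the following identity: $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$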
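As a rough check, the sum of squared deviations can be computed both ways in Python: directly from the deviations, and from the shortcut that needs only the sum of the values and the sum of their squares. The shortcut form used here (sum of squares minus the squared sum divided by n) is the standard identity and is assumed to be the one intended; function names and data are illustrative.

```python
import math


def ss_from_deviations(xs):
    """Sum of squared deviations, computed from the definition."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs)


def ss_from_shortcut(xs):
    """Same quantity via: sum(x^2) - (sum(x))^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


# Made-up illustrative values, not taken from the slides.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

print(ss_from_deviations(data))
print(ss_from_shortcut(data))
print(math.isclose(ss_from_deviations(data), ss_from_shortcut(data)))  # True
```

The shortcut is convenient for hand calculation, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the deviation form is usually the safer choice in software.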
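A quick (rough) calculation of s: $s \approx \frac{\text{Range}}{4}$ The reason for this is that approximately all (95%) of the observations are between $\bar{x} - 2s$ and $\bar{x} + 2s$. Thus $\max \approx \bar{x} + 2s$ and $\min \approx \bar{x} - 2s$, so $\text{Range} = \max - \min \approx 4s$.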
The Pseudo Standard Deviation (PSD) Definition: The Pseudo Standard Deviation (PSD) is defined by: $\text{PSD} = \frac{\text{IQR}}{1.35}$
Properties • For Normal distributions the magnitude of the pseudo standard deviation (PSD) and the standard deviation (s) will be approximately the same value • For leptokurtic distributions the standard deviation (s) will be larger than the pseudo standard deviation (PSD) • For platykurtic distributions the standard deviation (s) will be smaller than the pseudo standard deviation (PSD)
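The first property can be illustrated numerically. The sketch below assumes the usual definition PSD = IQR / 1.35; the constant comes from the fact that the IQR of a Normal distribution is about 1.35 standard deviations. The function name, the data, and the "inclusive" quartile convention are assumptions made for illustration.

```python
import statistics
from statistics import NormalDist


def pseudo_sd(xs):
    """Pseudo standard deviation, assuming PSD = IQR / 1.35."""
    q1, _q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return (q3 - q1) / 1.35


# IQR of a standard Normal distribution (standard deviation 1).
z = NormalDist()
normal_iqr = z.inv_cdf(0.75) - z.inv_cdf(0.25)
print(round(normal_iqr, 3))  # 1.349

# Made-up illustrative values, not taken from the slides.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]
print(pseudo_sd(data), statistics.stdev(data))
```

Because the PSD is built from the quartiles, heavy tails inflate s without changing the PSD much, which is the reasoning behind the leptokurtic and platykurtic comparisons.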
Measures of Shape • Skewness • Kurtosis
Skewness – based on the sum of cubes • Kurtosis – based on the sum of 4th powers
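Interpretations of Measures of Shape • Skewness: g1 > 0 (skewed to the right, long upper tail), g1 = 0 (symmetric), g1 < 0 (skewed to the left, long lower tail) • Kurtosis: g2 < 0 (platykurtic, lighter tails than the Normal), g2 = 0 (Normal, mesokurtic), g2 > 0 (leptokurtic, heavier tails than the Normal)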
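The exact formulas for g1 and g2 used on the slides are not recoverable from the text, so the sketch below assumes the common moment-based definitions: g1 is the third central moment divided by the 3/2 power of the second, and g2 is the fourth central moment divided by the square of the second, minus 3 (so that a Normal distribution gives 0). Divisor n is used for all moments; other texts apply small-sample corrections. Function names and data are illustrative.

```python
def central_moment(xs, k):
    """k-th central moment with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def skewness(xs):
    """g1, based on the sum of cubed deviations (assumed definition)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """g2, based on the sum of 4th powers, shifted so Normal gives 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


# Made-up illustrative values, not taken from the slides.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))         # 0.0 (symmetric data)
print(skewness(right_skewed) > 0)  # True (long upper tail)
print(excess_kurtosis(symmetric))  # negative: flat, platykurtic shape
```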
Example • The Baby Boom
Age Distribution for Canada 1921 - 2006
The Globe and Mail Report Baby Boom