Basic Practice of Statistics Chapter 2 Numerical Descriptive Measures
Chapter Topics
• Measures of central tendency
  • Mean, median, mode, geometric mean, midrange
• Quartile
• Measure of variation
  • Range, interquartile range, variance and standard deviation, coefficient of variation
• Shape
  • Symmetric, skewed, using box-and-whisker plots
Summary Measures (diagram)
• Central tendency: mean, median, mode
• Quartile
• Variation: range, variance, standard deviation, coefficient of variation
Measures of Central Tendency (diagram): average (mean), median, mode
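Mean (Arithmetic Mean)
• Mean (arithmetic mean) of data values
• Sample mean, where $n$ is the sample size: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
• Population mean, where $N$ is the population size: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$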
Mean (Arithmetic Mean) (continued)
• The most common measure of central tendency
• Affected by extreme values (outliers)
(Figure: two number-line plots. On an axis from 0 to 10, Mean = 5; on an axis extended to 14, Mean = 6.)
Median
• Robust measure of central tendency
• Not affected by extreme values
• In an ordered array, the median is the “middle” number
  • If n or N is odd, the median is the middle number
  • If n or N is even, the median is the average of the two middle numbers
(Figure: two number-line plots. On an axis from 0 to 10, Median = 5; on an axis extended to 14, Median = 5.)
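The contrast between the two figures (the mean moves from 5 to 6 while the median stays at 5) can be reproduced with a short sketch. The data values below are an assumption: the slides only give the resulting means and medians, so two hypothetical five-point data sets were chosen to match them.

```python
from statistics import mean, median

# Hypothetical data chosen to reproduce the values quoted in the figures;
# the original data points are not given in the slides.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # largest value replaced by an extreme one

for label, data in [("without outlier", without_outlier),
                    ("with outlier", with_outlier)]:
    # The mean uses every value, so one extreme value shifts it;
    # the median depends only on the middle of the ordered array.
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

Only the largest observation changes between the two lists, which is why the median is described as robust.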
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
(Figure: two number-line plots. On an axis from 0 to 6, No Mode; on an axis from 0 to 14, Mode = 9.)
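A minimal sketch of the three cases listed above (no mode, one mode, several modes). The helper name `modes` and all data values are hypothetical; "no mode" is taken here to mean that no value occurs more than once.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:              # every value occurs once: no mode
        return []
    return [value for value, count in counts.items() if count == highest]

print(modes([0, 1, 2, 3, 4, 5, 6]))            # no value repeats
print(modes([1, 3, 9, 9, 9, 12, 14]))          # a single mode
print(modes(["red", "blue", "red", "blue"]))   # two modes, categorical data
```

Because the function only counts occurrences, it works for categorical labels as well as numbers.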
Quartiles
• Split Ordered Data into 4 Quarters (25% of the data in each)
• Position of i-th Quartile: position of $Q_i = \frac{i(n+1)}{4}$
• $Q_1$ and $Q_3$ Are Measures of Noncentral Location
• $Q_2$ = Median, A Measure of Central Tendency
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
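The quartiles of the ordered array above can be computed with Python's standard library. This is a sketch under one assumption: the i-th quartile sits at position i(n+1)/4 in the ordered array, with linear interpolation between neighbours when the position is fractional. That is what `statistics.quantiles` does with `method="exclusive"`; other textbooks and software use slightly different rules and may give slightly different values.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # ordered array from the slide
n = len(data)

# Positions under the assumed rule i*(n+1)/4 (1-based positions).
positions = [i * (n + 1) / 4 for i in (1, 2, 3)]

# method="exclusive" uses the same positions and interpolates linearly.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

print("positions:", positions)
print("Q1 =", q1, " Q2 (median) =", q2, " Q3 =", q3)
```

With nine observations the positions are 2.5, 5 and 7.5, so Q1 and Q3 fall halfway between two neighbouring values and Q2 is the fifth value, the median.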
Measures of Variation (diagram)
• Range
• Interquartile range
• Variance: population variance, sample variance
• Standard deviation: population standard deviation, sample standard deviation
• Coefficient of variation
Range
• Measure of variation
• Difference between the largest and the smallest observations: $\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
• Ignores the way in which data are distributed
(Figure: two differently distributed data sets on an axis from 7 to 12; in both, Range = 12 - 7 = 5.)
Interquartile Range
• Measure of variation
• Also known as midspread
• Spread in the middle 50%
• Difference between the third and first quartiles: $\text{Interquartile range} = Q_3 - Q_1$
• Not affected by extreme values
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
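As a rough illustration, the range and the interquartile range of the ordered array above can be computed side by side. The quartile rule is an assumption (position i(n+1)/4 with linear interpolation, which is what `method="exclusive"` implements); a different quartile convention would change the interquartile range slightly.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]   # ordered array from the slide

# Range: uses only the two most extreme observations.
data_range = max(data) - min(data)

# Interquartile range: spread of the middle 50% of the data.
q1, _median, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print("range =", data_range)
print("Q1 =", q1, " Q3 =", q3, " IQR =", iqr)
```

Replacing 21 with a much larger value would increase the range but leave Q1, Q3 and the interquartile range unchanged.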
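Variance
• Important measure of variation
• Shows variation about the mean
• Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
• Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$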
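Standard Deviation
• Most important measure of variation
• Shows variation about the mean
• Has the same units as the original data
• Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
• Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$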
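The difference between the sample and population versions is only the divisor (n - 1 versus N). The sketch below computes both by hand and cross-checks them against Python's `statistics` module. The data set is hypothetical and not taken from the slides.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values, not from the slides
n = len(data)
mean = sum(data) / n

# Sum of squared deviations about the mean, shared by both formulas.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

sample_var = sum_sq_dev / (n - 1)   # divisor n - 1 for a sample
pop_var = sum_sq_dev / n            # divisor N for a whole population
sample_sd = math.sqrt(sample_var)   # standard deviation = square root of variance
pop_sd = math.sqrt(pop_var)

print(f"sample:     variance = {sample_var:.4f}, sd = {sample_sd:.4f}")
print(f"population: variance = {pop_var:.4f}, sd = {pop_sd:.4f}")

# The standard library provides the same four quantities.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(pop_sd, statistics.pstdev(data))
```

The sample values are always slightly larger than the population values for the same data, because the sum of squared deviations is divided by a smaller number.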
Comparing Standard Deviations
(Figure: three dot plots on a common axis from 11 to 21.)
• Data A: Mean = 15.5, s = 3.338
• Data B: Mean = 15.5, s = 0.9258
• Data C: Mean = 15.5, s = 4.57
Using Standard Deviation
Here are eight test scores from a previous class: 35, 59, 70, 73, 75, 81, 84, 86. The mean and standard deviation are 70.4 and 16.7, respectively. Work out which data points are within
a) one standard deviation of the mean (53.7 to 87.1), i.e. 59, 70, 73, 75, 81, 84, 86
b) two standard deviations of the mean (37.0 to 103.8), i.e. 59, 70, 73, 75, 81, 84, 86 (35 still falls just below the lower limit)
c) three standard deviations of the mean (20.3 to 120.5), i.e. 35, 59, 70, 73, 75, 81, 84, 86
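The exercise can be checked numerically. The sketch below assumes the quoted 16.7 is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes; the helper name `within` is hypothetical.

```python
from statistics import mean, stdev

scores = [35, 59, 70, 73, 75, 81, 84, 86]   # test scores from the slide
m = mean(scores)
s = stdev(scores)   # sample standard deviation (assumed to be the one quoted)

def within(k):
    """Scores lying within k standard deviations of the mean."""
    return [x for x in scores if abs(x - m) <= k * s]

print(f"mean = {m:.1f}, standard deviation = {s:.1f}")
for k in (1, 2, 3):
    print(f"within {k} sd ({m - k * s:.1f} to {m + k * s:.1f}): {within(k)}")
```

The score 35 lies a little more than two standard deviations below the mean, so it is included only in the three-standard-deviation interval.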
Using Standard Deviation • The example suggests that there may be a general rule which allows us to estimate the fraction of data points which are within a given number of standard deviations of the mean.
Interpreting the Standard Deviation
Chebyshev’s Theorem: The proportion (or fraction) of any data set lying within $K$ standard deviations of the mean is always at least $1 - 1/K^2$, where $K$ is any number greater than 1.
For $K = 2$, at least 3/4 (75%) of all scores fall within 2 standard deviations of the mean, i.e. at least 75% of the data fall between $\bar{X} - 2S$ and $\bar{X} + 2S$.
Interpreting the Standard Deviation
For $K = 3$, at least 8/9 (about 89%) of all scores fall within 3 standard deviations of the mean, i.e. at least 89% of the data fall between $\bar{X} - 3S$ and $\bar{X} + 3S$.
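To see that Chebyshev's bound is a guaranteed minimum rather than an estimate, the sketch below compares it with the fractions actually observed in the eight test scores used earlier. The function names are hypothetical, and the sample standard deviation is used to match the 16.7 quoted for those scores.

```python
from statistics import mean, stdev

def chebyshev_bound(k):
    """Guaranteed minimum fraction of data within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def observed_fraction(values, k):
    """Fraction of values actually lying within k standard deviations."""
    m, s = mean(values), stdev(values)   # sample standard deviation
    return sum(abs(x - m) <= k * s for x in values) / len(values)

scores = [35, 59, 70, 73, 75, 81, 84, 86]   # test scores from the earlier slide
for k in (2, 3):
    print(f"K = {k}: bound = {chebyshev_bound(k):.3f}, "
          f"observed = {observed_fraction(scores, k):.3f}")
```

For these scores the observed fractions are well above the bound, which is typical: the theorem holds for any data set, so it is usually conservative.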
The Empirical Rule
The Empirical Rule states that for bell-shaped (normal) data:
• approximately 68% of all data points are within 1 standard deviation of the mean
• approximately 95% of all data points are within 2 standard deviations of the mean
• approximately 99.7% of all data points are within 3 standard deviations of the mean
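The three percentages are rounded properties of the normal distribution, which can be confirmed with `statistics.NormalDist` (Python 3.8 or later). This is only a check of the theoretical values; real bell-shaped data will match them approximately. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

Because the result depends only on k, the same shares hold for any normal distribution, whatever its mean and standard deviation.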
Coefficient of Variation
• Measures relative variation
• Always expressed as a percentage (%)
• Shows variation relative to the mean: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
• Is used to compare two or more sets of data measured in different units
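Comparing Coefficient of Variation
• Stock A:
  • Average price last year = $50
  • Standard deviation = $5
• Stock B:
  • Average price last year = $100
  • Standard deviation = $5
• Coefficient of variation (standard deviation divided by the mean, as a percentage):
  • Stock A: $CV = (5 / 50) \cdot 100\% = 10\%$
  • Stock B: $CV = (5 / 100) \cdot 100\% = 5\%$
• Both stocks have the same standard deviation, but Stock B is less variable relative to its price.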
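The stock comparison reduces to one line of arithmetic per stock. A minimal sketch, with a hypothetical function name, taking the coefficient of variation to be the standard deviation divided by the mean and expressed as a percentage:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(f"Stock A: CV = {cv_a:.0f}%")
print(f"Stock B: CV = {cv_b:.0f}%")
```

The two standard deviations are identical in dollars, so only the relative measure separates the stocks.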
Shape of a Distribution
• Describes how data are distributed
• Measures of shape: symmetric or skewed
  • Left-skewed: Mean < Median < Mode
  • Symmetric: Mean = Median = Mode
  • Right-skewed: Mode < Median < Mean
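The ordering of mean and median gives a quick, rough indication of shape. The sketch below applies that rule of thumb to three hypothetical data sets; the function name and the data are assumptions, and the check is not a formal test (for example, equal mean and median does not by itself prove symmetry).

```python
from statistics import mean, median

def describe_shape(values, tolerance=1e-9):
    """Rule-of-thumb shape label from the ordering of mean and median."""
    m, med = mean(values), median(values)
    if abs(m - med) <= tolerance:
        return "symmetric (mean = median)"
    # A long right tail pulls the mean above the median, and vice versa.
    return "right-skewed (median < mean)" if m > med else "left-skewed (mean < median)"

# Hypothetical data sets, not taken from the slides.
print(describe_shape([1, 2, 3, 4, 5]))
print(describe_shape([1, 2, 3, 4, 20]))     # one large value on the right
print(describe_shape([1, 17, 18, 19, 20]))  # one small value on the left
```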
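Exploratory Data Analysis
• Box-and-whisker plot
• Graphical display of data using the 5-number summary: $X_{\text{smallest}}$, $Q_1$, Median ($Q_2$), $Q_3$, $X_{\text{largest}}$
(Figure: box-and-whisker plot over an axis marked 4, 6, 8, 10, 12.)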
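The five numbers behind a box-and-whisker plot can be computed directly. This sketch reuses the ordered array from the quartiles slide, because the data behind the figure are not given, and assumes the quartile rule i(n+1)/4 with linear interpolation (`method="exclusive"`). The function name is hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a data set."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, med, q3, ordered[-1]

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # array from the quartiles slide
smallest, q1, med, q3, largest = five_number_summary(data)

print("smallest =", smallest, " Q1 =", q1, " median =", med,
      " Q3 =", q3, " largest =", largest)
```

In the plot, the box spans Q1 to Q3 with a line at the median, and the whiskers extend to the smallest and largest observations.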
Distribution Shape and Box-and-Whisker Plot
(Figure: box-and-whisker plots for left-skewed, symmetric, and right-skewed distributions.)
Chapter Summary
• Described measures of central tendency
  • Mean, median, mode, geometric mean, midrange
• Discussed quartile
• Described measures of variation
  • Range, interquartile range, variance and standard deviation, coefficient of variation
  • Chebyshev’s & Empirical Rules
• Illustrated shape of distribution
  • Symmetric, skewed, box-and-whisker plots