Chapter 4A • Descriptive Statistics (Part 1) • Numerical Description • Central Tendency • Dispersion • McGraw-Hill/Irwin. © 2008 The McGraw-Hill Companies, Inc. All rights reserved.
Numerical Description • Statistics are descriptive measures derived from a sample (n items). • Parameters are descriptive measures derived from a population (N items).
Numerical Description • Three key characteristics of numerical data:
Numerical Description • Example: Vehicle Quality • Consider the data set of vehicle defect rates from J. D. Power and Associates. • Defect rate = (total no. of defects / no. inspected) × 100 • Numerical statistics can be used to summarize this random sample of brands. • Must allow for sampling error since the analysis is based on a sample.
Numerical Description • Number of defects per 100 vehicles, 2004 models.
Numerical Description • Sorted data provides insight into central tendency and dispersion.
Numerical Description • Visual Displays • The dot plot offers a visual impression of the data.
Numerical Description • Visual Displays • Histograms with 5 bins (suggested by Sturges' Rule) and 10 bins are shown below. • Both are symmetric with no extreme values and show a modal class toward the low end.
Descriptive Statistics in Excel • Go to Tools | Data Analysis and select Descriptive Statistics.
Descriptive Statistics in Excel • Highlight the data range, specify a cell for the upper-left corner of the output range, check Summary Statistics and click OK.
Central Tendency • Central tendency describes the middle or typical values of a distribution. • It can be assessed visually with a dot plot or histogram, or more precisely with numerical statistics.
Central Tendency • Six Measures of Central Tendency
Central Tendency • Mean • A familiar measure of central tendency. • In Excel, use function =AVERAGE(Data) where Data is an array of data values.
Central Tendency • Mean • For the sample of n = 37 car brands, the mean is 125.38 defects per 100 vehicles.
Central Tendency • Characteristics of the Mean • Arithmetic mean is the most familiar average. • Affected by every sample item. • The balancing point or fulcrum for the data.
Central Tendency • Characteristics of the Mean • Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero (the absolute distances do not). • Consider the following asymmetric distribution of quiz scores whose mean = 65: 42, 60, 70, 75, 78. • Sum of deviations = (42 – 65) + (60 – 65) + (70 – 65) + (75 – 65) + (78 – 65) = (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0
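The balancing-point property can be checked with a few lines of Python. This is a minimal sketch using the five quiz scores from the slide; the variable names are illustrative only.

```python
# Quiz scores from the slide; their mean is 65.
scores = [42, 60, 70, 75, 78]

mean = sum(scores) / len(scores)

# Signed deviations from the mean: negatives and positives cancel.
deviations = [x - mean for x in scores]
signed_sum = sum(deviations)

# Absolute distances do not cancel, so their sum is positive.
absolute_sum = sum(abs(d) for d in deviations)

print(mean, signed_sum, absolute_sum)
```

The signed deviations cancel exactly, which is why the mean acts as the fulcrum of the data, while the absolute distances add up to 56.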
Central Tendency • Median • The median (M) is the 50th percentile or midpoint of the sorted sample data. • M separates the upper and lower half of the sorted observations. • If n is odd, the median is the middle observation in the data array. • If n is even, the median is the average of the middle two observations in the data array.
Central Tendency • Median • For n = 8, the median is between the fourth and fifth observations in the data array.
Central Tendency • Median • For n = 9, the median is the fifth observation in the data array.
Central Tendency • Median • For even n, Median = $(x_{n/2} + x_{n/2+1})/2$. • Consider the following n = 6 data values: 11 12 15 17 21 32 • What is the median? n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4, so M = $(x_3 + x_4)/2$ = (15 + 17)/2 = 16, which lies between the third and fourth sorted values.
Central Tendency • Median • For odd n, Median = $x_{(n+1)/2}$. • Consider the following n = 7 data values: 12 23 23 25 27 34 41 • What is the median? (n + 1)/2 = (7 + 1)/2 = 8/2 = 4, so M = $x_4$ = 25, the fourth sorted value.
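The odd and even rules above can be sketched as a small Python function. The function name `median` and the 1-based-to-0-based index conversion are choices made for this illustration, not part of the slides.

```python
def median(values):
    """Return the median using the odd/even position rules from the slides."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Odd n: the observation at 1-based position (n + 1) / 2.
        return data[(n + 1) // 2 - 1]
    # Even n: average of the observations at 1-based positions n/2 and n/2 + 1.
    return (data[n // 2 - 1] + data[n // 2]) / 2


even_example = median([11, 12, 15, 17, 21, 32])
odd_example = median([12, 23, 23, 25, 27, 34, 41])
print(even_example, odd_example)
```

Both slide examples are reproduced: 16 for the six-value data set and 25 for the seven-value data set.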
Central Tendency • Median • Use Excel’s function =MEDIAN(Data) where Data is an array of data values. • For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19. • So, the median is x19 = 121. • When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.
Central Tendency • Characteristics of the Median • The median is insensitive to extreme data values. • For example, consider the following quiz scores for 3 students: • Tom's scores: 20, 40, 70, 75, 80 (Mean = 57, Median = 70, Total = 285) • Jake's scores: 60, 65, 70, 90, 95 (Mean = 76, Median = 70, Total = 380) • Mary's scores: 50, 65, 70, 75, 90 (Mean = 70, Median = 70, Total = 350) • What does the median for each student tell you?
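As a rough illustration of this insensitivity, the sketch below recomputes the mean and median for the three students with Python's standard `statistics` module, then lowers Tom's worst score to a hypothetical 0 to show that only the mean moves. The altered score is an assumption made for the demonstration, not data from the slides.

```python
import statistics

# Quiz scores from the slide.
scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

# (mean, median) for each student.
summary = {
    name: (statistics.mean(vals), statistics.median(vals))
    for name, vals in scores.items()
}

# Hypothetical change: Tom's lowest score drops from 20 to 0.
tom_altered = [0, 40, 70, 75, 80]
altered = (statistics.mean(tom_altered), statistics.median(tom_altered))

print(summary)
print(altered)
```

All three students share a median of 70 even though their means differ, and the hypothetical change shifts Tom's mean from 57 to 53 while leaving his median at 70.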
Central Tendency • Mode • The most frequently occurring data value. • Similar to mean and median if data values occur often near the center of sorted data. • May have multiple modes or no mode.
Central Tendency • Mode • For example, consider the following quiz scores for 4 students: • Lee's scores: 60, 70, 70, 70, 80 (Mean = 70, Median = 70, Mode = 70) • Pat's scores: 45, 45, 70, 90, 100 (Mean = 70, Median = 70, Mode = 45) • Sam's scores: 50, 60, 70, 80, 90 (Mean = 70, Median = 70, Mode = none) • Xiao's scores: 50, 50, 70, 90, 90 (Mean = 70, Median = 70, Modes = 50, 90) • What does the mode for each student tell you?
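The quiz-score examples cover one mode, no mode, and two modes. A minimal sketch that handles all three cases is shown below; the helper name `modes` and the convention of returning an empty list when no value repeats are assumptions chosen to match the slide's "Mode = none".

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Every value occurs once: treat the data as having no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)


lee = modes([60, 70, 70, 70, 80])
pat = modes([45, 45, 70, 90, 100])
sam = modes([50, 60, 70, 80, 90])
xiao = modes([50, 50, 70, 90, 90])
print(lee, pat, sam, xiao)
```

Pat's result shows how a mode can sit far from the center of the data: the mode is 45 while the mean and median are both 70.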
Central Tendency • Mode • Easy to define, not easy to calculate in large samples. • Use Excel's function =MODE(Array), which returns #N/A if there is no mode and returns the first mode found if the data are multimodal. • May be far from the middle of the distribution and not at all typical.
Central Tendency • Mode • Generally isn’t useful for continuous data since data values rarely repeat. • Best for attribute data or a discrete variable with a small range (e.g., Likert scale).
Central Tendency • Example: Price/Earnings Ratios and Mode • Consider the following P/E ratios for a random sample of 68 Standard & Poor’s 500 stocks. • What is the mode?
Central Tendency • Example: Price/Earnings Ratios and Mode • Excel’s descriptive statistics results are: • The mode 13 occurs 7 times, but what does the dot plot show?
Central Tendency • Example: Rose Bowl Winners' Points • Points scored by the winning NCAA football team tend to have modes at multiples of 7 because a touchdown with its extra point yields 7 points. • Consider the dot plot of the points scored by the winning team in the first 87 Rose Bowl games. • What is the mode?
Central Tendency • Skewness • Compare mean and median or look at histogram to determine degree of skewness.
Central Tendency • Symptoms of Skewness
Central Tendency • Midrange • The midrange is the point halfway between the lowest and highest values of X: Midrange = $(x_{min} + x_{max})/2$. • Easy to use but sensitive to extreme data values. • For the J. D. Power quality data (n = 37), the midrange (130) is higher than the mean (125.38) or median (121).
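The sensitivity of the midrange to extreme values can be seen in a short sketch. It reuses the five quiz scores from the earlier slide on the mean (not the J. D. Power data, which is not reproduced here) and then replaces the lowest score with a hypothetical outlier of 2.

```python
import statistics


def midrange(values):
    """Point halfway between the smallest and largest observation."""
    return (min(values) + max(values)) / 2


# Quiz scores from the earlier slide on the mean.
scores = [42, 60, 70, 75, 78]
# Hypothetical outlier: the lowest score becomes 2.
with_outlier = [2, 60, 70, 75, 78]

before = (midrange(scores), statistics.median(scores))
after = (midrange(with_outlier), statistics.median(with_outlier))
print(before, after)
```

A single extreme value pulls the midrange from 60 down to 40, while the median stays at 70.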
Dispersion • Variation is the “spread” of data points about the center of the distribution in a sample. Consider the following measures of dispersion: • Measures of Variation
Dispersion • Range • The difference between the largest and smallest observation: Range = $x_{max} - x_{min}$ • For example, for the n = 68 P/E ratios, Range = 91 – 7 = 84
Dispersion • Variance • The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean $\mu$ divided by the population size N. • For the sample variance ($s^2$), we divide by n – 1 instead of n; otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$.
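The difference between the two divisors can be illustrated with a short sketch. It treats the five quiz scores from the earlier slide first as a whole population and then as a sample, which is an assumption made only for the comparison; the results are cross-checked against Python's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

# Quiz scores from the earlier slide on the mean.
scores = [42, 60, 70, 75, 78]
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations around the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

population_variance = sum_sq_dev / n      # divide by N
sample_variance = sum_sq_dev / (n - 1)    # divide by n - 1

# Cross-check against the standard library.
assert math.isclose(population_variance, statistics.pvariance(scores))
assert math.isclose(sample_variance, statistics.variance(scores))

print(sum_sq_dev, population_variance, sample_variance)
```

Dividing by n – 1 always gives the larger value (212 versus 169.6 here), which offsets the tendency of the n divisor to understate the population variance.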
Dispersion • Standard Deviation • The square root of the variance. • Explains how individual values in a data set vary from the mean. • Units of measure are the same as X. • Population standard deviation: $\sigma = \sqrt{\sum_{i=1}^{N} (x_i - \mu)^2 / N}$ • Sample standard deviation: $s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$
Dispersion • Standard Deviation • Excel's built-in functions are =STDEV(Data) and =VAR(Data) for a sample, and =STDEVP(Data) and =VARP(Data) for a population.
Dispersion • Calculating a Standard Deviation • Consider the following five quiz scores for Stephanie.
Dispersion • Calculating a Standard Deviation • Now, calculate the sample standard deviation: • Somewhat easier, the two-sum formula can also be used:
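The two approaches can be compared in a short sketch. Stephanie's scores are not reproduced in this transcript, so the example uses the five quiz scores from the earlier slide on the mean as stand-in data. The two-sum formula is assumed here to be the usual computational form $s = \sqrt{(\sum x_i^2 - (\sum x_i)^2 / n) / (n - 1)}$.

```python
import math

# Stand-in data: quiz scores from the earlier slide, NOT Stephanie's scores.
scores = [42, 60, 70, 75, 78]
n = len(scores)

# Definitional formula: squared deviations around the sample mean.
mean = sum(scores) / n
s_definitional = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

# Two-sum formula: only the sum of x and the sum of x squared are needed.
sum_x = sum(scores)
sum_x_sq = sum(x * x for x in scores)
s_two_sum = math.sqrt((sum_x_sq - sum_x ** 2 / n) / (n - 1))

print(round(s_definitional, 4), round(s_two_sum, 4))
```

Both formulas give the same result for this data. The two-sum form is convenient by hand because it avoids computing each deviation, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is generally the safer choice in code.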
Dispersion • Calculating a Standard Deviation • The standard deviation is nonnegative because deviations around the mean are squared. • When every observation is exactly equal to the mean, the standard deviation is zero. • Standard deviations can be large or small, depending on the units of measure. • Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.