4 Chapter Descriptive Statistics (Part 1) Numerical Description Central Tendency Dispersion
Numerical Description • Statistics are descriptive measures derived from a sample (n items). • Parameters are descriptive measures derived from a population (N items).
Numerical Description • Three key characteristics of numerical data:
Numerical Description • Example: Vehicle Quality • Consider the data set of vehicle defect rates from J. D. Power and Associates. • Defect rate = (total no. defects / no. inspected) x 100 • Numerical statistics can be used to summarize this random sample of brands. • Must allow for sampling error since the analysis is based on sampling.
Numerical Description • Number of defects per 100 vehicles, 2004 models.
Numerical Description • Sorted data provides insight into central tendency and dispersion.
Numerical Description • Visual Displays • The dot plot offers a visual impression of the data.
Numerical Description • Visual Displays • Histograms with 5 bins (suggested by Sturges’ Rule) and 10 bins are shown below. • Both are roughly symmetric with no extreme values and show a modal class toward the low end.
Descriptive Statistics in Excel Go to Tools | Data Analysis and select Descriptive Statistics
Highlight the data range, specify a cell for the upper-left corner of the output range, check Summary Statistics and click OK.
Central Tendency • Central tendency refers to the middle or typical values of a distribution. • Central tendency can be assessed using a dot plot or histogram, or more precisely with numerical statistics.
Central Tendency • Six Measures of Central Tendency
Central Tendency • Mean • A familiar measure of central tendency. • In Excel, use function =AVERAGE(Data) where Data is an array of data values.
Central Tendency • Mean • For the sample of n = 37 car brands, the mean is 125.38.
Central Tendency • Characteristics of the Mean • Arithmetic mean is the most familiar average. • Affected by every sample item. • The balancing point or fulcrum for the data.
Central Tendency • Characteristics of the Mean • Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero (the absolute distances do not). • Consider the following asymmetric distribution of quiz scores whose mean = 65: 42, 60, 70, 75, 78. • Sum of deviations = (42 – 65) + (60 – 65) + (70 – 65) + (75 – 65) + (78 – 65) = (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0
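The balancing-point property can be checked directly. The following is a minimal sketch using the five quiz scores from this slide; the variable names are illustrative only. It shows that the signed deviations from the mean cancel, while the absolute deviations do not.

```python
# Quiz scores from the slide; their mean is 65.
scores = [42, 60, 70, 75, 78]
mean = sum(scores) / len(scores)

# Signed deviations from the mean (negative below, positive above).
deviations = [x - mean for x in scores]

# The signed deviations cancel; the absolute deviations do not.
signed_total = sum(deviations)
absolute_total = sum(abs(d) for d in deviations)

print("mean:", mean)
print("signed deviations:", deviations)
print("sum of signed deviations:", signed_total)
print("sum of absolute deviations:", absolute_total)
```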
Central Tendency • Median • The median (M) is the 50th percentile or midpoint of the sorted sample data. • M separates the upper and lower half of the sorted observations. • If n is odd, the median is the middle observation in the data array. • If n is even, the median is the average of the middle two observations in the data array.
Central Tendency • Median • For n = 8, the median is between the fourth and fifth observations in the data array.
Central Tendency • Median • For n = 9, the median is the fifth observation in the data array.
Central Tendency • Median • For even n, Median = (x_{n/2} + x_{n/2+1}) / 2 • Consider the following n = 6 data values: 11 12 15 17 21 32 • What is the median? n/2 = 6/2 = 3 and n/2+1 = 6/2 + 1 = 4 • M = (x3+x4)/2 = (15+17)/2 = 16
Central Tendency • Median • For odd n, Median = x_{(n+1)/2} • Consider the following n = 7 data values: 12 23 23 25 27 34 41 • What is the median? (n+1)/2 = (7+1)/2 = 8/2 = 4 • M = x4 = 25
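The odd and even rules can be sketched in a few lines. This is an illustrative implementation, not the method Excel uses internally; the function name `median` and the example variable names are assumptions for the sketch. Positions in the slides are 1-based, while Python indexes from 0.

```python
def median(values):
    """Return the median using the odd-n and even-n rules from the slides."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n+1)/2 corresponds to 0-based index n//2.
        return data[mid]
    # Even n: average the observations at 1-based positions n/2 and n/2 + 1.
    return (data[mid - 1] + data[mid]) / 2


# The two examples from the slides.
even_example = median([11, 12, 15, 17, 21, 32])
odd_example = median([12, 23, 23, 25, 27, 34, 41])
print(even_example, odd_example)
```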
Central Tendency • Median • Use Excel’s function =MEDIAN(Data) where Data is an array of data values. • For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19. • So, the median is x19 = 121. • When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.
Central Tendency • Characteristics of the Median • The median is insensitive to extreme data values. • For example, consider the following quiz scores for 3 students: Tom’s scores: 20, 40, 70, 75, 80 Mean =57, Median = 70, Total = 285 Jake’s scores: 60, 65, 70, 90, 95 Mean = 76, Median = 70, Total = 380 Mary’s scores: 50, 65, 70, 75, 90 Mean = 70, Median = 70, Total = 350 • What does the median for each student tell you?
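A short check of the three students' scores, using Python's standard `statistics` module, may help with the question above. The dictionary layout is just one convenient way to hold the data.

```python
from statistics import mean, median

# Quiz scores for the three students on the slide.
scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

# (mean, median) per student: the means differ, the medians do not.
summary = {name: (mean(vals), median(vals)) for name, vals in scores.items()}

for name, (avg, med) in summary.items():
    print(f"{name}: mean = {avg}, median = {med}")
```

All three medians are 70 even though the means range from 57 to 76, because the median depends only on the middle observation and ignores how far the low or high scores lie from it.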
Central Tendency • Mode • The most frequently occurring data value. • Similar to mean and median if data values occur often near the center of sorted data. • May have multiple modes or no mode.
Central Tendency • Mode • For example, consider the following quiz scores for 4 students: Lee’s scores: 60, 70, 70, 70, 80 Mean = 70, Median = 70, Mode = 70 Pat’s scores: 45, 45, 70, 90, 100 Mean = 70, Median = 70, Mode = 45 Sam’s scores: 50, 60, 70, 80, 90 Mean = 70, Median = 70, Mode = none Xiao’s scores: 50, 50, 70, 90, 90 Mean = 70, Median = 70, Modes = 50, 90 • What does the mode for each student tell you?
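The cases above (one mode, no mode, two modes) can be reproduced with a small counting sketch. The helper `modes` is a hypothetical name; it treats data in which no value repeats as having no mode, which matches the convention used for Sam's scores.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Every value occurs once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)


quiz = {
    "Lee": [60, 70, 70, 70, 80],
    "Pat": [45, 45, 70, 90, 100],
    "Sam": [50, 60, 70, 80, 90],
    "Xiao": [50, 50, 70, 90, 90],
}
result = {name: modes(vals) for name, vals in quiz.items()}
print(result)
```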
Central Tendency • Mode • Easy to define, not easy to calculate in large samples. • Use Excel’s function =MODE(Array), which returns #N/A if there is no mode and returns the first mode found if the data are multimodal. • May be far from the middle of the distribution and not at all typical.
Central Tendency • Mode • Generally isn’t useful for continuous data since data values rarely repeat. • Best for attribute data or a discrete variable with a small range (e.g., Likert scale).
Central Tendency • Example: Price/Earnings Ratios and Mode • Consider the following P/E ratios for a random sample of 68 Standard & Poor’s 500 stocks. • What is the mode?
Central Tendency • Example: Price/Earnings Ratios and Mode • Excel’s descriptive statistics results are: • The mode 13 occurs 7 times, but what does the dot plot show?
Central Tendency • Example: Price/Earnings Ratios and Mode • The dot plot shows local modes (a peak with valleys on either side) at 10, 13, 15, 19, 23, 26, 29. • These multiple modes suggest that the mode is not a stable measure of central tendency.
Central Tendency • Example: Rose Bowl Winners’ Points • Points scored by the winning NCAA football team tend to have modes in multiples of 7 because each touchdown typically yields 7 points (6 points plus the extra point). • Consider the dot plot of the points scored by the winning team in the first 87 Rose Bowl games. • What is the mode?
Central Tendency • Mode • A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data. • Occurs when dissimilar populations are combined in one sample. For example,
Central Tendency • Skewness • Compare mean and median or look at histogram to determine degree of skewness.
Central Tendency • Symptoms of Skewness
Central Tendency • Skewness • For the sample of J.D. Power quality ratings, the mean (125.38) exceeds the median (121). What does this suggest?
Central Tendency • Geometric Mean • The geometric mean (G) is a multiplicative average: G = (x1 x2 ... xn)^(1/n), the nth root of the product of the n data values. • For the J. D. Power quality data (n=37): • In Excel use =GEOMEAN(Array) • The geometric mean tends to mitigate the effects of high outliers.
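As a rough illustration of how the geometric mean damps a high outlier, the sketch below uses a small hypothetical sample (not the J. D. Power data, which is not reproduced here). It relies on `statistics.geometric_mean` and `math.prod`, both available in Python 3.8 and later.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical sample with one high outlier (illustrative values only).
sample = [100, 110, 120, 400]

# Library routine, and the nth-root-of-the-product definition by hand.
g = geometric_mean(sample)
g_by_hand = math.prod(sample) ** (1 / len(sample))

arithmetic = mean(sample)
print(f"geometric mean = {g:.2f}, arithmetic mean = {arithmetic:.2f}")
```

For positive data the geometric mean never exceeds the arithmetic mean, and the gap widens as the high values become more extreme.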
Central Tendency • Growth Rates • A variation on the geometric mean used to find the average growth rate for a time series. • For example, from 1998 to 2002, Spirit Airlines revenues are:
Central Tendency • Growth Rates • The average growth rate is given by taking the geometric mean of the ratios of each year’s revenue to the preceding year. • Due to cancellations, only the first and last years are relevant: G = (xn / x1)^(1/(n-1)) - 1 • The five years 1998 to 2002 contain n - 1 = 4 year-over-year ratios, so G = (403/131)^(1/4) - 1 = 1.3244 - 1 = .324 or 32.4% per year • In Excel use =(403/131)^(1/4)-1
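The cancellation argument can be checked numerically. This sketch assumes that 131 and 403 are the 1998 and 2002 revenues (the two figures that appear in the Excel formula) and that the series is annual, so 1998 to 2002 spans four year-over-year growth periods and the root taken is the fourth root. The function name `average_growth_rate` is hypothetical.

```python
import math


def average_growth_rate(first, last, periods):
    """Compound average growth rate over the given number of periods."""
    if periods <= 0:
        raise ValueError("periods must be positive")
    return (last / first) ** (1 / periods) - 1


# Assumed endpoints: first-year and last-year revenue from the slide's formula.
first_revenue, last_revenue = 131, 403
periods = 2002 - 1998  # four year-over-year ratios between five annual values

rate = average_growth_rate(first_revenue, last_revenue, periods)
print(f"average growth rate = {rate:.4f}")

# Compounding the first value at this rate recovers the last value.
recovered = first_revenue * (1 + rate) ** periods
print(f"recovered final revenue = {recovered:.1f}")
```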
Central Tendency • Midrange • The midrange is the point halfway between the lowest and highest values of X: Midrange = (xmin + xmax) / 2 • Easy to use but sensitive to extreme data values. • For the J. D. Power quality data (n=37): • Here, the midrange (130) is higher than the mean (125.38) or median (121).
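A brief sketch of the midrange and its sensitivity to extremes follows. It reuses the five quiz scores from the earlier slide and then appends one hypothetical extreme value (200) purely for illustration.

```python
def midrange(values):
    """Point halfway between the smallest and largest values."""
    return (min(values) + max(values)) / 2


# Quiz scores from the earlier slide, then the same data plus one
# hypothetical extreme value to show how far the midrange moves.
base = [42, 60, 70, 75, 78]
with_outlier = base + [200]

base_midrange = midrange(base)
outlier_midrange = midrange(with_outlier)
print(base_midrange, outlier_midrange)
```

Only the two endpoints enter the calculation, so a single extreme observation moves the midrange by half the distance it adds to the range.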
Central Tendency • Trimmed Mean • To calculate the trimmed mean, first remove the highest and lowest k percent of the observations. • For example, for the n = 68 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05). • To determine how many observations to trim from each end, multiply k x n = 0.05 x 68 = 3.4, which is truncated to 3 observations. • So, we would remove the three smallest and three largest observations before averaging the remaining values.
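The trimming steps can be sketched as follows. The P/E data are not reproduced in these slides, so the sample here is hypothetical: 68 values with one very high observation. The function `trimmed_mean` is an illustrative helper that follows the slide's rule of truncating k x n and removing that many observations from each end; it is not a reproduction of any spreadsheet function.

```python
def trimmed_mean(values, k):
    """Mean after dropping int(k * n) observations from each end."""
    data = sorted(values)
    n = len(data)
    cut = int(k * n)  # e.g. 0.05 * 68 = 3.4, truncated to 3
    if 2 * cut >= n:
        raise ValueError("trimming would remove every observation")
    kept = data[cut:n - cut]
    return sum(kept) / len(kept)


# Hypothetical sample: 67 evenly spaced values plus one extreme value (n = 68).
sample = list(range(1, 68)) + [500]

observations_cut = int(0.05 * len(sample))
plain = sum(sample) / len(sample)
trimmed = trimmed_mean(sample, 0.05)
print(f"cut per end = {observations_cut}, mean = {plain:.2f}, trimmed = {trimmed:.2f}")
```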
Central Tendency • Trimmed Mean • Here is a summary of all the measures of central tendency for the n = 68 P/E values. • The trimmed mean mitigates the effects of very high values, but still exceeds the median.
Central Tendency • Trimmed Mean • The Federal Reserve uses a 16% trimmed mean to mitigate the effects of extremes in its analysis of the Consumer Price Index.
Dispersion • Variation is the “spread” of data points about the center of the distribution in a sample. Consider the following measures of dispersion: • Measures of Variation