Business Statistics: A Decision-Making Approach, 7th Edition. Chapter 3: Describing Data Using Numerical Measures
Chapter Goals After completing this chapter, you should be able to: • Compute and interpret the mean, median, and mode for a set of data • Compute the range, variance, and standard deviation and know what these values mean • Construct and interpret a box and whisker graph • Compute and explain the coefficient of variation and z scores • Use numerical measures along with graphs, charts, and tables to describe data
Chapter Topics • Measures of Center and Location • Mean, median, mode • Other measures of Location • Weighted mean, percentiles, quartiles • Measures of Variation • Range, interquartile range, variance and standard deviation, coefficient of variation • Using the mean and standard deviation together • Coefficient of variation, z-scores
Summary Measures: Describing Data Numerically • Center and Location: mean, median, mode, weighted mean • Other Measures of Location: percentiles, quartiles • Variation: range, interquartile range, variance, standard deviation, coefficient of variation
Measures of Center and Location • Mean • Median • Mode • Weighted Mean
Mean (Average) • The most common measure of central tendency • Mean = sum of values divided by the number of values • Affected by extreme values (outliers) • [Figure: two dot plots on a 0 to 10 axis, one with Mean = 3 and one with Mean = 4]
Median • In an ordered array, the median is the “middle” number, i.e., the number that splits the distribution in half • The median is not affected by extreme values • Useful when data are highly skewed • [Figure: two dot plots on a 0 to 10 axis, both with Median = 3]
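The contrast between the two measures can be checked with a short Python sketch. The two data sets below are assumptions chosen to agree with the values shown on the Mean and Median slides (Mean = 3 and 4, Median = 3 and 3); the plotted points themselves are not preserved in this text.

```python
from statistics import mean, median

# Hypothetical data sets (not taken from the slides' figures): the second
# one replaces the largest value with an outlier.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

mean_a, median_a = mean(without_outlier), median(without_outlier)
mean_b, median_b = mean(with_outlier), median(with_outlier)

# The outlier pulls the mean upward but leaves the median where it was.
print("without outlier:", mean_a, median_a)
print("with outlier:   ", mean_b, median_b)
```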
Which measure of location is the “best”? • The mean is generally used, unless extreme values (outliers) exist • Then the median is often used, since the median is not sensitive to extreme values • Example: Realtors report “median home prices,” not average prices, because the median is less sensitive to outliers
Shape of a Distribution based on Mean and Median • Describes how data are distributed • Symmetric or skewed • Left-skewed: Mean < Median (longer tail extends to the left) • Symmetric: Mean = Median • Right-skewed: Median < Mean (longer tail extends to the right)
Mode • The value that is repeated most often • Not affected by extreme values • Used for either numerical or categorical data • There may be no mode • There may be several modes • [Figure: two dot plots, one with Mode = 5 and one with no mode]
Five houses on a hill by the beach • House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Implication • House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (Sum: $3,000,000) • Mean: $3,000,000 / 5 = $600,000 • Median: middle value of ranked data = $300,000 • Mode: most frequent value = $100,000
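The three values for the house price example can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import mean, median, mode

# House prices from the slide, in dollars.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

avg_price = mean(house_prices)       # sum of values / number of values
middle_price = median(house_prices)  # middle value of the ranked data
common_price = mode(house_prices)    # most frequent value

print(avg_price, middle_price, common_price)
```

The single $2,000,000 house pulls the mean to twice the median, which is why the median is the more representative figure for these five houses.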
Practice • From the class website, download “describing numerical data using Excel.” • Try the following: • Mean vs. Median • Hockey • Shoes
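Weighted Mean • Used when values are grouped by frequency or relative importance (e.g., GPA) • Weighted mean: $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$, where $w_i$ is the weight (frequency) of value $x_i$ • Example: Sample of 26 Repair Projects • Weighted Mean Days to Complete: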
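A weighted mean can be sketched in a few lines of Python. The slide's table for the 26 repair projects is not preserved in this text, so the days and frequencies below are hypothetical values chosen only so that the frequencies sum to 26.

```python
# Hypothetical frequency table (NOT the slide's data): days needed to
# complete a repair project, and how many projects took that long.
days_to_complete = [5, 6, 7, 8]
frequency = [4, 12, 8, 2]

total_projects = sum(frequency)

# Weighted mean = sum(w_i * x_i) / sum(w_i), with frequencies as weights.
weighted_mean = (
    sum(w * x for w, x in zip(frequency, days_to_complete)) / total_projects
)

print(total_projects, round(weighted_mean, 2))
```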
Other Measures of Location • Percentiles: A percentile is a measure that tells us what percent of the total frequency scored at or below that measure • Example: an SAT score at the 90th percentile is as high as or higher than the scores of 90% of the students who took the SAT • Quartiles: When statistical data are ordered sequentially, the data can be divided into 4 groups • The boundaries that separate the groups are called quartiles • Example 3-8 on p. 99
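Percentiles • The pth percentile in an ordered array of n values is the value in the ith position, where $i = \frac{p}{100}(n)$ • Example 1: Find the 60th percentile in an ordered array of 19 values: $i = \frac{60}{100}(19) = 11.4$, so use the value in the i = 12th position • If i is not an integer, round up to the next higher integer value • If i is an integer, the pth percentile is the average of the values in positions i and i+1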
Quartiles • Quartiles split the ranked data into 4 equal groups • Note that the second quartile (the 50th percentile) is the median • [Figure: ranked data divided into four 25% segments by Q1, Q2, and Q3]
Quartiles • Example: Find the first quartile (25th percentile) • Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9) • Q1 = 25th percentile, so find i: $i = \frac{25}{100}(9) = 2.25$, so round up and use the value in the 3rd position: Q1 = 13
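The position rule used in these percentile and quartile examples can be written as a small Python function. This is a sketch of the slides' rule only, assuming the position is i = (p/100)·n with whole-number p; spreadsheet and library percentile functions often interpolate differently and may return other values.

```python
def percentile(ordered, p):
    """p-th percentile of an ascending list, for integer p with 0 < p < 100."""
    if not ordered:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(ordered)
    # i = (p / 100) * n, kept as quotient and remainder so that the
    # "is i an integer?" check does not depend on floating-point rounding.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not an integer: round up; position whole + 1 is index `whole`.
    return ordered[whole]


sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ordered array from the slide
q1 = percentile(sample, 25)
q2 = percentile(sample, 50)
q3 = percentile(sample, 75)
print(q1, q2, q3)
```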
Practice • Using the “describing numerical data using Excel” file, try the following: • Salary
Box and Whisker Plot • Statistics assumes that your data points (the numbers in your list) are clustered around some central value • The "box" in the box-and-whisker plot contains and highlights the middle half of the data points • It is useful for quickly finding outliers: data points out of line with the rest of the data set • The box-and-whisker plot can be shown in either vertical or horizontal format • Example 3-9 on p. 100
Box and Whisker Plot • Outliers are plotted outside the calculated limits, and the center box extends from Q1 to Q3 • The lower limit is Q1 - 1.5 (Q3 - Q1) • The upper limit is Q3 + 1.5 (Q3 - Q1) • [Figure: box-and-whisker plot marking the lower limit, 1st quartile, median, 3rd quartile, and upper limit, with 25% of the data in each segment and outliers shown as * beyond the limits]
Box-and-Whisker Plot Example • Data: 0 2 2 2 3 3 4 5 6 11 27 • Min = 0, Q1 = 2, Q2 = 3, Q3 = 6, Max = 27 • Upper limit = Q3 + 1.5 (Q3 - Q1) = 6 + 1.5 (6 - 2) = 12 • 27 is above the upper limit, so it is shown as an outlier • These data are right-skewed, as the plot depicts • [Figure: box-and-whisker plot of the data, with 27 marked * as an outlier]
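The quartiles, limits, and outlier in this example can be recomputed with a short Python sketch. The `percentile` helper is an illustrative implementation of the position rule from the Percentiles slide, not a library function, and it assumes whole-number percentiles.

```python
def percentile(ordered, p):
    """p-th percentile of a non-empty ascending list (integer 0 < p < 100)."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # Integer position: average positions i and i + 1 (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise round the position up.
    return ordered[whole]


data = [0, 2, 2, 2, 3, 3, 4, 5, 6, 11, 27]  # already in ascending order

q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values beyond either limit are plotted individually as outliers.
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q2, q3, lower_limit, upper_limit, outliers)
```

Note that 11 stays inside the upper limit of 12, so only 27 is flagged.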
Distribution Shape and Box and Whisker Plot • [Figure: three box-and-whisker plots, one left-skewed, one symmetric, and one right-skewed, each marked with Q1, Q2, and Q3]
Practice • Using Free Online Tool • Using the data set on the website, develop a Box-and-Whisker Plot (free online tool).
Measures of Variation • Range • Interquartile Range • Variance: population variance, sample variance • Standard Deviation: population standard deviation, sample standard deviation • Coefficient of Variation
Variation • Measures of variation give information on the spread of the data values • [Figure: two distributions with the same center but different variation]
Range (refer to Measures of Variability) • Simplest measure of variation • Difference between the largest and the smallest observations: Range = $x_{\text{maximum}} - x_{\text{minimum}}$ • Example: [Figure: dot plot on a 0 to 14 axis, with smallest observation 1 and largest observation 14] Range = 14 - 1 = 13
Disadvantages of the Range • Ignores the way in which data are distributed: [Figure: two differently distributed data sets on a 7 to 12 axis, each with Range = 12 - 7 = 5] • Sensitive to outliers (extreme values): 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 has Range = 5 - 1 = 4, while 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 has Range = 120 - 1 = 119
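The outlier sensitivity shown on this slide is easy to confirm in Python, using the slide's two lists, which differ only in their last value. The helper name `value_range` is illustrative.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# The two data sets from the slide.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
typical = base + [5]
with_outlier = base + [120]

range_typical = value_range(typical)
range_outlier = value_range(with_outlier)

print(range_typical, range_outlier)
```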
Interquartile Range (refer to Measures of Variability) • Reduces the range's susceptibility to extreme values, which can eliminate some outlier problems • Eliminates some high- and low-valued observations and calculates the range from the remaining values • Interquartile range = 3rd quartile - 1st quartile • Example 3-10 on p. 108
Interquartile Range Example • X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70, with 25% of the data in each segment • Interquartile range = 57 - 30 = 27
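Variance • Average of squared deviations of values from the mean • Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$ • Sample variance: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$
Standard Deviation • Most commonly used measure of variation • Shows variation about the mean • Has the same units as the original data • Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$ • Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$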
Comparing Standard Deviations • Same mean, but different standard deviations • Data A: Mean = 15.5, s = 3.338 • Data B: Mean = 15.5, s = 0.9258 • Data C: Mean = 15.5, s = 4.57 • [Figure: three dot plots on an 11 to 21 axis, one per data set]
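Sample and population variance and standard deviation can be computed with Python's `statistics` module. The plotted values for Data A are not preserved in this text, so the list below is an assumed data set, chosen because it reproduces the stated Mean = 15.5 and s = 3.338; it may not be the exact data behind the figure.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

# Assumed values (hypothetical reconstruction of Data A, not from the slide).
data_a = [11, 12, 13, 16, 16, 17, 18, 21]

x_bar = mean(data_a)

# Sample statistics divide the squared deviations by n - 1.
sample_var = variance(data_a)
sample_sd = stdev(data_a)

# Population statistics divide by N instead.
pop_var = pvariance(data_a)
pop_sd = pstdev(data_a)

# The same sample standard deviation, written out from its definition.
manual_sd = sqrt(sum((x - x_bar) ** 2 for x in data_a) / (len(data_a) - 1))

print(x_bar, round(sample_sd, 3), round(pop_sd, 3))
```

Dividing by n - 1 rather than N makes the sample figures slightly larger than the population figures for the same values.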
Practice • Try the “Variance vs. Standard Deviation” link on the class website • Using the “describing numerical data using Excel” file, try the following: • Std.
Coefficient of Variation • The coefficient of variation (CV) measures the relative variation for distributions with different means • In finance, CV measures the relative risk of a stock portfolio • It is usually expressed as a percentage • Population: $CV = \frac{\sigma}{\mu}(100\%)$ • Sample: $CV = \frac{s}{\bar{x}}(100\%)$
Comparing Coefficients of Variation • Stock A: average price last year = $50, standard deviation = $5, so $CV_A = \frac{5}{50}(100\%) = 10\%$ • Stock B: average price last year = $100, standard deviation = $5, so $CV_B = \frac{5}{100}(100\%) = 5\%$ • Both stocks have the same standard deviation, but stock B is less variable relative to its price
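The stock comparison amounts to dividing each standard deviation by its mean. A minimal Python sketch follows; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean_value


# Average price and standard deviation from the slide, in dollars.
cv_stock_a = coefficient_of_variation(std_dev=5, mean_value=50)
cv_stock_b = coefficient_of_variation(std_dev=5, mean_value=100)

print(f"Stock A: {cv_stock_a:.0f}%  Stock B: {cv_stock_b:.0f}%")
```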
The Empirical Rule • If the data distribution is bell-shaped, then the interval: • $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
The Empirical Rule (continued) • $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample • $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
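The 68%, 95%, and 99.7% figures can be checked against an exactly normal distribution. This sketch assumes the intervals are the mean plus or minus 1, 2, and 3 standard deviations (the usual statement of the empirical rule) and uses the identity P(|Z| < k) = erf(k / √2) for a standard normal variable Z.

```python
from math import erf, sqrt


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

Real data that are only roughly bell-shaped will match these shares approximately, which is why the rule is stated with "about".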
Practice: Getting Descriptive Statistics Using Excel • Descriptive statistics are easy to obtain from Microsoft Excel • Use menu choice: Data / Data Analysis / Descriptive Statistics • Enter details in dialog box
Open Excel and then enter the data • Select: Data / Data Analysis / Descriptive Statistics
Using Excel (continued) • Enter dialog box details • Check box for summary statistics • Click OK
Excel output • Microsoft Excel descriptive statistics output, using the house price data: $2,000,000; $500,000; $300,000; $100,000; $100,000 • [Figure: Excel descriptive statistics output table] • How to read “Skewness”?
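Since the Excel output itself is not reproduced here, a comparable summary can be sketched in Python for the same house prices. The skewness function below assumes the adjusted sample formula documented for Excel's SKEW function, n / ((n - 1)(n - 2)) · Σ((x - x̄) / s)³; the function and dictionary names are illustrative.

```python
from statistics import mean, median, mode, stdev

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]


def sample_skewness(values):
    """Adjusted sample skewness (assumed to match Excel's SKEW formula)."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)


summary = {
    "Mean": mean(house_prices),
    "Median": median(house_prices),
    "Mode": mode(house_prices),
    "Standard Deviation": stdev(house_prices),
    "Range": max(house_prices) - min(house_prices),
    "Minimum": min(house_prices),
    "Maximum": max(house_prices),
    "Sum": sum(house_prices),
    "Count": len(house_prices),
    "Skewness": sample_skewness(house_prices),
}

for label, value in summary.items():
    print(f"{label:<20}{value:>14,.2f}")
```

A positive skewness value indicates a longer right tail, which agrees with Median < Mean for these prices.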
Chapter Summary • Described measures of center and location • Mean, median, mode, weighted mean • Discussed percentiles and quartiles • Created Box and Whisker Plots • Illustrated distribution shapes • Symmetric, skewed
Chapter Summary (continued) • Described measures of variation • Range, interquartile range, variance, standard deviation, coefficient of variation • Discussed Tchebysheff’s Theorem • Calculated standardized data values