
Prepared by Lloyd R. Jaisingh

A PowerPoint Presentation Package to Accompany Applied Statistics in Business & Economics, 4th edition, David P. Doane and Lori E. Seward. Prepared by Lloyd R. Jaisingh. Chapter Contents 4.1 Numerical Description 4.2 Measures of Center 4.3 Measures of Variability 4.4 Standardized Data

Presentation Transcript


  1. A PowerPoint Presentation Package to Accompany Applied Statistics in Business & Economics, 4th edition David P. Doane and Lori E. Seward Prepared by Lloyd R. Jaisingh

  2. Chapter Contents 4.1 Numerical Description 4.2 Measures of Center 4.3 Measures of Variability 4.4 Standardized Data 4.5 Percentiles, Quartiles, and Box Plots 4.6 Correlation and Covariance 4.7 Grouped Data 4.8 Skewness and Kurtosis Descriptive Statistics Chapter 4

  3. Chapter Learning Objectives LO4-1: Explain the concepts of center, variability, and shape. LO4-2: Use Excel to obtain descriptive statistics and visual displays. LO4-3: Calculate and interpret common measures of center. LO4-4: Calculate and interpret common measures of variability. LO4-5: Transform a data set into standardized values. LO4-6: Apply the Empirical Rule and recognize outliers. Descriptive Statistics Chapter 4

  4. Chapter Learning Objectives LO4-7: Calculate quartiles and other percentiles. LO4-8: Make and interpret box plots. LO4-9: Calculate and interpret a correlation coefficient and covariance. LO4-10: Calculate the mean and standard deviation from grouped data. LO4-11: Assess skewness and kurtosis in a sample. Descriptive Statistics Chapter 4

  5. 4.1 Numerical Description LO4-1 Chapter 4 LO4-1: Explain the concepts of center, variability, and shape. Three key characteristics of numerical data: center, variability, and shape.

  6. 4.1 Numerical Description LO4-2 Chapter 4 LO4-2: Use Excel to obtain descriptive statistics and visual displays. EXCEL Histogram Display for Tables 4.3

  7. 4.2 Measures of Center LO4-3 Chapter 4 LO4-3: Calculate and interpret common measures of center. Mean • A familiar measure of center. • In Excel, use the function =AVERAGE(Data), where Data is an array of data values.

  8. 4.2 Measures of Center LO4-3 Chapter 4 Median • The median (M) is the 50th percentile or midpoint of the sorted sample data. • M separates the upper and lower halves of the sorted observations. • If n is odd, the median is the middle observation in the data array. • If n is even, the median is the average of the middle two observations in the data array.

  9. 4.2 Measures of Center LO4-3 Chapter 4 Mode • The most frequently occurring data value. • May have multiple modes or no mode. • The mode is most useful for discrete or categorical data with only a few distinct data values. • For continuous data or data with a wide range, the mode is rarely useful.
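
The three measures of center described so far (mean, median, mode) can be sketched with Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration; they do not come from the textbook.

```python
from statistics import mean, median, multimode

# Hypothetical sample, chosen only for illustration.
data = [3, 5, 5, 7, 8, 9, 12, 15]

sample_mean = mean(data)        # arithmetic average, like Excel's =AVERAGE(Data)
sample_median = median(data)    # n is even, so this averages the two middle values
sample_modes = multimode(data)  # list of every most-frequent value

print(sample_mean, sample_median, sample_modes)
```

`multimode` returns a list, which fits the slide's point that a data set may have more than one mode.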

  10. 4.2 Measures of Center LO4-1 Chapter 4 LO4-1: Explain the concepts of center, variability, and shape. Shape • Compare mean and median or look at the histogram to determine degree of skewness. • Figure 4.10 shows prototype population shapes showing varying degrees of skewness.

  11. 4.2 Measures of Center LO4-3 Chapter 4 Geometric Mean • The geometric mean (G) is a multiplicative average. Growth Rates A variation on the geometric mean used to find the average growth rate for a time series.

  12. 4.2 Measures of Center LO4-3 Chapter 4 Growth Rates • For example, from 2006 to 2010, JetBlue Airlines revenues are: The average growth rate: or 12.5 % per year.
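
The formulas and revenue figures on these two slides did not survive in the transcript. The sketch below assumes the standard definitions: the geometric mean is the n-th root of the product of n positive values, and the average growth rate over n observations is (last / first)^(1/(n-1)) - 1. The function names and the revenue series are hypothetical, not the JetBlue data.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

def average_growth_rate(series):
    """Average per-period growth rate from the first and last values of a series."""
    periods = len(series) - 1
    return (series[-1] / series[0]) ** (1 / periods) - 1

# Hypothetical figures, not the JetBlue revenues from the slide.
revenue = [100.0, 110.0, 121.0]

g = geometric_mean([2, 8])
growth = average_growth_rate(revenue)
print(g, growth)  # roughly 4.0 and 0.10 (10% per period)
```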

  13. 4.2 Measures of Center LO4-3 Chapter 4 Midrange • The midrange is the point halfway between the lowest and highest values of X. • Easy to use but sensitive to extreme data values. • For the J.D. Power quality data: • Here, the midrange (126.5) is higher than the mean (114.70) or median (113).
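
A minimal sketch of the midrange, using a hypothetical sample with one extreme value to show why the slide calls it sensitive to extremes. This is not the J.D. Power data.

```python
from statistics import mean, median

def midrange(values):
    """Point halfway between the lowest and highest values."""
    return (min(values) + max(values)) / 2

# Hypothetical sample with one extreme value.
data = [10, 12, 13, 15, 50]

print(midrange(data), mean(data), median(data))
```

Here the single value 50 pulls the midrange far above both the mean and the median.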

  14. 4.2 Measures of Center LO4-3 Chapter 4 Trimmed Mean • To calculate the trimmed mean, first remove the highest and lowest k percent of the observations. • For example, for the n = 33 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05). • To determine how many observations to trim from each end, multiply k by n: 0.05 × 33 = 1.65, which rounds to 2 observations. • So, we would remove the two smallest and two largest observations before averaging the remaining values.
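
The trimming steps above can be sketched as follows. The function name and the data are hypothetical. Note one assumption: Python's `round` is used for the k × n step, which matches the slide's 1.65 → 2 but rounds exact halves to the nearest even integer.

```python
def trimmed_mean(values, k):
    """Mean after dropping round(k * n) observations from each end of the sorted data."""
    ordered = sorted(values)
    trim = round(k * len(ordered))  # e.g. 0.05 * 33 = 1.65, rounds to 2
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)

# Hypothetical n = 33 sample: 1..32 plus one very large value.
data = list(range(1, 33)) + [1000]

plain_mean = sum(data) / len(data)
trimmed = trimmed_mean(data, 0.05)
print(plain_mean, trimmed)
```

The two smallest values (1, 2) and the two largest (32, 1000) are removed, so the extreme value no longer affects the result.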

  15. 4.2 Measures of Center LO4-3 Chapter 4 Trimmed Mean • Here is a summary of all the measures of central tendency for the J.D. Power data. • The trimmed mean mitigates the effects of very high values, but still exceeds the median.

  16. 4.3 Measures of Variability LO4-4 Chapter 4 LO4-4: Calculate and interpret common measures of variability. • Variation is the “spread” of data points about the center of the distribution in a sample. Consider the following measures of variability: Measures of Variability

  17. 4.3 Measures of Variability LO4-4 Chapter 4 Measures of Variation

  18. 4.3 Measures of Variability LO4-4 Chapter 4 Measures of Variability Population variance Population standard deviation
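
The variance and standard deviation formulas on these slides were lost in the transcript. A minimal sketch, assuming the standard definitions (the population versions divide by N, the sample versions divide by n - 1) and hypothetical data:

```python
from statistics import pvariance, pstdev, variance, stdev

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)  # sum of squared deviations divided by N
pop_sd = pstdev(data)      # square root of the population variance
samp_var = variance(data)  # sum of squared deviations divided by n - 1
samp_sd = stdev(data)      # square root of the sample variance

print(pop_var, pop_sd, samp_var, samp_sd)
```

The sample versions are always slightly larger than the population versions for the same data because of the smaller divisor.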

  19. 4.3 Measures of Variability LO4-4 Chapter 4 Coefficient of Variation • Useful for comparing variables measured in different units or with different means. • A unit-free measure of dispersion. • Expressed as a percent of the mean. • Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.
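
A sketch of the coefficient of variation, assuming the usual sample form CV = 100 × s / x̄ (the slide's formula is missing from the transcript). The function name and the heights are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percent of the mean."""
    m = mean(values)
    if m <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * stdev(values) / m

# The same hypothetical heights in two different units.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(cv_cm, cv_m)  # the two values agree, since the CV is unit-free
```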

  20. 4.3 Measures of Variability LO4-4 Chapter 4 Mean Absolute Deviation • This statistic reveals the average distance from the center. • Absolute values must be used since otherwise the deviations around the mean would sum to zero. It is stated in the unit of measurement. • The MAD is appealing because of its simple interpretation.
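
A short sketch of the mean absolute deviation on hypothetical data. It also shows the point made above: the raw deviations around the mean sum to zero, which is why absolute values are needed.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

raw_deviation_sum = sum(x - 5 for x in data)
mad = mean_absolute_deviation(data)
print(raw_deviation_sum, mad)
```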

  21. 4.3 Measures of Variability LO4-1 Chapter 4 Central Tendency vs. Dispersion: Manufacturing • Take frequent samples to monitor quality.

  22. 4.4 Standardized Data Chapter 4 Chebyshev’s Theorem • For any population with mean μ and standard deviation σ, the percentage of observations that lie within k standard deviations of the mean must be at least 100[1 – 1/k²]. • For k = 2 standard deviations, 100[1 – 1/2²] = 75%. • So, at least 75.0% will lie within μ ± 2σ. • For k = 3 standard deviations, 100[1 – 1/3²] = 88.9%. • So, at least 88.9% will lie within μ ± 3σ. • Although applicable to any data set, these limits tend to be rather wide.
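
The Chebyshev bound above is simple enough to check directly. A minimal sketch; the function name is hypothetical, and the guard for k ≤ 1 is an added assumption (the bound says nothing useful there).

```python
def chebyshev_lower_bound(k):
    """Minimum percent of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)

bound_2 = chebyshev_lower_bound(2)
bound_3 = chebyshev_lower_bound(3)
print(bound_2, round(bound_3, 1))
```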

  23. 4.4 Standardized Data Chapter 4 The Empirical Rule • The normal distribution is symmetric and is also known as the bell-shaped curve. • The Empirical Rule states that for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of data. For k = 1, 68.26% will lie within μ ± 1σ. For k = 2, 95.44% will lie within μ ± 2σ. For k = 3, 99.73% will lie within μ ± 3σ.

  24. 4.4 Standardized Data Chapter 4 The Empirical Rule Note: No upper bound is given. Data values outside μ ± 3σ are rare.
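
The Empirical Rule percentages can be reproduced from the normal distribution itself. This sketch assumes the identity P(|Z| ≤ k) = erf(k / √2) for a standard normal Z; the function name is hypothetical. The computed values may differ from the slide's figures in the last digit, presumably because the slide's values come from a rounded table.

```python
import math

def percent_within(k):
    """Percent of a normal population lying within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(percent_within(k), 2))
```

Comparing these with the Chebyshev bounds (75% and 88.9% for k = 2 and 3) shows how much tighter the limits are when normality can be assumed.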

  25. 4.4 Standardized Data LO4-5 Chapter 4 LO4-5: Transform a data set into standardized values. • A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean. Standardization formula for a population: z = (x – μ)/σ. Standardization formula for a sample (for n > 30): z = (x – x̄)/s. A negative z value means the observation is to the left of the mean. A positive z value means the observation is to the right of the mean.
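
A minimal sketch of standardization, assuming the usual sample form z = (x - x̄) / s. The function name and data are hypothetical; the tiny sample is used only to keep the arithmetic visible.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert each observation to its distance from the mean in standard deviations."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical data: mean 4, sample standard deviation 2.
z_scores = standardize([2, 4, 6])
print(z_scores)  # negative = left of the mean, positive = right of the mean
```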

  26. 4.4 Standardized Data LO4-6 Chapter 4 LO4-6: Apply the Empirical Rule and recognize outliers.

  27. 4.4 Standardized Data Chapter 4 Estimating Sigma • For a normal distribution, the range of values is almost 6σ (from μ – 3σ to μ + 3σ). • If you know the range R (high – low), you can estimate the standard deviation as σ = R/6. • Useful for approximating the standard deviation when only R is known. • This estimate depends on the assumption of normality.

  28. 4.5 Percentiles, Quartiles, and Box Plots LO4-7 Chapter 4 LO4-7: Calculate quartiles and other percentiles. Percentiles • Percentiles are data that have been divided into 100 groups. • For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you. • Deciles are data that have been divided into 10 groups. • Quintiles are data that have been divided into 5 groups. • Quartiles are data that have been divided into 4 groups.

  29. 4.5 Percentiles, Quartiles, and Box Plots LO4-7 Chapter 4 Percentiles • Percentiles may be used to establish benchmarks for comparison purposes (e.g. health care, manufacturing, and banking industries use 5th, 25th, 50th, 75th and 90th percentiles). • Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios. • Percentiles can be used in employee merit evaluation and salary benchmarking.

  30. 4.5 Percentiles, Quartiles, and Box Plots LO4-7 Chapter 4 Quartiles • Quartiles are scale points that divide the sorted data into four groups of approximately equal size. • The three values that separate the four groups are called Q1, Q2, and Q3, respectively.

  31. 4.5 Percentiles, Quartiles, and Box Plots LO4-7 Chapter 4 Quartiles • The second quartile Q2 is the median, a measure of central tendency. • Q1 and Q3 measure dispersion since the interquartile range Q3 – Q1 measures the degree of spread in the middle 50 percent of data values.

  32. 4.5 Percentiles, Quartiles, and Box Plots LO4-7 Chapter 4 Quartiles – The method of medians • The first quartile Q1 is the median of the data values below Q2, and the third quartile Q3 is the median of the data values above Q2. • For the first half of the data, 50% lie above and 50% below Q1. For the second half of the data, 50% lie above and 50% below Q3.

  33. 4.5 Percentiles, Quartiles, and Box Plots LO4-7 Chapter 4 Method of Medians • For small data sets, find quartiles using the method of medians: Step 1: Sort the observations. Step 2: Find the median Q2. Step 3: Find the median of the data values that lie below Q2. Step 4: Find the median of the data values that lie above Q2.
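
The four steps above can be sketched as a small function. One assumption is labeled in the code: when n is odd, the middle observation is left out of both halves, which is one reading of "values that lie below/above Q2". The function name and data are hypothetical.

```python
from statistics import median

def quartiles_by_medians(values):
    """Return (Q1, Q2, Q3) using the method of medians."""
    ordered = sorted(values)                 # Step 1: sort the observations
    n = len(ordered)
    q2 = median(ordered)                     # Step 2: find the median Q2
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle observation is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), q2, median(upper)  # Steps 3 and 4

even_case = quartiles_by_medians([8, 1, 6, 3, 5, 2, 7, 4])
odd_case = quartiles_by_medians([1, 2, 3, 4, 5, 6, 7])
print(even_case, odd_case)
```

Other quartile conventions exist (Excel and most statistics packages interpolate), so results for small samples can differ slightly between tools.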

  34. 4.5 Percentiles, Quartiles, and Box Plots LO4-7 Chapter 4 Method of Medians Example:

  35. 4.5 Percentiles, Quartiles, and Box Plots LO4-7 Chapter 4 Example: P/E Ratios and Quartiles • So, to summarize: • These quartiles express central tendency and dispersion. What is the interquartile range?

  36. 4.5 Percentiles, Quartiles, and Box Plots LO4-8 Chapter 4 LO4-8: Make and interpret box plots. • A useful tool of exploratory data analysis (EDA). • Also called a box-and-whisker plot. • Based on a five-number summary: Xmin, Q1, Q2, Q3, Xmax • Consider the five-number summary for the previous P/E ratios example: Xmin = 7, Q1 = 27, Q2 = 35.5, Q3 = 40.5, Xmax = 49.
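
Using the five-number summary given on this slide (7, 27, 35.5, 40.5, 49), the interquartile range asked about on the previous slide follows directly. The variable names are hypothetical.

```python
# Five-number summary for the P/E ratios example, as listed on the slide.
summary = {"xmin": 7, "q1": 27, "q2": 35.5, "q3": 40.5, "xmax": 49}

iqr = summary["q3"] - summary["q1"]           # spread of the middle 50 percent
data_range = summary["xmax"] - summary["xmin"]  # full spread, high minus low

print(iqr, data_range)
```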

  37. 4.5 Percentiles, Quartiles, and Box Plots LO4-8 Chapter 4 Box Plots • The box plot is displayed visually, like this. • A box plot shows variability and shape.

  38. 4.5 Percentiles, Quartiles, and Box Plots LO4-8 Chapter 4 Box Plots

  39. 4.5 Percentiles, Quartiles, and Box Plots LO4-8 Chapter 4 Box Plots: Fences and Unusual Data Values • Use quartiles to detect unusual data points by defining fences using the following formulas: • Values outside the inner fences are unusual while those outside the outer fences are outliers. Here is a visual illustrating the fences:
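
The fence formulas themselves are missing from the transcript. The sketch below assumes the common convention of 1.5 × IQR for the inner fences and 3.0 × IQR for the outer fences; both multipliers, the function names, and the quartile values are assumptions for illustration.

```python
def fences(q1, q3):
    """Inner and outer fences, assuming 1.5 and 3.0 times the interquartile range."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

def classify(x, q1, q3):
    """Label a value using the slide's terms: unusual (beyond inner), outlier (beyond outer)."""
    f = fences(q1, q3)
    if x < f["outer"][0] or x > f["outer"][1]:
        return "outlier"
    if x < f["inner"][0] or x > f["inner"][1]:
        return "unusual"
    return "typical"

# Hypothetical quartiles: IQR = 10, inner fences (-5, 35), outer fences (-20, 50).
print(fences(10, 20))
print(classify(15, 10, 20), classify(40, 10, 20), classify(60, 10, 20))
```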

  40. 4.5 Percentiles, Quartiles, and Box Plots LO4-8 Chapter 4 Box Plots: Fences and Unusual Data Values • For example, consider the P/E ratio data: There is one outlier (170) that lies above the inner fence. There are no extreme outliers that exceed the outer fence.

  41. 4.5 Percentiles, Quartiles, and Box Plots LO4-8 Chapter 4 Box Plots: Fences and Unusual Data Values • Truncate the whisker at the fences and display unusual values and outliers as dots. • Based on these fences, there is only one outlier.

  42. 4.5 Percentiles, Quartiles, and Box Plots LO4-8 Chapter 4 Box Plots: Midhinge • The average of the first and third quartiles. • The name midhinge derives from the idea that, if the “box” were folded in half, it would resemble a “hinge”.
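
As a quick illustration, the midhinge can be computed from the quartiles of the earlier five-number summary (Q1 = 27, Q3 = 40.5). The function name is hypothetical.

```python
def midhinge(q1, q3):
    """Average of the first and third quartiles."""
    return (q1 + q3) / 2

mh = midhinge(27, 40.5)
print(mh)  # compare with the median Q2 = 35.5 from the same summary
```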

  43. 4.6 Correlation and Covariance LO4-9 Chapter 4 LO4-9: Calculate and interpret a correlation coefficient and covariance. Correlation Coefficient • The sample correlation coefficient is a statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y. Note: -1 ≤ r ≤ +1.

  44. 4.6 Correlation and Covariance LO4-9 Chapter 4 Correlation Coefficient • Illustration of Correlation Coefficients

  45. 4.6 Correlation and Covariance LO4-9 Chapter 4 Covariance The covariance of two random variables X and Y (denoted σXY) measures the degree to which the values of X and Y change together.

  46. 4.6 Correlation and Covariance LO4-9 Chapter 4 Covariance A correlation coefficient is the covariance divided by the product of the standard deviations of X and Y.
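
The relationship stated above (correlation = covariance divided by the product of the standard deviations) can be sketched for a sample. The sample covariance formula with divisor n - 1 is an assumption, since the slide's formula is not in the transcript; the function names and data are hypothetical.

```python
from statistics import mean, stdev

def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1 (assumed sample form)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))

# Hypothetical paired data with a perfect linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

print(sample_covariance(xs, ys), correlation(xs, ys), correlation(xs, ys[::-1]))
```

The covariance depends on the units of X and Y, while the correlation is unit-free and stays between -1 and +1.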

  47. 4.7 Grouped Data LO4-10 Chapter 4 LO4-10: Calculate the mean and standard deviation from grouped data. Weighted Mean Group Mean and Standard Deviation

  48. 4.7 Grouped Data LO4-10 Chapter 4 Group Mean and Standard Deviation
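
The grouped-data formulas are not in the transcript. The sketch below assumes the usual approach: treat each class midpoint m_j as the value for all f_j observations in that class, so the mean is Σ f_j m_j / n and the sample variance is Σ f_j (m_j - mean)² / (n - 1). The function name, midpoints, and frequencies are hypothetical.

```python
import math

def grouped_mean_std(midpoints, frequencies):
    """Approximate mean and sample standard deviation from a frequency table."""
    n = sum(frequencies)
    m = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    var = sum(f * (x - m) ** 2 for x, f in zip(midpoints, frequencies)) / (n - 1)
    return m, math.sqrt(var)

# Hypothetical frequency table: three classes with midpoints 5, 15, 25.
group_mean, group_std = grouped_mean_std([5, 15, 25], [2, 6, 2])
print(group_mean, group_std)
```

Because the midpoints stand in for the actual observations, these are approximations to the values that ungrouped data would give.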

  49. 4.8 Skewness and Kurtosis LO4-11 Chapter 4 LO4-11: Assess skewness and kurtosis in a sample. Skewness

  50. 4.8 Skewness and Kurtosis LO4-11 Chapter 4 LO4-11: Assess skewness and kurtosis in a sample. Kurtosis
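
The skewness and kurtosis formulas did not survive in the transcript. As a sketch, the code below uses the bias-adjusted sample formulas that Excel's SKEW and KURT functions implement; whether these are the exact formulas on the slides is an assumption. The function names and data are hypothetical.

```python
from statistics import mean, stdev

def skewness(values):
    """Adjusted sample skewness (same form as Excel's SKEW); requires n >= 3."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def kurtosis(values):
    """Adjusted sample excess kurtosis (same form as Excel's KURT); requires n >= 4."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

symmetric = [1, 2, 3, 4, 5]    # hypothetical symmetric sample
right_skewed = [1, 1, 1, 10]   # hypothetical sample with a long right tail

print(skewness(symmetric), kurtosis(symmetric), skewness(right_skewed))
```

A skewness near zero suggests symmetry, a positive value a longer right tail; a kurtosis below zero on this scale indicates a flatter-than-normal shape.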
