
Topic-4

Topic-4. Measures of Dispersion and Skewness. Dispersion: how a data set is spread (narrowly or widely) around its central tendency (mean), which is important to many business applications.

cayla

Presentation Transcript


  1. Topic-4 Measures of Dispersion and Skewness

  2. Measuring Dispersion and Skewness • Dispersion: how widely or narrowly a data set is spread around its central tendency (mean), which matters in many business applications. • Range: the distance between the highest and lowest values in a data set; the smaller the range, the less dispersed the data. (Only two values are considered, so information about the rest is lost.) • Mean (Absolute) Deviation (MAD): the mean of the absolute deviations of all values from the mean. It considers every value, but absolute values are awkward to work with algebraically. • Variance (Population): the mean of the squared deviations from the mean. The larger the variance, the more spread out the data set, but because it is expressed in squared units it can be difficult to interpret. • Standard Deviation: the positive square root of the variance, so it is expressed in the same unit of measurement as the original data and is easier to interpret.

  3. Measuring Dispersion for Sample Data • For a sample data set: • Sample Variance: the estimator of the population variance computed from a sample. To avoid bias from the loss of one degree of freedom, the sum of squared deviations is divided by (n-1) instead of (n). • Sample Standard Deviation: the estimator of the population standard deviation computed from a sample; it is the square root of the sample variance. • For a data set grouped into a frequency distribution: • Range: the distance between the lower limit of the lowest class and the upper limit of the highest class. • Standard Deviation: can be approximated by substituting the frequency-weighted totals of the class midpoints ($\sum fX$ and $\sum fX^2$) for the original item total and item square total ($\sum X$ and $\sum X^2$) in the formula.

  4. Interpreting Standard Deviation • Standard Deviation (SD) is a major tool for measuring and comparing the spread of two or more data sets. • Chebyshev's Theorem: for any set of data (sample or population), regardless of the shape of the distribution, the minimum proportion of values that must lie within k SD of the mean is $1 - 1/k^2$, for $k > 1$. • For measuring dispersion, the SD therefore tells us what share of the values falls within a given distance of the mean. • Empirical Rule: for a symmetrical, bell-shaped (Normal) distribution, the spread can be stated more precisely: • roughly 68% will lie within ±1σ (SD) of the mean (μ). • roughly 95% will lie within ±2σ (SD) of the mean (μ). • roughly 99.7% will lie within ±3σ (SD) of the mean (μ). • (It is also called the "Normal Rule" for the Normal distribution.)
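
The two rules on this slide can be compared numerically. The following is a minimal sketch, not part of the original deck; the function names `chebyshev_min_proportion` and `normal_share_within` are made up for illustration, and the Normal shares come from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist


def chebyshev_min_proportion(k: float) -> float:
    """Lower bound on the share of values within k SD of the mean (any distribution)."""
    if k <= 1:
        raise ValueError("the bound 1 - 1/k^2 is only informative for k > 1")
    return 1 - 1 / k**2


def normal_share_within(k: float) -> float:
    """Share of a Normal distribution lying within k SD of its mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    normal = normal_share_within(k)
    if k > 1:
        bound = chebyshev_min_proportion(k)
        print(f"k={k}: Chebyshev at least {bound:.1%}, Normal about {normal:.1%}")
    else:
        print(f"k={k}: Normal about {normal:.1%} (Chebyshev gives no bound at k=1)")
```

Chebyshev's bound is deliberately conservative because it must hold for every distribution; the Empirical Rule is tighter because it assumes a bell shape.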

  5. Other Measures of Dispersion • Other measures of dispersion are summarized below: • Interquartile Range: the distance between the 3rd and the 1st quartile (Q3 - Q1). • Q1: the first quartile, the value below which 25% of the data lie. • Q3: the third quartile, the value above which 25% of the data lie. • The larger this range, the more spread out the data set. • Quartile Deviation: half of the distance between the 3rd and the 1st quartile, (Q3 - Q1)/2. Both Q3 and Q1 can be approximated from a cumulative frequency polygon. • Percentile Range: for any data set, the percentiles divide the distribution into 100 equal parts. • Like the Interquartile Range, the (10-to-90) Percentile Range is the distance between the 10th and 90th percentiles. • The Percentile Range is interpreted like the Interquartile Range: the larger the range, the more spread out the data set. • Box Plot: a graphical display based on quartiles that pictures the data set using only 5 pieces of data: • [Minimum, Q1, Q2 (median), Q3 and Maximum]
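
As an illustration of the measures on this slide, here is a small sketch using Python's standard `statistics.quantiles`. The data values are hypothetical, and the `"inclusive"` interpolation method is only one of several quartile conventions, so a textbook hand calculation may differ slightly.

```python
from statistics import quantiles

# Hypothetical raw observations (not from the slides).
data = sorted([7, 2, 9, 4, 12, 5, 8, 4, 10])

# Quartiles: three cut points that split the ordered data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# The five values a box plot needs.
five_number_summary = (min(data), q1, q2, q3, max(data))

iqr = q3 - q1                 # interquartile range
quartile_deviation = iqr / 2  # half the interquartile range

# Deciles: nine cut points; the first is the 10th percentile, the last the 90th.
deciles = quantiles(data, n=10, method="inclusive")
p10, p90 = deciles[0], deciles[-1]
percentile_range = p90 - p10  # 10-to-90 percentile range

print("five-number summary:", five_number_summary)
print("IQR:", iqr, "quartile deviation:", quartile_deviation)
print("10-to-90 percentile range:", round(percentile_range, 3))
```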

  6. Relative Dispersion and Skewness • To compare the dispersion (e.g., standard deviation) of two or more data sets that are measured in different units (time vs. money), or whose means are far apart, a relative measure expressed as a percentage is needed: the "Coefficient of Variation" (CV). • Coefficient of Variation (CV): the ratio of the standard deviation to the arithmetic mean, expressed as a percent. • The higher the ratio (in percent), the more dispersed the data set relative to its mean. • Measuring Skewness: the degree of skewness of an asymmetrical distribution (positively or negatively skewed) can be measured by the "Coefficient of Skewness" (SK): • SK = 3(Mean - Median)/SD, with -3 ≤ SK ≤ +3 • The sign of SK indicates "positive" or "negative" skewness. • The larger the absolute value of SK, the stronger the skewness in the data. • Summary: Two important kinds of data measurement have been discussed: • Central Tendency: Mean, Median, Mode (population/sample). • Data Dispersion and Skewness: Range, Mean Deviation, Variance, Standard Deviation, Normal Rule, Quartile and Percentile Range (population and sample).

  7. Mean Deviation • Mean Deviation: the arithmetic mean of the absolute values of the deviations from the arithmetic mean. It is computed using the following formula: $MD = \frac{\sum |X - \bar{X}|}{n}$ • Where: • X is the individual value • $\bar{X}$ is the arithmetic mean • n is the sample size

  8. Example 1 • The weights of a sample of crates ready for shipment to France are (in kg): 103, 97, 101, 106, and 103.
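
The slide's worked computation did not survive in the transcript. A minimal sketch of the mean deviation for these crate weights, following the definition on the previous slide (variable names are illustrative):

```python
# Crate weights in kg, taken from Example 1.
weights = [103, 97, 101, 106, 103]

n = len(weights)
mean_weight = sum(weights) / n  # arithmetic mean

# Absolute deviation of each weight from the mean.
abs_deviations = [abs(x - mean_weight) for x in weights]

# Mean deviation: the average of the absolute deviations.
mean_deviation = sum(abs_deviations) / n

print("mean:", mean_weight)
print("absolute deviations:", abs_deviations)
print("mean deviation:", mean_deviation)
```

The mean is 102 kg, the absolute deviations sum to 12, and the mean deviation is 12/5 = 2.4 kg.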

  9. Population Variance • Population Variance: the population variance for ungrouped data is the arithmetic mean of the squared deviations from the population mean. It is computed using the following formula: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ • Where: • X is the individual value • μ is the population mean • N is the population size • $\sigma^2$ ("sigma squared") is the population variance

  10. Example 2 • The ages of all the patients in the isolation ward of Yellowstone Hospital are 38, 26, 13, 41, and 22 years. What is the population variance? The computations are given below:
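
The computation table is missing from the transcript. As a sketch, the same steps can be reproduced from the definition of population variance and cross-checked against Python's standard `statistics.pvariance`:

```python
from statistics import pvariance

# Ages of all patients in the ward (the whole population), from Example 2.
ages = [38, 26, 13, 41, 22]

N = len(ages)
mu = sum(ages) / N  # population mean

# Squared deviation of each age from the population mean.
squared_deviations = [(x - mu) ** 2 for x in ages]

# Population variance: mean of the squared deviations (divide by N).
population_variance = sum(squared_deviations) / N

print("mean:", mu)
print("squared deviations:", squared_deviations)
print("population variance:", population_variance)
print("library check:", pvariance(ages))
```

The mean age is 28, the squared deviations sum to 534, and the population variance is 534/5 = 106.8.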

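  11. Sample Variance • Sample Variance: the formula for the sample variance for ungrouped data is: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ • Where $s^2$ is the sample variance, $\bar{X}$ is the sample mean and n is the sample size • The sample variance is used to estimate the population variance.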

  12. Example 3 • A sample of five hourly wages for blue-collar jobs is: 17, 26, 18, 20, and 19. Find the variance.
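
A short sketch of this calculation, dividing by n - 1 because the wages are a sample; `statistics.variance` from the standard library uses the same divisor and serves as a cross-check:

```python
from statistics import variance

# Sample of five hourly wages, from Example 3.
wages = [17, 26, 18, 20, 19]

n = len(wages)
sample_mean = sum(wages) / n

# Sum of squared deviations from the sample mean.
sum_squared_dev = sum((x - sample_mean) ** 2 for x in wages)

# Sample variance: divide by n - 1, not n.
sample_variance = sum_squared_dev / (n - 1)

print("sample mean:", sample_mean)
print("sum of squared deviations:", sum_squared_dev)
print("sample variance:", sample_variance)
print("library check:", variance(wages))
```

The sample mean is 20, the squared deviations sum to 50, and the sample variance is 50/4 = 12.5.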

  13. Sample Variance for Grouped Data • Sample Variance: the formula for the sample variance for grouped data, used as an estimator of the population variance, is: $s^2 = \frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}$ • Where: • f is the class frequency • X is the class midpoint • n is the sample size • $s^2$ is the sample variance
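
To show how the grouped-data computation works, here is a minimal sketch on a small hypothetical frequency table (the midpoints and frequencies below are invented for illustration and are not the Knitt Manufacturing data). It uses the usual shortcut form based on the totals of fX and fX², and checks it against the ungrouped calculation on the expanded midpoints.

```python
from statistics import variance

# Hypothetical frequency distribution: class midpoints X and frequencies f.
midpoints = [2, 5, 8]
frequencies = [3, 5, 2]

n = sum(frequencies)                                             # sample size
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))      # total of f*X
sum_fx2 = sum(f * x**2 for f, x in zip(frequencies, midpoints))  # total of f*X^2

# Shortcut formula for the grouped sample variance.
grouped_variance = (sum_fx2 - sum_fx**2 / n) / (n - 1)
grouped_sd = grouped_variance ** 0.5

# Cross-check: repeat each midpoint f times and use the ungrouped formula.
expanded = [x for f, x in zip(frequencies, midpoints) for _ in range(f)]

print("n:", n, "sum fX:", sum_fx, "sum fX^2:", sum_fx2)
print("grouped sample variance:", round(grouped_variance, 4))
print("grouped sample SD:", round(grouped_sd, 4))
print("ungrouped check:", round(variance(expanded), 4))
```

Because every observation in a class is treated as if it sat at the class midpoint, the result is an approximation of the variance of the raw data.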

  14. Example 4 • A study of the absentee records at Knitt Manufacturing revealed the following number of days absent last year for a sample of sixty employees. Find the sample variance. The table below shows the relevant information.

  15. Example 4 • Sample Variance • The sample standard deviation, s, is the square root of the sample variance. Thus,

  16. Interquartile Range • Interquartile range: the distance between the 3rd quartile, Q3, and the 1st quartile, Q1. • Interquartile range = 3rd quartile - 1st quartile = Q3 - Q1 • First Quartile: the value below which 25% of the observations lie in an ordered data set. For grouped data: $Q_1 = L + \frac{n/4 - CF}{f} \cdot i$ • Third Quartile: the value below which 75% of the observations lie in an ordered data set. For grouped data: $Q_3 = L + \frac{3n/4 - CF}{f} \cdot i$ • Where: • n is the sample size • L is the lower limit of the class containing Q1 or Q3 • CF is the cumulative frequency of the classes preceding the class containing Q1 or Q3 • f is the frequency of the class containing Q1 or Q3 • i is the size (width) of the class containing Q1 or Q3
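
The grouped-data interpolation described on this slide can be sketched as a single function, assuming the usual form L + ((target count - CF) / f) × i with target count p·n/100. The function name `grouped_percentile` and the frequency table are hypothetical; Q1 and Q3 are simply the 25th and 75th percentiles.

```python
def grouped_percentile(p, lower_limits, frequencies, class_width):
    """Interpolate the p-th percentile (0 < p < 100) from a frequency distribution."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = sum(frequencies)
    target = p * n / 100  # how many observations lie below the percentile
    cumulative = 0        # CF: cumulative frequency of the preceding classes
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= target:
            # The percentile falls in this class: interpolate within it.
            return lower + (target - cumulative) / f * class_width
        cumulative += f
    raise ValueError("frequency table is empty")


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40 (width 10).
lower_limits = [0, 10, 20, 30]
frequencies = [4, 8, 6, 2]

q1 = grouped_percentile(25, lower_limits, frequencies, 10)
q3 = grouped_percentile(75, lower_limits, frequencies, 10)
p10 = grouped_percentile(10, lower_limits, frequencies, 10)
p90 = grouped_percentile(90, lower_limits, frequencies, 10)

print("Q1:", q1, "Q3:", q3, "interquartile range:", q3 - q1)
print("P10:", p10, "P90:", p90, "10-to-90 percentile range:", p90 - p10)
```

Writing one percentile function rather than separate quartile formulas keeps the quartile and 10-to-90 percentile calculations consistent with each other.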

  17. Percentile Range • Percentiles: Each data set has 99 percentiles, thus dividing the set into 100 equal parts. Note: in order to determine percentiles, you must order the set first. • Percentile Range: The 10-to-90 percentile range is the distance between the 10th and 90th percentiles.

  18. 10-to-90 Percentile Range

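  19. Formulas for 10th & 90th Percentiles • For grouped data, the same interpolation used for quartiles applies: $P_{10} = L + \frac{10n/100 - CF}{f} \cdot i$ and $P_{90} = L + \frac{90n/100 - CF}{f} \cdot i$, where n is the sample size, L is the lower limit of the class containing the percentile, CF is the cumulative frequency of the classes preceding it, f is its frequency and i is its size.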

  20. Example 5 • Interquartile Range = 10 - 5.75 = 4.25 • 10-to-90 Percentile Range = 8.375 - 4.000 = 4.375

  21. Exercise 4A • The research analyst for the Sidde Financial stock brokerage firm wants to compare the dispersion in the price-earnings ratios for a group of common stocks with the dispersion of their return on investment. For the price-earnings ratios, the mean is 10.9 and the standard deviation is 1.8. The mean return on investment is 25 percent and the standard deviation is 5.2 percent. • Why should the coefficient of variation be used to compare the dispersion? • Compare the relative dispersion for the price-earnings ratios and return on investment.
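
One way to work this exercise is to compute the coefficient of variation (SD divided by the mean, as a percent) for each series. A minimal sketch using the figures given above; the helper name `coefficient_of_variation` is illustrative:

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# Figures from Exercise 4A.
cv_price_earnings = coefficient_of_variation(std_dev=1.8, mean=10.9)
cv_return_on_investment = coefficient_of_variation(std_dev=5.2, mean=25.0)

print(f"CV, price-earnings ratio: {cv_price_earnings:.1f}%")
print(f"CV, return on investment: {cv_return_on_investment:.1f}%")
```

The two series are measured in different units (a ratio versus a percent return), so their standard deviations cannot be compared directly. The CVs come to about 16.5% and 20.8%, so return on investment shows more dispersion relative to its mean.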

  22. Exercise 4B • A sample of homes currently offered for sale in Walla Walla, Washington, revealed that the mean asking price is $75,900, the median is $70,100, and the modal price is $67,200. The standard deviation of the distribution is $5,900. • Is the distribution of prices symmetrical, negatively skewed, or positively skewed? • What is the coefficient of skewness? Interpret.
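
A sketch of the calculation, using the coefficient of skewness from slide 6, SK = 3(Mean - Median)/SD, with the figures from the exercise (variable names are illustrative):

```python
# Figures from Exercise 4B (asking prices in dollars).
mean_price = 75_900
median_price = 70_100
std_dev = 5_900

# Coefficient of skewness: SK = 3 * (mean - median) / SD.
sk = 3 * (mean_price - median_price) / std_dev

if sk > 0:
    direction = "positively skewed"
elif sk < 0:
    direction = "negatively skewed"
else:
    direction = "symmetrical"

print(f"SK = {sk:.2f} ({direction})")
```

The mean exceeds the median, which exceeds the mode, so the distribution is positively skewed. SK is about 2.95, close to the upper limit of +3, which indicates strong positive skewness.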
