
Descriptive Statistics



  1. Descriptive Statistics • Measures of Central Tendency • Measures of Variability

  2. Measures of Location • Mean • Median • Mode • Percentiles • Quartiles

  3. Example: Apartment Rents • Given below is a sample of monthly rent values ($) for one-bedroom apartments. • The sample covers 70 apartments in a particular city. • The data are presented in ascending order.

  4. Mean • The mean of a data set is the average of all the data values. • If the data are from a sample, the mean is denoted by x̄ (x-bar). • If the data are from a population, the mean is denoted by μ (mu).
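The two symbols on this slide correspond to the usual formulas, written out here in LaTeX as a reference sketch. The sizes n (sample) and N (population) are the conventional symbols and are assumed here rather than taken from the slide.

```latex
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
\qquad
\mu = \frac{\sum_{i=1}^{N} x_i}{N}
```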

  5. Example: Apartment Rents • Mean: x̄ = Σxi / n = 490.80

  6. Median • The median is the measure of location most often reported for annual income and property value data. • A few extremely large incomes or property values can inflate the mean. • The median is not as sensitive to extreme values.

  7. Median • The median of a data set is the value in the middle when the data items are arranged in ascending order. • For an odd number of observations, the median is the middle value. • For an even number of observations, the median is the average of the two middle values.

  8. Median and Mean • The median of the numbers 32, 42, 46, 46, 54 is simply the middle value, 46. • Because there is an odd number of data values, there is an actual middle position. • The median of n items is at position (n+1)/2. For example, with n = 5, the median is at position (5+1)/2 = 3.

  9. Another Example: Starting Salaries • Starting monthly salaries of 12 graduates: 2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325 • Here we have an even number of data values, with n = 12. • The median position is (12+1)/2 = 6.5. • This is interpreted as the average of the 6th and 7th values. Thus the median is the average of 2890 and 2920, which is 2905. • The median value literally splits the data into two halves.
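The (n+1)/2 position rule from the last two slides can be sketched in Python as follows. The function name median_by_position and the 10000 figure used to show sensitivity are illustrative assumptions; the two data sets are the ones quoted on the slides.

```python
import statistics

def median_by_position(values):
    """Median via the (n + 1) / 2 position rule (positions are 1-based)."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) / 2            # ends in .5 when n is even
    lower = data[int(position) - 1]   # value at floor(position)
    if position.is_integer():
        return lower                  # odd n: an actual middle value exists
    upper = data[int(position)]       # even n: the next value up
    return (lower + upper) / 2        # average of the two middle values

small = [32, 42, 46, 46, 54]
salaries = [2710, 2755, 2850, 2880, 2880, 2890,
            2920, 2940, 2950, 3050, 3130, 3325]

print(median_by_position(small))      # 46
print(median_by_position(salaries))   # 2905.0
print(statistics.median(salaries))    # 2905.0, the standard library agrees

# Hypothetical change: replace the top salary with an extreme value.
inflated = salaries[:-1] + [10000]
print(statistics.mean(salaries), statistics.mean(inflated))      # 2940 3496.25
print(statistics.median(salaries), statistics.median(inflated))  # 2905.0 2905.0
```

The last two lines illustrate the earlier point about extreme values: the mean moves by several hundred while the median does not move at all.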

  10. Mode • The mode of a data set is the value that occurs with greatest frequency. • The greatest frequency can occur at two or more different values. • If the data have exactly two modes, the data are bimodal. • If the data have more than two modes, the data are multimodal.
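A minimal sketch of finding every mode, so that bimodal and multimodal data are reported in full. The helper name modes and the five-value bimodal list are hypothetical; the salaries are the starting-salary data from slide 9.

```python
from collections import Counter

def modes(values):
    """Return (sorted list of most frequent values, their frequency)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

salaries = [2710, 2755, 2850, 2880, 2880, 2890,
            2920, 2940, 2950, 3050, 3130, 3325]
print(modes(salaries))          # ([2880], 2): a single mode

bimodal = [1, 1, 2, 2, 3]       # hypothetical data with two modes
print(modes(bimodal))           # ([1, 2], 2)
```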

  11. Example: Apartment Rents • Mode: 450 occurred most frequently (7 times), so the mode is 450.

  12. Measures of Variability • It is often desirable to consider measures of variability (dispersion), as well as measures of location. • For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

  13. Measures of Variability • Range • Variance • Standard Deviation

  14. Range • The range of a data set is the difference between the largest and smallest data values (largest - smallest). • It is the simplest measure of variability. • It is very sensitive to the smallest and largest data values. • Note: some people use the alternate, inclusive definition, (largest - smallest) + 1.

  15. Example: Apartment Rents • Range = largest value - smallest value = 615 - 425 = 190 • Under the inclusive definition, the range is 615 - 425 + 1 = 191.

  16. Another Apartment Rent Example • Suppose the highest rent were $1000 instead of $615. • The range would then be 1000 - 425 = 575 instead of 190. In this case 69 of the 70 rents are actually within a span of 190 even though the range is 575. • Thus the range is very sensitive to extremes.
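Both range definitions, and the effect of a single extreme value, can be sketched as below. The function name data_range is an assumption, and the three-value lists are stand-ins: they contain only the smallest rent, the mode, and the largest rent mentioned on the slides, not the full 70-value sample.

```python
def data_range(values, inclusive=False):
    """Largest minus smallest; add 1 for the inclusive definition."""
    spread = max(values) - min(values)
    return spread + 1 if inclusive else spread

rents = [425, 450, 615]               # stand-in, not the full sample
print(data_range(rents))                    # 190
print(data_range(rents, inclusive=True))    # 191

rents_extreme = [425, 450, 1000]      # highest rent replaced by $1000
print(data_range(rents_extreme))            # 575
```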

  17. Variance • The variance is a measure of variability that utilizes all the data. • It is based on the difference between the value of each observation (xi) and the mean (x̄ for a sample, μ for a population).

  18. Variance • The variance is the average of the squared differences between each data value and the mean. • If the data set is a sample, the variance is denoted by s². • If the data set is a population, the variance is denoted by σ².

  19. Sample Variance • The sample variance is s² = Σ(xi - x̄)² / (n - 1). • The reason for n - 1 in the denominator is theoretical: using n - 1 rather than n makes the sample variance s² a better estimate of the population variance, whereas using n tends to underestimate it. • Think of variance as a measure of how much the data values vary. The unit of the variance is the square of the unit of the original data, so giving the unit a realistic interpretation is difficult.
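The effect of dividing by n - 1 rather than n can be seen in a short sketch using the starting-salary data from slide 9. The function name variance and its sample flag are illustrative assumptions; statistics.variance is the standard-library sample variance.

```python
import statistics

def variance(values, sample=True):
    """Sum of squared deviations from the mean, over n - 1 (sample) or n."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = sum((x - mean) ** 2 for x in values)
    return squared_diffs / (n - 1 if sample else n)

salaries = [2710, 2755, 2850, 2880, 2880, 2890,
            2920, 2940, 2950, 3050, 3130, 3325]

s2 = variance(salaries)                       # n - 1 denominator
divided_by_n = variance(salaries, sample=False)

print(round(s2, 2))             # 27440.91
print(round(divided_by_n, 2))   # 25154.17, smaller than the n - 1 version
print(abs(s2 - statistics.variance(salaries)) < 1e-9)   # True
```

Both results are in squared dollars, which is why the variance itself is hard to interpret directly.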

  20. Standard Deviation • The standard deviation of a data set is the positive square root of the variance. • It is measured in the same units as the data, which makes it easier than the variance to compare with the mean. • If the data set is a sample, the standard deviation is denoted by s, where s = √s². • If the data set is a population, the standard deviation is denoted by σ (sigma), where σ = √σ².
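As a brief illustration, the standard library computes both forms directly. The data are the starting salaries from slide 9; applying the population formula to them is only for comparison, since they are a sample.

```python
import math
import statistics

salaries = [2710, 2755, 2850, 2880, 2880, 2890,
            2920, 2940, 2950, 3050, 3130, 3325]

s = statistics.stdev(salaries)        # sample form, n - 1 denominator
sigma = statistics.pstdev(salaries)   # population form, n denominator

print(round(s, 2))       # 165.65, in dollars like the data
print(round(sigma, 2))   # 158.6

# The standard deviation is the positive square root of the variance.
print(math.isclose(s, math.sqrt(statistics.variance(salaries))))   # True
```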

  21. Example: Apartment Rents • Variance: s² ≈ 2,996 • Standard Deviation: s = √s² ≈ 54.74

  22. Measures of Relative Location and Detecting Outliers • z-Scores • Empirical Rule

  23. z-Scores • The z-score is often called the standardized value. • It denotes the number of standard deviations a data value xi is from the mean. • A data value less than the sample mean will have a z-score less than zero. • A data value greater than the sample mean will have a z-score greater than zero. • A data value equal to the sample mean will have a z-score of zero.
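The z-score of a value xi is (xi - x̄) / s. A minimal sketch, again using the starting salaries from slide 9; the function name z_scores is an assumption.

```python
import statistics

def z_scores(values):
    """Number of sample standard deviations each value lies from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

salaries = [2710, 2755, 2850, 2880, 2880, 2890,
            2920, 2940, 2950, 3050, 3130, 3325]

z = z_scores(salaries)
for x, score in zip(salaries, z):
    print(f"{x}: {score:+.2f}")
```

The value 2940 equals the sample mean, so its z-score is zero; values below it print with a minus sign and values above it with a plus sign.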

  24. Example: Apartment Rents • z-Score of Smallest Value (425): z = (xi - x̄) / s = (425 - 490.80) / 54.74 = -1.20 • Standardized Values for Apartment Rents

  25. Empirical Rule For data having a bell-shaped distribution: • Approximately 68% of the data values will be within one standard deviation of the mean.

  26. Empirical Rule For data having a bell-shaped distribution: • Approximately 95% of the data values will be within two standard deviations of the mean.

  27. Empirical Rule For data having a bell-shaped distribution: • Almost all (99.7%) of the items will be within three standard deviations of the mean.

  28. Example: Apartment Rents • Empirical Rule
      Interval          Range               % in Interval
      Within +/- 1s     436.06 to 545.54    48/70 = 69%
      Within +/- 2s     381.32 to 600.28    68/70 = 97%
      Within +/- 3s     326.58 to 655.02    70/70 = 100%
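The intervals in this table can be reproduced with a short sketch. The mean (490.80) and standard deviation (54.74) used below are inferred from the table itself, as the midpoint and half-width of the +/- 1s interval, and should be read as assumptions. The counts are copied from the slide because the 70 raw rents are not reproduced in this transcript.

```python
def empirical_intervals(mean, s):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations."""
    return {k: (round(mean - k * s, 2), round(mean + k * s, 2))
            for k in (1, 2, 3)}

mean, s = 490.80, 54.74               # inferred from the slide's intervals
counts = {1: 48, 2: 68, 3: 70}        # counts as reported on the slide
n = 70

for k, (low, high) in empirical_intervals(mean, s).items():
    share = counts[k] / n
    print(f"Within +/- {k}s: {low} to {high}  {counts[k]}/{n} = {share:.0%}")
```

The observed shares (69%, 97%, 100%) sit close to the 68%, 95%, and 99.7% that the empirical rule gives for bell-shaped data.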

  29. Detecting Outliers • An outlier is an unusually small or unusually large value in a data set. • A data value with a z-score less than -3 or greater than +3 might be considered an outlier. • It might be: • an incorrectly recorded data value • a data value that was incorrectly included in the data set • a correctly recorded data value that belongs in the data set

  30. Example: Apartment Rents • Detecting Outliers: The most extreme z-scores are -1.20 and 2.27. Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set. • Standardized Values for Apartment Rents
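A minimal sketch of the |z| > 3 screen. The function name find_outliers and the 20-value list are hypothetical. The rent check at the end uses only the smallest and largest rents, with a mean of 490.80 and s of 54.74 inferred from the empirical-rule intervals on slide 28.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [x for x in values if abs((x - mean) / s) > threshold]

# Hypothetical data: nineteen values near 50 and one extreme value.
readings = [48, 49, 50, 50, 51, 52, 50, 49, 51, 50,
            50, 49, 51, 50, 48, 52, 50, 49, 51, 120]
print(find_outliers(readings))        # [120]

# Apartment rents: only the extremes need checking (assumed mean and s).
mean, s = 490.80, 54.74
for rent in (425, 615):
    print(rent, round((rent - mean) / s, 2))   # -1.2 and 2.27, both inside +/- 3
```

A flagged value is only a candidate: as slide 29 notes, it may still be a correctly recorded value that belongs in the data set.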

  31. End of Chapter
