310 likes | 436 Views
. Descriptive Statistics. . %. x. Measures of Central Tendency Measures of Variability. Measures of Location. Mean Median Mode Percentiles Quartiles. Example: Apartment Rents. Given below is a sample of monthly rent values ($)
E N D
Descriptive Statistics % x Measures of Central Tendency Measures of Variability
Measures of Location • Mean • Median • Mode • Percentiles • Quartiles
Example: Apartment Rents Given below is a sample of monthly rent values ($) for one-bedroom apartments. The data is a sample of 70 apartments in a particular city. The data are presented in ascending order.
Mean • The mean of a data set is the average of all the data values. • If the data are from a sample, the mean is denoted by . • If the data are from a population, the mean is denoted by m (mu).
Example: Apartment Rents • Mean
Median • The median is the measure of location most often reported for annual income and property value data. • A few extremely large incomes or property values can inflate the mean. • The median is not as sensitive to extreme values.
Median • The median of a data set is the value in the middle when the data items are arranged in ascending order. • For an odd number of observations, the median is the middle value. • For an even number of observations, the median is the average of the two middle values.
Median and Mean • The median of the following numbers is simply the middle value: 32, 42, 46, 46, 54 which is 46. Because there are an odd number of data values there is an actual middle position. The position of the median value of n items is at (n+1)/2 For example, with n= 5, the median is at position (5+1)/2 = 3
Another Example: Starting Salaries • Starting Monthly Salaries of 12 Graduates: 2710, 2755, 2850, 2880, 2880, 2890 2920, 2940, 2950, 3050, 3130, 3325 2905 • Here we have an even number of data values, with n=12. • The median position is (12+1)/2 = 6.5 • This is interpreted as the average of the 6th and 7th. Thus the median will be the average of 2890 and 2920 which is 2905. • The median value literally splits the data into two halves.
Mode • The mode of a data set is the value that occurs with greatest frequency. • The greatest frequency can occur at two or more different values. • If the data have exactly two modes, the data are bimodal. • If the data have more than two modes, the data are multimodal.
Example: Apartment Rents • Mode 450 occurred most frequently (7 times) Mode = 450
Measures of Variability • It is often desirable to consider measures of variability (dispersion), as well as measures of location. • For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.
Measures of Variability • Range • Variance • Standard Deviation
Range • The range of a data set is the difference between the largest and smallest data values*. (largest-smallest) • It is the simplest measure of variability. • It is very sensitive to the smallest and largest data values. • Note: alternate definition some people use is (largest-smallest)+1 This is the inclusive definition.
Example: Apartment Rents • Range Range = largest value - smallest value +1 Range = 615 – 425+1 = 191
Another Apartment Rent Example: • Suppose the highest rent were $1000 instead of $615 • The rage would then be 1000 – 425 = 575 instead of 190. In this case 69 of 70 rents are actually within a span of 190 even though the range is 575. • Thus the Range is very sensitive to extremes.
Variance • The variance is a measure of variability that utilizes all the data. • It is based on the difference between the value of each observation (xi) and the mean (x for a sample, m for a population).
Variance • The variance is the average of the squared differences between each data value and the mean. • If the data set is a sample, the variance is denoted by s2. • If the data set is a population, the variance is denoted by 2.
Sample Variance • The reason for the n-1 in the denominator of sample variance is theoretical, but it involves the fact that by using (n-1) rather than n, the sample variance s2 is a better estimate of the population variance. Using n tends to underestimate the population variance. • Think of variance as a measure of how much the data values vary. The unit of the variance is the square of the units of the original data so giving a realistic interpretation to the unit is difficult.
Standard Deviation • The standard deviation of a data set is the positive square root of the variance. • It is measured in the same units as the data, making it more easily comparable, than the variance, to the mean. • If the data set is a sample, the standard deviation is denoted by s where • If the data set is a population, the standard deviation is denoted (sigma).
Example: Apartment Rents • Variance • Standard Deviation
Measures of Relative Locationand Detecting Outliers • z-Scores • Empirical Rule
z-Scores • The z-score is often called the standardized value. • It denotes the number of standard deviations a data value xi is from the mean. • A data value less than the sample mean will have a z-score less than zero. • A data value greater than the sample mean will have a z-score greater than zero. • A data value equal to the sample mean will have a z-score of zero.
Example: Apartment Rents • z-Score of Smallest Value (425) Standardized Values for Apartment Rents
Empirical Rule For data having a bell-shaped distribution: • Approximately 68% of the data values will be within onestandard deviation of the mean.
Empirical Rule For data having a bell-shaped distribution: • Approximately 95% of the data values will be within twostandard deviations of the mean.
Empirical Rule For data having a bell-shaped distribution: • Almost all (99.7%) of the items will be within threestandard deviations of the mean.
Example: Apartment Rents • Empirical Rule Interval% in Interval Within +/- 1s 436 .06 to 545.54 48/70 = 69% Within +/- 2s 381.32 to 600.28 68/70 = 97% Within +/- 3s 326.58 to 655.02 70/70 = 100%
Detecting Outliers • An outlier is an unusually small or unusually large value in a data set. • A data value with a z-score less than -3 or greater than +3 might be considered an outlier. • It might be: • an incorrectly recorded data value • a data value that was incorrectly included in the data set • a correctly recorded data value that belongs in the data set
Example: Apartment Rents • Detecting Outliers The most extreme z-scores are -1.20 and 2.27. Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set. Standardized Values for Apartment Rents