Learn about z-scores, Chebyshev's Theorem, outlier detection, box plots, measures of association, the weighted mean, and more in descriptive statistics.
Slides Prepared by JOHN S. LOUCKS St. Edward’s University
Chapter 3 Descriptive Statistics: Numerical Methods, Part B • Measures of Relative Location and Detecting Outliers • Exploratory Data Analysis • Measures of Association Between Two Variables • The Weighted Mean and Working with Grouped Data
Measures of Relative Location and Detecting Outliers • z-Scores • Chebyshev’s Theorem • Empirical Rule • Detecting Outliers
z-Scores • The z-score is often called the standardized value. • It denotes the number of standard deviations a data value $x_i$ is from the mean: $z_i = \frac{x_i - \bar{x}}{s}$. • A data value less than the sample mean will have a z-score less than zero. • A data value greater than the sample mean will have a z-score greater than zero. • A data value equal to the sample mean will have a z-score of zero.
Example: Apartment Rents • z-Score of Smallest Value (425): $z = \frac{425 - 490.80}{54.74} = -1.20$ • Standardized Values for Apartment Rents (table)
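The z-score calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `z_score` is hypothetical, and the inputs are the sample mean (490.80), sample standard deviation (54.74), and smallest and largest rents (425 and 615) quoted in the apartment rents example.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev


# Sample statistics quoted in the apartment rents example.
MEAN_RENT = 490.80
STD_RENT = 54.74

# Smallest and largest rents from the five-number summary.
z_smallest = z_score(425, MEAN_RENT, STD_RENT)
z_largest = z_score(615, MEAN_RENT, STD_RENT)

print(round(z_smallest, 2), round(z_largest, 2))
```

Both values fall well inside the range from -3 to +3, which matches the most extreme z-scores reported for this data set.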
Chebyshev’s Theorem • At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1. • At least 75% of the items must be within z = 2 standard deviations of the mean. • At least 89% of the items must be within z = 3 standard deviations of the mean. • At least 94% of the items must be within z = 4 standard deviations of the mean.
Example: Apartment Rents • Chebyshev’s Theorem • Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74. • At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between $\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$ and $\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$.
Example: Apartment Rents • Chebyshev’s Theorem (continued) Actually, 86% of the rent values are between 409 and 573.
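As a rough sketch, the Chebyshev bound and the matching interval can be computed as follows. The function names `chebyshev_lower_bound` and `chebyshev_interval` are hypothetical helpers introduced for illustration; the numbers are the ones used in the apartment rents example.

```python
def chebyshev_lower_bound(z):
    """Minimum fraction of items within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2


def chebyshev_interval(mean, std_dev, z):
    """Return (mean - z*s, mean + z*s)."""
    return mean - z * std_dev, mean + z * std_dev


# Bounds for the z values mentioned in the slides.
for z in (1.5, 2, 3, 4):
    print(z, chebyshev_lower_bound(z))

# Interval for the apartment rents example with z = 1.5.
low, high = chebyshev_interval(490.80, 54.74, 1.5)
print(round(low), round(high))
```

The bound is a guaranteed minimum for any data set, so the observed share (86% here) can be much larger than the 56% the theorem promises.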
Empirical Rule • For data having a bell-shaped distribution: • Approximately 68% of the data values will be within one standard deviation of the mean. • Approximately 95% of the data values will be within two standard deviations of the mean. • Almost all (99.7%) of the items will be within three standard deviations of the mean.
Example: Apartment Rents • Empirical Rule
                  Interval            % in Interval
Within +/- 1s     436.06 to 545.54    48/70 = 69%
Within +/- 2s     381.32 to 600.28    68/70 = 97%
Within +/- 3s     326.58 to 655.02    70/70 = 100%
Detecting Outliers • An outlier is an unusually small or unusually large value in a data set. • A data value with a z-score less than -3 or greater than +3 might be considered an outlier. • It might be: • an incorrectly recorded data value • a data value that was incorrectly included in the data set • a correctly recorded data value that belongs in the data set
Example: Apartment Rents • Detecting Outliers • The most extreme z-scores are -1.20 and 2.27. • Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
Exploratory Data Analysis • Five-Number Summary • Box Plot
Five-Number Summary • Smallest Value • First Quartile • Median • Third Quartile • Largest Value
Example: Apartment Rents • Five-Number Summary Lowest Value = 425 First Quartile = 450 Median = 475 Third Quartile = 525 Largest Value = 615
Box Plot • A box is drawn with its ends located at the first and third quartiles. • A vertical line is drawn in the box at the location of the median. • Limits are located (not drawn) using the interquartile range (IQR). • The lower limit is located 1.5(IQR) below Q1. • The upper limit is located 1.5(IQR) above Q3. • Data outside these limits are considered outliers.
Box Plot (Continued) • Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits. • The location of each outlier is shown with the symbol *.
Example: Apartment Rents • Box Plot • Lower Limit: Q1 - 1.5(IQR) = 450 - 1.5(75) = 337.5 • Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(75) = 637.5 • There are no outliers. • [Box plot of the rents on an axis running from 375 to 625]
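The limit calculation can be sketched in Python as below. The helper names `box_plot_limits` and `find_outliers` are hypothetical, and the check is run only on the five-number summary values from the example, since the full list of 70 rents is not reproduced in these slides.

```python
def box_plot_limits(q1, q3, k=1.5):
    """Return (lower limit, upper limit) located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the box plot limits."""
    lower, upper = box_plot_limits(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Quartiles from the apartment rents five-number summary.
lower, upper = box_plot_limits(450, 525)
print(lower, upper)  # 337.5 637.5

# Only the five summary values are checked here, not the full sample.
print(find_outliers([425, 450, 475, 525, 615], 450, 525))  # []
```

Because the smallest (425) and largest (615) rents both lie inside the limits, no value in the sample can be flagged as an outlier.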
Measures of Association Between Two Variables • Covariance • Correlation Coefficient
Covariance • The covariance is a measure of the linear association between two variables. • Positive values indicate a positive relationship. • Negative values indicate a negative relationship.
Covariance • If the data sets are samples, the covariance is denoted by $s_{xy}$. • If the data sets are populations, the covariance is denoted by $\sigma_{xy}$.
Correlation Coefficient • The coefficient can take on values between -1 and +1. • Values near -1 indicate a strong negative linear relationship. • Values near +1 indicate a strong positive linear relationship. • If the data sets are samples, the coefficient is $r_{xy}$. • If the data sets are populations, the coefficient is $\rho_{xy}$.
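The slides do not reproduce the formulas, so the sketch below assumes the standard sample definitions: the covariance is the sum of (x - x̄)(y - ȳ) divided by n - 1, and the correlation coefficient is the covariance divided by the product of the two sample standard deviations. The function names and the small data sets are hypothetical and chosen only to show the extreme cases of +1 and -1.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) over (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample correlation: covariance divided by the product of std devs."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself = variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical data: ys_up rises exactly linearly with xs, ys_down falls.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 6, 8, 10]
ys_down = [10, 8, 6, 4, 2]

print(sample_covariance(xs, ys_up))
print(sample_correlation(xs, ys_up))
print(sample_correlation(xs, ys_down))
```

Unlike the covariance, the correlation coefficient does not depend on the units of measurement, which is why it is bounded between -1 and +1.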
The Weighted Mean and Working with Grouped Data • Weighted Mean • Mean for Grouped Data • Variance for Grouped Data • Standard Deviation for Grouped Data
Weighted Mean • When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean. • In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade. • When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.
Weighted Mean • $\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$ where: $x_i$ = value of observation i, $w_i$ = weight for observation i
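A small sketch of the weighted mean, using the GPA setting mentioned earlier. The function name `weighted_mean` and the grade points and credit hours are hypothetical values made up for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # 3.0
```

With equal weights the result reduces to the ordinary mean, so the weighted mean only differs when some observations count for more than others.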
Grouped Data • The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data. • To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class. • We compute a weighted mean of the class midpoints using the class frequencies as weights. • Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.
Mean for Grouped Data • Sample Data: $\bar{x} = \frac{\sum f_i M_i}{n}$ • Population Data: $\mu = \frac{\sum f_i M_i}{N}$ where: $f_i$ = frequency of class i, $M_i$ = midpoint of class i
Example: Apartment Rents Given below is the previous sample of monthly rents for one-bedroom apartments presented here as grouped data in the form of a frequency distribution.
Example: Apartment Rents • Mean for Grouped Data This approximation differs by $2.41 from the actual sample mean of $490.80.
Variance for Grouped Data • Sample Data: $s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$ • Population Data: $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$
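The grouped-data approximations can be sketched as follows, assuming the usual sample formulas in which class midpoints stand in for the individual values and class frequencies act as weights (with n - 1 in the variance denominator). The function names and the three-class frequency distribution are hypothetical; the apartment rents frequency table is not reproduced in this transcript, so it is not used here.

```python
from math import sqrt


def grouped_mean(midpoints, freqs):
    """Approximate mean: sum(f_i * M_i) / n, where n = sum(f_i)."""
    n = sum(freqs)
    return sum(f * m for m, f in zip(midpoints, freqs)) / n


def grouped_sample_variance(midpoints, freqs):
    """Approximate sample variance: sum(f_i * (M_i - x_bar)^2) / (n - 1)."""
    n = sum(freqs)
    x_bar = grouped_mean(midpoints, freqs)
    return sum(f * (m - x_bar) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)


# Hypothetical frequency distribution with three classes.
midpoints = [10, 20, 30]
freqs = [2, 5, 3]

mean = grouped_mean(midpoints, freqs)
variance = grouped_sample_variance(midpoints, freqs)
std_dev = sqrt(variance)
print(mean, variance, std_dev)
```

For population data, the same sketch would divide by N instead of n - 1 in the variance step.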
Example: Apartment Rents • Variance for Grouped Data • Standard Deviation for Grouped Data • This approximation differs by only $0.20 from the actual standard deviation of $54.74.