Chapter 3 Descriptive Statistics: Numerical Methods • Measures of Variability • Measures of Relative Location and Detecting Outliers • Exploratory Data Analysis • Measures of Association Between Two Variables
Measures of Variability • It is often desirable to consider measures of variability (dispersion), as well as measures of location. • For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each. • Range • Interquartile Range • Variance • Standard Deviation • Coefficient of Variation
Measures of Variation • Range • Interquartile Range • Variance (Population Variance, Sample Variance) • Standard Deviation (Population Standard Deviation, Sample Standard Deviation) • Coefficient of Variation
Variation • Measures of variation give information on the spread or variability of the data values. (Figure: distributions with the same center but different variation.)
Range • Simplest measure of variation • Difference between the largest and the smallest observations: Range = x_maximum - x_minimum • Example: for observations plotted on a number line from 0 to 14, with smallest value 1 and largest value 14, Range = 14 - 1 = 13
Example: Apartment Rents • Range Range = largest value - smallest value Range = 615 - 425 = 190
Interquartile Range • The interquartile range of a data set is the difference between the third quartile and the first quartile. • It is the range for the middle 50% of the data.
Example: Apartment Rents • Interquartile Range 3rd Quartile (Q3) = 525 1st Quartile (Q1) = 445 Interquartile Range = Q3 - Q1 = 525 - 445 = 80
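The two calculations above can be sketched in a few lines of Python. The apartment rent values are not listed in these slides, so the sketch uses a small illustrative sample instead. Quartile conventions differ between textbooks and software, so the quartiles returned by `statistics.quantiles` may not match a hand calculation exactly.

```python
import statistics

# Small illustrative sample (not the apartment rent data).
data = [10, 12, 14, 15, 17, 18, 18, 24]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Quartiles from the standard library (default "exclusive" method).
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: the spread of the middle 50% of the data.
iqr = q3 - q1

print(f"Range = {data_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Unlike the range, the interquartile range is not affected by a single extreme value at either end of the data.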
Variance • The variance is a measure of variability that utilizes all the data. • It is based on the difference between the value of each observation (x_i) and the mean (x̄ for a sample, μ for a population).
Variance • The variance is the average of the squared differences between each data value and the mean. • If the data set is a sample, the variance is denoted by s². • If the data set is a population, the variance is denoted by σ².
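Written out, the two definitions take the following form. The notation is the usual convention and is assumed here rather than taken from the slide: n is the sample size, N the population size, and x_i the i-th observation.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
\qquad
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
```

Note that the sample variance divides by n - 1 rather than n, so it is slightly larger than a plain average of the squared differences.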
Variance for Grouped Data • Sample Data • Population Data
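For grouped data, a common approximation treats every observation in a class as if it sat at the class midpoint. Assuming f_i is the frequency and M_i the midpoint of class i (notation introduced here for illustration), the formulas would read:

```latex
s^2 = \frac{\sum_{i} f_i (M_i - \bar{x})^2}{n - 1}
\qquad
\sigma^2 = \frac{\sum_{i} f_i (M_i - \mu)^2}{N}
```

Because the midpoints stand in for the actual values, the result is only an approximation of the variance computed from the raw data.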
Standard Deviation • Most commonly used measure of variation • Shows variation about the mean • The standard deviation of a data set is the positive square root of the variance. • If the data set is a sample, the standard deviation is denoted s. • If the data set is a population, the standard deviation is denoted σ (sigma).
Calculation Example: Sample Standard Deviation • Sample Data (x_i): 10 12 14 15 17 18 18 24 • n = 8 • Mean = x̄ = 16
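The calculation for this sample can be worked through step by step. The sketch below assumes the usual sample formula, which divides the sum of squared deviations by n - 1.

```python
import math

# Sample data from the slide.
data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in data)

# Sample variance divides by n - 1; the standard deviation is its square root.
variance = ss / (n - 1)
std_dev = math.sqrt(variance)

print(f"n = {n}, mean = {mean}")
print(f"sum of squared deviations = {ss}")
print(f"s^2 = {variance:.4f}, s = {std_dev:.4f}")
```

For this sample the squared deviations sum to 130, so s² = 130 / 7 ≈ 18.57 and s ≈ 4.31.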
Coefficient of Variation • Measures relative variation • Always expressed as a percentage (%) • Shows variation relative to the mean • Is used to compare two or more sets of data measured in different units • Population: CV = (σ / μ) × 100% • Sample: CV = (s / x̄) × 100%
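A short sketch of the sample coefficient of variation, assuming the definition CV = (s / x̄) × 100%. The second data set is the same sample expressed in a different (hypothetical) unit, to show why the measure is useful for comparing data on different scales.

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV in percent: standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Illustrative sample and the same sample rescaled by a factor of 1000,
# as if converted to a smaller unit.
data = [10, 12, 14, 15, 17, 18, 18, 24]
data_rescaled = [x * 1000 for x in data]

cv = coefficient_of_variation(data)
cv_rescaled = coefficient_of_variation(data_rescaled)

print(f"CV (original units) = {cv:.2f}%")
print(f"CV (rescaled units) = {cv_rescaled:.2f}%")
```

Both calls give the same percentage, because rescaling multiplies the standard deviation and the mean by the same factor.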
Example: Apartment Rents • Variance • Standard Deviation • Coefficient of Variation
Measures of Relative Location and Detecting Outliers • z-Scores • Detecting Outliers
z-Scores • The z-score is often called the standardized value. • It denotes the number of standard deviations a data value x_i is from the mean. • A data value less than the sample mean will have a z-score less than zero. • A data value greater than the sample mean will have a z-score greater than zero. • A data value equal to the sample mean will have a z-score of zero.
Example: Apartment Rents • z-Score of Smallest Value (425) (Table: Standardized Values for Apartment Rents)
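Standardizing a data set can be sketched as follows, assuming the usual definition z_i = (x_i - x̄) / s. The rent values are not listed in these slides, so a small illustrative sample is used instead.

```python
import statistics

# Small illustrative sample (not the apartment rent data).
data = [10, 12, 14, 15, 17, 18, 18, 24]

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x:>2}  z = {z:+.2f}")
```

Values below the mean of 16 get negative z-scores, values above it get positive ones, and the z-scores sum to zero.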
Detecting Outliers • An outlier is an unusually small or unusually large value in a data set. • A data value with a z-score less than -3 or greater than +3 might be considered an outlier. • It might be an incorrectly recorded data value. • It might be a data value that was incorrectly included in the data set.
Example: Apartment Rents • Detecting Outliers • The most extreme z-scores are -1.20 and 2.27. • Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set. (Table: Standardized Values for Apartment Rents)
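The |z| > 3 rule can be sketched as a small helper. The function name `find_outliers` and the `readings` data are hypothetical, chosen only to show one flagged value. Note that in a very small sample no value can reach |z| > 3, so the rule is only meaningful for larger data sets.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        return []  # all values identical, nothing to flag
    return [x for x in values if abs((x - mean) / s) > threshold]

# Hypothetical readings: twenty values near 50 and one suspicious value.
readings = [48, 49, 50, 51, 52] * 4 + [120]

print(find_outliers(readings))
print(find_outliers([10, 12, 14, 15, 17, 18, 18, 24]))
```

A flagged value is a candidate for review, not automatically an error: it may be a recording mistake, a value that does not belong in the data set, or a genuine extreme observation.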
Exploratory Data Analysis • Five-Number Summary
Five-Number Summary • Smallest Value • First Quartile • Median • Third Quartile • Largest Value
Example: Apartment Rents • Five-Number Summary • Lowest Value = 425 • First Quartile = 445 • Median = 475 • Third Quartile = 525 • Largest Value = 615
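A five-number summary can be assembled from the minimum, the three quartiles, and the maximum. The helper name `five_number_summary` is hypothetical, and the sketch relies on the default quartile method of Python's `statistics.quantiles`, which may differ slightly from the percentile rule used in a given textbook.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, median, q3, max(values))

# Small illustrative sample (not the apartment rent data).
data = [10, 12, 14, 15, 17, 18, 18, 24]

labels = ["Smallest", "Q1", "Median", "Q3", "Largest"]
for label, value in zip(labels, five_number_summary(data)):
    print(f"{label:>8}: {value}")
```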
Measures of Association between Two Variables • Covariance • Correlation Coefficient
Covariance • The covariance is a measure of the linear association between two variables. • Positive values indicate a positive relationship. • Negative values indicate a negative relationship.
Covariance • If the data sets are samples, the covariance is denoted by s_xy. • If the data sets are populations, the covariance is denoted by σ_xy.
Correlation Coefficient • The coefficient can take on values between -1 and +1. • Values near -1 indicate a strong negative linear relationship. • Values near +1 indicate a strong positive linear relationship. • If the data sets are samples, the coefficient is r_xy. • If the data sets are populations, the coefficient is ρ_xy.
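Both measures of association can be computed directly. The sketch assumes the usual sample definitions, s_xy = Σ(x_i - x̄)(y_i - ȳ) / (n - 1) and r_xy = s_xy / (s_x s_y), and uses made-up paired data.

```python
import math

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: average product of paired deviations, using n - 1.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable.
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Correlation coefficient: covariance rescaled to lie between -1 and +1.
r_xy = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy:.2f}")
print(f"r_xy = {r_xy:.4f}")
```

The covariance depends on the units of x and y, while the correlation coefficient is unit-free, which makes it easier to judge the strength of the linear relationship.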