Chapter 3

Chapter 3 Descriptive Statistics Part II • Describing Central Tendency • Measures of Variation Dr. Constance Lightner- Fayetteville State University

Important Characteristics of Data • Center: A value that indicates where the middle of the data set is located. • Variation: A measure of how much the data values vary among themselves. • Distribution: The nature or shape of the distribution of the data (such as bell-shaped, uniform, or skewed). • Outliers: Sample values that lie very far away from the vast majority of the other sample values.

Describing Central Tendency • Mean, $\mu$ (population) or $\bar{x}$ (sample), is the average or expected value • Median, Md, is the middle point of the ordered measurements • Mode, Mo, is the most frequent value • Percentiles and Quartiles

Basic Symbols • The sample size, i.e., the number of items in the sample, is denoted n. • The population size, i.e., the total number of items in the entire population, is denoted N.

Mean • If the data are from a population, the mean is denoted by $\mu$ (mu): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$. • If the data are from a sample, the mean is denoted by $\bar{x}$: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. The sample mean $\bar{x}$ is a point estimate of the population mean $\mu$.

Example 3.1 Suppose we compiled a sample of the weights of 5 professional football players: 255, 216, 346, 300, 270. The sample mean is $\bar{x} = (255 + 216 + 346 + 300 + 270)/5 = 1387/5 = 277.4$.
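
The mean calculation for Example 3.1 can be sketched in a few lines of Python. The function name `sample_mean` is a hypothetical helper chosen for illustration; it simply divides the sum of the values by n.

```python
# Weights of the five football players from Example 3.1.
weights = [255, 216, 346, 300, 270]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    return sum(values) / len(values)


x_bar = sample_mean(weights)
print(x_bar)  # 277.4
```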

Example 3.2 Consider a sample of monthly rent values (in dollars) for 70 one-bedroom apartments in a particular city. The data are presented in ascending order. (Source: Anderson, Sweeney, and Williams)

The Median The median is a value such that at least 50% of all measurements are less than or equal to it and at least 50% of all measurements are greater than or equal to it. • The median is the measure of location most often reported for annual income and property value data. • It is used instead of the mean because a few extremely large incomes or property values can inflate the mean.

The median Md is found as follows: • Arrange values in ascending order (smallest to largest). • If the number of measurements is odd, the median is the middle value. • If the number of measurements is even, the median is the average of the two middle values.

Example 3.3 Suppose the following represent a sample of the salaries of 13 internists (in thousands of dollars): 127 132 138 141 144 146 152 154 165 171 177 192 241. Since n = 13 is odd, the median is the middle (7th) measurement: Md = 152.
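
The procedure for finding the median can be sketched as follows. This is a minimal illustration, not a library routine; the function name `median` and the small even-sized list are assumptions made for the example, while the salary data come from Example 3.3.

```python
def median(values):
    """Median by the slide's procedure: sort, then take the middle value(s)."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle value
        return ordered[mid]
    # even n: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


salaries = [127, 132, 138, 141, 144, 146, 152, 154, 165, 171, 177, 192, 241]
print(median(salaries))       # 152 (the 7th of 13 ordered values)
print(median([1, 2, 3, 4]))   # 2.5 (made-up even-sized data)
```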

Example 3.2 Revisited Since n = 70 is even, the median is the average of the two middle (35th and 36th) values: Median = (475 + 475)/2 = 475

Mode The mode, Mo, is the measurement that occurs most frequently. • The greatest frequency can occur at two or more different values. • If the data have exactly two modes, the data are bimodal. • If the data have more than two modes, the data are multimodal. • The mode is an important measure of location for qualitative data (the mean and median cannot be computed for qualitative data).
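
Because the greatest frequency can occur at more than one value, a mode routine should be able to return several values. The sketch below assumes a hypothetical helper named `modes` and uses made-up data; it also works for qualitative (non-numeric) data.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    # Counter keeps first-seen order, so modes come back in that order.
    return [value for value, count in counts.items() if count == top]


print(modes([1, 2, 2, 3, 3, 4]))             # [2, 3] (bimodal)
print(modes(["red", "blue", "red", "tan"]))  # ['red'] (qualitative data)
```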

Mode (Example 3.2 Revisited) In the rent data, 450 occurred most frequently (7 times), so Mode = 450

Percentiles and Quartiles • A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. • Admission test scores for colleges and universities are frequently reported in terms of percentiles.

Percentiles The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more. Steps for computing percentiles: 1. Arrange the data in ascending order. 2. Compute the index i, the position of the pth percentile: i = (p/100)n. 3a. If i is not an integer, round up. The pth percentile is the value in this position. 3b. If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
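
The percentile steps above can be sketched as follows. This is an illustration under stated assumptions: the function name `percentile` is hypothetical, p is assumed to be a whole number from 1 to 99, and the index i = (p/100)n is evaluated with integer arithmetic so that floating-point rounding cannot misclassify an integer index. The salary data are those of Example 3.3; the list 1 to 10 is made up.

```python
def percentile(values, p):
    """pth percentile using the index rule i = (p/100) * n."""
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("p is assumed to be a whole number from 1 to 99")
    ordered = sorted(values)                   # step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)      # step 2: i = whole + remainder/100
    if remainder != 0:
        position = whole + 1                   # step 3a: round i up
        return ordered[position - 1]           # positions are 1-based
    # step 3b: i is an integer, so average positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2


salaries = [127, 132, 138, 141, 144, 146, 152, 154, 165, 171, 177, 192, 241]
print(percentile(salaries, 50))            # 152 (i = 6.5, rounded up to 7)
print(percentile(salaries, 25))            # 141 (i = 3.25, rounded up to 4)
print(percentile(list(range(1, 11)), 90))  # 9.5 (i = 9, average of 9 and 10)
```

With p = 25, 50, and 75, the same function returns the first, second, and third quartiles.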

Example 3.2 Revisited 90th Percentile: i = (p/100)n = (90/100)70 = 63. Since i is an integer, average the 63rd and 64th data values: 90th Percentile = (580 + 590)/2 = 585

65th Percentile: i = (p/100)n = (65/100)70 = 45.5. Since i is not an integer, round up to 46. The data value in position 46 is 500, so the 65th Percentile = 500

Quartiles • Quartiles are specific percentiles • First Quartile = 25th Percentile • Second Quartile = 50th Percentile = Median • Third Quartile = 75th Percentile

Example 3.2 Revisited Third quartile = 75th percentile: i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53. The data value in position 53 is 525, so the Third quartile = 525

Measures of Variation • The range is the largest minus the smallest measurement. • The variance is the average of the squared deviations from the mean (for a sample, the sum of squared deviations is divided by n - 1 rather than n). • The standard deviation is the square root of the variance. • In a comparison of multiple variables, the one with the largest variance shows the most variability in the data.

Example 3.3 Revisited Internists' salaries (in thousands of dollars): 127 132 138 141 144 146 152 154 165 171 177 192 241. Range = 241 - 127 = 114 ($114,000)
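
As a quick check of the range calculation, here is a short sketch using the salary data from Example 3.3. The variable names are chosen for illustration only.

```python
# Internists' salaries from Example 3.3, in thousands of dollars.
salaries = [127, 132, 138, 141, 144, 146, 152, 154, 165, 171, 177, 192, 241]

# Range = largest value minus smallest value.
salary_range = max(salaries) - min(salaries)
print(salary_range)  # 114
```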

Example 3.2 Revisited Range = largest value - smallest value = 615 - 425 = 190

Variance • If the data set is a sample, the variance is denoted by $s^2$: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$. • If the data set is a population, the variance is denoted by $\sigma^2$: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$.

Standard Deviation • If the data set is a sample, the standard deviation is denoted $s$: $s = \sqrt{s^2}$. • If the data set is a population, the standard deviation is denoted $\sigma$ (sigma): $\sigma = \sqrt{\sigma^2}$.

Example 3.1 Revisited (recall $\bar{x} = 277.4$)
$x_i$    $\bar{x}$    $x_i - \bar{x}$            $(x_i - \bar{x})^2$
255      277.4        255 - 277.4 = -22.4        (-22.4)^2 = 501.76
216      277.4        216 - 277.4 = -61.4        (-61.4)^2 = 3769.96
346      277.4        346 - 277.4 = 68.6         (68.6)^2 = 4705.96
300      277.4        300 - 277.4 = 22.6         (22.6)^2 = 510.76
270      277.4        270 - 277.4 = -7.4         (-7.4)^2 = 54.76
Sum of squared deviations = 9543.2
Since this is sample data, $s^2 = 9543.2/(5 - 1) = 2385.8$ and $s = \sqrt{2385.8} \approx 48.84$.

Example 3.1 Revisited (continued) If these were data from the entire population:
$x_i$    $\mu$     $x_i - \mu$    $(x_i - \mu)^2$
255      277.4     -22.4          501.76
216      277.4     -61.4          3769.96
346      277.4     68.6           4705.96
300      277.4     22.6           510.76
270      277.4     -7.4           54.76
Sum of squared deviations = 9543.2
Then $\sigma^2 = 9543.2/5 = 1908.64$ and $\sigma = \sqrt{1908.64} \approx 43.69$.
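
The only difference between the sample and population calculations is the divisor (n - 1 versus N). The sketch below, with variable names chosen for illustration, computes both from the same sum of squared deviations for the Example 3.1 weights.

```python
import math

weights = [255, 216, 346, 300, 270]
n = len(weights)
mean = sum(weights) / n                               # 277.4

# Sum of squared deviations from the mean (about 9543.2).
sum_sq_dev = sum((x - mean) ** 2 for x in weights)

sample_variance = sum_sq_dev / (n - 1)                # treat data as a sample
population_variance = sum_sq_dev / n                  # treat data as a population

sample_sd = math.sqrt(sample_variance)
population_sd = math.sqrt(population_variance)

print(round(sample_variance, 2), round(sample_sd, 2))          # 2385.8 48.84
print(round(population_variance, 2), round(population_sd, 2))  # 1908.64 43.69
```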

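Example 3.4 Compute the standard deviation of the following sample data: 4, 5, 1, -2, 7. The sample mean is $\bar{x} = (4 + 5 + 1 - 2 + 7)/5 = 3$.
$x_i$    $\bar{x}$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
4        3            1                  1
5        3            2                  4
1        3            -2                 4
-2       3            -5                 25
7        3            4                  16
Sum of squared deviations = 50
Since this is sample data, $s^2 = 50/(5 - 1) = 12.5$ and $s = \sqrt{12.5} \approx 3.54$.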
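
The deviation table for Example 3.4 can be rebuilt programmatically and cross-checked against Python's standard `statistics` module, whose `stdev` function uses the sample (n - 1) divisor. This is a sketch for checking the arithmetic; the variable names are illustrative.

```python
import statistics

data = [4, 5, 1, -2, 7]
x_bar = statistics.mean(data)  # 3

# One row per observation: value, deviation, squared deviation.
rows = [(x, x - x_bar, (x - x_bar) ** 2) for x in data]
for x, deviation, squared in rows:
    print(x, deviation, squared)

sum_sq_dev = sum(squared for _, _, squared in rows)   # 50
s = statistics.stdev(data)                            # sample standard deviation

print(sum_sq_dev)     # 50
print(round(s, 2))    # 3.54
```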

Z-score • The z-score is often called the standardized value. • It is a measure of location that tells how far a particular observation is from the mean. • It denotes the number of standard deviations a data value $x_i$ is from the mean: $z_i = \frac{x_i - \bar{x}}{s}$, where $x_i$ is the data value for which you want the z-score. • A data value less than the sample mean will have a z-score less than zero. • A data value greater than the sample mean will have a z-score greater than zero. • A data value equal to the sample mean will have a z-score of zero.

Example 3.1 Revisited 255, 216, 346, 300, 270. With $\bar{x} = 277.4$ and $s = 48.84$, the z-score for the data value 216 is $z = (216 - 277.4)/48.84 \approx -1.26$.
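
A short sketch of the z-score calculation for the weight 216 follows. The helper name `z_score` is hypothetical; the mean and sample standard deviation are computed with Python's standard `statistics` module rather than typed in.

```python
import statistics


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


weights = [255, 216, 346, 300, 270]
x_bar = statistics.mean(weights)   # 277.4
s = statistics.stdev(weights)      # sample standard deviation, about 48.84

z = z_score(216, x_bar, s)
print(round(z, 2))  # -1.26 (216 is below the mean, so z is negative)
```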

Chebyshev’s Rule • Chebyshev’s rule applies to any data set, regardless of the shape of the distribution of the data. • It can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean. • Chebyshev’s Rule: At least $(1 - 1/k^2)$ of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1. Implications: • At least 75% of the items must be within k = 2 standard deviations of the mean (i.e., within the interval $[\bar{x} - 2s, \bar{x} + 2s]$). • At least 89% of the items must be within k = 3 standard deviations of the mean (i.e., within the interval $[\bar{x} - 3s, \bar{x} + 3s]$). • At least 94% of the items must be within k = 4 standard deviations of the mean (i.e., within the interval $[\bar{x} - 4s, \bar{x} + 4s]$).

Example 3.1 Revisited Let k = 1.5 with $\bar{x} = 277.4$ and $s = 48.84$. According to Chebyshev’s Rule, at least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$, or 56%, of the football players' weights are between $\bar{x} - ks = 277.4 - 1.5(48.84) = 204.14$ and $\bar{x} + ks = 277.4 + 1.5(48.84) = 350.66$.
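
Chebyshev's bound and the matching interval can be sketched as two small functions. The names `chebyshev_bound` and `chebyshev_interval` are hypothetical helpers for illustration; the inputs are the Example 3.1 values.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k ** 2


def chebyshev_interval(mean, sd, k):
    """Interval [mean - k*sd, mean + k*sd]."""
    return (mean - k * sd, mean + k * sd)


print(chebyshev_bound(2))                    # 0.75
print(round(chebyshev_bound(1.5), 2))        # 0.56

low, high = chebyshev_interval(277.4, 48.84, 1.5)
print(round(low, 2), round(high, 2))         # 204.14 350.66
```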

Empirical Rule for Normal Populations • For a normal distribution, the Empirical Rule can be used to make statements about the proportion of data values that lie within a specified number of standard deviations of the mean. If a population has mean $\mu$ and standard deviation $\sigma$ and is described by a normal curve, then • About 68.26% of the population measurements lie within one standard deviation of the mean: $[\mu - \sigma, \mu + \sigma]$ • About 95.44% of the population measurements lie within two standard deviations of the mean: $[\mu - 2\sigma, \mu + 2\sigma]$ • About 99.73% of the population measurements lie within three standard deviations of the mean: $[\mu - 3\sigma, \mu + 3\sigma]$
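
The Empirical Rule percentages can be checked numerically. For a normal distribution, the proportion within k standard deviations of the mean equals erf(k/√2), which Python's standard `math.erf` evaluates. The helper name `normal_within` is an assumption for illustration. The computed values differ from the figures 68.26% and 95.44% in the last digit, most likely because those figures come from rounded table entries.

```python
import math


def normal_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * normal_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```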

Outliers • Outliers are defined as sample values that lie very far away from the vast majority of the other sample values. • Your book uses the term “unusual” to refer to outliers. • We will use the course notes to numerically determine an outlier or an unusual value, NOT the criteria set in your textbook. • If we assume the data are normally distributed, there are two ways to numerically determine whether a sample value is an outlier: • 1.) Determine the interval $(\bar{x} - 3s, \bar{x} + 3s)$. If a value is OUTSIDE the interval, then this value is an outlier. • 2.) Compute the z-value for a sample value. If z > 3 or z < -3, then the value is an outlier.

Example (Outliers) • For the first quiz in MATH 123, the average quiz score was 16 (out of 20 pts.), with a variance of 4. Would a score of 11 be considered an unusually low score? • We were given $\bar{x} = 16$, $s = 2$, and $x_i = 11$. • Note: s = 2, not 4, because we must take the square root of the variance to get the standard deviation. • Using the 1st method, we determine the interval $(\bar{x} - 3s, \bar{x} + 3s) = (16 - 3(2), 16 + 3(2)) = (10, 22)$. • Since 11 is within this interval, this score is not unusually low. • Using the 2nd method, we determine the z-value: $z = (11 - 16)/2 = -2.5$. • Since -2.5 is not less than -3, this score is not unusually low. • Note: Either method can be used to determine whether a value is an outlier, because both will ALWAYS yield the same result.
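
Both outlier checks from the course notes can be sketched side by side. The function names are hypothetical; the inputs are the quiz values from the example (mean 16, variance 4, score 11). The two checks are algebraically equivalent, which is why they agree.

```python
import math


def is_outlier_by_interval(x, mean, sd):
    """Method 1: x is an outlier if it falls outside (mean - 3*sd, mean + 3*sd)."""
    return x < mean - 3 * sd or x > mean + 3 * sd


def is_outlier_by_z(x, mean, sd):
    """Method 2: x is an outlier if its z-value is above 3 or below -3."""
    z = (x - mean) / sd
    return z > 3 or z < -3


mean = 16
sd = math.sqrt(4)   # the standard deviation is the square root of the variance

print(is_outlier_by_interval(11, mean, sd))  # False: 11 lies inside (10, 22)
print(is_outlier_by_z(11, mean, sd))         # False: z = -2.5 is not below -3
```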

The End
