STA 291 Fall 2009 Lecture 5 Dustin Lueker
Measures of Central Tendency • Mean - Arithmetic average • Median - Midpoint of the observations when they are arranged in increasing order • Mode - Most frequent value • Notation: subscripted variables • n = # of units in the sample • N = # of units in the population • x = variable to be measured • x_i = measurement of the ith unit
Median • Measurement that falls in the middle of the ordered sample • When the sample size n is odd, there is a middle value • It has the ordered index (n+1)/2 • The ordered index is where a value falls when the sample is listed from smallest to largest • An index of 2 means the second smallest value • Example • 1.7, 4.6, 5.7, 6.1, 8.3 • n = 5, (n+1)/2 = 6/2 = 3, so index = 3 • Median = 3rd smallest observation = 5.7
Median • When the sample size n is even, average the two middle values • Example • 3, 5, 6, 9 • n = 4, (n+1)/2 = 5/2 = 2.5, so index = 2.5 • Median = midpoint between the 2nd and 3rd smallest observations = (5+6)/2 = 5.5
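The ordered-index rule from the two median slides can be sketched in Python as follows. `median_by_index` is a hypothetical helper name chosen for illustration; the standard library's `statistics.median` is only used as a cross-check, and Python lists are 0-based, so ordered index k lives at position k - 1.

```python
import statistics

def median_by_index(values):
    """Median via the ordered index (n+1)/2, counting positions from 1."""
    xs = sorted(values)                  # list the sample from smallest to largest
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    if n % 2 == 1:
        index = (n + 1) // 2             # whole-number ordered index
        return xs[index - 1]             # shift to Python's 0-based list
    # n even: (n+1)/2 falls halfway between ordered indexes n/2 and n/2 + 1
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

odd_sample = [1.7, 4.6, 5.7, 6.1, 8.3]
even_sample = [3, 5, 6, 9]
print(median_by_index(odd_sample))    # 5.7
print(median_by_index(even_sample))   # 5.5
print(statistics.median(even_sample)) # 5.5, same answer from the standard library
```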
Mean and Median • For skewed distributions, the median is often a more appropriate measure of central tendency than the mean • The median usually better describes a “typical value” when the sample distribution is highly skewed • Example • Monthly income for five people 1,000 2,000 3,000 4,000 100,000 • Median monthly income: 3,000 • Does this better describe a “typical value” in the data set than the mean of 22,000?
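As a quick check of the income example, the sketch below computes the mean both with `statistics.mean` and directly from the formula x̄ = (x_1 + ... + x_n)/n, alongside the median. Only standard-library functions are used; the variable name `incomes` is illustrative.

```python
from statistics import mean, median

# Monthly incomes of the five people in the income example
incomes = [1_000, 2_000, 3_000, 4_000, 100_000]

print(sum(incomes) / len(incomes))  # 22000.0, the mean as (1/n) * (sum of the x_i)
print(mean(incomes))                # 22000, pulled up by the single 100,000 value
print(median(incomes))              # 3000, the 3rd smallest of the 5 values
```

Four of the five people earn 4,000 or less, yet the mean is 22,000; the median of 3,000 sits among them.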
Mean and Median • The trimmed mean is a compromise between the median and the mean • Calculating the trimmed mean • Order the data from smallest to largest • Delete a selected number of values from each end of the ordered list • Find the mean of the remaining values • The trimming percentage is the percentage of values that have been deleted from each end of the ordered list
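The three trimming steps can be sketched as below. `trimmed_mean` is a hypothetical helper, and rounding the number of deleted values down when n times the percentage is not a whole number is an assumption of this sketch, not something the slide specifies.

```python
def trimmed_mean(values, trim_pct):
    """Mean after deleting trim_pct percent of the values from EACH end."""
    xs = sorted(values)                     # 1. order the data
    n = len(xs)
    k = int(n * trim_pct // 100)            # values cut per end (rounded down: an assumption)
    if 2 * k >= n:
        raise ValueError("trimming would delete every value")
    kept = xs[k:n - k]                      # 2. delete k values from each end
    return sum(kept) / len(kept)            # 3. mean of the remaining values

incomes = [1_000, 2_000, 3_000, 4_000, 100_000]
print(trimmed_mean(incomes, 0))    # 22000.0, no trimming gives the ordinary mean
print(trimmed_mean(incomes, 20))   # 3000.0, drops 1,000 and 100,000
```

With 0% trimming the result is the mean; trimming nearly half from each end moves it toward the median, which is why it is described as a compromise.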
Median for Grouped or Ordinal Data • Example: Highest Degree Completed
Calculate the Median • n = 177,618 • (n+1)/2 = 88,809.5 • Median = midpoint between the 88,809th smallest and 88,810th smallest observations • Both are in the category “High school only”, so the median is “High school only” • The mean would not make sense here, since the variable is only ordinal • Median • Can be used for interval data and for ordinal data • Cannot be used for nominal data, because the observations cannot be ordered on a scale
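Finding the median of ordinal data from a frequency table amounts to locating the observations at ordered positions (n+1)/2 rounded down and rounded up using running totals. The sketch below assumes categories are listed from lowest to highest; `ordinal_median` is a hypothetical name, and the toy counts are invented purely for illustration (the degree survey table is not reproduced here). For n = 177,618 the two positions it computes are 88,809 and 88,810, matching the slide.

```python
from bisect import bisect_left
from itertools import accumulate

def ordinal_median(categories, counts):
    """Median category of ordered categorical data given as a frequency table.

    categories must be listed from lowest to highest; counts[i] is the
    number of observations in categories[i].
    """
    n = sum(counts)
    cumulative = list(accumulate(counts))   # running totals of the counts
    lower_pos = (n + 1) // 2                # (n+1)/2 rounded down
    upper_pos = (n + 2) // 2                # (n+1)/2 rounded up
    # the k-th smallest observation lies in the first category whose running total reaches k
    lower = categories[bisect_left(cumulative, lower_pos)]
    upper = categories[bisect_left(cumulative, upper_pos)]
    # ordinal categories cannot be averaged, so report both if they differ
    return lower if lower == upper else (lower, upper)

# Hypothetical toy table, NOT the survey counts
print(ordinal_median(["low", "middle", "high"], [3, 4, 2]))  # middle
```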
Mean vs. Median • Mean • Interval data with an approximately symmetric distribution • Median • Interval data • Ordinal data • Mean is sensitive to outliers, median is not
Mean vs. Median
Mean vs. Median • Symmetric distribution • Mean = Median • Skewed distribution • The mean lies farther in the direction in which the distribution is skewed (toward the longer tail)
Median • Disadvantage • Insensitive to changes within the lower or upper half of the data • Example • 1, 2, 3, 4, 5 • 1, 2, 3, 100, 100 • Both data sets have median 3, even though the upper half changed drastically • Sometimes the mean is more informative even when the distribution is skewed
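The two data sets on the median-disadvantage slide can be compared with the standard library's `statistics` module; this is only a check of the arithmetic, with illustrative variable names.

```python
from statistics import mean, median

first = [1, 2, 3, 4, 5]
second = [1, 2, 3, 100, 100]   # only the upper half has changed

print(median(first), median(second))  # 3 3
print(mean(first), mean(second))      # 3 41.2
```

The median does not register the change at all, while the mean moves from 3 to 41.2.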
Example • Keeneland Sales
Mode • Value that occurs most frequently • Does not need to be near the center of the distribution • Not really a measure of central tendency • Can be used for all types of data (nominal, ordinal, interval) • Special Cases • Data Set • {2, 2, 4, 5, 5, 6, 10, 11} • Mode = 2 and 5 (two values tie for most frequent) • Data Set • {2, 6, 7, 10, 13} • Mode = none (no value occurs more than once)
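A small sketch that handles both special cases is below. `modes` is a hypothetical helper; treating "every value occurs once" as "no mode" follows the second special case. Note that the standard library's `statistics.multimode` behaves differently there: it would return all five values.

```python
from collections import Counter

def modes(values):
    """All most-frequent values (sorted); an empty list if no value repeats."""
    counts = Counter(values)          # value -> number of occurrences
    top = max(counts.values())
    if top == 1:
        return []                     # every value occurs once: treat as "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 2, 4, 5, 5, 6, 10, 11]))   # [2, 5]
print(modes([2, 6, 7, 10, 13]))            # []
print(modes(["red", "blue", "red"]))       # ['red'], works for nominal data too
```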
Mean vs. Median vs. Mode • Mean • Interval data with an approximately symmetric distribution • Median • Interval or ordinal data • Mode • All types of data
Mean vs. Median vs. Mode • Mean is sensitive to outliers • Median and mode are not • Why? • In general, the median is more appropriate for skewed data than the mean • Why? • In some situations, the median may be too insensitive to changes in the data • The mode may not be unique
Example • “How often do you read the newspaper?” • Identify the mode • Identify the median response
Percentiles • The pth percentile (L_p) is a number such that p% of the observations take values below it and (100-p)% take values above it • 50th percentile = median • 25th percentile = lower quartile • 75th percentile = upper quartile • The index of L_p • (n+1)p/100
Quartiles • 25th percentile • lower quartile • Q1 • (approximately) median of the observations below the median • 75th percentile • upper quartile • Q3 • (approximately) median of the observations above the median
Example • Find the 25th percentile of this data set • {3, 7, 12, 13, 15, 19, 24}
Interpolation • Use when the index is not a whole number • Go to the value at the whole-number part of the index, then move the fractional part of the way toward the next value • If the index is 5.4, take the 5th smallest value and add 0.4 times the difference between the 6th and 5th smallest values • In essence, we are going to the “5.4th” value
Example • Find the 40th percentile of the same data set • {3, 7, 12, 13, 15, 19, 24} • Must use interpolation
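The index rule (n+1)p/100 together with interpolation can be sketched as one function and applied to both percentile examples. `percentile` is a hypothetical helper; clamping indexes that fall below 1 or above n to the minimum or maximum is an assumption the slides do not address. Other software packages may use different index conventions and can give slightly different answers.

```python
import math

def percentile(values, p):
    """p-th percentile L_p via the ordered index (n+1)p/100 with interpolation."""
    xs = sorted(values)
    n = len(xs)
    index = (n + 1) * p / 100
    index = min(max(index, 1), n)      # assumption: clamp to the min / max
    whole = math.floor(index)          # closest ordered index below
    fraction = index - whole           # how far to move toward the next value
    lower = xs[whole - 1]              # whole-th smallest value (1-based)
    if fraction == 0:
        return lower
    upper = xs[whole]                  # (whole + 1)-th smallest value
    return lower + fraction * (upper - lower)

data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25))            # index 2 -> 7
print(round(percentile(data, 40), 4))  # index 3.2 -> 12 + 0.2 * (13 - 12) = 12.2
```

With p = 50 the same function reproduces the median rule, e.g. 5.5 for {3, 5, 6, 9}.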
Data Summary • Five Number Summary • Minimum • Lower Quartile • Median • Upper Quartile • Maximum • Example • minimum = 4 • Q1 = 256 • median = 530 • Q3 = 1105 • maximum = 320,000 • What does this suggest about the shape of the distribution?
Interquartile Range (IQR) • The interquartile range (IQR) is the difference between the upper and lower quartiles • IQR = Q3 – Q1 • The IQR is the range of values that contains the middle 50% of the data • IQR increases as variability increases • Murder Rate Data • Q1 = 3.9 • Q3 = 10.3 • IQR = 10.3 – 3.9 = 6.4
Box Plot • Displays the five-number summary (and more) graphically • Consists of a box that contains the central 50% of the distribution (from lower quartile to upper quartile) • A line within the box that marks the median • And whiskers that extend to the maximum and minimum values • This assumes there are no outliers in the data set
Outliers • An observation is an outlier if it falls • more than 1.5 IQR above the upper quartile or • more than 1.5 IQR below the lower quartile
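The 1.5 IQR rule translates into two "fences"; anything strictly beyond them is an outlier. The helper names `outlier_fences` and `is_outlier` are hypothetical, and the quartiles plugged in are the murder-rate values (Q1 = 3.9, Q3 = 10.3) and the box-plot example values (Q1 = 158, Q3 = 182) from these notes.

```python
def outlier_fences(q1, q3):
    """(lower fence, upper fence): 1.5 IQR below Q1 and 1.5 IQR above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(x, q1, q3):
    lower, upper = outlier_fences(q1, q3)
    return x < lower or x > upper      # "more than" 1.5 IQR beyond a quartile

# Murder rate quartiles: IQR = 6.4, so 1.5 IQR = 9.6
lower, upper = outlier_fences(3.9, 10.3)
print(round(lower, 2), round(upper, 2))   # -5.7 19.9

# Box plot example quartiles: IQR = 24, so 1.5 IQR = 36
print(outlier_fences(158, 182))           # (122.0, 218.0)
print(is_outlier(204, 158, 182))          # False: the maximum 204 is inside the fences
```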
Box Plot • Whiskers only extend to the most extreme observations within 1.5 IQR beyond the quartiles • If an observation is an outlier, it is marked by an x, +, or some other identifier
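Given raw data and its quartiles, the whisker ends and the points to mark separately can be computed as sketched below. `box_plot_parts` is a hypothetical helper, and the data set is a toy example: the seven values from the percentile examples plus an invented value of 60. For these eight values the (n+1)p/100 index rule with interpolation gives Q1 = 8.25 and Q3 = 22.75, which are passed in directly.

```python
def box_plot_parts(values, q1, q3):
    """Box edges, whisker ends, and outliers for a box plot, given the quartiles."""
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "box": (q1, q3),                            # central 50% of the data
        "whiskers": (min(inside), max(inside)),     # most extreme non-outliers
        "outliers": outliers,                       # marked with x, + or similar
    }

# Hypothetical toy data: 60 is an invented extreme value
toy = [3, 7, 12, 13, 15, 19, 24, 60]
print(box_plot_parts(toy, q1=8.25, q3=22.75))
# {'box': (8.25, 22.75), 'whiskers': (3, 24), 'outliers': [60]}
```

The upper whisker stops at 24, the largest value below the upper fence of 44.5, rather than at the maximum.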
Example • Values • Min = 148 • Q1 = 158 • Median = Q2 = 162 • Q3 = 182 • Max = 204 • Create a box plot
5 Number Summary/Box Plot • For right-skewed distributions, the minimum, Q1, and median will be “bunched up”, while Q3 and the maximum will be farther away • For left-skewed distributions, the mirror image holds: the maximum, Q3, and the median will be relatively close together compared with the distances to Q1 and the minimum • Symmetric distributions?
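One rough way to apply the “bunched up” reading is to list the gaps between consecutive values of a five-number summary. This is a sketch for eyeballing shape, not a formal skewness measure; `summary_gaps` is a hypothetical helper, and the inputs are the two five-number summaries quoted in these notes.

```python
def summary_gaps(minimum, q1, median, q3, maximum):
    """Gaps (min->Q1, Q1->median, median->Q3, Q3->max) of a five-number summary."""
    return (q1 - minimum, median - q1, q3 - median, maximum - q3)

print(summary_gaps(4, 256, 530, 1105, 320_000))   # (252, 274, 575, 318895)
print(summary_gaps(148, 158, 162, 182, 204))      # (10, 4, 20, 22)
```

In both summaries the two upper gaps are larger than the two lower gaps, the pattern the slide associates with right skew; for a roughly symmetric distribution the gaps would be roughly mirror images of each other.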