
STA 291 Summer 2010

STA 291 Summer 2010. Lecture 5 Dustin Lueker. Measures of Central Tendency. Mean - Arithmetic Average . Median - Midpoint of the observations when they are arranged in increasing order. Notation: Subscripted variables n = # of units in the sample N = # of units in the population





Presentation Transcript


  1. STA 291 Summer 2010, Lecture 5, Dustin Lueker

  2. Measures of Central Tendency
     • Mean: arithmetic average
     • Median: midpoint of the observations when they are arranged in increasing order
     • Notation (subscripted variables)
       • n = number of units in the sample
       • N = number of units in the population
       • x = variable to be measured
       • x_i = measurement of the i-th unit
     • Mode: most frequent value

  3. Symbols

  4. Variance and Standard Deviation
     • Sample
       • Variance
       • Standard Deviation
     • Population
       • Variance
       • Standard Deviation

  5. Variance Step By Step
     • Calculate the mean
     • For each observation, calculate the deviation (the observation minus the mean)
     • For each observation, calculate the squared deviation
     • Add up all the squared deviations
     • Divide the result by (n-1), or by N if you are finding the population variance
     • To get the standard deviation, take the square root of the result
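The steps on this slide can be sketched in Python as follows. This is a minimal illustration rather than code from the lecture; the function name `variance`, the `population` flag, and the data set are assumptions made for the example.

```python
import math

def variance(data, population=False):
    """Variance computed step by step, as listed on the slide."""
    n = len(data)
    mean = sum(data) / n                        # step 1: the mean
    deviations = [x - mean for x in data]       # step 2: deviations
    squared = [d ** 2 for d in deviations]      # step 3: squared deviations
    total = sum(squared)                        # step 4: sum of squares
    divisor = n if population else n - 1        # step 5: N or (n-1)
    return total / divisor

# Illustrative data set (not from the lecture).
data = [2, 4, 4, 4, 5, 5, 7, 9]
sample_variance = variance(data)
sample_sd = math.sqrt(sample_variance)          # square root gives the standard deviation
print(sample_variance, sample_sd)
```

Passing `population=True` divides by N instead of (n-1), which matches the population case on the slide.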

  6. Empirical Rule
     • If the data are approximately symmetric and bell-shaped, then
       • About 68% of the observations are within one standard deviation of the mean
       • About 95% of the observations are within two standard deviations of the mean
       • About 99.7% of the observations are within three standard deviations of the mean
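As a rough illustration of the rule, the sketch below turns a mean and standard deviation into the three intervals. The function name and the values mean = 100, sd = 15 are hypothetical and do not come from the lecture.

```python
def empirical_rule_intervals(mean, sd):
    """Return {k: (low, high, approximate share)} for k = 1, 2, 3 standard deviations."""
    coverage = {1: 0.68, 2: 0.95, 3: 0.997}
    return {k: (mean - k * sd, mean + k * sd, share)
            for k, share in coverage.items()}

# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
for k, (low, high, share) in empirical_rule_intervals(100, 15).items():
    print(f"within {k} sd: {low} to {high}, about {share:.1%} of observations")
```

The shares are approximations that apply only when the distribution is roughly symmetric and bell-shaped.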

  7. Empirical Rule

  8. Percentiles
     • The pth percentile (Xp) is a number such that p% of the observations take values below it, and (100-p)% take values above it
       • 50th percentile = median
       • 25th percentile = lower quartile
       • 75th percentile = upper quartile
     • The index of Xp in the ordered data is (n+1)p/100

  9. Quartiles
     • 25th percentile
       • lower quartile
       • Q1
       • (approximately) median of the observations below the median
     • 75th percentile
       • upper quartile
       • Q3
       • (approximately) median of the observations above the median

  10. Example
     • Find the 25th percentile of this data set
       • {3, 7, 12, 13, 15, 19, 24}

  11. Interpolation
     • Use when the index is not a whole number
     • Start at the closest index below the number found, then move the decimal fraction of the way towards the next value
     • If the index is found to be 5.4, go to the 5th value, then add 0.4 of the difference between the 5th value and the 6th value
     • In essence we are going to the 5.4th value

  12. Example
     • Find the 40th percentile of the same data set
       • {3, 7, 12, 13, 15, 19, 24}
     • Must use interpolation
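The index formula (n+1)p/100 and the interpolation rule can be sketched together in Python. The function name `percentile` is an assumption, and clamping to the smallest or largest value when the index falls outside 1..n is a choice made for this sketch that the slides do not address.

```python
def percentile(data, p):
    """pth percentile using the (n+1)p/100 index with linear interpolation."""
    values = sorted(data)
    n = len(values)
    index = (n + 1) * p / 100          # 1-based position in the ordered data
    lower = int(index)                 # closest whole index at or below it
    fraction = index - lower           # the decimal part of the index
    if lower < 1:                      # assumption: clamp below the first value
        return values[0]
    if lower >= n:                     # assumption: clamp above the last value
        return values[-1]
    # Go to the lower value, then add the fraction of the gap to the next value.
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])

data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25))            # index 2, a whole number, so the 2nd value
print(round(percentile(data, 40), 2))  # index 3.2, between the 3rd and 4th values
```

For the 25th percentile the index is exactly 2, so no interpolation is needed; for the 40th percentile the index is 3.2, so the result lies 0.2 of the way from 12 to 13.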

  13. Data Summary
     • Five Number Summary
       • Minimum
       • Lower Quartile
       • Median
       • Upper Quartile
       • Maximum
     • Example
       • minimum = 4
       • Q1 = 256
       • median = 530
       • Q3 = 1105
       • maximum = 320,000
     • What does this suggest about the shape of the distribution?

  14. Interquartile Range (IQR)
     • The Interquartile Range (IQR) is the difference between the upper and lower quartiles
       • IQR = Q3 - Q1
     • IQR = range of values that contains the middle 50% of the data
     • IQR increases as variability increases
     • Murder Rate Data
       • Q1 = 3.9
       • Q3 = 10.3
       • IQR =
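The murder rate exercise can be checked with a one-line calculation. This is only a sketch; the function name `interquartile_range` is an assumption.

```python
def interquartile_range(q1, q3):
    """IQR = Q3 - Q1, the width of the middle 50% of the data."""
    return q3 - q1

# Quartiles from the murder rate data on the slide.
iqr = interquartile_range(3.9, 10.3)
print(round(iqr, 1))  # rounded to avoid floating-point noise
```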

  15. Box Plot
     • Displays the five number summary (and more) graphically
     • Consists of a box that contains the central 50% of the distribution (from lower quartile to upper quartile),
     • A line within the box that marks the median,
     • And whiskers that extend to the maximum and minimum values
       • This is assuming there are no outliers in the data set

  16. Outliers
     • An observation is an outlier if it falls
       • more than 1.5 IQR above the upper quartile, or
       • more than 1.5 IQR below the lower quartile

  17. Box Plot
     • Whiskers only extend to the most extreme observations within 1.5 IQR beyond the quartiles
     • If an observation is an outlier, it is marked by an x, +, or some other identifier

  18. Example
     • Values
       • Min = 148
       • Q1 = 158
       • Median = Q2 = 162
       • Q3 = 182
       • Max = 204
     • Create a box plot
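Before drawing the box plot for this example, the 1.5 IQR rule from the Outliers slide can be applied to the five-number summary. The sketch below is illustrative; the function name `outlier_fences` and the term "fence" are not used in the lecture.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs: k IQRs below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Five-number summary from the example slide.
minimum, q1, median, q3, maximum = 148, 158, 162, 182, 204

lower_fence, upper_fence = outlier_fences(q1, q3)
has_low_outlier = minimum < lower_fence     # is the minimum beyond the lower cutoff?
has_high_outlier = maximum > upper_fence    # is the maximum beyond the upper cutoff?

print(lower_fence, upper_fence)             # 122.0 218.0
print(has_low_outlier, has_high_outlier)    # False False
```

Since neither the minimum nor the maximum lies beyond a cutoff, there are no outliers here and the whiskers extend all the way to 148 and 204.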

  19. 5 Number Summary/Box Plot
     • On right-skewed distributions, the minimum, Q1, and median will be “bunched up”, while Q3 and the maximum will be farther away.
     • For left-skewed distributions, the “mirror” is true: the maximum, Q3, and the median will be relatively close compared to the corresponding distances to Q1 and the minimum.
     • Symmetric distributions?

  20. Mode
     • Value that occurs most frequently
       • Does not need to be near the center of the distribution
       • Not really a measure of central tendency
       • Can be used for all types of data (nominal, ordinal, interval)
     • Special Cases
       • Data Set
         • {2, 2, 4, 5, 5, 6, 10, 11}
         • Mode =
       • Data Set
         • {2, 6, 7, 10, 13}
         • Mode =
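The two special cases can be explored with a short Python sketch. The function name `modes` is an assumption, and returning an empty list when no value repeats reflects one common convention (reporting no mode), which the slide itself does not state.

```python
from collections import Counter

def modes(data):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []      # assumption: no repeated value means no mode is reported
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([2, 2, 4, 5, 5, 6, 10, 11]))  # [2, 5]
print(modes([2, 6, 7, 10, 13]))           # []
```

The first data set has two values tied for most frequent, which shows that the mode need not be unique.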

  21. Mean vs. Median vs. Mode
     • Mean
       • Interval data with an approximately symmetric distribution
     • Median
       • Interval or ordinal data
     • Mode
       • All types of data

  22. Mean vs. Median vs. Mode
     • Mean is sensitive to outliers
       • Median and mode are not
         • Why?
     • In general, the median is more appropriate for skewed data than the mean
       • Why?
     • In some situations, the median may be too insensitive to changes in the data
     • The mode may not be unique
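One way to see the sensitivity of the mean is to change a single value and recompute both measures. The sketch below reuses the data set from the percentile examples and replaces its largest value with a hypothetical misrecorded value of 240; that altered data set is an assumption made for illustration.

```python
from statistics import mean, median

original = [3, 7, 12, 13, 15, 19, 24]
with_outlier = [3, 7, 12, 13, 15, 19, 240]   # hypothetical: 24 misrecorded as 240

# The mean uses every value, so one extreme observation pulls it upward.
print(round(mean(original), 2), round(mean(with_outlier), 2))

# The median depends only on the middle of the ordered data, so it stays put.
print(median(original), median(with_outlier))
```

The median is 13 in both cases, while the mean more than triples.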

  23. Example
     • “How often do you read the newspaper?”
     • Identify the mode
     • Identify the median response

  24. Measures of Variation
     • Statistics that describe variability
       • Two distributions may have the same mean and/or median but different variability
       • Mean and median only describe a typical value, but not the spread of the data
     • Range
     • Variance
     • Standard Deviation
     • Interquartile Range
     • All of these can be computed for the sample or population

  25. Range
     • Difference between the largest and smallest observation
     • Very much affected by outliers
       • A misrecorded observation may lead to an outlier, and affect the range
     • The range does not always reveal different variation about the mean

  26. Example
     • Sample 1
       • Smallest Observation: 112
       • Largest Observation: 797
       • Range =
     • Sample 2
       • Smallest Observation: 15033
       • Largest Observation: 16125
       • Range =
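The two ranges in this example can be computed with a small helper. This is a sketch; the function name `data_range` is an assumption, and only the smallest and largest observations are supplied because the slide gives nothing else.

```python
def data_range(observations):
    """Range = largest observation minus smallest observation."""
    return max(observations) - min(observations)

range_1 = data_range([112, 797])        # Sample 1
range_2 = data_range([15033, 16125])    # Sample 2
print(range_1, range_2)                 # 685 1092
```

Sample 2 has the larger range even though its values sit much farther from zero; the range measures spread only and says nothing about the location of the data.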
