Measures of Position • Where does a certain data value fit in relative to the other data values? • To accompany Hawkes lesson 3.3 • Original content by D.R.S.
Nth Place • The highest and the lowest • 2nd highest, 3rd highest, etc. • “If I made $60,000, I would be 6th richest.”
Another view: “How does my \(x\) compare to the mean?” • “Am I in the middle of the pack?” • “Am I above or below the middle?” • “Am I extremely high or extremely low?” • The \(z\)-score is the measuring stick
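\(z\)-Score: how many standard deviations is \(x\) away from the mean? • If you know the \(x\) value: Population: \(z = \frac{x - \mu}{\sigma}\); Sample: \(z = \frac{x - \bar{x}}{s}\) • To work backward from \(z\) to \(x\): Population: \(x = \mu + z\sigma\); Sample: \(x = \bar{x} + zs\)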
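The two directions of the standard-score formula can be sketched in a few lines of Python. The function names and the numbers (mean 50, standard deviation 10) are hypothetical, chosen only for illustration; the same arithmetic applies whether the mean and standard deviation come from a population or a sample.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd


def x_from_z(z, mean, sd):
    """Work backward from a z-score to the original x value."""
    return mean + z * sd


# Hypothetical numbers, not taken from the slides.
mean, sd = 50.0, 10.0
print(z_score(65.0, mean, sd))   # 1.5  (above the mean, so positive)
print(x_from_z(-2.0, mean, sd))  # 30.0 (two standard deviations below)
print(z_score(mean, mean, sd))   # 0.0  (the mean itself always scores 0)
```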
The \(z\)-score is also called the “Standard Score” • No matter what units \(x\) is measured in or how large or small the values are… • The \(z\)-score of the mean will be 0 • Because the numerator turns out to be 0. • If \(x\) is above the mean, its \(z\) is positive. • Because the numerator turns out to be positive. • If \(x\) is below the mean, its \(z\) is negative. • Because the numerator turns out to be negative.
\(z\)-score values • Typically round to two decimal places. • Don’t say “0.2589”, say “0.26” • If not two decimal places, pad • Don’t say “2”, say “2.00” • Don’t say “-1.1”, say “-1.10” • \(z\)-scores are almost always in the interval \(-3 < z < 3\). Be very suspicious if you calculate a \(z\)-score that’s not a small number.
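The rounding and padding convention maps directly onto a fixed two-decimal format. A minimal sketch in Python (the helper name `format_z` is made up for this example):

```python
def format_z(z):
    """Report a z-score with exactly two decimal places, padding with zeros."""
    return f"{z:.2f}"


print(format_z(0.2589))  # 0.26
print(format_z(2))       # 2.00
print(format_z(-1.1))    # -1.10
```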
Practice: Given \(x\), compute \(z\) • Find the \(z\)-scores corresponding to the salary values, given the mean \(\mu\) and the standard deviation \(\sigma\).
Practice: Given \(z\), compute \(x\) • Find the \(x\) values (salaries) corresponding to the given standard scores, using the mean \(\mu\) and the standard deviation \(\sigma\).
Example: Using \(z\)-scores to compare unlike items • Sue took two tests, the Literature test and the Biology test. • On one test: The mean score was 47 points. The standard deviation was 6 points. Sue earned 55 points. Find her \(z\)-score for this test. • On the other test: The mean score was 77 points. The standard deviation was 11 points. Sue earned 91 points. Find her \(z\)-score for this test. • On which test did she have the “better” performance?
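Sue's two results can be checked with a short calculation. This is only a sketch: the tests are identified by their stated means (47 and 77) rather than by subject, and the helper name `z_score` is an assumption of this example.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd


z_first = z_score(55, 47, 6)     # test with mean 47, standard deviation 6
z_second = z_score(91, 77, 11)   # test with mean 77, standard deviation 11

print(f"{z_first:.2f}")   # 1.33
print(f"{z_second:.2f}")  # 1.27
```

The 55-point result sits slightly more standard deviations above its mean, so relative to the rest of the class it is the "better" performance even though 91 is the larger raw score.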
\(z\)-scores: caution with negatives • Example: compare test scores on two different tests to ascertain “Which score was the more outstanding of the two?” • Be careful if the \(z\)-scores turn out to be negative. Which of two negative \(z\)-scores is the better performance? • Stop and think back to your basic number line and the meaning of “<” and “>”
Percentiles • “What percent of the values are lower than my value?” • 90th percentile is pretty high • 50th percentile is right in the middle • 10th percentile is pretty low • If you scored in the 99th percentile on your SAT, I hope you got a scholarship.
Salary data for our percentile examples • With these salary values again • What’s the percentile for a salary of $59,000? • You can see it’s going to be higher than the 50th, because it’s in the top half.
Example: Given \(x\), find the percentile • Count \(L\) = how many values below $59,000 • Count \(n\) = how many values in the data set • Formula for percentile: \(P = \frac{L}{n} \cdot 100\) • Here we have \(L = 15\) values lower than our $59,000 • Here we have \(n = 20\) values in the data set. • \(P = \frac{15}{20} \cdot 100 = 75\), so “75th percentile”
Continued: Given \(x\), find the percentile • \(\frac{15}{20} \cdot 100 = 75\), so \(P = 75\) • Do not say “75%”, but say “the 75th percentile” • Other sources use different formulas, beware! • Some other books use a different numerator. • Excel can give two different answers through its PERCENTILE.EXC and PERCENTILE.INC functions.
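The count-below formula can be sketched as follows. The salary list from the slides is not reproduced here, so the data below is a hypothetical stand-in of 20 values, and `percentile_of` is a name invented for this example.

```python
def percentile_of(value, data):
    """Percent of data values strictly below `value` (count below / n * 100)."""
    below = sum(1 for v in data if v < value)
    return below / len(data) * 100


# Hypothetical data set of 20 values: 1, 2, ..., 20.
data = list(range(1, 21))
print(percentile_of(16, data))  # 75.0, because 15 of the 20 values are below 16
```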
Given Percentile \(P\), find the value • Formula: position from bottom \(L = n \cdot \frac{P}{100}\) • Again, \(n\) = how many data values in the set • and \(P\) = the percentile rank that’s given. • Is there a decimal remainder in position \(L\)? • If so, then BUMP UP to the next highest whole # and take the value in that position. • Or if \(L\) is an exact whole number, take the average of the values in positions \(L\) and \(L + 1\). • Note: Book uses lowercase \(l\) instead of \(L\).
Given Percentile \(P\), find the value • Example: What is the 31st percentile in the salary data? • 31st percentile: plug in \(P = 31\) and \(n = 20\) • Compute \(L = 20 \cdot \frac{31}{100} = 6.2\). It has a remainder. • Bump it up to 7! • Not rounding, but rather bumpety-upping • So we look 7 positions from the bottom • “The 31st percentile is $44,476”
Given Percentile \(P\), find the value • Example: What is the 40th percentile in the salary data? • Plug in \(P = 40\) and \(n = 20\) • Compute \(L = 20 \cdot \frac{40}{100} = 8\). Exact integer! • So average the 8th and 9th values from the bottom. • “The 40th percentile is $47,367.50, or $47,368.”
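The bump-up rule and the averaging rule can be sketched together. This is an illustration under stated assumptions, not the book's or Excel's implementation: `value_at_percentile` is a made-up name, the percentile is assumed to be a whole number from 1 to 99, and the 20 data values are hypothetical stand-ins for the salary list.

```python
def value_at_percentile(p, data):
    """Value at the p-th percentile using position L = n * p / 100.

    If L has a remainder, bump up to the next whole position.
    If L is a whole number, average the values in positions L and L + 1.
    Assumes p is a whole number from 1 to 99.
    """
    values = sorted(data)
    n = len(values)
    whole, remainder = divmod(n * p, 100)  # integer arithmetic avoids float error
    if remainder:
        # Bumped-up position is whole + 1, which is index `whole` (0-based).
        return values[whole]
    # Positions `whole` and `whole + 1` are indexes whole - 1 and whole.
    return (values[whole - 1] + values[whole]) / 2


# Hypothetical data set of 20 values: 10, 20, ..., 200.
data = list(range(10, 201, 10))
print(value_at_percentile(31, data))  # 70   (L = 6.2, bumped up to position 7)
print(value_at_percentile(40, data))  # 85.0 (L = 8, average of positions 8 and 9)
```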
Excel gives different answers • Excel does some fancy interpolation
Quartiles Q1, Q2, Q3 • Data values are arranged from low to high. • The Quartiles divide the data into four groups. • Q2 is just another name for the Median. • Q1 = the Median of the values from the Lowest up to Q2 • Q3 = the Median of the values from Q2 up to the Highest • It gets tricky, depending on how many values there are.
Quartiles example • 10, 20, 30, 40, 50, 60, 70, 80, 90 • The Second Quartile, Q2 = median = 50 • Find the medians of the subsets left and right. • Keep the 50 in each of those subsets. • The First Quartile, Q1= median of { 10, 20, 30, 40, 50 } = 30 • The Third Quartile, Q3= median of { 50, 60, 70, 80, 90 } = 70
Quartiles example • 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 • Q2 = median = \(\frac{50 + 60}{2} = 55\) (two middle #s) • Leave the 50 and 60 in place; do not reuse 55 • Q1 = median of {10, 20, 30, 40, 50} = 30 • Q3 = median of {60, 70, 80, 90, 100} = 80
Quartiles example • 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 • Q2 = median = \(\frac{50 + 60}{2} = 55\) (two middle #s). • 55 isn’t really there so you can’t remove it! • Leave the 50 and 60 in place • Q1 = median of {0, 10, 20, 30, 40, 50} = 25 • Q3 = median of {60, 70, 80, 90, 100, 110} = 85 • Two middle numbers happened again!
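The three quartile examples follow one rule: when the median is an actual data value, keep it in both halves; when it is the average of two middle values, split between them. A minimal Python sketch of that rule as these slides describe it (the function name `hawkes_quartiles` is an assumption, and this is not taken from the Hawkes software itself):

```python
from statistics import median


def hawkes_quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule from the slides."""
    values = sorted(data)
    half = len(values) // 2
    if len(values) % 2 == 1:
        # Odd count: the median is a real data value, so keep it in both halves.
        lower, upper = values[:half + 1], values[half:]
    else:
        # Even count: the median falls between two values, so split between them.
        lower, upper = values[:half], values[half:]
    return median(lower), median(values), median(upper)


print(hawkes_quartiles(range(10, 100, 10)))  # Q1 = 30, Q2 = 50, Q3 = 70
print(hawkes_quartiles(range(10, 110, 10)))  # Q1 = 30, Q2 = 55, Q3 = 80
print(hawkes_quartiles(range(0, 120, 10)))   # Q1 = 25, Q2 = 55, Q3 = 85
```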
Interquartile Range • Definition: IQR = Q3 – Q1 • In the previous example, 85 – 25 = 60. • Interquartile Range measures how spread out the middle of the data are • The lowest quartile (x < Q1) is not involved • And the highest quartile (x > Q3) is not involved.
Quartiles with TI-84 • 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 • Put values into a TI-84 List • Use STAT, CALC, 1-Var Stats • Scroll down, down, down to get to the quartiles.
There is disagreement about Quartiles • The TI-84 sometimes gives different answers than the method we use in the Hawkes materials • Excel might give different answers from Hawkes and TI-84, both. • Use the Hawkes method in this course’s work • Be aware of the others • You should know how to use TI-84 and Excel • You should be aware that differences can occur.
Quartiles with TI-84 vs. Hawkes • 10, 20, 30, 40, 50, 60, 70, 80, 90 • We got Q1 = 30 and Q3 = 70 before. • Hawkes keeps the 50, using 10, 20, 30, 40, 50 to compute Q1. • But the TI-84 throws out the 50 and uses 10, 20, 30, 40. • Hawkes says the TI-84 is computing “hinges”.
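The disagreement comes down to one choice: whether the median value is kept in each half when the count is odd. The sketch below models only the behavior described on this slide, with a hypothetical `keep_median` switch; it is not the calculator's actual implementation.

```python
from statistics import median


def q1_q3(data, keep_median):
    """Q1 and Q3 as medians of the lower and upper halves.

    keep_median=True  keeps the middle value in both halves (Hawkes, per the slide).
    keep_median=False drops it from both halves (TI-84, per the slide).
    The switch only matters when the number of values is odd.
    """
    values = sorted(data)
    half = len(values) // 2
    odd = len(values) % 2 == 1
    lower = values[:half + 1] if (odd and keep_median) else values[:half]
    upper = values[half + 1:] if (odd and not keep_median) else values[half:]
    return median(lower), median(upper)


data = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(q1_q3(data, keep_median=True))   # Q1 = 30, Q3 = 70
print(q1_q3(data, keep_median=False))  # Q1 = 25, Q3 = 75
```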
Quartiles in Excel • =QUARTILE.INC(cells, 1 or 2 or 3) seems to give the same results as the old QUARTILE function • There’s new =QUARTILE.EXC(cells, 1 or 2 or 3) • Excel does fancy interpolation stuff and may give different Q1 and Q3 answers compared to the TI-84 and our by-hand methods.
The Five Number Summary • Again: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 • Q2 = median = 55, Q1 = 25 and Q3 = 85 • “The Five Number Summary” is defined as: the minimum, then Q1, Q2, Q3, then the maximum • For this set of numbers, the Five Number Summary is “0, 25, 55, 85, 110”
The Five Number Summary • Again: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 • Q2 = 55, Q1 = 25, Q3 = 85 • Min is 0, Max is 110 • For this set of numbers, the Five Number Summary is “0, 25, 55, 85, 110” • Box Plot (figure: Min = 0, Q1 = 25, Q2 = 55, Q3 = 85, Max = 110) • TI-84 can do Box Plot too, but again its quartiles disagree with the way Hawkes defines quartiles.
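The five numbers that a box plot draws, plus the interquartile range, can be computed in one pass. This sketch uses the median-of-halves quartile rule described in these slides; the function name `five_number_summary` is an assumption of the example.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves quartile rule."""
    values = sorted(data)
    half = len(values) // 2
    if len(values) % 2 == 1:
        lower, upper = values[:half + 1], values[half:]  # keep the median value
    else:
        lower, upper = values[:half], values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


data = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
low, q1, q2, q3, high = five_number_summary(data)
print(low, q1, q2, q3, high)  # 0 25.0 55.0 85.0 110
print(q3 - q1)                # 60.0, the interquartile range
```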
Why Box Plot? • Don’t lose sight of the big picture here: • We have a data set • It’s a bunch of numbers • We want to summarize the data • Summarize means make it into a sound bite • We must be Concise – don’t say too much • We must be Informative – don’t say too little
We must be Concise • Bad: “Here is a report that tells you the mean and the variance and the standard deviation and the quartiles and the percentiles from 0 to 100… and the marketing survey analyzed by demographic subgroups …” (there is a place for that, but not right now) • Good: “Got fifteen seconds? Here’s what we found.”
Notice the pieces of the boxplot: • Horizontal scale, maybe a little beyond the min and the max. A generic number line. • The five numbers. • The box holds the quartiles • With a line in the middle at the median. • The whiskers extend out to the min and the max.
TI-84 Boxplot • See instructions on separate handout. • Caution again that TI-84 computes quartiles differently from Hawkes and differently from Excel, so the results aren’t always going to agree.
Additional Topics • Might not be needed for Hawkes homework • But you should be aware of them • Quintiles and Deciles • Interquartile Range and Outliers • TI-84 Box Plot
Quintiles and Deciles • You might also encounter • Quintiles, dividing data set into 5 groups. • Deciles, dividing data set into 10 groups. • Reconcile everything back with percentiles: • Quartiles correspond to percentiles 25, 50, 75 • Deciles correspond to percentiles 10, 20, …, 90 • Quintiles correspond to percentiles 20, 40, 60, 80
Interquartile Range and Outliers • Concept: An OUTLIER is a wacky far-out abnormally small or large data value compared to the rest of the data set. • We’d like something more precise. • Define: IQR = Interquartile Range = Q3 – Q1. • Define: If \(x < \text{mean} - 1.5 \cdot IQR\), \(x\) is an Outlier. • Define: If \(x > \text{mean} + 1.5 \cdot IQR\), \(x\) is an Outlier. • (Other books might make different definitions)
Outliers Example • Here’s a quick elementary example: • Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20 • Mean \(\approx 6.8\) and \(IQR = 6\) • Or with the Hawkes method for the quartiles, we still get interquartile range = 6 (it won’t always work out the same but in this case the IQR is the same either way)
Outliers Example • Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20 • We found IQR = 6 and the mean is 6.8 • One definition uses \(1.5 \cdot IQR\) to define outliers • Here, \(1.5 \cdot 6 = 9\) • Anything more than 9 units away from the mean is then considered to be abnormally small or large. • \(6.8 - 9 = -2.2\), nothing smaller than \(-2.2\) • \(6.8 + 9 = 15.8\): the 20 is an outlier.
No-Outliers Example • Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10 • Mean \(\approx 5.9\) and \(IQR = 6\) (coincidence that the two are nearly equal, insignificant) • Anything more than 9 units away from the mean is abnormal. • This data set has No Outliers.
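Both outlier examples can be checked with a short sketch. It assumes the reference point in these slides is the mean, which is how the values 6.8 and "9 units away" appear to be used; many other books instead measure \(1.5 \cdot IQR\) below Q1 and above Q3. The IQR is passed in as given on the slides (6) rather than recomputed, since quartile methods differ, and `find_outliers` is a name invented for this example.

```python
from statistics import mean


def find_outliers(data, iqr):
    """Values more than 1.5 * IQR away from the mean.

    Assumption: distance is measured from the mean, as these slides appear
    to do. Other texts measure from Q1 and Q3 instead.
    """
    center = mean(data)
    cutoff = 1.5 * iqr
    return [x for x in data if abs(x - center) > cutoff]


print(find_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20], iqr=6))  # [20]
print(find_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10], iqr=6))  # []
```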
Outliers: Good or Bad? • “I have an outlier in my data set. Should I be concerned?” • Could be bad data. A bad measurement. Somebody not being honest with the pollster. • Could be legitimately remarkable data, genuine true data that’s extraordinarily high or low. • “What should I do about it?” • The presence of an outlier is shouting for attention. Evaluate it and make an executive decision.