Continuous Data
Median Sorted data: the minimum is at position 1 and the maximum is at position n. The median is the value in the “middle” position: position ½(1 + n). If this position falls halfway between two positions, then average the two associated data values. Median = 50th percentile.
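The position rule above can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name `median_by_position` and the sample values are made up for the example.

```python
def median_by_position(values):
    """Median as the value at position (1 + n) / 2 of the sorted data (1-based)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    twice_pos = 1 + n  # twice the middle position, kept as an integer
    if twice_pos % 2 == 0:
        # n is odd: the middle position is a whole number
        return data[twice_pos // 2 - 1]
    # n is even: the position falls halfway between two neighbours,
    # so average the two associated data values
    lower = data[twice_pos // 2 - 1]
    upper = data[twice_pos // 2]
    return (lower + upper) / 2


# Hypothetical values, not the failure-time data from the slides
odd_case = median_by_position([3, 1, 2])       # middle position 2
even_case = median_by_position([4, 1, 3, 2])   # position 2.5: average of 2 and 3
```

With n = 3 the middle position is 2; with n = 4 it is 2.5, so the second and third sorted values are averaged.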
Median – Failure Time data Failure times in hours. The median is 232.3. The 50th percentile is 232.3.
Percentile / Percentile Rank The idea is to put the data onto a 0% - 100% scale. Data scale: x. Percent scale: k. “x is the kth percentile” is equivalent to “the percentile rank of x is k.”
Interpretation “x is the kth percentile” / “the percentile rank of x is k.” This means… (Approximately*) k% of units** have variable*** less than x and (100 – k)% of units have variable greater than x. * technically required; you may omit this ** state what the units are – don’t use the word “units” *** state what the variable is – don’t use the word “variable”
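One way to see the interpretation is to count directly what share of the data lies below and above a value x. The sketch below is an illustration only; the function name `percent_below_and_above` and the GPA values are hypothetical, not data from the slides.

```python
def percent_below_and_above(values, x):
    """Return (percent of values below x, percent of values above x)."""
    n = len(values)
    below = sum(1 for v in values if v < x)
    above = sum(1 for v in values if v > x)
    return 100 * below / n, 100 * above / n


# Hypothetical GPAs for ten students (made up for this example)
gpas = [2.1, 2.5, 2.8, 3.0, 3.1, 3.2, 3.3, 3.5, 3.7, 3.9]
below, above = percent_below_and_above(gpas, 3.25)
```

Here six of the ten made-up GPAs fall below 3.25 and four fall above, so 3.25 sits at roughly the 60th percentile of this small list.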
Illustration 1 For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274. GPA x = 3.274 Percent k = 70% (=0.70) “the 70th percentile of GPAs is 3.274” “the percentile rank of 3.274 is 70” Write a sentence explaining what this means, without using the word “percentile.” Your statement must identify the units and variable. You may use the word “percent,” and you must use the numbers 3.274 and 70.
Illustration 1 For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274. 70% of graduating seniors have GPA below 3.274; the other 30% have GPA above 3.274. (Units: graduating seniors. Variable: GPA.)
Illustration 1 • For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274. • It is not correct to say… • Out of 100 graduating seniors, 70 have GPA below 3.274; the other 30 have GPA above 3.274. • There aren’t exactly 100 graduating seniors • If you chose 100, you would be unlikely to get a 70/30 split.
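How unlikely an exact 70/30 split is can be estimated with a binomial calculation. This sketch assumes each of 100 sampled seniors independently has a 70% chance of a GPA below 3.274 (reasonable when sampling from a large population); the function name `prob_exact_split` is made up for the example.

```python
from math import comb


def prob_exact_split(n, k, p):
    """Binomial probability that exactly k of n sampled units fall below the cutoff."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Chance that a sample of 100 seniors splits exactly 70 below / 30 above
p_70_30 = prob_exact_split(100, 70, 0.70)
```

Under these assumptions the exact split happens in fewer than one sample in ten, even though 70 is the single most likely count.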
Illustration 1 For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274. It is not correct to say… Out of 100 graduating seniors, 70 have GPA below 3.274; the other 30 have GPA above 3.274. This statement is only true on average, assuming you averaged over all possible samples of 100 seniors. Expressing this is more difficult and confusing, so just say it the correct way: 70% of graduating seniors have GPA below 3.274; the other 30% have GPA above 3.274.
Illustration 1 70% of graduating seniors have GPA below 3.274; the other 30% have GPA above 3.274. Do not worry about seniors with GPA exactly 3.274. This figure is likely rounded. Very few (much less than 1% of) people will have exactly this GPA.
Percentiles Suitable for data where there are few to no ties: continuous data.
Illustration 2 In discussing investment opportunities, a financial advisor speaks about a company’s “price to earnings” ratio (PE) – the price of a share of stock divided by the amount of profit the company makes annually per share (i.e., how much it costs to purchase $1 of annual profit). “For the ECC Company, its PE of 7.3 is at the 15th percentile among companies in the industrial sector.” Write a sentence explaining what this means, without using the word “percentile.” Your statement must identify the units and variable. You may use the word “percent,” and you must use the numbers 7.3 and 15.
Illustration 2 “For the Edmundsen company, the PE of 7.3 is at the 15th percentile among companies in the industrial sector.” 15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3. (Units: companies in the industrial sector. Variable: PE.)
Illustration 2 • 15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3. • It is not correct to say… • Out of 100 industrial companies, 15 have PE below 7.3; the other 85 have PE above 7.3. • There aren’t exactly 100 industrial companies • If you chose 100, you would be unlikely to get a 15/85 split.
Illustration 2 15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3. It is not correct to say… Out of 100 industrial companies, 15 have PE below 7.3; the other 85 have PE above 7.3. This statement is only true on average, assuming you averaged over all possible samples of 100 companies. Expressing this is more difficult and confusing, so just say it the correct way.
Illustration 2 15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3. Do not worry about companies with PE exactly 7.3. Even ECC’s PE is not exactly 7.3; it is rounded to that figure.
Percentiles & Percentile Ranks in Excel • Data in sorted (low to high) array • Value on data scale: x • Value on % scale “Percentile Rank”: k (%) • =PERCENTRANK(array, x, 9) • (the 9 ensures accuracy) • =PERCENTILE(array, k/100)
Sorted failure time data in cells B2 through B29 (n = 28). • Determine the percentile rank for a failure time of 216.6 hours. • =PERCENTRANK(B2:B29, 216.6, 9) • 0.3704 = 37.04% • “216.6 is the 37.04th percentile.” • “The percentile rank of 216.6 is 37.04.”
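For readers working outside Excel, the percentile-rank calculation can be approximated in Python. This is a sketch under stated assumptions: it follows the inclusive convention as I understand it (count of values below x divided by n – 1, with linear interpolation between neighbours when x is not in the data), it assumes no ties, and it ignores the significance argument. The function name `percent_rank_inc` and the stand-in data are hypothetical; the actual failure times are not reproduced here.

```python
def percent_rank_inc(sorted_values, x):
    """Inclusive percent rank of x in sorted data (0 to 1), assuming no ties."""
    data = sorted_values
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    if x < data[0] or x > data[-1]:
        raise ValueError("x lies outside the range of the data")
    for i, v in enumerate(data):
        if v == x:
            # i values lie below x; scale by n - 1 so min -> 0 and max -> 1
            return i / (n - 1)
        if v > x:
            # x falls between data[i - 1] and data[i]: interpolate linearly
            lo, hi = data[i - 1], data[i]
            frac = (x - lo) / (hi - lo)
            return (i - 1 + frac) / (n - 1)


# Stand-in for 28 sorted values (hypothetical, not the real failure times)
stand_in = [float(v) for v in range(1, 29)]
rank = percent_rank_inc(stand_in, 11.0)  # 10 values lie below 11
```

With 10 of 28 values below x the result is 10/27, which rounds to 0.3704. That matches the figure on the slide and is consistent with 10 failure times lying below 216.6 hours, although the slide does not state this count.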
Rounding of Percents For 10% - 90%, rounding to the nearest 1% is generally fine. For 1% - 10% and 90% - 99%, to the nearest 0.1% is fine. For 0.1% - 1.0% and 99.0% - 99.9%, to the nearest 0.01% is fine. It’s OK to give more precision than is called for. You can run into trouble working with less precision than specified here.
Rounding of Percents Consider two treatments for your condition. With Treatment A the chance of dying is 0.51%. With Treatment B the chance is 1.49%. Rounded to the nearest 1%, both are 1%. Out of 10,000 people getting treatment A, on average 51 die. Out of 10,000 people getting treatment B, on average 149 die. Almost 3 times as many.
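The rounding guideline and the treatment example can be checked with a few lines of Python. This is an illustrative sketch: the function name `round_percent` is made up, and the three-decimal fallback for percents below 0.1% or above 99.9% is an assumption, since the slides give no rule for that range.

```python
def round_percent(pct):
    """Round a percent using precision tiers based on distance from 0% or 100%."""
    distance = min(pct, 100 - pct)  # how close the percent is to either end
    if distance >= 10:
        decimals = 0   # 10% - 90%: nearest 1%
    elif distance >= 1:
        decimals = 1   # 1% - 10% and 90% - 99%: nearest 0.1%
    elif distance >= 0.1:
        decimals = 2   # 0.1% - 1% and 99% - 99.9%: nearest 0.01%
    else:
        decimals = 3   # assumption: the slides do not cover this range
    return round(pct, decimals)


treatment_a = 0.51  # chance of dying, in percent
treatment_b = 1.49

# Rounded to the nearest 1%, both treatments look the same
crude_a, crude_b = round(treatment_a), round(treatment_b)

# Rounded by the tiered guideline, the difference survives
tiered_a, tiered_b = round_percent(treatment_a), round_percent(treatment_b)

# Expected deaths per 10,000 people
deaths_a = round(10_000 * treatment_a / 100)
deaths_b = round(10_000 * treatment_b / 100)
ratio = deaths_b / deaths_a
```

The crude rounding gives 1% for both treatments, while the tiered rounding keeps 0.51% and 1.5% apart, in line with the 51 versus 149 expected deaths.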
Sorted failure time data in cells B2 through B29 (n = 28). • Determine the 75th percentile. • 75% has to be “converted” to 0.75 for use in PERCENTILE • =PERCENTILE(B2:B29, 0.75) • 254.2 • “254.2 is the 75th percentile.” • “The percentile rank of 254.2 is 75.”
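The reverse lookup, from a percent to a data value, can be sketched the same way. The code below assumes the inclusive convention that PERCENTILE is generally documented to use: the 1-based position is 1 + p(n – 1), with linear interpolation between neighbouring values. The function name `percentile_inc` and the sample data are hypothetical.

```python
def percentile_inc(sorted_values, p):
    """Value at fraction p (0 to 1) of sorted data, with linear interpolation."""
    data = sorted_values
    n = len(data)
    if n == 0:
        raise ValueError("no data")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1 (use 0.75, not 75)")
    pos = p * (n - 1)    # 0-based fractional position
    lower = int(pos)     # index of the value at or just below the position
    frac = pos - lower   # how far the position sits toward the next value
    if lower + 1 >= n:
        return data[-1]
    return data[lower] + frac * (data[lower + 1] - data[lower])


# Hypothetical sorted data, not the failure times from the slides
sample = [10, 20, 30, 40, 50]
q75 = percentile_inc(sample, 0.75)   # position 4 of 5
q50 = percentile_inc(sample, 0.50)   # position 3 of 5: the median
```

With p = 0.5 the position becomes 1 + ½(n – 1) = ½(1 + n), which agrees with the median position given earlier in the deck.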
# of Cars Owned Suppose we surveyed 100 families. Most would say 1 or 2, some 3, a few 4, and a few 0. The data are highly discrete.
# of Cars Owned (sorted) 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 The 38th percentile is 2. The 82nd percentile is 2. Percentiles don’t make much sense for discrete data (and make no sense for categorical data).
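The problem with discrete data can be reproduced in a short sketch. It assumes the sorted list above holds 5 zeros, 32 ones, 45 twos, 17 threes and one 4 (counts chosen to total 100 and to match the two percentiles stated on the slide), and it uses a simple nearest-rank rule, where the kth percentile is the value at position ⌈k·n/100⌉. The function name `nearest_rank_percentile` is made up for the example.

```python
# Assumed counts of families by number of cars owned (total 100)
counts = {0: 5, 1: 32, 2: 45, 3: 17, 4: 1}
cars = [value for value, count in sorted(counts.items()) for _ in range(count)]


def nearest_rank_percentile(sorted_values, k):
    """kth percentile (k a whole number of percent) by the nearest-rank rule."""
    n = len(sorted_values)
    rank = max(1, -(-k * n // 100))  # ceiling of k * n / 100 using integers
    return sorted_values[rank - 1]


p38 = nearest_rank_percentile(cars, 38)
p82 = nearest_rank_percentile(cars, 82)

# What the usual interpretation would claim about the value 2
pct_below_2 = 100 * sum(v < 2 for v in cars) / len(cars)
pct_above_2 = 100 * sum(v > 2 for v in cars) / len(cars)
```

Both percentiles come out as 2, yet under these assumed counts only 37% of families own fewer than 2 cars and 18% own more, so neither “38% below” nor “82% below” describes the value 2 well.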