Last lecture summary • Mode • Distribution
Minimum: minimum = 47.8 (Sierra Leone)
Maximum: maximum = 84.3 (Japan)
Life expectancy data all countries
Life expectancy data: median = 73.2 (Egypt, position 99 of 197); half of the values are smaller, half are larger
Life expectancy data: Maximum = 83.4, Median = 73.2, Minimum = 47.8
Q1, 1st quartile = 64.7: Sao Tomé & Príncipe, position 50 of 197 (¼ of the way)
Q1, 1st quartile = 64.7: ¼ of the values are smaller, ¾ are larger
Q3, 3rd quartile = 76.7: Netherlands Antilles, position 148 of 197 (¾ of the way)
Q3, 3rd quartile = 76.7: ¾ of the values are smaller, ¼ are larger
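Where positions 50, 99 and 148 come from: with n = 197 sorted values, the convention that places the p-quantile at position 1 + (n - 1)p reproduces all three exactly. This is one common convention, assumed here because it matches the slides; when the position is not a whole number, it interpolates linearly between the two neighbouring values, and other textbooks use slightly different rules.

```latex
\begin{aligned}
\text{position}(p) &= 1 + (n-1)\,p, \qquad n = 197\\
Q_1:\quad 1 + 196 \cdot 0.25 &= 50\\
\text{median}:\quad 1 + 196 \cdot 0.50 &= 99\\
Q_3:\quad 1 + 196 \cdot 0.75 &= 148
\end{aligned}
```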
Life expectancy data: Maximum = 83.4, 3rd quartile = 76.7, Median = 73.2, 1st quartile = 64.7, Minimum = 47.8
Box plot [figure: box plot labelled maximum, 3rd quartile, median, 1st quartile, minimum]
Quartiles, median – how to do it? Find min, max, median, Q1, Q3 in these data. Then, draw the box plot. 79, 68, 88, 69, 90, 74, 87, 93, 76
Another example. Data: 78, 93, 68, 84, 90, 74. Min. = 68.00, 1st Qu. = 75.00, Median = 81.00, 3rd Qu. = 88.50, Max. = 93.00
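A minimal sketch of the five-number summary for both data sets, assuming the linear-interpolation quartile convention (Python's `statistics.quantiles` with `method="inclusive"`), because that convention reproduces the "Another example" numbers exactly. The helper name `five_number_summary` is hypothetical. Other hand methods can give different quartiles for the exercise: excluding the median from both halves gives Q1 = 71.5 and Q3 = 89.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles use linear interpolation between sorted values
    (statistics.quantiles with method="inclusive"); this is an
    assumed convention, chosen because it matches the slide.
    """
    data = sorted(values)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]


exercise = [79, 68, 88, 69, 90, 74, 87, 93, 76]
another = [78, 93, 68, 84, 90, 74]

print(five_number_summary(exercise))  # (68, 74.0, 79.0, 88.0, 93)
print(five_number_summary(another))   # (68, 75.0, 81.0, 88.5, 93)
```

The box plot for the exercise is then drawn from these five numbers: a box from 74 to 88 with a line at 79, and whiskers to 68 and 93.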
Percentiles [figure: growth percentile chart; x-axis: age (years)] http://www.rustovyhormon.cz/on-line-rustove-grafy
Skeleton data • Estimate age at death from skeletal remains • Common problem in forensic anthropology • Based on wear and deterioration of certain bones • Measurements on 400 skeletons • Two estimation methods • Di Gangi et al., aspects of the first rib • Suchey-Brooks (the most common), pubic bone
Modified boxplot: Min. = -60.00, Q1 = -23.00, Median = -13.00, Q3 = -5.00, Max. = 32.00
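A modified boxplot usually marks points beyond the fences Q1 - 1.5·IQR and Q3 + 1.5·IQR individually. The slide does not state its rule, so the 1.5 multiplier is an assumption here. A small sketch with the summary values above (the helper `tukey_fences` is hypothetical) shows that both the minimum and the maximum lie outside the fences:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 assumed)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Summary values copied from the modified-boxplot slide.
minimum, q1, median, q3, maximum = -60.0, -23.0, -13.0, -5.0, 32.0

lower, upper = tukey_fences(q1, q3)      # IQR = 18, 1.5 * IQR = 27
print(lower, upper)                      # -50.0 22.0
print(minimum < lower, maximum > upper)  # True True
```

From the summary alone we cannot tell where the whiskers end; they stop at the most extreme data values that still lie inside the fences.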
Mean • Another measure of the center of the data: mean (average) • Data values: $x_1, x_2, \ldots, x_n$ • Mathematical notation: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ • Σ is the Greek letter capital sigma; it means SUM in mathematics
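The capital-sigma sum is just a running total. This sketch writes the mean formula x̄ = (1/n) Σ x_i as an explicit loop; the data reuse the quartile exercise, and the function name `mean` is illustrative.

```python
def mean(values):
    """Arithmetic mean: (1/n) * (x_1 + x_2 + ... + x_n)."""
    total = 0
    for x in values:  # the capital-sigma sum, written as a loop
        total += x
    return total / len(values)


data = [79, 68, 88, 69, 90, 74, 87, 93, 76]
print(mean(data))  # 724 / 9, about 80.44
```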
Robust statistic: Median = -13, Mean = -14.2. The mean is not a robust statistic: a single extreme value can pull it far away. The median is a robust statistic.
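To see robustness in action, this sketch corrupts one value of the quartile exercise. The error (93 typed as 930) is hypothetical and is not from the lecture. The median does not move, while the mean roughly doubles.

```python
import statistics

original = [79, 68, 88, 69, 90, 74, 87, 93, 76]
# Hypothetical data-entry error: 93 typed as 930.
corrupted = [79, 68, 88, 69, 90, 74, 87, 930, 76]

for name, data in [("original", original), ("corrupted", corrupted)]:
    print(name, statistics.median(data), round(statistics.mean(data), 2))
# original 79 80.44
# corrupted 79 173.44
```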
Trimmed mean: Median = -13, Mean = -14.2. 10% trimmed mean: remove the lowest 10% and the highest 10% of the data (40 points at each end of the 400). 10% trimmed mean = mean of the 320 middle data values = -13.8. The trimmed mean is more robust than the mean.
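A sketch of a trimmed mean, assuming the number of points cut from each end is rounded down (floor of proportion × n). This gives 40 points per end for the 400 skeletons and 2 per end for an 8% trim of 25 values. The function `trimmed_mean` is hypothetical. SciPy provides `scipy.stats.trim_mean(a, proportiontocut)` for real use. The example data are the IQR quiz values from a later slide.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping floor(proportion * n) values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(proportion * len(data))  # points cut from EACH end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


quiz = [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 90]
print(sum(quiz) / len(quiz))     # plain mean, about 8.62
print(trimmed_mean(quiz, 0.10))  # 2.0 (drops the 0 and the 90)
```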
Salaries of 25 players of the New York Red Bulls (US soccer team) in 2012: median = 112,495; mean = 518,311; 8% trimmed mean = 128,109
QUESTION [figure labels: Mean1, Mean2, Mode1, Mode2, Median1, Median2]
Range (Czech: variační rozpětí) = max - min
Range: does the range change when we add new data to the dataset? • Always • Sometimes • Never
Cut off data: IQR (interquartile range; Czech: mezikvartilové rozpětí)
Interquartile range, IQR. Let's take this quiz; answer yes or no. • About 50% of the data fall within the IQR. • The IQR is affected by every value in the data set. • The IQR is not affected by outliers. • The mean is always between Q1 and Q3. Data: 0 1 1 1 2 2 2 2 2 3 3 3 90; Q1 = 1, Q2, Q3 = 3
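A quick check of the quiz data, assuming the same linear-interpolation quartiles as before (they reproduce Q1 = 1 and Q3 = 3). Replacing the outlier 90 with 4 is a hypothetical change for comparison: the IQR stays the same while the range collapses. The mean of the original data falls outside [Q1, Q3].

```python
import statistics


def quartiles(values):
    """Q1 and Q3 by linear interpolation (method="inclusive", assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


data = [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 90]
q1, q3 = quartiles(data)
print("IQR:", q3 - q1, "range:", max(data) - min(data))     # IQR: 2.0 range: 90

# Hypothetical: replace the outlier 90 by 4.
tamed = data[:-1] + [4]
t1, t3 = quartiles(tamed)
print("IQR:", t3 - t1, "range:", max(tamed) - min(tamed))   # IQR: 2.0 range: 4

m = statistics.mean(data)
print(round(m, 2), q1 <= m <= q3)                           # 8.62 False
```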
Define an outlier, or: which values are outliers in this data set? $60,000, $80,000, $100,000, $200,000
Problem with IQR [figure: normal, bimodal and uniform distributions]
Options for measuring variability • Find the average distance between all pairs of data values. • Find the average distance between each data value and either the max or the min. • Find the average distance between each data value and the mean.
Average distance from mean Find the average distance between each data value and the mean.
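A sketch of why the plain average of deviations fails: the signed deviations x_i - x̄ always sum to zero. Exact fractions are used so that floating-point rounding does not hide the exact zero; the data are the quartile exercise values.

```python
from fractions import Fraction

data = [79, 68, 88, 69, 90, 74, 87, 93, 76]
mean = Fraction(sum(data), len(data))  # exactly 724/9

deviations = [x - mean for x in data]  # signed distances x_i - mean
print(sum(deviations))                 # 0
print(sum(deviations) / len(data))     # 0: the average deviation tells us nothing
```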
Preventing cancellation • How can we prevent the negative and positive deviations from cancelling each other out? • Ignore (i.e. delete) the negative sign. • Multiply each deviation by two. • Square each deviation. • Take the absolute value of each deviation.
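A sketch of the two options that work. Taking the absolute value, which is the same as deleting the negative sign, gives the mean absolute deviation. Squaring gives the mean squared deviation. Multiplying by two does not help, because twice zero is still zero. Both helper names are hypothetical. The data [1, 2, 3, 4, 5] are illustrative. The squared version here divides by n; the sample variance divides by n - 1.

```python
def mean_absolute_deviation(values):
    """Average of |x_i - mean|."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


def mean_squared_deviation(values):
    """Average of (x_i - mean)**2, dividing by n (not n - 1)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)


data = [1, 2, 3, 4, 5]
print(mean_absolute_deviation(data))  # 1.2
print(mean_squared_deviation(data))   # 2.0
```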