
measures of centrality

lalo

Presentation Transcript


  1. measures of centrality

  2. Last lecture summary • Mode • Distribution

  3. Life expectancy data

  4. Minimum: minimum = 47.8 (Sierra Leone)

  5. Maximum: maximum = 84.3 (Japan)

  6. Life expectancy data: all countries

  7. Life expectancy data: median = 73.2 (Egypt), the 99th of the 197 countries sorted from lowest to highest; half of the countries have smaller values, half larger

  8. Life expectancy data: maximum = 83.4, median = 73.2, minimum = 47.8

  9. Q1: 1st quartile = 64.7 (Sao Tomé & Príncipe), the 50th of the 197 countries (¼ of the way up)

  10. Q1: 1st quartile = 64.7; ¼ of the values are smaller, ¾ larger

  11. Q3: 3rd quartile = 76.7 (Netherlands Antilles), the 148th of the 197 countries (¾ of the way up)

  12. Q3: 3rd quartile = 76.7; ¾ of the values are smaller, ¼ larger

  13. Life expectancy data: maximum = 83.4, 3rd quartile = 76.7, median = 73.2, 1st quartile = 64.7, minimum = 47.8
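The positions quoted on these slides (the median as the 99th of 197 countries, Q1 as the 50th, Q3 as the 148th) are consistent with the linear-interpolation convention in which the p-quantile sits at rank 1 + p(n − 1) of the sorted data. A minimal Python sketch, assuming that convention; `quantile_rank` is a hypothetical helper name:

```python
def quantile_rank(n: int, p: float) -> float:
    """1-based rank of the p-quantile among n sorted values,
    using the linear-interpolation rule rank = 1 + p * (n - 1)."""
    return 1 + p * (n - 1)

n_countries = 197
for label, p in [("Q1", 0.25), ("median", 0.50), ("Q3", 0.75)]:
    print(label, quantile_rank(n_countries, p))
# prints: Q1 50.0 / median 99.0 / Q3 148.0
```

When the rank is not a whole number, the quantile is taken part of the way between the two neighbouring sorted values.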

  14. Box Plot

  15. Box plot (labels from top to bottom): maximum, 3rd quartile, median, 1st quartile, minimum

  16. Quartiles and median: how to do it? Find the min, max, median, Q1 and Q3 of these data, then draw the box plot: 79, 68, 88, 69, 90, 74, 87, 93, 76
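One common hand-calculation convention (often called the "median of halves" method; textbooks and software differ here) is: sort the data, find the median, then take Q1 and Q3 as the medians of the lower and upper halves, leaving the overall median out of both halves when n is odd. A minimal sketch of that convention applied to the nine values above; `hand_five_number_summary` is a hypothetical name:

```python
from statistics import median

def hand_five_number_summary(data):
    """Min, Q1, median, Q3, max using the 'median of halves' rule.
    When n is odd, the overall median is excluded from both halves."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]              # values below the median position
    upper = s[half + n % 2:]      # values above the median position
    return s[0], median(lower), median(s), median(upper), s[-1]

values = [79, 68, 88, 69, 90, 74, 87, 93, 76]
print(hand_five_number_summary(values))
# (68, 71.5, 79, 89.0, 93)
```

Statistical software may report slightly different Q1 and Q3 for the same data; the box plot is drawn from whichever five numbers the chosen convention gives.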

  17. Another example: 78, 93, 68, 84, 90, 74
      Min.    1st Qu.   Median   3rd Qu.   Max.
      68.00   75.00     81.00    88.50     93.00
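The column labels on this slide (Min., 1st Qu., Median, 3rd Qu., Max.) match the output of R's `summary()`, which by default interpolates linearly between order statistics. Python's `statistics.quantiles` with `method="inclusive"` uses the same interpolation and reproduces these numbers. A sketch assuming Python 3.8+; `five_number_summary` is a hypothetical helper name:

```python
from statistics import quantiles

def five_number_summary(data):
    """Min, Q1, median, Q3, max with linear interpolation between
    order statistics (statistics.quantiles, method='inclusive')."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

print(five_number_summary([78, 93, 68, 84, 90, 74]))
# (68, 75.0, 81.0, 88.5, 93)
```

Applied to the nine values of the "how to do it?" exercise, the same function gives Q1 = 74 and Q3 = 88, which differ from the "median of halves" hand rule (71.5 and 89).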

  18. Percentiles (growth chart; age [years]) http://www.rustovyhormon.cz/on-line-rustove-grafy
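Percentiles generalize the quartiles: the p-th percentile has roughly p % of the values below it, so Q1, the median and Q3 are the 25th, 50th and 75th percentiles. A minimal sketch, assuming linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`) and using the six values from the "Another example" slide; `percentile` is a hypothetical helper:

```python
from statistics import quantiles

def percentile(data, p):
    """p-th percentile (integer p from 1 to 99), interpolating
    linearly between order statistics."""
    cut_points = quantiles(data, n=100, method="inclusive")  # 99 cut points
    return cut_points[p - 1]

scores = [78, 93, 68, 84, 90, 74]
print(percentile(scores, 25), percentile(scores, 50), percentile(scores, 75))
# 75.0 81.0 88.5
```

On a growth chart the idea is the same: a child on the 90th percentile curve for height is taller than about 90 % of children of the same age.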

  19. Skeleton data • Estimate age at death from skeletal remains • A common problem in forensic anthropology • Based on wear and deterioration of certain bones • Measurements on 400 skeletons • Two estimation methods • Di Gangi et al.: based on aspects of the first rib • Suchey-Brooks: the most common, based on the pubic bone (image: http://www.bestcoloringpagesforkids.com/wp-content/uploads/2013/07/Skeleton-Coloring-Page.gif)

  20. 400 skeletons: estimated vs. actual age at death

  21. DiGangi

  22. Modified boxplot
      Min.     Q1       Median   Q3      Max.
      -60.00   -23.00   -13.00   -5.00   32.00
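A modified boxplot is usually drawn with Tukey's 1.5 × IQR rule: the whiskers stop at the most extreme values inside the fences Q1 − 1.5·IQR and Q3 + 1.5·IQR, and points beyond the fences are plotted individually as outliers. Assuming that rule (the slide does not state the multiplier), a sketch using the summary above; `tukey_fences` is a hypothetical name:

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

q1, q3 = -23.0, -5.0             # from the modified boxplot summary
low, high = tukey_fences(q1, q3)
print(low, high)                 # -50.0 22.0
print(-60.0 < low, 32.0 > high)  # True True: both extremes are outliers
```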

  23. Mean • Another measure of the center of the data: the mean (average) • Data values: $x_1, x_2, \ldots, x_n$ • Mathematical notation: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $\Sigma$ (Greek capital letter sigma) means SUM in mathematics
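The Σ notation translates directly into a loop: add up the n data values, then divide by n. A minimal sketch on the nine values from the quartile exercise, compared with the standard library's `statistics.mean`; `mean_by_sum` is a hypothetical name:

```python
from statistics import mean

def mean_by_sum(xs):
    """Arithmetic mean written as (1/n) * (sum of x_i)."""
    total = 0
    for x in xs:          # the capital-sigma sum
        total += x
    return total / len(xs)

values = [79, 68, 88, 69, 90, 74, 87, 93, 76]
print(mean_by_sum(values))   # 724 / 9, about 80.44
print(mean(values))          # same value from the standard library
```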

  24. Robust statistic: median = -13, mean = -14.2. The median is a robust statistic (little affected by extreme values); the mean is not a robust statistic.
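The difference in robustness shows up clearly on the small data set from the IQR quiz (0 1 1 1 2 2 2 2 2 3 3 3 90): a single extreme value moves the mean a lot and the median not at all. A sketch using Python's `statistics` module:

```python
from statistics import mean, median

without_outlier = [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
with_outlier = without_outlier + [90]

# Median stays at 2; mean jumps from about 1.83 to about 8.62.
print(median(without_outlier), median(with_outlier))
print(round(mean(without_outlier), 2), round(mean(with_outlier), 2))
```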

  25. Trimmed mean: median = -13, mean = -14.2. The 10% trimmed mean eliminates the upper and lower 10% of the data (i.e. 40 points from each end) and averages the remaining 320 middle values: 10% trimmed mean = -13.8. The trimmed mean is more robust than the mean.
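A trimmed mean sorts the data, discards the same number of points from each end and averages the rest. A sketch, assuming the count cut from each end is rounded down (so 10% of 400 removes 40 from each end, leaving 320); `trimmed_mean` is a hypothetical name. The skeleton data are not reproduced on the slides, so the example uses the small data set from the IQR quiz:

```python
from statistics import mean

def trimmed_mean(data, proportion):
    """Mean after removing floor(proportion * n) values from each end.
    Assumes 0 <= proportion < 0.5 so that at least one value is kept."""
    s = sorted(data)
    k = int(proportion * len(s))      # points cut from EACH end
    kept = s[k:len(s) - k]
    return mean(kept)

quiz = [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 90]
print(round(mean(quiz), 2))         # 8.62
print(trimmed_mean(quiz, 0.10))     # drops 0 and 90; mean of the rest is 2
```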

  26. Salaries of the 25 players of the New York Red Bulls (a Major League Soccer club) in 2012: median = 112,495; mean = 518,311; 8% trimmed mean (2 players cut from each end) = 128,109

  27. measures of variability

  28. Setting the scene

  29. QUESTION (two distributions): Mean1 vs. Mean2, Mode1 vs. Mode2, Median1 vs. Median2

  30. Range: range = max - min

  31. Range: does the range change when we add new data to the dataset? • Always • Sometimes • Never
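Because range = max − min depends only on the two extreme values, adding a new value changes it only when that value falls outside the current [min, max] interval. A minimal sketch using the salaries from the outlier slide plus two added values chosen for illustration; `data_range` is a hypothetical name:

```python
def data_range(xs):
    """Range of the data: max - min."""
    return max(xs) - min(xs)

salaries = [60_000, 80_000, 100_000, 200_000]
print(data_range(salaries))                  # 140000
print(data_range(salaries + [70_000]))       # 140000: new value inside [min, max]
print(data_range(salaries + [1_000_000]))    # 940000: new value above the max
```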

  32. Adding Mark Zuckerberg

  33. Cutting off data: IQR (interquartile range)

  34. Interquartile range (IQR). Let's take this quiz; answer yes or no. • About 50% of the data fall within the IQR. • The IQR is affected by every value in the data set. • The IQR is not affected by outliers. • The mean is always between Q1 and Q3. Data: 0 1 1 1 2 2 2 2 2 3 3 3 90; Q1 = 1, Q2, Q3 = 3

  35. Define an outlier, OR: which values are outliers for this data set? $60,000 $80,000 $100,000 $200,000
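The quiz statements can be checked numerically on the slide's data. The sketch assumes the linear-interpolation quartile rule of `statistics.quantiles(..., method="inclusive")`, which gives Q1 = 1 and Q3 = 3 as on the slide; it then replaces the outlier 90 with a far larger value to see what happens to the IQR. `iqr` is a hypothetical helper name:

```python
from statistics import mean, quantiles

def iqr(data):
    """Interquartile range Q3 - Q1 (linear interpolation between order statistics)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

data = [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 90]
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
print(q1, q2, q3, iqr(data))     # 1.0 2.0 3.0 2.0
print(q1 <= mean(data) <= q3)    # False: the mean (about 8.6) lies above Q3

bigger_outlier = data[:-1] + [9000]
print(iqr(bigger_outlier))       # 2.0: the IQR ignores how extreme the outlier is
```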

  36. Problem with the IQR: normal, bimodal and uniform distributions

  37. Options for measuring variability • Find the average distance between all pairs of data values. • Find the average distance between each data value and either the max or the min. • Find the average distance between each data value and the mean.

  38. Average distance from mean

  39. Average distance from mean

  40. Average distance from mean Find the average distance between each data value and the mean.
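Simply averaging the raw deviations x_i − x̄ does not work, because the positive and negative deviations always cancel: their sum is exactly zero for any data set. Exact fractions make this visible without rounding noise; a sketch on the six values from the "Another example" slide:

```python
from fractions import Fraction

values = [78, 93, 68, 84, 90, 74]
x_bar = Fraction(sum(values), len(values))      # 487/6, kept exact
deviations = [x - x_bar for x in values]

print(deviations)       # [Fraction(-19, 6), Fraction(71, 6), ...]
print(sum(deviations))  # 0
```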

  41. Preventing cancellation • How can we prevent the negative and positive deviations from cancelling each other out? • Ignore (i.e. delete) the negative sign. • Multiply each deviation by two. • Square each deviation. • Take the absolute value of each deviation.
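Of the listed options, taking absolute values (equivalently, ignoring the signs) and squaring both prevent cancellation; multiplying by two does not, since twice zero is still zero. The two workable options lead to the mean absolute deviation and the mean squared deviation (the population variance). A sketch on the six values from the "Another example" slide, using exact fractions and compared with `statistics.pvariance`; the helper names are hypothetical:

```python
from fractions import Fraction
from statistics import pvariance

def mean_abs_deviation(xs):
    """Average of |x_i - mean|."""
    m = Fraction(sum(xs), len(xs))
    return sum(abs(x - m) for x in xs) / len(xs)

def mean_squared_deviation(xs):
    """Average of (x_i - mean)**2, i.e. the population variance."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)

values = [78, 93, 68, 84, 90, 74]
print(mean_abs_deviation(values))       # 47/6, about 7.83
print(mean_squared_deviation(values))   # 2765/36, about 76.81
print(float(mean_squared_deviation(values)), pvariance(values))
```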
