measures of centrality

Presentation Transcript


  1. measures of centrality

  2. Last lecture summary • Which graphs did we meet? • scatter plot (bodový graf) • bar chart (sloupcový graf) • histogram • pie chart (koláčový graf) • How do they work, what are their advantages and/or disadvantages?

  3. SDA women – histogram of heights 2014 n = 48 or N = 48 bin size = 3.8

  4. Distributions • roughly symmetric, e.g., body height • negatively skewed (skewed to the left), e.g., life expectancy • positively skewed (skewed to the right), e.g., income http://turnthewheel.org/free-textbooks/street-smart-stats/

  5. statistics is beautiful • new stuff

  6. Life expectancy data • Watch TED talk by Hans Rosling, Gapminder Foundation: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

  7. statistics is deep

  8. Simpson’s paradox • UC Berkeley • Though the data are fake, the paradox is the same www.udacity.com – Introduction to statistics

  9. Male www.udacity.com – Introduction to statistics

  10. Male www.udacity.com – Introduction to statistics

  11. Female www.udacity.com – Introduction to statistics

  12. Female www.udacity.com – Introduction to statistics

  13. Gender bias What do you think, is there a gender bias? Who do you think is favored? Male or female? www.udacity.com – Introduction to statistics

  14. Gender bias male female www.udacity.com – Introduction to statistics

  15. Gender bias male female www.udacity.com – Introduction to statistics
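
The tables from these slides did not survive in the transcript, so the sketch below uses hypothetical admission counts (an assumption, not the slide data) to show how Simpson's paradox can arise: each major admits women at a higher rate, yet the pooled rate is higher for men because the two groups apply to the majors in very different proportions.

```python
# Hypothetical counts, NOT taken from the slides: (applied, admitted) per major.
admissions = {
    "male": {"major A": (900, 450), "major B": (100, 10)},
    "female": {"major A": (100, 80), "major B": (900, 180)},
}


def rate(applied, admitted):
    """Admission rate for a single (applied, admitted) pair."""
    return admitted / applied


def overall_rate(by_major):
    """Admission rate after pooling all majors of one group."""
    applied = sum(a for a, _ in by_major.values())
    admitted = sum(ad for _, ad in by_major.values())
    return admitted / applied


for gender, by_major in admissions.items():
    for major, (applied, admitted) in by_major.items():
        print(f"{gender:6s} {major}: {rate(applied, admitted):.0%}")
    print(f"{gender:6s} overall: {overall_rate(by_major):.0%}")
```

Which comparison is the relevant one depends on the question being asked; the sketch only shows that the per-major view and the pooled view can point in opposite directions.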

  16. Statistics is ambiguous • This example illustrates how ambiguous statistics can be. • How you choose to graph your data can strongly influence what people believe to be the case. “I never believe in statistics I didn’t doctor myself.” “Nikdy nevěřím statistice, kterou si sám nezfalšuji.” Who said that? Winston Churchill www.udacity.com – Introduction to statistics

  17. What is statistics? • Statistics – the science of collecting, organizing, summarizing, analyzing and interpreting data • Goal – use imperfect information (our data) to infer facts, make predictions, and make decisions • Descriptive statistics – describing and summarizing data with numbers or pictures • Inferential statistics – making conclusions or decisions based on data

  18. Variables • variable – a value or characteristic that can vary from individual to individual • example: favorite color, age • How are variables classified? • quantitative variable – numerical values, often with units of measurement, arising from a how much/how many question; example: age, annual income, number of children • continuous (spojitá proměnná), example: height, weight • discrete (diskrétní proměnná), example: number of children • continuous variables can be discretized

  19. Variables • categorical (qualitative) variables • categories that have no particular order • example: favorite color, gender, nationality • ordinal • they are not numerical but their values have a natural order • example: temperature low/medium/high

  20. Variables variable (proměnná) quantitative (kvantitativní) categorical (kategorická) ordinal (ordinální) continuous (spojitá) discrete (diskrétní)

  21. Choosing a profession • Chemistry: 50 000 – 60 000 • Geography: 40 000 – 55 000 www.udacity.com – Statistics

  22. Choosing a profession • We made an interval estimate. • But ideally we want one number that describes the entire dataset. This allows us to quickly summarize all our data. www.udacity.com – Statistics

  23. Choosing a profession • The value at which frequency is highest. • The value where frequency is lowest. • Value in the middle. • Biggest value of x-axis. • Mean Geography Chemistry www.udacity.com – Statistics

  24. Three big M’s • The value at which frequency is highest is called the mode, i.e., the mode is the most common value. • The value in the middle of the distribution is called the median. • The mean is the average (the two words are synonyms). Geography Chemistry www.udacity.com – Statistics

  25. Quick quiz • What is the mode in our data? 2 5 6 5 2 6 9 8 5 2 3 5 www.udacity.com – Statistics
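
One way to check the quiz is to count how often each value occurs and keep the value(s) with the highest count. The sketch below does this with the standard library; returning a list is a design choice made here so that multimodal data would also be handled.

```python
from collections import Counter

# Data from the quiz slide.
data = [2, 5, 6, 5, 2, 6, 9, 8, 5, 2, 3, 5]

counts = Counter(data)           # value -> number of occurrences
highest = max(counts.values())   # the largest frequency
modes = sorted(value for value, count in counts.items() if count == highest)

print(modes)  # → [5]
```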

  26. Mode in negatively skewed distribution www.udacity.com – Statistics

  27. Mode in uniform distribution www.udacity.com – Statistics

  28. Multimodal distribution www.udacity.com – Statistics

  29. Mode in categorical data www.udacity.com – Statistics

  30. More of mode • True or False? • 1. The mode can be used to describe any type of data we have, whether it’s numerical or categorical. • 2. All scores in the dataset affect the mode. • 3. If we take a lot of samples from the same population, the mode will be the same in each sample. • 4. There is an equation for the mode. • Regarding statement 3: • http://onlinestatbook.com/stat_sim/sampling_dist/ • http://www.shodor.org/interactivate/activities/Histogram/ (the mode changes as you change the bin size) • Because statement 3 is not true, we can’t use the mode to learn something about our population. The mode depends on how you present the data. www.udacity.com – Statistics
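
The dependence of the mode on the bin size can be seen in a few lines of code. The sketch below uses hypothetical values (an assumption, not data from the lecture) and a helper named `modal_bin`, introduced here only for illustration, that returns the fullest histogram bin for a given bin width.

```python
from collections import Counter


def modal_bin(values, bin_width, start=0):
    """Return (left_edge, right_edge, count) of the fullest bin.

    Bins are half-open intervals [left_edge, left_edge + bin_width),
    laid out from `start`. If several bins tie, the first one seen wins.
    """
    counts = Counter(
        start + bin_width * ((v - start) // bin_width) for v in values
    )
    left_edge, count = max(counts.items(), key=lambda item: item[1])
    return left_edge, left_edge + bin_width, count


# Hypothetical measurements, chosen so that the modal bin moves.
values = [1, 2, 2, 3, 7, 8, 8, 8, 9, 12, 13, 13, 14, 14, 15]

print(modal_bin(values, bin_width=2))  # → (8, 10, 4)
print(modal_bin(values, bin_width=4))  # → (12, 16, 6)
```

With a bin width of 2 the histogram peaks around 8 to 10, while a bin width of 4 moves the peak to 12 to 16, even though the data are unchanged.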

  31. Life expectancy data www.coursera.org – Statistics: Making Sense of Data

  32. Minimum minimum = 47.8 Sierra Leone www.coursera.org – Statistics: Making Sense of Data

  33. Maximum maximum = 83.4 Japan www.coursera.org – Statistics: Making Sense of Data

  34. Life expectancy data all countries www.coursera.org – Statistics: Making Sense of Data

  35. Life expectancy data • median = 73.2 (Egypt), rank 99 of 197 • half of the countries have a smaller value, half a larger one www.coursera.org – Statistics: Making Sense of Data

  36. Life expectancy data • Maximum = 83.4 • Median = 73.2 • Minimum = 47.8 www.coursera.org – Statistics: Making Sense of Data

  37. Q1 • 1st quartile = 64.7 (Sao Tomé & Príncipe), rank 50 of 197 (¼ of the way) www.coursera.org – Statistics: Making Sense of Data

  38. Q1 1st quartile = 64.7 ¼ smaller ¾ larger www.coursera.org – Statistics: Making Sense of Data

  39. Q3 • 3rd quartile = 76.7 (Netherlands Antilles), rank 148 of 197 (¾ of the way) www.coursera.org – Statistics: Making Sense of Data

  40. Q3 3rd quartile = 76.7 ¾ smaller ¼ larger www.coursera.org – Statistics: Making Sense of Data

  41. Life expectancy data • Maximum = 83.4 • 3rd quartile = 76.7 • Median = 73.2 • 1st quartile = 64.7 • Minimum = 47.8 www.coursera.org – Statistics: Making Sense of Data
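
The ranks quoted on the slides (50, 99 and 148 out of 197 sorted countries) are consistent with the simple rule "go a given fraction of the way from rank 1 to rank n". The sketch below encodes that rule; the function name `rank_at_fraction` is introduced here for illustration, and other quartile conventions exist that would give slightly different ranks.

```python
def rank_at_fraction(n, fraction):
    """1-based rank that lies `fraction` of the way from rank 1 to rank n."""
    return 1 + fraction * (n - 1)


n_countries = 197  # number of countries in the life expectancy data

for label, fraction in [("Q1", 0.25), ("median", 0.5), ("Q3", 0.75)]:
    print(label, rank_at_fraction(n_countries, fraction))
```

When the computed rank is not a whole number, the value is usually interpolated between the two neighbouring sorted observations.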

  42. Box Plot www.coursera.org – Statistics: Making Sense of Data

  43. Box plot • maximum • 3rd quartile • median • 1st quartile • minimum

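  44. Modified box plot • IQR = interquartile range (3rd quartile minus 1st quartile) • whiskers reach at most 1.5 × IQR beyond the quartiles • points farther out are drawn as outliers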
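
The 1.5 × IQR rule can be sketched as follows. The data are hypothetical (an assumption, not from the lecture), the helper name `iqr_outliers` is introduced here for illustration, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; a different quartile convention can shift the fences slightly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) for the k × IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                  # interquartile range
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


# Hypothetical values with one unusually low and one unusually high point.
values = [52, 61, 64, 65, 66, 68, 70, 71, 73, 95]

print(iqr_outliers(values))  # → (54.5, 80.5, [52, 95])
```

In a modified box plot the whiskers would then end at the most extreme observations inside the fences (here 61 and 73), and 52 and 95 would be plotted as separate points.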

  45. Quartiles, median – how to do it? Find min, max, median, Q1, Q3 in these data. Then, draw the box plot. 79, 68, 88, 69, 90, 74, 87, 93, 76 www.coursera.org – Statistics: Making Sense of Data

  46. Another example • Data: 78, 93, 68, 84, 90, 74 • Min. = 68.00 • 1st Qu. = 75.00 • Median = 81.00 • 3rd Qu. = 88.50 • Max. = 93.00
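
The five-number summaries for the two small datasets above can be checked with Python's standard library. This is a sketch under one assumption: that the summary on slide 46 was produced with linear interpolation between ranks, which corresponds to `method="inclusive"` in `statistics.quantiles`. The helper name `five_number_summary` is introduced here for illustration.

```python
import statistics


def five_number_summary(values, method="inclusive"):
    """Return (min, Q1, median, Q3, max) using statistics.quantiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method=method)
    return min(values), q1, median, q3, max(values)


slide_45 = [79, 68, 88, 69, 90, 74, 87, 93, 76]
slide_46 = [78, 93, 68, 84, 90, 74]

print(five_number_summary(slide_46))               # → (68, 75.0, 81.0, 88.5, 93)
print(five_number_summary(slide_45))               # → (68, 74.0, 79.0, 88.0, 93)
print(five_number_summary(slide_45, "exclusive"))  # → (68, 71.5, 79.0, 89.0, 93)
```

The last two lines differ only in the quartile convention: the minimum, median and maximum agree, while Q1 and Q3 depend on the method, so it is worth stating which one was used when reporting quartiles for small samples.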

  47. Percentiles age [years] http://www.rustovyhormon.cz/on-line-rustove-grafy

  48. 3rd M – Mean • Mathematical notation: • Σ … Greek letter capital sigma • means SUM in mathematics • Another measure of the center of the data: mean (average) • Data values:
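
The formulas on this slide did not survive in the transcript. In the usual textbook notation (an assumption here, not a reconstruction of the slide itself), the data values are written $x_1, x_2, \dots, x_n$ and the mean $\bar{x}$ is their sum divided by their count:

```latex
\bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i \;=\; \frac{x_1 + x_2 + \dots + x_n}{n}
```

For example, for the three values 2, 4 and 9 the sum is 15 and the mean is 15 / 3 = 5.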

  49. Robust statistic • Salaries of 25 players of the American soccer team New York Red Bulls in 2012 • median = 112 495 • mean = 518 311 • The mean is not a robust statistic. • The median is a robust statistic.
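
The individual salaries are not part of the transcript, so the sketch below uses a hypothetical payroll (an assumption, not the team's data) to show why the median is called robust: adding one very large salary moves the mean a long way but barely changes the median.

```python
import statistics

# Hypothetical salaries, NOT the actual team data.
salaries = [40_000, 45_000, 50_000, 55_000, 60_000]
with_star = salaries + [5_000_000]  # add a single very highly paid player

print(statistics.mean(salaries), statistics.median(salaries))
print(statistics.mean(with_star), statistics.median(with_star))
```

Here the mean jumps from 50 000 to 875 000, while the median only moves from 50 000 to 52 500.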
