Measures of centrality
Last lecture summary • Which graphs did we meet? • scatter plot (bodový graf) • bar chart (sloupcový graf) • histogram • pie chart (koláčový graf) • How do they work, what are their advantages and/or disadvantages?
Histogram • Now I will collect the heights of everyone in this room. • Use the Interactive Histogram Applet: http://www.shodor.org/interactivate/activities/Histogram/ • interval, bin
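To make the terms "interval" and "bin" concrete, here is a minimal sketch of how a histogram counts values per bin. The heights are made up for illustration, and the helper name `histogram_counts` and its `start` parameter are assumptions of this sketch, not part of the applet.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each half-open bin [lo, lo + bin_width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)  # which bin the value belongs to
        lo = start + index * bin_width         # lower edge of that bin
        counts[lo] += 1
    return dict(sorted(counts.items()))

# Hypothetical heights in cm (not real class data).
heights_cm = [158, 162, 165, 167, 170, 171, 173, 174, 176, 180, 183, 191]

# Print a simple text histogram with 10 cm bins starting at 150 cm.
for lo, n in histogram_counts(heights_cm, bin_width=10, start=150).items():
    print(f"{lo}-{lo + 10}: {'#' * n}")
```

Calling the same function with `bin_width=20` merges neighbouring bins, which is the effect you see when you move the bin-size slider in the applet.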
Histogram – Body fat • In the Interactive Histogram Applet, choose the “Body fat % in 252 men” dataset. • Find a reasonable bin size. • Answer the following question: regardless of bin size, which statements are always true? • Most scores fall around 20%. • The shape is roughly symmetrical. • Most scores fall in the middle of the distribution. • There are more scores between 15 and 25 than between 35 and 50. • There are more scores between 0 and 10 than between 18 and 24. • Relatively more men have body fat above 35% or below 5%.
Histogram – Income distribution • United States Census Bureau – http://www.census.gov
Histogram – Income distribution • This is an example of a (positively) skewed distribution (zprava zešikmené rozdělení). • This distribution is not symmetrical. • Most incomes fall on the left side of the distribution, with a long tail to the right.
Bar chart and scatter plot • Which scatter plot corresponds to this bar chart?
Pie chart to histogram • Which histogram looks like it comes from the same data?
About statistics • Statistics – the science of collecting, organizing, summarizing, analyzing, and interpreting data • Goal – use imperfect information (our data) to infer facts, make predictions, and make decisions • Descriptive statistics – summarizing data with numbers or pictures • Inferential statistics – making conclusions or decisions based on data
Choosing a profession • Chemistry: 50 000 – 60 000 • Geography: 40 000 – 55 000
Choosing a profession • We made an interval estimate. • But ideally we want one number that describes the entire dataset. This allows us to quickly summarize all our data.
Choosing a profession • The value at which frequency is highest. • The value where frequency is lowest. • The value in the middle. • The biggest value on the x-axis. • The mean. [Figure panels: Geography, Chemistry]
Three big M’s • The value at which frequency is highest is called the mode, i.e. the most common value. • The value in the middle of the distribution is called the median. • The mean is the arithmetic average of all values. [Figure panels: Geography, Chemistry]
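The three measures can be computed with Python's standard `statistics` module. The salary figures below are hypothetical and chosen only so the arithmetic is easy to check; they are not the lecture's data.

```python
import statistics

# Hypothetical salaries (illustrative only, not the lecture's dataset).
salaries = [40_000, 45_000, 45_000, 50_000, 52_000, 55_000, 119_000]

mode = statistics.mode(salaries)      # most common value
median = statistics.median(salaries)  # middle value of the sorted data
mean = statistics.mean(salaries)      # sum of values divided by their count

print(f"mode={mode}, median={median}, mean={mean}")
# mode=45000, median=50000, mean=58000
```

In this made-up sample the single large salary pulls the mean above the median, while the mode and median are unaffected by how large that value is.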
Quick quiz • What is the mode in our data?
More on the mode • True or false? • The mode can be used to describe any type of data we have, whether it’s numerical or categorical. • All scores in the dataset affect the mode. • If we take a lot of samples from the same population, the mode will be the same in each sample. • There is an equation for the mode. • Regarding statement 3: http://onlinestatbook.com/stat_sim/sampling_dist/ • The mode changes as you change the bin size. • The mode depends on how you present the data, and we can’t use the mode to learn something about our population.
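The dependence of the mode on bin size can be sketched in a few lines. The scores are invented for illustration, and `modal_bin` is a hypothetical helper that returns the edges of the most populated bin (if several bins tie, it returns the first one encountered).

```python
from collections import Counter

def modal_bin(values, bin_width):
    """Return (lo, hi) of the bin [lo, hi) that contains the most values."""
    # Map each value to the lower edge of its bin, then count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lo, _ = max(counts.items(), key=lambda item: item[1])
    return lo, lo + bin_width

# Hypothetical scores (illustrative only).
scores = [21, 26, 27, 28, 29, 29, 31, 32, 33, 35, 36, 37, 38]

print(modal_bin(scores, 5))   # (25, 30)
print(modal_bin(scores, 10))  # (30, 40)
```

With bins of width 5 the tallest bar is 25 to 30, but with bins of width 10 it is 30 to 40, even though the data have not changed.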
Life expectancy data • Watch TED talk by Hans Rosling, Gapminder Foundation: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html