Statistical Data Analysis. Daniel Svozil, Laboratory of Informatics and Chemistry, FCHT; Vojtěch Spiwok, ÚBM, FPBT
Information • Lectures: basics and review; exercises: R • http://ich.vscht.cz/~svozil/teaching.html • Further reading • D. J. Rumsey, Statistics for Dummies, 2011 • D. J. Rumsey, Intermediate Statistics for Dummies, 2007 • Course credit and exam: to be announced
Valuing houses. How much money should you expect to pay for a 1 300 ft² house? 104 000 $. Same question, now for a 1 800 ft² house: 144 000 $.
Valuing houses. How much money should you expect to pay for a 2 100 ft² house? 168 000 $. 2 100 ft² lies exactly halfway between 1 800 ft² and 2 400 ft², so its price lies halfway between their prices.
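When the price grows by a fixed amount per square foot, any two known (size, price) pairs determine a straight line, and other prices can be read off it. A minimal Python sketch using the two pairs stated on the slides (1 300 ft² for 104 000 $ and 1 800 ft² for 144 000 $); the helper name `price_on_line` is hypothetical.

```python
def price_on_line(p1, p2, size):
    """Return the price at `size` on the straight line through p1 and p2.

    Each point is a (size_ft2, price_usd) pair.
    """
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)  # dollars per square foot
    return y1 + slope * (size - x1)


known_small = (1300, 104_000)
known_large = (1800, 144_000)
estimate = price_on_line(known_small, known_large, 2100)
print(f"{estimate:,.0f} $")  # 168,000 $
```

The slope here is 80 $ per ft², so 2 100 ft² costs 104 000 $ + 80 × 800 = 168 000 $.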
What does a statistician do? • Look at data • Program computers • Run statistics software • Drink beer
Linear relationship. Is there a fixed amount per square foot? No. What if I change 1 400 to 1 300? What is the answer now? Yes.
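A fixed amount per square foot means that price / size is the same for every house. The table itself is not reproduced here, so this minimal sketch assumes (hypothetically) that it paired 104 000 $ with 1 400 ft² and 144 000 $ with 1 800 ft²; the helper `has_fixed_rate` is also hypothetical.

```python
def has_fixed_rate(houses, tol=1e-9):
    """True if every (size, price) pair has the same price per square foot."""
    rates = [price / size for size, price in houses]
    return max(rates) - min(rates) <= tol


# Hypothetical table with 1 400 ft2, and the same table with 1 300 ft2.
original = [(1400, 104_000), (1800, 144_000)]
corrected = [(1300, 104_000), (1800, 144_000)]

print(has_fixed_rate(original))   # False: about 74.29 vs 80.00 $/ft2
print(has_fixed_rate(corrected))  # True: 80.00 $/ft2 for both
```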
Scatter plots (Czech: bodový graf) • Please take pen and paper and draw a scatter plot of these data (columns: PRICE, SIZE).
Scatter plots. Is there a fixed price per square foot? No.
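Scatter plots. Do you think the data are linear? Let's make a scatter plot. Surprisingly, the data are linear even though there is no fixed price per square foot! PRICE = a × SIZE + b; here PRICE = 30 × SIZE + 2 000. Because of the intercept b, the price per square foot, PRICE / SIZE = a + b / SIZE, changes with size.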
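A quick numeric check that a line with an intercept has no fixed price per square foot, while its slope stays constant. This sketch uses the slide's line PRICE = 30 × SIZE + 2 000; the sizes 1 000, 2 000 and 4 000 ft² are hypothetical.

```python
def price(size):
    # Line from the slide: PRICE = 30 x SIZE + 2 000
    return 30 * size + 2000


sizes = [1000, 2000, 4000]  # hypothetical sizes in ft2
prices = [price(s) for s in sizes]

# Price per square foot differs from house to house ...
per_ft2 = [p / s for s, p in zip(sizes, prices)]
print(per_ft2)  # [32.0, 31.0, 30.5]

# ... but the slope between consecutive points is always 30.
slopes = [(prices[i + 1] - prices[i]) / (sizes[i + 1] - sizes[i])
          for i in range(len(sizes) - 1)]
print(slopes)  # [30.0, 30.0]
```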
Scatter plots. Draw a scatter plot and tell me whether these data are linear (i.e., do they lie on a line?). Points that lie far from the line are outliers.
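One way to make "do they lie on a line?" concrete is to fit a least-squares line and look at how far each point falls from it. A minimal sketch on hypothetical data (generated from PRICE = 30 × SIZE + 2 000 with one price deliberately changed); flagging the point with the largest residual is only an illustrative rule, not a formal outlier test, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


sizes = [1000, 1500, 2000, 2500, 3000]         # hypothetical data
prices = [32000, 47000, 62000, 120000, 92000]  # 4th price is off the line

slope, intercept = fit_line(sizes, prices)
residuals = [y - (slope * x + intercept) for x, y in zip(sizes, prices)]
# Index of the point farthest from the fitted line
worst = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print(f"slope {slope:.1f}, intercept {intercept:.0f}")  # slope 38.6, intercept -6600
print("most suspicious point:", sizes[worst], prices[worst])  # 2500 120000
```

The single off-line price drags the fitted slope from 30 up to 38.6, which is one reason outliers deserve attention.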
Bar chart. Warm-up: are these data linear? No. How much should you pay for a 2 200 ft² house? Simply interpolate: 105 000 $. Do you trust this number?
Bar chart • Take your data and group them together (aggregate them into ranges).
Bar chart • A much clearer representation of the data • A bar chart lets you see global trends • Statisticians use aggregate tools (such as bar charts) to understand the underlying data.
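The grouping step can be sketched as putting houses into size ranges and drawing the average price of each range as one bar. A minimal Python sketch; the (size, price) pairs, the 500 ft² bin width and the helper `mean_price_per_bin` are all hypothetical.

```python
from collections import defaultdict


def mean_price_per_bin(houses, width):
    """Group (size, price) pairs into bins [k*width, (k+1)*width) and average prices."""
    groups = defaultdict(list)
    for size, price in houses:
        groups[size // width * width].append(price)
    return {lo: sum(ps) / len(ps) for lo, ps in sorted(groups.items())}


houses = [(1100, 90_000), (1300, 98_000), (1600, 130_000),
          (1900, 121_000), (2100, 170_000), (2400, 160_000)]  # hypothetical

for lo, avg in mean_price_per_bin(houses, 500).items():
    bar = "#" * round(avg / 10_000)  # one '#' per 10 000 $
    print(f"{lo}-{lo + 499} ft2 | {bar} {avg:,.0f} $")
```

Individual prices jump around, but the bar averages rise steadily with size, which is the global trend the bar chart is meant to expose.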
Histograms • A special case of a bar chart. • A bar chart shows 2D data; a histogram shows 1D data. That is the main difference.
Age distribution • On paper, draw a histogram of these ages with 10-year bins (i.e., 0-10, 11-20, …): 29 27 14 21 12 9 17 14 32 39 3 9 4 33 38 29 21 31 8 15
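Counting by hand and counting in code should give the same bars. A minimal Python sketch of the exercise, using the 20 ages from the slide and the inclusive bins 0-10, 11-20, 21-30, 31-40; the helper `bin_label` is hypothetical.

```python
from collections import Counter

ages = [29, 27, 14, 21, 12, 9, 17, 14, 32, 39,
        3, 9, 4, 33, 38, 29, 21, 31, 8, 15]


def bin_label(age):
    """Map an age to the slide's bins 0-10, 11-20, 21-30, ..."""
    if age <= 10:
        return "0-10"
    lo = (age - 1) // 10 * 10 + 1  # 11, 21, 31, ...
    return f"{lo}-{lo + 9}"


counts = Counter(bin_label(a) for a in ages)
for label in ["0-10", "11-20", "21-30", "31-40"]:
    print(f"{label:>5} | {'#' * counts[label]} {counts[label]}")
```

Each of the four bins receives exactly 5 of the 20 ages, so the histogram is flat.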
Population pyramid (Czech: věková pyramida, also called strom života, "tree of life"): a graphical representation of the age structure of a population. Source: http://cs.wikipedia.org/wiki/V%C4%9Bkov%C3%A1_pyramida
Histogram • Now I will collect the heights of everyone in this room. • Use the Interactive Histogram Applet: http://www.shodor.org/interactivate/activities/Histogram/ • Key terms: interval, bin
Histogram – Body fat • In the Interactive Histogram Applet, choose the "Body fat % in 252 men" dataset. • Find a reasonable bin size. • Answer the following question: regardless of the bin size, which of these statements are always true? • Most scores fall around 20%. • The shape is roughly symmetrical. • Most scores fall in the middle of the distribution. • There are more scores between 15 and 25 than between 35 and 50. • There are more scores between 0 and 10 than between 18 and 24. • Relatively more men have a body fat above 35% or below 5%.
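Changing the bin size can change how a histogram looks without changing the data. The body-fat dataset lives in the applet, so this minimal Python sketch instead re-bins the 20 ages from the age-distribution exercise, using half-open bins [lo, lo + width); the helper `histogram` and the widths 5 and 20 are hypothetical choices for illustration.

```python
def histogram(values, width, start=0):
    """Count values in half-open bins [start + k*width, start + (k+1)*width)."""
    counts = {}
    for v in values:
        lo = start + (v - start) // width * width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))


ages = [29, 27, 14, 21, 12, 9, 17, 14, 32, 39,
        3, 9, 4, 33, 38, 29, 21, 31, 8, 15]

for width in (5, 20):
    print(f"bin width {width}:")
    for lo, n in histogram(ages, width).items():
        print(f"  [{lo}, {lo + width}) {'#' * n}")
```

With 5-year bins the counts alternate between 2 and 3; with 20-year bins there are just two equal bars of 10. The ages are identical in both cases.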
Histogram – Income distribution • Source: United States Census Bureau, http://www.census.gov
Histogram – Income distribution • This is an example of a positively (right-) skewed distribution (Czech: zprava zešikmené rozdělení). • The distribution is not symmetrical. • Most incomes fall on the left side of the distribution, with a long tail to the right.
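A common numerical sign of positive skew is a mean pulled above the median by the long right tail, together with a positive sample skewness. A minimal Python sketch on hypothetical incomes (in thousands of dollars, not Census data), using the moment-based skewness g1 = m3 / m2^1.5.

```python
import statistics

incomes = [20, 25, 30, 30, 35, 40, 45, 60, 90, 200]  # hypothetical, thousands of $

mean = statistics.mean(incomes)
median = statistics.median(incomes)

# Moment-based sample skewness: g1 = m3 / m2**1.5
m2 = sum((x - mean) ** 2 for x in incomes) / len(incomes)
m3 = sum((x - mean) ** 3 for x in incomes) / len(incomes)
skewness = m3 / m2 ** 1.5

print(f"mean {mean}, median {median}, skewness {skewness:.2f}")
```

The single large income lifts the mean to 57.5 while the median stays at 37.5, and the skewness comes out positive.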
Pie charts (Czech: koláčový graf) • Example: elections • Party A – 50% • Party B – 50% • Party A – 724 000 votes • Party B – 181 000 votes
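For vote counts, each party's slice is its share of the total, and the slice angle is that share of 360°. A minimal Python sketch for the 724 000 / 181 000 example; the helper `pie_slices` is hypothetical, and `Fraction` is used to keep the shares exact.

```python
from fractions import Fraction


def pie_slices(votes):
    """Return {party: (share, angle_in_degrees)} for a pie chart."""
    total = sum(votes.values())
    return {party: (Fraction(n, total), Fraction(n, total) * 360)
            for party, n in votes.items()}


votes = {"A": 724_000, "B": 181_000}
for party, (share, angle) in pie_slices(votes).items():
    print(f"Party {party}: {float(share):.0%} of votes, {float(angle):.0f} degrees")
```

The shares come out as exactly 4/5 and 1/5, i.e. slices of 288° and 72°.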
[Figure: pie chart with slices labeled A, B, C, D, E]
Pie charts • Party A – 175 000 votes • Party B – 50 000 votes • Party C – 25 000 votes • Party D – 50 000 votes • Please draw the pie chart. Shares: A: 7/12, B: 2/12, C: 1/12, D: 2/12
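The shares can be checked by computer: each slice is the party's vote count divided by the 300 000 total, and with these numbers every share is a whole number of twelfths (one twelfth being 30° of the pie). A minimal Python sketch using exact fractions.

```python
from fractions import Fraction

votes = {"A": 175_000, "B": 50_000, "C": 25_000, "D": 50_000}
total = sum(votes.values())  # 300 000

for party, n in votes.items():
    share = Fraction(n, total)
    # Express the share in twelfths and in degrees of the full circle
    print(f"{party}: {share * 12}/12, {share * 360} degrees")
```

The four shares add up to exactly 12/12, i.e. the full 360°.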
Bar chart and scatter plot • Which scatter plot corresponds to this bar chart?
Pie chart to histogram • Which histogram looks like it came from the same data?