Statistical Data Analysis Daniel Svozil, Laboratory of Informatics and Chemistry, FCHT; Vojtěch Spiwok, ÚBM, FPBT
Information • Lectures – basics, refresher; exercises – R • http://ich.vscht.cz/~svozil/teaching.html • Further reading • D. J. Rumsey, Statistics for Dummies, 2011 • D. J. Rumsey, Intermediate Statistics for Dummies, 2007 • Course credit, exam – to be announced
Valuing houses How much money should you expect to pay for a 1 300 ft² house? $104 000 Same question for a 1 800 ft² house? $144 000
Valuing houses How much money should you expect to pay for a 2 100 ft² house? $168 000 (2 100 is exactly halfway between 1 800 and 2 400.) Same question for a 1 500 ft² house? $120 000
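The four answers on the two "Valuing houses" slides are consistent with a fixed price per square foot. A minimal Python sketch that checks this, using only the (size, price) pairs quoted on the slides; the names `answers`, `rates` and `predict_price` are illustrative, not part of the lecture:

```python
# (size in ft^2, price in $) pairs taken from the answers on the slides
answers = [(1300, 104_000), (1800, 144_000), (2100, 168_000), (1500, 120_000)]

# Price per square foot for each pair
rates = [price / size for size, price in answers]
print(rates)

# If every rate is the same, price is simply proportional to size
fixed_rate = rates[0]
is_fixed = all(r == fixed_rate for r in rates)
print("fixed price per ft^2:", is_fixed)

def predict_price(size_ft2):
    """Price of a house under the fixed-rate assumption (illustrative helper)."""
    return fixed_rate * size_ft2

print(predict_price(2100))
```

With a fixed rate, interpolating between two known houses and multiplying the size by the rate give the same answer.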
What does a statistician do? • Look at data • Program computers • Run statistics software • Drink beer
Linear relationship Is there a fixed amount per square foot? No. What if I change 1 400 to 1 300? What is the answer now? Yes.
Scatter plots • Please take a pen and paper and draw a scatter plot of these data (columns: PRICE, SIZE).
Scatter plots Do we believe there is a fixed price per square foot? No
Scatter plots What do you think, are the data linear? Let’s make a scatter plot. Surprisingly, the data are linear, even though there is no fixed price per square foot! PRICE = ???? × SIZE + ???? Answer: PRICE = 30 × SIZE + 2 000
Scatter plots Draw a scatter plot and tell me whether these data are linear (i.e., do they lie on a line?). Note the outliers.
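Data can lie on a straight line without having a fixed price per square foot, because the line has a nonzero intercept. The sketch below illustrates this in Python. The slide's table is not reproduced in this transcript, so the sizes are hypothetical and the prices are generated from the slide's relation PRICE = 30 × SIZE + 2 000; the least-squares formulas are the standard textbook ones, not necessarily how the lecture derived the line.

```python
# HYPOTHETICAL sizes (ft^2); the slide's own table is not available here
sizes = [1000, 1400, 1800, 2200, 2600]
# Prices generated from the relation stated on the slide
prices = [30 * s + 2000 for s in sizes]

# Price per square foot differs from house to house...
per_sqft = [p / s for s, p in zip(sizes, prices)]
print(per_sqft)

# ...yet an ordinary least-squares line describes the points
n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
sxx = sum((x - mean_x) ** 2 for x in sizes)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
slope = sxy / sxx                    # change in price per extra ft^2
intercept = mean_y - slope * mean_x  # price component independent of size
print(slope, intercept)
```

The ratio PRICE / SIZE shrinks as SIZE grows because the constant 2 000 is spread over more square feet, while the slope stays the same.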
Bar chart Warm-up. Are these data linear? No. How much should you pay for a 2 200 ft² house? Simply interpolate: $105 000. Do you trust this number?
Bar chart • Take your data and pool them together.
Bar chart • A much finer representation of the data • A bar chart allows you to understand global trends • Statisticians use cumulative tools (such as bar charts) to gain an understanding of the underlying data. Let me ask you: are bar charts cool?
Histograms • A special case of the bar chart. • A bar chart looks at 2D data, a histogram at 1D data. That is the main difference.
Age distribution • Draw a histogram on paper with bins of 10 years (i.e. 0-10, 11-20, …). Ages: 29 27 14 21 12 9 17 14 32 39 3 9 4 33 38 29 21 31 8 15
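The paper exercise can be checked with a short Python sketch. It uses the 20 ages from the slide and the slide's bin edges (0-10, 11-20, …); the helper `bin_label` and the text rendering with `#` marks are illustrative choices, not part of the lecture.

```python
from collections import Counter

# The 20 ages listed on the slide
ages = [29, 27, 14, 21, 12, 9, 17, 14, 32, 39,
        3, 9, 4, 33, 38, 29, 21, 31, 8, 15]

def bin_label(age):
    """Map an age to the slide's bins: 0-10, 11-20, 21-30, ..."""
    index = max(0, (age - 1) // 10)      # ages 0..10 share the first bin
    low = 0 if index == 0 else index * 10 + 1
    high = (index + 1) * 10
    return f"{low}-{high}"

counts = Counter(bin_label(a) for a in ages)

# Text histogram: one '#' per person, bins in ascending order
for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>5} | {'#' * counts[label]}")
```

For these data each of the four bins holds five ages, so the histogram is flat.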
Age pyramid • An age pyramid (population pyramid, "tree of life") is a graphical representation of the age structure of a population. source: http://cs.wikipedia.org/wiki/V%C4%9Bkov%C3%A1_pyramida
Pie charts • Pie chart (Czech: koláčový graf) • Elections, example 1: Party A – 50%, Party B – 50% • Example 2: Party A – 724 000 votes, Party B – 181 000 votes • Example 3: Party A – 175 000, Party B – 50 000, Party C – 25 000, Party D – 50 000
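To draw a pie chart from raw vote counts, each count is first converted to a share of the total, and each share to a slice angle. A minimal Python sketch using the vote counts from the slide; the function name `shares` and the dictionary layout are illustrative.

```python
def shares(votes):
    """Return each party's fraction of the total vote."""
    total = sum(votes.values())
    return {party: count / total for party, count in votes.items()}

# Vote counts from the slide
two_party = {"Party A": 724_000, "Party B": 181_000}
four_party = {"Party A": 175_000, "Party B": 50_000,
              "Party C": 25_000, "Party D": 50_000}

for election in (two_party, four_party):
    for party, share in shares(election).items():
        # A full circle is 360 degrees, so a slice spans share * 360 degrees
        print(f"{party}: {share:.1%}, {share * 360:.0f} degrees")
    print()
```

In the two-party case Party A takes four fifths of the circle; in the four-party case its 175 000 of 300 000 votes is a little over half.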
Gender bias What do you think, is there a gender bias? Who do you think is favored, male or female?
Gender bias Now look at the data independently of major.
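The admission table behind these two slides is not reproduced in this transcript. The Python sketch below therefore uses HYPOTHETICAL counts, chosen only to illustrate how a comparison within each major can point the opposite way from the comparison with majors pooled (an effect known as Simpson's paradox). Whether the lecture's table shows exactly this pattern is an assumption.

```python
# HYPOTHETICAL (applicants, admitted) counts; NOT the slide's actual table
data = {
    "Major A": {"male": (900, 450), "female": (100, 80)},
    "Major B": {"male": (100, 10), "female": (900, 180)},
}

def rate(applied, admitted):
    """Admission rate as a fraction."""
    return admitted / applied

# Admission rate within each major
for major, groups in data.items():
    for gender, (applied, admitted) in groups.items():
        print(major, gender, f"{rate(applied, admitted):.0%}")

# Admission rate with the majors pooled ("independent of major")
overall = {}
for gender in ("male", "female"):
    applied = sum(data[m][gender][0] for m in data)
    admitted = sum(data[m][gender][1] for m in data)
    overall[gender] = rate(applied, admitted)
    print("overall", gender, f"{overall[gender]:.0%}")
```

In these made-up numbers women have the higher admission rate in both majors, yet men have the higher rate overall, because most women applied to the major that admits few applicants.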
Statistics is ambiguous • This example illustrates how ambiguous statistics can be. • The way you choose to graph your data may strongly influence what people believe to be the case. “I never believe in statistics I didn’t doctor myself.” Who said that? The quote is commonly attributed to Winston Churchill.