Biostatistics
Statistics • Sayings about statistics: • Statistics is the science of working accurately with inaccurate numbers. • We know three kinds of lies: intentional, unintentional, and statistics.
Biostatistics – what does it mean? • It is not a separate field of science. By using this word we point out that it is the application of statistical methods to help solve biological problems [and biological data have their own specifics].
And what actually is statistics? • (in lay language) An ordered collection of data: statistics of shootings, statistics of car accidents in different regions • (in scientific language) The science of what to do with our data: (mathematical) statistics as a science • Within the scope of statistics: a statistic is a value calculated from the numbers that summarizes some feature of those numbers
“Anything can be proved with the help of statistics” • …especially by people who don’t understand statistics • “It is statistically proven that widows live longer than their husbands.” • Anything can be put into diagrams, and they then look very persuasive, especially when accompanied by the “right” interpretation (the data are fictitious, but correspond to reality)
Advice: whenever somebody tells you by what percentage something has improved, always ask what base the percentage was computed from.
Goals of statistics • (1) Descriptive statistics – to summarize data, to “condense” the information from many numbers into a smaller number of parameters or into a diagram
Compare: • The average number of points was 74.5, the minimum was 28 and the maximum was 100. • [Figure: frequency diagram; x-axis: No. of points]
The fewer parameters I obtain • the clearer and simpler the result is • but the greater the loss of information (from the average or the histogram I can never find out how many points František K. had, nor the values of all the individual numbers) • the art is to find the point where the result is clear but still retains its predictive value
Thanks to this loss of information, it is possible to lie with statistics • According to the statistics, we are all flying. Not high in the clouds, but near the ground, the tips of our shoes just slightly touching the shit we are sitting in.
Argument for the harmfulness of fluoridation (data from US states) • [Figure annotation: Nicaragua should be here]
Distinguish between correlation and causation • The general scientific method
The general scientific method – on the example of storks bringing babies: • 1. Observation – finding a pattern
2. Interpretation – “Storks bring babies” • 3. Prediction – if we remove the storks, babies won’t be born [or their number will decrease, if crows also do the job] • 4. Experiment: in half of the regions (randomly selected!) we shoot the storks and watch the changes in the birth rate (compared with the changes in the control regions) • 5. (After statistical evaluation) we find that there are no changes, so we can declare that storks don’t bring babies.
Hypothetico-deductive approach (K. Popper) – a correct assumption can yield only correct predictions, whereas an incorrect assumption can yield both correct and incorrect predictions; therefore we can never prove a hypothesis, only reject it • [Diagram: an observation (“pattern”) is explained by Hypothesis 1, Hypothesis 2 and Hypothesis 3; the hypotheses exclude each other and their predictions (Prediction 1, 2, 3) differ from each other; the predictions are compared with reality (the result of the experiment)]
Goals of statistics: population and sample • (2) Inferential statistics – making an inference about a (statistical) population from a sample • Some (statistical) populations are too large [or potentially infinite] – I am not able to check all the members • What can I say about the election results in the whole republic when I ask just 1000 people? • What can I say about the amount of Cd in the blood of wild geese in CZ when I have taken blood from just 10 specimens?
Inferential statistics is common in biology • I don’t want to draw conclusions about my 10 laboratory rats; on the basis of these 10 rats I want to say something about all experiments done in the same way • If this is to be science, the experiments have to be reproducible (cf. the Journal of Irreproducible Results)
Types of (not only biological) data • Continuous and discrete data – the mathematical definition versus the reality of measurement – in reality we always measure data with a certain finite accuracy
Types of (not only biological) data • Ratio scale • Interval scale • Ordinal scale • Nominal scale (categorical data) • Circular scale [Figure: circle marked 0, 90, 180, 270]
Azimuth of the stem with lichen findings [degrees]: 5, 10, 5, 350, 350, 355 => arithmetic average ≈ 180 • Time of the doom-monger’s ululating: 22:00, 23:00, 24:00, 1:00, 1:00, 2:00 => arithmetic average is shortly after midday
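A minimal sketch of why the ordinary average fails on a circular scale, and of one common remedy: averaging the sines and cosines of the angles and converting back to a direction. The function name `circular_mean_degrees` is an assumption made for this illustration; the data are the two examples from the slide, with clock times converted to angles at 15 degrees per hour.

```python
import math

def circular_mean_degrees(angles_deg):
    """Mean direction of angles given in degrees, returned in [0, 360)."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 recovers the direction of the summed unit vectors
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0

# Azimuths from the slide: all close to north (0 degrees)
azimuths = [5, 10, 5, 350, 350, 355]
arithmetic_azimuth = sum(azimuths) / len(azimuths)   # points roughly south
circular_azimuth = circular_mean_degrees(azimuths)   # points roughly north

# Clock times from the slide, in hours; one hour corresponds to 15 degrees
hours = [22, 23, 24, 1, 1, 2]
arithmetic_hour = sum(hours) / len(hours)            # shortly after midday
circular_hour = circular_mean_degrees([h * 15 for h in hours]) / 15.0  # shortly after midnight

print("azimuth: arithmetic", round(arithmetic_azimuth, 1), "circular", round(circular_azimuth, 1))
print("hour:    arithmetic", round(arithmetic_hour, 2), "circular", round(circular_hour, 2))
```

The circular mean lands just west of north and just after midnight, which is where the observations actually cluster.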
Population and random sample • Sampling; sampling design • Random sample – every individual has to have the same probability of being chosen, independently of whether another individual was chosen • Tables and generators of (pseudo)random numbers
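As a rough illustration of drawing a random sample with a pseudorandom number generator, the sketch below uses Python's standard `random` module. The population of 200 labelled trees, the sample size of 10 and the seed are all hypothetical values chosen only for this example.

```python
import random

# Fixed seed only so that the sketch gives the same sample on every run
rng = random.Random(42)

# Hypothetical population of N = 200 labelled individuals
population = [f"tree_{i:03d}" for i in range(1, 201)]

# Draw n = 10 individuals without replacement; each individual
# has the same chance of ending up in the sample
sample = rng.sample(population, k=10)

print(sample)
```

The hard part in practice is not the generator but obtaining a complete list of the population to draw from.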
Population and random sample • An almost philosophical question – what is “random”? • And what is probability? • In statistics (that is, in this course) we will use the so-called a priori probability (the Bayesian posterior probability also exists)
Random sampling is usually not trivial – it is definitely not a sampling of typical individuals – it works reasonably well in agricultural experiments • [Figure: numbered plots 1 2 3 and 1 2 3 4 5 6]
It is much more difficult in natural populations – even taking the individual nearest to a random point does not work here
Basic statistical characteristics • We usually denote by N the size of the population and by n the size of the sample • Characteristics of the population are usually denoted by Greek letters and characteristics of the sample by Latin letters • Characteristics of location: • Means, median and mode • Means are defined for quantitative data (i.e. on the ratio and interval scales)
Arithmetic mean • of the population: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$ • of the sample: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Geometric mean • n-th root of the product of n values (for a sample here): $GM = \sqrt[n]{X_1 X_2 \cdots X_n}$
Harmonic mean • Reciprocal of the mean of reciprocals.
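The three means can be compared on a small example. This is a minimal sketch with hypothetical data, taking the geometric mean as the n-th root of the product of the values (computed through logarithms) and the harmonic mean as the reciprocal of the mean of reciprocals; the function names are chosen for this illustration only.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # n-th root of the product, computed as exp(mean of logs) to avoid overflow;
    # requires strictly positive values
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    # reciprocal of the mean of reciprocals; requires non-zero values
    return len(values) / sum(1.0 / v for v in values)

data = [1.0, 2.0, 4.0]  # hypothetical positive, ratio-scaled values

am = arithmetic_mean(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)
print(am, gm, hm)
```

For positive values that are not all equal, the harmonic mean is the smallest of the three and the arithmetic mean the largest.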
Median [also used for ordinal-scaled data] • It is defined so that one half of the values lies below the median and the other half above it (in infinite populations, the probability that a random value is above the median is 0.5, as is the probability that it is below). In populations with an even number of values, the value midway between the two middle values is taken as the median
Upper and lower quartile • 1/4 of the observations lie above the upper quartile and 1/4 of the observations lie below the lower one (analogously for infinite populations)
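A short sketch of the median and quartiles using Python's standard `statistics` module. The eight point scores are hypothetical. Note that several conventions exist for interpolating quartiles in small samples, so other software may report slightly different quartile values than `statistics.quantiles` with its default method.

```python
import statistics

# Hypothetical sample with an even number of values
points = [28, 55, 60, 70, 74, 80, 91, 100]

# With an even count, the median is midway between the two middle values (70 and 74)
median = statistics.median(points)

# n=4 splits the data into quarters and returns the three cut points
lower_quartile, middle, upper_quartile = statistics.quantiles(points, n=4)

print("median:", median)
print("lower quartile:", lower_quartile, "upper quartile:", upper_quartile)
```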
Distinguish between the meaning of the mean and the median • Example – wages in two companies
Mode – the most common value in discrete data – in continuous data it is the “peak” of the frequency diagram – later we will define it as a local maximum of the probability density curve [there can be more than one]
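The wage data from the slide's figure are not available, so the sketch below uses invented wages for two hypothetical companies to show how the mean and the median can tell different stories.

```python
import statistics

# Hypothetical monthly wages (arbitrary currency units), invented for illustration
company_a = [30, 31, 32, 33, 34]   # everybody earns about the same
company_b = [20, 20, 20, 20, 80]   # four low wages and one very high one

for name, wages in (("A", company_a), ("B", company_b)):
    print(name, "mean =", statistics.mean(wages), "median =", statistics.median(wages))
```

Both companies have the same mean wage, but the median shows that a typical employee of company B earns considerably less than a typical employee of company A.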
[Figure: positions of the mean and the median marked in several frequency diagrams]
Characteristics of variability • 1. Range – the difference between the maximum and the minimum • 2. Interquartile range • 3. Variance and standard deviation
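Variance – the average squared deviation from the mean • population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$ • estimate based on the sample: $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ • n-1 = df = degrees of freedom
Standard deviation ($s_x$, often also “s.d.” or “S.D.”) is the square root of the variance: $s_x = \sqrt{s_x^2}$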
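A minimal sketch of the two variance calculations, dividing by N when the data are the whole population and by n-1 (the degrees of freedom) when estimating from a sample. The data and function names are hypothetical.

```python
import math

def population_variance(values):
    # average squared deviation from the population mean (divide by N)
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    # estimate from a sample: divide by n - 1 degrees of freedom
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

pop_var = population_variance(data)
samp_var = sample_variance(data)
samp_sd = math.sqrt(samp_var)  # standard deviation = square root of the variance

print(pop_var, samp_var, samp_sd)
```

The sample estimate is always somewhat larger than the value obtained by dividing by n, and the difference shrinks as the sample grows.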
Comparing the variability in the weight of an elephant and an ant • Use either the variance or standard deviation of log-transformed data, or the coefficient of variation, $CV = s_x / \bar{X}$ • Both make sense only for ratio-scaled data
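To make the elephant and ant comparison concrete, the sketch below uses invented weights (elephants in kilograms, ants in milligrams) that differ by the same proportions. The raw standard deviations are wildly different, while the coefficient of variation and the standard deviation of the log-transformed data agree.

```python
import math
import statistics

# Hypothetical weights, invented for illustration
elephants_kg = [4800.0, 5000.0, 5200.0]
ants_mg = [4.8, 5.0, 5.2]

def coefficient_of_variation(values):
    # sample standard deviation divided by the mean; unit-free
    return statistics.stdev(values) / statistics.mean(values)

def sd_of_logs(values):
    # standard deviation after a natural-log transformation
    return statistics.stdev([math.log(v) for v in values])

cv_elephants = coefficient_of_variation(elephants_kg)
cv_ants = coefficient_of_variation(ants_mg)

print("sd:", statistics.stdev(elephants_kg), statistics.stdev(ants_mg))
print("CV:", cv_elephants, cv_ants)
print("sd of logs:", sd_of_logs(elephants_kg), sd_of_logs(ants_mg))
```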
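Standard error of the mean • Characterizes the accuracy of the sample mean – how large the variability of the means of many random samples of this size would be • $s_{\bar{X}} = s_x / \sqrt{n}$: the standard deviation $s_x$ expresses the variability in the data, the standard error $s_{\bar{X}}$ expresses the accuracy of the mean • We can achieve higher accuracy with a larger sample.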
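A small sketch of the standard error of the mean, computed as the sample standard deviation divided by the square root of the sample size. The data, the function names and the chosen values of the standard deviation and n are hypothetical.

```python
import math
import statistics

def standard_error(values):
    # sample standard deviation divided by sqrt(n)
    return statistics.stdev(values) / math.sqrt(len(values))

def standard_error_from_sd(sd, n):
    # same formula when only the standard deviation and sample size are known
    return sd / math.sqrt(n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
sem = standard_error(data)

# With the same variability in the data, a four times larger sample
# halves the standard error
sem_n25 = standard_error_from_sd(2.0, 25)
sem_n100 = standard_error_from_sd(2.0, 100)

print(sem, sem_n25, sem_n100)
```

Because of the square root, doubling the accuracy of the mean requires four times as many observations.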
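Graphical summaries – frequency diagram • [Figure: frequency diagram of the variable NO_SAPLING]
Box-and-whisker plot • Note: nowadays box-and-whisker plots are also used to display the mean and standard deviation, etc. • [Figure: box-and-whisker plot of the variable NO_SAPLING]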