
Biostatistics

Biostatistics. Statistics. Sayings about statistics: Statistics is the science of working accurately with inaccurate numbers. There are three kinds of lies: intentional, unintentional, and statistics. Biostatistics – what does it mean?

rblackburn

Presentation Transcript


  1. Biostatistics

  2. Statistics • Sayings about statistics: • Statistics is the science of working accurately with inaccurate numbers. • There are three kinds of lies: intentional, unintentional, and statistics.

  3. Biostatistics – what does it mean? • It is not a separate field of science. The word indicates the application of statistical methods to help solve biological problems [and biological data have specific features of their own].

  4. And what actually is statistics? • (in lay language) An ordered collection of data: statistics of shootings, statistics of car accidents in different regions • (in scientific language) The science of what to do with our data: (mathematical) statistics as a science • Within statistics itself, a statistic is a value calculated from the data that summarizes some feature of those data

  5. “Anything can be proved with the help of statistics” • …especially by people who don’t understand statistics • “It is statistically proven that widows live longer than their husbands.” • Anything can be put into a diagram, and diagrams look very persuasive, especially when accompanied by the “right” interpretation (the data here are fictitious, but realistic)

  6. And much better with the help of diagrams:

  7. Advice: whenever somebody tells you by what percentage something improved, always ask what base the percentage was computed from.
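A minimal sketch of why the base matters, using made-up numbers (the values 2 and 4 and the helper name `percent_change` are illustrative assumptions, not from the slides): the same absolute change gives very different percentages depending on which value serves as the base.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent of the OLD value (the base)."""
    return (new - old) / old * 100.0

# Hypothetical example: a value rises from 2 to 4, then falls back from 4 to 2.
up = percent_change(2, 4)    # base is 2: the rise is +100 %
down = percent_change(4, 2)  # base is 4: the same absolute change is only -50 %

print(f"rise: {up:+.0f} %, fall: {down:+.0f} %")
```

A rise of 100 % followed by a fall of 50 % returns to the starting value, which is why a percentage quoted without its base says little.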

  8. Goals of statistics • (1) Descriptive statistics – to summarize data, to “condense” the information in many numbers into a smaller number of parameters or into a diagram

  9. Compare • The average number of points was 74.5, the minimum was 28 and the maximum was 100. • [Figure: frequency diagram; x-axis: number of points]

  10. The fewer parameters I use • the clearer and simpler the result is • but the greater the loss of information (from an average or a histogram I can never find out how many points František K. had, nor recover all the individual values) • the art is to find the point where the result is clear but still retains its predictive value

  11. The loss of information is what makes it possible to lie with statistics • “According to the statistics, we are all flying. Not high in the clouds, but near the ground, with the tips of our shoes just slightly touching the shit we are sitting in.”

  12. “The worse the patient is, the better the medicine works.”

  13. An argument for the harmfulness of fluoridation (data from US states) • [Figure annotation: Nicaragua should be here]

  14. “Storks bring babies”

  15. Distinguish between correlation and causation • The general scientific method

  16. The general scientific method, using the example of storks bringing babies: 1. Observation – finding a pattern

  17. 2. Interpretation – “Storks bring babies” • 3. Prediction – if we remove the storks, babies will not be born [or their number will decrease, if crows also do the job] • 4. Experiment: in half of the regions (randomly selected!) we shoot the storks and watch for changes in the birth rate (compared with the changes in the control regions) • 5. (After statistical analysis) we find that there are no changes, so we can declare that storks do not bring babies.

  18. Hypothetico-deductive approach (K. Popper) – a correct premise can yield only correct predictions, whereas an incorrect premise can yield both correct and incorrect predictions; therefore we can never prove a hypothesis, only reject it • [Diagram: Observation (“pattern”) → explanation → Hypothesis 1, Hypothesis 2, Hypothesis 3 (the hypotheses exclude each other) → Prediction 1, Prediction 2, Prediction 3 (the predictions differ from each other) → result of the experiment, compared with reality]

  19. Goals of statistics: population and sample • (2) Inferential statistics – making an inference about a (statistical) population from a sample • Some (statistical) populations are too large [or potentially infinite], so I cannot examine all their members • What can I say about the election results in the whole republic when I ask just 1000 people? • What can I say about the amount of Cd in the blood of wild geese in the Czech Republic when I took blood from just 10 specimens?

  20. Inferential statistics is common in biology • I do not want to draw conclusions about my 10 laboratory rats; on the basis of these 10 rats I want to say something about all experiments done in the same way • If this is to be science, the experiments have to be reproducible (cf. the Journal of Irreproducible Results)

  21. Types of (not only biological) data • Continuous and discrete data – the mathematical definition versus the reality of measurement: in practice we always measure data with limited precision

  22. Types of (not only biological) data • Ratio scale • Interval scale • Ordinal scale • Nominal scale (categorical data) • Circular scale [Figure: circle marked 0, 90, 180, 270]

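  23. Azimuth of the stem with lichen findings [degrees]: 5, 10, 5, 350, 350, 355 => arithmetic average ≈ 180 • Time of the doom-monger’s ululating: 22:00, 23:00, 24:00, 1:00, 1:00, 2:00 => arithmetic average is shortly after midday • In both cases the arithmetic average is misleading: all the values lie close to 0° (north) or close to midnight, but the average points in the opposite direction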
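One common way to average data on a circular scale, which the slides themselves do not spell out, is to convert each value to a point on the unit circle, average the sine and cosine components, and convert the resulting direction back. The sketch below applies this to the two data sets on the slide; the function name `circular_mean` is an illustrative assumption.

```python
import math

def circular_mean(values, period):
    """Mean direction of values on a circular scale.

    period is the length of one full turn (360 for degrees, 24 for hours).
    The result is undefined if the sine and cosine sums are both zero.
    """
    angles = [2 * math.pi * v / period for v in values]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    mean_angle = math.atan2(s, c)                        # radians, in (-pi, pi]
    return (mean_angle * period / (2 * math.pi)) % period

azimuths = [5, 10, 5, 350, 350, 355]   # degrees, from the slide
hours = [22, 23, 24, 1, 1, 2]          # clock times, from the slide

az_arith = sum(azimuths) / len(azimuths)   # about 179: points south
az_circ = circular_mean(azimuths, 360)     # just under 360: points north
h_arith = sum(hours) / len(hours)          # shortly after midday
h_circ = circular_mean(hours, 24)          # shortly after midnight

print(az_arith, az_circ, h_arith, h_circ)
```

The circular mean lands where the observations actually cluster, while the arithmetic mean lands on the opposite side of the circle.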

  24. Types of (not only biological) data • Ratio scale • Interval scale • Ordinal scale • Nominal scale (categorical data) • Circular scale [Figure: circle marked 0, 90, 180, 270]

  25. Population and random sample • Sampling; sampling design • Random sample – every individual must have the same probability of being chosen, independently of whether another individual was chosen • Tables and generators of (pseudo)random numbers
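As a small illustration of using a pseudorandom number generator for sampling, the sketch below draws a simple random sample without replacement with Python's standard `random` module. The population of 100 labelled individuals, the sample size of 10 and the seed are assumptions made only for this example.

```python
import random

rng = random.Random(42)             # fixed seed only so the example is repeatable

population = list(range(1, 101))    # hypothetical: individuals labelled 1..100
n = 10                              # sample size

# Simple random sample without replacement: every individual has the same
# chance of being included, and no individual is chosen twice.
sample = rng.sample(population, n)

print(sorted(sample))
```

In field work the hard part is usually not the generator but building the list of individuals (or plots) to which the random numbers are applied.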

  26. Population and random sample • An almost philosophical question – what is “random”? • And what is probability? • In statistics (that is, in this course) we will use so-called a priori probability (Bayesian, posterior probability also exists)

  27. Taking a random sample is usually not trivial – it is by no means a sample of “typical” individuals – it works reasonably well in agricultural experiments [Figure: numbered plots, 1 2 3 and 1 2 3 4 5 6]

  28. It is much more difficult in natural populations – even choosing the individual nearest to a random point does not work here

  29. Basic statistical characteristics • We usually denote the size of the population by N and the size of the sample by n • Characteristics of the population are usually denoted by Greek letters and characteristics of the sample by Latin letters • Characteristics of location: • Means, median and mode • Means are defined for quantitative data (i.e. on the ratio and interval scales)

  30. Arithmetic mean • of the population: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$ • of the sample: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

  31. Geometric mean • n-th root of the product of n values (for a sample here): $GM = \sqrt[n]{X_1 X_2 \cdots X_n}$

  32. Harmonic mean • Reciprocal of the mean of reciprocals: $HM = \frac{n}{\sum_{i=1}^{n} 1/X_i}$
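The three means can be compared on a tiny made-up sample with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The values 1, 2, 4 are an assumption chosen so that the results are easy to check by hand.

```python
import statistics

values = [1.0, 2.0, 4.0]   # hypothetical positive data

arith = statistics.mean(values)            # (1 + 2 + 4) / 3 = 7/3
geom = statistics.geometric_mean(values)   # cube root of 1 * 2 * 4 = 2
harm = statistics.harmonic_mean(values)    # 3 / (1/1 + 1/2 + 1/4) = 12/7

print(arith, geom, harm)
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.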

  33. Median [also used for ordinal-scale data] • Defined so that one half of the values lies below it and the other half above it (in infinite populations, the probability that a random value lies above the median is 0.5, and likewise for below). With an even number of values, the midpoint between the two middle values is taken as the median

  34. Upper and lower quartile • 1/4 of the observations lie above the upper quartile and 1/4 of the observations lie below the lower quartile (analogously for infinite populations)
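A short sketch of the median and the quartiles using the standard `statistics` module, on made-up data. Note that several conventions exist for interpolating quartiles in small samples; the call below uses the module's default ("exclusive") method, so other software may report slightly different values.

```python
import statistics

odd_data = [1, 2, 3, 4, 5, 6, 7]   # hypothetical sample with an odd number of values
even_data = [1, 2, 3, 4]           # hypothetical sample with an even number of values

median_odd = statistics.median(odd_data)     # the middle value: 4
median_even = statistics.median(even_data)   # midpoint of the two middle values: 2.5

# quantiles(..., n=4) returns the three cut points: lower quartile, median, upper quartile
q1, q2, q3 = statistics.quantiles(odd_data, n=4)

print(median_odd, median_even, q1, q2, q3)
```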

  35. Distinguish between the meaning of the mean and of the median • Example – wages in two companies
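The slide's own wage figures are not in the transcript, so the sketch below uses invented wages for two hypothetical companies to show the effect: both have the same mean wage, but in the second company one high earner pulls the mean far above what a typical employee gets.

```python
import statistics

# Invented wages (arbitrary units), NOT the data from the original slide.
company_a = [30, 31, 32, 33, 34]   # everybody earns about the same
company_b = [20, 20, 20, 20, 80]   # four low wages and one very high wage

mean_a = statistics.mean(company_a)
mean_b = statistics.mean(company_b)
median_a = statistics.median(company_a)
median_b = statistics.median(company_b)

print(f"A: mean {mean_a}, median {median_a}")
print(f"B: mean {mean_b}, median {median_b}")
```

Both means equal 32, but the median is 32 in company A and only 20 in company B, which describes the typical wage there much better.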

  36. Mode – the most common value in discrete data – in continuous data it is the “peak” in the frequency diagram – later we will define it as a local maximum of the probability density curve [there can be more than one]

  37. [Figure: four distributions, each with the positions of the mean and the median marked]

  38. Characteristics of variability • 1. Range – the difference between the maximum and the minimum • 2. Interquartile range • 3. Variance and standard deviation

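  39. Variance – the average squared deviation from the mean • population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i-\mu)^2$ • estimate based on the sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$ • n-1 = df = degrees of freedom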

  40. Standard deviation (s_x, often also “s.d.” or “S.D.”) is the square root of the variance
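To make the difference between dividing by N and dividing by n-1 concrete, the sketch below computes both versions by hand on made-up data and checks them against the standard `statistics` module (`pvariance` for a whole population, `variance` and `stdev` for a sample estimate). The data values are an assumption for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values; their mean is 5
n = len(data)
mean = sum(data) / n

sum_sq = sum((x - mean) ** 2 for x in data)   # sum of squared deviations = 32

pop_var = sum_sq / n             # treats the data as the whole population: 32 / 8
sample_var = sum_sq / (n - 1)    # estimate from a sample, df = n - 1: 32 / 7
sample_sd = math.sqrt(sample_var)

# The library functions implement the same two definitions.
lib_pop_var = statistics.pvariance(data)
lib_sample_var = statistics.variance(data)
lib_sample_sd = statistics.stdev(data)

print(pop_var, sample_var, sample_sd)
```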

  41. Compare the variability in weight of elephants and of ants • Use either the variance or standard deviation of log-transformed data, or the coefficient of variation CV (standard deviation divided by the mean) • Both make sense only for ratio-scale data
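A sketch with invented weights shows why the raw standard deviation cannot be compared across such different sizes, while the coefficient of variation (taken here as standard deviation divided by the mean) and the standard deviation of log-transformed data can. The elephant and ant weights below are made up so that both groups have the same relative spread.

```python
import math
import statistics

# Invented weights in kilograms; the ants are exactly the elephants scaled by 1e-9.
elephants = [4000.0, 5000.0, 6000.0]
ants = [4e-6, 5e-6, 6e-6]

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def sd_log(values):
    """Sample standard deviation of the natural logarithms of the values."""
    return statistics.stdev([math.log(v) for v in values])

sd_e, sd_a = statistics.stdev(elephants), statistics.stdev(ants)   # 1000 kg vs 0.000001 kg
cv_e, cv_a = cv(elephants), cv(ants)                                # both about 0.2
log_e, log_a = sd_log(elephants), sd_log(ants)                      # equal up to rounding

print(sd_e, sd_a, cv_e, cv_a, log_e, log_a)
```

The raw standard deviations differ by nine orders of magnitude, yet both scale-free measures report the same relative variability for the two groups.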

  42. Standard error of the mean • Characterizes the accuracy of the sample mean – how variable the means of many random samples of this size would be: $s_{\bar{X}} = s/\sqrt{n}$ (variability in the data divided by the square root of the sample size) • We can achieve higher accuracy with a larger sample.
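The idea of "variability of means from many random samples" can be checked by simulation. The sketch below assumes a normally distributed population with mean 50 and standard deviation 10 (arbitrary choices for illustration), draws 2000 samples of size 25, and compares the spread of their means with the usual formula, standard deviation divided by the square root of n.

```python
import math
import random
import statistics

rng = random.Random(0)       # fixed seed so the sketch is repeatable

MU, SIGMA = 50.0, 10.0       # hypothetical population mean and standard deviation
n = 25                       # sample size
replicates = 2000            # number of simulated samples

# Standard error estimated from ONE sample: s / sqrt(n)
one_sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
se_estimate = statistics.stdev(one_sample) / math.sqrt(n)

# Spread of the means of MANY samples of the same size
means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
    for _ in range(replicates)
]
sd_of_means = statistics.stdev(means)

theoretical_se = SIGMA / math.sqrt(n)   # 10 / 5 = 2.0

print(se_estimate, sd_of_means, theoretical_se)
```

Because the standard error shrinks with the square root of n, quadrupling the sample size only halves it.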

  43. Graphical summaries – frequency diagram

  44. Box-and-whisker plot • Note: nowadays box-and-whisker plots are also used to display the mean and standard deviation etc.
