Normality or not? Different distributions and their importance

Normality or not? Different distributions and their importance Stats Club 3 Marnie Brennan

References • Petrie and Sabin - Medical Statistics at a Glance: Chapter 7, 8, 9, 35 Good • Petrie and Watson - Statistics for Veterinary and Animal Science: Chapter 3 Good • Thrusfield – Veterinary Epidemiology: Chapter 12 • Kirkwood and Sterne – Essential Medical Statistics

What is a distribution? • Empirical frequency distribution versus theoretical distribution • Very easy! • Empirical frequency distribution is something that you actually measure and calculate • E.g. Coat colour in cats – Tabby, Ginger, Tortoiseshell, Seal-point • In a population, each one of these has a frequency e.g. 5 x Tabby, 9 x Ginger, 15 x Tortoiseshell, 8 x Seal-point

Theoretical distributions • Theoretical distribution – is just that – theoretical! • It is something we measure our data (empirical frequency distribution) against to see which distribution describes it the best • This helps to signpost us to what statistical analyses we do next, according to the distribution it ‘approximates’

Theoretical distributions and types of data • Back to our flow charts in the back of Petrie and Sabin, and Petrie and Watson • Relates to what type of variable you have • Continuous? E.g. Heights of Japanese men • Categorical or discrete? E.g. Coat colour in cats

Continuous distributions • Normal distribution – The grandaddy of them all! • Also known as the Gaussian distribution (after Gauss, German mathematician) • e.g. heights of adult men in the UK • T-distribution • Similar shape to Normal, but is more spread out with longer tails • Useful for calculating confidence intervals • Chi-squared distribution • Right-skewed distribution • Useful for analysing categorical data • F-distribution • Skewed to the right • Useful for comparing variances and more than 2 means (i.e. > 2 groups) • DO NOT BE SCARED – THIS IS ANOTHER EXERCISE IN TERMINOLOGY!! Our focus today

Discrete distributions • Binomial distribution • Could be skewed to the right or left (!) • Good for analysing proportion data – i.e. it is either one thing, or another, such as an animal either has a disease or does not have a disease • Poisson distribution • Right skewed • Good for analysing count data – i.e. the number of hospital admissions per day, the number of parasitic eggs per gram of faecal sample • Many of these distributions approximate normal when your sample size increases • A lot of this goes on behind your computer when doing statistics; it is here to help explain some of the terminology and basic ideas only (don’t worry too much about it!)

The useful bit....... • You have collected continuous data from your research e.g. length in millimetres of the diameter of rabbit skulls • You would like to find out if this is normally distributed or not (as you know that this will affect what statistical tests you do) • How do you measure whether this variable is normal or not?

4 steps to Normality! • Plot your data • Create a histogram with frequencies and determine by eye • Does it look bell-shaped and symmetrical? • Does it look unimodal i.e. does it only have one peak? • Subjective measurement, but you should be doing this anyway!

4 steps (continued) • How different are the mean and median? • Mean = Total of your data added up/total no. of measurements • Median = The midpoint of your values i.e. what is the ‘halfway’ value in your data? • If they are very different, the data is probably not normally distributed • If they are very similar, your data could be normally distributed • Another rule of thumb, so not always correct

4 steps (continued) • Skewness and kurtosis • Skewness (how symmetrical the data is) • Normal – this value is 0 • Right-skewed distribution – positive value • Left-skewed distribution – negative value • Kurtosis (the ‘peakedness’ of the data) – does your data have a pointy bit, or is it flat? • Normal – this value is 0 • Sharply peaked data – positive value • Flat peaked data – negative value • Can measure these in Minitab or SPSS

4 steps (continued) • Bespoke tests for normality • Shapiro-Wilk test (Ryan-Joiner test) • Kolmogorov-Smirnov test • Anderson-Darling test • Watch interpretation of p-values – if it is <0.05, it is not normal (reject null hypothesis of normality) • The good news! • Computers do this for us so we don’t have to!

Next month • Spread of your data – how do we measure this? • mean, standard deviation, variance • median, interquartile range • mode

Normality or not? Different distributions and their importance

Normality or not? Different distributions and their importance

Presentation Transcript

CHAPTER 3

Biostatistics

Rome: Importance

Chapter 6

Joint Probability Distributions

Chapter 9

Chapter 15 Corporate Nonliquidating Distributions

Strategy for Complete Discriminant Analysis

Elliptical Distributions

Inference for Distributions - for the Mean of a Population

The Hilbert Book Mode l

Chapter 2

Lecture Slides

The World Distribution of Income (from Log-Normal Country Distributions)

Buchwald-Hartwig Coupling : Discovery, Optimization, and Applications

Last Time

The importance of organic chemistry

More “normal” than Normal: Scaling distributions in complex systems

Lecture Slides

Chapter 2

Brief Review

Continuous Random Variables and Probability Distributions