Data Handling & Analysis BD7054 Normality. Andrew Jackson a.jackson@tcd.ie. Making assumptions. Each group is normally distributed. The residuals off the line are normally distributed. Distributions are where numbers come from.
Data Handling & Analysis BD7054 Normality
Andrew Jackson a.jackson@tcd.ie
Making assumptions
• Each group is normally distributed
• The residuals off the line are normally distributed
Distributions are where numbers come from
• The binomial distribution tells us how systems like a coin toss behave
• It tells us how many times an event is likely to occur in a given number of repeated attempts
• The event has a fixed probability of occurring on each attempt
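The coin-toss description above can be sketched in a few lines of Python. This is only an illustration: the function name `binomial_pmf` and the choice of 10 tosses of a fair coin are assumptions made for the example, not values from the slides.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events in n independent attempts,
    each with the same fixed probability p of the event occurring."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative values (assumed): 10 tosses of a fair coin.
n_tosses, p_heads = 10, 0.5
pmf = [binomial_pmf(k, n_tosses, p_heads) for k in range(n_tosses + 1)]

# Print the probability of each possible number of heads.
for k, prob in enumerate(pmf):
    print(f"{k:2d} heads: {prob:.4f}")
```

The probabilities sum to 1, and for a fair coin the most likely outcome is heads on half of the tosses.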
The normal distribution
• Normal or Gaussian distribution
• “The bell-shaped curve”
• Defined by a mean and a variance (or standard deviation)
• The PDF, or Probability Density Function, of the normal distribution is shown on the right
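For reference, the density of a normal distribution with mean μ and standard deviation σ is f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). A minimal Python sketch of that formula follows; the function name `normal_pdf` and the parameter values (mean 0, standard deviation 1) are assumptions chosen for illustration.

```python
from math import exp, pi, sqrt


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and
    standard deviation, evaluated at x."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (x - mean) / sd  # distance from the mean in standard deviations
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))


# Illustrative parameters (assumed): mean 0, standard deviation 1.
for x in (-2, -1, 0, 1, 2):
    print(f"x = {x:2d}  density = {normal_pdf(x, 0.0, 1.0):.4f}")
```

The curve is symmetric about the mean and peaks there, which gives the familiar bell shape.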
Origins of the Normal Distribution
• Assume that an individual’s weight or height (or whatever we are measuring) is affected by thousands of small +/- effects such as genes or environment
• Add those effects up for each individual, and lo and behold…
• The character will display a normal distribution
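The argument above can be tried out with a small simulation. The sketch below assumes each effect is simply +1 or -1 with equal probability, and uses 1000 effects and 1000 individuals; those numbers and the name `simulate_trait` are illustrative choices, not values from the slides. If the sums are roughly normal, about 68% of individuals should fall within one standard deviation of the mean and about 95% within two.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

N_EFFECTS = 1000      # assumed number of small effects per individual
N_INDIVIDUALS = 1000  # assumed number of individuals


def simulate_trait(n_effects: int) -> int:
    """Add up n_effects small effects, each equally likely to be +1 or -1."""
    return sum(random.choice((-1, 1)) for _ in range(n_effects))


traits = [simulate_trait(N_EFFECTS) for _ in range(N_INDIVIDUALS)]

m, s = mean(traits), stdev(traits)

# Share of individuals within one and two standard deviations of the mean.
within_1sd = sum(abs(t - m) <= s for t in traits) / len(traits)
within_2sd = sum(abs(t - m) <= 2 * s for t in traits) / len(traits)

print(f"mean = {m:.2f}, sd = {s:.2f}")
print(f"within 1 sd: {within_1sd:.3f}, within 2 sd: {within_2sd:.3f}")
```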
Return to our brain/body data
• We need to test whether each group is normally distributed
• This is equivalent to asking whether the residuals are normally distributed
• A residual is the difference between an observed value and its predicted value
• In this case, the predicted value is the mean of each group
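As a sketch of the definition above, the residuals for grouped data can be computed by subtracting each group's mean from its observations. The two groups and their measurements below are made up for illustration and are not the brain/body data; the name `group_residuals` is likewise an assumption.

```python
from statistics import mean

# Made-up measurements for two groups (NOT the brain/body data).
groups = {
    "A": [4.1, 5.0, 5.6, 4.7],
    "B": [7.9, 8.4, 9.1, 8.2],
}


def group_residuals(data):
    """Return, for each group, observed value minus the group mean."""
    residuals = {}
    for name, values in data.items():
        predicted = mean(values)  # the predicted value is the group mean
        residuals[name] = [v - predicted for v in values]
    return residuals


residuals = group_residuals(groups)

# Pool the residuals from all groups so they can be examined together.
pooled = [r for group in residuals.values() for r in group]
print([round(r, 2) for r in pooled])
```

Within each group the residuals sum to zero by construction, so the pooled residuals are centred on zero and only their shape remains to be checked.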
Exploring residuals from boxplots
• A simple histogram
• A Q-Q (quantile-quantile) plot
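A Q-Q plot pairs each sorted residual with the value expected at the same rank under a normal distribution; points lying close to a straight line suggest approximate normality. The sketch below only computes the pairs and leaves the plotting out. The plotting position (i - 0.5) / n is one common convention among several, and the residual values and the name `qq_points` are assumptions made for the example.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot.

    Observed values are standardised by the sample mean and standard
    deviation, so a normal sample should fall near the line y = x.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    m, s = mean(ordered), stdev(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position (one common convention)
        theoretical = standard_normal.inv_cdf(p)
        points.append((theoretical, (value - m) / s))
    return points


# Made-up residuals for illustration only.
example_residuals = [-0.75, 0.15, 0.75, -0.15, -0.5, 0.0, 0.7, -0.2]

points = qq_points(example_residuals)
for theoretical, observed in points:
    print(f"theoretical {theoretical:6.2f}   observed {observed:6.2f}")
```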
Return to our scatter plot
• We need to test whether our residuals off the line are normally distributed
• We also need to check that there is no trend in the size or spread of the residuals along the line
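For scatter-plot data, the residuals are the vertical distances between each point and the fitted line. The sketch below assumes the line is fitted by ordinary least squares, which the slides do not state explicitly, and uses made-up data. The comparison of residual sizes in the lower and upper halves of x is only a crude stand-in for inspecting a plot of residuals along the line; the function names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def line_residuals(x, y):
    """Observed y minus the value predicted by the fitted line."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up data, already sorted by x (for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

res = line_residuals(x, y)

# Crude check for a trend in residual size along the line:
# compare the mean absolute residual in the lower and upper halves of x.
half = len(x) // 2
spread_low = mean([abs(r) for r in res[:half]])
spread_high = mean([abs(r) for r in res[half:]])
print(f"mean |residual|, low x: {spread_low:.3f}, high x: {spread_high:.3f}")
```

Markedly different values in the two halves would hint that the residuals fan out or narrow along the line.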
Exploring residuals from scatter plot
• Histogram of residuals
• Q-Q plot of residuals
What to do if residuals are not normal?
• Transforming the data is often the solution
• Taking the log of the response variable (y) is the first port of call
• For scatter plot type data, we can also take the log of the explanatory (x) variable
• We will do this next time we meet
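As a small preview of why a log transform often helps, the sketch below generates a made-up, right-skewed response variable and compares its skewness before and after taking logs. The data are simulated from a log-normal distribution purely for illustration, so the transform works perfectly here; real data may behave less neatly. Note that the log is only defined for positive values. The helper name `skewness` is an assumption.

```python
import math
import random
from statistics import mean, pstdev

random.seed(2)  # fixed seed so the sketch is repeatable


def skewness(values):
    """Sample skewness: the mean cubed standardised deviation.
    Values near 0 indicate a roughly symmetric distribution."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


# Made-up right-skewed response variable (all values are positive).
y = [random.lognormvariate(0.0, 1.0) for _ in range(500)]
log_y = [math.log(v) for v in y]

skew_raw = skewness(y)
skew_log = skewness(log_y)
print(f"skewness of y:      {skew_raw:.2f}")
print(f"skewness of log(y): {skew_log:.2f}")
```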