Non-parametric data analysis
Dr Peter Baghurst
Public Health Research Unit
Recall the logic of hypothesis testing
• Assume that the treatment is ineffective (the so-called Null Hypothesis)
• The assumption of a Null Hypothesis allows us to calculate probabilities of outcomes of the experiment
• If the actual result is highly improbable under this Null Hypothesis, we reject the assumption that the treatment was ineffective
A closer look at that second step
• The assumption of a Null Hypothesis allows us to calculate probabilities of outcomes of the experiment
In order to calculate the probabilities of those outcomes, we need to know the probability distribution for our observations.
Some distributions
[Figure panels: a Likert scale; Exponential; Normal (Gaussian); Gamma]
So in the ideal world of the two-sample hypothesis test…
• We collect data
• We examine the distribution of our observations
• We choose a test statistic appropriate for that specific distribution
• We calculate the test statistic and refer it to its distribution under the hypothesis that there is no difference between the two samples
• If there is very little “likelihood” of that value of the test statistic arising under the hypothesis of “no difference”, we conclude that the hypothesis is untenable
Problem…
• If we only have 20 to 50 observations, that’s not many to check a distribution!
• If it’s obvious that our data are not distributed according to a ‘text-book’ distribution, what do we do?
Solutions
• If we have insufficient points, we can often rely on bigger, published studies which measured the same variable and examined its distribution
• We can simply ignore the problem (very popular!) and push the data through a ‘black-box’ which assumes the observations ARE Normally distributed (the blind faith approach)
• Resort to a “non-parametric” approach
• ‘Transform’ the measurements so that the transformed data follow a known distribution (eg sometimes log(Y) might be adequately described by a Normal distribution, even when Y itself clearly was not), a topic discussed last time
The fussing babies problem A new treatment has been developed to reduce the level of ‘fussing’ in babies following a feed. In a simple first evaluation, the time each baby ‘fusses’ before and after application of the treatment is recorded. The obvious question is whether the babies fuss for a shorter time after the intervention than before.
The parametric approach
If we could be confident about the form of the distribution of ‘fussing’ times, we might average the time of fussing before the intervention, and compare this with the average fussing time after the intervention. IF the fussing times were, say, Normally distributed, this mean difference (appropriately scaled as a t-statistic) would follow a t-distribution with an expected value of zero if the intervention was totally ineffective. If our t-statistic was a “long way” from zero, we might conclude that the assumption that the treatment was ineffective was unreasonable.
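As a rough illustration of the scaling step described above, the sketch below computes a paired t-statistic from before-and-after times. The ten pairs of fussing times are hypothetical values invented for illustration (they are not data from the study), and the function name `paired_t_statistic` is likewise an assumption.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical fussing times in minutes (illustrative only, not study data).
before = [45, 50, 38, 60, 155, 42, 55, 48, 52, 40]
after = [30, 42, 35, 48, 90, 45, 40, 36, 47, 33]

def paired_t_statistic(before, after):
    """Mean of the paired differences divided by its standard error."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    std_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean_diff, mean_diff / std_error

mean_diff, t_stat = paired_t_statistic(before, after)
print(f"mean difference = {mean_diff:.1f} minutes, t = {t_stat:.2f} on {len(before) - 1} df")
```

Note how the single large difference from the 155-minute baby inflates the standard deviation of the differences, which is one reason to be wary of this approach when the Normal assumption is in doubt.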
Back to the fussing babies problem Suppose we had doubts about the distribution of fussing times (say most babies fussed for 30 to 60 minutes, but one fussed for 155 minutes – and we had no reason to think there was some other organic cause for the fussing). We might choose instead just to summarise the fussing times and classify each baby simply according to whether its fussing increased or decreased.
If the treatment is effective we expect more babies to respond than not. If it is ineffective we might expect half to respond and half not to respond (ie a response probability of 0.5).
[Figure: distribution of the number of responders when the treatment is ineffective. Annotations: a probability of 0.055 of observing 2 or fewer; a probability of 0.055 of observing 8 or more.]
In general…
Instead of analysing the actual ‘before’ and ‘after’ values, we simply look at whether the measurement increased after the intervention (+) or decreased (-). The principle is the same: if the treatment is ineffective, we would expect approximately equal numbers of (+)s and (-)s, ie prob(+) = prob(-) = ½. Look for this one in textbooks and computer packages as the ‘sign test’.
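The sign test can be sketched in a few lines using only binomial probabilities. The tail probabilities of 0.055 quoted earlier are consistent with 10 babies and a response probability of 0.5, so n = 10 is assumed here; the slides do not state the sample size explicitly. The before-and-after times are hypothetical, and the function names are illustrative.

```python
from math import comb

def binomial_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Assumed n = 10 (inferred from the quoted probability of 0.055).
lower_tail = binomial_cdf(2, 10)        # P(2 or fewer responders)
upper_tail = 1 - binomial_cdf(7, 10)    # P(8 or more responders)
print(f"P(X <= 2) = {lower_tail:.3f}, P(X >= 8) = {upper_tail:.3f}")

def sign_test(before, after):
    """Two-sided sign test on paired data; pairs with no change are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    n_plus = sum(d > 0 for d in diffs)
    n_minus = n - n_plus
    p_value = min(1.0, 2 * binomial_cdf(min(n_plus, n_minus), n))
    return n_plus, n_minus, p_value

# Hypothetical fussing times in minutes (illustrative only, not study data).
before = [45, 50, 38, 60, 155, 42, 55, 48, 52, 40]
after = [30, 42, 35, 48, 90, 45, 40, 36, 47, 33]
n_plus, n_minus, p_value = sign_test(before, after)
print(f"(+) = {n_plus}, (-) = {n_minus}, two-sided p = {p_value:.3f}")
```

Only the direction of each change enters the calculation, so the baby who fussed for 155 minutes counts no more than any other.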
Fussing babies – a slightly different design
Suppose that instead of a “before-and-after” design we ran a small randomised clinical trial. As before, we measure time spent fussing following a feed, and we are concerned about the distribution of our measurements.
[Diagram: eligible babies are randomised to either the intervention or a placebo control.]
Using ranks instead of actual values
[Figure: the observations plotted on their actual scale, from 33 (lowest) to 155 (highest), and again as equally spaced ranks 1 to 12. Values as extracted: 33 41 43 47 52 60 66 73 82 90 155]
Fussing babies – analysis of ranks
We proceed by ‘pooling’ all the measurements from the control and intervention arms and assigning them a ‘rank’ (from smallest to largest).
actual values:  12  13  17  35  36  42  53  60  93  94  147  155
pooled ranks:    1   2   3   4   5   6   7   8   9  10   11   12
Now restore group identity…
Situation 1: Observations from the two treatment groups appear to be in random order when they are ordered or ‘ranked’.
Situation 2: Observations from one treatment group appear to be generally toward the top end of the ordered set of observations from both groups.
Building a test…
If the treatment is truly ineffective, the ranks of the treatment group will simply be a random sample of the numbers 1, 2, 3, …, 2n (where n is the number of subjects in each group). A Wilcoxon rank sum test looks at the average or ‘mean’ rank for each group and determines the probability of their departure from the expected mean value of the ranks 1, …, 2n, viz (2n+1)/2.
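The idea can be sketched as an exact calculation: under the null hypothesis every choice of n ranks out of 2n is equally likely for the treatment group, so we can enumerate them all. The pooled values below are the twelve shown on the earlier slide, but the split into intervention and control groups is hypothetical, since group membership is not recoverable from the slides. The function name is illustrative, and the sketch assumes no tied values.

```python
from itertools import combinations

def rank_sum_exact(group_a, group_b):
    """Exact two-sided rank sum test by enumeration (assumes no ties)."""
    pooled = sorted(group_a + group_b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_a, total_n = len(group_a), len(pooled)
    observed_sum = sum(rank[v] for v in group_a)
    expected_sum = n_a * (total_n + 1) / 2  # n_a times the mean rank (N+1)/2
    deviation = abs(observed_sum - expected_sum)
    # Count the rank subsets at least as far from expectation as the one observed.
    n_subsets = n_extreme = 0
    for subset in combinations(range(1, total_n + 1), n_a):
        n_subsets += 1
        if abs(sum(subset) - expected_sum) >= deviation:
            n_extreme += 1
    return observed_sum / n_a, expected_sum / n_a, n_extreme / n_subsets

# Pooled values from the slide; the group assignment is hypothetical.
intervention = [12, 13, 17, 35, 42, 60]
control = [36, 53, 93, 94, 147, 155]
mean_rank, expected_mean_rank, p_value = rank_sum_exact(intervention, control)
print(f"mean rank = {mean_rank}, expected = {expected_mean_rank}, p = {p_value:.4f}")
```

Enumeration is only practical for small groups (here 924 subsets); statistical packages typically switch to a Normal approximation for larger samples.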
When you wish to compare more than two groups
A generalisation which works on the same principle can be found under the term Kruskal-Wallis test. The principle is the same: if we have 3 groups with n subjects in each group, the average rank in each group should be (3n+1)/2. If the average rank in any group is ‘very different’ from this expected value, we reject the null hypothesis.
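A minimal sketch of this comparison follows. It computes the mean rank in each group, the expected mean rank (N+1)/2, and the usual Kruskal-Wallis statistic H, which summarises how far the group mean ranks sit from that expectation. The three groups reuse the twelve pooled values from the earlier slide with a hypothetical group assignment, the function name is illustrative, and ties are assumed absent.

```python
def kruskal_wallis_h(groups):
    """Mean rank per group and the Kruskal-Wallis H statistic (assumes no ties)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    total_n = len(pooled)
    expected = (total_n + 1) / 2  # equals (3n+1)/2 for 3 groups of size n
    mean_ranks = [sum(rank[v] for v in g) / len(g) for g in groups]
    h = 12 / (total_n * (total_n + 1)) * sum(
        len(g) * (m - expected) ** 2 for g, m in zip(groups, mean_ranks)
    )
    return mean_ranks, expected, h

# Hypothetical assignment of the pooled values to three groups of four.
groups = [[12, 17, 35, 13], [36, 42, 53, 60], [93, 94, 147, 155]]
mean_ranks, expected, h = kruskal_wallis_h(groups)
print(f"mean ranks = {mean_ranks}, expected = {expected}, H = {h:.3f}")
```

A package would then refer H to its null distribution to obtain a p-value; that step is omitted here.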
The Pearson correlation coefficient
• An extremely popular measure of the degree of association between two measured variables (±1 indicates perfect association; 0, no association)
• It assumes that the two variables are both Normally distributed
• In small samples one very large observation can give misleading estimates (ie it is not robust)
Interpreting a Pearson correlation
Both sets have an estimated r close to 0.25. One slope is 4 times the other. The size of the correlation coefficient depends on how well the points fit the line AND the slope.
Pearson correlation coefficient is a measure of linear association
r = 0.001, but clearly it would be inappropriate to conclude there is no relationship between these variables!
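A small constructed example makes the same point. The data below are not the points from the slide's plot; they are an assumed illustration in which y is completely determined by x (y = x squared), yet the Pearson coefficient is zero because the relationship has no linear component.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient computed from first principles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Illustrative data: a perfect, but non-linear, relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v**2 for v in x]
r = pearson(x, y)
print(f"r = {r:.3f}")
```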
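Other measures of association
• Spearman’s rank correlation: it simply applies the Pearson correlation coefficient to the ranks rather than the raw observations
• Kendall’s S: compares every pair of observations, counting the pairs that both variables rank in the same order minus the pairs they rank in opposite order; it records perfect agreement when every pair is ordered the same way by both variables (scaled to lie between -1 and +1, it is usually reported as Kendall’s tau)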
Pearson and a rank correlation coefficient
[Scatterplot of variable 2 against variable 1.]
Points clearly do not lie perfectly on a straight line, but every time variable 1 increases, variable 2 increases. Pearson r is less than 1. Spearman’s r and Kendall’s S are both 1.
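The contrast can be checked numerically. The sketch below uses assumed data (variable 2 is the cube of variable 1, so it always increases with variable 1 but not along a straight line) rather than the points from the plot. Kendall’s statistic is computed in its scaled form, the count of same-order pairs minus opposite-order pairs divided by the number of pairs, and ties are assumed absent. All function names are illustrative.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient computed from first principles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_scaled(x, y):
    """(Same-order pairs minus opposite-order pairs) / number of pairs."""
    score = n_pairs = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
        n_pairs += 1
    return score / n_pairs

# Illustrative monotone but non-linear data.
variable_1 = [1, 2, 3, 4, 5, 6, 7, 8]
variable_2 = [v**3 for v in variable_1]
r_pearson = pearson(variable_1, variable_2)
r_spearman = spearman(variable_1, variable_2)
r_kendall = kendall_scaled(variable_1, variable_2)
print(f"Pearson = {r_pearson:.3f}, Spearman = {r_spearman:.3f}, Kendall = {r_kendall:.3f}")
```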
Ties
It is common for two or more observations to have the same value. This leads to ‘tied ranks’. Most computer implementations of the tests mentioned take these into account, but this will not be discussed today.