Learn how the re-randomisation test works through a classroom experiment, statistical analysis, and computer simulations. Understand how to decide whether a difference between two groups is significant or could be due to chance alone.
THE RE-RANDOMISATION TEST
To illustrate the re-randomisation procedure we will use an experiment done by a 13MT3 class in a previous year. Students were interested to see if 15 mL of sugar (from a medicine measuring cup) would reduce reaction time. The class was randomly placed into either the treatment (with sugar) group or the control (without sugar) group. This was done to make the units “as similar as we could have them”. Consideration was also given to balancing the units' behaviours on the day of the test, and measurement procedures were streamlined. Once again, we wanted the units “as similar as we could have them”. Every effort was made in the planning phase of the experiment to ensure that the only difference between the two groups was the treatment.
This was the output generated. At the top are dot plots, one for the treatment group and one for the control group. The “with sugar” dot plot is in blue and represents the students’ reaction distances in cm. (Reaction distance was measured by holding a ruler so that its zero mark lined up with the student’s fingers and then dropping it. The student closes their fingers as soon as possible, and the reaction distance is the number of cm the ruler falls before it is caught.) The reaction distances of the control (without sugar) group are in red. As you can see, more students were placed in this group in the real-life experiment, which is fine. If the median is calculated for each group, the difference between these medians in our real-life experiment is found to be 4.65 cm.
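The median comparison described above can be sketched in a few lines of Python. The reaction distances below are invented purely for illustration (they are not the class's data and will not give 4.65 cm), and taking the difference as control median minus treatment median is an assumption about the direction used.

```python
from statistics import median

# Hypothetical reaction distances in cm (invented for illustration,
# NOT the class's real measurements).
with_sugar = [12.0, 15.5, 18.0, 20.5, 22.0, 25.0]
without_sugar = [14.0, 17.5, 19.0, 21.0, 23.5, 24.0, 26.5, 28.0]

def median_difference(treatment, control):
    """Control median minus treatment median, in cm (assumed direction)."""
    return median(control) - median(treatment)

observed = median_difference(with_sugar, without_sugar)
print(f"Difference in medians: {observed:.2f} cm")
```

The two groups do not need to be the same size, because each median is calculated separately before the subtraction.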
Statisticians ask themselves: So what? Is that (4.65 cm) a lot or a little? Is it big enough to say the difference is statistically significant, or is it simply due to chance? After all, we knew right from the start that we wouldn’t get exactly the same median in both groups. There was always going to be a difference, but is 4.65 cm a ‘significant’ difference? We can’t just say that 4.65 “sounds” like a lot or a little! In light of the variation involved, how do we work out whether our difference is:
significant (4.65 is a lot, that’s a ‘big’ difference, too big to be there by luck)
OR
insignificant (4.65 is only a small difference, one I would expect whether the treatment worked or not)?
To establish causation in our experiment, we use the re-randomisation test. We start with a bizarre hypothesis: that the treatment had no effect. As far as the measurements of my response variable go, it doesn’t make any difference whether I administer the treatment or not. So, since the treatment had no effect on reaction time (or so it is hypothesised), both groups are the same. Remember, my treatment group and control group have the same characteristics, and I’ve taken the same measurement from them, keeping all procedures in this process as similar as possible. The whole aim of my experimental design is to keep the groups similar in every way except the treatment. That is the only difference between the groups doing my experiment.
Now that EVEN the treatment is assumed to make no difference, it can be said that ALL the reaction distances found in the whole experiment were found under exactly the same conditions. Okay, well, then it should be fine for me to mix them all up. I can put a few ‘with sugar’ reaction distances in the ‘without sugar’ group AND VICE VERSA!
Here the two groups are re-randomised (completely jumbled up). You can see it happening on the right. Some measurements remain in the same group; some have been re-allocated into the opposite group.
The resulting visual you see is a ‘simulated experiment’. Simulating is the process of mimicking a real-life situation. (It never really happened in real life; the computer is just simulating it happening.) Right now, the real-life situation we are mimicking is the one in which the treatment had NO EFFECT. The simulation assumes that the measurements taken from the treatment group were taken under the same conditions as the measurements taken from the control group. In the simulated experiment you see, IT IS IMPOSSIBLE FOR THE TREATMENT TO HAVE HAD AN EFFECT. There is a difference in medians, but that difference exists IN A WORLD WHERE THE TREATMENT HAD NO EFFECT.
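A single simulated experiment of this kind might look like the sketch below. The data are hypothetical, the function name `rerandomise_once` is made up for this example, and keeping the original group sizes when re-dealing the measurements is an assumption about how the software does the jumbling.

```python
import random
from statistics import median

# Hypothetical reaction distances in cm (invented for illustration).
with_sugar = [12.0, 15.5, 18.0, 20.5, 22.0, 25.0]
without_sugar = [14.0, 17.5, 19.0, 21.0, 23.5, 24.0, 26.5, 28.0]

def rerandomise_once(treatment, control, rng):
    """Pool both groups, shuffle, and deal out new groups of the original sizes."""
    pooled = list(treatment) + list(control)
    rng.shuffle(pooled)
    return pooled[:len(treatment)], pooled[len(treatment):]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
sim_sugar, sim_no_sugar = rerandomise_once(with_sugar, without_sugar, rng)

# Any difference here arose in a "world" where the group labels mean nothing.
sim_difference = median(sim_no_sugar) - median(sim_sugar)
print(f"Simulated difference in medians: {sim_difference:.2f} cm")
```

Because the labels are handed out by shuffling alone, whatever difference is printed cannot have been caused by the sugar.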
Now we are repeating the process 20 times. (Doing 20 experiments, phew, thank god it’s only a simulation!) Sometimes we get differences less than 4.65 cm. Whenever this happens, our simulation (which assumes that the treatment had no effect) is failing to achieve a difference as big as the one in the real-life experiment. The statistician is encouraged by this. When she simulates the treatment having no effect, she can get a difference, but not one as big as the one she found in real life. This would give her hope that her real-life finding is ‘big’, too big to exist under the assumption that the treatment had no effect!
But sometimes she gets a number bigger than 4.65. You can see that out of the 20, one of the differences was 5 cm. This makes her stop and think. If the simulation can produce a number as big as 5, perhaps 4.65 isn’t big? Perhaps the treatment had no effect. After all, we can get a number greater than 4.65 in our simulation, and our simulation REPLICATES EXACTLY THAT SITUATION (THE TREATMENT HAVING NO EFFECT).
Well, we could do it twenty more times. Does it seem to be a one-off or fairly common?
Well, statisticians repeat these ‘simulated experiments’ 1000 times. So, we’ve just done the experiment 1000 times, and in each of those 1000 the treatment had no effect (we simulated it, anyway). How many times did we get a difference at least as big as the one we found in our real-life experiment? Well, you can see that a difference in reaction distances of 4.65 cm (or higher) is quite easy to get. Under the condition that ‘the treatment made no difference’, it happened 182 times out of 1000 re-randomisations (18.2% of the time). So the chance of a simulated experiment (re-randomisation) being as good as my real-life experiment (having a difference as big) or better is 18.2%. This is quite high, so there is a good chance that the 4.65 cm difference we found in our actual real-life experiment was simply due to dumb old luck (statisticians say: “due to chance alone”).
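The 1000 re-randomisations and the resulting tail proportion could be sketched as follows. This is a minimal illustration under stated assumptions: the data are hypothetical (so the printed proportion will not be 18.2%), the function name `tail_proportion` is invented here, and the tail counts differences "as big or bigger" in one direction only (control median minus treatment median).

```python
import random
from statistics import median

def tail_proportion(treatment, control, n_rerandomisations=1000, seed=0):
    """Share of re-randomisations whose difference in medians is at least
    as big as the observed one (one-sided, control minus treatment)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = median(control) - median(treatment)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    count = 0
    for _ in range(n_rerandomisations):
        rng.shuffle(pooled)  # jumble the labels: "treatment had no effect"
        diff = median(pooled[n_treat:]) - median(pooled[:n_treat])
        if diff >= observed:
            count += 1
    return count / n_rerandomisations

# Hypothetical reaction distances in cm (invented for illustration).
with_sugar = [12.0, 15.5, 18.0, 20.5, 22.0, 25.0]
without_sugar = [14.0, 17.5, 19.0, 21.0, 23.5, 24.0, 26.5, 28.0]

p = tail_proportion(with_sugar, without_sugar)
print(f"Tail proportion: {p:.1%}")
```

A different seed gives a slightly different proportion, which is why results are quoted as "about" so many times out of 1000.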
If the tail proportion is over 10%, we cannot rule out chance alone as the explanation for the apparent difference we found in our experiment.
MODEL EXAMPLE: “The re-randomisation distribution shows that chance alone will create a difference in mean scores of 1.98 seconds or more 115 times out of 1000 re-randomisations. Therefore, it could easily be that chance alone is causing the difference. Hence I cannot make the claim that consuming 200 mL of an energy drink causes a faster reaction time in Y13 students.”
If something has less than a 10% chance of happening, we would term that event unlikely. If the tail proportion is under 10%, it is unlikely that the difference is due to chance alone. Chance will cause a difference, but a difference this big is unlikely by chance alone. Something else must have contributed to this difference, and that something else must be our treatment (as we have made sure that all other variables have been kept “as similar as we could have them”).
MODEL EXAMPLE: “The re-randomisation distribution shows that chance alone will create a difference in goals of 1.12 or more only 34 times out of 1000 re-randomisations. Therefore we can be fairly sure that the difference has not been created by chance alone, but something else has contributed, namely, my treatment. Hence I can make the claim that playing a game of basketball causes a player to hit more free throws.”
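The 10% rule of thumb can be written as a small check and applied to the three counts quoted in these slides (182, 115 and 34 out of 1000). The function name `can_claim_causation` is invented for this sketch, and treating a tail proportion of exactly 10% as "not enough" is an assumption, since the slides only cover "over" and "under".

```python
def can_claim_causation(count, n_rerandomisations=1000, threshold=0.10):
    """True when the tail proportion is under the threshold, i.e. a
    difference this big is unlikely to come from chance alone.
    Exactly 10% is treated as 'not enough' (an assumption)."""
    return count / n_rerandomisations < threshold

# Counts out of 1000 re-randomisations quoted in the slides.
examples = [("sugar", 182), ("energy drink", 115), ("basketball", 34)]

for label, count in examples:
    verdict = "claim causation" if can_claim_causation(count) else "could be chance alone"
    print(f"{label}: {count}/1000 = {count / 1000:.1%} -> {verdict}")
```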