Sample size issues & Trial Quality

Sample size issues & Trial Quality David Torgerson

Chance • When we do a trial we want to be sure that any effect we see is not simply due to chance. • The probability of a given difference occurring by chance declines as the sample size increases. • Also, as the sample size increases, smaller differences are more likely to be statistically significant.

Statistical significance • A p value of 0.05 means that, if there were truly no effect and we repeated the same trial 100 times, we would expect to observe a difference at least as large as the one seen about 5 times by chance alone. • A small p value says nothing about the strength of the association: a small difference will have a small p value if the sample size is large.

Power • As well as significance there is the issue of power. • A sample size might have 80% power to detect a specified difference at the 5% significance level. In other words, for a given sample size we would have an 80% probability of observing the effect, if it exists, at 5% significance.

Sample Size • Don’t believe statistics textbooks. A trial can NEVER be too big. Most trials in education use tiny sample sizes (e.g., 30 participants). • A small trial will miss an important difference. For example, if a school-based intervention increased exam pass rates by 10% this would have very important benefits to society. BUT we would need at least 800 participants in an individually randomised trial to observe this effect.
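As a rough check on the order of magnitude quoted above, the sketch below uses the standard normal-approximation formula for comparing two proportions. The slide does not give a baseline pass rate, so the 50% to 60% scenario is purely an assumption for illustration, as are the function and parameter names.

```python
import math
from statistics import NormalDist


def n_per_group(p_control, p_intervention, power=0.80, alpha=0.05):
    """Approximate participants per arm to compare two proportions.

    Normal-approximation formula, two-sided test, equal allocation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    difference = p_control - p_intervention
    return math.ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# ASSUMPTION: baseline pass rate of 50%, rising to 60% with the intervention.
total_participants = 2 * n_per_group(0.50, 0.60)
print(total_participants)
```

Under these assumed pass rates the total comes out in the high hundreds, the same order as the figure on the slide; a different baseline would shift the answer somewhat.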

Sample Size • Small trials will miss important differences. • Bigger is better in trials. • Why was the number chosen? For example “given an incidence of 10% we wanted to have 80% power to show a halving to 5%” or “we enrolled 100 participants”.

What is a reasonable difference? • In a 1993 review of all quasi-experiments in the social sciences, Lipsey and Wilson found that few effective interventions had effect sizes greater than 0.5. Health care produces similar gains of 0.5 of a standard deviation or lower. Lipsey & Wilson, 1993. American Psychologist 48, 1181-1209.

Effect size & Sample size • We should, therefore, plan trials that are large enough to identify a difference of 0.5 of a standard deviation between the two experimental groups if it exists.

Who needs a statistician? • A simple way to calculate the total sample size is to take 32 (for 80% power; 42 for 90%) and divide this by the square of the effect size. • For an effect size of 0.5: 0.5 squared is 0.25, and 32/0.25 = 128. • For an effect size of 0.25: 0.25 squared is 0.0625, and 32/0.0625 = 512. Note that halving the effect size quadruples the sample size.
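The rule of thumb above is easy to put into a few lines of code. This is a minimal sketch: the function name is made up for illustration, and the constants 32 and 42 are simply those given on the slide (total participants across two arms, 5% significance).

```python
import math

# Constants from the slide: 32 for 80% power, 42 for 90% power.
RULE_OF_THUMB_CONSTANTS = {0.80: 32, 0.90: 42}


def total_sample_size(effect_size, power=0.80):
    """Total participants for a two-arm trial: constant / effect_size squared."""
    constant = RULE_OF_THUMB_CONSTANTS[power]
    return math.ceil(constant / effect_size ** 2)


for d in (0.5, 0.25):
    print(d, total_sample_size(d), total_sample_size(d, power=0.90))
# 0.5 128 168
# 0.25 512 672
```

Because the effect size is squared in the denominator, each halving of the effect size multiplies the required sample by four.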

Cluster sample size • Because in education we will often randomise by classes or schools, we need to take the correlation between pupils in the same class or school into account in sample size calculations. This can often lead to a doubling or more of the number of participants needed.
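One common way to make this adjustment is the design effect, 1 + (m - 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation. The slide does not say which method or values were intended, so the class size of 25 and ICC of 0.05 below are hypothetical numbers chosen only to illustrate the arithmetic.

```python
import math


def design_effect(cluster_size, icc):
    """Inflation factor for cluster randomisation: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(individual_n, cluster_size, icc):
    """Inflate an individually randomised sample size by the design effect."""
    return math.ceil(individual_n * design_effect(cluster_size, icc))


# ASSUMPTIONS: 128 participants needed under individual randomisation,
# classes of 25 pupils, and an intracluster correlation of 0.05.
print(design_effect(25, 0.05))               # roughly 2.2
print(clustered_sample_size(128, 25, 0.05))  # 282
```

With these illustrative values the required sample more than doubles, which is the scale of inflation the slide warns about.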

Reporting Quality of Trials • As well as having an adequate sample size there are other important aspects of trial quality.

Important quality items • Allocation method: • method of randomisation; • secure randomisation. • Intention to treat analysis. • Blinding. • Attrition.

Blinding • Who knew who got what, and when? • Was the participant blind? • Most IMPORTANT: was outcome assessment blind?

Attrition • What was the final number of participants compared with the number randomised? • What happened to those lost along the way? • Was there equal attrition?

External Validity • Once a trial has high internal validity our next task is to assess whether its results are applicable outside its sample. • Are participants similar to the general population to whom we would apply the intervention? • Was the intervention used generalisable?

Methods comparison of trials • We undertook a methodological review of RCTs in health and education to answer the following questions: • Were bad trials prevalent only in health care? • Was methodological quality improving over time? Torgerson et al. BERJ, 2004: Accepted.

Study Characteristics

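Change in concealed allocation (figure not recovered; reported p values: P = 0.04, P = 0.70) • NB: No education trial used concealed allocation.

Blinded Follow-up (figure not recovered; reported p values: P = 0.03, P = 0.54, P = 0.13)

Underpowered (figure not recovered; reported p values: P = 0.22, P = 0.01, P = 0.76)

Mean Change in Items (figure not recovered; reported p values: P = 0.03, P = 0.001, P = 0.07)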

Quality conclusions • Apart from ‘drug’ trials the quality of health care trials is poor and not improving outside of major journals. • Education trials are bad and getting worse!

Trial Examples • CBT treatment vs Fire Safety Education (FSE) for child arsonists. • N = 38 boys randomised to receive FSE or CBT. • Outcomes measured at 13 weeks and 12 months included firesetting and match play behaviour. Kolko J Child Psychiatr 2001:42:359.

Results • Outcomes were mixed: some outcomes were favourable for CBT, whilst for others there was no difference.

Problems with Trial • Too SMALL. • Trial could have missed very important differences. • Outcomes were NOT arson; there were no reports of arson by any children in the study. • Unclear whether randomisation was concealed.

Domestic violence experiment • 404 men convicted of partner abuse were randomised to probation or counselling. • Data were collected at 12 months on re-arrests, beliefs and behaviours on partner abuse. Feder & Dugan 2002; Justice Quarterly 19:343.

Results • No difference in re-offending as measured by re-arrest statistics (i.e., 24% in both groups). • No differences in attitudes towards partner abuse.

Trial Methods • Trial was relatively large (> 200 in each group) and would have had enough power to detect a halving of offending, that is, 24% down to 12%. • For ‘beliefs’ there was a high drop-out rate (50%), which may make those results unreliable. • Allocation appeared to be secure. • Cross-over was slight. • Unclear whether re-arrest data were collected ‘blindly’.
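The power claim above can be checked approximately with a normal approximation for the difference between two proportions. This is a sketch, not the trial authors' own calculation: it assumes the 404 men were split equally (202 per group), a two-sided 5% test, and a hypothetical function name.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power to detect p1 vs p2 with equal-sized groups."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = math.sqrt(p1 * (1 - p1) / n_per_group
                               + p2 * (1 - p2) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / standard_error - z_crit)


# ASSUMPTION: 404 participants allocated equally, i.e. 202 per group.
power = power_two_proportions(0.24, 0.12, n_per_group=202)
print(round(power, 2))
```

Under these assumptions the approximate power to detect a fall from 24% to 12% is comfortably above the conventional 80% threshold, which is consistent with the slide's assessment.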

Conclusion • Counselling is probably an ineffective method of trying to prevent spousal abuse. • Other interventions should be sought. • Message: if you’re being battered by your spouse, don’t bother with counselling!

Preventing unscheduled school transfers • Unscheduled school transfers are associated with poor academic outcomes. The Raising Healthy Children project in Seattle aimed to put in place interventions among high-risk students who exhibit academic or behavioural problems. Fleming et al, 2001: Evaluation Review 25:655.

Design • Cluster randomised trial • 10 schools randomised. 5 experimental schools received a variety of interventions to help high risk students and their families. • Analysis was multilevel to take into account clustering.

Results • The intervention showed a reduction of 2/3rds in the transfer rates, which was statistically significant: 61% versus 45%, a difference of 16 percentage points. NOTE that the intervention schools still had a high transfer rate. • Also, the effects of the intervention waned over the 5 years, suggesting it would need to be continuous to be effective.

Study implications • Study showed an effective intervention. • Number of clusters (10) was on the small side; ideally there should have been more. • High chance of missing a smaller effect.

Conclusions • The RCT is the BEST evaluative method. • RCTs can be, and have been, done in the field of education. • We need MORE, larger and better-quality trials to inform future policy in this area.
