Do the Claims for Spending Billions on Crime Reduction Initiatives Stand Up? Radical Statistics Conference 2006. Paul Marchant, p.marchant@leedsmet.ac.uk
Aims • A look at knowing ‘What Works’. Things worth encouraging: • ‘Good Statistics’ (as many arguments are statistical in nature) • Transparency in design and reporting • ‘Investigative Statistics’ • Scientific scepticism • Research sound enough to justify spending on major programmes properly
Science • Acquiring knowledge using data from observation and experiment. • An inherently uncertain matter… Statistics! • Not all results agree, so there is a need to synthesise them. • Science is a public matter: not just because of the impact of its products, but also because the work needs to be checked. • We need open access 'to pretty much everything' (data, methods, clear and complete reports) so that the work can be replicated and checked. (Protocols need to be published in advance.)
A scientific answer… • is never something like x = 1.23 • nor is it just x = 1.23 ± 0.45 • but also includes how it was derived and what assumptions and approximations are involved, so that outsiders can scrutinise it. • Just because assumptions are not mentioned does not mean they are not being made!
Some quotations “The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a body of data.” John Tukey “While every data set contains noise, some data sets may contain signals. Therefore before you can detect a signal within any given data set you must first filter out the noise.” Donald J. Wheeler, Understanding Variation: The Key to Managing Chaos (SPC Press)
Time Variation in Crime • Little seems to be known about how crime varies on the small scale, so it is difficult to tell whether any change is due to a crime reduction intervention.
The Randomised Controlled Trial (A truly marvellous scientific invention) [Diagram: Population → take sample → randomise to 2 groups (Old Treatment, New Treatment) → compare outcomes (averages), recognising that these are sample results, subject to sampling variation when applied back to the population.] Note, to avoid ‘bias’: • Allocation is best made tamper-proof (e.g. use ‘concealment’). • Use multiple blinding of patients, physicians, assessors, analysts …
Counts of those cured and not cured under the two treatments: New treatment: a cured, b not cured. Old treatment: c cured, d not cured.
The Odds Ratio OR = ratio of the odds of cure under the two treatments: $\mathrm{OR} = \dfrac{\text{odds of cure under new treatment}}{\text{odds of cure under old treatment}} = \dfrac{a/b}{c/d} = \dfrac{ad}{bc}$. If OR > 1 the new treatment is better; if OR < 1 the old treatment is better.
But … … there is sampling variability. Consider a table with $\mathrm{OR} = \dfrac{60 \times 45}{40 \times 55} = 1.23$. This is not good evidence for a difference in treatment effectiveness: the numbers are small, and a sample OR of 1.23 could arise by chance when in fact the population OR = 1.0.
Sampling Variation The sampling variation is given by the (asymptotic) standard error of ln(OR): $[\mathrm{s.e.}(\ln \mathrm{OR})]^2 = \mathrm{Var}(\ln \mathrm{OR}) = \dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}$, if the events are statistically independent. The sample ln(OR) is distributed about the population ln(OR) in a ‘Normal’/Gaussian fashion, with standard deviation equal to this s.e. (5% of the distribution lies more than 1.96 s.e. from the centre.)
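As a check on the example above, the sketch below computes the sample odds ratio, the asymptotic s.e. of ln(OR) and a 95% interval for the table with a = 60, b = 40, c = 55, d = 45 (cells read from OR = (60 × 45)/(40 × 55), with a, b cured and not cured under the new treatment and c, d under the old). Like the formula, it assumes independent events; the function name is only illustrative.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a Wald confidence interval on the log scale.

    a, b: cured / not cured under the new treatment
    c, d: cured / not cured under the old treatment
    Assumes statistically independent events (variance 1/a + 1/b + 1/c + 1/d).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # s.e. of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, se, (lo, hi)

or_, se, (lo, hi) = odds_ratio_ci(60, 40, 55, 45)
print(f"OR = {or_:.2f}, s.e.(ln OR) = {se:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The interval runs from about 0.70 to 2.15, so it easily contains OR = 1.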
Crime counts before and after in the two areas Examine the Cross Product Ratio: $\mathrm{CPR} = \dfrac{a/b}{c/d}$. If it is convincingly > 1, then lighting works against crime.
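Here is a sketch of the CPR calculation together with the naive (independent-events) interval that goes with it. The cell layout is an assumption: a and b are the before and after counts in the re-lit area, and c and d are the before and after counts in the comparison area, so CPR > 1 means a relative fall in the re-lit area. The counts in the usage line are hypothetical.

```python
import math

def cpr_naive(a, b, c, d, z=1.96):
    """Cross product ratio (a/b)/(c/d) with a Poisson-style confidence interval.

    Assumed layout (hypothetical): a, b = crimes before/after in the re-lit area;
    c, d = crimes before/after in the comparison area.
    The variance 1/a + 1/b + 1/c + 1/d treats every crime as an independent event.
    """
    cpr = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_stat = math.log(cpr) / se
    ci = (math.exp(math.log(cpr) - z * se), math.exp(math.log(cpr) + z * se))
    return cpr, z_stat, ci

# Hypothetical counts: re-lit area 100 -> 80, comparison area 100 -> 100.
cpr, z_stat, ci = cpr_naive(100, 80, 100, 100)
print(f"CPR = {cpr:.2f}, z = {z_stat:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```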
Lighting and crime Exterior lighting is often justified on the basis of crime reduction (e.g. by the Institution of Lighting Engineers). There seem to be many ‘theoretical suggestions’ for why lighting might increase or decrease crime.
Light has a good press! • ‘…God said, let there be light and there was light. And God saw the light, that it was good….’ Genesis Chapter 1, Verses 3 and 4. • It seems ‘wicked’ to question the benefit of lighting. • However, there is a ‘dark side’: lighting’s environmental impacts and possible health impacts.
My ‘Interest’ • “…Paul Marchant, statistician at Leeds Metropolitan University who argues that statistics used in the Home Office Study 251 could equally be used to show that street lighting actually increases levels of crime. This is an argument which the APPLG, alongside the ILE, would hope to show as utterly absurd. Of course it is worth noting that Paul Marchant is also an astronomer as well as being a statistician, and that this may lead to some bias in his interpretation of the statistics he refers to.” p. 56 of the March/April 2004 issue of the Lighting Journal, the magazine of the Institution of Lighting Engineers. APPLG = The All-Party Parliamentary Lighting Group; ILE = The Institution of Lighting Engineers
Forest Plot from Meta-analysis [Figure: forest plot of the odds ratio (95% CI) and % weight for each HORS251 study, produced with the STATA metan command.]
The Confidence Intervals The confidence intervals of the individual studies that make up the combined result are calculated as though the events come from independent random samples, so that: $\mathrm{Var}(\ln \mathrm{CPR}) = \dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}$. But this cannot be!
Correlations Within • Crime is committed by criminals! • Look at the data! Most studies give just counts of crime in the 2 areas, before and after. One study that does not is Bristol. • If the independent random samples assumption were correct, the variance of the count would be approximately equal to the mean count. But it is not: it is an order of magnitude higher.
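The variance-to-mean check is easy to reproduce. This sketch simulates two sets of hypothetical crime counts (not the Bristol data). One is Poisson, which is what the independent-events assumption implies. The other is negative binomial with variance ten times the mean, an assumed model of overdispersed counts. It then computes the dispersion index s²/x̄ for each.

```python
import numpy as np

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(2006)
mean, k = 50.0, 10.0          # hypothetical mean count and overdispersion factor

poisson_counts = rng.poisson(mean, size=10_000)

# Negative binomial with mean `mean` and variance k * mean (assumed model).
# numpy's parameterisation: mean n(1-p)/p, variance n(1-p)/p**2.
r = mean / (k - 1)            # gives variance = mean + mean**2 / r = k * mean
p = r / (r + mean)
nb_counts = rng.negative_binomial(r, p, size=10_000)

print(f"Poisson:           s^2/mean = {dispersion_index(poisson_counts):.2f}")
print(f"Negative binomial: s^2/mean = {dispersion_index(nb_counts):.2f}")
```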
The Bristol Study (Shaftoe 1994) Shaftoe said ‘no discernable lighting benefit’, but HORS251 gives z = 6.6! Note: had the data for the year immediately before the relighting (periods 2 and 3) been used, rather than, unnaturally, periods 1 and 2, which leave a gap of ½ year, the effect found would have been half that claimed. (This shows large variability.)
Overdispersion (1) The Bristol (and Birmingham) studies show large variation over time when the light level is constant: the variance is many times the mean (‘overdispersion’). (The large heterogeneity statistic, Q ≈ 60, from the meta-analysis of the 13 studies also suggests this. A large Q shows an inconsistency between within-study variation and between-study variation.)
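For readers unfamiliar with Q, here is a minimal sketch of standard fixed-effect (inverse-variance) pooling and Cochran's heterogeneity statistic. The inputs are per-study ln(CPR) values and their variances. If the studies are homogeneous, Q is roughly chi-squared with (number of studies minus one) degrees of freedom. The three-study input is hypothetical.

```python
import math

def fixed_effect_and_q(log_effects, variances):
    """Inverse-variance pooled log effect, its s.e., and Cochran's Q."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    # Q: weighted squared deviations of each study from the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    return pooled, se_pooled, q

# Hypothetical ln(CPR) values and naive variances for three studies.
pooled, se, q = fixed_effect_and_q([0.05, 0.40, -0.10], [0.02, 0.03, 0.04])
print(f"pooled CPR = {math.exp(pooled):.2f}, s.e. = {se:.3f}, Q = {q:.2f} on 2 df")
```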
Overdispersion (2) The problem for HORS251 is that the confidence intervals around the effect size must therefore be substantially widened. Also, because the underlying overdispersion is not properly known for individual studies, we cannot say what weights to use; these will not be the same as those used in the original, incorrect HORS251 meta-analysis.
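To make the consequence concrete: if the true variance of ln(CPR) is k times the Poisson value, the standard error grows by √k and a z statistic shrinks by the same factor. The sketch applies this to the Bristol z = 6.6 quoted earlier. The values k = 10 and k = 15 are purely illustrative, taken from the range discussed in these slides. This is not a re-analysis of the Bristol study, and it leaves aside the unresolved question of meta-analysis weights.

```python
import math

def inflate_for_overdispersion(log_est, naive_se, k, z_crit=1.96):
    """Rescale a naive (Poisson) log-scale estimate for overdispersion factor k.

    The variance is multiplied by k, so the s.e. is multiplied by sqrt(k).
    Returns the adjusted z statistic and the 95% CI on the ratio scale.
    """
    se = naive_se * math.sqrt(k)
    z = log_est / se
    ci = (math.exp(log_est - z_crit * se), math.exp(log_est + z_crit * se))
    return z, ci

def adjusted_z(naive_z, k):
    """A z statistic computed with the Poisson variance, divided by sqrt(k)."""
    return naive_z / math.sqrt(k)

for k in (10, 15):
    print(f"k = {k}: z = 6.6 becomes {adjusted_z(6.6, k):.2f}")
```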
Examine Overdispersion in Comparison Areas The observed overdispersions, $D_{\text{obs}}$, calculated from the before and after counts in the comparison areas, are extremely variable and right-skewed. Their arithmetic mean is 15 for these comparison areas (larger still if the mean is weighted by the number of crimes).
What observed overdispersions are expected if repeated samples are taken? It can be shown that for a Normal($\mu, \sigma^2$) with $\sigma^2 = k\mu$ (i.e. the variance equals the overdispersion factor $k$ times the mean $\mu$), the sampling distribution of the observed overdispersion $k_{\text{obs}} = s^2/\bar{x}$ is, provided $k \ll \mu$, approximately chi-squared with 1 df, scaled by $k$. Equivalently, it is Gamma(scale = $2k$, shape = 1/2). This is a right-skewed distribution with arithmetic mean $k$. [If the before-after correlation is $\rho$, then $k$ is replaced by $k(1-\rho)$. This is the effective overdispersion relevant to a before-after study, i.e. the quantity of interest. Indeed, $s^2$ estimates $\sigma^2(1-\rho)$.]
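The sampling result can be checked by simulation. The sketch draws before/after pairs from Normal(μ, kμ) with no real change and computes k_obs = s²/x̄ for each pair, which has one degree of freedom. It then compares these values with draws from k·χ²₁, i.e. Gamma(shape = 1/2, scale = 2k). The values μ = 1000 and k = 10 are arbitrary choices that satisfy k ≪ μ, and the before-after correlation is taken as zero.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, k, reps = 1000.0, 10.0, 200_000          # illustrative values with k << mu

# Each row is one area's (before, after) pair: variance k * mu, no real change.
pairs = rng.normal(mu, np.sqrt(k * mu), size=(reps, 2))
k_obs = pairs.var(axis=1, ddof=1) / pairs.mean(axis=1)

# Reference distribution: k * chi-squared(1), i.e. Gamma(shape=1/2, scale=2k).
reference = rng.gamma(shape=0.5, scale=2 * k, size=reps)

print(f"mean k_obs   = {k_obs.mean():.2f}   (reference {reference.mean():.2f})")
print(f"median k_obs = {np.median(k_obs):.2f}   (reference {np.median(reference):.2f})")
```

The median comes out at roughly 0.45k, well below the mean of about k. This is the right skew described above. A single before-after pair understates k about two times in three, even though its expected value is correct.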
Other crime data for confirmation I used burglary count data from 124 anonymised small areas, taken from a project described in Tilley et al. (1999). The counts are of similar size to those in HORS251.
Burglary data from 124 areas [Figure: observed overdispersion in the burglary data; k = 10 (both).]
What to conclude about overdispersion Both the HORS251 and the burglary data show large overdispersion, with k of 10 or more. For HORS251 the CI for k is (7.9, 38.7). (Similar results are obtained using the quasi-Poisson generalised linear model in R.) A big problem in HORS251 is essentially confusion about the ‘unit of analysis’ (it is the area, not the crime event).
The Dudley Study (1) • Used a household crime survey: Painter and Farrington (1997), in Situational Crime Prevention: Successful Case Studies. (There is also the Stoke-on-Trent study.) • Question: “Did you experience crime in the past year? If so, how many?” • Two supposedly matched areas: one has its lighting increased, the other stays the same. The household survey was carried out before and after the new lighting was introduced. It was planned to link households before and after, but this did not happen (nor in Stoke)! • The study reports make great claims for the effectiveness of lighting, and the Institution of Lighting Engineers cites them as 'Proof' that lighting is effective against crime.
The Dudley Study (2): Some Problems Uncovered • Eventually I was given some of the data (though with only a limited number of background variables). • The two areas had markedly different crime rates at the start. • One-tailed testing was used to claim a statistically significant effect. • Overdispersion: the variance of the number of crimes per household is much larger than the mean, so Poisson methods are inappropriate.
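The choice of a one-tailed test matters because the one-sided p-value is half the two-sided one. Any z between about 1.645 and 1.96 is therefore ‘significant’ at 5% one-tailed but not two-tailed. The z = 1.8 below is a hypothetical value, not the Dudley result.

```python
import math

def p_one_sided(z):
    """Upper-tail p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_two_sided(z):
    """Two-sided p-value P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 1.8  # hypothetical test statistic
print(f"one-sided p = {p_one_sided(z):.3f}, two-sided p = {p_two_sided(z):.3f}")
```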
The Poisson Model is Inappropriate [Figure: distribution of crime counts per household.]
The Dudley Study (3): Some Problems Uncovered (cont.) • Differential loss to follow-up. • Older people are much less likely to experience crime, and loss to follow-up in the comparison area greatly reduces their number, so the composition of the samples changes during the experiment. • Results are very sensitive to the loss or addition of just one person. • Importantly, there is also correlation between households, giving extra overdispersion (variability). • Essentially it is a non-randomised two-cluster trial.
Spatial Correlation (1) An expression can be derived for the variance of ln(CPR) for a before-and-after, intervention-comparison household survey study, i.e. one of the Dudley type. In addition to the variability between households, this includes both: • correlations within households between times • correlations between households at any one time.
Spatial Correlation (2) The result is basically the expression obtained when the correlation between households at one time (the spatial correlation) is ignored, multiplied by the ‘Design Effect’, Deff (just as in clustered surveys and trials): $D_{\mathrm{eff}} = 1 + (n-1)\rho_s$, where $\rho_s$ is the spatial correlation and $n$ is the number in a cluster, i.e. an area.
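A small numerical illustration of the design effect follows. The cluster sizes and spatial correlations are hypothetical. The point is that even a modest ρs, multiplied by (n - 1) households per area, can inflate the variance several-fold and the standard error by √Deff.

```python
import math

def design_effect(n, rho_s):
    """Deff = 1 + (n - 1) * rho_s for clusters (areas) of n households."""
    return 1 + (n - 1) * rho_s

for n, rho_s in [(100, 0.01), (100, 0.05), (300, 0.05)]:   # hypothetical values
    deff = design_effect(n, rho_s)
    print(f"n = {n}, rho_s = {rho_s}: Deff = {deff:.2f}, s.e. x {math.sqrt(deff):.2f}")
```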
Spatial Correlation (3) • The spatial correlation was not taken into account in the Dudley and Stoke analyses, thereby ignoring the fact that neighbours ‘share risk’. • An expression for the variance of the logarithm of the Cross Product Ratio (CPR) is: [Equation not preserved.]
The response given to my pointing out that overdispersion exists (1) The expression given above for the household survey type of study, but without Deff, was applied to the original Dudley Poisson result, giving a variance adjustment of only about 3 (i.e. just the λ estimate). This overdispersion adjustment was then applied in the meta-analysis to all 13 studies. See the Addendum to HORS251 (added in Sept. 2003), with which I most profoundly disagree… even though my name is mentioned!
The response given to my pointing out that overdispersion exists (2) Additionally, Farrington and Welsh (2004), following my own short article in the BJC, cite the geometric mean of the $s^2/\bar{x}$ values of the 13 HORS251 studies to justify a small overdispersion. However, as I have indicated here, the arithmetic mean is the appropriate one, and it shows that the overdispersion is much bigger, with a value of something like 15.
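The choice of mean matters here. If each study's s²/x̄ is roughly k·χ²₁, as stated earlier, its arithmetic mean estimates k. Its geometric mean, however, tends to exp(ψ(1/2) + ln 2)·k ≈ 0.28k as the number of studies grows, where ψ is the digamma function. The simulation draws sets of 13 such values for an assumed k = 15 and compares the two averages. With 13 studies the geometric mean comes out at roughly a third of k. This illustrates the argument; it does not reproduce the HORS251 figures.

```python
import numpy as np

rng = np.random.default_rng(13)
k, n_studies, n_sets = 15.0, 13, 100_000     # assumed k; 13 studies as in HORS251

# Each row: 13 observed overdispersions, each distributed as k * chi-squared(1).
k_obs = k * rng.chisquare(df=1, size=(n_sets, n_studies))

arith = k_obs.mean(axis=1)                   # arithmetic mean per set
geom = np.exp(np.log(k_obs).mean(axis=1))    # geometric mean per set

print(f"average arithmetic mean = {arith.mean():.1f}")   # close to k
print(f"average geometric mean  = {geom.mean():.1f}")    # roughly a third of k
```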
The response given to my pointing out that overdispersion exists (3) • Farrington and Welsh justify their overdispersion estimate on the grounds that dividing the original heterogeneity statistic, Q = 60, by it yields an ‘acceptable’ heterogeneity statistic! Note that Q is normally used to uncover anomalies in the data, not to remove them! The larger overdispersion value, 15, gives a revised Q = 60/15 = 4, which would indicate excessive homogeneity, as might result from publication bias. (There is no register of study protocols, which would guard against publication bias.)
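The arithmetic on this slide can be checked against a chi-squared distribution with 12 degrees of freedom (13 studies minus one). When the degrees of freedom are even, the tail probability has a standard closed form, so the sketch needs only the standard library. The helper name is illustrative.

```python
import math

def chi2_sf_even(x, df):
    """P(X >= x) for X ~ chi-squared(df), df even: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    if df % 2:
        raise ValueError("closed form needs an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

df = 13 - 1
q = 60.0
print(f"P(Q >= {q:.0f}) = {chi2_sf_even(q, df):.1e}")            # far too heterogeneous
q_rev = q / 15                                                   # revised Q = 4
print(f"P(Q <= {q_rev:.0f}) = {1 - chi2_sf_even(q_rev, df):.3f}")  # unusually homogeneous
```

If the studies really were homogeneous, a Q as small as 4 on 12 degrees of freedom would occur by chance only about 1.7% of the time.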
Lack of Equivalence between Areas Invariably it is the most crime-ridden area that gets the lighting, whereas the relatively crime-free ‘control’ is not re-lit. So the areas are not equivalent at the start. One effect of this is to allow ‘regression towards the mean’ to operate (see later). The name ‘Control Area’ is a misnomer; ‘Comparison Area’ is a better name.
[Figure: scatter plot of the after measurement (Y) against the before measurement (X), both on a 0 to 100 scale, showing the cloud of data points, the line of equality and the line of mean Y for a given X.]
The response given to the lack of equivalence between the 2 areas (RTM) • ‘Regression towards the mean’ (RTM) has not been acknowledged as a problem, even after I pointed it out. • The burglary data show RTM nicely. Splitting the 124 areas into 2 groups, above or below the mean burglary rate in the first year, shows a tendency in the following year for the high burglary rate group to fall and the low burglary rate group to rise.
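RTM Example from the Burglary Data [Table: change in burglaries for the high and low burglary rate groups.] • (The effect is also seen from period 2 to period 3, and when using rates rather than counts.)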
Regression Towards the Mean RTM: Seeing effects which aren’t there A statistical novice might take the fact that the high burglary rate group shows a reduction in burglaries (-71), whereas the low rate group shows a rise (+6), as evidence of ‘something important going on’, rather than just what is expected with imperfectly correlated data. RTM follows from imperfect correlation. (As Francis Galton discovered more than a century ago, in the 1880s.)
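RTM needs no intervention at all, only imperfect correlation. The sketch simulates hypothetical burglary rates for many areas in two years with identical distributions and a year-to-year correlation ρ = 0.6. These are assumed values, not the 124-area data. It splits the areas at the first-year mean and shows the high group ‘falling’ and the low group ‘rising’.

```python
import numpy as np

rng = np.random.default_rng(1880)
n_areas, mean, sd, rho = 10_000, 20.0, 6.0, 0.6   # hypothetical values

# Two years with the same mean and spread, correlation rho, and no real change.
z1 = rng.standard_normal(n_areas)
z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(n_areas)
year1 = mean + sd * z1
year2 = mean + sd * z2

high = year1 > year1.mean()                      # split at the first-year mean
change_high = (year2[high] - year1[high]).mean()
change_low = (year2[~high] - year1[~high]).mean()

print(f"high group mean change: {change_high:+.2f}")   # negative
print(f"low group mean change:  {change_low:+.2f}")    # positive
```

The spurious changes scale with (1 - ρ), so they shrink as the year-to-year correlation approaches 1.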
The response given to the lack of equivalence between the 2 areas (RTM) • Farrington and Welsh (2006) claim that RTM is not a problem because, for crimes counted in 250 police ‘Basic Command Units’ from 2002/3 to 2003/4, the effect was only small. This is hardly surprising: the areas, and hence the crime counts, are an order of magnitude larger than in HORS251, so the year-to-year correlation is high. Note Wrigley (1995): “This tendency for correlation coefficients to increase in magnitude as the size of the areal unit involved increases has been known since the work of Gehlke and Biehl (1934)”.
Bristol study revisited In reply to me, a lighting benefit with p = 0.011 is claimed. But this depends on an arbitrary, specific regression model that assumes the same variance in both areas and includes a linear time trend, identical in both areas, which is itself not ‘statistically significant’. On the other hand, a model that simply uses the crime count in the comparison area to predict crime in the re-lit area (in the spirit of HORS251) shows no statistically significant effect. Do the data really look as if there is such a clear effect, i.e. one that would occur only about 1 time in 100 if there were in fact no lighting benefit?
The Bristol data again It seems to me that this hardly presents clear evidence of a lighting benefit! There is a big problem of model uncertainty.
Cost-benefit analyses • Lighting and crime researchers have carried out cost-benefit analysis based on very few studies (and it gives a highly favourable result for lighting). However, doing this only compounds the problem, as an unknown, unproven benefit (or harm) is being compared with uncertain costs. • We need much better information to do such an exercise properly; otherwise it tends to look ‘scientific’ to the eye of a novice when in fact it is not, because the data and method are flimsy.
‘Researchvertising’ • Unsurprisingly, HORS251 and the Dudley Study are used by the lighting industry to promote its wares. ‘Responsibilities under the Crime and Disorder Act’ are also invoked.
My take on lighting and crime • It may be that lighting reduces crime, or it may be that it increases crime. We have to look at the evidence as given. The conclusion, at present, is: we do not know… yet we ought to know! • Note: I know of no scientific trials of exterior ‘Security’ lighting, so no one knows whether it works. • We ought to take a ‘Popperian’ view and entertain the possibility that light is ineffective, or worse, against crime. • Of course we all need light at night to see by. (Those concerned about light pollution are basically talking about ‘lamp-shades’.) However, there is no sound evidence that we need light to protect us from crime, in spite of claims to the contrary.
Car Alarms • There seems to be little evidence that car alarms prevent cars being broken into etc. • But they do disturb people’s sleep! • There has been an attempt in New York to get them banned (relying instead on passive methods of risk reduction).
Wider problems of inappropriate methods • The costs of crime and of attempts to reduce it are large. • Similar problems probably exist in the evaluation of other area-based crime reduction interventions too. (They certainly do for HORS252 on CCTV, where the same methods as in HORS251 are applied to 18 studies, with Q = 270; however, no effect of CCTV is claimed.) The problems seem to be encouraged by the ‘Maryland Scientific Methods Scale’, which seems to suggest that designs weaker than RCTs might suffice. • We do need proper evidence to decide ‘what works’ in crime and in all spheres.