Diagnostic Misclassification Bias in Spatial Point Data Analysis – A Simulation Study Olaf Berke , Bimal Chhetri an

GeoVet 2013 - London, Royal Veterinary College, University of London, UK, 21st - 23rd August 2013. Diagnostic Misclassification Bias in Spatial Point Data Analysis – A Simulation Study


Presentation Transcript


  1. GeoVet 2013 - London, Royal Veterinary College, University of London, UK, 21st - 23rd August 2013
     Diagnostic Misclassification Bias in Spatial Point Data Analysis – A Simulation Study
     Olaf Berke, Bimal Chhetri and Zvonimir Poljak
     Department of Population Medicine, Department of Mathematics and Statistics, University of Guelph
     OMAFRA grant #27058
  2. Overview
     Motivation: emergency preparedness and diagnostic misclassification
     Influenza (H3N2) in Ontario
     Study Population and Area: southern Ontario's swine industry
     Scenarios: Monte Carlo simulation with varying SE and SP
     Statistical Methods and Results: Cuzick-Edwards test and logistic regression
     Discussion and Conclusion
  3. 1. Motivation
     Emergency management, information, statistical analysis, DATA
     What if the DATA are unreliable?
  4. 1. Motivation
     Problem: Diagnostic tests are imperfect, i.e. not established / calibrated
     Sensitivity and / or specificity < 100%
     Diagnostic misclassification occurs
     Questions: How does misclassification bias affect
     disease clustering (Cuzick-Edwards test)
     geographic correlations (logistic regression)
     Goal: Study misclassification bias in spatial statistics for point data analysis, using the scenario of an emerging influenza virus in Ontario's swine industry
  5. 2. Influenza (H3N2) in Ontario
     In 2003 an emerging influenza virus (H3N2) was detected among swine-producing farms in Ontario
     Herd-level testing: 15 pigs/farm
     Estimated HSE / HSP ≈ 95% / 50% with a cut-off of ≥ 3 ELISA-positive pigs
     Farm-level prevalence ≈ 10%
     => Assume similar scenarios for the simulation study
     Poljak et al. (2007) Spatial clustering of swine influenza in Ontario on the basis of herd-level disease status with different misclassification errors. Preventive Veterinary Medicine 81: 236-249.
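The link between a pig-level test and the herd-level HSE / HSP quoted on this slide can be sketched with a binomial model. This is only an illustration: the animal-level sensitivity and specificity and the within-herd prevalence are hypothetical inputs (the slide does not report them), the function names are made up for this sketch, and independent test results within a large herd are assumed.

```python
from math import comb


def prob_at_least(n, p, cutoff):
    """P(X >= cutoff) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(cutoff, n + 1))


def herd_se_sp(n_sampled, cutoff, animal_se, animal_sp, within_herd_prev):
    """Herd-level sensitivity and specificity for a 'cutoff or more positives' rule.

    animal_se, animal_sp and within_herd_prev are hypothetical inputs.
    """
    # Probability that one sampled pig tests positive in an infected herd.
    p_pos_infected = animal_se * within_herd_prev + (1 - animal_sp) * (1 - within_herd_prev)
    # In a disease-free herd every positive is a false positive.
    p_pos_free = 1 - animal_sp
    hse = prob_at_least(n_sampled, p_pos_infected, cutoff)
    hsp = 1 - prob_at_least(n_sampled, p_pos_free, cutoff)
    return hse, hsp


# 15 pigs per farm and a cut-off of 3 positives come from the slide;
# the remaining three arguments are illustrative values only.
hse, hsp = herd_se_sp(15, 3, animal_se=0.9, animal_sp=0.85, within_herd_prev=0.3)
print(f"HSE = {hse:.2f}, HSP = {hsp:.2f}")
```

Raising the cut-off trades herd sensitivity for herd specificity, which is one reason why the two can differ as strongly as the 95% / 50% estimate above.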
  6. 3. Study Population and Area
     2221 farm locations simulated under complete spatial randomness (csr) => 551 farms
     Reflecting farm distribution at census subdivision (CSD) level in southern ON
     + biosecurity (high vs. low, p = 50%)
     + farm size (206 - 3844, median = 1058 pigs)
     + status (p ~ 10% = 56 cases, clustering, 1 cluster)
  7. 4. Scenarios
     Compare 9 scenarios from combinations of HSE = (75%, 85%, 95%) and HSP = (80%, 90%, 95%) for 551 farms with 56 true cases (p = 10%):

     Scenario   HSE   HSP   #false.n   #false.p   obs.p
     1          75%   80%   14         99         .26
     2          75%   90%   14         49         .17
     3          75%   95%   14         29         .13
     4          85%   80%    8         99         .27
     5          85%   90%    8         49         .18
     6          85%   95%    8         29         .14
     7          95%   80%    3         99         .28
     8          95%   90%    3         49         .19
     9          95%   95%    3         29         .15

     Note: all scenarios overestimate the true prevalence
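The arithmetic behind the scenario table can be sketched as follows. The function computes expected counts from HSE and HSP for the 551 farms and 56 true cases given on the slide; the function name is made up for this sketch, and the slide's own counts may stem from rounding or from simulated realisations, so they need not match these expectations exactly (for example, the expected number of false positives at HSP = 95% is 24.75).

```python
N_FARMS = 551  # from the slide
N_CASES = 56   # true cases, from the slide


def expected_counts(hse, hsp, n_farms=N_FARMS, n_cases=N_CASES):
    """Expected false negatives, false positives and observed prevalence."""
    n_controls = n_farms - n_cases
    false_neg = n_cases * (1 - hse)      # true cases that test negative
    false_pos = n_controls * (1 - hsp)   # non-cases that test positive
    observed_prev = (n_cases - false_neg + false_pos) / n_farms
    return false_neg, false_pos, observed_prev


true_prev = N_CASES / N_FARMS
for hse in (0.75, 0.85, 0.95):
    for hsp in (0.80, 0.90, 0.95):
        fn, fp, p_obs = expected_counts(hse, hsp)
        print(f"HSE={hse:.2f} HSP={hsp:.2f} false.n={fn:5.1f} "
              f"false.p={fp:5.1f} obs.p={p_obs:.2f} (true p={true_prev:.2f})")
```

Because non-cases outnumber cases roughly nine to one, even a modest loss of specificity produces more false positives than the imperfect sensitivity removes, so the observed prevalence exceeds the true one in every scenario.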
  8. 5. Statistical Methods and Results
     Cuzick-Edwards test (1990)
     Test for clustering among case-control locations
     Idea: count how many cases there are among the k nearest neighbours of each case. T is large when cases cluster.
     k is assumed to be known
     Special case of Tango's Maximized Excess Events Test (MEET)
     Tango's MEET finds the best k via MC simulations, which are adjusted for multiple testing
     For simulated "true" data: k = 6 and p-value = 0.01
     Cuzick, J., Edwards, R. (1990) Spatial clustering for inhomogeneous populations. JRSS B 52(1): 73-104.
     Tango, T. (2000) A test for spatial disease clustering adjusted for multiple testing. Statistics in Medicine 19: 191-20
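A minimal sketch of the k-nearest-neighbour statistic described on this slide, assuming the usual form T_k = sum over cases of the number of cases among each case's k nearest neighbours. The p-value here comes from Monte Carlo relabelling of case status, which is a stand-in chosen for this sketch and not necessarily the procedure used in the study; the function names are made up, and distance ties are broken arbitrarily by the sort.

```python
import numpy as np


def knn_indices(coords, k):
    """Indices of the k nearest neighbours of every location (self excluded)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a location is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


def cuzick_edwards_T(neighbours, is_case):
    """T_k: for every case, count the cases among its k nearest neighbours."""
    is_case = np.asarray(is_case, dtype=bool)
    cases_among_neighbours = is_case[neighbours].sum(axis=1)
    return int(cases_among_neighbours[is_case].sum())


def relabelling_p_value(coords, is_case, k, n_perm=999, seed=0):
    """One-sided Monte Carlo p-value under random relabelling of case status."""
    rng = np.random.default_rng(seed)
    is_case = np.asarray(is_case, dtype=bool)
    neighbours = knn_indices(coords, k)
    observed = cuzick_edwards_T(neighbours, is_case)
    exceed = sum(
        cuzick_edwards_T(neighbours, rng.permutation(is_case)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)


# Toy data: two tight pairs of locations; the cases form one of the pairs.
coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(cuzick_edwards_T(knn_indices(coords, 1), [True, True, False, False]))
```

Computing the neighbour indices once and reusing them for every relabelling keeps the Monte Carlo loop cheap, since only the case labels change between replicates.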
  9. 5. Statistical Methods and Results
     a) Identify k correctly?
     b) Identify clustering? => p-values
     In all 9 scenarios the mode of k = 4 < 6 (nMC = 250)
     => ? Expect k to vary with p
  10. 5. Statistical Methods and Results
      a) Identify k correctly?
      b) Identify clustering under misclassification? => p-values
      Table: % correct test decisions (i.e. identification of clustering) and median of p-values
  11. 5. Statistical Methods and Results
      bios: 0 = low biosecurity, 1 = high biosecurity
      farms: 206 - 3844 pigs on farm
      GLM or spatial GLMM?
      Parameter estimates are generally similar
      SEs differ (overdispersion problem)
      Spatial GLMM via PQL is iterative (convergence + time intensive)
      Cuzick-Edwards does not necessarily detect clustering
      => Logistic regression model fit by maximum likelihood estimation
      logit(π(x)) = log(π(x) / (1 − π(x))) = β0 + β1 bios + β2 farms

                    Estimate     OR       Std. Error   Pr(>|z|)
      (Intercept)   -7.156227    .0008    .734411      < .01
      bios          -1.072367    .3428    .419901      .0107
      farms          0.003274    1.0003   .000377      < .01

      => OR(high biosecurity) = 0.34, OR(500 more pigs) = 5.1
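The two odds ratios quoted at the end of this slide follow from exponentiating the fitted coefficients. The sketch below reproduces that arithmetic from the estimates in the table; the helper `fitted_probability` and its example inputs are illustrative additions and not part of the presentation.

```python
import math

# Coefficient estimates copied from the slide's regression table.
BETA_0 = -7.156227      # intercept
BETA_BIOS = -1.072367   # high (1) vs. low (0) biosecurity
BETA_FARMS = 0.003274   # per additional pig on the farm

# An odds ratio is the exponential of the coefficient times the covariate change.
or_high_biosecurity = math.exp(BETA_BIOS)
or_500_more_pigs = math.exp(500 * BETA_FARMS)

print(f"OR(high biosecurity) = {or_high_biosecurity:.2f}")
print(f"OR(500 more pigs)    = {or_500_more_pigs:.1f}")


def fitted_probability(bios, farms):
    """Inverse logit of the linear predictor (illustrative helper)."""
    eta = BETA_0 + BETA_BIOS * bios + BETA_FARMS * farms
    return 1.0 / (1.0 + math.exp(-eta))


# Example: a low-biosecurity farm of median size (1058 pigs, from the study area slide).
print(f"fitted probability = {fitted_probability(0, 1058):.3f}")
```

Scaling the per-pig coefficient to 500 pigs before exponentiating gives an effect size on a scale that is meaningful for the farm sizes in the study.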
  12. 5. Statistical Methods and Results
      Logistic regression model fit by MLE (nMC = 1000):
  13. 6. Conclusion
      Diagnostic tests are imperfect, especially in emergency situations
      Spatial point pattern analysis can be heavily biased:
      disease clustering (Cuzick-Edwards test)
        Clustering might be overlooked
        Type I error rate is a complex function of SE, SP (n, p, ppp)
      risk factor identification (logistic regression)
        Bias towards OR = 1
      Recommendation: Always consider diagnostic misclassification bias to be present
      Outlook:
      More scenarios: locations (ppp), sample size (n) and prevalence (p)
      More statistics: spatial scan test, D-function
  14. 5. Statistical Methods and Results
      Scenario 7: HSE = 95%, HSP = 80% (nMC = 1000)
      Bias(OR) = OR – med(OR)
      Bias(OR bios) = .34 - .79 = -.45
      Bias(OR 500 more pigs) = 5.1 - 1.6 = 3.5
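Both biases on this slide move the estimate towards OR = 1. The sketch below first reproduces the slide's bias arithmetic and then illustrates, for a single binary risk factor, why nondifferential misclassification of disease status attenuates an odds ratio. The baseline risk of 0.14 is a hypothetical value chosen for illustration, the function names are made up, and the calculation uses expected proportions only, so it is not a re-run of the authors' simulation.

```python
def bias(true_or, median_estimated_or):
    """Bias as defined on the slide: true OR minus median of the estimated ORs."""
    return true_or - median_estimated_or


print(f"Bias(OR bios)          = {bias(0.34, 0.79):.2f}")
print(f"Bias(OR 500 more pigs) = {bias(5.1, 1.6):.1f}")


def observed_odds_ratio(baseline_risk, true_or, se, sp):
    """Expected OR when the disease status of both groups is misclassified."""
    odds_ref = baseline_risk / (1 - baseline_risk)
    odds_exposed = true_or * odds_ref
    risk_exposed = odds_exposed / (1 + odds_exposed)

    def apparent(risk):
        # Test positives = true positives + false positives.
        return se * risk + (1 - sp) * (1 - risk)

    p_ref, p_exp = apparent(baseline_risk), apparent(risk_exposed)
    return (p_exp / (1 - p_exp)) / (p_ref / (1 - p_ref))


# Scenario 7 values for HSE and HSP; the baseline risk is hypothetical.
attenuated = observed_odds_ratio(baseline_risk=0.14, true_or=0.34, se=0.95, sp=0.80)
print(f"true OR = 0.34, expected observed OR = {attenuated:.2f}")
```

The apparent risk is a linear, flattened version of the true risk, so the contrast between the two groups shrinks and the observed odds ratio lies between the true value and 1.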
  15. 3. Study Population and Area
      Southern Ontario
      1/3 of Canadian population ~ 13m people
      1/3 of all Canadian swine farms ~ 2,200 swine farms
      39 census subdivisions
      Statistics Canada 2006 farm census (http://statcan.gc.ca/pub/95-629-x/1/4123801-eng.htm#35)
  16. 1. Motivation
      Geographic Epidemiology
      ... study of spatial patterns (cluster, clustering, trend)
      ... is often used with emerging diseases to generate hypotheses about disease aetiology
      ... assumes correct data (diagnoses, location, exposure)
      Emerging infectious disease
      previously unknown / new agents
      known agents of increasing incidence
      known agents occurring in species deemed unsusceptible
      known agents occurring in new geographic areas / populations
      Brown (2004) Emerging zoonoses and pathogens of public health… Rev. Sci. Tech. Off. Int. Epiz. 23: 435-442.