Biosurveillance of emerging biothreats using scalable genotype clustering
In collaboration with: Vitali Sintchenko (ICPMR-Westmead Hospital & CHI-UNSW), Qinning Wang (ICPMR-Westmead Hospital), Lester Hiley (Queensland Health and Pathology Services), Gwendolyn Gilbert (ICPMR-Westmead Hospital), Enrico Coiera (CHI-UNSW), Blanca Gallego Luxan (CHI-UNSW)
CHI Seminar Series, 1-May-2008
Infectious Disease Surveillance: Systems
• Definition: ongoing collection and monitoring of infection-specific or infection-related data, with the aim of detecting (and subsequently controlling or preventing) outbreaks
• Participants: Laboratories, Hospitals, Clinicians, Nurse call-lines, Pharmacists, Public Health Agency, Schools
• Steps: Data Collection and Preprocessing; Analysis and Interpretation; Epidemiologic Investigation and Intervention
• Relevance: Bio-terrorism; New/Emerging pathogens; Preventing/Reducing morbidity, mortality and costs
Infectious Disease Surveillance: Data and Methods
• Data: Lab-test results; Clinicians' reports; Hospital admissions; ED chief complaints; Ambulance log-sheets; Drug prescriptions filled; Nurse call-line reports; Over-the-counter drug sales; School/work absentee reports
• Methods: Manual analysis and interpretation; Automated statistical analysis and alarm signaling (CUSUM, EWMA, Likelihood, Scan Statistic, ...)
• Data sources range between two extremes:
    • More specific, less noisy, less timely, fewer counts (can detect moderate and small outbreaks, can prompt well-defined public health action)
    • Less specific, noisy, more timely, more counts (can detect large outbreaks faster, but the public health action is unclear)
• Rapid pathogen genotyping techniques
• Automated analysis of pathogen genotyping signals
[Buckeridge et al 2005, Sonesson and Bock 2003]
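CUSUM is one of the automated alarm methods listed above. As a rough illustration of how such a method turns a count series into alarms, here is a minimal one-sided CUSUM sketch in Python. The function name, the fixed `baseline`, and the parameters `k` (allowance) and `h` (alarm threshold) are assumptions for illustration; the slides do not say how CUSUM is configured in any particular surveillance system.

```python
def cusum_alarms(counts, baseline, k=0.5, h=4.0):
    """One-sided (upper) CUSUM over a series of case counts.

    counts   -- observed counts per time step (e.g. per day)
    baseline -- assumed expected count per time step
    k        -- assumed allowance: excess below this is ignored
    h        -- assumed decision threshold for raising an alarm
    Returns the indices of the time steps at which an alarm is raised.
    """
    statistic = 0.0
    alarms = []
    for step, observed in enumerate(counts):
        # Accumulate the excess over baseline + k, never dropping below zero.
        statistic = max(0.0, statistic + (observed - baseline - k))
        if statistic > h:
            alarms.append(step)
            statistic = 0.0  # restart accumulation after an alarm
    return alarms
```

With a baseline of 2 cases per day, two consecutive days of 6 and 7 cases push the statistic over the threshold on the second of those days, while isolated small excesses decay back to zero.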
Molecular Fingerprinting of Pathogens
• Consists of subtyping or "fingerprinting" isolates obtained from infected patients for the purpose of pathogen clonal discrimination [Gilbert 2002, Fawley and Wilcox 2005, Sintchenko et al 2007]
• Other molecular techniques are also used in the diagnosis and management of infectious diseases
• Other useful information (apart from clonal lineage): virulence, antibiotic resistance, patient outcomes
Examples of datasets
• ED - NSW - Pneumonia
• Salmonella - NSW & QLD, by genotype
• ED - NSW - Whooping Cough
What constitutes an outbreak?
• Transmission (contact, vehicle, vector)
    • Spatio-temporal correlations
    • Pathogen genotypes
• Epidemic/Unusual Nature (increased/unexpected incidence rate)
    • Statistical deviation from a baseline rate
    • Minimum/unexpected number of incidents in a given time/space
• Public Health Intervention (reassurance/type of action)
• Whether these amount to an outbreak depends on:
    • Severity, communicability, and local epidemiology of the disease
    • Public health resources: investigative methods and options for effective prevention and control
Our Outbreak Definitions
• Genotyping cluster: a maximal set of at least N isolates that share the same [or a closely related] genotype, among a set of isolates from infected patients, each with an associated date and location (e.g. collection date and patient's address).
To account for clustering in space and time we specify:
• Temporal cluster: a genotyping cluster for which the time difference between any two consecutive cases is at most t days. The limit t=0 corresponds to clusters that last one day.
• Spatial cluster: a genotyping cluster for which the locations of all cases are connectable by a spanning tree (a graph connecting a set of nodes [i.e. case locations] without any cycles) with all edges no more than d kilometers long. The limit d=0 indicates a cluster occurring in one location.
• Spatio-temporal cluster: a combined temporal and spatial cluster, characterized by parameters t and d.
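One way to read these definitions operationally is sketched below in Python. The `Isolate` record, the use of latitude/longitude with a great-circle distance, and all function names are assumptions made for illustration, not part of the presented method. For the spatio-temporal case the sketch alternates the temporal and spatial splits until every group satisfies both rules, which is only one possible interpretation of "combined".

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Isolate:
    genotype: str  # e.g. an MLVA profile such as "01-03-20-04-06"
    day: date      # e.g. specimen collection date
    lat: float     # assumed: geocoded patient address, decimal degrees
    lon: float


def km_between(a, b):
    """Great-circle (haversine) distance in km; an assumed distance measure."""
    p1, p2 = radians(a.lat), radians(b.lat)
    h = (sin((p2 - p1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(b.lon - a.lon) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def split_by_time(cases, t):
    """Cut the date-ordered cases wherever consecutive cases are > t days apart."""
    ordered = sorted(cases, key=lambda c: c.day)
    groups = [[ordered[0]]]
    for case in ordered[1:]:
        if (case.day - groups[-1][-1].day).days > t:
            groups.append([])
        groups[-1].append(case)
    return groups


def split_by_space(cases, d):
    """Connected components of the graph joining cases at most d km apart.

    Every component can be spanned by a tree whose edges are all <= d km.
    """
    unvisited, groups = list(cases), []
    while unvisited:
        frontier, component = [unvisited.pop()], []
        while frontier:
            case = frontier.pop()
            component.append(case)
            near = [c for c in unvisited if km_between(case, c) <= d]
            unvisited = [c for c in unvisited if c not in near]
            frontier.extend(near)
        groups.append(component)
    return groups


def clusters(isolates, n, t=None, d=None):
    """Maximal same-genotype sets of >= n isolates that satisfy the temporal
    rule (t days) and/or the spatial rule (d km); None disables a rule."""
    by_genotype = defaultdict(list)
    for iso in isolates:
        by_genotype[iso.genotype].append(iso)
    pending, stable = list(by_genotype.values()), []
    while pending:
        parts = [pending.pop()]
        if t is not None:
            parts = [p for g in parts for p in split_by_time(g, t)]
        if d is not None:
            parts = [p for g in parts for p in split_by_space(g, d)]
        if len(parts) == 1:
            stable.append(parts[0])  # group already satisfies every active rule
        else:
            pending.extend(parts)    # re-check the pieces against both rules
    return [group for group in stable if len(group) >= n]
```

Calling `clusters(isolates, n=5)` gives plain genotyping clusters, `clusters(isolates, n=5, t=1)` temporal clusters, and `clusters(isolates, n=5, t=1, d=5)` spatio-temporal clusters. Because each split only depends on the set of cases, the result does not depend on the order in which the cases are supplied.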
Advantages of Our Outbreak Definitions
• Compared to data without genotyping: genotype is a good biomarker for transmission.
Within a set of same-genotype isolates:
• Compared to "manual" or "ad-hoc" definitions (e.g. at least 5 cases found within a 4-week period): those definitions are ambiguous, have a constrained temporal/spatial boundary, and are generally inappropriate for prospective surveillance.
• Compared to automated statistical methods: those methods may not lead to statistically significant clusters.
• Our outbreak definitions provide a unique way of clustering a set of cases that is independent of the order in which the cases are considered and has no prescribed temporal or spatial boundary.
• They are unambiguous, easily scalable, the same prospectively as retrospectively, and can be adjusted to reflect changes in local disease prevalence and availability of public health resources.
Example: Surveillance of human Salmonella Typhimurium outbreaks using MLVA clustering
• Dataset: 816 isolates collected from NSW and QLD between 23/10/2006 and 31/03/2007, displaying 226 different MLVA patterns
• MLVA: Multi-Locus Variable-number tandem repeat Analysis
• The method is based on capillary separation of multiplexed PCR products from five VNTR loci in the S. Typhimurium genome, labeled with multiple fluorescent dyes. The different alleles at each locus are then assigned allele numbers.
MLVA loci characteristics
Locus      Repeat (bp)  Smallest product (bp)  Largest product (bp)  No. of alleles  % of missing amplicons  Polymorphism index
STTR3      33           464                    530                   3               5                       0.25
STTR5      6            228                    300                   13              0                       0.87
STTR6      6            283                    397                   19              16                      0.90
STTR9      9            163                    181                   3               0                       0.51
STTR10pl   6            358                    496                   20              48                      0.92
From: Lindstedt et al.; J. Microbiological Methods (2004); Table 2
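Since each isolate ends up with one allele number per locus, an MLVA pattern can be handled as a five-number profile. The sketch below assumes the hyphen-separated string form used for genotypes elsewhere in these slides (e.g. "01-03-20-04-06"); the function names are hypothetical, the order of loci within the string is not assumed, and how missing amplicons are coded is not covered (a non-numeric code would raise an error here). Counting the loci at which two profiles differ is one possible way to make "closely related genotype" concrete, not necessarily the criterion the authors used.

```python
def parse_profile(profile):
    """Turn an MLVA profile string such as "01-03-20-04-06" into a tuple of
    five allele numbers (one per VNTR locus)."""
    alleles = tuple(int(part) for part in profile.split("-"))
    if len(alleles) != 5:
        raise ValueError(f"expected 5 loci, got {len(alleles)}: {profile!r}")
    return alleles


def loci_differing(profile_a, profile_b):
    """Number of loci at which two MLVA profiles carry different alleles."""
    pairs = zip(parse_profile(profile_a), parse_profile(profile_b))
    return sum(a != b for a, b in pairs)


def count_patterns(profiles):
    """Number of distinct MLVA patterns in a collection of profile strings."""
    return len({parse_profile(p) for p in profiles})
```

For example, `loci_differing("02-06-20-14-02", "02-05-20-14-02")` counts a single differing locus, so under a one-locus tolerance (an assumed threshold) those two profiles would be grouped together.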
Salmonella Genotype Clusters
• 36 genotyping clusters (N=5)
• Mean cluster size = 15 cases
• Mean cluster duration = 96 days
• Mean cluster area = 6,839 km2 (0.27% of NSW+QLD)
Salmonella Temporal and Spatio-Temporal Genotype Clusters
Temporal clusters (N=5, t=1)
Cluster  Genotype        Size (counts)  Duration (days)  Area (km2)  Phage type                 Confirmed (ScanStat / Epi.Invest.)
1        01-03-20-04-06  39             5                184.19      9, 12                      Yes / Yes
2        02-06-20-14-02  24             10               735.91      197                        Yes / Yes
3        01-04-13-05-08  16             3                626.14      170                        Yes / Yes
4        02-06-20-14-02  10             5                165.74      197
5        01-02-04-01-03  8              3                795.69      U302, 186, 35
6        02-05-20-14-02  8              4                81.64       197                        Yes
7        02-06-20-14-02  7              4                228.30      197
8        01-05-17-03-08  7              5                86.33       135a                       Yes
9        01-02-04-01-03  6              3                183.31      UNK, 35, U302, 29, RDNC
10       01-05-17-03-08  6              4                184.70      135a
11       01-01-19-14-03  5              2                2,792.20    RDNC, 44
12       01-04-05-13-03  5              3                104.31      135a, RDNC                 Yes
Spatio-temporal clusters (N=5, t=1, d=5)
Cluster  Genotype        Size (counts)  Duration (days)  Area (km2)  Phage type                 Confirmed (ScanStat / Epi.Invest.)
1        01-03-20-04-06  29             5                98.62       9                          Yes / Yes
2        01-04-13-05-08  14             3                314.64      170                        Yes / Yes
3        02-06-20-14-02  7              5                37.81       197                        Yes / Yes
Example of a Salmonella Genotyping Spatio-Temporal Cluster Homebush area (Western Sydney), end of March 2007, contaminated pork in Chinese bakery
Choosing Cluster Parameters
• Cluster parameters (N, t and d) are adjusted to accommodate changes in local disease prevalence and in the availability of public health resources
Prospective Surveillance of Genotyping Clusters
• Outbreak detection is most timely when N = 3, 4, or 5 (50% of clusters detected by the midpoint)
Prospective Surveillance of Genotyping Clusters
• Clustering in time does not significantly change the timeliness of detection (N>2)
• Clustering in both time and space decreases the overall efficiency of prospective detection
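To make the notion of timeliness concrete: under prospective surveillance, a cluster defined with a minimum size N can first be flagged on the day its N-th case is recorded. The sketch below expresses that day as a fraction of the cluster's final duration, so a value of at most 0.5 means detection by the midpoint. This measure and the function name are assumptions for illustration; the slides do not spell out how timeliness was computed.

```python
from datetime import date


def detection_fraction(case_days, n):
    """Fraction of a cluster's eventual duration that has elapsed when its
    n-th case arrives (0.0 = first day, 1.0 = last day).

    case_days -- dates of all cases that eventually belong to the cluster
    n         -- minimum cluster size needed to raise an alert
    Returns None if the cluster never reaches n cases.
    """
    days = sorted(case_days)
    if len(days) < n:
        return None
    duration = (days[-1] - days[0]).days
    if duration == 0:
        return 0.0  # the whole cluster falls on a single day
    return (days[n - 1] - days[0]).days / duration
```

A larger N delays the alert within each cluster, while a very small N flags many small groups, which is the trade-off behind tuning N.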
Summary
• We have used emerging genotyping techniques to introduce an operational definition of outbreak based on temporal and spatial clustering of genotypes.
• This definition provides unambiguous clustering of isolates that can be tuned to accommodate the requirements and resources available for outbreak investigations.
• It has the potential to enhance early warning systems for public health, allowing timely recognition, source identification, and the capacity to apply specific public health action.
• Limitations include:
    • Limited coverage: genotyped samples represent a small proportion of infectious cases in the population.
    • The date and location associated with a specimen only approximate the epidemiologically relevant parameters.