Outbreak of E. coli O104:H4 heralds a new paradigm in responding to disease threats
Nicola J. Holden, Leighton Pritchard
EHEC O104:H4 outbreak, Europe 2011
Unprecedented:
• scale of outbreak (3950 affected, 53 deaths; multiple import restrictions)
• emerging pathogen (one previous case in S. Korea)
• rapid production of sequence data
• crowd-sourcing of assembly and annotation via collaborative revision control site: GitHub https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki
EHEC O104:H4 outbreak – timeline
• 1st May: onset of outbreak
• 26th May: strain characteristics (Scheutz et al., 2012 Eurosurveill)
• 30th May: diagnostic laboratory information released (Muenster)
• 2nd June: first draft assembly available (GitHub)
• 9th to 21st June: additional sequences announced
• 22nd June: microbiological characteristics published (Bielaszewska et al., 2011 LID)
• 26th July: official end of the outbreak (RKI)
refs: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki; RKI; Institute of Hygiene, Muenster
EHEC O104:H4 outbreak – timeline
• 27th July: publication of open-source genomic analysis
A changing paradigm?
• Kwan et al. (2011) http://precedings.nature.com/documents/6663/version/1
Meanwhile: diagnostics
27th June – 6th July
• Outbreak isolate-specific, sub-serotype diagnostics
• Exploit rapid sequencing: work directly from incomplete and unordered draft genome sequences
• Rapidly generated (perhaps ahead of the biology?)
• Validated (good estimates of error rates)
• Easy to use and distribute
• Cheap(er than sequencing everything)
Alignment-free PCR primer design: no need to identify conserved signature sequences prior to primer design
Alignment-free primer design: strategy
• ‘Positive’ genome set: 11 genome assemblies of 9 EHEC O104:H4 outbreak isolates (GitHub crowdsourcing)
• ‘Negative’ genome set: 31 genomes of E. coli and E. fergusonii (GenBank)
• Design many (>1000) primers to positive genome set: target CDS; optimise for qRT; 20 mers; 100 bp amplicons; TA = 58 °C
• Filter primers in silico:
  • Exclude sets with predicted productive amplification in negative genomes.
  • Screen primers to exclude sets with strong sequence similarity to any of a larger set of off-target genomes (GenBank Enterobacteriaceae)
Automation https://github.com/widdowquinn/find_differential_primers
Alignment-free primer design
1. Process configuration files: locations and classes of input sequence files.
2. Convert to single (pseudo)chromosomes: concatenate draft genome sequence.
3. Genome feature locations: from GBK file or predicted with Prodigal.
[Diagram: input sequences I to V, labelled Positive and Negative]
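Step 2 above can be illustrated with a minimal sketch that joins the contigs of a draft assembly into one pseudochromosome. The function names and the run-of-N spacer are assumptions made for illustration; the slides do not say what separator, if any, the pipeline places between contigs.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_contigs(records, spacer="N" * 20):
    """Join contig sequences, in file order, into one pseudochromosome.

    The spacer (hypothetical choice: a run of N) marks each artificial
    junction so that anything spanning it can be recognised later.
    """
    return spacer.join(seq for _, seq in records)


draft = ">contig1\nACGT\nAC\n>contig2\nGGG\n"
pseudochromosome = concatenate_contigs(read_fasta(draft), spacer="NN")
```

Because the contigs are unordered, the resulting coordinate system is arbitrary; it only needs to be consistent between primer prediction and the later screening steps.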
Primer prediction (on positive set)
4. Predict primer locations: > 1000 thermodynamically plausible primer sets on each (pseudo)chromosome, using Primer3.
[Diagram: input sequences I to V, labelled Positive and Negative]
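As a rough illustration of step 4, the sketch below builds a Primer3 input record in Boulder-IO format (one TAG=VALUE per line, terminated by a line containing only "="). The helper name `primer3_record`, the 90-110 bp product range and the default values are assumptions chosen to echo the "20 mers; 100 bp amplicons; > 1000 primer sets" figures on the slides. The slides do not give the settings actually used, and the annealing temperature is deliberately left unmapped.

```python
def primer3_record(seq_id, template, num_return=1000,
                   primer_size=20, product_range=(90, 110)):
    """Return one Boulder-IO record asking Primer3 for many primer pairs.

    All values are illustrative assumptions, not the authors' settings.
    """
    tags = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", template),
        ("PRIMER_TASK", "generic"),
        ("PRIMER_OPT_SIZE", primer_size),
        ("PRIMER_MIN_SIZE", primer_size),
        ("PRIMER_MAX_SIZE", primer_size),
        ("PRIMER_PRODUCT_SIZE_RANGE",
         f"{product_range[0]}-{product_range[1]}"),
        ("PRIMER_NUM_RETURN", num_return),
    ]
    lines = [f"{tag}={value}" for tag, value in tags]
    lines.append("=")  # record terminator
    return "\n".join(lines) + "\n"


record = primer3_record("pseudochromosome_1", "ACGTACGTAC")
```

The returned text could be written to a file and passed to Primer3 on standard input; how the real pipeline invokes Primer3 is not stated on the slides.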
Test cross-amplification in silico
5. Check cross-amplification: all primer sets tested against other organisms, using PrimerSearch.
6. BLAST screen: all primers screened for off-target sequences with BLAST: 7 possible primer sets
[Diagram: input sequences I to V, labelled Positive and Negative]
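The idea behind step 5 can be sketched as a naive in silico PCR. This is a simplified stand-in for PrimerSearch, not its algorithm: it finds exact primer matches only, whereas PrimerSearch can tolerate a percentage of mismatches. The function names and the `max_len` cut-off are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def predicted_amplicons(template, fwd, rev, max_len=500):
    """Return lengths of products predicted from exact primer matches.

    Both strands of the template are searched. A product is reported when
    the forward primer is followed, within max_len bases, by the binding
    site of the reverse primer (its reverse complement).
    """
    template, fwd, rev = template.upper(), fwd.upper(), rev.upper()
    rev_site = reverse_complement(rev)
    lengths = []
    for strand in (template, reverse_complement(template)):
        start = strand.find(fwd)
        while start != -1:
            end = strand.find(rev_site, start + len(fwd))
            # Later sites only give longer products, so stop at max_len.
            while end != -1 and end + len(rev_site) - start <= max_len:
                lengths.append(end + len(rev_site) - start)
                end = strand.find(rev_site, end + 1)
            start = strand.find(fwd, start + 1)
    return lengths


fwd, rev = "ACGGTCA", "TTGCAGC"
target = "GGG" + fwd + "T" * 10 + reverse_complement(rev) + "GGG"
on_target = predicted_amplicons(target, fwd, rev)
off_target = predicted_amplicons("G" * 40, fwd, rev)
```

A primer set with any predicted product on a negative genome would be excluded at this stage.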
Classify primers and validation
7. Classify primers: primer sets classified according to their ability to amplify specific classes of input sequence.
8. Validate primers: primer set validated on positive and negative targets in vitro.
5 target sequences: prophage gp20 (2), hypothetical CDS (2), impB (1)
[Diagram: input sequences I to V, labelled +ve and -ve]
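One possible reading of step 7 is sketched below: a primer set is assigned to a class when it is predicted to amplify every genome in that class and nothing else. The data structures, the genome names g1 to g5 and the exact-match rule are assumptions made for illustration, and the pipeline's own classification rules may differ.

```python
def classify_primer_sets(amplified, genome_class):
    """Assign each primer set the class it is diagnostic for, or None.

    amplified:    primer set name -> set of genomes with a predicted product
    genome_class: genome name -> class label (e.g. "positive")
    """
    members = {}
    for genome, label in genome_class.items():
        members.setdefault(label, set()).add(genome)

    classification = {}
    for primer_set, hits in amplified.items():
        # Classes are disjoint, so at most one label can match exactly.
        matches = [label for label, genomes in members.items()
                   if hits == genomes]
        classification[primer_set] = matches[0] if matches else None
    return classification


# Hypothetical genomes and predicted amplification results.
genome_class = {"g1": "positive", "g2": "positive",
                "g3": "negative", "g4": "negative", "g5": "negative"}
amplified = {
    "set_a": {"g1", "g2"},        # all positives, no negatives
    "set_b": {"g1", "g2", "g3"},  # cross-amplifies a negative genome
    "set_c": {"g1"},              # misses one positive genome
}
classes = classify_primer_sets(amplified, genome_class)
```

Only primer sets such as `set_a` would go forward to in vitro validation under this rule.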
Validation
• In silico, diagnostic primers are just another classifier
• Validation on unseen data is critical (avoid overfitting, estimation of performance)
• Direct experimental validation of primer candidates (Münster):
  • ‘Positive’ set = 21 clinical outbreak isolates
  • ‘Negative’ set = 32 HUSEC / EPEC isolates
  • Positive control = LB 226692
Primer design: validated in vitro
[Figure: validation results, panels labelled positive and negative]
Alignment-free primer design: summary
• Individual primer sets: 100 % sensitivity; 82–94 % specificity; 9 % < FDR < 22 %
• Combining primers: 100 % sensitivity and specificity
• A minimal combination of two primer sets discriminated absolutely between outbreak O104:H4 isolates and non-outbreak E. coli isolates, including HUSEC 041
• Flexibility in strategy allows for targeted design, e.g. multiplex PCR / different organisms / large gene families etc.
• Same approach used for
  • Resolving Dickeya plant pathogens
  • Discriminating between RxLR effectors in Phytophthora infestans
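The performance measures quoted above follow the usual definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), FDR = FP / (FP + TP). The sketch below computes them and shows why requiring two primer sets to agree can remove false positives. The isolate names and results are toy values, not the study's data, and the "both must amplify" rule is an assumed way of combining primer sets.

```python
def performance(called_positive, positives, negatives):
    """Compute sensitivity, specificity and false discovery rate.

    called_positive: isolates that gave a product
    positives / negatives: the true outbreak and non-outbreak isolates
    """
    tp = len(called_positive & positives)
    fp = len(called_positive & negatives)
    fn = len(positives - called_positive)
    tn = len(negatives - called_positive)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fdr": fp / (fp + tp) if (fp + tp) else 0.0,
    }


# Toy isolates (hypothetical): each primer set has one different false positive.
positives = {"p1", "p2", "p3"}
negatives = {"n1", "n2", "n3", "n4"}
set_a = {"p1", "p2", "p3", "n1"}
set_b = {"p1", "p2", "p3", "n2"}

single = performance(set_a, positives, negatives)
# Call an isolate positive only if both primer sets amplify it.
combined = performance(set_a & set_b, positives, negatives)
```

In this toy case each primer set alone has 75 % specificity, while the combination reaches 100 % because the two sets do not share a false positive.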
Alignment-free primer design: summary
• Bypass the need for:
  • multiple genomic alignments
  • biological justification for primer choice (maybe even reveal biology…)
• Produce diagnostic primers for any subgroup of organisms (possibly…)
• Limitations
  • Scaling issue: PrimerSearch is slow (modular pipeline allows use of alternative programs)
  • Low specificity of primers -> use qPCR
  • Very similar organisms may not be distinguished
• Time from genomes to primer sets: 90 hours
  • possibility for improvements as collaborative bioinformatics projects (speed up off-target primer mapping, make into user-friendly tool…)
Acknowledgements
Thanks to Nadine Brandt, Kath Wright and Sean Chapman
nicola.holden@hutton.ac.uk
leighton.pritchard@hutton.ac.uk