Estimating the relative roles of recombination and point mutation to the generation of single locus variants in Campylobacter jejuni and Campylobacter coli
Shoukai Yu
mEpiLab, Hopkirk Research Institute, Massey University
http://mepilab.massey.ac.nz/
Background of my PhD project (1)
• Understanding evolutionary processes in C. jejuni and C. coli
• Multidisciplinary project: statistics, genetics, bioinformatics and public health
• Public health application:
  • Foresee the next potential generation
  • Disease control strategy
Background of my PhD project (2)
We know that Campylobacter:
• Are zoonotic pathogens
• Colonize the gut of birds and mammals
• Cause serious public health problems
We do not know:
• How this pathogen evolved
• Where and when the next virulent strain will appear
Opportunity:
• The most important pathogen in NZ
• NZ has some unique strains that may have evolved in NZ
An International Comparison…
Source: Olsen et al. Campylobacter. 3rd ed. Washington DC: ASM Press; 2008.
Background of my PhD project (3)
• Advanced mathematical tools
• Unique factors:
  • The distinctive history of the New Zealand environment
  • The introduction of European wildlife and domestic livestock
• Piece together all the evidence and trace back the huge experiment
Background of my PhD project (4)
Unique factors (cont.):
• The first-hand results from mEpiLab
  • This is the largest and most comprehensive dataset in the southern hemisphere
  • It comes from a five-year study carried out at the Manawatu sentinel site
• Worldwide database (PubMLST)
Manawatu study 2005-2010... Sentinel site (5 yrs)
Multi-locus sequence typing (MLST)
• Widely used for the typing of bacterial pathogens
• Types strains based on their nucleotide sequences
  • Seven housekeeping genes
  • Fragments of each gene
• Repeatable across labs worldwide
PubMLST (http://pubmlst.org/)
• Publicly accessible database
• Data in common formats
• Suitable for worldwide comparison of bacterial isolates
An example of a sequence type (ST)
• glnA4: GATCCTTTTA……ACAATGTT (477 nucleotides)
A single locus variant (SLV) is defined as a pair of sequence types that differ at exactly one of the seven alleles that make up the MLST profile.
For example:
gltA5 ……GCTTAAACCTA……
gltA1 ……GCTTAGACCTA……
(nucleotides 315 to 325 out of 402)
The two alleles differ at a single nucleotide, which suggests that a single point mutation is the more likely cause.
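The SLV definition above can be sketched in a few lines of Python. This is only an illustration: the locus names are those of the standard seven-gene Campylobacter MLST scheme, while the two allelic profiles and the helper names (`differing_loci`, `is_slv`) are hypothetical and not taken from the talk.

```python
# Loci of the seven-gene Campylobacter MLST scheme.
LOCI = ("aspA", "glnA", "gltA", "glyA", "pgm", "tkt", "uncA")


def differing_loci(profile_a, profile_b):
    """Return the loci at which two allelic profiles carry different allele numbers."""
    return [locus for locus in LOCI if profile_a[locus] != profile_b[locus]]


def is_slv(profile_a, profile_b):
    """True if the two sequence types differ at exactly one of the seven loci."""
    return len(differing_loci(profile_a, profile_b)) == 1


# Hypothetical allelic profiles (allele numbers are made up for illustration).
st_x = dict(zip(LOCI, (2, 1, 5, 3, 2, 1, 5)))
st_y = dict(zip(LOCI, (2, 1, 1, 3, 2, 1, 5)))

if __name__ == "__main__":
    print(is_slv(st_x, st_y), differing_loci(st_x, st_y))
```

Here the two made-up profiles differ only at gltA (allele 5 versus allele 1), so they form an SLV pair.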
aspA104 ……ACAACTTAATGTTTTTGAACCAGTTGCA
aspA184 ……GCAGCTTAATGTTTTTGAACCTGTAATT
(nucleotides 450 to 477 out of 477)
The two alleles differ at several nucleotides within a short stretch, which suggests that recombination is the more likely cause. Recombination is also called horizontal gene transfer.
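The quantity being compared in these two examples is the number of nucleotide differences between the two alleles of an SLV pair. A minimal sketch of that count, using only the fragments shown on the slides (not the full alleles); the function name `nucleotide_differences` is an assumption made for illustration:

```python
def nucleotide_differences(seq_a, seq_b):
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Fragments copied from the slides (partial sequences only).
GLTA5 = "GCTTAAACCTA"
GLTA1 = "GCTTAGACCTA"
ASPA104 = "ACAACTTAATGTTTTTGAACCAGTTGCA"
ASPA184 = "GCAGCTTAATGTTTTTGAACCTGTAATT"

if __name__ == "__main__":
    print("gltA5 vs gltA1:", nucleotide_differences(GLTA5, GLTA1))
    print("aspA104 vs aspA184:", nucleotide_differences(ASPA104, ASPA184))
```

On the fragments shown, the gltA pair differs at 1 position and the aspA pair at 7 positions, which matches the mutation-like and recombination-like readings given above.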
The Questions:
• What is the distribution of nucleotide differences for SLVs?
• What is the relative contribution of recombination and mutation to the generation of SLVs?
Q1: Multi-modal distribution?
• More likely to be mutation
• More likely to be recombination within Campylobacter species
• More likely to have involved recombination between Campylobacter species
Q2: What is the relative contribution of recombination and mutation?
Method: estimate how many SLVs are due to recombination vs. mutation
Model steps:
• Model for mutation only: calculate the probability of differences due to mutation
• Model involving recombination: calculate the probability of differences due to recombination, using the known pattern from the international database
• Model to estimate the proportion (p) of SLVs from the mutation-only situation (EM algorithm)
• Estimate the actual probability (x) of an evolutionary event being mutation rather than recombination
Step 1: Mutation model
• MLST alleles: example in which three mutations occurred in glnA
• The model gives the probability of h mutations, given that an SLV occurred at locus i, where M denotes that only mutation events occurred
• It also uses the probability that, if a mutation occurs, it occurs at locus i
Figure: posterior mutation models based on different prior parameters. Panels: probability of n mutations (prior parameter = 2) and probability of n mutations (prior parameter = 10).
Step 2: The recombination-related model
• Select two alleles at random for a given locus (say aspA), weighted by the frequency of these alleles in PubMLST
• Compare them and record the number of differences
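The sampling procedure in Step 2 could be sketched as follows. This is a rough illustration under stated assumptions, not the code used in the study: the allele names, sequences and frequencies in `TOY_ALLELES` are invented, the two draws are assumed to be independent, and pairs that draw the same allele twice are redrawn (an assumption, made because an SLV requires the two alleles to differ).

```python
import random


def sample_difference_distribution(alleles, n_pairs, seed=0):
    """Estimate the distribution of nucleotide differences between two
    alleles drawn at random, weighted by their observed frequencies.

    `alleles` maps an allele name to a (sequence, frequency) tuple.
    """
    rng = random.Random(seed)
    names = list(alleles)
    weights = [alleles[name][1] for name in names]
    counts = {}
    drawn = 0
    while drawn < n_pairs:
        a, b = rng.choices(names, weights=weights, k=2)
        if a == b:
            continue  # assumption: identical alleles cannot form an SLV, so redraw
        seq_a, seq_b = alleles[a][0], alleles[b][0]
        d = sum(x != y for x, y in zip(seq_a, seq_b))
        counts[d] = counts.get(d, 0) + 1
        drawn += 1
    return {d: c / n_pairs for d, c in sorted(counts.items())}


# Hypothetical toy data: name -> (aligned sequence, frequency in the database).
TOY_ALLELES = {
    "allele_1": ("ACGTACGT", 50),
    "allele_2": ("ACGTACGA", 30),
    "allele_3": ("TCGAACGG", 20),
}

if __name__ == "__main__":
    print(sample_difference_distribution(TOY_ALLELES, n_pairs=10_000))
```

The returned dictionary is an empirical probability mass function over the number of differences, which is the form the recombination component takes in the mixture estimated in the next step.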
Step 3: EM algorithm
• Used to estimate the proportion (p) of SLVs that come from the mutation-only situation
Step 4: Estimate the actual probability (x) of an evolutionary event being mutation rather than recombination
• Uses the relationship between p and x, where k is the number of events separating the two branches
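As an illustration of Step 3, the sketch below runs a generic EM iteration for the mixing proportion of a two-component mixture whose component distributions are already known (here standing in for the Step 1 and Step 2 models). It is a textbook formulation under assumptions, not necessarily the exact model of the study: the two probability mass functions and the observed counts are invented toy values, and the Step 4 conversion from p to x is not shown because its formula is not given in the text.

```python
def em_mixing_proportion(observed, pmf_mutation, pmf_recombination,
                         p0=0.5, tol=1e-10, max_iter=1000):
    """Estimate p, the proportion of observations generated by the
    mutation-only component, with both component pmfs held fixed.

    `observed` is a list of nucleotide-difference counts, one per SLV.
    Each pmf is a dict mapping a difference count to its probability.
    """
    p = p0
    for _ in range(max_iter):
        # E-step: posterior probability that each SLV is mutation-only.
        responsibilities = []
        for d in observed:
            m = p * pmf_mutation.get(d, 0.0)
            r = (1.0 - p) * pmf_recombination.get(d, 0.0)
            if m + r == 0.0:
                raise ValueError(f"observation {d} has zero probability under both models")
            responsibilities.append(m / (m + r))
        # M-step: the new proportion is the mean responsibility.
        new_p = sum(responsibilities) / len(responsibilities)
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical toy inputs, for illustration only.
TOY_PMF_MUTATION = {1: 0.9, 2: 0.1}
TOY_PMF_RECOMBINATION = {1: 0.1, 2: 0.2, 5: 0.3, 10: 0.4}
TOY_OBSERVED = [1] * 50 + [2] * 10 + [5] * 15 + [10] * 25

if __name__ == "__main__":
    p_hat = em_mixing_proportion(TOY_OBSERVED, TOY_PMF_MUTATION, TOY_PMF_RECOMBINATION)
    print(f"estimated proportion of mutation-only SLVs: {p_hat:.3f}")
```

Because the component distributions are fixed, only the single parameter p is updated, so each iteration reduces to averaging the posterior membership probabilities.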
Simulation work (1): constant populations
Figure axes: estimated r/m ratio and true r/m ratio.
Conclusion
• The multi-modal distribution of nucleotide differences in SLVs is due to both recombination and mutation
• The relative contribution of recombination is larger than that of mutation
Acknowledgement
• Thanks to the Marsden Project for funding
• Thanks to the excellent supervision team: Prof Nigel French, Dr Barbara Holland, Dr Patrick Biggs, Prof Paul Fearnhead and Dr Grant Hotter
• Thanks to mEpiLab for their efforts on the dataset