Evidence of Selection on Genomic GC Content in Bacteria
Falk Hildebrand, Adam Eyre-Walker
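Codons
• Example reading frame: ATA CCC CTA CCT GCT (codon positions 1, 2, 3)
• Non-synonymous sites: a change alters the encoded amino acid
• Synonymous sites: a change leaves the amino acid unchanged
  • 2-fold degenerate: TTT, TTC (both encode Phe)
  • 4-fold degenerate: CCT, CCC, CCA, CCG (all encode Pro)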
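The degeneracy classes above, and the GC4 statistic used later in the talk, can be computed from the standard genetic code. A minimal sketch, assuming in-frame, uppercase DNA and NCBI translation table 1; `third_position_degeneracy` and `gc4` are illustrative names, not taken from the source.

```python
BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def third_position_degeneracy(codon):
    """Number of third-position bases (including the observed one) that keep the amino acid."""
    aa = CODE[codon]  # raises KeyError for non-ACGT characters
    return sum(CODE[codon[:2] + b] == aa for b in BASES)


def gc4(seq):
    """GC content at the third position of 4-fold degenerate codons."""
    thirds = [
        seq[i + 2]
        for i in range(0, len(seq) - 2, 3)
        if third_position_degeneracy(seq[i:i + 3]) == 4
    ]
    if not thirds:
        return float("nan")
    return sum(b in "GC" for b in thirds) / len(thirds)


print(third_position_degeneracy("TTT"), third_position_degeneracy("CCT"))  # 2 4
print(gc4("ATACCCCTACCTGCT"))  # 0.25
```

In the example frame, ATA is 3-fold degenerate and is excluded, leaving CCC, CTA, CCT and GCT, of which only CCC ends in G or C.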
Explanations
• Mutation bias
  • Sueoka (1961) & Freese (1962)
  • Intrinsic and/or extrinsic
• Selection
  • Many authors
• Biased gene conversion
  • Anonymous referees
Correlates
• Genome size: positive correlation
• Lifestyle: higher GC in free-living species
• Aerobiosis: higher GC in aerobic species
• Nitrogen utilization: higher GC amongst nitrogen fixers
• Temperature: higher GC amongst thermophiles?
Evidence of selection I
• Escherichia coli
  • Mutation pattern: 273 GC→AT versus 131 AT→GC mutations
  • Predicted GC content = 0.32
  • Observed GC content = 0.50
  • Observed GC content at neutral sites = 0.58
Lynch (2007) The Origins of Genome Architecture
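The predicted value of 0.32 follows from the mutation counts once each count is divided by the number of sites at risk. A minimal sketch, assuming the mutations were scored in a genome whose GC content is the observed 0.50, and that mutation alone reaches equilibrium when f·u = (1 − f)·v, with u and v the per-site GC→AT and AT→GC rates; `equilibrium_gc` is a hypothetical helper.

```python
def equilibrium_gc(n_gc_to_at, n_at_to_gc, gc_content=0.5):
    """GC content expected from mutation bias alone.

    A GC->AT mutation can only occur at a G/C site, so the per-site rate u is
    proportional to n_gc_to_at / gc_content; likewise v is proportional to
    n_at_to_gc / (1 - gc_content). At equilibrium f * u = (1 - f) * v,
    hence f = v / (u + v).
    """
    u = n_gc_to_at / gc_content
    v = n_at_to_gc / (1.0 - gc_content)
    return v / (u + v)


print(round(equilibrium_gc(273, 131, gc_content=0.50), 2))  # 0.32
```

Because the mutation spectrum favours GC→AT, mutation alone predicts a GC content well below the observed 0.50, and further below the 0.58 seen at neutral sites.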
Evidence of selection II
• Phylogenetic analyses
  • Mycobacterium leprae (Lynch 2007)
  • Escherichia coli (Balbi et al. 2009)
  • 5 pathogenic bacteria (Hershberg and Petrov 2010)
Phylogenetic analysis [figure: tree with tip states G, A, A, G, G, G]
Evidence of selection II
• Phylogenetic analyses
  • Mycobacterium leprae (Lynch 2007)
  • Escherichia coli (Balbi et al. 2009)
  • 5 pathogenic bacteria (Hershberg and Petrov 2010)
• Excess of GC→AT
Test of mutation bias
• If GC content is
  • due to mutation bias alone, and
  • stationary,
• and the infinite sites assumption holds,
• then # GC→AT mutations = # AT→GC mutations
Why?
• If GC content is stationary, # GC→AT substitutions = # AT→GC substitutions
• All neutral mutations have the same chance of fixation
• Therefore # GC→AT mutations = # AT→GC mutations
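The argument can be written out as a short derivation. As notation introduced here for illustration, let $M_{GC\to AT}$ and $M_{AT\to GC}$ be the numbers of new neutral mutations of each type per generation, $P_{\text{fix}}$ the fixation probability shared by all neutral mutations, and $K_{GC\to AT}$, $K_{AT\to GC}$ the resulting substitution rates.

```latex
\begin{align*}
K_{GC\to AT} &= M_{GC\to AT}\,P_{\text{fix}}, \qquad
K_{AT\to GC} = M_{AT\to GC}\,P_{\text{fix}} \\
\text{stationary GC content} &\;\Rightarrow\; K_{GC\to AT} = K_{AT\to GC} \\
&\;\Rightarrow\; M_{GC\to AT}\,P_{\text{fix}} = M_{AT\to GC}\,P_{\text{fix}} \\
&\;\Rightarrow\; M_{GC\to AT} = M_{AT\to GC}
\end{align*}
```

Under the infinite sites assumption every segregating mutation appears as exactly one polymorphism, so the expected polymorphism counts of the two types are equal as well.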
Identifying mutations
Strain 1  ACT GCT TTG GCT TTA TGG
Strain 2  ACT GCT TTG GCT TTA TGA
Strain 3  ACT GCT TTG GCT TTA TGG
Strain 4  ACT GCT TTC GCT TTA TGA
Strain 5  ACC GCT TTC GCT TTA TGG
Strain 6  ACT GCT TTG GCT TTA TGG
Polymorphic sites: T/C, C/G, G/A
Orienting mutations
Outgroup  ACT GCT TTC GCT TTA TGG
Strain 1  ACT GCT TTG GCT TTA TGG
Strain 2  ACT GCT TTG GCT TTA TGA
Strain 3  ACT GCT TTG GCT TTA TGG
Strain 4  ACT GCT TTC GCT TTA TGA
Strain 5  ACC GCT TTC GCT TTA TGG
Strain 6  ACT GCT TTG GCT TTA TGG
Mutations: T→C, C→G, G→A
GC→AT = 1, AT→GC = 1
Orienting mutations
Strain 1  ACT GCT TTG GCT TTA TGG
Strain 2  ACT GCT TTG GCT TTA TGA
Strain 3  ACT GCT TTG GCT TTA TGG
Strain 4  ACT GCT TTC GCT TTA TGA
Strain 5  ACC GCT TTC GCT TTA TGG
Strain 6  ACT GCT TTG GCT TTA TGG
Mutations: T→C, G→C, G→A
GC→AT = 1, AT→GC = 1
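The counting on the alignment slides can be sketched as follows. The strain and outgroup sequences are those from the slides with spaces removed. With an outgroup, its base is taken as ancestral; without one, the sketch treats the minor allele as derived, which is our reading of the slide without an outgroup rather than a method stated in the source. `count_oriented` and `classify` are hypothetical names.

```python
from collections import Counter

GC, AT = set("GC"), set("AT")


def classify(ancestral, derived):
    """Label a mutation GC->AT, AT->GC, or None (G<->C or A<->T changes)."""
    if ancestral in GC and derived in AT:
        return "GC->AT"
    if ancestral in AT and derived in GC:
        return "AT->GC"
    return None


def count_oriented(strains, outgroup=None):
    """Count GC->AT and AT->GC polymorphisms in equal-length aligned sequences."""
    counts = {"GC->AT": 0, "AT->GC": 0}
    for i in range(len(strains[0])):
        alleles = Counter(s[i] for s in strains)
        if len(alleles) != 2:
            continue  # monomorphic, or >2 alleles (violates infinite sites)
        if outgroup is not None:
            ancestral = outgroup[i]
            if ancestral not in alleles:
                continue  # outgroup carries a third base; cannot orient
            derived = next(b for b in alleles if b != ancestral)
        else:
            (ancestral, n_major), (derived, n_minor) = alleles.most_common()
            if n_major == n_minor:
                continue  # no minor allele to call derived
        kind = classify(ancestral, derived)
        if kind:
            counts[kind] += 1
    return counts


STRAINS = [
    "ACTGCTTTGGCTTTATGG",
    "ACTGCTTTGGCTTTATGA",
    "ACTGCTTTGGCTTTATGG",
    "ACTGCTTTCGCTTTATGA",
    "ACCGCTTTCGCTTTATGG",
    "ACTGCTTTGGCTTTATGG",
]
OUTGROUP = "ACTGCTTTCGCTTTATGG"

print(count_oriented(STRAINS, OUTGROUP))  # {'GC->AT': 1, 'AT->GC': 1}
print(count_oriented(STRAINS))            # {'GC->AT': 1, 'AT->GC': 1}
```

The C/G site is polarized differently by the two rules (C→G versus G→C), but as a G↔C change it is not counted either way, so both give the totals shown on the slides.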
Test of mutation bias
• If GC content is
  • due to mutation bias alone, and
  • stationary,
• and the infinite sites assumption holds,
• then # GC→AT = # AT→GC
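The slides report an overall P value but do not name the test. One simple way to test the null hypothesis that GC→AT and AT→GC polymorphisms are equally numerous is an exact two-sided binomial (sign) test with p = 0.5; this is a sketch under that assumption, and `two_sided_sign_test` is a hypothetical name.

```python
from math import comb


def two_sided_sign_test(n_gc_to_at, n_at_to_gc):
    """Exact two-sided binomial test of H0: GC->AT and AT->GC equally likely (p = 0.5)."""
    n = n_gc_to_at + n_at_to_gc
    k = min(n_gc_to_at, n_at_to_gc)
    # P(X <= k) for X ~ Binomial(n, 0.5); the distribution is symmetric,
    # so the two-sided P value is twice the smaller tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


print(two_sided_sign_test(10, 0))  # 0.001953125
print(two_sided_sign_test(1, 1))   # 1.0
```

The test treats each polymorphism as independent, which linked sites within a gene may not be.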
Data
• PopSet
  • Keyword "bacteria"
• 8 or more sequences from the same species
• 149 bacterial species
  • 8 phyla, 15 classes and 77 genera
• 1 or more genes
• 10 or more synonymous polymorphisms
• 4-fold diversity < 0.1
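The numeric inclusion thresholds can be expressed as a simple predicate. A sketch, assuming diversity at 4-fold sites is the mean proportion of differing sites across all pairs of sequences (π) and that the 4-fold sites have already been extracted without gaps; the slide does not say exactly how diversity was computed, and `nucleotide_diversity` and `passes_filters` are illustrative names.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    n_sites = len(seqs[0])
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / (len(pairs) * n_sites)


def passes_filters(n_sequences, n_synonymous_polymorphisms, pi_4fold):
    """Thresholds from the Data slide."""
    return (n_sequences >= 8
            and n_synonymous_polymorphisms >= 10
            and pi_4fold < 0.1)


fourfold_sites = ["AAAA", "AAAT", "AATT"]  # toy alignment of 4-fold sites
print(round(nucleotide_diversity(fourfold_sites), 3))  # 0.333
print(passes_filters(8, 12, 0.05))                     # True
```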
Overall result P<0.0001
Bias versus GC4 [figure: Z, computed from the number of GC→AT polymorphisms, plotted against GC4]
Potential problems
• Infinite sites assumption
• Sequencing error
Infinite sites assumption
• Each mutation occurs at a site that is not already polymorphic
Infinite sites assumption
• If GC content is stationary, # GC→AT substitutions = # AT→GC substitutions
• All neutral mutations have the same chance of fixation
• Therefore # GC→AT mutations = # AT→GC mutations
Finite sites assumption
• If GC content is stationary, # GC→AT substitutions = # AT→GC substitutions
• All neutral mutations have the same chance of fixation
• Therefore # GC→AT mutations = # AT→GC mutations
• But some mutations are not evident as polymorphisms
Finite sites
• A GC-rich sequence implies
  • per-site rate of AT→GC > per-site rate of GC→AT
• Mutation rate low
  • # AT→GC polymorphisms = # GC→AT polymorphisms
• Mutation rate high
  • # AT→GC polymorphisms < # GC→AT polymorphisms
Finite sites theory
• GC → AT at rate uμ; AT → GC at rate vμ
• Assume: stationary population, stationary GC content
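Reading the diagram as GC → AT at rate uμ and AT → GC at rate vμ (our interpretation of the arrow labels), stationary GC content fixes the equilibrium GC content f:

```latex
\begin{align*}
f\,u\mu &= (1 - f)\,v\mu \\
\Rightarrow\; f &= \frac{v}{u + v}
\end{align*}
```

The overall rate μ cancels, so f constrains only the ratio of u to v; μ itself has to be estimated separately, for example from the level of diversity.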
Finite sites theory [figure]
Predicting Z
• Assume
  • finite sites
  • neutrality
• Use GC4 to get f
• Use observed diversity to estimate μ
• Predict Z
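To see why finite sites can produce an apparent excess of GC→AT polymorphisms in a GC-rich genome under neutrality, here is a deliberately simple toy, not the authors' model: n sequences each on an independent branch from a common root (a star tree), a two-state process with GC → AT at rate uμ and AT → GC at rate vμ, and a root base drawn from the stationary distribution and treated as known. `toy_z` is a hypothetical name; it returns the fraction of GC↔AT polymorphisms oriented as GC→AT, and we do not know that this matches the definition of Z used in the talk.

```python
import math


def toy_z(f, rate_time, n_leaves):
    """Fraction of GC<->AT polymorphisms oriented as GC->AT in a toy star tree.

    f          equilibrium GC content, v / (u + v)
    rate_time  (u + v) * mu * t, with t the length of each branch (must be > 0)
    n_leaves   number of sampled sequences (>= 2), one per branch
    """
    decay = math.exp(-rate_time)
    # Two-state transition probabilities along one branch.
    p_gc_stays = f + (1 - f) * decay   # P(leaf is G/C | root is G/C)
    p_at_stays = (1 - f) + f * decay   # P(leaf is A/T | root is A/T)

    def p_polymorphic(p_stay):
        # Monomorphic if every leaf kept the root state or every leaf switched.
        return 1 - p_stay ** n_leaves - (1 - p_stay) ** n_leaves

    gc_to_at = f * p_polymorphic(p_gc_stays)        # root G/C, some leaves A/T
    at_to_gc = (1 - f) * p_polymorphic(p_at_stays)  # root A/T, some leaves G/C
    return gc_to_at / (gc_to_at + at_to_gc)


for rate in (1e-4, 0.1, 1.0, 10.0):
    print(rate, round(toy_z(0.7, rate, 10), 3))
```

At a very low rate the fraction is 0.5, the infinite sites expectation; as the rate grows it approaches f, so with f above 0.5 GC→AT polymorphisms appear in excess. In the procedure the slide outlines, f would come from GC4 and the rate from observed diversity; here both are plain inputs.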
Explanations
• Non-stationary base composition
• Selection for translational efficiency
• Biased gene conversion
• Selection upon base composition
Biased gene conversion [figure]
Four gamete test
• No recombination: GA, GT, CA (three gametes)
• Recombination: CT, GA, GT, CA (all four gametes)
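The four gamete test can be sketched for a pair of biallelic sites using the haplotypes from the slide. `four_gamete_test` is a hypothetical helper that takes each haplotype as a two-character string (allele at site 1, allele at site 2).

```python
def four_gamete_test(haplotypes):
    """Return True if two biallelic sites show all four allele combinations."""
    site1 = {h[0] for h in haplotypes}
    site2 = {h[1] for h in haplotypes}
    if len(site1) != 2 or len(site2) != 2:
        return False  # the test needs both sites to be biallelic
    return len(set(haplotypes)) == 4


print(four_gamete_test(["GA", "GT", "CA"]))        # False
print(four_gamete_test(["CT", "GA", "GT", "CA"]))  # True
```

Under the infinite sites assumption the fourth gamete can only arise by recombination between the two sites; if recurrent mutation is allowed, it can also appear without recombination.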