
Quantifying contributions of mutations and homologous recombination to E. coli genomic diversity

Sergei Maslov, Department of Biosciences, Brookhaven National Laboratory, New York.


Presentation Transcript


  1. Quantifying contributions of mutations and homologous recombination to E. coli genomic diversity • Sergei Maslov, Department of Biosciences, Brookhaven National Laboratory, New York

  2. Bacterial genome evolution happens in cooperation with phages

  3. Variation between E. coli strains • Comparison of B vs K-12 strains of E. coli: FW Studier, P Daegelen, RE Lenski, S Maslov, JF Kim, JMB (2009) • Pan-genome of E. coli: M Touchon et al. PLoS Genetics (2009) • Two modes of change: copy and insert; copy and replace

  4. Usual suspects are there but do not explain heterogeneity • Negative correlation with protein abundance: 2.5% of variation, P-value = $10^{-5}$ • Positive correlation with distance from origin of replication: 0.4% of variation, P-value = $10^{-2}$

  5. High SNP numbers are clustered along the chromosome

  6. Clonal Recombined

  7. P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)

  8. Clonal regions vs. recombined regions • SNPs by recombination / SNPs by clonal mutations: r/μ = 6 ± 1 • P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)

  9. Strains: K-12 vs ETEC-H10407, HS, O157-H7-Sakai • P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013) • Neutral model: mutations and recombinations among 70 “genes”, population of $10^4$ • C. Fraser et al. (2007) and (2009)

  10. Phase transition Δc=1.5% P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)

  11. P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)

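  12. Why exponential tail? • Time to coalescence: $\mathrm{Prob}(t) = \frac{1}{N_e}\left(1 - \frac{1}{N_e}\right)^{t-1} \approx \frac{1}{N_e}\exp(-t/N_e)$ → exponential slope $= 1/(2\mu N_e)$ or $1/\theta$ • Population size $N_e = (1 \pm 0.1) \times 10^9$, consistent with earlier estimates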
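
The geometric form of the coalescence time and its exponential limit can be checked numerically. The following is a minimal sketch, assuming two lineages coalesce with probability 1/Ne per generation and that sequence divergence grows as 2μt; the function names and the small Ne used here are illustrative choices, not values from the talk.

```python
import math

def coalescence_pmf(t: int, ne: float) -> float:
    """Exact geometric probability that two lineages coalesce at generation t."""
    return (1.0 / ne) * (1.0 - 1.0 / ne) ** (t - 1)

def exponential_approx(t: int, ne: float) -> float:
    """Large-Ne approximation of the same probability: (1/Ne) * exp(-t/Ne)."""
    return math.exp(-t / ne) / ne

def tail_slope(mu: float, ne: float) -> float:
    """Slope of the exponential tail in divergence, assuming divergence = 2*mu*t."""
    return 1.0 / (2.0 * mu * ne)

# Hypothetical population size, chosen only to keep the numbers readable.
NE = 1.0e4
for t in (1, 1000, 20000):
    print(t, coalescence_pmf(t, NE), exponential_approx(t, NE))
```

The two columns agree closely for every t once Ne is large, which is why the tail of the divergence distribution looks exponential with slope 1/(2μNe).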

  13. Why $N_e \ll N$? • Phages: but there are phages that cross species boundaries; also, the slope is similar for different species • Restriction-modification system: recombined segments are not continuous [Milkman R, Bridges MM. Genetics 1990] • Recombination efficiency: need 20-30 identical bases to start recombination; our slope predicts 60 bases, which roughly matches 30 at the beginning and 30 at the end • Species are defined by recombination

  14. Are our 30+ strains a representative sample? • Fully sequenced genomes: 1000s of genes (unbiased and complete), 10s of strains (biased) • MLST data: 10s of genes (biased), 1000s of strains (unbiased, I hope) • Database: http://mlst.ucc.ie • ∼3000 E. coli strains • 7 short regions of ~500 base pairs each in housekeeping genes

  15. [Figure legend: MLST data (•) vs. genomes (--)]

  16. Is it really phages? • 1 kb: gene length • K-12 to B comparison • Phage capacity: 20 kb • Other strains: up to 40 kb

  17. Does neutral model explain everything? • At 3 standard deviations: 19 1-kb regions are supervariable • 29 1-kb regions are superconserved

  18. Collaborators & funding • Bill Studier (BNL) • Purushottam Dixit (BNL) • Tin Yau Pang (Stony Brook) • Rich Lenski (Michigan State) • Patrick Daegelen (France) • Jinhyun Kim (Korea) • DOE Systems Biology Knowledgebase (KBase) • Adam Arkin (Berkeley) • Rick Stevens (Argonne) • Bob Cottingham (Oak Ridge) • Mark Gerstein (Yale) • Doreen Ware (Cold Spring Harbor) • Mike Schatz (Cold Spring Harbor) • Dave Weston (ORNL) • 60+ other collaborators

  19. Thank you!

  20. Genes encoded in bacterial genomes ~ Packages installed on Linux computers

  21. Complex systems have many components • Genes (Bacteria) • Software packages (Linux OS) • Components do not work alone: they need to be assembled to work • In individual systems only a subset of components is used • Genome (Bacteria): bag of genes • Computer (Linux OS): installed packages • Components have vastly different frequencies of use

  22. IKEA: has many components Justin Pollard, http://www.designboom.com

  23. They need to be assembled to work Justin Pollard, http://www.designboom.com

  24. Different frequencies of use: common vs. rare

  25. What determines the frequency of use? • Popularity, AKA preferential attachment: frequency ~ self-amplifying popularity • Relevant for social systems: WWW links, Facebook friendships, scientific citations • Functional role: frequency ~ breadth or importance of the functional role • Relevant for biological and technological systems where selection adjusts undeserved popularity

  26. Empirical data on component frequencies • Bacterial genomes (eggnog.embl.de): 500 sequenced prokaryotic genomes, 44,000 orthologous gene families • Linux packages (popcon.ubuntu.com): 200,000 Linux packages installed on 2,000,000 individual computers • Binary tables: component is either present or not in a given system
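
The binary presence/absence tables described above lead directly to component frequencies. Here is a minimal sketch, assuming a table stored as a dictionary from component name to a 0/1 row over systems; the component names and the four toy systems are hypothetical and are not taken from eggNOG or popcon.

```python
from typing import Dict, List

def component_frequencies(table: Dict[str, List[int]]) -> Dict[str, float]:
    """Fraction of systems (genomes or computers) in which each component is present."""
    return {name: sum(row) / len(row) for name, row in table.items()}

# Toy binary table: rows are components, columns are four hypothetical systems.
presence = {
    "core_component": [1, 1, 1, 1],
    "shell_component": [1, 0, 1, 0],
    "rare_component": [0, 0, 0, 1],
}

freqs = component_frequencies(presence)
for name, f in freqs.items():
    print(name, f)
```

A histogram of these frequencies over all components is the distribution P(f) discussed in the talk.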

  27. Frequency distributions • Categories: Cloud, Shell, Core, ORFans • $P(f) \sim f^{-1.5}$, except the top $\sqrt{N}$ “universal” components with $f \sim 1$

  28. How to quantify functional importance? • Components do not work alone • Breadth/importance ~ component is needed for proper functioning of other components • Dependency network: A → B means A depends on B for its function • Formalized for Linux software packages • For metabolic enzymes, given by upstream-downstream positions in pathways • Frequency ~ dependency degree, Kdep • Kdep = the total number of components that directly or indirectly depend on the selected one
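
The definition of Kdep above can be sketched as a graph traversal. This is a minimal illustration, assuming the dependency network is given as a dictionary mapping each component to the components it directly depends on; the component names are hypothetical, and breadth-first search is one of several equivalent ways to count indirect dependents.

```python
from collections import deque
from typing import Dict, List

def dependency_degrees(depends_on: Dict[str, List[str]]) -> Dict[str, int]:
    """Kdep for each component: how many others depend on it, directly or indirectly."""
    # Reverse the edges: for each component B, list the components that directly depend on B.
    dependents: Dict[str, List[str]] = {name: [] for name in depends_on}
    for a, targets in depends_on.items():
        for b in targets:
            dependents.setdefault(b, []).append(a)

    kdep: Dict[str, int] = {}
    for start in dependents:
        # Breadth-first search along reversed edges collects all direct and indirect dependents.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in dependents.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        kdep[start] = len(seen) - 1  # exclude the component itself
    return kdep

# Hypothetical toy network: "A depends on B" is stored as depends_on[A] containing B.
toy = {"base": [], "crypto": ["base"], "net": ["crypto", "base"], "app": ["net"]}
kdep = dependency_degrees(toy)
print(kdep)
```

In this toy network the component at the bottom of the dependency chain has the largest Kdep, matching the idea that broadly needed components should be the most frequent.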

  29. Frequency is positively correlated with functional importance • Correlation coefficient ~0.4 for both Linux and genes • Could be improved by using weighted dependency degree

  30. Tree-like metabolic network TCA cycle Kdep=15 Kdep=5

  31. Dependency degree distribution on a critical branching tree • $P(K) \sim K^{-1.5}$ for a critical branching tree • Paradox: $K_{max}^{-0.5} \sim 1/N \Rightarrow K_{max} = N^2 > N$ • Answer: parent tree size imposes a cutoff: there will be $\sqrt{N}$ “core” nodes with $K_{max} = N$, present in almost all systems (ribosomal genes or core metabolic enzymes) • Need a new model: in a tree $D = 1$, while in real systems $D \sim 2 > 1$
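
The paradox above rests on a standard extreme-value estimate, spelled out here as a worked step. It assumes the largest of N dependency degrees sits where the expected number of nodes above it is of order one.

```latex
\begin{align}
P(K_{dep} > K) &\sim \int_{K}^{\infty} k^{-1.5}\,dk \sim K^{-0.5}, \\
N \cdot P(K_{dep} > K_{max}) \sim 1
  \;&\Rightarrow\; K_{max}^{-0.5} \sim \frac{1}{N}
  \;\Rightarrow\; K_{max} \sim N^{2} > N .
\end{align}
```

Since no component can have more than N dependents, the power law has to be cut off at $K_{max} = N$, which is the cutoff the slide attributes to the parent tree size.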

  32. Dependency network evolution • New components added gradually over time • New component depends on D existing components selected randomly • $K_{dep}(t) \sim (t/N)^{-D}$ • $P(K_{dep}(t) > K) = P(t/N < K^{-1/D}) = K^{-1/D}$ • $P(K_{dep}) \sim K_{dep}^{-(1+1/D)} = K_{dep}^{-1.5}$ for $D = 2$ • $N_{universal} = N^{(D-1)/D} = N^{0.5}$ for $D = 2$
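
The growth rule above is simple enough to simulate directly. The following is a minimal sketch, assuming each new component picks up to D distinct existing components uniformly at random; the function name, the seed, and the network size are illustrative choices, and the talk's own simulation may differ in detail.

```python
import random
from typing import List, Set

def simulate_kdep(n: int, d: int = 2, seed: int = 0) -> List[int]:
    """Grow a dependency network of n components and return Kdep for each one."""
    rng = random.Random(seed)
    # ancestors[i] holds every component that i depends on, directly or indirectly.
    ancestors: List[Set[int]] = [set()]
    for t in range(1, n):
        # The new component depends on up to d existing components chosen at random.
        parents = rng.sample(range(t), min(d, t))
        anc = set(parents)
        for p in parents:
            anc |= ancestors[p]
        ancestors.append(anc)

    # Kdep of a component is the number of later components that list it as an ancestor.
    kdep = [0] * n
    for anc in ancestors:
        for a in anc:
            kdep[a] += 1
    return kdep

kdep = simulate_kdep(1000, d=2, seed=1)
near_universal = sum(1 for k in kdep if k >= 0.9 * (len(kdep) - 1))
print("largest Kdep:", max(kdep), "near-universal components:", near_universal)
```

Sorting `kdep` in decreasing order and plotting it against rank gives a Zipf plot that can be compared with the $K_{dep}^{-1.5}$ prediction for D = 2.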

  33. Kdep decreases with layer number • Linux vs. model with D=2

  34. Zipf plot for Kdep distributions • Metabolic enzymes vs. model • Linux vs. model

  35. Frequency distributions • Categories: Cloud, Core, Shell, ORFans • $P(f) \sim f^{-1.5}$, except the top $\sqrt{N}$ “universal” components with $f \sim 1$

  36. Why should we care about P(f)?

  37. Metagenomes and pan-genomes • For $P(f) \sim f^{-1.5}$: (pan-genome size) ~ (# of samples)$^{0.5}$ • The Human Microbiome Project Consortium, Nature (2012)
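
The square-root growth of the pan-genome can be checked numerically. This is a minimal sketch, assuming each component with frequency f appears independently in each sample, so the chance of seeing it at least once in n samples is 1 - (1 - f)^n; the lower cutoff `f_min`, the grid size, and the function names are assumptions made for illustration.

```python
import math

def pan_genome_size(n_samples: int, f_min: float = 1e-8,
                    alpha: float = 1.5, steps: int = 20000) -> float:
    """Expected number of distinct components in n_samples, up to a constant factor.

    Integrates f**(-alpha) * (1 - (1 - f)**n_samples) over f in [f_min, 1]
    with the trapezoid rule on a logarithmic grid (substitution f = exp(u)).
    """
    u_lo = math.log(f_min)
    du = (0.0 - u_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        f = min(math.exp(u_lo + i * du), 1.0)
        # The factor f from df = f du turns f**(-alpha) into f**(1 - alpha).
        value = f ** (1.0 - alpha) * (1.0 - (1.0 - f) ** n_samples)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * value * du
    return total

def scaling_exponent(n1: int, n2: int) -> float:
    """Effective exponent of pan-genome growth between n1 and n2 samples."""
    return math.log(pan_genome_size(n2) / pan_genome_size(n1)) / math.log(n2 / n1)

print(scaling_exponent(100, 1000))
```

The effective exponent comes out close to 0.5 and approaches it more closely as the number of samples grows.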

  38. Pan-genome of E. coli strains M Touchon et al. PLoS Genetics (2009)

  39. Genome evolution in E. coli • Studier FW, Daegelen P, Lenski RE, Maslov S, Kim JF, J. Mol. Biol. (2009) • P. Dixit, T. Y. Pang, Studier FW, Maslov S, submitted (2013)

  40. How many transcription factors does an organism need? • Regulator genes vs. worker genes • S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009) • TY Pang, S. Maslov, PLoS Comp Bio (2011)

  41. $N_R \sim N_G^2$, i.e. $N_R/N_G \sim N_G$ • Figure adapted from S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)

  42. Cyril Northcote Parkinson (1909-1993) • … bureaucracy grew by 5-7% per year “irrespective of any variation in the amount of work (if any) to be done.” • Why? “An official wants to multiply subordinates, not rivals” and “Officials make work for each other,” so that “Work expands so as to fill the time available for its completion” • Is this what happens in bacterial genomes? Probably not!

  43. Economies of scale in bacterial evolution • $N_R = N_G^2/80{,}000 \Rightarrow N_G/N_R = 80{,}000/N_G$ • Economies of scale: as the genome gets larger, new pathways get shorter
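
The scaling relation above is easy to evaluate for concrete genome sizes. This is a minimal sketch, assuming N_R counts regulator genes and N_G counts all genes (as the "regulator genes" and "worker genes" labels elsewhere in the talk suggest); the genome sizes in the loop are hypothetical round numbers, and the marginal quantity is simply the derivative of the stated formula.

```python
def regulator_count(n_genes: float, scale: float = 80_000.0) -> float:
    """N_R = N_G**2 / 80,000, the quadratic scaling quoted on the slide."""
    return n_genes ** 2 / scale

def genes_per_regulator(n_genes: float, scale: float = 80_000.0) -> float:
    """Average N_G / N_R = 80,000 / N_G."""
    return scale / n_genes

def marginal_genes_per_regulator(n_genes: float, scale: float = 80_000.0) -> float:
    """dN_G/dN_R = 40,000 / N_G: genes added per newly added regulator."""
    return scale / (2.0 * n_genes)

# Hypothetical genome sizes, chosen only to show the trend.
for n_genes in (1000, 4000, 8000):
    print(n_genes, regulator_count(n_genes),
          genes_per_regulator(n_genes), marginal_genes_per_regulator(n_genes))
```

Both ratios fall as the genome grows, which is the sense in which new pathways get shorter in larger genomes.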

  44. Horizontal gene transfer: entire pathways could be added in one step • Redundant enzymes are removed • Central metabolic core → anabolic pathways → biomass production (figure inputs labeled “nutrient”)

  45. Minimal metabolic pathways from reactions in KEGG database • Axes: NR vs. NG • Adapted from “scope-expansion” algorithm by R. Heinrich et al. • (# of pathways or their regulators) ~ (# of enzymes)$^2$
