
Causes of insertion sequences abundance in prokaryotic genomes? A problem of size

Marie Touchon, E.P.C. Rocha. Atelier de BioInformatique, Université Pierre et Marie Curie, Paris; Unité Génétique des Génomes Bactériens, Institut Pasteur, Paris. mtouchon@pasteur.fr


Presentation Transcript


  1. Causes of insertion sequences abundance in prokaryotic genomes? A problem of size. Marie Touchon, E.P.C. Rocha. Atelier de BioInformatique, Université Pierre et Marie Curie, Paris; Unité Génétique des Génomes Bactériens, Institut Pasteur, Paris. mtouchon@pasteur.fr

  2. IS elements: the simplest form of transposable elements
     - 700 to 2500 bp
     - code only for the information allowing their mobility
     Ability to generate mutations:
     - by insertion within genes
     - by activating genes upon insertion upstream
     - by generating extensive DNA rearrangements
     They have been found to shuttle the transfer of adaptive traits such as:
     - antibiotic resistance
     - virulence
     - new metabolic capabilities
     Their exact nature is still debated: selfish or advantageous?
     - genomic parasites
     - beneficial agents

  3. Causes of insertion sequence abundance in prokaryotic genomes? The reasons are largely unknown and widely speculated upon.
     Hypotheses:
     - IS family specificity
     - Genome size
     - Frequency of horizontal gene transfer
     - Pathogenicity
     - Type of ecological association
     - Human sedentarisation
     The current availability of hundreds of genomes makes many of these hypotheses testable.

  4. IS element identification
     Problem: IS annotations are heterogeneous, inaccurate or insufficient.
     Solution: reannotation of ISs by comparative analysis, adopting the nomenclature defined by Chandler (1998):
     - ISs have one or two consecutive ORFs encoding the transposase
     - ISs are grouped into 21 distinct families

  5. IS reannotation
     (1) IS CDS detection: all annotated CDS of each genome are searched against the IS database (Chandler et al.).
     (2) IS element reconstitution. Diagram examples of consecutive CDS hits: IS1A-IS1B; IS1A-IS21A-IS21B-IS1B; IS1A-IS3A-IS3B-IS1A. Reconstituted elements shown: IS1, IS1, IS1, IS21, IS3.
     (3) Classification as complete or partial. Partial elements: IS fragments (> 20% length difference) and ISs with an internal insertion.
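Step (3) of the reannotation can be illustrated with a minimal sketch. It assumes the 20% criterion is measured relative to the length of a reference element of the same family; the slide does not say how the difference is computed, so the function name, its parameters and that choice are all hypothetical.

```python
def classify_is(observed_len_bp, reference_len_bp,
                has_internal_insertion=False, max_len_diff=0.20):
    """Label a reconstituted IS element as 'complete' or 'partial'.

    Assumption (not stated on the slide): the length difference is taken
    relative to a reference element of the same IS family.
    """
    # An IS interrupted by another element is counted as partial.
    if has_internal_insertion:
        return "partial"
    # Fragments differ from the reference length by more than 20%.
    len_diff = abs(observed_len_bp - reference_len_bp) / reference_len_bp
    return "partial" if len_diff > max_len_diff else "complete"


# Hypothetical lengths, for illustration only.
print(classify_is(1000, 1200))                               # within 20%
print(classify_is(700, 1200))                                # fragment
print(classify_is(1200, 1200, has_internal_insertion=True))  # interrupted
```

Measuring the difference against the reference length is only one possible convention; the longer of the two lengths would be an equally plausible denominator.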

  6. IS reannotation: reassessment
     (1) 262 genomes. Plot: number of detected IS CDS against number of annotated IS CDS; labelled point: Shigella flexneri. Y = 0.77 (0.02) X + 5.86 (1.89); R2 = 0.81 (P < 0.0001); R = 0.95 (P < 0.0001). Diagram counts: 1194 (11%), 8823 (89%), 2115 (22%).
     (2) 8123 IS elements; 83% are complete (may be active).
     (3) Only 20% (1994) of GenBank ISs had a consistent classification.
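The slide reports a fitted line and an R2 for detected against annotated IS CDS. The sketch below shows how such a fit can be computed by ordinary least squares. It assumes simple linear regression, and the data points are invented for illustration; they are not the per-genome counts from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical (annotated, detected) IS CDS counts per genome.
annotated = [0, 10, 40, 120, 300]
detected = [5, 14, 35, 100, 240]
slope, intercept, r2 = fit_line(annotated, detected)
print(f"Y = {slope:.2f} X + {intercept:.2f}, R2 = {r2:.2f}")
```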

  7. Distribution of ISs in 262 genomes
     Histograms: number of genomes by number of ISs, and by number of IS families. Labelled genomes: Sulfolobus solfataricus (archaebacteria), Bacillus halodurans (firmicute), Nitrobacter winogradskyi (proteobacteria), Bordetella pertussis (proteobacteria), Shigella sonnei (proteobacteria).
     The absence of ISs is not anecdotal:
     - 24% of genomes lack ISs
     - 48% of genomes have [0-10] ISs
     High variability of the number of ISs per genome and of the number of IS families per genome.

  8. Association with phylogenetic inertia. Rapid dynamics of gain and loss: the number of ISs evolves so fast that there is no historical correlation.

  9. The effect of IS family specificity. Figure labels: Firmicute; Proteo; Proteo 100%; Entero 90%. Incongruent phylogenetic trees. High diversity of ISs found within strains or closely related species.

  10. The effect of IS family specificity: examples
     - Pseudomonas syringae pv. tomato (139 ISs): 10 IS3, 42 IS5, 23 IS21, 40 IS66, 10 IS1111, 13 ISNCY, 1 IS91
     - Pseudomonas syringae pv. syringae (18 ISs): 14 IS3, 1 IS5, 1 IS66, 1 IS110, 1 IS630
     - Pseudomonas syringae pv. phaseolicola (116 ISs): 7 IS3, 43 IS5, 7 IS21, 2 IS66, 1 IS1111, 1 ISNCY, 3 IS91, 52 IS256
     This effect is unlikely to explain the variability of ISs.

  11. The effect of genome size. Wilcoxon test: p < 0.0001. Spearman's r = 0.63, p < 0.0001. Group sizes: N = 64 and 198. Strong association between genome size and IS number (and density): the larger the genome, the more IS elements it contains.
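The association above is summarised by Spearman's rank correlation, which is Pearson's correlation applied to ranks. Here is a minimal sketch using only the standard library; the genome sizes and IS counts are invented for illustration and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical genome sizes (Mb) and IS counts.
genome_size_mb = [0.6, 1.8, 2.5, 4.6, 5.2, 6.3]
is_count = [0, 3, 0, 52, 30, 139]
print(round(spearman(genome_size_mb, is_count), 2))
```

A rank-based statistic is a natural choice here because IS counts are strongly skewed, with many genomes at or near zero.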

  12. The effect of horizontal gene transfer
     Sources of HGT regions: Prophage database (Nestle; Casjens, 2003), HGT database (Garcia-Vallve, 2003), and strain-specific regions derived from lists of orthologs between strains (A, B, C).
     Putative orthologs: reciprocal best hits, proteins with > 90% similarity and < 20% length difference.
     Strain-specific region: a region exclusive to one strain, containing at least ten consecutive genes without an ortholog. Example shown: E. coli O157:H7 Sakai.
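The two definitions on this slide translate into a short procedure, sketched below under stated assumptions. The length difference is taken relative to the longer protein, genes are treated as a linear ordered list (a circular chromosome would need a wrap-around check), and all function names and inputs are hypothetical rather than the authors' pipeline.

```python
def is_putative_ortholog(similarity, len_a, len_b,
                         min_similarity=0.90, max_len_diff=0.20):
    """Apply the > 90% similarity and < 20% length-difference thresholds.

    Assumption: the length difference is relative to the longer protein.
    """
    len_diff = abs(len_a - len_b) / max(len_a, len_b)
    return similarity > min_similarity and len_diff < max_len_diff


def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {(a, b) for a, b in best_a_to_b.items()
            if best_b_to_a.get(b) == a}


def strain_specific_regions(has_ortholog, min_run=10):
    """Find runs of at least `min_run` consecutive genes lacking an ortholog.

    `has_ortholog` is a list of booleans in chromosomal gene order.
    Returns (first_index, last_index) pairs, both inclusive.
    """
    regions = []
    start = None
    for i, flag in enumerate(has_ortholog):
        if not flag:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the gene list.
    if start is not None and len(has_ortholog) - start >= min_run:
        regions.append((start, len(has_ortholog) - 1))
    return regions


# Hypothetical gene order: 3 shared, 10 specific, 2 shared, 9 specific.
flags = [True] * 3 + [False] * 10 + [True] * 2 + [False] * 9
print(strain_specific_regions(flags))
```

Only the first run qualifies in this toy input, because the second one is nine genes long and falls short of the ten-gene minimum.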

  13. The effect of horizontal gene transfer. Spearman's r = 0.31, p > 0.1 (NS). Wilcoxon test: p < 0.0001. t-test: p < 0.001. Figure values: 11.4%, 5.2%.
     - Genomes lacking ISs have less HGT
     - ISs are ~4 times more concentrated in HGT regions
     - HGT may be a determinant of the presence of ISs, but not of their abundance

  14. The effect of horizontal gene transfer. Spearman's r = 0.84, p < 0.0001.
     - IS family diversity in HGT regions is almost as high as in the entire genome
     - HGT is a necessary but not sufficient condition for the presence of ISs
     - The intensity of HGT is not a significant determinant of IS abundance

  15. The effect of pathogenicity. Examples: Yersinia pestis (plague); Shigella flexneri and S. sonnei (dysentery); Bordetella pertussis (whooping cough). Wilcoxon test: p < 0.001. Wilcoxon test: p > 0.5. Figure values: 3.6, 4.3; N = 100, 153; IS = 0: 8%, 17%, 55%, 100%.
     - No association between the presence of ISs and pathogenicity
     - Strong association between the frequency of ISs and the facultative character of the ecological associations

  16. The effect of the type of ecological association. Stepwise multiple regression; genomes lacking ISs were removed (possibly under sexual isolation). Kruskal-Wallis test: p > 0.5 (NS). Figure axis: number of ISs.
     Covariate                Cumulative R2
     Genome size              0.40
     Ecological association   0.47
     Frequency of HGT         0.47
     Genome size is the most important variable; lifestyle is not a significant determinant.

  17. The effect of human sedentarisation (Mira et al., 2006)
     1) Genomes with many ISs are from prokaryotes associated with humans or with domesticated animals and plants.
     2) Large intra-genomic IS expansions are recent.
     Kruskal-Wallis test: p > 0.5 (NS). Figure categories: not, indirectly, directly.
     No evidence that human-related prokaryotes have more ISs.

  18. Genome size explains ~40% of the variance in IS abundance. The smaller the genome, the lower the number and also the lower the density of ISs.
     - Selection could favor small genomes: optimal use of resources and replication time (an increase in genome size caused by ISs could be counter-selected). Wilcoxon test: p < 0.05. Genomes with fewer ISs correspond to the slowest-growing prokaryotes. Figure: density of ISs (/Mb) for fast and slow growth.
     - ISs are selected to generate genetic variation (such selection should be stronger in larger genomes).

  19. One explanation fits the available data well: selection against transposition in genomes with a higher density of deleterious transposition targets.
     - Transposition inactivates genes with high probability
     - Total number of essential genes: ~300, plus 200-300 genes that are nearly ubiquitous, giving about 500 nearly essential genes
     The abundance of IS elements in genomes could be mostly a question of space for transposition events that are not highly deleterious.
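The "question of space" argument can be made concrete with a back-of-the-envelope calculation: if roughly 500 nearly essential genes are present whatever the genome size, they occupy a larger share of a small genome. The sketch below assumes a mean gene length of 1000 bp and uniformly random insertion; both are illustrative assumptions, not values from the slides.

```python
def essential_target_fraction(genome_size_bp, n_essential=500,
                              mean_gene_len_bp=1000):
    """Fraction of the genome covered by nearly essential genes.

    Assumptions (hypothetical): a fixed number of nearly essential genes,
    a mean gene length of 1000 bp, and insertion uniform along the genome,
    so this fraction approximates the chance that a random transposition
    event lands in a nearly essential gene.
    """
    return min(1.0, n_essential * mean_gene_len_bp / genome_size_bp)


for size_mb in (1, 2, 5, 10):
    fraction = essential_target_fraction(size_mb * 1_000_000)
    print(f"{size_mb:>2} Mb genome: {fraction:.0%} of sites in nearly essential genes")
```

Under these assumptions the fraction drops from one half in a 1 Mb genome to one tenth in a 5 Mb genome, which is the direction of the effect the slide describes.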

  20. Conclusions
     • High diversity of ISs found within strains or closely related species
     • The number of ISs evolves so fast that there is no historical correlation
     • HGT may be a determinant of the presence of ISs, but not of their abundance
     • Surprisingly, genome size alone is the best predictor of IS number and density
     • Selection against transposition in genomes with a higher density of deleterious transposition targets

  21. Impacts of IS abundance? IS expansion:
     - increases the rate of genome rearrangements
     - increases the number of pseudogenes
     Figures: % of breakpoints coinciding with an IS (observed and expected) against number of ISs, with Bordetella parapertussis and Bordetella bronchiseptica labelled; O/E of R gene/intergene against number of ISs.

  22. Acknowledgements E.P.C Rocha A. Danchin Institut Pasteur La Région Ile de France

  23. Examples
     - Pseudomonas syringae pv. syringae (= 18 ISs): 14 IS3, 1 IS5, 1 IS630, 1 IS66, 1 IS110
     - Nitrobacter winogradskyi (= 117 ISs): 37 IS3, 32 IS5, 27 IS630, 2 IS21, 14 IS481, 4 ISNCY
     - Shigella sonnei (= 372 ISs): 107 IS3, 157 IS1, 16 IS630, 33 IS4, 25 IS21, 1 IS66, 1 IS91, 18 IS110, 3 IS605, 3 IS1111, 4 ISAs1, 2 ISNCY

  24. Association with stability? Large repeats decrease genome stability. Figure: stability against density of repeats (Rocha, Trends Genetics, 03).

  25. But not IS elements? Figure: stability against number of ISs.

  26. Association with phylogenetic inertia? The number of ISs evolves so fast that there is no historical correlation.

  27. Two scenarios: beneficial agents or genomic parasites. Diagram labels: acquisition (+IS), expansion (+IS), deletion (-IS), lineage loss.

  28. Association with lifestyle?
     - Burkholderia pseudomallei: 36, facultative pathogen
     - Burkholderia mallei: 152, obligatory pathogen
     - Escherichia coli K12: 52, commensal
     - Shigella flexneri: 298, obligatory pathogen
     - Bordetella bronchiseptica: 2, facultative pathogen
     - Bordetella pertussis: 247, obligatory pathogen
     -> Link with lifestyle: host restriction, niche change, ...

  29. Association with recent rearrangements? Figure: % of breakpoints coinciding with an IS (observed and expected) against number of ISs, for Bordetella parapertussis, Bordetella bronchiseptica, Yersinia pestis and Yersinia pseudotuberculosis. IS expansion promoted frequent genomic rearrangements.

  30. Association with recent rearrangements? Genome comparisons shown: B. pertussis, Bordetella parapertussis and B. bronchiseptica; S. enterica serovar Typhi, S. enterica Typhimurium, Shigella flexneri and E. coli K12. Similarity labels: 99%, 99%, 99%, 99%, 90%. IS counts shown: 32 ISs, 247 ISs. IS expansion increases the rate of genome rearrangements.

  31. Association with pseudogenes? Diagram labels: genomes A and B; orthologs Or1/Or1' and Or2/Or2'; IS; intergenic region. Quantities compared: number of ISs in genes and number of ISs in intergenes.

  32. Association with pseudogenes? R_pseudo = (number of ISs in genes) / (number of ISs in intergenes). Figure: O/E of R_pseudo against number of ISs. IS expansion increases the number of pseudogenes.
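The ratio defined on this slide, and an observed/expected (O/E) version of it, can be sketched as follows. The slides do not say how the expected value was obtained; this example assumes uniformly random insertion, so that the expected ratio equals coding length over intergenic length. That choice, the function names and the numbers are all hypothetical.

```python
def r_pseudo(n_is_in_genes, n_is_in_intergenes):
    """R_pseudo = number of ISs in genes / number of ISs in intergenes."""
    return n_is_in_genes / n_is_in_intergenes


def observed_over_expected(n_is_in_genes, n_is_in_intergenes,
                           coding_bp, intergenic_bp):
    """O/E of R_pseudo under an assumed uniform-insertion null model.

    Assumption: with insertions uniform along the genome, the expected
    gene/intergene ratio is coding_bp / intergenic_bp.
    """
    observed = r_pseudo(n_is_in_genes, n_is_in_intergenes)
    expected = coding_bp / intergenic_bp
    return observed / expected


# Hypothetical genome: 88% coding, with 10 ISs in genes and 10 in intergenes.
oe = observed_over_expected(10, 10, coding_bp=880_000, intergenic_bp=120_000)
print(round(oe, 3))
```

An O/E value below 1 under this null model would mean that ISs are found in genes less often than random insertion predicts.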

  33. Conclusions
     High variability:
     - of the number of ISs per genome
     - of the number of IS families per genome
     - of the number of IS copies per family
     ISs have been recently acquired (HGT).
     IS expansion:
     - is associated with lifestyle or niche change
     - increases the rate of genome rearrangements
     - increases the number of pseudogenes
     Diagram labels: acquisition (+IS), expansion (+IS), deletion (-IS), lineage loss.

  34. Conclusions
     - ISs are frequent but not ubiquitous
     - IS number and families vary widely
     - Lack of association between stability and the number of ISs
     - The presence of ISs is associated with lifestyle
     - IS expansion increases the rate of genome rearrangements
     - IS expansion increases the number of pseudogenes
     Diagram labels: beneficial agents, genomic parasites.

  35. How many ISs? Histograms: number of genomes by number of ISs, and number of genomes by number of IS families. High variability of the number of ISs per genome and of the number of IS families per genome.

  36. How many ISs? Figure: log(number of ISs per genome) by IS family; number of ISs and number of IS families.
     - B. pertussis: 16 IS110, 229 IS481
     - S. sonnei: 157 IS1, 106 IS3, 33 IS4, 25 IS21
     - S. flexneri: 112-108 IS1, 126-124 IS3, 34-22 IS4
     High variability of the number of IS families per genome and of the number of ISs per family.

  37. Hypothesis I: ISs induce short spikes of instability, which are averaged out in a deep phylogenetic analysis.

  38. Hypothesis II: invasions of highly replicative ISs lead to deleterious instability and lineage loss.
