Causes of insertion sequences abundance in prokaryotic genomes? A problem of size
Marie Touchon, E.P.C. Rocha
Atelier de BioInformatique, Université Pierre et Marie Curie, Paris
Unité Génétique des Génomes Bactériens, Institut Pasteur, Paris
mtouchon@pasteur.fr
IS elements: the simplest form of transposable elements
- 700 to 2500 bp
- encode only the information required for their own mobility
Ability to generate mutations:
- by insertion within genes
- by activating genes when inserted upstream of them
- by generating extensive DNA rearrangements
Found to mediate the transfer of adaptive traits such as:
- antibiotic resistance
- virulence
- new metabolic capabilities
Their exact nature is still debated: selfish or advantageous?
- genomic parasites
- beneficial agents
Causes of insertion sequence abundance in prokaryotic genomes?
Reasons are largely unknown and widely speculated upon.
Hypotheses:
- IS family specificity
- genome size
- frequency of horizontal gene transfer (HGT)
- pathogenicity
- type of ecological association
- human sedentarisation
The current availability of hundreds of genomes makes many of these hypotheses testable.
IS element identification
Problem: IS annotations are heterogeneous, inaccurate or insufficient.
Solution: reannotation of ISs by comparative analysis, adopting the nomenclature defined by Chandler (1998):
- ISs have one or two consecutive ORFs encoding the transposase
- ISs are grouped into 21 distinct families
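IS reannotation
(1) IS CDS detection: all annotated CDS of each genome are compared against the IS database of Chandler et al.
(2) IS element reconstitution: consecutive IS CDS are assembled into elements. Examples: the CDS series IS1A-IS1B, IS1A-IS21A-IS21B-IS1B and IS1A-IS3A-IS3B-IS1A yield the elements IS1, IS1, IS1, IS21 and IS3.
(3) Classification into complete or partial ISs: fragments (> 20% length difference) and ISs with an internal insertion are counted as partial elements.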
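The classification in step (3) can be sketched as a length comparison against the reference element. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `classify_is_element`, its parameters, and measuring the length difference relative to the reference length are all hypothetical. Only the 20% threshold comes from the slide, and treating internally interrupted ISs as partial is our reading of it.

```python
def classify_is_element(observed_len: int, reference_len: int,
                        has_internal_insertion: bool = False,
                        max_rel_diff: float = 0.20) -> str:
    """Hypothetical helper: label a reconstituted IS element 'complete' or 'partial'.

    Assumption: the length difference is taken relative to the reference
    element from the IS database.
    """
    if reference_len <= 0:
        raise ValueError("reference_len must be positive")
    # An IS interrupted by another inserted element is treated as partial.
    if has_internal_insertion:
        return "partial"
    rel_diff = abs(observed_len - reference_len) / reference_len
    # More than 20% length difference: the element is a fragment (partial).
    return "partial" if rel_diff > max_rel_diff else "complete"


# Hypothetical lengths in bp.
print(classify_is_element(760, 768))        # complete
print(classify_is_element(500, 768))        # partial (about 35% shorter)
print(classify_is_element(770, 768, True))  # partial (internal insertion)
```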
IS reannotation: reassessment
(1) 262 genomes. Regression between the numbers of annotated and detected IS CDS per genome: Y = 0.77 (0.02) X + 5.86 (1.89); R² = 0.81 (P < 0.0001); R = 0.95 (P < 0.0001).
[Figure: number of detected vs. annotated IS CDS per genome, Shigella flexneri highlighted; counts shown: 1194 (11%), 8823 (89%), 2115 (22%)]
(2) 8123 IS elements, 83% of which are complete (and may be active).
(3) Only 20% (1994) of GenBank ISs had a consistent classification.
Distribution of ISs in 262 genomes
[Figure: number of genomes vs. number of ISs and vs. number of IS families; labelled examples: Sulfolobus solfataricus (archaeon), Bacillus halodurans (firmicute), Nitrobacter winogradskyi, Bordetella pertussis and Shigella sonnei (proteobacteria)]
The absence of ISs is not anecdotal: 24% of genomes lack ISs, and 48% of genomes have [0-10] ISs.
High variability of the number of ISs per genome and of the number of IS families per genome.
Association with phylogenetic inertia
Rapid dynamics of gain and loss: the number of ISs evolves so fast that there is no historical correlation.
The effect of IS family specificity
[Figure labels: Firmicutes; Proteobacteria; Proteobacteria 100%; Enterobacteria 90%]
Incongruent phylogenetic trees.
High diversity of ISs within strains or closely related species.
The effect of IS family specificity: examples
Pseudomonas syringae tomato: 10 IS3 + 42 IS5 + 23 IS21 + 40 IS66 + 10 IS1111 + 13 ISNCY + 1 IS91 = 139 ISs
Pseudomonas syringae syringae: 14 IS3 + 1 IS5 + 1 IS66 + 1 IS110 + 1 IS630 = 18 ISs
Pseudomonas syringae pv. phaseolicola: 7 IS3 + 43 IS5 + 7 IS21 + 2 IS66 + 1 IS1111 + 1 ISNCY + 3 IS91 + 52 IS256 = 116 ISs
This effect is unlikely to explain the variability in the number of ISs.
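The totals for the three P. syringae strains can be re-derived from the per-family counts. The short script below only re-adds the numbers listed on this slide; the dictionary layout and the idea of intersecting family sets are illustrative choices, not part of the original analysis. It makes the point concrete: three strains of one species carry 139, 18 and 116 ISs, with only three families shared by all of them.

```python
# IS copies per family in three P. syringae strains, as listed on the slide.
strains = {
    "P. syringae tomato": {"IS3": 10, "IS5": 42, "IS21": 23, "IS66": 40,
                           "IS1111": 10, "ISNCY": 13, "IS91": 1},
    "P. syringae syringae": {"IS3": 14, "IS5": 1, "IS66": 1, "IS110": 1, "IS630": 1},
    "P. syringae pv. phaseolicola": {"IS3": 7, "IS5": 43, "IS21": 7, "IS66": 2,
                                     "IS1111": 1, "ISNCY": 1, "IS91": 3, "IS256": 52},
}

# Total ISs and number of distinct families per strain.
totals = {name: sum(families.values()) for name, families in strains.items()}
for name, families in strains.items():
    print(f"{name}: {totals[name]} ISs in {len(families)} families")

# Families present in all three strains.
shared = set.intersection(*(set(families) for families in strains.values()))
print("Shared by all strains:", sorted(shared))  # ['IS3', 'IS5', 'IS66']
```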
The effect of genome size
Wilcoxon test: p < 0.0001; Spearman's r = 0.63, p < 0.0001; N = 64 and 198.
Strong association between genome size and IS number (and density): the larger the genome, the more IS elements it contains.
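Spearman's r, used here for genome size vs. IS number, is the Pearson correlation computed on ranks, with tied values (such as the many genomes with zero ISs) given the mean of the ranks they span. The stdlib-only sketch below illustrates that definition; the genome sizes and IS counts are made up, and it returns no p-value. For real analyses, `scipy.stats.spearmanr` computes both the coefficient and a p-value.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical genome sizes (Mb) and IS counts, two genomes without ISs.
sizes_mb = [1.1, 2.0, 3.5, 4.6, 5.3, 6.5]
is_counts = [0, 0, 12, 8, 40, 35]
print(round(spearman(sizes_mb, is_counts), 2))  # 0.87
```

A constant input (for example, all genomes lacking ISs) has zero rank variance, so the coefficient is undefined and this sketch raises `ZeroDivisionError`.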
The effect of horizontal gene transfer
Sources of HGT regions: strain-specific regions, the Prophage database (Nestlé; Casjens, 2003) and the HGT database (Garcia-Vallvé, 2003).
Putative orthologs: reciprocal best hits between proteins with > 90% similarity and < 20% length difference.
Strain-specific region: a region exclusive to one strain, comprising at least ten consecutive genes without an ortholog.
[Figure: lists of orthologs between strains A, B and C used to delimit a strain A-specific region; example: E. coli O157:H7 Sakai]
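The two definitions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the best-hit tables are assumed to come from a prior similarity search that is not shown, and `Hit`, `reciprocal_best_hits` and `specific_regions` are hypothetical names. Measuring the length difference relative to the longer protein is also an assumption. The thresholds (> 90% similarity, < 20% length difference, at least ten consecutive genes) come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Hit:
    """Best hit of a query protein in another proteome (hypothetical record)."""
    target: str
    similarity: float  # percent similarity, 0-100
    query_len: int     # query protein length (aa)
    target_len: int    # target protein length (aa)


def reciprocal_best_hits(best_ab: Dict[str, Hit], best_ba: Dict[str, Hit],
                         min_similarity: float = 90.0,
                         max_len_diff: float = 0.20) -> List[Tuple[str, str]]:
    """Putative orthologs: reciprocal best hits passing similarity and length filters."""
    pairs = []
    for gene_a, hit in best_ab.items():
        back = best_ba.get(hit.target)
        if back is None or back.target != gene_a:
            continue  # not a reciprocal best hit
        # Assumption: length difference relative to the longer protein.
        longer = max(hit.query_len, hit.target_len)
        len_diff = abs(hit.query_len - hit.target_len) / longer
        if hit.similarity > min_similarity and len_diff < max_len_diff:
            pairs.append((gene_a, hit.target))
    return pairs


def specific_regions(has_ortholog: List[bool], min_genes: int = 10) -> List[Tuple[int, int]]:
    """Runs of >= min_genes consecutive genes lacking an ortholog in any other strain.

    has_ortholog lists genes in genomic order; returns inclusive index ranges.
    """
    regions = []
    start: Optional[int] = None
    for i, flag in enumerate(has_ortholog + [True]):  # sentinel closes a trailing run
        if not flag and start is None:
            start = i
        elif flag and start is not None:
            if i - start >= min_genes:
                regions.append((start, i - 1))
            start = None
    return regions


best_ab = {"a1": Hit("b1", 95.0, 300, 310), "a2": Hit("b2", 85.0, 300, 300)}
best_ba = {"b1": Hit("a1", 95.0, 310, 300), "b2": Hit("a2", 85.0, 300, 300)}
print(reciprocal_best_hits(best_ab, best_ba))  # [('a1', 'b1')]

genes = [True] * 3 + [False] * 10 + [True] + [False] * 5
print(specific_regions(genes))  # [(3, 12)]
```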
The effect of horizontal gene transfer
Spearman's r = 0.31, p > 0.1 (NS); Wilcoxon test: p < 0.0001; t-test: p < 0.001; figure values: 11.4% vs. 5.2%.
Genomes lacking ISs show less HGT.
ISs are ~4 times more concentrated in HGT regions.
HGT may be a determinant of the presence of ISs, but not of their abundance.
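One way to read "~4 times more concentrated" is as a ratio of IS densities (ISs per Mb) inside vs. outside HGT regions. The sketch below assumes that reading, which the slide does not spell out. The function name `is_enrichment` and the example numbers are hypothetical.

```python
def is_enrichment(is_in_hgt: int, hgt_mb: float,
                  is_outside: int, outside_mb: float) -> float:
    """Ratio of IS density (ISs per Mb) inside HGT regions to the density elsewhere."""
    if hgt_mb <= 0 or outside_mb <= 0:
        raise ValueError("region sizes must be positive")
    density_outside = is_outside / outside_mb
    if density_outside == 0:
        return float("inf")  # ISs found only in HGT regions
    return (is_in_hgt / hgt_mb) / density_outside


# Hypothetical 5 Mb genome: 0.5 Mb of HGT regions with 20 ISs,
# and 45 ISs in the remaining 4.5 Mb.
print(is_enrichment(20, 0.5, 45, 4.5))  # 4.0
```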
The effect of horizontal gene transfer
Spearman's r = 0.84, p < 0.0001
IS family diversity in HGT regions is almost as high as in the entire genome.
HGT is a necessary but not sufficient condition for the presence of ISs.
The intensity of HGT is not a significant determinant of IS abundance.
The effect of pathogenicity
[Figure examples: Yersinia pestis (plague), Shigella flexneri and S. sonnei (dysentery), Bordetella pertussis (whooping cough)]
Wilcoxon test: p < 0.001; Wilcoxon test: p > 0.5; figure values: 3.6 and 4.3; N = 100 and 153; genomes with IS = 0: 8%, 17%, 55%, 100%.
No association between the presence of ISs and pathogenicity.
Strong association between the frequency of ISs and the facultative character of the ecological association.
The effect of the type of ecological association
Stepwise multiple regression on the number of ISs. We removed genomes lacking ISs (possibly under sexual isolation).
Covariate: cumulative R²
- genome size: 0.40
- ecological association: 0.47
- frequency of HGT: 0.47
Kruskal-Wallis test: p > 0.5 (NS)
Genome size is the most important variable; lifestyle is a non-significant determinant.
The effect of human sedentarisation (Mira et al., 2006)
1) Genomes with many ISs belong to prokaryotes associated with humans or with domesticated animals and plants.
2) Large intragenomic IS expansions are recent.
Kruskal-Wallis test across prokaryotes not, indirectly or directly associated with humans: p > 0.5 (NS).
No evidence that human-associated prokaryotes have more ISs.
Genome size explains ~40% of the variance in IS abundance
The smaller the genome, the lower the number but also the lower the density of ISs. Possible explanations:
- Selection could favor small genomes: optimal use of resources and replication time (an increase in genome size caused by ISs could be counter-selected).
  Wilcoxon test: p < 0.05. Genomes with fewer ISs correspond to the slowest-growing prokaryotes. [Figure: density of ISs (/Mb) in fast- vs. slow-growing prokaryotes]
- ISs are selected to generate genetic variation (such selection should be stronger in larger genomes).
One explanation fits the available data well
- Selection against transposition in genomes with a higher density of deleterious transposition targets: transposition inactivates genes with high probability.
- The total number of essential genes is ~300, plus 200-300 genes that are nearly ubiquitous: ~500 nearly essential genes.
The abundance of IS elements in genomes could be mostly a question of space for transposition events that are not highly deleterious.
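The "question of space" argument can be made concrete with a back-of-the-envelope calculation: if roughly 500 nearly essential genes are a fixed target, the chance that a random insertion hits one of them falls as genome size grows. This sketch is illustrative only. Uniform insertion along the genome and a mean gene length of 1 kb are assumptions that do not come from the slides, and `essential_target_fraction` is a hypothetical name. The ~500 genes figure is taken from the slide.

```python
def essential_target_fraction(genome_bp: float, n_essential: int = 500,
                              mean_gene_bp: float = 1000.0) -> float:
    """Fraction of uniformly random insertion sites falling inside essential genes.

    Assumptions (not from the slides): insertion sites are uniform along the
    genome, and essential genes have a mean length of mean_gene_bp.
    """
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    target_bp = n_essential * mean_gene_bp
    return min(1.0, target_bp / genome_bp)


for size_mb in (0.6, 1.0, 2.0, 5.0, 10.0):
    f = essential_target_fraction(size_mb * 1e6)
    print(f"{size_mb:>4} Mb genome: {f:.0%} of random insertions hit an essential gene")
```

Under these assumptions, about half of all insertions would be lethal or strongly deleterious in a 1 Mb genome, but only about a tenth in a 5 Mb genome. This leaves more room for tolerated insertions in larger genomes.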
Conclusions
• High diversity of ISs within strains or closely related species
• The number of ISs evolves so fast that there is no historical correlation
• HGT may be a determinant of the presence of ISs, but not of their abundance
• Surprisingly, genome size alone is the best predictor of IS number and density
• Selection against transposition in genomes with a higher density of deleterious transposition targets
Impacts of IS abundance?
IS expansion:
- increases the rate of genome rearrangements
- increases the number of pseudogenes
[Figure: % of breakpoints coinciding with ISs (observed vs. expected) and O/E R gene/intergene, each vs. number of ISs; examples: Bordetella parapertussis, Bordetella bronchiseptica]
Acknowledgements E.P.C Rocha A. Danchin Institut Pasteur La Région Ile de France
Examples
Pseudomonas syringae syringae = 18 ISs: 14 IS3, 1 IS5, 1 IS630, 1 IS66, 1 IS110
Nitrobacter winogradskyi = 117 ISs: 37 IS3, 32 IS5, 27 IS630, 2 IS21, 14 IS481, 4 ISNCY
Shigella sonnei = 372 ISs: 107 IS3, 157 IS1, 16 IS630, 33 IS4, 25 IS21, 1 IS66, 1 IS91, 18 IS110, 3 IS605, 3 IS1111, 4 ISAs1, 2 ISNCY
Association with stability?
Large repeats decrease genome stability.
[Figure: stability vs. density of repeats] (Rocha, Trends in Genetics, 2003)
But not IS elements?
[Figure: stability vs. number of ISs]
Association with phylogenetic inertia?
The number of ISs evolves so fast that there is no historical correlation.
Two scenarios: ISs as beneficial agents or as genomic parasites
[Diagram stages: IS acquisition (+IS), IS expansion (+IS), IS deletion (-IS), lineage loss]
Association with lifestyle?
- Burkholderia pseudomallei: 36 ISs, facultative pathogen
- Burkholderia mallei: 152 ISs, obligatory pathogen
- Escherichia coli K12: 52 ISs, commensal
- Shigella flexneri: 298 ISs, obligatory pathogen
- Bordetella bronchiseptica: 2 ISs, facultative pathogen
- Bordetella pertussis: 247 ISs, obligatory pathogen
-> Link with lifestyle: host restriction, niche change, ..
Association with recent rearrangements?
[Figure: % of breakpoints coinciding with ISs, observed vs. expected, vs. number of ISs, for Bordetella parapertussis, Bordetella bronchiseptica, Yersinia pestis and Yersinia pseudotuberculosis]
IS expansion promoted frequent genomic rearrangements.
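An observed vs. expected comparison of breakpoints coinciding with ISs can be sketched as follows. The observed value is the fraction of rearrangement breakpoints that fall inside an IS. For the expected value we assume, as an illustration only, that breakpoints are placed uniformly at random, so the expected fraction equals the fraction of the genome covered by ISs. The slides do not state the null model used, and `observed_fraction`, `expected_fraction` and the coordinates are hypothetical.

```python
from typing import List, Tuple


def observed_fraction(breakpoints: List[int],
                      is_intervals: List[Tuple[int, int]]) -> float:
    """Fraction of rearrangement breakpoints lying inside an IS (inclusive coordinates)."""
    if not breakpoints:
        return 0.0
    hits = sum(any(start <= bp <= end for start, end in is_intervals)
               for bp in breakpoints)
    return hits / len(breakpoints)


def expected_fraction(is_intervals: List[Tuple[int, int]], genome_bp: int) -> float:
    """Genome fraction covered by ISs, i.e. the hit rate of uniformly placed breakpoints.

    Assumes the IS intervals do not overlap.
    """
    covered = sum(end - start + 1 for start, end in is_intervals)
    return covered / genome_bp


# Hypothetical 1 Mb genome with two 1 kb ISs and four breakpoints.
genome_bp = 1_000_000
is_intervals = [(1_000, 1_999), (500_000, 500_999)]
breakpoints = [1_500, 250_000, 500_100, 750_000]
obs = observed_fraction(breakpoints, is_intervals)
exp = expected_fraction(is_intervals, genome_bp)
print(f"observed {obs:.1%}, expected {exp:.1%}, O/E = {obs / exp:.0f}")
```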
Association with recent rearrangements?
[Figure: rearrangements between closely related genomes (99% or 90% similarity): B. pertussis, Bordetella parapertussis and B. bronchiseptica (labelled 32 ISs and 247 ISs); S. enterica serovar Typhi, S. enterica serovar Typhimurium, Shigella flexneri and E. coli K12]
IS expansion increases the rate of genome rearrangements.
Association with pseudogenes?
[Diagram: genomes A and B with orthologous genes Or1/Or1' and Or2/Or2'; an IS inserted either within a gene (number of ISs in genes) or in an intergenic region (number of ISs in intergenes)]
Association with pseudogenes?
R_pseudo = (number of ISs in genes) / (number of ISs in intergenes)
[Figure: O/E R_pseudo vs. number of ISs]
IS expansion increases the number of pseudogenes.
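R_pseudo and its observed/expected ratio can be sketched as below. The observed ratio follows the slide's definition. The expected ratio is an assumption on our part: if insertions were uniform along the genome, ISs would split between genes and intergenes in proportion to coding and intergenic length. The function names and the genome figures are hypothetical. Under this reading, O/E well below 1 means insertions in genes are under-represented (likely counter-selected), and a rise in O/E with IS number would mean more genes disrupted, i.e. more pseudogenes.

```python
def r_pseudo(is_in_genes: int, is_in_intergenes: int) -> float:
    """R_pseudo = (number of ISs in genes) / (number of ISs in intergenes)."""
    if is_in_intergenes == 0:
        raise ValueError("no ISs in intergenic regions: ratio undefined")
    return is_in_genes / is_in_intergenes


def expected_r_pseudo(coding_bp: int, intergenic_bp: int) -> float:
    """Expected ratio if insertions were uniform along the genome (assumption)."""
    return coding_bp / intergenic_bp


# Hypothetical 4 Mb genome, 88% coding, with 40 ISs in genes and 25 in intergenes.
coding_bp, intergenic_bp = 3_520_000, 480_000
observed = r_pseudo(40, 25)
expected = expected_r_pseudo(coding_bp, intergenic_bp)
print(f"O/E R_pseudo = {observed / expected:.2f}")  # 0.22
```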
Conclusions
High variability:
- of the number of ISs per genome
- of the number of IS families per genome
- of the number of IS copies per family
ISs have been recently acquired (HGT).
IS expansion:
- is associated with lifestyle/niche change
- increases the rate of genome rearrangements
- increases the number of pseudogenes
[Diagram stages: acquisition (+IS), expansion (+IS), deletion (-IS), lineage loss]
Conclusions
ISs are frequent but not ubiquitous.
IS numbers and families vary a lot.
Lack of association between stability and the number of ISs.
The presence of ISs is associated with lifestyle.
IS expansion increases the rate of genome rearrangements.
IS expansion increases the number of pseudogenes.
[Diagram labels: beneficial agents, genomic parasites]
How many ISs?
[Figure: number of genomes vs. number of ISs and vs. number of IS families]
High variability of the number of ISs per genome and of the number of IS families per genome.
How many ISs?
[Figure: log(number of ISs per genome) by IS family; labelled examples: B. pertussis and S. sonnei (16 IS110, 229 IS481, 157 IS1, 106 IS3, 33 IS4, 25 IS21); S. flexneri (112-108 IS1, 126-124 IS3, 34-22 IS4)]
High variability of the number of IS families per genome and of the number of ISs per family.
Hypothesis I: ISs induce short spikes of instability that are averaged out in a deep phylogenetic analysis.
Hypothesis II: invasions of highly replicative ISs lead to deleterious instability and lineage loss.