Molecular Phylogeny in a Context of Possible Lateral Gene Transfers
Eric Bapteste, W.F. Doolittle Lab
The reason(s) why we doubt a strict tree-like representation should be used
• Biological processes favoring lateral exchanges of DNA are powerful
• Phylogenetic evidence for a unique Tree of Life is weak
• Molecular phylogenies might even suggest that LGT happens, at least in some lineages
Biological processes contribute to lateral exchanges of DNA
• Internal sources of variation: intragenomic recombination (legitimate and illegitimate), mutator phenotype, gene duplication, baseline replication errors (point mutations), hypervariable loci
• External sources of variation (DNA from a divergent lineage): transduction (DNA viruses, lytic RNA viruses, retroviruses), cell fusions, transformation, membrane vesicle transfer, conjugative plasmids and transposons (conjugation)
• Variation passes from the genome of the organism to the genome of the descendent by vertical inheritance, and between lineages by horizontal inheritance; genetic material can also be deleted (gene loss)
Phylogenetic evidence for a unique Tree of Life is weak
Gamma-proteobacteria: an apparent agreement on a tree
“The general lack of conflict observed among the 203 remaining families was not due to the absence of phylogenetic signal in the gene alignments because most genes did conflict with several other topologies (see Figure 3). We interpreted this congruence as a reflection of shared history and a lack of LGT. Therefore, we chose these genes as the basis for inferring the true organismal phylogeny for these 13 species.” Lerat E et al., PLoS Biol. 2003 Oct;1(1):E19.
Phylogenetic evidence for a unique Tree of Life is weak
Figure: number of alignments (0 to 200) for each topology (numbered 1 to 105), under the SH test and under the AU test. Blue: not different from the ML tree (5%); red: different from the ML tree.
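The histograms count, for each candidate topology, how many gene alignments fail to reject it at the 5% level. A minimal sketch of that tally, assuming a gene-by-topology matrix of test p-values is already available (the matrix and the name `count_non_rejecting` are hypothetical, and the SH/AU tests themselves are not implemented here):

```python
# Hypothetical input: p_values[g][t] is the p-value of topology t for gene
# alignment g (e.g. from an SH or AU test). The numbers are made up.
ALPHA = 0.05  # 5% significance level, as on the slide


def count_non_rejecting(p_values, alpha=ALPHA):
    """For each topology, count the alignments that do NOT reject it,
    i.e. where p >= alpha (not significantly different from the ML tree)."""
    counts = [0] * len(p_values[0])
    for gene_row in p_values:
        for t, p in enumerate(gene_row):
            if p >= alpha:
                counts[t] += 1
    return counts


p_values = [
    [0.90, 0.30, 0.01],  # gene 1
    [0.85, 0.04, 0.20],  # gene 2
    [0.70, 0.60, 0.02],  # gene 3
]
print(count_non_rejecting(p_values))  # [3, 2, 1]
```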
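Testing the congruence/conflict between markers
Principal Component Analysis of p-values for each gene and topology
Gene | Topology 1 | Topology 2 | Topology 3 | Topology 4 | Topology 5
1 | 0 | 0.2 | 42 | 13 | 4
2 | 0.1 | 0 | 37 | 9 | 4.7
3 | 4.1 | 3.3 | 0 | 0.4 | 7.8
4 | 0.1 | 1.7 | 27 | 13 | 5.4
5 | 0.6 | 0.1 | 33 | 17 | 4.3
6 | 0.4 | 0.4 | 29 | 22 | 3.1
7 | 0.3 | 0.5 | 41 | 47 | 9.7
… | … | … | … | … | …
Correlations between gene profiles shown on the slide: R² = 0.9 and R² = 0.05. Figure: PCA plot placing genes 1 to 7.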
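The PCA can be sketched in plain Python, assuming each gene is described by its vector of p-values across the candidate topologies. This version computes only the first principal component, by power iteration on the covariance matrix, on a small made-up matrix; the function name `first_pc_scores` is hypothetical, and a real analysis would more likely call a linear-algebra library. Genes with similar p-value profiles (congruent markers) should receive scores of the same sign.

```python
import math


def first_pc_scores(rows, n_iter=200):
    """Score each row (one gene's p-values across topologies) on the first
    principal component. The component is found by power iteration on the
    sample covariance matrix of the topology columns."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration; assumes the start vector is not orthogonal to the
    # leading eigenvector (true for typical data, not guaranteed).
    v = [1.0] * m
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(m)) for c in centred]


# Made-up p-values: genes 1 and 2 favour the same topologies, gene 3 does not.
rows = [
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9],
]
scores = first_pc_scores(rows)
print([round(s, 3) for s in scores])
```

The sign of a principal component is arbitrary, so only the relative signs of the scores are meaningful.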
Figure: Principal Component Analysis of 205 genes of gamma-proteobacteria and of simulated markers with transfers (series: genes; random; 1 LGT event; 2 LGT events; 3 LGT events).
Figure: matrix of p-values (rows: gene number i; columns: topology number i), reordered into clusters of genes and clusters of plausible topologies. Blue: genes with LGT; red: genes …
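One simple way to form clusters of genes from their p-value profiles is sketched here. This is a swapped-in technique, not necessarily the one used for the figure: single-linkage grouping in which two genes are linked when their profiles are positively correlated with R² above a threshold. The threshold of 0.8, the profiles, and the function names are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation between two p-value profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def cluster_genes(profiles, min_r2=0.8):
    """Single-linkage clustering with union-find: two genes are linked when
    their profiles are positively correlated with R^2 >= min_r2."""
    parent = list(range(len(profiles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            r = pearson_r(profiles[i], profiles[j])
            if r > 0 and r * r >= min_r2:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(profiles)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


profiles = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.1, 0.2, 0.9]]
print(cluster_genes(profiles))  # [[0, 1], [2]]
```

Requiring r > 0 keeps genes with opposite preferences (anti-correlated profiles) out of the same cluster, which a bare R² threshold would not.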
INCONGRUENCE OF ORTHOLOGOUS GENES: HOW MUCH IS NOISE, HOW MUCH IS TRANSFER (ORTHOLOGOUS REPLACEMENT)? TRUTH IS, NO ONE REALLY KNOWS.
Figure labels: genes clearly showing vertical descent; genes clearly showing lateral transfer; genes showing nothing clearly; enthusiastic lateralists; committed verticalists.
What we propose to do: a synthesis, with a vertical part and a horizontal part.
Principles to make a synthesis
Figure: the phylogenies of gene 1 and gene 2 (taxa A to F, nodes labelled 99) are compared with a reference phylogeny of A to F and combined into a synthesis.
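Reading the figure as sorting each gene's well-supported bipartitions into a vertical part (compatible with the reference phylogeny) and a horizontal part (incompatible with it), the core test is bipartition compatibility. A minimal sketch under that assumption; the taxa follow the figure, but the trees and the function names are hypothetical:

```python
# Hypothetical taxa A to F, as in the figure; each bipartition is given by
# the set of taxa on one side of the branch.
TAXA = frozenset("ABCDEF")


def compatible(s1, s2, taxa=TAXA):
    """Two bipartitions are compatible (can coexist in one tree) iff at least
    one of the four pairwise intersections of their sides is empty."""
    a, a_rest = s1, taxa - s1
    b, b_rest = s2, taxa - s2
    return not (a & b) or not (a & b_rest) or not (a_rest & b) or not (a_rest & b_rest)


def split_gene_signal(gene_splits, reference_splits, taxa=TAXA):
    """Assign each well-supported gene bipartition to the vertical part if it
    is compatible with every reference bipartition, else to the horizontal part."""
    vertical, horizontal = [], []
    for s in gene_splits:
        if all(compatible(s, r, taxa) for r in reference_splits):
            vertical.append(s)
        else:
            horizontal.append(s)
    return vertical, horizontal


reference = [frozenset("AB"), frozenset("ABC"), frozenset("EF")]
gene_1 = [frozenset("AB"), frozenset("EF")]  # agrees with the reference
gene_2 = [frozenset("AB"), frozenset("BE")]  # groups B with E
print(split_gene_signal(gene_2, reference))
```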
… to a synthesis: ML trees, bipartitions with BV > 50, strict consensus
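The strict-consensus step can be sketched as follows, assuming each ML gene tree is summarised as a mapping from bipartitions to bootstrap values (BV): bipartitions with BV > 50 are kept, and only those present in every gene tree survive. The data and the name `strict_consensus` are hypothetical.

```python
def strict_consensus(gene_trees, min_bv=50):
    """gene_trees: one dict per gene mapping a bipartition (frozenset of taxa
    on one side) to its bootstrap value. Keep only bipartitions with
    BV > min_bv, then return those present in every gene tree.
    Assumes every tree writes a given bipartition from the same side."""
    supported = [{s for s, bv in tree.items() if bv > min_bv} for tree in gene_trees]
    return set.intersection(*supported) if supported else set()


trees = [
    {frozenset("AB"): 99, frozenset("CD"): 60, frozenset("EF"): 40},
    {frozenset("AB"): 80, frozenset("CD"): 55, frozenset("EF"): 95},
]
print(sorted("".join(sorted(s)) for s in strict_consensus(trees)))  # ['AB', 'CD']
```

Here EF is dropped because one gene supports it with BV = 40 only.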
Conclusions
• We need better trees to make better syntheses
• LGT should be accounted for when reconstructing evolutionary history
• Many interesting biological and epistemological avenues remain to be explored in the near future
Many thanks to the Doolittle and Roger labs: Dave MacLeod, Jessica Leigh, Celine Brochier, Ford Doolittle, Robert Charlebois, Yan Boucher, Ed Susko, David Walsh.
I respect (and more) Vincent Daubin
• The reason why my interpretation of the dataset is different:
• I believe that most of these genes do not contain enough phylogenetic signal to tell the whole history of gamma-proteobacteria on their own.
• This is the very motive for concatenation: genes are too weak alone.
• However, based on biological evidence, transfers could have happened, so we should not presume that these genes with an unknown history have been transmitted only vertically. In a context of LGT, concatenation is not safe a priori.
• In other words, in the possible presence of LGT, « when we do not know, we do not know! »
• Tests concatenating markers from entirely simulated data, full of transfers, also give robust phylogenies (Douady and Doolittle, unpublished).
• So even good support for a tree coming from a concatenation is no guarantee that the true history has been recovered. Careful analyses of each marker are required.
• If we also see some conflict during these analyses, we should show it, and then build a synthesis instead of a tree.
The phylogenetic signal is not robust over the whole synthesis: basal branches are poorly supported.
Figure: distribution of the phylogenetic signal along the synthesis, plotting the total phylogenetic signal (0 to 1) against the distance from the root (1 to 8), with the longest consecutive supported vertical path highlighted.
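One way to build such a signal-versus-depth profile is sketched here under stated assumptions: each node of the synthesis is recorded with its distance from the root and its number of supporting genes, and the "signal" at a given distance is taken to be the mean fraction of genes supporting the nodes at that distance. This definition, the node data, and the name `signal_by_depth` are hypothetical; the slide does not say how the curve was computed.

```python
from collections import defaultdict


def signal_by_depth(nodes):
    """nodes: (distance_from_root, n_supporting_genes, n_genes) per node of
    the synthesis. Returns, for each distance, the mean fraction of genes
    supporting the nodes found at that distance."""
    totals, counts = defaultdict(float), defaultdict(int)
    for depth, n_supporting, n_genes in nodes:
        totals[depth] += n_supporting / n_genes
        counts[depth] += 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}


# Made-up nodes: basal nodes (small distance) are weakly supported.
nodes = [(1, 20, 200), (1, 40, 200), (2, 100, 200), (3, 180, 200)]
print(signal_by_depth(nodes))
```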
More precisely, many inner nodes are supported by only a minority of the genes (in purple). There are always genes (in dark green) whose phylogenetic history remains unknown.
Figure: for each node (1A, 6A, 2A, 3A, 5B, 6B, 4A, 4B, 5A, 1B, 2B), the number of genes (0 to 200) in two categories: horizontal and vertical inheritance; mode of transmission unknown. Taxa: Xaxo, Paer, Xfast, Wigg, Hinfl, Ecoli, Styphi, Pmult, Vchol, Baphi, Xcamp, YpesKIM, YpesCO92.
A brief view of the differences between the 16 plausible topologies (AU test, 5%)
Taxa: Xfast, Xaxo, Xcamp, Paer, Wigg, Baphi, Vchol, Pmult, Hinfl, YpesCO92, YpesKIM, Styphi, Ecoli
What are the main evolutionary routes?
Figure: genes as a road map of relationships.
Are there main routes? Unique routes? Side issues? Are the genes involved in LGT especially mobile ones?
Average p-value (AU test) for each topology over all the genes
Figure: average AU p-value (0 to 0.5) for each topology, with the concatenate tree marked. By comparison, the average p-value of the best tree for each gene is 0.83.
The concatenate tree is a good “average”, but for most genes it is not the best tree.
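The contrast on this slide can be sketched with a gene-by-topology matrix of AU-test p-values (made up here): average each topology's p-value over genes, average each gene's best p-value, and count how many genes actually prefer the topology with the highest average. Using that best-on-average topology as a stand-in for the concatenate tree is an assumption for illustration, as are the function names.

```python
def mean_p_per_topology(p):
    """p[g][t]: AU-test p-value of topology t for gene g. Mean over genes."""
    return [sum(row[t] for row in p) / len(p) for t in range(len(p[0]))]


def mean_best_p(p):
    """Mean, over genes, of the p-value of each gene's own best topology."""
    return sum(max(row) for row in p) / len(p)


def genes_where_best(p, t):
    """Number of genes for which topology t has the highest p-value."""
    return sum(1 for row in p if row[t] == max(row))


p = [
    [0.90, 0.30, 0.10],
    [0.20, 0.80, 0.30],
    [0.40, 0.35, 0.90],
]
means = mean_p_per_topology(p)
best_on_average = means.index(max(means))  # stand-in for the concatenate tree
print(best_on_average, genes_where_best(p, best_on_average), round(mean_best_p(p), 2))  # 0 1 0.87
```

Topology 0 wins on average, yet it is the best tree for only one of the three genes, which mirrors the slide's point.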
Electric network, railway network, maritime network, crystal, web.
NEW QUESTIONS: optimisation, functionality, economy, shorter paths?
Bacteria 70
Number of Vertical branches visible: 56
Number of Lateral transfers visible: 4
Genes which were Laterally transferred: 0rpl4_1.puz_bip.out, 0efg_1.puz_bip.out, 0rpl18_1.puz_bip.out, 0fmt_1.puz_bip.out
Ratio of compatible LGT (single Arrow) versus total LGT: 1.0
* Possibility of Generalizations *
links supported by | vert + horz | vert | horz
0 genes | 11 | 11 | n/a
1 gene | 6 | 2 | 4
>1 gene | 44 | 44 | 0
* Gene Mobility Summary (how many LGT events are present per gene) *
# events | # genes having # events | gene names
1 | 4 | 0efg_1.puz_bip.out, 0fmt_1.puz_bip.out, 0rpl18_1.puz_bip.out, 0rpl4_1.puz_bip.out
Archaea 70
Number of Vertical branches visible: 44
Number of Lateral transfers visible: 1
Horizontal/Vertical normalized ratio (HVNR): 1.194494E-1
Genes which were Laterally transferred: a0l37aen.puz_bip.out
Ratio of compatible LGT (single Arrow) versus total LGT: 1.0
* Possibility of Generalizations *
links supported by | vert + horz | vert | horz
0 genes | 9 | 9 | n/a
1 gene | 4 | 3 | 1
>1 gene | 33 | 33 | 0
* Gene Mobility Summary (how many LGT events are present per gene) *
# events | # genes having # events | gene names
1 | 1 | a0l37aen.puz_bip.out
Euka 70
Number of Vertical branches visible: 33
Number of Lateral transfers visible: 1
Horizontal/Vertical normalized ratio (HVNR): 1.500456E-1
Genes which were Laterally transferred: 0rps23.puz_bip.out
Ratio of compatible LGT (single Arrow) versus total LGT: 1.0
* Possibility of Generalizations *
links supported by | vert + horz | vert | horz
0 genes | 8 | 8 | n/a
1 gene | 3 | 2 | 1
>1 gene | 24 | 24 | 0
* Gene Mobility Summary (how many LGT events are present per gene) *
# events | # genes having # events | gene names
1 | 1 | 0rps23.puz_bip.out
Chloro 70
Number of Vertical branches visible: 23
Number of Lateral transfers visible: 2
Horizontal/Vertical normalized ratio (HVNR): 2.192235E-1
Genes which were Laterally transferred: psbh_0.puz_bip.out, psbk_0.puz_bip.out, psac_0.puz_bip.out, psbc_0.puz_bip.out, psbd_0.puz_bip.out
Ratio of compatible LGT (single Arrow) versus total LGT: 1.0
* Possibility of Generalizations *
links supported by | vert + horz | vert | horz
0 genes | 7 | 7 | n/a
1 gene | 2 | 1 | 1
>1 gene | 17 | 16 | 1
* Gene Mobility Summary (how many LGT events are present per gene) *
# events | # genes having # events | gene names
1 | 5 | psac_0.puz_bip.out, psbc_0.puz_bip.out, psbd_0.puz_bip.out, psbh_0.puz_bip.out, psbk_0.puz_bip.out
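The two summary tables in these reports can be rebuilt from a list of links, sketched here under assumptions: each link of the synthesis is recorded as vertical or horizontal together with the genes supporting it, and a gene's number of LGT events is taken to be the number of horizontal links it supports. The program that produced the reports is not described in the source, and the link list, gene names and function names below are hypothetical (chosen only to match the shape of the chloroplast counts).

```python
from collections import Counter


def support_table(links):
    """links: (kind, genes) per link of the synthesis, kind being "vert" or
    "horz" and genes the genes supporting that link. Counts links by number
    of supporting genes (0, 1, >1) and by kind."""
    table = {b: {"vert + horz": 0, "vert": 0, "horz": 0} for b in ("0", "1", ">1")}
    for kind, genes in links:
        bucket = "0" if not genes else "1" if len(genes) == 1 else ">1"
        table[bucket]["vert + horz"] += 1
        table[bucket][kind] += 1
    return table


def gene_mobility(links):
    """Number of LGT events per gene, taken here as the number of horizontal
    links it supports; returns {n_events: [gene names]}."""
    events = Counter(g for kind, genes in links if kind == "horz" for g in genes)
    summary = {}
    for gene, n in sorted(events.items()):
        summary.setdefault(n, []).append(gene)
    return summary


# Hypothetical links with the same counts as the chloroplast report.
links = (
    [("vert", [])] * 7
    + [("vert", ["geneX"])]
    + [("vert", ["geneX", "geneY"])] * 16
    + [("horz", ["geneA"]), ("horz", ["geneB", "geneC", "geneD", "geneE"])]
)
for bucket, row in support_table(links).items():
    print(bucket, row)
print(gene_mobility(links))  # {1: ['geneA', 'geneB', 'geneC', 'geneD', 'geneE']}
```

The sketch reports 0 rather than "n/a" for horizontal links supported by no gene; in this model such links cannot arise, since a transfer is only drawn when some gene supports it.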
Strict consensus, BV > 50%
Genes which were Laterally transferred: gp25boocon.txt.out, gp46boocon.txt.out
These two transfer events provide support for two phylogenetic relationships: the last common ancestor of 133, rb69 and T4 would have given two genes to the last common ancestor of 25, 31 and 44RR.
“A radical departure from conventional thinking” (W. Martin/M. Embley)
“A radical departure from thinking?” Am I crazy? If so, I stand on the shoulders of many philosophers: Leibniz, Whitehead, Deleuze, Parrochia, etc.
ROOT OF THE RING (Rivera and Lake, Nature, 2004)
Figure: alternative arrangements of Y1, Y2, B, E, P, M and H around the ring, with percentages of 16.8%, 60.5%, 10%, 7.2% and 1.8%, further values of 96.3, 94.5, 89.1, 79.1 and 77.7, and branches leading to unknown descendants.
Heuristic of the synthesis…
We can question:
• the choice of how evolution is drawn;
• whether a non-tree-like null hypothesis should be considered when building evolutionary scenarios.
There are 26 vertical branches and 11 lateral branches. The total vertical thickness is about 13 times greater than the total horizontal thickness. Yet 18 genes were laterally transferred. 8 lateral branches are mostly compatible with the reference tree, and 3 are mostly incompatible with it. Thus, 72.7% of LGT (8 of 11 lateral branches) are mostly compatible with the reference tree.
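The 72.7% figure follows directly from the lateral-branch counts on this slide (8 mostly compatible, 3 mostly incompatible, 11 in total):

```latex
\frac{8}{8 + 3} = \frac{8}{11} \approx 0.727 = 72.7\,\%
```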