Homology analysis and molecular phylogeny Alexis Dereeper CIBA courses – Brasil 2011

4 steps for a phylogenetic analysis
[Flowchart, in reading order]
• Data selection
• Sequence alignment
• Calculate or estimate the tree that best fits the data
  • Method selection: distance methods, parsimony, probabilistic methods (maximum likelihood, Bayesian)
  • Other diagram labels: Calculate distance, Optimization, Model?
• Test the reliability of the obtained tree

Phylogeny.fr
“The Phylogeny.fr platform transparently chains programs to automatically perform phylogenetic analysis tasks”

Homology analysis
What is sequence homology?
• Not a quantitative concept (unlike similarity or identity, e.g. 28% identity): genes are either homologous or not
• Homologs: genes descended from a common ancestor
• Paralogs: homologs arising from a duplication event
• Orthologs: homologs arising from a speciation event
• Homology and function: homology does not systematically imply the same function. The closest orthologs may have the same function, but more distant orthologs rarely show the same phenotypic role (although they may keep the same role in a specific metabolic pathway). Paralogs, on the other hand, rapidly acquire different functions.

Homology analysis
How similar are homologous sequences?
• From 100% identity to only a few nt/aa in common
• No rule, no limit. The estimate is based on the probability that two sequences are similar by chance (e-value):
  • DNA: e-value < 10^-6 and identity > 70%
  • Protein: e-value < 10^-3 and identity > 25%
• Sequences without noticeable resemblance can be homologous (similarity found at the 3D structure level).
• Conversely, a strong resemblance is generally interpreted as homology rather than as convergent evolution
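
The rule-of-thumb cut-offs above can be applied to a list of Blast hits as in the sketch below. The `Hit` record and the function name are hypothetical, introduced only for illustration; the thresholds are the ones quoted on the slide and are guidelines rather than hard limits.

```python
from dataclasses import dataclass

# Rule-of-thumb cut-offs quoted on the slide (guidelines, not universal limits).
THRESHOLDS = {
    "dna": {"max_evalue": 1e-6, "min_identity": 70.0},
    "protein": {"max_evalue": 1e-3, "min_identity": 25.0},
}


@dataclass
class Hit:
    """Hypothetical record for one Blast hit (not a real Blast API type)."""
    subject: str
    evalue: float
    identity: float  # percent identity, 0-100


def likely_homologs(hits, seq_type):
    """Keep hits that pass both the e-value and the identity cut-off."""
    limits = THRESHOLDS[seq_type]
    return [
        h for h in hits
        if h.evalue < limits["max_evalue"] and h.identity > limits["min_identity"]
    ]
```

A hit that fails these filters may still be homologous, as noted above for sequences related only at the 3D structure level.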

Homology analysis
How to detect homology?
• By sequence comparison, i.e. sequence alignment
• 1- Local alignment (e.g. Blast)
  • Designed to search for similar regions
  • Alignment of a particular sequence against a sequence database
  • (Smith & Waterman)
• 2- Global alignment (e.g. ClustalW)
  • Designed to compare homologous sequences over their full length
  • (Needleman & Wunsch)
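
To illustrate what "global" means, here is a minimal sketch of pairwise dynamic programming in the style of Needleman & Wunsch. It returns only the optimal score, not the alignment itself, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example. Real tools such as ClustalW use substitution matrices, more elaborate gap penalties and multiple-alignment heuristics.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned with a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned with a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]
```

A local alignment of the Smith & Waterman type differs mainly in that scores are never allowed to drop below zero and the best cell anywhere in the table is reported, so only the most similar region is aligned.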

Homology analysis
Classical Blast output
[Figure: Blast output annotated with two labels: the score, and the e-value, which indicates how reliable the score is]
Different Blast programs:
• BlastN (query: DNA / subject: DNA)
• BlastP (query: protein / subject: protein)
• BlastX (query: translated DNA / subject: protein)
• TBlastN (query: protein / subject: translated DNA)
• TBlastX (query: translated DNA / subject: translated DNA)

Blast Explorer
• Enables assisted selection of homologous sequences using various criteria
• Post-processing of Blast results:
  • Guide tree (similarity tree), with possible selection on branches and leaves
  • Score / e-value distribution
  • Taxonomic tree of hits

Homology analysis
BBMH method (Best Blast Mutual Hits) or RBH (Reciprocal Best Hit)
[Figure: proteome of species 1 compared with proteome of species 2]
Ortholog databases:
• Inparanoid (eukaryotes)
• HomoloGene (eukaryotes)
• OrthoMCL DB
• COG (Clusters of Orthologous Groups of proteins) (prokaryotes and eukaryotes)
• GreenPhyl (plants)
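
In the reciprocal best hit approach, two proteins are considered putative orthologs when each one is the best Blast hit of the other in the opposite proteome. The sketch below assumes the Blast results have already been reduced to `(query, subject, bit_score)` tuples; this input layout and the function names are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples (assumed layout).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_1_vs_2, hits_2_vs_1):
    """Return sorted (protein_1, protein_2) pairs that are mutual best hits."""
    forward = best_hits(hits_1_vs_2)   # species 1 queries against species 2
    backward = best_hits(hits_2_vs_1)  # species 2 queries against species 1
    return sorted(
        (a, b) for a, b in forward.items() if backward.get(b) == a
    )
```

Ties are resolved here by keeping the first hit encountered, which is a simplification.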

Phylogenetic analysis
Step 1: Multiple alignment (global alignment)
• Alignment software:
  • ClustalW
  • Muscle
  • Tcoffee
  • 3DCoffee (optimizes the alignment using 3D structure)
  • Mafft
  • [Figure: arrow ranking the programs from fast to slow]
• Alignment formats: Fasta, Clustal, Phylip, Nexus
• Alignment visualization/editing software:
  • SeaView
  • Jalview
  • BioEdit

Phylogenetic analysis
Step 2: Alignment cleaning
• Removal of divergent regions that carry a low phylogenetic signal (not very informative)
• These regions may not be homologous or may have been saturated by substitutions (e.g. synonymous sites in coding regions)
• => The cleaned alignment is more suitable for phylogenetic analysis
• Alignment curation software:
  • GBlocks
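
As a rough illustration of alignment cleaning, the sketch below drops columns in which too many sequences have a gap. This is a deliberately simplified stand-in and not the GBlocks algorithm, which also takes conservation and block length into account. The 50% gap threshold and the dictionary layout (name to aligned sequence) are assumptions.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: dict mapping sequence name to aligned sequence (equal lengths).
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    n = len(seqs)
    # zip(*seqs) iterates over columns; keep the sufficiently gap-free ones.
    kept = [col for col in zip(*seqs) if col.count(gap) / n <= max_gap_fraction]
    return {
        name: "".join(col[i] for col in kept) for i, name in enumerate(names)
    }
```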

Phylogenetic analysis
Step 3: Phylogenetic reconstruction
• Step 3a: Choose a method for phylogenetic reconstruction
• 4 main methods/algorithms:
  • Pairwise distance methods (UPGMA, Neighbor Joining)
    • FastDist, BIONJ, Neighbor
  • Maximum parsimony
    • DNAPars, TNT
  • Maximum likelihood
    • PhyML, PAML
  • Bayesian inference
    • MrBayes, Beast
• Output format: distance matrix, Newick format
• Choose the right compromise between speed and performance
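
Distance methods start from a matrix of pairwise distances. The sketch below computes the simplest one, the p-distance (proportion of differing sites), for every pair of aligned sequences. Ignoring columns where either sequence has a gap is an assumption made here; programs differ in how they treat gaps. UPGMA or Neighbor Joining would then build a tree from such a matrix.

```python
from itertools import combinations


def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (same length)")
    # Compare only columns where neither sequence has a gap (assumption).
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return {(name_a, name_b): p-distance} for every pair of sequences."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }
```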

Phylogenetic analysis
Step 3: Phylogenetic reconstruction
• Step 3b: Choose parameters and evolutionary models
• Different evolutionary models describing the substitution rates for aa or nt:
  • DNA
    • Jukes-Cantor, Kimura, F81, HKY85, GTR
  • Protein
    • JTT, WAG, Dayhoff
• Model-testing software: tests and selects the substitution model (and parameters) best suited to the dataset (the one with the maximum likelihood)
  • ProtTest, ModelTest (based on PhyML)
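
To show what a substitution model does in practice, the sketch below applies the Jukes-Cantor correction, the simplest DNA model in the list, to an observed proportion of differing sites p: d = -(3/4) ln(1 - 4p/3). The function name is hypothetical. The corrected distance d estimates the number of substitutions per site, including multiple substitutions at the same site that the raw proportion misses.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance from an observed proportion p.

    The formula is only defined for 0 <= p < 0.75; at p = 0.75 the
    sequences are as different as random ones and the distance diverges.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as p, and the gap widens as sequences diverge. Richer models (Kimura, HKY85, GTR) add parameters such as different transition and transversion rates or unequal base frequencies.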

Phylogenetic analysis
Step 3: Phylogenetic reconstruction
• Step 3c: Estimate branch robustness
• Bootstrap procedure
  • 1- Resample the alignment columns: build a pseudo-alignment by drawing sites at random (with replacement), then recompute the tree.
  • 2- Repeat the process N times.
  • 3- For each branch of the initial tree, count how many bootstrap trees contain it. The higher this number, the better supported the branch
• aLRT test (approximate Likelihood Ratio Test) (Anisimova & Gascuel, Syst Biol, 2006)
  • Integrated in PhyML
  • Much faster (PhyML is run only once)
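
The three bootstrap steps can be sketched as follows. Tree building itself is out of scope here, so `build_splits` is a hypothetical caller-supplied function that takes an alignment (dict of name to aligned sequence) and returns the branches of the inferred tree as a set of frozensets of sequence names. The function names, the default of 100 replicates and the seed are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Step 1: draw columns at random, with replacement, to build a
    pseudo-alignment of the same length as the original."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def branch_support(alignment, build_splits, n_replicates=100, seed=0):
    """Steps 2 and 3: repeat the resampling and report, for each branch of
    the initial tree, the fraction of bootstrap trees that contain it."""
    rng = random.Random(seed)
    reference = build_splits(alignment)  # branches of the initial tree
    counts = {split: 0 for split in reference}
    for _ in range(n_replicates):
        replicate = build_splits(bootstrap_alignment(alignment, rng))
        for split in reference:
            if split in replicate:
                counts[split] += 1
    return {split: count / n_replicates for split, count in counts.items()}
```

Because the tree is rebuilt for every replicate, the cost grows linearly with N, which is why the aLRT test, computed from a single run, is much faster.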

Phylogenetic analysis
Step 4: Visualization and editing of the phylogenetic tree
• Graphical tools available to display trees from Newick format:
  • TreeDyn
  • DrawGram, DrawTree
  • ATV
  • NJPlot
• Graphical output formats: PNG, SVG, PDF…
Step 5: Interpretation of the tree
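
For readers who have not seen the Newick format that these viewers read, the sketch below serializes a small tree into a Newick string. The nested-tuple representation is a hypothetical choice for this example: a leaf is `(name, branch_length)` and an internal node is `(list_of_children, branch_length)`, with `None` for a missing branch length.

```python
def _format_node(node):
    """Format one node (and its subtree) without the trailing semicolon."""
    label, length = node
    if isinstance(label, list):
        # Internal node: children in parentheses, separated by commas.
        text = "(" + ",".join(_format_node(child) for child in label) + ")"
    else:
        text = label  # leaf name
    return text if length is None else f"{text}:{length}"


def to_newick(root):
    """Serialize a nested-tuple tree to a Newick string."""
    return _format_node(root) + ";"


# Example tree with three branches at the root, one of them a clade (C, D).
example = ([("A", 0.1), ("B", 0.2), ([("C", 0.3), ("D", 0.4)], 0.5)], None)
print(to_newick(example))
```

Parentheses group the descendants of a node, commas separate siblings, the number after each colon is a branch length, and the semicolon ends the tree.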