
Alexis Dereeper




  1. Homology analysis and molecular phylogeny Alexis Dereeper CIBA courses – Brasil 2011

  2. 4 steps for a phylogenetic analysis (flow diagram)
     • Data selection
     • Sequence alignment
     • Method selection: distance methods (calculate distance), parsimony, probabilistic methods (maximum likelihood, Bayesian; model?)
     • Optimization: calculate or estimate the tree that best fits the data
     • Test the reliability of the obtained tree

  3. Phylogeny.fr “The Phylogeny.fr platform transparently chains programs to automatically perform phylogenetic analysis tasks”

  4. Homology analysis: What is sequence homology?
     • Not a quantitative concept, unlike similarity or identity (e.g. 28% identity): genes are either homologous or not
     • Homologs: genes descended from a common ancestor
     • Paralogs: homologs arising from a duplication event
     • Orthologs: homologs arising from a speciation event
     • Homology and function: homology does not systematically imply the same function. The closest orthologs may share a function, but more distant orthologs rarely have the same phenotypic role (although they may keep the same role in a specific metabolic pathway). Paralogs, on the other hand, rapidly acquire different functions.

  5. Homology analysis: How similar are homologous sequences?
     • Anywhere from 100% identity to only a few nt/aa in common
     • No rule, no limit. The estimate is based on the probability that 2 sequences are similar by chance (e-value):
       • DNA: e-value < 10^-6 and identity > 70%
       • Protein: e-value < 10^-3 and identity > 25%
     • Sequences without noticeable resemblance can still be homologous (similarity found at the 3D structure level).
     • Conversely, a strong resemblance is generally interpreted as homology rather than as convergent evolution.
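The thresholds on slide 5 can be applied as a simple filter over BLAST hits. The sketch below is only an illustration: the `Hit` record, the `likely_homologs` function and the example values are hypothetical, and the cut-offs are the slide's rules of thumb rather than hard limits.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit."""
    subject: str
    evalue: float
    identity: float  # percent identity, 0-100


# Rule-of-thumb cut-offs from the slide: (maximum e-value, minimum % identity)
THRESHOLDS = {"dna": (1e-6, 70.0), "protein": (1e-3, 25.0)}


def likely_homologs(hits, seq_type):
    """Keep hits that pass both the e-value and the identity cut-off."""
    max_evalue, min_identity = THRESHOLDS[seq_type]
    return [h for h in hits if h.evalue < max_evalue and h.identity > min_identity]


# Made-up example hits
hits = [
    Hit("seqA", 1e-50, 62.0),
    Hit("seqB", 1e-4, 31.0),
    Hit("seqC", 0.5, 24.0),
]
print([h.subject for h in likely_homologs(hits, "protein")])
```

A hit that fails this filter is not necessarily non-homologous: as the slide notes, homology can persist with no detectable sequence resemblance.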

  6. Homology analysis: How to detect homology?
     • By sequence comparison = sequence alignment
     • 1- Local alignment (e.g. Blast)
       • Designed to search for similar regions
       • Alignment of a particular sequence against a bank of sequences
       • (Smith & Waterman)
     • 2- Global alignment (e.g. ClustalW)
       • Designed to compare homologous sequences over their full length
       • (Needleman & Wunsch)
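The difference between the two alignment modes on slide 6 shows up directly in the dynamic-programming recurrence. The following is a minimal sketch that returns only the optimal score, with an assumed scoring scheme (match +1, mismatch -1, linear gap -2); real tools such as Blast use heuristics and substitution matrices instead.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global mode: leading gaps are penalised
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local mode: an alignment may restart anywhere, so never go below 0
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Local: best cell anywhere; global: bottom-right cell (full lengths aligned)
    return best if local else score[-1][-1]


# The shared region "ACG" is rewarded locally but diluted globally
print(alignment_score("TTACGTTT", "GGACGGG", local=True))
print(alignment_score("TTACGTTT", "GGACGGG", local=False))
```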

  7. Homology analysis: Classical Blast output (figure)
     • E-value: indicates how reliable the score is (the expected number of chance hits with a score at least as high)
     • Different Blast programs:
       • BlastN (query: DNA / subject: DNA)
       • BlastP (query: protein / subject: protein)
       • BlastX (query: translated DNA / subject: protein)
       • TBlastN (query: protein / subject: translated DNA)
       • TBlastX (query: translated DNA / subject: translated DNA)
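The program list on slide 7 amounts to a lookup on the query and subject types. A small sketch of that mapping follows; the function name `pick_blast_program` and its `translated` flag are illustrative assumptions, not part of any Blast interface.

```python
def pick_blast_program(query, subject, translated=False):
    """Return the Blast program for a query/subject pair ("dna" or "protein").

    `translated` only matters for DNA vs DNA: it selects a comparison at the
    protein level (TBlastX) instead of the nucleotide level (BlastN).
    """
    if (query, subject) == ("dna", "dna"):
        return "tblastx" if translated else "blastn"
    table = {
        ("protein", "protein"): "blastp",
        ("dna", "protein"): "blastx",   # query is translated
        ("protein", "dna"): "tblastn",  # subject bank is translated
    }
    return table[(query, subject)]


print(pick_blast_program("dna", "protein"))
```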

  8. Blast Explorer
     • Enables an assisted selection of homologous sequences using various criteria
     • Post-processing of Blast results:
       • Guide tree (similarity tree), with possible selection on branches and leaves
       • Score / e-value distribution
       • Taxonomic tree of hits

  9. Homology analysis: BBMH method (Best Blast Mutual Hits) or RBH (Reciprocal Best Hit)
     • (figure: proteome of species 1 compared with proteome of species 2)
     • Ortholog databases/banks:
       • Inparanoid (eukaryotes)
       • HomoloGene (eukaryotes)
       • OrthoMCL DB
       • COG (Clusters of Orthologous Groups of proteins) (prokaryotes and eukaryotes)
       • GreenPhyl (plants)
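The reciprocal best hit idea on slide 9 can be sketched in a few lines: a pair of genes is retained when each is the other's best hit in the opposite proteome. The input layout (tuples of query, subject, score), the function names and the example scores are assumptions made for illustration.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_1_vs_2, hits_2_vs_1):
    """Pairs (gene1, gene2) where each gene is the other's best hit."""
    forward = best_hits(hits_1_vs_2)
    backward = best_hits(hits_2_vs_1)
    return sorted((g1, g2) for g1, g2 in forward.items() if backward.get(g2) == g1)


# Made-up scores: a3 hits b1, but b1 prefers a1, so a3 has no reciprocal partner
hits_1_vs_2 = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0), ("a3", "b1", 80.0)]
hits_2_vs_1 = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 160.0)]
print(reciprocal_best_hits(hits_1_vs_2, hits_2_vs_1))
```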

  10. Phylogenetic analysis, Step 1: Multiple alignment (global alignment)
     • Alignment software (slide annotation: fast / slow speed scale):
       • ClustalW
       • MUSCLE
       • T-Coffee
       • 3DCoffee (optimizes the alignment using 3D structure)
       • MAFFT
     • Alignment formats: Fasta, Clustal, Phylip, Nexus
     • Alignment visualization/editing software:
       • SeaView
       • Jalview
       • BioEdit

  11. Phylogenetic analysis, Step 2: Alignment cleaning
     • Removal of divergent regions carrying a low phylogenetic signal (not very informative)
     • These regions may not be homologous, or may have been saturated by substitutions (e.g. synonymous sites in coding regions)
     • => The cleaned alignment is more suitable for a phylogenetic analysis
     • Alignment curation software:
       • Gblocks
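To give a feel for what alignment cleaning does, the sketch below drops columns that are mostly gaps. This is a deliberately simplified stand-in and not the Gblocks algorithm, which selects conserved blocks using additional criteria; the function name and the 50% gap cut-off are assumptions.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds `max_gap_fraction`.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# The third column is a gap in 2 of 3 sequences and is removed
alignment = {"s1": "AC-GT", "s2": "A--GT", "s3": "ACTGT"}
print(drop_gappy_columns(alignment))
```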

  12. Phylogenetic analysis, Step 3: Phylogenetic reconstruction
     • Step 3a: Choose a method for phylogenetic reconstruction
     • 4 main methods/algorithms:
       • Pairwise distance methods (UPGMA, Neighbor Joining): FastDist, BIONJ, Neighbor
       • Maximum parsimony: DNAPars, TNT
       • Maximum likelihood: PhyML, PAML
       • Bayesian inference: MrBayes, BEAST
     • Output formats: distance matrix, Newick format
     • Choose the right compromise between speed and performance
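As an illustration of the distance approach on slide 12, the sketch below computes pairwise p-distances (proportion of differing sites) and clusters them with UPGMA, writing the topology in Newick format. It is a toy version under stated assumptions: no branch lengths, no distance correction, and ties broken by input order. The function names are hypothetical and unrelated to FastDist, BIONJ or Neighbor.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma_newick(seqs):
    """Cluster aligned sequences with UPGMA; return a Newick topology string."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        for c in sizes:
            if c not in pair:
                # Size-weighted average distance from the merged cluster to c
                d = (dist[frozenset((a, c))] * sizes[a]
                     + dist[frozenset((b, c))] * sizes[b]) / (sizes[a] + sizes[b])
                dist[frozenset((merged, c))] = d
        # Forget every distance that involves the two merged clusters
        dist = {k: v for k, v in dist.items() if not (k & pair)}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


seqs = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "TTTTAAAA", "D": "TTTTAAAT"}
print(upgma_newick(seqs))
```

UPGMA assumes a constant rate of evolution across lineages, which is one reason Neighbor Joining is often preferred among the distance methods.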

  13. Phylogenetic analysis, Step 3: Phylogenetic reconstruction
     • Step 3b: Choose parameters and evolution models
     • Different evolution models describe the substitution rates for aa or nt:
       • DNA: Jukes-Cantor, Kimura, F81, HKY85, GTR
       • Protein: JTT, WAG, Dayhoff
     • Model-testing software: tests and selects the substitution model (and parameters) best suited to the dataset (the one with the maximum likelihood)
       • ProtTest, ModelTest (based on PhyML)
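To make the role of a substitution model concrete, the sketch below applies the two simplest DNA models from slide 13 as distance corrections. Jukes-Cantor converts an observed proportion of differences p into d = -(3/4) ln(1 - 4p/3); Kimura's two-parameter model separates transitions (P) from transversions (Q) with d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)). The function names are illustrative.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance from the observed proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); beyond that the sites are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)


def k80_distance(transitions, transversions):
    """Kimura two-parameter distance from transition and transversion proportions."""
    P, Q = transitions, transversions
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# The corrected distance exceeds the observed one because multiple
# substitutions at the same site are hidden in the raw count
print(jc69_distance(0.1))
print(k80_distance(0.07, 0.03))
```

When transversions are twice as frequent as transitions (P = p/3, Q = 2p/3), the Kimura distance reduces to the Jukes-Cantor one, which is a convenient sanity check.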

  14. Phylogenetic analysis, Step 3: Phylogenetic reconstruction
     • Step 3c: Estimate branch robustness
     • Bootstrap procedure:
       • 1- Resample the alignment columns: build a pseudo-alignment by drawing sites at random (with replacement), then compute the tree again.
       • 2- Repeat the process N times.
       • 3- For each branch of the initial tree, count how many times it appears in the bootstrap trees. The higher this number, the better supported the branch.
     • aLRT test (approximate Likelihood Ratio Test) (Anisimova & Gascuel, Syst Biol, 2006)
       • Integrated in PhyML
       • Much faster (PhyML is run only once)
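The bootstrap procedure on slide 14 can be sketched as follows. Tree inference itself is left as a caller-supplied function (`infer_clades`, a hypothetical parameter that returns a set of clades, each a frozenset of sequence names); the toy `closest_pair_clade` below is only a placeholder for a real reconstruction method.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Build a pseudo-alignment by drawing columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, infer_clades, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which each reference clade reappears."""
    rng = random.Random(seed)
    reference = infer_clades(alignment)
    counts = {clade: 0 for clade in reference}
    for _ in range(replicates):
        found = infer_clades(resample_columns(alignment, rng))
        for clade in reference:
            if clade in found:
                counts[clade] += 1
    return {clade: n / replicates for clade, n in counts.items()}


def closest_pair_clade(alignment):
    """Toy 'tree builder': the single clade joining the two most similar sequences."""
    def mismatches(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return {frozenset(min(combinations(sorted(alignment), 2), key=mismatches))}


alignment = {"A": "AAAAAAAA", "B": "AAAAAAAA", "C": "TTTTTTTT"}
print(bootstrap_support(alignment, closest_pair_clade, replicates=50))
```

Because every replicate requires a full tree computation, the cost grows linearly with N, which is the motivation for the faster aLRT alternative mentioned on the slide.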

  15. Phylogenetic analysis, Step 4: Visualization and editing of the phylogenetic tree
     • Graphical tools available to display trees from Newick format:
       • TreeDyn
       • DrawGram, DrawTree
       • ATV
       • NJPlot
     • Graphical output formats: PNG, SVG, PDF…
     • Step 5: Interpretation of the tree
