Phylogenetic Analysis

alena

Presentation Transcript


  1. Phylogenetic Analysis YTSLLLSRQ- YASLLW-RQA PASIILSRQA GRSIVLTRQM

  2. Phylogenetics: What do I need to do? (1) Get related sequences of interest; (2) perform multiple sequence alignments; (3) edit the alignment; (4) estimate phylogenetic relationships; (5) interpret results correctly.

  3. Phylogenetics: (1) Get related sequences of interest; (2) perform multiple sequence alignments; (3) edit the alignment; (4) estimate phylogenetic relationships; (5) interpret results correctly.

  4. So you have a sequence…now what? MKILLLCIIFLYYVNAFKNTQKDGVSLQILKKKRSNQVNFLNRKNDYNLIKNKNPSSSLKSTFDDIKKIISKQLSVEEDKIQMNSNFTKDLGADSLDLVELIMALEEKFNVTISDQDALKINTVQDAIDYIEKNNKQ

  5. #1: What is it? Does the source organism have its own genome database? Yes: BLAST at the genome database (GeneDB, PlasmoDB, etc.). Unknown/No: BLAST at PubMed.

  6. Why start with a genome-specific database? Beyond BLAST, it offers genome location/structure, strain variability, expression data, and pathway data.

  7. PubMed BLAST

  8. Blastp PubMed BLAST

  9. Protein families – Conserved Domains

  10. BLAST Hits

  11. Downloading sequences – FASTA format

  12. Getting sequences – FASTA format

  13. Saving and editing FASTA files
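
The FASTA files saved in the slides above are plain text: a header line starting with ">" followed by one or more sequence lines. As a minimal sketch (the function names `parse_fasta` and `format_fasta` and the record names are hypothetical, not part of any tool shown in the slides), reading and re-writing such a file could look like this:

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


def format_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Write records back out, wrapping sequences at `width` characters."""
    out = []
    for header, seq in records.items():
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Illustrative input only: short peptides from the title slide, made-up names.
example = [">seq1 demo", "YTSLL", "LSRQ", "", ">seq2", "YASLLWRQA"]
records = parse_fasta(example)
```

Shortening long database headers to brief, unique names before aligning tends to make the later alignment and tree views easier to read.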

  14. Phylogenetics: (1) Get related sequences of interest; (2) perform multiple sequence alignments; (3) edit the alignment; (4) estimate phylogenetic relationships; (5) interpret results correctly.

  15. Pair-wise sequence alignment. Global: GYTSLLLSRQNED--G / G--SLLLSHK-D-HTG. Overlap: GYTSLLLSRQNEDG-- / --GSLLLSHK-D-HTG. Local (Smith-Waterman): TSLLLSR / TSLLLSH.

  16. Aligning 2 sequences globally: YTSLLLSRQ and YASLLWRQA align as YTSLLLSRQ- / YASLLW-RQA. [Figure: dynamic-programming score matrix for the two sequences; the first row and column hold cumulative gap penalties -4, -8, ..., -36.]
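
Global alignment of two sequences is usually done by dynamic programming (the Needleman-Wunsch approach). The sketch below is an illustration under simple assumptions: the match, mismatch, and gap values are placeholders, whereas a real protein alignment would score pairs with a substitution matrix such as BLOSUM or PAM. The function name is hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Placeholder scoring: identical residues score `match`, others `mismatch`.
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("YTSLLLSRQ", "YASLLWRQA")
```

Several alignments can share the best score, so the gap placement returned here depends on the scoring values and on the tie-breaking order in the traceback.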

  17. Multiple sequence alignment, progressive: align the 2 closest sequences (YTSLLLSRQ- / YASLLW-RQA), add in the next closest sequence (PASIILSRQA), and continue adding (GRSIVLTRQM). The result is highly dependent on the initial matches.

  18. Multiple sequence alignment, iterative: start from an initial MSA with a low score (YTTSLLLSRQ-- / YATSLLWRQA-- / PASIILSRQA-- / GRTSIVLTRQMA), then optimize the MSA score (YTTSLLLSRQ-- / YATSLLW-RQ-A / PA-SIILSRQ-A / GRTSIVLTRQMA). Probabilistic methods don’t always generate the same answer.

  19. Multiple sequence alignment programs, classified by pair-wise alignment type (global, local) and by alignment type (progressive, iterative): ClustalX, T-Coffee, POA, MSA, HMMs, GAs, Dialign. [Figure: chart placing each program on these two axes.]

  20. Multiple Sequence Alignments POAVIZ – progressive local CLUSTAL – progressive global

  21. Multiple Sequence Alignments POAVIZ – progressive local CLUSTAL – progressive global

  22. POAVIZ

  23. POAVIZ

  24. POAVIZ

  25. Multiple Sequence Alignments POAVIZ – progressive local CLUSTAL – progressive global

  26. CLUSTALX Parameters

  27. CLUSTALX

  28. CLUSTALX – Protein Weight Matrices. 1) BLOSUM (Henikoff): these matrices appear to be the best available for carrying out database similarity (homology) searches. 2) PAM (Dayhoff): these have been extremely widely used since the late '70s. 3) GONNET: these matrices were derived using almost the same procedure as the Dayhoff one (above) but are much more up to date and are based on a far larger dataset.

  29. BLOSUM (BLOck SUbstitution Matrix). Scale: BLOSUM99 (>99% identity) to BLOSUM62 (>62% identity). BLOSUM62: gather proteins with at least 62% identity to obtain actual substitution rates for these proteins. Pros: best bet for distantly divergent sequences.

  30. PAM (point accepted mutation). Scale: PAM1 (99% identity) to PAM250 (20% identity). Gather the substitution rates for PAM1 (99% identical sequences), then, assuming that those substitution rates are consistent over time, extrapolate to larger distances (unit: # point mutations / 100 amino acids). Pros: very good for closely related sequences. Cons: rare mutations are under-represented, and substitution rates are not constant over time (both are problems for phylogenetic estimation).
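
The PAM extrapolation amounts to applying the one-step (PAM1) mutation probabilities repeatedly, that is, raising the PAM1 probability matrix to the n-th power. The sketch below illustrates this with a made-up 3-letter alphabet. The numbers in `TOY_PAM1` are assumptions chosen for the example and are not real Dayhoff values, and the conversion to log-odds scores is left out.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy one-step matrix: each letter stays unchanged with probability 0.99.
# These values are illustrative only, NOT the real PAM1 matrix.
TOY_PAM1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

# Same rates applied 250 times, mirroring the PAM1 -> PAM250 extrapolation.
toy_pam250 = mat_pow(TOY_PAM1, 250)
```

After many steps the diagonal (probability of seeing the same letter) falls far below 0.99, which is why high-numbered PAM matrices correspond to low sequence identity. The sketch also makes the key assumption visible: every step reuses the same rates.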

  31. CLUSTALX

  32. CLUSTALX - Aligning

  33. CLUSTALX - Aligning

  34. CLUSTALX – Alignment view

  35. CLUSTAL vs POAVIZ (global vs local) POAVIZ CLUSTAL

  36. Phylogenetics: (1) Get related sequences of interest; (2) perform multiple sequence alignments; (3) edit the alignment; (4) estimate phylogenetic relationships; (5) interpret results correctly.

  37. BioEdit – Alignment manipulation: Open the “.aln” file.

  38. BioEdit – Alignment manipulation: “Back colored view” gives more contrast. Select “Edit” from the mode dropdown.

  39. BioEdit – Alignment manipulation: Select “Insert” so that you don’t accidentally lose part of your sequence. Then select the unaligned beginning (or end) sequence and delete it.

  40. BioEdit – Alignment manipulation: Now save as a different file (.fasta).
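
The manual trimming done in BioEdit can also be approximated in a script. The sketch below assumes one particular rule, which is only one possible choice: drop leading and trailing alignment columns in which any sequence has a gap, and leave internal gaps alone. The function name and the rule are assumptions for illustration, not BioEdit behavior.

```python
def trim_gappy_ends(alignment):
    """Trim leading/trailing columns that contain a gap in any sequence.

    `alignment` maps sequence names to aligned strings of equal length.
    Internal gap columns are kept unchanged.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    def column_has_gap(col):
        return any(s[col] == "-" for s in seqs)

    start = 0
    while start < length and column_has_gap(start):
        start += 1  # skip ragged columns at the beginning
    end = length
    while end > start and column_has_gap(end - 1):
        end -= 1  # skip ragged columns at the end
    return {name: s[start:end] for name, s in alignment.items()}


# The four aligned peptides from the title slide, with made-up names.
aligned = {
    "seq1": "YTSLLLSRQ-",
    "seq2": "YASLLW-RQA",
    "seq3": "PASIILSRQA",
    "seq4": "GRSIVLTRQM",
}
trimmed = trim_gappy_ends(aligned)
```

Inspecting the alignment by eye remains worthwhile, since a fixed rule cannot tell a poorly aligned region from a genuine insertion.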

  41. Phylogenetics: (1) Get related sequences of interest; (2) perform multiple sequence alignments; (3) edit the alignment; (4) estimate phylogenetic relationships; (5) interpret results correctly.

  42. Tree terminology: root; outgroup; common ancestor (node, branch point); lineage (branch, edge); branch length; operational taxonomic units (OTUs, leaves). [Figure: example tree with leaves A to G.]

  43. Topology 1 (leaf order B C D E F G A); Topology 2 (B C E F G D A); Topology 3 (E F G C D B A). Group types: monophyletic, paraphyletic, polyphyletic.

  44. Sequence homology – orthologues and paralogues. An ancestral gene duplicates into genes A and B in the last common ancestor; speciation then yields Human A, Rat A, Human B, and Rat B. Human A and Rat A are orthologues, as are Human B and Rat B; the A and B copies are paralogues of each other.

  45. Methods of estimating phylogenetic relationships. Character-based: Maximum Parsimony (MP). Distance-based: Neighbor-Joining (NJ), Minimum Evolution (ME). Probabilistic: Maximum likelihood (ML), Bayesian inference.

  46. Methods of estimating phylogenetic relationships: Maximum Parsimony (MP). Sequences: Taxa1 AAG, Taxa2 AAA, Taxa3 GGA, Taxa4 AGA. [Figure: the three possible trees with inferred ancestral sequences.] Grouping (Taxa1, Taxa2) with (Taxa3, Taxa4) requires 3 changes (best tree); the other two groupings each require 4 changes.
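
The change counts for the four-taxon example can be reproduced with Fitch's small-parsimony algorithm, which counts the minimum number of substitutions a given tree needs. The sketch below is one way to write it; the function names and the nested-tuple tree encoding are assumptions made for this illustration.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    `tree` is either a taxon name or a pair (left_subtree, right_subtree).
    `states` maps each taxon name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost  # children agree: no new change
    return left_set | right_set, left_cost + right_cost + 1  # one change needed


def parsimony_score(tree, seqs):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(length)
    )


SEQS = {"Taxa1": "AAG", "Taxa2": "AAA", "Taxa3": "GGA", "Taxa4": "AGA"}

# The three ways to split four taxa into two pairs.
TOPOLOGIES = [
    (("Taxa1", "Taxa2"), ("Taxa3", "Taxa4")),
    (("Taxa1", "Taxa3"), ("Taxa2", "Taxa4")),
    (("Taxa1", "Taxa4"), ("Taxa2", "Taxa3")),
]

scores = [parsimony_score(t, SEQS) for t in TOPOLOGIES]
print(scores)  # [3, 4, 4]
```

The first topology needs 3 changes and the other two need 4, so maximum parsimony prefers the tree that pairs Taxa1 with Taxa2.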

  47. Methods of estimating phylogenetic relationships: Distance-based. Neighbor-Joining (NJ) Method: The NJ method involves clustering of neighbor species that are joined by one node. It does not evaluate all the possible tree topologies, so it is not guaranteed to obtain the optimal tree. Minimum Evolution (ME) Method: Estimates the total branch length of each topology exhaustively, then chooses the topology with the least total branch length. Time intensive for large numbers of taxa.
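
To make the NJ clustering step concrete, the sketch below performs only the first join: it computes the standard NJ criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), picks the pair with the smallest Q, and computes distances from the new node to the remaining taxa. The distance matrix is made-up example data and the function names are hypothetical. A full implementation would repeat these steps until the tree is resolved and would also assign branch lengths.

```python
def nj_pick_pair(dist):
    """Return (Q, i, j) for the pair of taxa that NJ would join first."""
    n = len(dist)
    totals = [sum(row) for row in dist]  # summed distance from each taxon
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best


def nj_join(dist, i, j):
    """Distances from the new node (joining i and j) to each remaining taxon."""
    return [
        (dist[i][k] + dist[j][k] - dist[i][j]) / 2
        for k in range(len(dist))
        if k not in (i, j)
    ]


# Hypothetical pairwise distances between five taxa a, b, c, d, e.
DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

q, i, j = nj_pick_pair(DIST)       # joins taxa 0 and 1 (a and b), Q = -50
new_row = nj_join(DIST, i, j)      # [7.0, 7.0, 6.0] to c, d, e
```

Note that d and e have the smallest raw distance (3), yet NJ joins a and b first. The Q criterion corrects for how far each taxon is from all the others, which is what distinguishes NJ from simply clustering the closest pair.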

  48. Methods of estimating phylogenetic relationships: Probabilistic methods. Maximum likelihood (ML): Prob(data | model + tree). Finds the more likely topology. Searches all possible topologies to optimize the probability.
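
The quantity Prob(data | model + tree) can be illustrated in the simplest possible case: two aligned nucleotide sequences joined by a single branch. The sketch below assumes the Jukes-Cantor (JC69) substitution model, which is a choice made for this example and not something the slides specify. It finds the branch length that maximizes the likelihood by scanning a grid. The function names are hypothetical.

```python
import math


def jc69_log_likelihood(n_sites, n_diff, d):
    """Log Prob(data | JC69, branch length d) for two aligned sequences.

    n_sites: number of aligned columns; n_diff: columns where they differ;
    d: branch length in expected substitutions per site (must be > 0).
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability a site is unchanged
    p_diff = 0.25 - 0.25 * decay  # probability of one specific other base
    return (
        n_sites * math.log(0.25)  # equal base frequencies at the first sequence
        + (n_sites - n_diff) * math.log(p_same)
        + n_diff * math.log(p_diff)
    )


def ml_distance(n_sites, n_diff, grid_step=0.001, max_d=3.0):
    """Grid-search the branch length with the highest likelihood."""
    candidates = [grid_step * k for k in range(1, int(max_d / grid_step) + 1)]
    return max(candidates, key=lambda d: jc69_log_likelihood(n_sites, n_diff, d))


# Hypothetical data: 100 aligned sites, 10 of which differ.
best_d = ml_distance(100, 10)
```

For a whole tree, the same idea is applied over every branch and every candidate topology. The number of topologies grows very quickly with the number of taxa, so ML programs in practice rely on heuristic searches for anything beyond a handful of sequences.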

  49. Bayesian inference: prior information; model for selection. Need both for everyone in the class.
