Lecture 5 : Phylogenies



Presentation Transcript


  1. Lecture 5: Phylogenies (9/16/09)

  2. Translated blast = protein vs translated database

  3. Blasting GenBank with blastn (AX8GS9DG01S) • Z. bruijni = long-beaked echidna • T. aculeatus = echidna • T. rostratus = honey possum

  4. Blasting GenBank with discontiguous megablast (AX9N23U7014): exactly the same as blastn • Z. bruijni = long-beaked echidna • T. aculeatus = echidna • T. rostratus = honey possum

  5. Blasting GenBank with megablast (AX9TUM1G016): same species but in a different order • Z. bruijni = long-beaked echidna • T. aculeatus = echidna • T. rostratus = honey possum

  6. Blasting GenBank with tblastn (AX9DYYTE01N) • T. aculeatus = echidna • S. brachyurus = quokka • S. crassicaudata = fat-tailed dunnart • M. fasciatus = numbat • I. obesulus = quenda

  7. Species found by BLAST • O. anatinus = platypus • T. aculeatus = echidna • Z. bruijni = long-beaked echidna • T. rostratus = honey possum • S. brachyurus = quokka • I. obesulus = quenda = bandicoot • M. fasciatus = numbat • S. crassicaudata = fat-tailed dunnart

  8. HomoloGene • Can be reached from the NCBI home page • Scroll down: the entries are listed alphabetically

  9. Questions • Phylogenies: what are they? • How do we build them? • What do they tell us?

  10. Phylogeny • Evolutionary history of a group of organisms, especially as depicted in a family tree (Haeckel, 1879)

  11. Things trees might tell you: • How are organisms with a particular trait related? • Did the trait evolve multiple times or only once? • What is the evolutionary pathway • Of organisms • Of genes

  12. Molecules can be used to learn how organisms are related

  13. To learn about vertebrate evolution: compare >600 genes (1998)

  14. Used genes to measure time • Time since common ancestor with human • Time since two groups diverged

  15. More recent version of vertebrate evolution, which shows divergence times on the animal tree (Ponting 2008)

  16. Orangutan, Human, Chimp, Rhesus monkey, Mouse, Rat, Dog, Cat, Horse, Cow, Opossum, Wallaby, Platypus, Anole, Chicken, Frog, Fish - Medaka, Fugu, Tetraodon, Zebrafish, Elephant shark, Lamprey

  17. Primates 25 MY • Mammals 100 MY • Tetrapods 420 MY • Fish 320 MY • All vertebrates 550 MY

  18. Molecular clock • Molecules change at a steady rate • We can calibrate how fast they change using fossils • The molecules then become a timepiece to measure how recently different groups split off from each other
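
The calibration idea on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes a strictly constant rate; the distances and the 100 MY calibration date below are hypothetical values chosen for the example, not data from the lecture, and the function names are made up for this sketch.

```python
def calibrate_rate(distance, divergence_time_my):
    """Substitutions per site per million years along one lineage.

    Both lineages accumulate changes after the split, so the observed
    distance between two sequences covers twice the divergence time.
    """
    return distance / (2.0 * divergence_time_my)


def divergence_time(distance, rate):
    """Time in MY since two sequences shared an ancestor, at a constant rate."""
    return distance / (2.0 * rate)


# Hypothetical calibration: two groups known from fossils to have split
# 100 MY ago differ at 20% of sites.
rate = calibrate_rate(0.20, 100.0)

# Hypothetical query: two sequences that differ at 5% of sites.
print(f"rate = {rate:.4f} substitutions/site/MY")
print(f"estimated split = {divergence_time(0.05, rate):.1f} MY ago")
```

The estimate is only as good as the steady-rate assumption and the fossil date used to calibrate it.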

  19. Sequence conservation may be high • Gene might code for a protein which is highly constrained • Might have to interact with lots of other proteins • Selection might be quite strong

  20. Sequence conservation may be low • Not much constraint • Few sites of interaction • Selection might be weak

  21. Phylogeny steps • Align sequences so homologous amino acids (AA) can be compared • Determine the similarity between sequences • Use this to generate a relationship between sequences

  22. ClustalW2 to align sequences

  23. Put sequences in FASTA file
      >TetraodonG1
      MVWDGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYPQYYLVDPIM
      FKMLALYMFFLICTGTPINGLTLLVTAQNKKLRQPLNYILVNLAVAGLIMCAFGFTIT
      ITSAINGYFILGATACAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFTGTH
      AAVGVLFTWIMAFACAGPPLFGWSRYLPEGMQCSCGPDYYTLAPGYNNESYVIYMFVV
      HFFVPVFLIFFTYGSLVLTVRAAAQQQESESTQKAQREVTRMCILMVLGFLVAWTPYA
      TFSGWIFMNKGAAFHPLTAALCAFFAKSSALYNPVIYVLMNKQFRNCMLSTFGMGGAV
      DDETSVSASKTEVSSVS
      >ZebrafishG1
      MNGTEGSNFYIPMSNRTGLVRSPYDYTQYYLAEPWKFKALAFYMFLLIIFGFPINVLT
      LVVTAQHKKLRQPLNYILVNLAFAGTIMVIFGFTVSFYCSLVGYMALGPLGCVMEGFF
      ATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSANHAMAGIAFTWFMACSCAVPPLFG
      WSRYLPEGMQTSCGPDYYTLNPEYNNESYVMYMFSCHFCIPVTTIFFTYGSLVCTVKA
      AAAQQQESESTQKAEREVTRMVILMVLGFLFAWVPYASFAAWIFFNRGAAFSAQAMAV
      PAFFSKTSAVFNPIIYVLLNKQFRSCMLNTLFCGKSPLGDDESSSVSTSKTEVSSVSPA
      >CichlidG1
      MAWEGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYTQYYLADPIFFKLLAFYMFFLICT
      GTPINSLTLFVTAQNKKLRQPLNYILVNLAVAGLIMCCFGFTITITSAFNGYFILGST
      FCAIEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSGAHAGAGVLFTWIMAMA
      CAAPPLFGWSRYIPEGMQCSCGPDYYTLAPGFNNESYVIYMFVVHFFVPVFIIFFTYG
      SLVMTVKAAAAQQQDSASTQKAEKEVTRMCVLMVMGFLIAWTPYASFAGWIFMNKGAS
      FSALTAAIPAFFAKSSALYNPVIYVLMNKQFRNCMLSTIGMGGMVEDETSVSTSKTEV
      SSVS

  24. Aligned sequences: .aln file; Jalview gives a colored version • Funky tree: .dnd file (need a special program to draw it) • Scroll down this page for the tree (use Phylogram)

  25. CLUSTAL W (1.83) multiple sequence alignment

      TetraodonG1  MVWDGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYPQYYLVDPIMFKMLALYMFFLICTGT 60
      CichlidG1    MAWEGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYTQYYLADPIFFKLLAFYMFFLICTGT 60
      ZebrafishG1  --------MNGTEGSNFYIPMSNRTGLVRSPYDYTQYYLAEPWKFKALAFYMFLLIIFGF 52
      consensus    *****.***********:****::*.****.:* ** **:***:** *

      TetraodonG1  PINGLTLLVTAQNKKLRQPLNYILVNLAVAGLIMCAFGFTITITSAINGYFILGATACAV 120
      CichlidG1    PINSLTLFVTAQNKKLRQPLNYILVNLAVAGLIMCCFGFTITITSAFNGYFILGSTFCAI 120
      ZebrafishG1  PINVLTLVVTAQHKKLRQPLNYILVNLAFAGTIMVIFGFTVSFYCSLVGYMALGPLGCVM 112
      consensus    *** ***.****:***************.** ** ****::: .:: **: **. *.:

      TetraodonG1  EGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFTGTHAAVGVLFTWIMAFACAGPPL 180
      CichlidG1    EGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSGAHAGAGVLFTWIMAMACAAPPL 180
      ZebrafishG1  EGFFATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSANHAMAGIAFTWFMACSCAVPPL 172
      consensus    ***:*****:**************************:. ** .*: ***:** :** ***

      TetraodonG1  FGWSRYLPEGMQCSCGPDYYTLAPGYNNESYVIYMFVVHFFVPVFLIFFTYGSLVLTVR- 239
      CichlidG1    FGWSRYIPEGMQCSCGPDYYTLAPGFNNESYVIYMFVVHFFVPVFIIFFTYGSLVMTVKA 240
      ZebrafishG1  FGWSRYLPEGMQTSCGPDYYTLNPEYNNESYVMYMFSCHFCIPVTTIFFTYGSLVCTVKA 232
      consensus    ******:***** ********* * :******:*** ** :** ********* **:

      TetraodonG1  AAAQQQESESTQKAQREVTRMCILMVLGFLVAWTPYATFSGWIFMNKGAAFHPLTAALCA 299
      CichlidG1    AAAQQQDSASTQKAEKEVTRMCVLMVMGFLIAWTPYASFAGWIFMNKGASFSALTAAIPA 300
      ZebrafishG1  AAAQQQESESTQKAEREVTRMVILMVLGFLFAWVPYASFAAWIFFNRGAAFSAQAMAVPA 292
      consensus    ******:* *****::***** :***:***.**.***:*:.***:*:**:* . : *: *

      TetraodonG1  FFAKSSALYNPVIYVLMNKQFRNCMLSTFGMGG--AVDDETS-VSASKTEVSSVS-- 351
      CichlidG1    FFAKSSALYNPVIYVLMNKQFRNCMLSTIGMGG--MVEDETS-VSTSKTEVSSVS-- 352
      ZebrafishG1  FFSKTSAVFNPIIYVLLNKQFRSCMLNTLFCGKSPLGDDESSSVSTSKTEVSSVSPA 349
      consensus    **:*:**::**:****:*****.***.*: * :**:* **:*********

      (The consensus rows are reproduced as they appear in the transcript; runs of spaces were collapsed, so the symbols do not line up column by column with the sequences.)

  26. Alignment is key • Any other analysis that you do is only as good as your alignment • If your alignment is bad subsequent analyses will be bad • Junk in = Junk out

  27. Alignments • Tell you about sequence conservation • How much is there? • Where is it?

  28. Calculate sequence similarities • Pairwise comparisons
      Zebrafish  M--------NGTEGSNFYIPMSNR
      Trout      M------Q-NGTEGSNFYIPMSNR
      Medaka     M------E-NGTEGKNFYIPMNNR
      Cod        M----RMEANGTEGKNFYIPMSNR
      Halibut    MVWDGGIEPNGTEGKNFYIPMSNR
      Tetraodon  MVWDGGIEPNGTEGKNFYIPMSNR
      Goldfish   M--------NGTEGNNFYVPLSNR
      Killifish  M---GYG-PNGTEGNNFYIPMSNK
                 *        *****.***:*:.*:
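
As a rough illustration of a pairwise comparison, the Python sketch below computes the fraction of identical residues for a few of the aligned sequences on this slide. Skipping columns where either sequence has a gap is an assumption made here for simplicity; alignment programs may score gaps differently, and the function name is invented for this example.

```python
from itertools import combinations

# Aligned N-terminal fragments copied from the slide.
aligned = {
    "Zebrafish": "M--------NGTEGSNFYIPMSNR",
    "Halibut":   "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Tetraodon": "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Goldfish":  "M--------NGTEGNNFYVPLSNR",
}


def fraction_identical(a, b, gap="-"):
    """Fraction of matching residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return sum(x == y for x, y in compared) / len(compared)


# One value for every pair of sequences.
identity = {
    (m, n): fraction_identical(aligned[m], aligned[n])
    for m, n in combinations(aligned, 2)
}

for (m, n), value in identity.items():
    print(f"{m:10s} {n:10s} {value:.3f}")
```

Halibut and Tetraodon are identical over this fragment, while Zebrafish and Goldfish differ at 3 of the 16 columns they share.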

  29. Use tree to show sequence relationships Short branches mean sequences are more similar Long branches mean there are more differences

  30. Q3. How do we build phylogenies? • Assume the relationships involve bifurcating branches (figure: branching tree labeled with the sequences ATC, ATG, ACG, CCG, CCC)

  31. Methods to determine similarities • Parsimony • Distance • Maximum likelihood • Bayesian

  32. Parsimony • The least complex explanation is the most likely to be correct • Occam’s razor • The preferred phylogenetic tree is one that requires fewest changes • Count up # changes for all possible trees • Find the shortest one
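
Counting changes on a candidate tree can be sketched with Fitch's algorithm, a standard way to get the minimum number of changes for one site on a fixed tree. The slides do not name a counting method, so the choice of Fitch's algorithm, the taxon names A to D, and the nested-tuple tree encoding are all assumptions of this Python example. The site pattern (T, T, C, C) matches the four-taxon example in the slides.

```python
def fitch_changes(tree, states):
    """Minimum number of changes for one site on a rooted binary tree.

    tree   : a leaf name (str) or a 2-tuple of subtrees
    states : dict mapping leaf name -> character at this site
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:
            # The children can agree on a state: no extra change needed.
            return shared, left_cost + right_cost
        # The children disagree: one change is required on this branch pair.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


site = {"A": "T", "B": "T", "C": "C", "D": "C"}

# The three possible groupings of four taxa.
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

scores = {name: fitch_changes(tree, site) for name, tree in trees.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(name, score)
print("most parsimonious:", best)
```

Grouping the two T taxa together needs one change; the other two groupings need two each. With more taxa the number of possible trees grows very quickly, so real programs search rather than enumerate.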

  33. CT CT CT Most parsimonious Trees based on parsimony ATCG ATCG ATCG ACCG ACCG ATCG ACCG ACCG

  34. CT CT CT Most parsimonious Trees based on parsimony T T T C C T C C

  35. Can’t always distinguish tree topologies T T CT CT T T C C C C Equally parsimonious

  36. Other limitations • All changes are weighted the same • C-T counts the same as C-A • A change counts the same no matter how long it takes to occur

  37. Distance methods • Calculate a numerical value for sequence differences • Do for all pairwise combinations • Build tree by joining most similar sequences and then more divergent

  38. Distance methods • Fast • Pretty robust • Only deals with data in pairs

  39. Pairwise distances • Taxa1 AACGGTCATGGCGTTGCATT • Taxa2 AACGGTCAGGGCGTTGCATT • Taxa3 AACGGTCACGCCGCTGCATT

  40. Distance, d • p is fractional similarity of sequence • Simplest form of distance: d = 1 - p • AACGGTCATGGCGTTGCATT • AACGGTCACGGCGTTGCATT • p = 19/20 d = 0.05
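
A minimal Python sketch of d = 1 - p, applied to every pair of the three taxa from the pairwise distances slide. The function name is made up for this example, and the sequences are assumed to be already aligned and of equal length.

```python
from itertools import combinations

# Sequences from the "Pairwise distances" slide.
taxa = {
    "Taxa1": "AACGGTCATGGCGTTGCATT",
    "Taxa2": "AACGGTCAGGGCGTTGCATT",
    "Taxa3": "AACGGTCACGCCGCTGCATT",
}


def simple_distance(a, b):
    """d = 1 - p, where p is the fraction of sites that are the same."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    p = sum(x == y for x, y in zip(a, b)) / len(a)
    return 1.0 - p


# Distance for all pairwise combinations.
distances = {
    (m, n): simple_distance(taxa[m], taxa[n])
    for m, n in combinations(taxa, 2)
}

for (m, n), d in distances.items():
    print(f"{m} vs {n}: d = {d:.2f}")
```

Taxa1 and Taxa2 differ at 1 of 20 sites, while Taxa3 differs from each of them at 3 of 20 sites.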

  41. Tree building (figure: tree with tips 1, 2, 3) • Neighbor joining • Join most similar pair of sequences • Add more divergent after
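
The "join the most similar pair, then add the more divergent ones" idea can be sketched as below. Note the substitution: this Python example uses simple average-linkage clustering (UPGMA-style), which follows the bullets on this slide literally; it is not the full neighbor-joining algorithm, which also corrects for unequal rates between lineages. The distances are the d = 1 - p values for the three 20-base taxa shown earlier (1, 3, and 3 differences out of 20), and the function names are invented here.

```python
from itertools import combinations


def average_distance(members_a, members_b, dist):
    """Mean pairwise distance between the leaves of two clusters."""
    total = sum(dist[frozenset((a, b))] for a in members_a for b in members_b)
    return total / (len(members_a) * len(members_b))


def join_most_similar(names, dist):
    """Repeatedly merge the two closest clusters; return the tree as nested tuples."""
    # Each cluster is (subtree, list of leaf names in that subtree).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(
                clusters[ij[0]][1], clusters[ij[1]][1], dist
            ),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


dist = {
    frozenset(("Taxa1", "Taxa2")): 0.05,
    frozenset(("Taxa1", "Taxa3")): 0.15,
    frozenset(("Taxa2", "Taxa3")): 0.15,
}

tree = join_most_similar(["Taxa1", "Taxa2", "Taxa3"], dist)
print(tree)
```

Taxa1 and Taxa2 are joined first because they are the most similar pair; Taxa3 is added afterwards.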

  42. How different can 2 sequences get? • After infinite time, two sequences match only by chance • Probability that a base is the same = 1/4, because DNA has only 4 bases • Some sites will change more than once • Need to account for these multiple hits

  43. Random sequences • Write down 20 bases of sequence

  44. Compare your sequence to this one • AGTCCGATTACGGCTAGCAG • What fraction of sites are the same in the two sequences?

  45. Sequence similarity decays to 25% over long times
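
The classroom exercise can be repeated many times in Python to see the 25% figure emerge. This sketch assumes the four bases are equally likely at every site; the fixed seed and the number of trials are arbitrary choices for the example.

```python
import random


def random_dna(length, rng):
    """A random sequence with A, C, G, T equally likely at each site."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def fraction_same(a, b):
    """Fraction of sites at which two equal-length sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
reference = "AGTCCGATTACGGCTAGCAG"  # the sequence from the slide

fractions = [
    fraction_same(random_dna(len(reference), rng), reference)
    for _ in range(10_000)
]
mean_fraction = sum(fractions) / len(fractions)

print(f"mean fraction of matching sites: {mean_fraction:.3f}")
```

Any single 20-base comparison can land well above or below one quarter; only the average over many comparisons settles near 0.25.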

  46. Sequence difference maxes at 0.75

  47. Sequence change accumulates linearly with time at beginning

  48. DNA models • Use different DNA models to account for how sequences evolve with time • Allows you to apply different molecular clocks • Relate sequence change to time • Clock is not linear except for small changes and short times • Models same as used in maximum likelihood methods
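
One common way to correct for multiple hits is the Jukes-Cantor model, which assumes all bases are equally frequent and all changes equally likely. The slides do not say which DNA model was used, so treating Jukes-Cantor as the example here is an assumption. Under that model the corrected distance is d = -(3/4) ln(1 - (4/3) D), where D is the observed fraction of differing sites.

```python
import math


def jukes_cantor_distance(observed_difference):
    """Estimated substitutions per site under the Jukes-Cantor model.

    observed_difference is the fraction of sites that differ (0 <= D < 0.75).
    At D = 0.75 the sequences look random and the distance is undefined.
    """
    if not 0.0 <= observed_difference < 0.75:
        raise ValueError("observed difference must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * observed_difference)


for observed in (0.05, 0.30, 0.60, 0.70):
    corrected = jukes_cantor_distance(observed)
    print(f"observed {observed:.2f} -> corrected {corrected:.3f}")
```

For small differences the corrected value is close to the observed one, which matches the nearly linear start of the curve. As the observed difference approaches 0.75, the corrected distance grows without bound.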

  49. How good is your tree? • Bootstrap approach • Run the same method multiple times • Subsample data each time • Use 50% of data • See how reproducible the trees are • Count how many times a particular grouping occurs
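
The resampling idea can be sketched as follows, using the three 20-base taxa from the pairwise distances slide. The code follows this slide literally and draws a random 50% of the alignment columns each time; the standard bootstrap instead resamples columns with replacement up to the full alignment length. As a further simplification assumed here, the "grouping" checked is just the closest pair of sequences rather than a clade in a full tree, and ties count as no support.

```python
import random

taxa = {
    "Taxa1": "AACGGTCATGGCGTTGCATT",
    "Taxa2": "AACGGTCAGGGCGTTGCATT",
    "Taxa3": "AACGGTCACGCCGCTGCATT",
}


def differences(a, b, columns):
    """Number of differing sites, counting only the chosen columns."""
    return sum(a[i] != b[i] for i in columns)


def closest_pair(seqs, columns):
    """The uniquely closest pair of names for these columns, or None on a tie."""
    names = sorted(seqs)
    scored = sorted(
        (differences(seqs[m], seqs[n], columns), (m, n))
        for i, m in enumerate(names)
        for n in names[i + 1:]
    )
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]


def grouping_support(seqs, pair, replicates=1000, fraction=0.5, seed=0):
    """Fraction of subsampled data sets in which `pair` is the closest pair."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    sample_size = max(1, int(length * fraction))
    hits = 0
    for _ in range(replicates):
        columns = rng.sample(range(length), sample_size)
        if closest_pair(seqs, columns) == pair:
            hits += 1
    return hits / replicates


support = grouping_support(taxa, ("Taxa1", "Taxa2"))
print(f"Taxa1+Taxa2 grouped in {support:.0%} of subsamples")
```

Support is below 100% because only two columns separate Taxa3 from the other two; a subsample that misses both cannot resolve the grouping.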

  50. Distance tree for rod and cone transducin alpha subunit • Branch lengths are proportional to sequence differences
