Alignments • Why do Alignments?
Detecting Selection • Evolution of Drug Resistance in HIV
Selection on Amino Acid Properties • TreeSAAP (2003) • Wu Method (Sainudiin et al. 2005)
TreeSAAP Properties • Alpha-helical tendencies • Average number of surrounding residues • Beta-structure tendencies • Bulkiness • Buriedness • Chromatographic index • Coil tendencies • Composition • Compressibility • Equilibrium constant (ionization of COOH) • Helical contact area • Hydropathy • Isoelectric point • Long-range non-bonded energy • Mean r.m.s. fluctuation displacement • Molecular volume • Molecular weight • Normalized consensus hydrophobicity • Partial specific volume • Polar requirement • Polarity • Power to be at the C-terminal • Power to be at the middle of alpha-helix • Power to be at the N-terminal • Refractive index • Short and medium range non-bonded energy • Solvent accessible reduction ratio • Surrounding hydrophobicity • Thermodynamic transfer hydrophobicity • Total non-bonded energy • Turn tendencies
OPSIN: Model System for Molecular Evolution • [Figure: light spectrum from UV to IR, wavelength axis 400 to 700 nm] • Sequence excerpt: CRLAKIAMTTVALWFIAWT PYLLINWVGMFARSYLSPV YTIWGYVFAKANAVYNPIV YAISHPKYRAAMEKKLPCL SCKTESDDVSESASTTTSS • Environment • Phenotype • Genotype
Is λmax Correlated with Ecological Differences? • INPUT: microscopic thin beam of spectral light • OUTPUT: detect light not absorbed by the photopigment • INPUT - OUTPUT = pigment absorbance • Measured from 400 to 700 nm at 1 nm intervals
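The INPUT minus OUTPUT arithmetic can be sketched in a few lines. This is a minimal illustration, not the measurement pipeline behind the slide: the spectra are synthetic (a made-up Gaussian dip centered on 530 nm), and the function names `absorbance_spectrum` and `lambda_max` are hypothetical. It reports the wavelength of peak absorbance, often written λmax.

```python
import math

# 400-700 nm at 1 nm intervals, as on the slide.
wavelengths = list(range(400, 701))

def absorbance_spectrum(input_light, output_light):
    """Absorbance at each wavelength, taken as INPUT minus OUTPUT."""
    return [i - o for i, o in zip(input_light, output_light)]

def lambda_max(wavelengths, absorbance):
    """Wavelength at which absorbance is largest."""
    peak_index = max(range(len(absorbance)), key=absorbance.__getitem__)
    return wavelengths[peak_index]

# Synthetic data (assumption): a flat input beam and a pigment that
# absorbs most strongly near 530 nm. These are not real measurements.
input_light = [1.0] * len(wavelengths)
output_light = [1.0 - 0.8 * math.exp(-((w - 530) / 40.0) ** 2)
                for w in wavelengths]

absorbance = absorbance_spectrum(input_light, output_light)
peak = lambda_max(wavelengths, absorbance)
print(peak)
```

With real data the absorbance curve is noisy, so a fitted template or smoothing step would normally precede the peak search; that step is left out here.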
Invertebrate Opsin Evolution • [Figure: PHYML amino acid ML tree of invertebrate opsins. Labeled clades: Insect LWS (508-575 nm), Insect MWS (420-490 nm), Crustacean LWS (496-533 nm), Chelicerate LWS (520), Crustacean MWS (480), Insect UV (345-375 nm), Insect BL (430-460 nm), Cephalopod Rh (480-499 nm). Also shown: Gallus gallus pineal, Anolis carolinensis pineal, Bos taurus rhodopsin, Homo sapiens melatonin 1A, Homo sapiens GPR52. Scale bar: 0.1. Thicker branches indicate bootstrap values > 90%.]
TreeSAAP • [Figure: four panels of Z-score versus amino acid alignment number (0 to 260), one each for Coil Tendencies, Compressibility, Power to be at mid alpha, and Refractive Index, with transmembrane domains TMI to TMVI marked.]
Homology definitions • Homology is an evolutionary relationship that either exists or does not. It cannot be partial. • An ortholog is a homolog that arose through a speciation event. • A paralog is a homolog that arose through a gene duplication event. Paralogs often have divergent function. • Similarity is a measure of the quality of alignment between two sequences. High similarity is evidence for homology. Similar sequences may be orthologs or paralogs.
One More Homology Type • Xenology: homology that arises through horizontal gene transfer (HGT) • How do you discover this?
Alignment Problem • (Optimal) pairwise alignment consists of considering all possible alignments of two sequences and choosing the optimal one. • Sub-optimal (heuristic) alignment algorithms are also very important, e.g., BLAST.
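In practice the optimal alignment is not found by listing every alignment: dynamic programming reaches the same optimum in time proportional to the product of the two sequence lengths. Below is a minimal sketch of global alignment in the Needleman-Wunsch style. The linear gap penalty and the score values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, and the slides do not name this algorithm.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Score values are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align pair
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (not from the slides).
best_score, row_a, row_b = needleman_wunsch("ACGT", "AGT")
print(best_score)
print(row_a)
print(row_b)
```

When several alignments tie for the best score, the traceback returns only one of them; which one depends on the order in which the three moves are checked.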
Key Issues • Types of alignments (local vs. global) • The scoring system • The alignment algorithm • Measuring alignment significance
Types of Alignment • Global—sequences aligned from end-to-end. • Local—alignments may start in the middle of either sequence • Ungapped—no insertions or deletions are allowed • Other types: overlap alignments, repeated match alignments
Local vs. Global Pairwise Alignments • A global alignment includes all elements of the sequences and includes gaps. • A global alignment may or may not include "end gap" penalties. • Global alignments are better indicators of homology and take longer to compute. • A local alignment includes only subsequences, and sometimes is computed without gaps. • Local alignments can find shared domains in divergent proteins and are fast to compute
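To make the local case concrete, here is a minimal sketch in the Smith-Waterman style, which the slides do not name. It differs from a global dynamic-programming alignment in two ways: cell scores are floored at zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The score values (match +2, mismatch -1, gap -2) are assumptions for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (score, aligned_part_of_a, aligned_part_of_b).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(0,                                 # start afresh
                              score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (not from the slides): a shared core with unrelated flanks.
local_score, part_a, part_b = smith_waterman("XXXXACGTACGTYYYY", "ZZACGTACGTZZ")
print(local_score, part_a, part_b)
```

The unrelated flanks are simply left out of the result, which is how a local alignment can recover a shared domain from otherwise divergent sequences.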
How do you compare alignments? • Scoring scheme • What events do we score? • Matches • Mismatches • Gaps • What scores will you give these events? • What assumptions are you making? • Score your alignment
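The scoring steps above can be sketched as a small function that walks the columns of an existing alignment. The score values (match +1, mismatch -1, gap -2) and the name `score_alignment` are assumptions for illustration. Each gap column is penalized independently, which is itself a modeling assumption (a linear gap cost rather than separate open and extend costs).

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Sum per-column scores for two aligned rows of equal length.

    '-' marks a gap. Any column containing a gap receives the gap score.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


# Two candidate alignments of the same toy sequences (not from the slides).
candidate_1 = score_alignment("ACGT", "A-GT")  # 3 matches, 1 gap
candidate_2 = score_alignment("ACGT", "AGT-")  # 1 match, 2 mismatches, 1 gap
print(candidate_1, candidate_2)
```

Under these scores the first candidate wins. Changing the gap or mismatch values can change which alignment is preferred, which is why the assumptions behind a scoring scheme matter.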
Scoring Matrices • How do you determine scores? • What is out there already for your use? • DNA versus Amino Acids? • TTACGGAGCTTC • CTGAGATCC
Multiple Sequence Alignment • Global versus Local Alignments • Progressive alignment • Estimate guide tree • Do pairwise alignment on subtrees • ClustalX
Improvements • Consistency-based Algorithms • T-Coffee: consistency-based objective function to minimize potential errors • Generates pairwise global alignments (Clustal) • Generates pairwise local alignments (Lalign) • Then combines them, reweights, and runs progressive alignment
Iterative Algorithms • Estimate draft progressive alignment (uncorrected distances) • Improved progressive (reestimate guide tree using Kimura 2-parameter) • Refinement - divide into 2 subtrees, estimate two profiles, then re-align 2 profiles • Continue refinement until convergence
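For reference, the Kimura 2-parameter correction named above can be written as d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P is the proportion of sites with a transition and Q the proportion with a transversion. The sketch below applies that standard nucleotide formula to two aligned DNA sequences. Whether a particular aligner uses exactly this correction when re-estimating its guide tree is not stated in the slides, so treat the link as an assumption; the function name `k2p_distance` is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Columns containing anything other than A, C, G, T are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if x == y:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term_1 = 1 - 2 * p - q
    term_2 = 1 - 2 * q
    if term_1 <= 0 or term_2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_1 * math.sqrt(term_2))


# Toy example (not from the slides): one transition (A->G) in ten sites.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(round(d, 4))
```

The corrected distance is slightly larger than the uncorrected proportion of differing sites (0.1 in the toy example), because the model allows for multiple substitutions at the same site.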
Software • Clustal • T-Coffee • MUSCLE (limited models) • MAFFT (wide variety of models)
Comparisons • Speed • MUSCLE > MAFFT > ClustalW > T-Coffee • Accuracy • MAFFT > MUSCLE > T-Coffee > ClustalW • Lots more work to do here!