Multiple Alignments, Motifs/Profiles • What is multiple alignment? • HOW does one do this? • WHY does one do this? • What do we mean by a motif or profile? • Prev. reading: Ch 1-5 • Assigned reading: Ch 6.4, 6.5, 6.6 • BIO520 Bioinformatics, Jim Lund
Information from Alignments • Infer biological function: conserved elements are critical for function; divergent elements relate to divergent function • Infer structure (2°, 3°) • Infer phylogeny: history; evolutionary forces (selection…)
How do I find similar sequences? DATABASE Alignment
Multiple Alignment • Global, Optimal • Theory • Computation • Progressive Alignment
Alignment Methods/Programs • GAP (GCG suite): optimal alignment • MSA: (nearly) optimal alignment • Clustal W/X: progressive alignment • PSI-BLAST: searches for matching sequences iteratively; the search sequence stays fixed as the invariant master for the alignment.
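MSA Strategy • Cost of a multiple alignment A: c(A) = Σ c(A_i,j), summed over all sequence pairs i < j, where A_i,j is the pairwise alignment of sequences i and j induced by A • Minimize the score! • HUGE matrix: size ~ (sequence length)^(# of seqs), enough to CRASH the computer • time ~ product of sequence lengths • 1000 x 10,000 is OK, but 200 x 200 x 200 x 200 is NOT • Alignment procedure: nearly optimal (only considers a subset of all alignments); weights sequences via distance; branch-and-bound algorithm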
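The cost on the MSA slide can be illustrated with a short sketch. It assumes c(A) is the usual sum of pairwise costs, and it uses toy cost values (0 for a match, 1 for a mismatch, 2 for a residue against a gap) chosen only for illustration; they are not the MSA program's parameters. The names `pair_cost`, `sum_of_pairs_cost` and `lattice_cells` are hypothetical.

```python
from itertools import combinations
from math import prod


def pair_cost(a, b, mismatch=1, gap=2):
    """Cost of one aligned column for one pair of sequences (toy values)."""
    if a == "-" and b == "-":
        return 0  # a gap-gap column disappears in the induced pairwise alignment
    if a == "-" or b == "-":
        return gap
    return 0 if a == b else mismatch


def sum_of_pairs_cost(alignment):
    """Sum the pairwise costs over every pair of rows in the alignment."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for s, t in combinations(alignment, 2):
        total += sum(pair_cost(a, b) for a, b in zip(s, t))
    return total


def lattice_cells(lengths):
    """Number of cells in the full dynamic-programming lattice."""
    return prod(lengths)


if __name__ == "__main__":
    print(sum_of_pairs_cost(["ACGT", "A-GT", "ACGA"]))
    print(lattice_cells([1000, 10000]))          # two long sequences
    print(lattice_cells([200, 200, 200, 200]))   # four short sequences
```

The two lattice sizes show why the slide calls the two-sequence case acceptable and the four-sequence case not: the second lattice is 160 times larger even though each sequence is much shorter.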
Running MSA • Download and run it locally (UNIX): • http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/genetic_analysis.html • On the internet: • http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html • Rerun on segments AFTER Clustal...
Clustal Strategy • Rapid pairwise alignments, each-to-each • Calculate distance matrix • Create guide tree (neighbor joining) • Align: closest pairs first; add pairs or align sub-alignments; adjust similarity matrix as alignment proceeds • Add sequences: introduce gaps (gaps at loops, not inside known 2° structures); dynamic gap weighting
Clustal Strategy • Pairwise alignments → Guide tree → Align
Clustal W(X) Strategy 1. Pairwise alignments • The pairwise alignment number here is a dissimilarity measure.
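A pairwise dissimilarity of this kind can be sketched as follows. This is a simplified stand-in, not Clustal's own scoring: it assumes each pair of sequences has already been aligned to equal length with `-` for gaps, and it defines dissimilarity as the fraction of compared positions that differ, skipping any position where either sequence has a gap. The names `dissimilarity` and `distance_matrix` are hypothetical.

```python
from itertools import combinations


def dissimilarity(a, b):
    """Fraction of differing positions among columns without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(seqs):
    """Map each unordered pair of names to its dissimilarity."""
    return {
        frozenset((p, q)): dissimilarity(seqs[p], seqs[q])
        for p, q in combinations(sorted(seqs), 2)
    }


if __name__ == "__main__":
    toy = {"a": "ACGT", "b": "ACGA", "c": "AC-T"}
    for pair, d in distance_matrix(toy).items():
        print(sorted(pair), round(d, 3))
```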
Clustal W(X) Strategy 4. Progressive alignment using guide tree
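The way a guide tree fixes the order of a progressive alignment can be sketched with a small clustering routine. Note the substitution: the slides name neighbor joining, while this sketch uses simple average-linkage clustering (UPGMA-style) because it is shorter to write; the resulting tree can differ from a neighbor-joining tree. The distances are made-up toy values, and `average_distance` and `guide_tree` are hypothetical names.

```python
from itertools import combinations


def average_distance(members_a, members_b, dist):
    """Mean of the original distances between members of two clusters."""
    total = sum(dist[frozenset((a, b))] for a in members_a for b in members_b)
    return total / (len(members_a) * len(members_b))


def guide_tree(names, dist):
    """Repeatedly join the two closest clusters; return a nested tuple."""
    clusters = [(name, [name]) for name in names]  # (subtree, member names)
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(
                clusters[ij[0]][1], clusters[ij[1]][1], dist
            ),
        )
        (tree_i, mem_i), (tree_j, mem_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), mem_i + mem_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    toy = {
        frozenset(("A", "B")): 0.1,
        frozenset(("C", "D")): 0.2,
        frozenset(("A", "C")): 0.6,
        frozenset(("A", "D")): 0.6,
        frozenset(("B", "C")): 0.6,
        frozenset(("B", "D")): 0.6,
    }
    print(guide_tree(["A", "B", "C", "D"], toy))
```

Reading the nested tuple from the inside out gives the alignment order: the closest pair is aligned first, and the two sub-alignments are aligned to each other last.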
Running Clustal W/X • Platforms: WWW, Win, Mac, UNIX • http://www2.ebi.ac.uk/clustalw/ • Input: multiple sequence file (PIR, FASTA,…); can FORCE alignments; can specify secondary structures • Considerations: fast, easy, widely used; divergent proteins OK (but trees may be misleading)
“The Right Proteins”: GAPDH
Rabbit     KAENGKLVING-KAITIFQERDPANIKWGDAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Chick      KAENGKLVING-HAITIFQERDPSNIKWADAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
           *********** :**********.:***.*******************************
What do we learn?
“The Right Proteins”: GAPDH
Rabbit     KAENGKLVING-KAITIFQERDPANIKWGDAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Chick      KAENGKLVING-HAITIFQERDPSNIKWADAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Human      KAEDGKLVIDG-KAITIFQERDPENIKWGDAGTAYVVESTGVFTTMEKAGAHLKGGAKRI 118
Tobacco    KVKDEKTLLFGEKSVRVFGIRNPEEIPWAEAGADFVVESTGVFTDKDKAAAHLKGGAKKV 110
Entamoeba  EAGENAIIVNGHKIV-VKAERDPAQIGWGALGVDYVVESTGVFTTIPKAEAHIKGGAKKV 105
Conservation: :. : :: * : : :*:* :* *. *. :********* ** **:*****::
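The `*` marks under these alignments flag columns in which every sequence carries the same residue. A minimal sketch of that single rule follows; the `:` and `.` marks depend on residue substitution groups that are not reproduced here, so such columns are left blank. `conservation_line` is a hypothetical name, and the two rows are the Rabbit and Chick fragments from the slide.

```python
def conservation_line(rows):
    """Return '*' for fully identical, gap-free columns and ' ' otherwise."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    marks = []
    for column in zip(*rows):
        residues = set(column)
        identical = len(residues) == 1 and "-" not in residues
        marks.append("*" if identical else " ")
    return "".join(marks)


if __name__ == "__main__":
    rabbit = "KAENGKLVING-KAITIFQERDPANIKWGDAGAEYVVESTGVFTTMEKAGAHLKGGAKRV"
    chick = "KAENGKLVING-HAITIFQERDPSNIKWADAGAEYVVESTGVFTTMEKAGAHLKGGAKRV"
    print(rabbit)
    print(chick)
    print(conservation_line([rabbit, chick]))
```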
Alignment Interpretation • DNA sequences: >50% “worth looking at” (eyeball test); ~75% needed for phylogeny • Polypeptide sequences: 80% similar = SAME tertiary structure; 30-80% = domains with similar structure; 15-30% = ????; <15% = short motifs
Uses of Alignment • Understanding or predicting mutant function • Finding motifs in DNA or polypeptides • Directing experiments--e.g. PCR primers • Phylogeny
“The Right Proteins”
Rabbit     KAENGKLVING-KAITIFQERDPANIKWGDAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Chick      KAENGKLVING-HAITIFQERDPSNIKWADAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Human      KAEDGKLVIDG-KAITIFQERDPENIKWGDAGTAYVVESTGVFTTMEKAGAHLKGGAKRI 118
Tobacco    KVKDEKTLLFGEKSVRVFGIRNPEEIPWAEAGADFVVESTGVFTDKDKAAAHLKGGAKKV 110
Entamoeba  EAGENAIIVNGHKIV-VKAERDPAQIGWGALGVDYVVESTGVFTTIPKAEAHIKGGAKKV 105
Conservation: :. : :: * : : :*:* :* *. *. :********* ** **:*****::
Annotations: PCR primer • Mutation tolerated
Viewing and interpreting alignments • Color residues by property • Conservation in the alignment • Known properties: substitution groups (STA, HY); physicochemical property (charge, hydrophobicity) • Programs for visualization: Jalview, AMAS, Alscript
Viewing alignments • JalView alignment viewer
How to build multiple alignments • Find sequences to align (db search). • Choose which regions of each protein to include; sequences should be of similar lengths. • Run multiple alignment program. • Inspect multiple alignment for problems; regions with many gaps have aligned poorly. • Remove disruptive sequences and re-run alignment. • Add back remaining sequences, avoiding disruption.
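The inspection step ("regions with many gaps have aligned poorly") can be partly automated. The sketch below computes the fraction of gaps in each column and lists the columns above a cutoff. The 0.5 cutoff is an arbitrary assumption for illustration, and `gap_fractions` and `gappy_columns` are hypothetical names.

```python
def gap_fractions(rows):
    """Fraction of '-' characters in each alignment column."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    n = len(rows)
    return [column.count("-") / n for column in zip(*rows)]


def gappy_columns(rows, threshold=0.5):
    """0-based indices of columns whose gap fraction exceeds the threshold."""
    return [i for i, f in enumerate(gap_fractions(rows)) if f > threshold]


if __name__ == "__main__":
    toy = ["AC--T", "A---T", "ACG-T"]
    print(gap_fractions(toy))
    print(gappy_columns(toy))
```

Runs of adjacent flagged columns are candidates for trimming, or a hint that one of the sequences is disruptive and worth removing before re-running the alignment.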
InterPro • Pfam 7.3 (3865 domains) • PRINTS 33.0 (1650 fingerprints) • PROSITE 17.5 (1565 and 252 preliminary profiles) • ProDom 2001.3 (1346 domains) • SMART 3.1 (509 domains) • TIGRFAMs 1.2 (814 domains) • SWISS-PROT 40.27 (113470 entries) • TrEMBL 21.12 (685610 entries)
InterPro: A database of protein families, domains and functional sites • PROSITE, home of regular expressions and profiles • Pfam, SMART, TIGRFAMs, PIRSF, and SUPERFAMILY, keepers of hidden Markov models (HMMs) • PRINTS, provider of fingerprints (groups of aligned, un-weighted motifs)
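PROSITE's "regular expressions" are written in their own pattern syntax, which maps closely onto ordinary regular expressions. The sketch below translates the basic elements (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends) into Python `re` syntax. It is a simplified converter that ignores rarer cases such as `>` inside square brackets; `prosite_to_regex` is a hypothetical name. The example pattern is the N-glycosylation site pattern, N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex = pattern.rstrip(".")          # PROSITE patterns end with a period
    regex = regex.replace("-", "")       # '-' only separates pattern elements
    regex = regex.replace("{", "[^").replace("}", "]")   # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")    # repeat counts
    regex = regex.replace("x", ".")      # any residue
    regex = regex.replace("<", "^").replace(">", "$")    # sequence ends
    return regex


if __name__ == "__main__":
    site = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(site)
    for seq in ("AANGSAA", "ANPSA"):
        hit = re.search(site, seq)
        print(seq, hit.start() if hit else None)
```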
NCBI CDD (Conserved Domain Database) • Domains from: • Pfam (Protein families): a database of protein families that currently contains more than 7973 entries. • SMART (a Simple Modular Architecture Research Tool): more than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable; domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. • COGs (Clusters of Orthologous Groups): proteins or groups of paralogs from at least 3 lineages that correspond to an ancient conserved domain.