Tutorial 5 Exploring Protein Sequences
Exploring Protein Sequences • Multiple alignment • ClustalW • Motif discovery • MEME • JASPAR
Multiple Sequence Alignment • Alignment of more than two sequences • DNA • Protein • Evolutionary relation • Homology • Phylogenetic tree • Detect motifs [Figure: phylogenetic tree over sequences A, B, C, D, and an example DNA sequence shown before and after gaps are inserted]
Multiple Sequence Alignment • Dynamic programming • Optimal alignment • Running time exponential in the number of sequences • Progressive • Efficient • Heuristic (not guaranteed to be optimal) [Figure: phylogenetic tree over sequences A, B, C, D, and an example DNA sequence shown before and after gaps are inserted]
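For two sequences, the dynamic-programming idea can be sketched as a Needleman-Wunsch global alignment. This is a minimal Python sketch with a linear gap penalty. The scores (match +1, mismatch -1, gap -2) are illustrative assumptions, not ClustalW settings. Aligning k sequences exactly with the same recurrence requires a k-dimensional table of about (n+1)^k cells, and each cell is filled from up to 2^k - 1 neighbors. The hypothetical helper `msa_dp_work` counts that work to show why exact multiple alignment is exponential in the number of sequences.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Optimal global alignment of two sequences with a linear gap penalty."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def msa_dp_work(n: int, k: int) -> int:
    """Table cells times predecessors for exact DP on k sequences of length n."""
    return (n + 1) ** k * (2 ** k - 1)


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
    for k in (2, 3, 4, 5):
        print(k, msa_dp_work(100, k))
```

For sequences of length 100, each additional sequence multiplies the count by a factor of roughly 200. Progressive methods avoid this cost.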
ClustalW “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J D Thompson et al
ClustalW • Progressive alignment • At each step, align two existing alignments or sequences • Gaps present in older alignments remain fixed [Figure: example DNA sequence before and after progressive alignment, with gaps inserted]
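A toy version of the progressive step, where two existing alignments are aligned column by column, might look like the sketch below. It is not ClustalW's algorithm. The guide tree is supplied by hand as nested pairs, whereas ClustalW builds it from the pairwise distances with neighbor-joining. The column score is a simple average of match, mismatch, and gap scores chosen only for illustration. Because columns of the input alignments are copied whole, gaps introduced at an earlier step are never removed later.

```python
from itertools import product

# Illustrative scores (assumed); ClustalW uses a substitution matrix and affine gaps.
MATCH, MISMATCH, GAP = 1.0, -1.0, -2.0


def pair_score(x: str, y: str) -> float:
    if x == "-" and y == "-":
        return 0.0
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def column_score(col_a: tuple, col_b: tuple) -> float:
    """Average score over all residue pairs drawn from two alignment columns."""
    total = sum(pair_score(x, y) for x, y in product(col_a, col_b))
    return total / (len(col_a) * len(col_b))


def align_alignments(A: list[str], B: list[str]) -> list[str]:
    """Align two alignments column by column; existing columns are copied whole."""
    cols_a, cols_b = list(zip(*A)), list(zip(*B))
    n, m = len(cols_a), len(cols_b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * GAP
    for j in range(1, m + 1):
        S[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),
                          S[i - 1][j] + GAP, S[i][j - 1] + GAP)
    # Traceback: only all-gap columns can be added; old gaps stay where they are.
    gap_a, gap_b = ("-",) * len(A), ("-",) * len(B)
    out, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]):
            out.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + GAP:
            out.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            out.append(gap_a + cols_b[j - 1])
            j -= 1
    out.reverse()
    return ["".join(row) for row in zip(*out)]


def progressive_align(tree, seqs: dict[str, str]) -> list[str]:
    """Follow a hand-made guide tree given as nested pairs of sequence names."""
    if isinstance(tree, str):
        return [seqs[tree]]
    left, right = tree
    return align_alignments(progressive_align(left, seqs),
                            progressive_align(right, seqs))


if __name__ == "__main__":
    seqs = {"s1": "GATTACA", "s2": "GATTACA", "s3": "GATACA"}
    for row in progressive_align((("s1", "s2"), "s3"), seqs):
        print(row)
```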
ClustalW - Input • Scoring matrix • Gap scoring • Input sequences
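The sketch below shows how a scoring matrix and gap scoring combine. It scores an already-aligned pair of sequences with affine gap costs, meaning a gap-opening cost plus a smaller cost for each extension, which is the style of gap penalty ClustalW uses. The substitution function `toy_subst` and the penalty values are hypothetical placeholders, not ClustalW defaults. A real run would use a matrix such as BLOSUM or Gonnet.

```python
def toy_subst(x: str, y: str) -> float:
    """Hypothetical substitution score: +5 for identity, -4 otherwise."""
    return 5.0 if x == y else -4.0


def score_alignment(a: str, b: str, subst=toy_subst,
                    gap_open: float = 10.0, gap_extend: float = 1.0) -> float:
    """Score a pairwise alignment: substitution scores minus affine gap costs.

    Convention assumed here: the first position of a gap run costs gap_open,
    and each further position in the same run costs gap_extend.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0.0
    prev = None  # which sequence carried the gap in the previous column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # columns containing only gaps are ignored
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            score -= gap_extend if prev == side else gap_open
            prev = side
        else:
            score += subst(x, y)
            prev = None
    return score


if __name__ == "__main__":
    print(score_alignment("AC--GT", "ACTTGT"))  # one gap run of length 2
    print(score_alignment("A-C-T", "AGCGT"))    # two separate gaps
```

With affine costs, one long gap is cheaper than several short ones of the same total length. This is why ClustalW exposes separate opening and extension penalties.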
ClustalW - Output • Input sequences • Pairwise alignment scores • Building alignment • Final score
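The pairwise alignment scores are the raw material for the guide tree. The minimal sketch below rests on two illustrative assumptions. First, each pairwise score is a percent identity over the columns where neither sequence has a gap. Second, distances are taken as 1 - identity. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def distances(pairwise: dict[tuple[str, str], tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Map each pair of sequence names to a distance of 1 - identity (assumed convention)."""
    return {pair: 1.0 - percent_identity(*aln) / 100.0 for pair, aln in pairwise.items()}


if __name__ == "__main__":
    pairwise = {("seq1", "seq2"): ("ACGT", "ACGA"),
                ("seq1", "seq3"): ("GA-TACA", "GATTACC")}
    for pair, d in distances(pairwise).items():
        print(pair, round(d, 3))
```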
ClustalW - Output • Sequence names • Sequence positions • Match strength in decreasing order: * (identical residues) : (strongly similar residues) . (weakly similar residues)
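The conservation line can be reproduced with a short program. The sketch below assumes the strong and weak residue groups listed in the Clustal documentation: a column gets ':' or '.' when all of its residues fall inside one such group. As a simplification, it leaves any column that contains a gap blank. Applied to the six-sequence YDEEGGDAEE block from this tutorial's motif slide, it produces the line `* :**   *:`.

```python
# Residue groups as listed in the Clustal documentation (assumed here).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return the Clustal-style conservation symbol for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # simplification: any gap means no symbol
    if len(residues) == 1:
        return "*"  # a single, fully conserved residue
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(alignment: list[str]) -> str:
    """Build the symbol line printed under a Clustal alignment block."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return "".join(column_symbol("".join(col)) for col in zip(*alignment))


if __name__ == "__main__":
    block = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
             "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
    for seq in block:
        print(seq)
    print(conservation_line(block))
```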
Can we find motifs using multiple sequence alignment?
(alignment columns 1-10; '..' marks flanking sequence)
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
  * :**   *:
Motif: a widespread pattern with biological significance
Can we find motifs using multiple sequence alignment? YES! NO
MEME – Multiple EM for Motif Elicitation • http://meme.sdsc.edu/ • Motif discovery from unaligned sequences • Genomic or protein sequences • Flexible model of motif presence (a motif can be absent from some sequences or appear several times in one sequence)
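The expectation-maximization (EM) idea behind MEME can be illustrated with a toy model. The sketch below makes several assumptions:

- one motif occurrence per sequence (MEME's OOPS model; the more flexible models that allow zero or several occurrences are not shown);
- DNA input;
- a fixed background estimated from the sequences themselves;
- seeding by trying each W-mer of the first sequence as a starting point.

The E-step computes, for each sequence, the posterior probability that its site starts at each position. The M-step re-estimates the position probability matrix from those weighted letters. All function names are hypothetical, and this is not MEME's implementation.

```python
import math

ALPHABET = "ACGT"


def background(seqs: list[str], pseudo: float = 1.0) -> dict[str, float]:
    """Letter frequencies over all sequences (fixed background model)."""
    counts = {c: pseudo for c in ALPHABET}
    for s in seqs:
        for c in s:
            counts[c] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def site_ratios(seq, pwm, bg):
    """Motif-vs-background likelihood ratio for a site starting at each position."""
    w = len(pwm)
    return [math.prod(pwm[k][seq[j + k]] / bg[seq[j + k]] for k in range(w))
            for j in range(len(seq) - w + 1)]


def e_step(seqs, pwm, bg):
    """Posterior probability that each sequence's single site starts at each position."""
    Z = []
    for seq in seqs:
        r = site_ratios(seq, pwm, bg)
        total = sum(r)
        Z.append([x / total for x in r])
    return Z


def m_step(seqs, Z, w, pseudo=0.5):
    """Re-estimate the PWM from expected letter counts in each motif column."""
    counts = [{c: pseudo for c in ALPHABET} for _ in range(w)]
    for seq, z in zip(seqs, Z):
        for j, zj in enumerate(z):
            for k in range(w):
                counts[k][seq[j + k]] += zj
    return [{c: col[c] / sum(col.values()) for c in ALPHABET} for col in counts]


def em_oops(seqs, seed, iterations=20):
    bg = background(seqs)
    # Start near the seed W-mer: 0.5 on the seed letter, the rest split evenly.
    pwm = [{c: 0.5 if c == s else 0.5 / 3 for c in ALPHABET} for s in seed]
    for _ in range(iterations):
        pwm = m_step(seqs, e_step(seqs, pwm, bg), len(seed))
    # Log-likelihood, up to terms that do not depend on the PWM.
    score = sum(math.log(sum(site_ratios(s, pwm, bg))) for s in seqs)
    return pwm, e_step(seqs, pwm, bg), score


def find_motif(seqs, w):
    """Try every W-mer of the first sequence as a seed; keep the best-scoring model."""
    pwm, Z, _ = max((em_oops(seqs, seqs[0][j:j + w]) for j in range(len(seqs[0]) - w + 1)),
                    key=lambda result: result[2])
    consensus = "".join(max(col, key=col.get) for col in pwm)
    starts = [max(range(len(z)), key=z.__getitem__) for z in Z]
    return consensus, starts


if __name__ == "__main__":
    seqs = ["GCGCTATAATGCGC", "CGTATAATCGGCGC", "GGCCGCGCTATAAT", "TATAATGCGCCGCG"]
    print(find_motif(seqs, 6))  # expected: ('TATAAT', [4, 2, 8, 0])
```

On these four short sequences with a planted TATAAT, the sketch recovers both the consensus and the planted start positions, even though the sequences were never aligned.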
MEME - Input • Email address • Multiple input sequences • Range of motif lengths • How many motifs? • How many times in each sequence? • How many sites?
MEME - Output • Reported like BLAST output • Motif length • Number of times the motif occurs
MEME - Output • Each matrix entry is the probability × 10 ('a' = 10, ':' = 0)
MEME - Output • Low uncertainty (entropy) = high information content
MEME - Output • Multilevel consensus sequence
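The three matrix views above can be sketched for a single motif column, given as a letter-to-probability dict. The following are illustrative assumptions, not documented MEME settings:

- the rounding rule for the probability × 10 characters;
- the uniform background used for information content;
- the 0.2 cut-off for the multilevel consensus.

Information content here is log2 of the alphabet size minus the column's Shannon entropy. A column with one certain letter therefore scores the maximum (2 bits for DNA), and a uniform column scores 0.

```python
import math

ALPHABET = "ACGT"  # DNA assumed; use the 20 amino acids for protein motifs


def simplified_entry(p: float) -> str:
    """One character of the simplified matrix: probability x 10 ('a' = 10, ':' = 0)."""
    v = math.floor(p * 10 + 0.5)  # round half up (rounding rule assumed)
    return "a" if v >= 10 else ":" if v <= 0 else str(v)


def information_content(column: dict[str, float]) -> float:
    """Bits of information in one motif column, relative to a uniform background."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(ALPHABET)) - entropy


def multilevel_consensus(column: dict[str, float], threshold: float = 0.2) -> str:
    """Letters with probability >= threshold, most probable first (threshold assumed)."""
    picked = sorted((c for c, p in column.items() if p >= threshold),
                    key=lambda c: -column[c])
    return "".join(picked)


if __name__ == "__main__":
    col = {"A": 0.5, "C": 0.3, "G": 0.15, "T": 0.05}
    print("".join(simplified_entry(col[c]) for c in ALPHABET))
    print(round(information_content(col), 3), multilevel_consensus(col))
```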
MEME - Output • Position in sequence • Strength of match • Sequence names • Reverse complement (genomic input only) • Motif within sequence
MEME - Output • Motif instances ('-' = other strand) • Sequence lengths • Overall strength of motif matches
MAST • Searches sequence databases for one or more motifs: • Like BLAST, but with motifs as the query • Similar to iterations of PSI-BLAST • Profile defines strength of match • Multiple motif matches per sequence • Combined E-value for all motifs • MEME uses MAST to summarize results: • Each MEME result is accompanied by the MAST result of searching for the discovered motifs in the given sequences.
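A combined score over several motifs needs a way to merge the per-motif p-values. One standard approach is shown below as an illustration of what a combined E-value involves; it is an assumption, not a transcription of MAST's code. Take the product q of n independent p-values. Under the null hypothesis, P(product <= q) = q * sum over i = 0..n-1 of (-ln q)^i / i!. An E-value then scales the resulting p-value by the number of sequences searched. The function names are hypothetical.

```python
import math


def combined_pvalue(pvalues: list[float]) -> float:
    """P-value of the product of n independent p-values (uniform under the null)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    q = math.prod(pvalues)
    if q <= 0.0:
        return 0.0
    x = -math.log(q)
    return q * sum(x ** i / math.factorial(i) for i in range(len(pvalues)))


def evalue(pvalue: float, database_size: int) -> float:
    """Expected number of database sequences matching at least this well by chance."""
    return pvalue * database_size


if __name__ == "__main__":
    p = combined_pvalue([0.01, 0.02, 0.5])  # hypothetical per-motif p-values
    print(p, evalue(p, 10000))
```

The combined value is always larger than the raw product q, which corrects for the fact that multiplying several p-values makes small products likely even by chance.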
JASPAR • Profiles • Transcription factor binding sites • Multicellular eukaryotes • Derived from published collections of experiments • Open data access
JASPAR • Profiles are modeled as matrices • They can be converted into PSSMs (position-specific scoring matrices) for scanning genomic sequences
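Converting a profile to a PSSM and scanning a sequence can be sketched as follows. The sketch assumes that:

- the profile is a count matrix (base -> counts per position), like JASPAR's frequency matrices;
- the pseudocount is 0.25 per base (a hypothetical choice);
- the background is uniform (also hypothetical).

Scores are log2-odds values summed over each window. Both strands are scanned. Hits on the reverse strand are reported by their start position on the forward strand and marked '-', following the convention in MEME's site listing.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def counts_to_pssm(counts, pseudocount=0.25, background=None):
    """Turn a count matrix {base: [count at each position]} into log2-odds scores."""
    bg = background or {b: 0.25 for b in BASES}  # uniform background assumed
    width = len(counts["A"])
    pssm = []
    for k in range(width):
        total = sum(counts[b][k] for b in BASES) + len(BASES) * pseudocount
        pssm.append({b: math.log2((counts[b][k] + pseudocount) / total / bg[b])
                     for b in BASES})
    return pssm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, pssm, threshold):
    """Return (forward-strand start, strand, score) for windows scoring >= threshold."""
    w = len(pssm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for j in range(len(s) - w + 1):
            score = sum(pssm[k][s[j + k]] for k in range(w))
            if score >= threshold:
                # A reverse-strand window at j covers forward positions from len(seq)-w-j.
                start = j if strand == "+" else len(seq) - w - j
                hits.append((start, strand, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Hypothetical count matrix for a 4-bp "TATA" site (not a real JASPAR entry).
    counts = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0],
              "G": [0, 0, 0, 0], "T": [10, 0, 10, 0]}
    print(scan("GGTATAGG", counts_to_pssm(counts), threshold=5.0))
```

Because "TATA" is its own reverse complement, the example reports the same site on both strands. A non-palindromic profile would hit only one strand.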
Search profile http://jaspar.cgb.ki.se/