The Basic Local Alignment Search Tool (BLAST) Rapid database search tool (1990) Idea: (1) Search for high-scoring segment pairs
The Basic Local Alignment Search Tool (BLAST) A Y W T Y I V A L T - Q V R Q Y E A T S I L C I V M I Y S R A - Q Y R Y W R Y Most local alignments contain highly conserved sections without gaps
The Basic Local Alignment Search Tool (BLAST) A Y W T Y I V A L T - Q V R Q Y E A T S I L C I V M I Y S R A - Q Y R Y W R Y -> search for high-scoring segment pairs (HSPs), i.e. gap-free local alignments
The Basic Local Alignment Search Tool (BLAST) A Y W T Y I V A L T - Q V R Q Y E A T S I L C I V M I Y S R A - Q Y R Y W R Y Advantages: (a) speed (b) a statistical theory of HSPs exists.
The Basic Local Alignment Search Tool (BLAST) Rapid database search tool (1990) Idea: (1) Search for high-scoring segment pairs (2) Use word pairs as seeds
Pair-wise sequence alignment T W L M H C A Q Y I C I M X H X C X T H Y (1) Search for word pairs of length 3 with score > T and use them as seeds.
Pair-wise sequence alignment A naïve search for word pairs would have a complexity of O(l1 * l2). Solution: preprocess the query sequence: • Compile a list of all words that have a score > T when aligned to a word in the query. Complexity: O(l1) • Organize the words in an efficient data structure (tree) for fast look-up
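The preprocessing step above can be sketched as follows. This is a minimal illustration, not BLAST's actual implementation: `toy_score` is a hypothetical stand-in for a real substitution matrix, the threshold `T=8` is chosen only to suit that toy scoring, and a Python dict replaces the tree mentioned on the slide. All function names are assumptions made for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_score(a, b):
    """Hypothetical scoring: +5 for a match, -1 for a mismatch."""
    return 5 if a == b else -1

def neighborhood_words(query, w=3, T=8, score=toy_score, alphabet=AMINO_ACIDS):
    """Map every word scoring > T against some query word to its query positions."""
    index = {}
    for qpos in range(len(query) - w + 1):
        qword = query[qpos:qpos + w]
        # Enumerate all |alphabet|^w candidate words; constant work per query position.
        for cand in product(alphabet, repeat=w):
            if sum(score(a, b) for a, b in zip(qword, cand)) > T:
                index.setdefault("".join(cand), []).append(qpos)
    return index

def find_seeds(index, subject, w=3):
    """Scan the database sequence once and look each word up in the index."""
    seeds = []
    for spos in range(len(subject) - w + 1):
        for qpos in index.get(subject[spos:spos + w], []):
            seeds.append((qpos, spos))
    return seeds

index = neighborhood_words("TWLMHC")
seeds = find_seeds(index, "AAMHCAA")
print(seeds)
```

With this toy scoring, a word passes the threshold only if it differs from a query word in at most one position, so the index stays small and each database word costs a single look-up.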
The Basic Local Alignment Search Tool (BLAST) Rapid database search tool (1990) Idea: (1) Search for high-scoring segment pairs (2) Use word pairs as seeds (3) Extend seed alignments until the score drops below a threshold value
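Pair-wise sequence alignment T W L M H C A Q Y I C I M X H X C X T H Y Extend the seeds in both directions until the score drops by X below the best score found so far.
Pair-wise sequence alignment T W L M H C A Q Y I C I X M X H X C X T X H X Y Extend the seeds in both directions until the score drops by X below the best score found so far.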
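A gap-free seed extension with a drop-off of X might look like the sketch below. The scoring function (+5 match, -4 mismatch), the example sequences and the function names are assumptions made for illustration; the extension stops in each direction once the running score falls more than X below the best score seen in that direction.

```python
def score(a, b):
    """Hypothetical scoring: +5 for a match, -4 for a mismatch."""
    return 5 if a == b else -4

def _extend_one_way(query, subject, q, s, step, X):
    """Walk from (q, s) in direction `step`; return (best gain, residues kept)."""
    gain = best = 0
    taken = kept = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        gain += score(query[q], subject[s])
        taken += 1
        if gain > best:
            best, kept = gain, taken      # new best end point
        elif best - gain > X:
            break                         # dropped more than X below the best
        q += step
        s += step
    return best, kept

def extend_seed(query, subject, qpos, spos, w, X):
    """Extend a word hit of length w; return (score, query start, subject start, length)."""
    seed = sum(score(query[qpos + i], subject[spos + i]) for i in range(w))
    right, r = _extend_one_way(query, subject, qpos + w, spos + w, +1, X)
    left, l = _extend_one_way(query, subject, qpos - 1, spos - 1, -1, X)
    return seed + left + right, qpos - l, spos - l, w + l + r

hsp = extend_seed("AAMHCQYWW", "GGMHCQYKK", qpos=2, spos=2, w=3, X=5)
print(hsp)  # (25, 2, 2, 5): the segment MHCQY
```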
Pair-wise sequence alignment The algorithm is a heuristic: it is not guaranteed to find the best segment pair, but it works well in practice!
The Basic Local Alignment Search Tool (BLAST) New BLAST version (1997) • Two-hit strategy
Pair-wise sequence alignment W L M H C A Q Y A R V I M X H X C X T H W A X R X V X Search for two word pairs on the same diagonal; use a lower threshold T.
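The two-hit idea can be sketched by grouping word hits by diagonal (subject position minus query position) and keeping only those diagonals that carry two non-overlapping hits close to each other. The function name, the tuple layout and the window size `A=40` are illustrative assumptions, and this simplified version only compares neighbouring hits on each diagonal.

```python
from collections import defaultdict

def two_hit_seeds(hits, w=3, A=40):
    """Return (diagonal, first qpos, second qpos) for pairs of hits on one diagonal.

    hits: iterable of (qpos, spos) word hits of length w.
    A pair qualifies if the hits do not overlap and lie within A positions.
    """
    by_diag = defaultdict(list)
    for qpos, spos in hits:
        by_diag[spos - qpos].append(qpos)

    pairs = []
    for diag, qpositions in by_diag.items():
        qpositions.sort()
        for first, second in zip(qpositions, qpositions[1:]):
            distance = second - first
            if w <= distance <= A:        # non-overlapping and close enough
                pairs.append((diag, first, second))
    return pairs

pairs = two_hit_seeds([(0, 10), (5, 15), (20, 3), (6, 16)])
print(pairs)  # [(10, 0, 5)]
```

Only the qualifying pairs would then be passed on to the extension step, which is why a lower word threshold T becomes affordable.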
The Basic Local Alignment Search Tool (BLAST) New BLAST version (1997) • Two-hit strategy • Gapped BLAST • Position-Specific Iterative BLAST (PSI-BLAST)
Multiple sequence alignment
1aboA   1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB   1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht    1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA   1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie    1 .drvrkksga.........awqGQIVGWYctnlt.............peG

1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......
1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA  27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie   28 YAVESeahpgsvQIYPVAALERIN......
Multiple sequence alignment First question: how to score multiple alignments? Possible scoring scheme: Sum-of-pairs score
Multiple sequence alignment
Multiple alignment implies pairwise alignments:
1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......
1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA  27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie   28 YAVESeahpgsvQIYPVAALERIN......

Multiple sequence alignment
Multiple alignment implies pairwise alignments:
1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......

Multiple sequence alignment
Multiple alignment implies pairwise alignments:
1aboA  36 WCEAQtkngqGWVPSNYITPVN
1ycsB  39 WWWARlndkeGYVPRNLLGLYP

Multiple sequence alignment
Multiple alignment implies pairwise alignments:
1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp

Multiple sequence alignment
Multiple alignment implies pairwise alignments: use the sum of the scores of these pairwise alignments.
1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......
1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA  27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie   28 YAVESeahpgsvQIYPVAALERIN......
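A sum-of-pairs score can be computed by projecting the multiple alignment onto every pair of rows and adding up the pairwise scores. The sketch below assumes a toy scoring (+1 match, -1 mismatch, -2 for a residue aligned to a gap) and assumes that columns where both rows have a gap score zero, since such columns vanish from the implied pairwise alignment. These values and names are illustrative, not taken from the slides.

```python
from itertools import combinations

def toy_score(a, b):
    """Hypothetical substitution score: +1 match, -1 mismatch."""
    return 1 if a == b else -1

def sum_of_pairs(alignment, score=toy_score, gap=-2):
    """Sum the scores of all pairwise alignments implied by a multiple alignment."""
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        for x, y in zip(row_a, row_b):
            if x == "-" and y == "-":
                continue                  # column disappears in this pair
            if x == "-" or y == "-":
                total += gap
            else:
                total += score(x, y)
    return total

print(sum_of_pairs(["AC-", "ACG", "A-G"]))  # -3
```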
Multiple sequence alignment Goal: find the multiple alignment with the maximum score!
Multiple sequence alignment • Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment • Multidimensional search space instead of a two-dimensional matrix!
Multiple sequence alignment Complexity: For 3 sequences of lengths l1, l2, l3: O(l1 * l2 * l3). For n sequences (average length l): O(l^n). Exponential complexity!
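To get a feeling for these numbers, the short sketch below counts the cells of the n-dimensional dynamic-programming lattice (one extra position per sequence for the empty prefix) and the number of predecessor cells each cell has to consider. The function names are made up for this example.

```python
from math import prod

def dp_cells(lengths):
    """Number of cells in the DP lattice for sequences of the given lengths."""
    return prod(l + 1 for l in lengths)

def predecessors_per_cell(n):
    """Each cell has one predecessor per non-empty subset of the n sequences."""
    return 2 ** n - 1

for n in (2, 3, 5):
    print(n, dp_cells([100] * n), predecessors_per_cell(n))
```

For sequences of length 100 the lattice grows from about ten thousand cells for two sequences to about ten billion for five, which is why exact optimization quickly becomes impractical.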
Multiple sequence alignment • Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment • Optimal solution not feasible -> heuristics necessary
Multiple sequence alignment (A) Carrillo and Lipman (MSA) Find a sub-space of the dynamic-programming matrix that must contain the optimal path
Multiple sequence alignment (B) Stoye, Dress (DCA) • Divide search space into small sub-spaces • Calculate optimal alignment for sub-spaces • Concatenate sub-alignments
Multiple sequence alignment Most popular way of constructing multiple alignments: progressive alignment. Carry out a series of pair-wise alignments.
Multiple sequence alignment
WCEAQTKNGQGWVPSNYITPVN
WWRLNDKEGYVPRNLLGLYP
AVVIQDNSDIKVVPKAKIIRD
YAVESEAHPGSFQPVAALERIN
WLNYNETTGERGDFPGTYVEYIGRKKISP
Align most similar sequences

Multiple sequence alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASFQPVAALERIN
WLNYNEERGDFPGTYVEYIGRKKISP

Multiple sequence alignment
WCEAQTKNGQGWVPSNYITPVN
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Align sequence to alignment

Multiple sequence alignment
WCEAQTKNGQGWVPSNYITPVN-
WW--RLNDKEGYVPRNLLGLYP-
AVVIQDNSDIKVVP--KAKIIRD
YAVESEASVQ--PVAALERIN------
WLN-YNEERGDFPGTYVEYIGRKKISP
Align alignment to alignment

Multiple sequence alignment
WCEAQTKNGQGWVPSNYITPVN--------
WW--RLNDKEGYVPRNLLGLYP--------
AVVIQDNSDIKVVP--KAKIIRD-------
YAVESEA---SVQ--PVAALERIN------
WLN-YNE---ERGDFPGTYVEYIGRKKISP
Rule: “once a gap - always a gap”
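The rule can be made concrete with a small helper: when an existing alignment has to be widened during a later step, new gaps are inserted as whole columns into every row, and gaps that are already present are never removed. The function below is a hypothetical sketch of that bookkeeping only; it does not decide where the new columns should go.

```python
def insert_gap_columns(alignment, columns):
    """Insert an all-gap column before each given column index.

    Indices refer to the alignment as it is passed in; existing gaps are kept.
    """
    rows = [list(row) for row in alignment]
    # Insert from right to left so earlier insertions do not shift later indices.
    for col in sorted(columns, reverse=True):
        for row in rows:
            row.insert(col, "-")
    return ["".join(row) for row in rows]

print(insert_gap_columns(["WW--RL", "WCEAQT"], [2]))
# ['WW---RL', 'WC-EAQT']
```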
Multiple sequence alignment The order of the pair-wise profile alignments is determined by a phylogenetic tree (guide tree) built from pair-wise similarity values.
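One simple way to derive such an order is to cluster the sequences greedily by similarity and record each merge. The sketch below uses average-linkage clustering on a given similarity table; this is an assumption chosen for brevity, and actual alignment programs may build their guide trees differently. The sequence names and similarity values are invented for the example.

```python
from itertools import combinations

def guide_tree_order(names, sim):
    """Return the sequence of merges (pairs of clusters), most similar first.

    sim maps frozenset({a, b}) to a pairwise similarity value.
    """
    clusters = [(name,) for name in names]

    def average_similarity(c1, c2):
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    order = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_similarity(clusters[p[0]], clusters[p[1]]),
        )
        order.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return order

similarity = {
    frozenset("AB"): 0.9, frozenset("CD"): 0.8, frozenset("AC"): 0.3,
    frozenset("AD"): 0.2, frozenset("BC"): 0.4, frozenset("BD"): 0.1,
}
order = guide_tree_order(["A", "B", "C", "D"], similarity)
print(order)
```

Each recorded merge corresponds to one alignment step: sequence to sequence at the leaves, then sequence to alignment or alignment to alignment further up the tree.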