Heuristic PSA
• “Words” to describe dot-matrix analysis
• Approaches
  • FASTA
  • BLAST
• Searching databases for sequence similarities
  • PSA
  • Alternative strategies
    • Iterative searching
    • Reverse searching
Lecture 7 CS566
“Words” for Dot-matrix analysis
• Useful ideas from DM alignment
  • Diagonal represents local match
  • Broken diagonal = intervening mismatch
  • Displaced diagonals = matches with gaps
• Advantage of using word-based alignment
  • Faster algorithm
  • Word-list comparison faster than sequence comparison
  • Hashes used for rapid comparison of words
• “Devil is in the details”
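As a rough illustration of the diagonal idea, the Python sketch below marks every matching cell of a dot matrix and reports the longest unbroken run on each (i-j) diagonal. The function names `dot_matrix_cells` and `longest_diagonal_runs` are made up for this example, and plain residue identity is assumed as the match rule.

```python
from collections import defaultdict


def dot_matrix_cells(a, b):
    """Return every (i, j) with a[i] == b[j]: the dots of a dot matrix."""
    return [(i, j) for i, ca in enumerate(a) for j, cb in enumerate(b) if ca == cb]


def longest_diagonal_runs(a, b):
    """Map each diagonal (i - j) to the length of its longest unbroken run of dots."""
    dots = set(dot_matrix_cells(a, b))
    best = defaultdict(int)
    for i, j in dots:
        if (i - 1, j - 1) in dots:
            continue  # this dot continues a run that started earlier
        length = 0
        while (i + length, j + length) in dots:
            length += 1
        best[i - j] = max(best[i - j], length)
    return dict(best)


runs = longest_diagonal_runs("GATTACA", "TTAC")
print(runs[2])  # 4: "TTAC" matches positions 2..5 of the first sequence
```

A local match shows up as one long run on a single diagonal, while scattered single dots on other diagonals are background noise.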
FASTA (Fast-All)
• Motivation: needed a rapid PSA method to search databases for matches to a query sequence (1:n comparisons)
• ktup (k-tuple or word) based alignment
  • Create hash tables for sequences
  • Find matching ktups (“hot-spots”/short diagonals) in pair of sequences
  • Default ktup size = 2 for protein (6 for DNA)
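The hashing step can be sketched roughly as follows. This is a minimal Python sketch, not FASTA's actual implementation: the helper names `build_ktup_table` and `find_hot_spots` are hypothetical, and only exact word matches are considered.

```python
from collections import defaultdict


def build_ktup_table(seq, ktup):
    """Hash table: each word of length ktup -> list of start positions in seq."""
    table = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        table[seq[i:i + ktup]].append(i)
    return table


def find_hot_spots(query, subject, ktup=2):
    """Return (i, j) pairs where the ktup at query[i:] equals the ktup at subject[j:]."""
    table = build_ktup_table(query, ktup)
    hot_spots = []
    for j in range(len(subject) - ktup + 1):
        # One dictionary lookup per subject word replaces a scan of the query.
        for i in table.get(subject[j:j + ktup], ()):
            hot_spots.append((i, j))
    return hot_spots


print(find_hot_spots("ACDEFG", "XXDEFX"))  # [(2, 2), (3, 3)]
```

The query is hashed once, so each database sequence costs a single pass of word lookups.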
FASTA
• Find 10 best “diagonal-runs”
  • Group hot-spots by the (i-j) diagonal they lie in
    • Main diagonal numbered 0
    • Positive diagonals lie above main diagonal, negative lie below
  • Diagonal-run = set of consecutive (not necessarily contiguous) hot-spots, penalized by size of intervening mismatch
  • Save top 10 diagonal runs
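One way to picture the diagonal-run step is the sketch below. It assumes hot-spots arrive as (i, j) word start positions, and it uses made-up scoring parameters (`hit_score`, `gap_penalty`) that are not FASTA's real values. The function name `diagonal_runs` is likewise hypothetical.

```python
import heapq
from collections import defaultdict


def diagonal_runs(hot_spots, ktup=2, hit_score=5, gap_penalty=1, top=10):
    """Return up to `top` tuples (score, diagonal, (start, end)), best first.

    start/end are query coordinates of the best-scoring run on that diagonal.
    """
    by_diag = defaultdict(list)
    for i, j in hot_spots:
        by_diag[i - j].append(i)  # group hot-spots by their (i - j) diagonal

    runs = []
    for diag, starts in by_diag.items():
        starts.sort()
        best_score, best_span = 0, None
        run_score, run_start, prev_end = 0, None, None
        for i in starts:
            # Residues between the previous hot-spot and this one are penalized.
            gap = 0 if prev_end is None else max(0, i - prev_end)
            extended = run_score - gap_penalty * gap + hit_score
            if run_start is None or extended < hit_score:
                run_score, run_start = hit_score, i  # start a fresh run here
            else:
                run_score = extended
            prev_end = i + ktup
            if run_score > best_score:
                best_score, best_span = run_score, (run_start, prev_end)
        runs.append((best_score, diag, best_span))
    return heapq.nlargest(top, runs)


print(diagonal_runs([(2, 2), (3, 3), (6, 6), (0, 5)]))
# [(14, 0, (2, 8)), (5, -5, (0, 2))]
```

In the example, the three hot-spots on diagonal 0 form one run despite a one-residue gap, which costs a single penalty point.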
FASTA
• Find init1
  • Init1 = best contiguous subsequence from top 10 diagonal runs, based on AAS (default BLOSUM50)
• Define local search space around init1
  • Include (32 / ktup) +/- diagonals in search space
  • For ktup = 2, 16 diagonals around init1
• Perform Smith-Waterman PSA in reduced space
• Report resulting alignment as opt
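The reduced search space can be sketched as a banded Smith-Waterman. The sketch below makes several assumptions that are not from the slide: the band is read as 32 // ktup diagonals on either side of init1's diagonal, scoring uses toy match/mismatch values instead of BLOSUM50, gaps are linear, and only the best score is returned (no traceback). The name `banded_smith_waterman` is hypothetical.

```python
def banded_smith_waterman(a, b, center_diag, ktup=2, match=2, mismatch=-1, gap=-2):
    """Best local alignment score using only cells near diagonal `center_diag`.

    Diagonals are numbered i - j. Cells outside the band are treated as score 0.
    """
    half_width = 32 // ktup  # assumed reading of "(32 / ktup) +/- diagonals"
    lo, hi = center_diag - half_width, center_diag + half_width
    prev = {}  # scores of the previous row, keyed by column
    best = 0
    for i in range(1, len(a) + 1):
        cur = {}
        # lo <= i - j <= hi  is the same as  i - hi <= j <= i - lo
        for j in range(max(1, i - hi), min(len(b), i - lo) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                0,                        # start a new local alignment
                prev.get(j - 1, 0) + s,   # extend along the diagonal
                prev.get(j, 0) + gap,     # gap in b
                cur.get(j - 1, 0) + gap,  # gap in a
            )
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best


print(banded_smith_waterman("AAAATTTT", "AAAACTTTT", center_diag=0))  # 14
```

Only about 2 × (32 / ktup) + 1 cells per row are filled, so the cost grows with the band width rather than with the full product of the sequence lengths.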
BLAST (Basic Local Alignment Search Tool)
• Built upon ideas derived from FASTA, with incorporation of new elements
• For every word in query, generate set of similar words
  • Use AAS for similarity score between query word and all possible words of same size
  • Include all words exceeding cut-off in set
  • Example: For word DED and threshold 0, word set includes DED, DDD, EEE, EDE, etc.
• For every query word, generate hot-spots based on set of similar words
• Then merge contiguous words along same diagonal (à la FASTA) to form High-scoring Segment Pairs (HSPs)
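The word-set step can be illustrated with the DED example. The sketch below restricts the alphabet to D and E and uses a small toy score table (`TOY_SCORES`, hypothetical values rather than a full AAS), so it shows the enumeration idea only. The function name `neighborhood` is also an assumption.

```python
from itertools import product

# Toy substitution scores over a two-letter alphabet (illustrative values only).
TOY_SCORES = {("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2, ("E", "D"): 2}


def neighborhood(word, threshold, alphabet="DE", scores=TOY_SCORES):
    """Return (candidate, score) for every same-length word scoring above threshold."""
    hits = []
    for cand in product(alphabet, repeat=len(word)):
        total = sum(scores[(q, c)] for q, c in zip(word, cand))
        if total > threshold:  # "exceeding cut-off"
            hits.append(("".join(cand), total))
    return sorted(hits, key=lambda h: -h[1])  # best-scoring words first


for cand, total in neighborhood("DED", threshold=0):
    print(cand, total)
```

With these toy scores every D/E word clears threshold 0, while raising the threshold to 11 keeps only DED, DDD, DEE and EED. The cut-off therefore controls how many hot-spots each query word can seed.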
FASTA versus BLAST
• Word matching exact in FASTA but inexact (AAS-based) in BLAST
• Larger word size in BLAST
• FASTA more sensitive (Why?) but slower (Why?)
• BLAST handles “low-complexity” regions inline
  • Programs DUST (DNA) and/or SEG (protein) used for filtering sequences
Variations on BLAST-based searching
• Mapping query to different alphabets
  • Protein versus DNA
  • DNA versus protein (multiple reading frames)
• PSI-BLAST: Position-specific iterative BLAST
  • Use query to find hits
  • Assemble hits into on-the-fly position-specific scoring matrix (PSSM)
• RPS-BLAST: Reverse position-specific BLAST
  • Query is search space
  • Database of PSSMs used to search for match
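To make the PSSM idea concrete, here is a minimal sketch that turns a stack of equal-length, gap-free hit segments into per-column log-odds scores. It assumes a uniform background and a flat pseudocount, which is much simpler than what PSI-BLAST actually does (no sequence weighting, no gaps). The names `build_pssm` and `score_segment` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one dict per column mapping residue -> log2-odds score."""
    length = len(aligned_hits[0])
    if any(len(hit) != length for hit in aligned_hits):
        raise ValueError("hits must be gap-free and of equal length")
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    total = len(aligned_hits) + pseudocount * len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(hit[col] for hit in aligned_hits)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in alphabet
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the position-specific scores of `segment` against the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


pssm = build_pssm(["DED", "DED", "DDD"])
print(score_segment(pssm, "DED") > score_segment(pssm, "AAA"))  # True
```

In an iterative search, a matrix like this would replace the fixed AAS on the next pass, so residues seen often in the hits score higher at their own positions only.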