Expect value (E-value)
• Expected number of hits, of equivalent or better score, found by random chance in a database of the size searched.
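The dependence of the E-value on score and database size can be sketched with the standard Karlin-Altschul relation, E = K·m·n·e^(−λS), where m is the query length, n the database length and S the raw alignment score. The function names and the λ and K values below are illustrative assumptions, not figures from these slides; real BLAST also applies effective-length corrections that are omitted here.

```python
import math


def evalue_from_raw(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalised score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits, query_len, db_len):
    """Same expectation written with the bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative placeholder parameters (assumptions, not taken from the slides).
LAMBDA, K = 0.3, 0.1

small_db = evalue_from_raw(100, query_len=250, db_len=1_000_000, lam=LAMBDA, k=K)
large_db = evalue_from_raw(100, query_len=250, db_len=2_000_000, lam=LAMBDA, k=K)
print(small_db, large_db)  # the same score is twice as likely by chance in the larger database
```

Because E scales linearly with database size, the same alignment score is less significant when a larger database is searched.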
Conserved domains
• Domain: a sequence of amino acids that typically folds into a stable tertiary structure.
• Many proteins are multi-domain.
BLAST to PSI-BLAST
• BLAST uses a general scoring matrix derived from a large number of proteins.
• What if you want to find homologs of a specific gene product?
• Develop a position-specific scoring matrix (PSSM).
PSSM
   M  F  W  Y  G  A  P  V  I  L  C  R  K  E  N  D  Q  S  T  H
M  5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
G  1  0  0  0  1  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0
A  1  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0
S  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  2  0
F  0  4  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Determine the frequency of each substitution at each position, and convert it to a log-odds score.
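The conversion from observed counts to log-odds scores can be sketched as follows. This is a minimal illustration rather than PSI-BLAST's actual procedure: the uniform background frequency (1/20), the simple pseudocount and the name `log_odds_column` are assumptions introduced here. The counts are those of the S position in the matrix above.

```python
import math

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"  # column order used on the slide


def log_odds_column(counts, pseudocount=1.0):
    """Turn residue counts for one position into log2-odds scores.

    Assumes a uniform background frequency of 1/20 per residue and spreads
    `pseudocount` evenly over all residues so unseen residues avoid log(0).
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = sum(counts.values())
    scores = {}
    for aa in AMINO_ACIDS:
        freq = (counts.get(aa, 0) + pseudocount * background) / (total + pseudocount)
        scores[aa] = math.log2(freq / background)
    return scores


# S position from the slide: S observed 3 times, T observed 2 times.
s_column = log_odds_column({"S": 3, "T": 2})
for aa in ("S", "T", "W"):
    print(aa, round(s_column[aa], 2))
```

Residues seen more often than the background expects get positive scores; residues never seen at that position get negative scores.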
PSSM INDEL
   M  F  W  Y  G  A  P  V  I  L  C  R  K  E  N  D  Q  S  T  H
M  5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
G  1  0  0  0  1  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0
A  1  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0
S  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  2  0
F  0  4  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Indel 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Can include a score for permitting insertions and deletions. Perhaps this position is at a turn, where indels are common.
PSSM
• In evaluating (scoring) alignments, PSSM approaches typically:
  • Reward matches to columns that have conserved amino acids.
  • Penalize mismatches in columns with a conserved amino acid more than mismatches in a variable column.
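The scoring behaviour described above can be illustrated with a small ungapped sketch built from the five-position count matrix on the earlier slide. The uniform background, the pseudocount scheme and the names `build_pssm` and `score` are assumptions made for this example. With uniform pseudocounts an unobserved residue receives the same raw score in every column, so the difference appears as score lost relative to the matching residue.

```python
import math

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency

# Observed residue counts per query position (M G A S F), from the slide matrix.
COLUMN_COUNTS = [
    {"M": 5},                                  # fully conserved
    {"M": 1, "G": 1, "L": 1, "E": 1, "Q": 1},  # variable
    {"M": 1, "A": 4},
    {"S": 3, "T": 2},
    {"F": 4, "Y": 1},
]


def build_pssm(column_counts, pseudocount=1.0):
    """Build one dict of log2-odds scores per position."""
    pssm = []
    for counts in column_counts:
        total = sum(counts.values())
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount * BACKGROUND) / (total + pseudocount)
            column[aa] = math.log2(freq / BACKGROUND)
        pssm.append(column)
    return pssm


def score(pssm, segment):
    """Ungapped score of a segment that has the same length as the PSSM."""
    if len(segment) != len(pssm):
        raise ValueError("segment and PSSM must have the same length")
    return sum(column[aa] for column, aa in zip(pssm, segment))


pssm = build_pssm(COLUMN_COUNTS)
print(score(pssm, "MGASF"))  # matches at every position
print(score(pssm, "MWASF"))  # W substituted in the variable second column
print(score(pssm, "WGASF"))  # W substituted in the conserved first column
```

Replacing the conserved M costs more score than the same substitution in the variable second column, because the match it displaces was worth more.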
PSI-BLAST
• Input a single query sequence.
• Executes a BLAST run.
• The program takes the significant hits and incorporates the matches into a PSSM.
• Sequences >98% similar are not included (to avoid biasing the PSSM).
Power of approach:
• PSI-BLAST is iterative.
• Each round takes the best hits, improves the scoring matrix, and searches again with it.
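The iterative idea can be sketched with a toy, ungapped search over equal-length sequences. This is not the PSI-BLAST implementation: the raw-score inclusion threshold (PSI-BLAST uses an E-value threshold), the uniform background, the pseudocounts, the example database and the names `pssm_from_alignment`, `score` and `psi_search` are all assumptions made for illustration, and the >98% redundancy filter is omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def pssm_from_alignment(sequences, pseudocount=1.0):
    """Build log2-odds columns from equal-length, already aligned sequences."""
    pssm = []
    for column in zip(*sequences):
        total = len(column)
        pssm.append({
            aa: math.log2(
                ((column.count(aa) + pseudocount * BACKGROUND) / (total + pseudocount))
                / BACKGROUND
            )
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, sequence):
    """Ungapped score of a sequence against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


def psi_search(query, database, threshold, max_iterations=5):
    """Search, rebuild the PSSM from the hits, and repeat until the hits stop changing."""
    pssm = pssm_from_alignment([query])  # round 1: profile built from the query alone
    hits = set()
    for _ in range(max_iterations):
        new_hits = {seq for seq in database if score(pssm, seq) >= threshold}
        if new_hits == hits:  # converged: no change in the hit set
            break
        hits = new_hits
        pssm = pssm_from_alignment([query] + sorted(hits))
    return hits


# Hypothetical toy database of five-residue sequences.
DATABASE = ["MLASF", "MGATF", "MLATF", "MQMTY", "WWWWW"]

print(sorted(psi_search("MGASF", DATABASE, threshold=10.0, max_iterations=1)))
print(sorted(psi_search("MGASF", DATABASE, threshold=10.0)))
```

In this toy run, MLATF differs from the query at two positions and falls below the threshold in the first round. Once the first-round hits have taught the matrix that L is tolerated at position 2 and T at position 4, the second round includes it.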
The PSSM will skew towards this region