Integrated Bioinformatics
Wednesday, 6 October 2004
• Solving the mystery of BlastN
• Download BlastN.pl
• Use local BlastN on the lef and PG47 sequences
• Compare this result to using the NCBI pairwise blast
Scoring Sequence Alignments: Calculating E
E = m · n · p^S
Expected number = number of possibilities · unit probability
Example: What is the expected number of matches to H H H H T?
Unit probability = ½ · ½ · ½ · ½ · ½ = 1/32
Number of possibilities = 5:
H H H H T
H H H T H
H H T H H
H T H H H
T H H H H
Expected number = 5 · 1/32 = 5/32
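The coin example can be checked by brute force. The following is a minimal sketch, not part of BlastN.pl: it enumerates all 2^5 outcomes of five flips, keeps those with exactly one tail, and multiplies the count by the unit probability. The variable names are chosen here for illustration only.

```perl
use strict;
use warnings;

my $flips = 5;
my $total = 2 ** $flips;    # 32 equally likely outcomes

# Collect every outcome with exactly one tail (four H, one T).
my @hits;
for my $code (0 .. $total - 1) {
    my $seq = sprintf "%0${flips}b", $code;   # e.g. "00001"
    $seq =~ tr/01/HT/;                        # 0 -> H, 1 -> T
    my $tails = ($seq =~ tr/T//);             # count the T characters
    push @hits, $seq if $tails == 1;
}

# Expected number = number of possibilities * unit probability
my $expected = @hits / $total;

print "$_\n" for @hits;
printf "%d possibilities x 1/%d = %s\n", scalar(@hits), $total, $expected;
```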
For a sequence alignment:
Number of possibilities = m · n (match can begin anywhere in query, and anywhere in target)
Unit probability of match = p^S = (¼)^(number of matches)
Unit probability of match = p^S = (¼)^(number of matches)
= e^(ln(¼) · number of matches)
= e^(−λ · number of matches)
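Written out as a worked equation, the rewriting above looks as follows. This is a sketch under the slide's simplifying assumptions: every base has probability p = ¼ and the score S simply counts matches. The bit-score line is an added observation for this simplified model only, where no K factor appears.

```latex
\begin{align*}
p^{S} &= \left(\tfrac{1}{4}\right)^{S} = e^{S \ln(1/4)} = e^{-\lambda S},
  \qquad \lambda = -\ln\tfrac{1}{4} = \ln 4 \approx 1.386 \\
E &= m \cdot n \cdot p^{S} = m \cdot n \cdot e^{-\lambda S} \\
\left(\tfrac{1}{4}\right)^{S} &= 2^{-2S}
  \quad\Rightarrow\quad \text{each match contributes 2 bits, so } S' = 2S
\end{align*}
```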
In general: E = K · m · n · e^(−λS)
In bits: E = m · n · 2^(−S′)
SQ5. Calculate E from the parameters of a real Blast search, using
E = K · m · n · e^(−λS)
E = m · n · 2^(−S′)
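The two forms of E can be compared numerically. The sketch below assumes the standard conversion between raw and bit scores, S′ = (λS − ln K) / ln 2, which the slides do not state explicitly. The subroutine names and all parameter values are placeholders for illustration; for SQ5, substitute the K, λ, m, n, and S reported by an actual Blast search.

```perl
use strict;
use warnings;

# E from the raw score: E = K * m * n * exp(-lambda * S)
sub e_from_raw {
    my ($K, $lambda, $m, $n, $S) = @_;
    return $K * $m * $n * exp(-$lambda * $S);
}

# Bit score (assumed conversion): S' = (lambda * S - ln K) / ln 2
sub bit_score {
    my ($K, $lambda, $S) = @_;
    return ($lambda * $S - log($K)) / log(2);
}

# E from the bit score: E = m * n * 2^(-S')
sub e_from_bits {
    my ($m, $n, $bits) = @_;
    return $m * $n * 2 ** (-$bits);
}

# Placeholder values only; not taken from any real search.
my ($K, $lambda) = (0.7, 1.4);
my ($m, $n, $S)  = (1000, 5000, 20);

my $bits = bit_score($K, $lambda, $S);
printf "bit score = %.2f\n", $bits;
printf "E (raw)   = %.3g\n", e_from_raw($K, $lambda, $m, $n, $S);
printf "E (bits)  = %.3g\n", e_from_bits($m, $n, $bits);
```

Both printed E values should agree, since the bit score absorbs K and λ into S′.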
Protein Alignments: PAM scoring tables
SQ7. Amongst protein pairs that are 99% similar to each other, what fraction of arginines in one protein correspond to lysines in the other (at the equivalent position)? What fraction of arginines in one correspond to leucines in the other?
SQ8. What PAM table would be appropriate to search for proteins about 50% identical to a query sequence?
Protein Alignments: Log odds scoring tables
BLOSUM62
SQ10. What sequences would be found by VLI using a T value of 13?
BlastN: Local version. Does it work?
SQ4. Complete the subroutine print_score

    print_score(1,20,1,20);
    . . .
    sub print_score {
        my ($first_target,$last_target,$first_query,$last_query) = @_;
        foreach $t ( CONTINUE THIS LINE
            foreach $q ( CONTINUE THIS LINE
                if (defined( CONTINUE THIS LINE ) {
                    printf "%6d", CONTINUE THIS LINE
                }
            }
            print CONTINUE THIS LINE
        }
    }
Scenario 2: Genome comparison & Parsing
2P.2. … It's often useful to know the size of an array. One way to do this…

    my @a = ("red", "green", "blue");
    my $size = @a;
    print $size, "\n";
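The assignment above works because an array evaluated in scalar context yields its element count. As a small sketch of the related idioms (the variable names beyond @a and $size are chosen here for illustration):

```perl
use strict;
use warnings;

my @a = ("red", "green", "blue");

my $size       = @a;           # scalar context: number of elements
my $explicit   = scalar(@a);   # same thing, with the context spelled out
my $last_index = $#a;          # index of the last element (size - 1)
my ($first)    = @a;           # list context: first element, NOT the size

print "size = $size, explicit = $explicit, last index = $last_index\n";
print "first element = $first\n";
```

Note the parentheses in the last assignment: they switch the left side to list context, which is a common source of bugs when the size was intended.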
BlastN: Web version
Checklist
1. Filter the query sequence to remove repetitive regions X
2. Find all query-target matches
   a. Extract a word from the query, using a sliding window √
   b. Find an exact match of the word in the target sequence. If no match, return to Step a √
   c. Extend match in both directions √ X
   d. Calculate a score for the final match X
   e. Save matches whose scores exceed threshold
   f. Repeat a - e √ X
3. Rank the matches by their scores
4. Print out the top matches. ~
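Steps 2a to 2c of the checklist can be sketched in a few lines of Perl. This is an illustration only and is not the code in BlastN.pl: the subroutine names and test sequences are hypothetical, and the extension step here simply stops at the first mismatch, which is a simplification of how a real Blast extension decides when to stop.

```perl
use strict;
use warnings;

# Steps 2a/2b: slide a window along the query and find every exact
# occurrence of each word in the target. Returns [query_pos, target_pos]
# pairs (0-based).
sub find_word_matches {
    my ($query, $target, $word_size) = @_;
    my @matches;
    for my $q (0 .. length($query) - $word_size) {
        my $word = substr($query, $q, $word_size);
        my $t = index($target, $word);
        while ($t != -1) {
            push @matches, [$q, $t];
            $t = index($target, $word, $t + 1);
        }
    }
    return @matches;
}

# Step 2c (simplified): extend a match left and right while bases agree.
# Returns the new query start, target start, and match length.
sub extend_match {
    my ($query, $target, $q, $t, $len) = @_;
    while ($q > 0 && $t > 0
           && substr($query, $q - 1, 1) eq substr($target, $t - 1, 1)) {
        $q--; $t--; $len++;
    }
    while ($q + $len < length($query) && $t + $len < length($target)
           && substr($query, $q + $len, 1) eq substr($target, $t + $len, 1)) {
        $len++;
    }
    return ($q, $t, $len);
}

# Hypothetical sequences for illustration.
my $query  = "ACGTACGT";
my $target = "TTACGTACGG";

for my $m (find_word_matches($query, $target, 4)) {
    my ($q, $t, $len) = extend_match($query, $target, @$m, 4);
    print "query $q, target $t, length $len\n";
}
```

Several seed words extend to the same final match, so a fuller version would remove duplicates before scoring (step 2d) and ranking (step 3).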