Using BLAST options to refine a search
• Address the question: “How many of the Phytophthora/tomato interaction ESTs are tomato?”
• A: It depends on the conditions. With E-value < 1 x 10^-8, match length > 200 bp, identities > 95%, and % match overlap > 50%: ~2100 (54%) show a match with 1622 unique ESTs.
• Can the question be addressed more easily by refining the BLAST search?
• Other BLAST options.
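The four conditions above can be applied to parsed hits with a small filter. The following is a minimal sketch, not the method used for the numbers on the slide: the record layout (dictionaries with `query`, `identity`, `length`, `evalue` keys), the function names, and the example values are all hypothetical, and "% match overlap" is assumed here to mean alignment length divided by query length.

```python
def passes_filters(hit, query_length,
                   max_evalue=1e-8, min_length=200,
                   min_identity=95.0, min_overlap=50.0):
    """Return True if one BLAST hit meets all four thresholds."""
    # Assumption: overlap = alignment length as a percentage of query length.
    overlap = 100.0 * hit["length"] / query_length
    return (hit["evalue"] < max_evalue
            and hit["length"] > min_length
            and hit["identity"] > min_identity
            and overlap > min_overlap)


def count_matching_queries(hits, query_lengths, **thresholds):
    """Count distinct queries that have at least one hit passing the filters."""
    matched = set()
    for hit in hits:
        if passes_filters(hit, query_lengths[hit["query"]], **thresholds):
            matched.add(hit["query"])
    return len(matched)


# Made-up records for illustration only (not real results).
example_hits = [
    {"query": "est1", "identity": 99.68, "length": 632, "evalue": 0.0},
    {"query": "est1", "identity": 99.62, "length": 521, "evalue": 0.0},
    {"query": "est2", "identity": 91.00, "length": 300, "evalue": 1e-20},
]
example_lengths = {"est1": 700, "est2": 450}

print(count_matching_queries(example_hits, example_lengths))  # prints 1
```

Only `est1` is counted: `est2` has a strong E-value but fails the identity threshold, which is why the answer "depends on conditions".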
$ ./blastall.exe
  -e  Expectation value <E> [Real]
      default = 10.0
$ ./blastall.exe
  -m  alignment view options:
      0 = pairwise
      1 = query-anchored showing identities
      . . .
      7 = XML Blast output
      8 = tabular
      9 = tabular with comment lines
Run nucleotide BLAST (blastn)
$ /cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./tomatosequence.txt -o OUTE2.txt -e 0.01
$ grep -c "Strand =" OUTE2.txt
3 (with the default E-value this was 82…)
$ /cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./PhytophSeq1.txt -o PhytOUTE1.txt -e 1e-8
$ grep -c "Strand =" PhytOUTE1.txt
108,787 (with the default E-value this was 292,568…)
NOTE: The BLAST run that compares 3,921 sequences to a database of 116,711 sequences will take some time (15 minutes on my laptop).
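The grep step counts alignments by counting lines that contain "Strand =", since each nucleotide alignment block in the pairwise report carries one such line. A rough Python equivalent is sketched below; the function name and the sample text are made up for illustration, and in practice the text would be read from a report file such as OUTE2.txt.

```python
def count_alignments(report_text):
    """Count lines containing 'Strand =', mirroring grep -c "Strand ="."""
    return sum(1 for line in report_text.splitlines() if "Strand =" in line)


# Made-up stand-in for a pairwise BLAST report (illustration only).
sample_report = """\
 Score = 1237 bits (624), Expect = 0.0
 Strand = Plus / Plus

 Strand = Plus / Minus
"""

print(count_alignments(sample_report))  # prints 2
```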
Searching..................................................done

                                                                    Score    E
Sequences producing significant alignments:                         (bits) Value

gi|9292199|gb|BE354223.1|BE354223 EST355566 tomato flower buds, ...   1237   0.0
gi|16248018|gb|BI933546.1|BI933546 EST553435 tomato flower, anth...   1017   0.0
gi|4384985|gb|AI489614.1|AI489614 EST247953 tomato ovary, TAMU S...    908   0.0

>gi|9292199|gb|BE354223.1|BE354223 EST355566 tomato flower buds, anthesis, Cornell
 University Solanum lycopersicum cDNA clone cTOD9L3, mRNA sequence
          Length = 632

 Score = 1237 bits (624), Expect = 0.0
 Identities = 630/632 (99%)
 Strand = Plus / Plus

Query: 1504 gactggctagaatggctgcaatcatggcatctacttacaaggcttatcttggcgtcggac 1563
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1    gactggctagaatggctgcaatcatggcatctacttacaaggcttatcttggcgtcggac 60

Query: 1564 ttggtccactatcatttttgacgcagtatagaataccacatcctggaagagttggtggaa 1623
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61   ttggtccactatcatttttgacgcagtatagaataccacatcctggaagagttggtggaa 120
Run nucleotide BLAST (blastn) with tabular output
$ /cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./tomatosequence.txt -o OUTE2.txt -m 8
-m = alignment view options
8 = tabular format
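Columns of the tabular (-m 8) output, left to right: query, subject, % identities, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score

query                   subject                             %ident  length  mismatch  gaps  q.start  q.end  s.start  s.end  e-value  bit score
Slycopersicum.sequence  gi|9292199|gb|BE354223.1|BE354223   99.68   632     2         0     1504     2135   1        632    0.0      1237
Slycopersicum.sequence  gi|16248018|gb|BI933546.1|BI933546  99.62   521     2         0     1668     2188   1        521    0.0      1017
Slycopersicum.sequence  gi|4384985|gb|AI489614.1|AI489614   99.57   466     2         0     1818     2283   1        466    0.0      908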
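Because every tabular row has the same twelve fields, it is easy to read into a program. The sketch below shows one possible way to turn a row into a dictionary; the field names and the function name are my own choices, not part of BLAST, and splitting on whitespace assumes that sequence identifiers contain no spaces.

```python
# Hypothetical field names for the twelve -m 8 columns, in order.
FIELDS = ["query", "subject", "identity", "length", "mismatches",
          "gap_openings", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score"]
INT_FIELDS = ("length", "mismatches", "gap_openings",
              "q_start", "q_end", "s_start", "s_end")
FLOAT_FIELDS = ("identity", "evalue", "bit_score")


def parse_tabular_line(line):
    """Parse one row of tabular BLAST output into a dict of typed values."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(values)))
    hit = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        hit[name] = int(hit[name])
    for name in FLOAT_FIELDS:
        hit[name] = float(hit[name])
    return hit


# First row of the tabular output shown on the slide.
row = ("Slycopersicum.sequence gi|9292199|gb|BE354223.1|BE354223 "
       "99.68 632 2 0 1504 2135 1 632 0.0 1237")
hit = parse_tabular_line(row)
print(hit["subject"], hit["identity"], hit["length"], hit["evalue"])
```

With the values typed this way, thresholds on E-value, alignment length, and percent identity become simple comparisons.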
tblastn
Running BLAST with a protein or peptide query (translated BLAST: the nucleotide database is translated in all six reading frames and compared with the protein query)
$ /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB7-15-13.txt -o PEPTIDEOUT.txt (-e #)
Try:
$ /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB7-15-13-Pep4A.txt -o PEPTIDEOUT.txt
Then try:
$ /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB7-15-13-Pep4A.txt -o PEPTIDEOUT.txt -e 50
From Xiaodong: Other useful BLAST options
(1) “-b integer”: number of database sequences to show alignments for. The default value is 250. A smaller number reduces the size of the output file and makes BLAST searches faster.
(2) “-v integer”: number of database sequences to show one-line descriptions for. The default value is 500. A smaller number for the “-v” option has a similar effect to “-b”.
(3) “-a integer”: number of processors to use. Most laptops have only one processor, but if BLAST is run on a Linux workstation with multiple processors, using all of them will drastically reduce the execution time.
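When a search is scripted, these options can be assembled programmatically. The helper below is a hypothetical sketch (the function and its parameter names are not part of BLAST); it only builds the argument list from the flags named on these slides and does not run anything. The values 5 and 2 are arbitrary illustrations.

```python
def build_blastall_command(program, database, query, output,
                           evalue=None, alignments=None,
                           descriptions=None, processors=None):
    """Build a blastall argument list; optional flags are added only if set."""
    command = ["blastall", "-p", program, "-d", database,
               "-i", query, "-o", output]
    optional = [("-e", evalue),        # expectation value cutoff
                ("-b", alignments),    # sequences to show alignments for
                ("-v", descriptions),  # sequences to show one-line descriptions for
                ("-a", processors)]    # number of processors to use
    for flag, value in optional:
        if value is not None:
            command += [flag, str(value)]
    return command


cmd = build_blastall_command("blastn", "./TA496Seq1.txt", "./PhytophSeq1.txt",
                             "PhytOUTE1.txt", evalue="1e-8",
                             alignments=5, descriptions=5, processors=2)
print(" ".join(cmd))
```

Assuming blastall is installed and on the PATH, a list like this could be handed to `subprocess.run(cmd, check=True)` to launch the search.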
From Xiaodong: Other useful BLAST options (continued)
(4) “-m 7” gives results in XML format, which is useful if the BLAST output will be imported into Blast2GO for GO assignment and metabolic pathway prediction.
(5) “-l string”: restrict the database search to a list of GIs (GenInfo Identifiers), the numeric identifiers assigned to each sequence in GenBank. The string is the name of a file containing the GIs of the subset of sequences you want to search against. Use this option to search subsets of a large database without creating multiple databases. The advantage is that the E-values from all the subset searches are comparable. If the subsets were separate databases, their sizes would differ, making E-values incomparable between the searches.
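The dependence of E-values on database size can be illustrated with the standard relation E = m x n x 2^(-S'), where m is the query length, n is the total database length, and S' is the bit score. The sketch below is a simplification: it ignores the effective-length corrections that BLAST applies, and the bit score and lengths are arbitrary illustration values, not results from these searches.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value: E = m * n * 2**(-S'), without length corrections."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Same alignment (same bit score), two hypothetical database sizes.
small_db = expect_value(40.0, 500, 1_000_000)
large_db = expect_value(40.0, 500, 10_000_000)

print(small_db, large_db)
print(large_db / small_db)
```

The same alignment gets an E-value ten times larger in the database that is ten times bigger, which is why E-values from separately built subset databases cannot be compared directly.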