BLAST in practice & BLAST outputs. ATGCTG TGGCAG CGTGCA GTCCAG TCTCGT ACTGCAT. A practical example. 2,869,704 annotated proteins. 1,506 mapped barley genes. BlastX. Result: 905 annotations. Runtime: 17.5 h. Solution: distribute the analyses. IPK cluster BROCKEN.
A practical example
ATGCTG TGGCAG CGTGCA GTCCAG TCTCGT ACTGCAT
2,869,704 annotated proteins
1,506 mapped barley genes
BlastX
Result: 905 annotations
Runtime: 17.5 h
IPK cluster BROCKEN
Result: 905 annotations
72 nodes -> runtime: 16 min
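The two runtimes on these slides (17.5 h for the single run, 16 min on the cluster) imply a speedup of roughly 66x. The following is a minimal sketch of that arithmetic. It assumes the 72 nodes acted as 72 independent workers, which the slides do not state; the function names are illustrative.

```python
def speedup(serial_minutes, parallel_minutes):
    """Ratio of the single-machine runtime to the cluster runtime."""
    return serial_minutes / parallel_minutes


def efficiency(serial_minutes, parallel_minutes, workers):
    """Speedup divided by the number of workers (1.0 would be ideal scaling)."""
    return speedup(serial_minutes, parallel_minutes) / workers


serial = 17.5 * 60   # 17.5 h expressed in minutes
parallel = 16        # cluster runtime in minutes
workers = 72         # assumption: one worker per node

print("speedup:    %.1fx" % speedup(serial, parallel))
print("efficiency: %.0f%%" % (100 * efficiency(serial, parallel, workers)))
```

Under that assumption the distributed run is close to linear scaling, which is plausible for BLAST because each query sequence can be searched independently of the others.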
CEF: Cluster Execution Framework

CEF GUI
CEF SOAP Web Services
file server /data/pdw-20/
file server /data/pdw-16/
master/head node pdw-22 … 22 nodes

• Metadata about
  • Tools (NCBI BLAST, Spidey, …)
  • Tool parameters (-i FASTA-query, …)
  • Files (FASTA, blastable, …)
  • Jobs/sub jobs (progress, finished, …)

#!/bin/bash
projdir=/data/pdw-16/agbi/projects/

#split query file
python2.3 /data/pdw-20/python_scripts/splitFas2.py -i Clones.fasta -o $projdir -n 500

blast_db=$projdir/wheat_consensus.txt

# start a merge script that will concatenate all partial results
mergescript=$projdir/domerge.sh
echo "#!/bin/sh" > $mergescript
echo "cat \\" >> $mergescript

# write one job script per query chunk and submit it to the "long" queue
z=0
for i in split/*
do
  script_file=$projdir/script/blastjob_$$_$z.sh
  result_file=$projdir/result/blastresult_$$_$z.txt
  log_file=$projdir/log/joblog_$$_$z
  echo "#!/bin/sh" > $script_file
  #echo "cd $projdir" >> $script_file
  echo "/usr/bin/blastall -i $projdir/$i -p blastn -d $blast_db -m0 -e 1E-10 -v 10 -b 10 -o $result_file" >> $script_file
  # register this chunk's result file with the merge script
  echo "$result_file \\" >> $mergescript
  qsub -o $log_file.out -e $log_file.err -q long $script_file
  echo "qsub -o $log_file.out -e $log_file.err -q long $script_file"
  z=`expr $z + 1`
done

# finish the merge script: redirect the concatenation, then clean up
echo ">final_result.txt" >> $mergescript
echo "rm log/* script/* " >> $mergescript
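The script above relies on splitFas2.py to cut the query file into pieces, but that helper is not shown on the slides. The following Python sketch illustrates one way such a splitter could work. It is a stand-in, not the original script: the function names, the output file naming, and the reading of `-n 500` as "records per chunk" are all assumptions.

```python
import os


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunked(records, per_chunk):
    """Group records into lists of at most per_chunk items."""
    if per_chunk < 1:
        raise ValueError("per_chunk must be at least 1")
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def write_chunks(fasta_path, out_dir, per_chunk):
    """Write each chunk to its own FASTA file (hypothetical naming scheme)."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(fasta_path) as handle:
        for index, chunk in enumerate(chunked(read_fasta(handle), per_chunk)):
            path = os.path.join(out_dir, "chunk_%d.fasta" % index)
            with open(path, "w") as out:
                for header, seq in chunk:
                    out.write(">%s\n%s\n" % (header, seq))
            paths.append(path)
    return paths


# In-memory demonstration with three toy records, two per chunk.
demo = [">a", "ACGT", "TTGA", ">b", "GGCC", ">c", "AT"]
records = list(read_fasta(demo))
sizes = [len(chunk) for chunk in chunked(records, 2)]
print(records)
print(sizes)
```

Splitting by record count keeps every sequence intact, so each chunk can be handed to its own BLAST job and the per-chunk results can simply be concatenated afterwards.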
Input EST sequence

>HY01A03T
GAATTCGGCACCAGAGTGAGCACGCAAGCCAGTGTTTGTAGCCAGCAGCCACAATGGCCGGGAACATGCT
AGCCAACTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGCGTCGACAACAAGTTCGAGAAG
GGCGACGAGATCAGGGCGCAGCTGAAGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATAGACGTCT
GGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTACAAGCAGGTCTTCGACCT
GGTGCACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTCCACCAGTGCGGTGGCAACGTCGGCGAC
GTAGTCAACATCCCCATCCCACAGTGGGTGCGGGATGTCGGCGCTACCGACCCCGACATTTTCTACACGA
ACCGCAGAGGGACGAGGAACATCGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAG
AACTGCCGTCCAGATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCC
GGTACCATCGTGGACATTGAAGTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTATCCTCAGA
GCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTACCTGGAAGCAGACTTCAA
BlastN result

>HY01A03T
 Length = 700

 Plus Strand HSPs:

 Score = 2595 (395.4 bits), Expect = 3.0e-112, P = 3.0e-112
 Identities = 573/618 (92%), Positives = 573/618 (92%), Strand = Plus / Plus

Query:  77 CTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGC--GT-CGACAACAAGTT 133
           ||| ||| | | || |  |  | |  || ||   ||||  | |  || ||| ||
Sbjct:  89 CTACGTC-ATG-CTCCCGCTGGATGTCG-TGAGCGTCGACAACAAGTTCGAGAAGGGCGA 145

Query: 134 CGAGA--AGGGCGACGAGATCAGGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATA 191
           |||||  |||||| | || | | |||||||||||||||||||||||||||||||||||||
Sbjct: 146 CGAGATCAGGGCG-C-AGCTGAAGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATA 203

Query: 192 GACGTCTGGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTAC 251
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 204 GACGTCTGGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTAC 263

Query: 252 AAGCAGGTCTTCGACCTGGTACACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTC 311
           |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
Sbjct: 264 AAGCAGGTCTTCGACCTGGTGCACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTC 323

Query: 312 CACCCCGTGCGGTGGCAACGTCGGCGACGTAGTCAACATCCCCATCCCACAGTGGGTGCG 371
           ||||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 324 CACCA-GTGCGGTGGCAACGTCGGCGACGTAGTCAACATCCCCATCCCACAGTGGGTGCG 382

Query: 372 GGATGTCGGCGCTACCGACCCCGACATTTTCCACACGAACCTCAGAGGGACGAGGAACAT 431
           ||||||||||||||||||||||||||||||| ||||||||| ||||||||||||||||||
Sbjct: 383 GGATGTCGGCGCTACCGACCCCGACATTTTCTACACGAACCGCAGAGGGACGAGGAACAT 442

Query: 432 CGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAGAACTGCCGTCCA 491
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 443 CGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAGAACTGCCGTCCA 502

Query: 492 GATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCCGG 551
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 503 GATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCCGG 562

Query: 552 TACCATCGTGGACA---A-GTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA 607
           ||||||||||||||   | |||||||||||||||||||||||||||||||||||||||||
Sbjct: 563 TACCATCGTGGACATTGAAGTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA 622

Query: 608 TCCTCAGAGCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTA 667
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 623 TCCTCAGAGCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTA 682

Query: 668 CCTGGAAGCAGACTTCAA 685
           ||||||||||||||||||
Sbjct: 683 CCTGGAAGCAGACTTCAA 700
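In the BlastN output, the row of `|` characters between each Query and Sbjct line marks the columns where both sequences carry the same base; mismatches and gaps are left blank. The following sketch shows how such a match line and the identity count can be derived from two aligned strings. The function names are illustrative, and the integer division is only an observation that it reproduces the 92% printed for 573/618, not a statement about how BLAST itself rounds.

```python
def match_line(query, sbjct):
    """Return '|' for identical non-gap columns and ' ' everywhere else."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if q == s and q != "-" else " "
        for q, s in zip(query, sbjct)
    )


def identities(query, sbjct):
    """Return (identical columns, alignment length) for two aligned rows."""
    line = match_line(query, sbjct)
    return line.count("|"), len(line)


def percent(matches, length):
    """Whole-number percentage, rounded down."""
    return matches * 100 // length


# First ten columns of the row starting at Query position 312.
q = "CACCCCGTGC"
s = "CACCA-GTGC"
print(q)
print(match_line(q, s))
print(s)
print(identities(q, s))
print(percent(573, 618))
```

A gap column never counts as an identity, which is why the alignment length (618) is larger than the number of query bases covered by the hit.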
BlastX result

>dbj|BAC83773.1| Gene info putative beta-amylase [Oryza sativa (japonica cultivar-group)]
 gb|EAZ40178.1| hypothetical protein OsJ_023661 [Oryza sativa (japonica cultivar-group)]
Length=488

 Score = 403 bits (1036), Expect = 4e-111
 Identities = 191/215 (88%), Positives = 200/215 (93%), Gaps = 0/215 (0%)
 Frame = +3

Query  54   MAGNMLANYVQVYVMLPLDVVSVDNKFEKGDEIRAQLKKLTEAGVDGVMIDVWWGLVEGK  233
            MAGN+LANYVQV VMLPLDVV+VDNKFEK DE RAQLKKLTEAGVDGVM+DVWWGLVEGK
Sbjct  1    MAGNLLANYVQVNVMLPLDVVTVDNKFEKVDETRAQLKKLTEAGVDGVMVDVWWGLVEGK  60

Query  234  GPKAYDWSAYKQVFDLVHEARLKLQAIMSFHQCGGNVGDVVNIPIPQWVRDVGATDPDIF  413
            GP +YDW AYKQ+F LV EA LKLQAIMSFHQCGGNVGD+VNIPIPQWVRDVGA+DPDIF
Sbjct  61   GPGSYDWEAYKQLFRLVQEAGLKLQAIMSFHQCGGNVGDIVNIPIPQWVRDVGASDPDIF  120

Query  414  YTNRRGTRNIEYLTLGVDDQPLFHGRTAVQMYHDYMASFRENMKKFLDAGTIVDIEVGLG  593
            YTNR G RNIEYLTLGVDDQPLFHGRTA+QMY DYM SFRENM +FLD G IVDIEVGLG
Sbjct  121  YTNRGGARNIEYLTLGVDDQPLFHGRTAIQMYADYMKSFRENMAEFLDTGVIVDIEVGLG  180

Query  594  PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF  698
            PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF
Sbjct  181  PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF  215
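BlastX translates the nucleotide query before comparing it with the protein database, which is why the Query coordinates (54 to 698) are nucleotide positions while the Sbjct coordinates count amino acids. The following sketch translates the first 140 bases of the EST from position 54 using the standard genetic code and reproduces the start of the aligned query protein. The helper names are illustrative and only the forward strand is handled.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def frame_of(start):
    """Forward reading frame (1, 2 or 3) for a 1-based start position."""
    return (start - 1) % 3 + 1


def translate(seq, start=1):
    """Translate complete codons of seq, beginning at 1-based position start."""
    seq = seq.upper()
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        # Codons containing anything other than T, C, A, G become 'X'.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First two lines (140 bases) of the EST HY01A03T shown on the slides.
QUERY_PREFIX = (
    "GAATTCGGCACCAGAGTGAGCACGCAAGCCAGTGTTTGTAGCCAGCAGCCACAATGGCCGGGAACATGCT"
    "AGCCAACTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGCGTCGACAACAAGTTCGAGAAG"
)

print(frame_of(54))
print(translate(QUERY_PREFIX, 54))
```

Position 54 falls in frame +3, matching the `Frame = +3` line of the report, and the translated peptide begins with the same residues as the first Query row of the alignment.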