Rick Westerman, Purdue Genomics, westerman@purdue.edu

blastx
• Nucleotide to protein database
• De novo transcriptome / RNAseq
• 30K – 150K sequences
• 300 – 5000 bases
• ~100 MB input file
• E-value is 10^-6
• Up to 10M hits to 'nr'
• ~5000 CPU-hours
Current method – RCAC clusters
1) Break up input into many ~200 KB files, about 500 of them.
2) Grab up to 250 8-CPU 'standby' nodes on RCAC clusters; 4 hour maximum. Note: use own queuing method ("chaining").
3) Failures are manually caught and redone.
4) Do the above for each sample (experiment).
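Step 1 above (splitting the input) can be sketched roughly as follows. This is only a minimal illustration, not the script actually used: it assumes the input is FASTA text, that "KB" means 1000 bytes, and the names `read_fasta_records`, `split_fasta` and `write_chunks` are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List


def read_fasta_records(text: str) -> Iterator[str]:
    """Yield each FASTA record (header line plus its sequence lines) as one string."""
    record: List[str] = []
    for line in text.splitlines(keepends=True):
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def split_fasta(text: str, target_bytes: int = 200_000) -> List[str]:
    """Group whole records into chunks of roughly target_bytes each.

    Records are never cut in half, so a chunk can exceed the target
    only when a single record is larger than target_bytes.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for rec in read_fasta_records(text):
        rec_size = len(rec.encode())
        if current and size + rec_size > target_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(rec)
        size += rec_size
    if current:
        chunks.append("".join(current))
    return chunks


def write_chunks(chunks: List[str], out_dir: str, prefix: str = "chunk") -> List[Path]:
    """Write each chunk to out_dir/<prefix>_NNNN.fa (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        path = out / f"{prefix}_{i:04d}.fa"
        path.write_text(chunk)
        paths.append(path)
    return paths
```

With the default `target_bytes=200_000`, a ~100 MB input gives on the order of 500 files, which matches the figure on the slide. Passing a smaller `target_bytes` produces more, smaller files.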
Condor method
1) Break up input into many ~40 KB files.
2) Toss all files onto Condor. BLAST is set up to use 8 CPUs. Only current restriction: 1 GB memory.
3) Condor retries up to 5 times. After that, failures are manually caught and redone.
4) Do the above for each sample (experiment).
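The per-file work in steps 2 and 3 might look something like the sketch below. It is an assumption-laden illustration only: it assumes the BLAST+ `blastx` command-line program (the slides do not say which BLAST build was used), the file names are made up, and `run_with_retries` is a local stand-in for the retry idea, not Condor's own retry mechanism. Whether "up to 5 times" means 5 attempts in total is also an assumption.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def blastx_command(query: str, out: str, db: str = "nr",
                   evalue: float = 1e-6, threads: int = 8) -> List[str]:
    """Build a blastx command line for one chunk (assumes BLAST+ option names)."""
    return [
        "blastx",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-num_threads", str(threads),
        "-out", out,
    ]


def run_with_retries(cmd: Sequence[str], max_attempts: int = 5,
                     runner: Optional[Callable[[Sequence[str]], int]] = None) -> bool:
    """Run cmd until it exits with status 0 or max_attempts is reached.

    Returns True on success, False if every attempt failed; a False result
    is where the manual "catch and redo" step would take over.
    """
    if runner is None:
        # Default runner: execute the command and report its exit status.
        runner = lambda c: subprocess.run(list(c)).returncode
    for _ in range(max_attempts):
        if runner(cmd) == 0:
            return True
    return False
```

A call such as `run_with_retries(blastx_command("chunk_0000.fa", "chunk_0000.out"))` would then process one chunk, provided `blastx` and the 'nr' database are available on the node.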
Use cases
• Accuracy
• Reliability
• Speed
  – plant
  – insect
Case #6 failure reasons
1650 jobs … which started up 11,500+ times ...
5919  Abnormal termination (signal 1)
3667  Normal termination (return value 129)
2034  Job was evicted.
  85  Abnormal termination (signal 9)
  74  Normal termination (return value 0)
   1  Normal termination (return value 1)
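As a quick check on the numbers above, the counts can be totalled and turned into shares. This is just arithmetic on the figures from the slide; the function name `summarize` is hypothetical.

```python
from typing import Dict, Tuple

# Termination counts copied from the slide.
OUTCOMES: Dict[str, int] = {
    "Abnormal termination (signal 1)": 5919,
    "Normal termination (return value 129)": 3667,
    "Job was evicted": 2034,
    "Abnormal termination (signal 9)": 85,
    "Normal termination (return value 0)": 74,
    "Normal termination (return value 1)": 1,
}
JOBS = 1650


def summarize(outcomes: Dict[str, int], jobs: int) -> Tuple[int, float, Dict[str, float]]:
    """Return (total starts, average starts per job, share of each outcome)."""
    total = sum(outcomes.values())
    shares = {reason: count / total for reason, count in outcomes.items()}
    return total, total / jobs, shares


if __name__ == "__main__":
    total, per_job, shares = summarize(OUTCOMES, JOBS)
    print(f"{total} starts, {per_job:.1f} per job")
    for reason, share in shares.items():
        print(f"{share:6.1%}  {reason}")
```

The listed counts sum to 11,780 starts, consistent with the "11,500+" figure, or roughly 7 starts per job.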