Bioinformatics Applications and Workloads
Collaboration with the BMRB The BioMagResBank is a repository for data from NMR spectroscopy on proteins. Two main efforts: - Weekly BLAST run - Protein Structure Determination
BLAST A framework written in Perl completely automates the process: - Requires no previous setup - Downloads and installs BLAST - Retrieves and formats all databases - Retrieves input queries from a URL
BLAST • Input can be in .tar, .zip, .gz, or .Z files • Automatically splits the input • Creates Condor jobs and a .dag file • Uses DAGMan to oversee the run, which makes it very fault tolerant • When all results are complete, it packages the results and log files
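The split-and-DAG step can be sketched as follows. This is a minimal illustration in Python, not the Perl framework itself: it assumes the input is FASTA text, and the function names, node names (`blast0`, `package`), and submit-file names are all hypothetical. The `JOB`, `RETRY`, and `PARENT ... CHILD` lines are standard DAGMan input-file syntax.

```python
def split_fasta(text, seqs_per_chunk):
    """Split FASTA text into chunks of at most seqs_per_chunk records."""
    records, current = [], []
    for line in text.splitlines():
        # A header line starts a new record; flush the previous one first.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return [
        "\n".join(records[i:i + seqs_per_chunk]) + "\n"
        for i in range(0, len(records), seqs_per_chunk)
    ]


def dag_text(n_chunks, retries=3):
    """Build DAGMan input: one BLAST node per chunk plus a packaging node.

    Node and submit-file names are hypothetical placeholders.
    """
    names = [f"blast{i}" for i in range(n_chunks)]
    lines = []
    for name in names:
        lines.append(f"JOB {name} {name}.sub")
        lines.append(f"RETRY {name} {retries}")
    # The packaging node runs only after every BLAST node has succeeded.
    lines.append("JOB package package.sub")
    lines.append(f"PARENT {' '.join(names)} CHILD package")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    chunks = split_fasta(">a\nAAA\n>b\nCCC\n>c\nGGG\n", 2)
    print(dag_text(len(chunks)))
```

Making the packaging step a child of every BLAST node is one way to express "when all results are complete" directly in the DAG, so that DAGMan handles the ordering.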
BLAST • Resulting tarballs can be configured to be no larger than a certain size for more reliable transfer • After tarballs are created, they are automatically sent to an ftp server
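One simple way to cap tarball size is to group the result files greedily before archiving. The sketch below is an assumption about how this could be done, not a description of the actual framework; `group_by_size` and the file names are hypothetical. It limits the sum of the uncompressed file sizes, so the real archive size will differ somewhat because of tar headers and compression.

```python
def group_by_size(files, max_bytes):
    """Greedily group (name, size) pairs so each group totals <= max_bytes.

    A single file larger than the cap is placed in a group of its own.
    """
    groups, current, total = [], [], 0
    for name, size in files:
        # Start a new group when adding this file would exceed the cap.
        if current and total + size > max_bytes:
            groups.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    results = [("a.out", 40), ("b.out", 50), ("c.out", 30), ("d.log", 200)]
    for i, group in enumerate(group_by_size(results, 100)):
        print(f"results_{i}.tar.gz: {group}")
```

Each group could then be written out with Python's standard `tarfile` module, for example `tarfile.open(name, "w:gz")`, before being transferred.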
BLAST • We’ve been doing the run every week for about a year with almost no human intervention • Very easy to add new databases or sets of input sequences!
Protein Structure • Collaboration with Jurgen Doreleijers of the BMRB and Aart Nederveen from Utrecht University in the Netherlands • Recalculated the structures of over 500 proteins using state-of-the-art techniques • Applications used: CNS and CYANA
Protein Structure • DAGMan manages the workflow and provides fault tolerance. • A periodic_remove expression in the submit file removes a job that is “misbehaving”; this combines nicely with DAGMan’s RETRY feature, which then resubmits the failed node.
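To illustrate how the two features interact, here is a hedged Python sketch that generates a submit description and the matching DAG lines. The slides do not say what counts as misbehaving, so the rule used here (a job that has been running longer than a time limit) is an assumption, as are the function names, file names, and arguments. `periodic_remove`, `JobStatus`, `EnteredCurrentStatus`, `JOB`, and `RETRY` are standard Condor and DAGMan names.

```python
def submit_text(executable, arguments, max_run_seconds):
    """Build a submit description with a periodic_remove guard.

    Assumed policy: remove the job if it has been running (JobStatus == 2)
    for longer than max_run_seconds.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "log = structure.log",
        "periodic_remove = (JobStatus == 2) && "
        f"(time() - EnteredCurrentStatus > {max_run_seconds})",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def node_text(node, submit_file, retries):
    """Build the DAG lines for one node that is retried on failure."""
    return f"JOB {node} {submit_file}\nRETRY {node} {retries}\n"


if __name__ == "__main__":
    # Hypothetical wrapper script and protein identifier.
    print(submit_text("run_structure.sh", "1abc", 7200))
    print(node_text("p1abc", "p1abc.sub", 2))
```

When the expression becomes true, Condor removes the job from the queue; DAGMan sees the node as failed and, because of RETRY, submits it again up to the stated number of times.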
Protein Structure • The effort used about 30,000 hours of compute time • We accomplished the run in about 60 hours of real time • The framework I created lets you compute the structures of as many proteins as you like, making the process easy, automatic, and repeatable.
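As a rough back-of-the-envelope check on these figures (assuming, for simplicity, that the pool was used evenly over the whole run), the implied average number of jobs running at once is:

```python
cpu_hours = 30_000   # total compute time, from the slide (approximate)
wall_hours = 60      # elapsed real time, from the slide (approximate)

# Average number of jobs running concurrently over the whole run.
average_concurrent_jobs = cpu_hours / wall_hours
print(average_concurrent_jobs)  # 500.0
```

Both inputs are approximate, so the result is an order-of-magnitude estimate only.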
Protein Structure • Groups often use different parameters and protocols in structure determination and only calculate a few structures • Comparing structures from different groups is then difficult
Protein Structure • Our work was significant because it computed not just a few but over 500 structures • All were computed with the same parameters, making the results very internally consistent (besides being more accurate on their own due to the state-of-the-art techniques)
Web Portal • Currently supports only BLAST • Being used by a handful of users from the biochem department at the UW • Interest is growing, so we’ll soon be adding more applications
Questions? Thank You!