
Bioinformatics Applications and Workloads


heaton

Presentation Transcript


  1. Bioinformatics Applications and Workloads

  2. Collaboration with the BMRB
     The BioMagResBank (BMRB) is a repository for data from NMR spectroscopy on proteins. Two main efforts:
     - Weekly BLAST run
     - Protein structure determination

  3. BLAST
     A framework written in Perl completely automates the process:
     - Requires no previous setup
     - Downloads and installs BLAST
     - Retrieves and formats all databases
     - Retrieves input queries from a URL

  4. BLAST
     • Input can be supplied as .tar, .zip, .gz, or .Z files
     • Automatically splits the input
     • Creates Condor jobs and a .dag file
     • Is very fault tolerant because DAGMan oversees the run
     • When all results are complete, packages the results and log files
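The split-and-submit step above can be sketched roughly as follows. This is a minimal illustration in Python, not the Perl framework itself: the chunk size, the node names, and the submit file names `blast.sub` and `package.sub` are all hypothetical. Only the `JOB`, `VARS`, `RETRY`, and `PARENT ... CHILD` keywords are standard DAGMan input syntax.

```python
def split_fasta(text, seqs_per_chunk):
    """Split FASTA text into chunks of at most seqs_per_chunk records."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])          # header starts a new record
        elif records and line.strip():
            records[-1].append(line)        # sequence line of current record
    chunks = []
    for i in range(0, len(records), seqs_per_chunk):
        group = records[i:i + seqs_per_chunk]
        chunks.append("\n".join("\n".join(r) for r in group) + "\n")
    return chunks


def build_dag(n_chunks, retries=3):
    """Return DAGMan input text: one BLAST node per chunk, then a packaging node.

    blast.sub and package.sub are hypothetical submit files.
    """
    lines, names = [], []
    for i in range(n_chunks):
        name = f"blast{i}"
        names.append(name)
        lines.append(f"JOB {name} blast.sub")
        lines.append(f'VARS {name} chunk="chunk{i}.fasta"')
        lines.append(f"RETRY {name} {retries}")   # DAGMan reruns a failed node
    lines.append("JOB package package.sub")
    if names:
        # Packaging runs only after every BLAST node has succeeded.
        lines.append(f"PARENT {' '.join(names)} CHILD package")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    chunks = split_fasta(">a\nAAA\n>b\nCCC\n>c\nGGG\n", seqs_per_chunk=2)
    print(build_dag(len(chunks)))
```

Making the packaging step a child of every BLAST node is one way to get the "package when all results are complete" behaviour without any polling.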

  5. BLAST
     • Resulting tarballs can be capped at a configurable maximum size for more reliable transfer
     • After the tarballs are created, they are automatically sent to an FTP server
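One way to cap tarball size is to batch the result files greedily before archiving. The sketch below is an assumption about how this could be done, not the framework's actual method. It uses uncompressed file sizes as the budget, which only approximates the final size of each archive, and the `results` prefix is hypothetical.

```python
import tarfile


def plan_tarballs(sizes, max_bytes):
    """Group (name, size) pairs, in order, into batches of at most max_bytes.

    A single file larger than max_bytes still gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for name, size in sizes:
        if current and used + size > max_bytes:
            batches.append(current)     # close the batch that would overflow
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


def write_tarballs(batches, prefix="results"):
    """Write one gzip-compressed tarball per batch and return the paths."""
    paths = []
    for i, batch in enumerate(batches):
        path = f"{prefix}_{i:03d}.tar.gz"
        with tarfile.open(path, "w:gz") as tar:
            for name in batch:
                tar.add(name)
        paths.append(path)
    return paths
```

For example, `plan_tarballs([("a", 40), ("b", 50), ("c", 30), ("d", 200)], 100)` groups the files as `[["a", "b"], ["c"], ["d"]]`.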

  6. BLAST
     • We’ve been doing the run every week for about a year with almost no human intervention
     • Very easy to add new databases or sets of input sequences!

  7. Protein Structure
     • Collaboration with Jurgen Doreleijers of the BMRB and Aart Nederveen of Utrecht University in the Netherlands
     • Recalculated the structures of over 500 proteins using state-of-the-art techniques
     • Both CNS and CYANA were used

  8. Protein Structure
     • DAGMan is used to manage the workflow and to provide fault tolerance
     • Using periodic_remove in the submit file to stop a job that is “misbehaving” combines nicely with DAGMan’s RETRY feature
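The periodic_remove and RETRY combination might look something like the sketch below, which generates a submit description and a matching DAG node. The run-time limit, the file names, and the exact removal expression are assumptions for illustration. The slides do not say what counted as “misbehaving”, so a simple limit on time spent running stands in for it here.

```python
def submit_description(executable, max_run_hours):
    """Return a submit description that removes a job running too long.

    The expression is an assumed example: JobStatus == 2 means the job is
    running, and EnteredCurrentStatus is when it entered that state.
    """
    limit = int(max_run_hours * 3600)
    return "\n".join([
        "universe = vanilla",
        f"executable = {executable}",
        "log = structure.log",
        "periodic_remove = (JobStatus == 2) && "
        f"((time() - EnteredCurrentStatus) > {limit})",
        "queue",
    ]) + "\n"


def dag_node(name, submit_file, retries):
    """Return DAG lines for one node; a removed job fails the node, so RETRY resubmits it."""
    return f"JOB {name} {submit_file}\nRETRY {name} {retries}\n"


if __name__ == "__main__":
    # run_structure.sh and protein1.sub are hypothetical names.
    print(submit_description("run_structure.sh", max_run_hours=2))
    print(dag_node("protein1", "protein1.sub", retries=3))
```

The division of labour is that the submit file decides when a job has gone wrong, while the DAG decides how many times it is worth trying again.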

  9. Protein Structure
     • The effort used about 30,000 hours of compute time
     • We accomplished the run in about 60 hours of real time
     • The framework I created makes it simple to compute the structures of as many proteins as you like: easy, automatic, and repeatable
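As a rough back-of-the-envelope check on these figures, the short calculation below derives the average concurrency they imply. It uses only the approximate numbers quoted on the slides and takes 500 as a round figure for "over 500" structures, so the per-structure value is an upper estimate.

```python
def average_concurrency(cpu_hours, wall_hours):
    """Average number of CPUs busy at once over the whole run."""
    return cpu_hours / wall_hours


cpu_hours = 30_000    # approximate total compute time from the slides
wall_hours = 60       # approximate real (wall-clock) time from the slides
structures = 500      # "over 500", so the per-structure figure is an upper estimate

print(average_concurrency(cpu_hours, wall_hours))   # 500.0 CPUs busy on average
print(cpu_hours / structures)                       # 60.0 CPU hours per structure at most
```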

  10. Protein Structure
     • Groups often use different parameters and protocols in structure determination and calculate only a few structures
     • Comparing structures from different groups is then difficult

  11. Protein Structure
     • Our work was significant because it computed not just a few but over 500 structures
     • All were computed with the same parameters, making the results very internally consistent (besides being more accurate on their own due to the state-of-the-art techniques)

  12. Web Portal
     • Currently supports only BLAST
     • Being used by a handful of users from the biochemistry department at the UW
     • Interest is growing, so we’ll soon be adding more applications

  13. Questions? Thank You!
