Harnessing the Power of Condor for Human Genetics

Presentation Transcript


  1. Harnessing the Power of Condor for Human Genetics
     Bret A. Payseur, Laboratory of Genetics, University of Wisconsin

  2. Our research: evolutionary genetics
     • Analysis of DNA variation across human populations to understand:
         • Roles of different evolutionary forces
         • Prospects for finding genes that cause disease
     • Analysis of crosses between mouse strains to understand:
         • How anatomy evolves
         • How new species arise

  3. Our computational needs
     • Multi-dimensional statistical inference: we measure many different (partially correlated) features of DNA variation
     • Genome-scale analyses: we measure variation at thousands to millions of sites
     • Replicates: we conduct population simulations to measure stochastic effects

  4. Haplotype phasing
     Each human has two copies of each site on a chromosome (one from each parent).
     [Figure: two chromosome copies; Site 1 carries A or T, Site 2 carries G or C]

  5. Haplotype phasing
     We want to know which variant at Site 1 goes with which variant at Site 2 on the same chromosome.
     [Figure: Site 1 (A/T) and Site 2 (G/C), as on slide 4]

  6. Haplotype phasing
     Genotyping technology cannot distinguish between these two possibilities in individuals that are heterozygous (vary) at both sites.
     [Figure: Configuration 1 = haplotypes A-G and T-C; Configuration 2 = haplotypes T-G and A-C]
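
The ambiguity on slide 6 generalizes: an individual heterozygous at k sites has 2^(k-1) possible haplotype pairs (for k ≥ 1). The brute-force Python sketch below enumerates them; the function name is hypothetical and this is not part of PHASE. With up to 100 variable sites per region (slide 8), this count becomes astronomically large, which is why a sampling method is used instead of enumeration.

```python
from itertools import product


def phase_configurations(genotype):
    """List every unordered pair of haplotypes consistent with an unphased genotype.

    `genotype` is a list of (allele_1, allele_2) tuples, one per site, in
    chromosome order. Allele order within a tuple carries no phase information.
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    # Fix the orientation at the first heterozygous site so each pair is listed
    # once; every later heterozygous site can either be flipped or not.
    free_sites = het_sites[1:]
    configs = []
    for flips in product((False, True), repeat=len(free_sites)):
        flipped = {site for site, flip in zip(free_sites, flips) if flip}
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotype):
            if i in flipped:
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        configs.append(("".join(hap1), "".join(hap2)))
    return configs


# The two-site example from slides 4-6: A/T at Site 1, G/C at Site 2.
print(phase_configurations([("A", "T"), ("G", "C")]))
# [('AG', 'TC'), ('AC', 'TG')]
```
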

  7. Solution: the PHASE algorithm
     • Uses a Markov chain Monte Carlo (MCMC) sampling scheme
     • Uses coalescent-based models grounded in population genetic principles
     • Assigns haplotypes to each individual with a measure of statistical uncertainty (posterior probability)
     • State-of-the-art method in human genetics
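
PHASE estimates haplotype frequencies and individual phases jointly by MCMC. A much simpler calculation shows what a "posterior probability" of a phase configuration means: if population haplotype frequencies were already known and mating were random, each configuration's posterior is proportional to the product of its two haplotype frequencies. The sketch below is not PHASE's algorithm; the function name is hypothetical and the frequencies are invented for illustration.

```python
def phase_posterior(configs, hap_freqs):
    """Posterior probability of each candidate phase configuration.

    Assumes random mating (Hardy-Weinberg proportions) and *known* population
    haplotype frequencies, so P(config) is proportional to freq(h1) * freq(h2).
    The factor of 2 for a heterozygous haplotype pair is shared by every
    configuration of one genotype, so it cancels in the normalization.
    """
    weights = [hap_freqs.get(h1, 0.0) * hap_freqs.get(h2, 0.0) for h1, h2 in configs]
    total = sum(weights)
    if total == 0:
        raise ValueError("no configuration has nonzero probability")
    return [w / total for w in weights]


# Invented haplotype frequencies, for illustration only.
freqs = {"AG": 0.5, "TC": 0.3, "AC": 0.1, "TG": 0.1}
configs = [("AG", "TC"), ("AC", "TG")]
print(phase_posterior(configs, freqs))
```

With these made-up frequencies the A-G / T-C configuration receives a posterior of about 0.94 and the alternative about 0.06; PHASE reports this kind of uncertainty without assuming the frequencies are known in advance.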

  8. Scope of problem
     • Goal: reconstruct phase in a genome-scale human dataset
     • The dataset is large:
         • 720 regions of the genome
         • 100 variable sites per region
         • 3 populations
         • 60 individuals per population
     • The computational approach is intensive

  9. Scope of problem
     Average run time: 8 hours per job (one region in one population)
     720 regions × 3 populations = 2,160 jobs
     2,160 jobs × 8 hours = 17,280 hours

  10. Scope of problem
     Running full time on 5 Payseur lab computers: 17,280 hours ÷ 5 = 3,456 hours = 144 days!
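
The arithmetic on slides 8-10 can be checked in a few lines of Python. The idealized model (every machine busy with exactly one job at a time, with no queueing or eviction overhead) is an assumption of this sketch, and the function name is hypothetical; passing a different `machines` value shows how the wall-clock estimate scales with pool size.

```python
# Figures taken from slides 8-10.
REGIONS = 720
SITES_PER_REGION = 100
POPULATIONS = 3
INDIVIDUALS_PER_POP = 60
HOURS_PER_JOB = 8      # average PHASE run time for one region in one population
LAB_MACHINES = 5


def scope(machines=LAB_MACHINES):
    """Return (jobs, total CPU hours, wall-clock days) when running on `machines` computers."""
    jobs = REGIONS * POPULATIONS          # one PHASE run per region per population
    cpu_hours = jobs * HOURS_PER_JOB
    # Idealized: each machine runs one job at a time with no idle gaps.
    wall_days = cpu_hours / machines / 24
    return jobs, cpu_hours, wall_days


# Total single-site genotypes in the dataset.
genotypes = REGIONS * SITES_PER_REGION * POPULATIONS * INDIVIDUALS_PER_POP
print(genotypes)   # 12960000
print(scope())     # (2160, 17280, 144.0)
```
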

  11. ENTER CONDOR

  12. Approach
     • Create a submit file for each job (automated using a Perl script)
     • Submit each job (automated using a Perl script)
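
The talk automated both steps with Perl scripts, which are not shown. As an illustration only, the Python sketch below does the same job: it writes one submit file per region-population job, using the settings of the slide 13 submit file as a template, and then calls the `condor_submit` command from each job's directory. The directory naming scheme (`regionNNN_pop`), the file name `phase.sub`, the function names, and the assumption that each directory receives its own `phase.in` are all hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# Mirrors the slide 13 submit file; only the executable path is filled in.
# With universe = standard, the executable must have been built with condor_compile.
SUBMIT_TEMPLATE = """\
universe = standard
executable = {executable}
error = phase.err
log = phase.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = phase.in
transfer_output_files = phase.out
Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
Arguments = -MR -P1 phase.in phase.out
queue
"""


def write_submit_files(base_dir, regions, populations, executable):
    """Create one directory and one submit file per (region, population) job.

    Assumes (hypothetically) that each job directory also gets its own
    phase.in input file, prepared elsewhere.
    """
    paths = []
    for region in regions:
        for pop in populations:
            job_dir = Path(base_dir) / f"region{region:03d}_{pop}"
            job_dir.mkdir(parents=True, exist_ok=True)
            submit_file = job_dir / "phase.sub"
            submit_file.write_text(SUBMIT_TEMPLATE.format(executable=executable))
            paths.append(submit_file)
    return paths


def submit_all(submit_files, dry_run=True):
    """Run condor_submit from each job directory; with dry_run, only print the commands."""
    for submit_file in submit_files:
        if dry_run:
            print(f"cd {submit_file.parent} && condor_submit {submit_file.name}")
        else:
            subprocess.run(["condor_submit", submit_file.name],
                           cwd=submit_file.parent, check=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical labels; the study itself used 720 regions and 3 populations.
        files = write_submit_files(tmp, range(2), ["pop1", "pop2", "pop3"], "/path/to/PHASE")
        submit_all(files)   # dry run: prints 6 condor_submit commands
```

Giving each job its own directory lets every submit file reuse the fixed names phase.in and phase.out without jobs overwriting each other's files.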

  13. CONDOR submit file
        universe = standard
        executable = PHASE
        error = phase.err
        log = phase.log
        should_transfer_files = YES
        when_to_transfer_output = ON_EXIT
        transfer_input_files = phase.in
        transfer_output_files = phase.out
        Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
        Arguments = -MR -P1 phase.in phase.out
        queue

  14. Running in the vanilla universe
     • Huge increase in efficiency
     • Challenge:
         • Run times often exceeded the allocated CPU time
         • Many jobs did not finish

  15. CONDOR solution
     • Relink PHASE with condor_compile and run it in the standard universe, which allows checkpointing
     • Expand the machine pool to include X86_64/LINUX and INTEL/LINUX nodes
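
In the standard universe, checkpointing is transparent: condor_compile relinks the program against Condor's libraries so Condor can save and restore the running process. Programs that cannot be relinked have to checkpoint themselves instead. The Python sketch below illustrates that different technique, application-level checkpointing, and is not what the talk used; the function name, the pickle-based checkpoint format, and the random-walk stand-in for PHASE's MCMC updates are all hypothetical.

```python
import os
import pickle
import random
import tempfile


def run_chain(n_iter, ckpt_path, ckpt_every=1000, seed=1):
    """Toy long-running loop that checkpoints its full state to `ckpt_path`.

    If the process is killed and restarted, it resumes from the last
    checkpoint instead of starting over. A Gaussian random walk stands in
    for MCMC updates (hypothetical; this is not PHASE).
    """
    if os.path.exists(ckpt_path):
        # Resume: restore the iteration counter, current value, and RNG state.
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"i": 0, "x": 0.0, "rng": random.Random(seed).getstate()}
    rng = random.Random()
    rng.setstate(state["rng"])
    i, x = state["i"], state["x"]
    while i < n_iter:
        x += rng.gauss(0.0, 1.0)
        i += 1
        if i % ckpt_every == 0 or i == n_iter:
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump({"i": i, "x": x, "rng": rng.getstate()}, f)
            # On POSIX, os.replace is atomic, so a crash mid-write never
            # leaves a half-written file in place of the last good checkpoint.
            os.replace(tmp_path, ckpt_path)
    return x


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        uninterrupted = run_chain(5000, os.path.join(tmp, "a.ckpt"))
        resumed_path = os.path.join(tmp, "b.ckpt")
        run_chain(2000, resumed_path)            # simulate a job evicted after 2000 iterations
        resumed = run_chain(5000, resumed_path)  # the restart picks up at iteration 2000
        print(uninterrupted == resumed)          # True
```

Because the random-number generator state is saved along with the chain, a resumed run reproduces exactly the result of an uninterrupted one.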

  16. Result

  17. We have also used CONDOR to…
     • Simulate genetic mapping of complex diseases in mice (Payseur and Place 2007; Genetics)
     • Infer relationships among mouse strains used in biomedical research

  18. We hope to use CONDOR for… EVERYTHING

  19. Acknowledgments: Miron Livny, Zach Miller, David Schwartz
