Harnessing the Power of Condor for Human Genetics. Bret A. Payseur, Laboratory of Genetics, University of Wisconsin
Our research: evolutionary genetics • Analysis of DNA variation across human populations to understand: • Roles of different evolutionary forces • Prospects for finding genes that cause disease • Analysis of crosses between mouse strains to understand: • How anatomy evolves • How new species arise
Our computational needs • Multi-dimensional statistical inference: we measure many different (partially correlated) features of DNA variation • Genome-scale analyses: we measure variation at thousands to millions of sites • Replicates: we conduct population simulations to measure stochastic effects
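Haplotype phasing: Each human has two copies of each site on a chromosome (one from each parent). [Figure: two sites on a chromosome; Site 1 carries variants A and T, Site 2 carries variants G and C]
Haplotype phasing: We want to know which variant at one site goes with which variant at the other site on the same chromosome. [Figure: variants A and T at Site 1, G and C at Site 2]
Haplotype phasing: Genotyping technology cannot distinguish between these two possibilities in individuals that vary at both sites. [Figure: Configuration 1 and Configuration 2, the two alternative pairings of A/T at Site 1 with G/C at Site 2]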
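The ambiguity on the slides above can be made concrete with a small sketch. It assumes, as the figure suggests, that Site 1 carries A/T and Site 2 carries G/C; the function name `phase_configurations` is hypothetical and is not part of PHASE. It only enumerates the haplotype pairs compatible with an unphased genotype; it does not choose among them, which is the statistical problem PHASE addresses.

```python
from itertools import product

def phase_configurations(genotype):
    """Enumerate the distinct haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele, allele) tuples, one tuple per site.
    Returns a set of (haplotype, haplotype) tuples, each sorted so that
    swapping the two chromosomes does not count as a new configuration.
    """
    configs = set()
    # At every site, decide which of the two alleles sits on the first chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(site[c] for site, c in zip(genotype, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        configs.add(tuple(sorted((hap1, hap2))))
    return configs

# Individual that varies at both sites (assumed alleles: A/T and G/C).
double_het = [("A", "T"), ("G", "C")]
for pair in sorted(phase_configurations(double_het)):
    print(pair)
```

An individual that varies at only one of the two sites yields a single configuration, which is why the problem arises only for individuals that vary at both sites.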
Solution: PHASE algorithm • Uses a Markov chain Monte Carlo (MCMC) sampling scheme • Uses coalescent simulations based on population genetic principles • Infers haplotypes for each individual, with a measure of statistical uncertainty (posterior probability) • State-of-the-art method in human genetics
Scope of problem • Goal: reconstruct phase in a human dataset of genomic proportions • Dataset is large • 720 regions of the genome • 100 variable sites per region • 3 populations • 60 individuals per population • Computational approach is intensive
Scope of problem • Average run time: 8 hours per job (one region in one population) • 720 regions x 3 populations x 8 hours = 17,280 hours
Scope of problem Running full time on 5 Payseur lab computers: 144 days!
Approach • Create a submit file for each job (automated with a Perl script) • Submit each job (automated with a Perl script)
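The first step might look roughly like the sketch below. The original automation was written in Perl and is not shown in the deck, so this is a Python stand-in under stated assumptions: the per-job naming scheme (`phase_r001_p1` and so on), the `.sub` extension, and the function names are all hypothetical, while the submit-file fields mirror the submit file shown in this deck.

```python
from pathlib import Path

# Fields mirror the submit file in this deck; only the file names vary per job.
SUBMIT_TEMPLATE = """\
universe = standard
executable = PHASE
error = {job}.err
log = {job}.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = {job}.in
transfer_output_files = {job}.out
Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
Arguments = -MR -P1 {job}.in {job}.out
queue
"""

def job_name(region, population):
    # Hypothetical naming scheme: one job per region and population.
    return f"phase_r{region:03d}_p{population}"

def write_submit_files(out_dir, n_regions=720, n_populations=3):
    """Write one submit file per (region, population) job and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for region in range(1, n_regions + 1):
        for population in range(1, n_populations + 1):
            job = job_name(region, population)
            path = out_dir / f"{job}.sub"
            path.write_text(SUBMIT_TEMPLATE.format(job=job))
            paths.append(path)
    return paths
```

For the second step, each generated file could then be handed to `condor_submit`, for example with `subprocess.run(["condor_submit", str(path)], check=True)` in a loop over the returned paths.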
CONDOR submit file
universe = standard
executable = PHASE
error = phase.err
log = phase.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = phase.in
transfer_output_files = phase.out
Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
Arguments = -MR -P1 phase.in phase.out
queue
Running on the vanilla universe • Huge increase in efficiency • Challenge: run times often exceeded the allocated CPU time, and many jobs did not finish
CONDOR solution • Relink PHASE with condor_compile and run it in the standard universe to allow checkpointing • Expand the machine pool to include both X86_64/LINUX and INTEL/LINUX nodes
We have also used CONDOR to… • Simulate genetic mapping of complex diseases in mice (Payseur and Place 2007; Genetics) • Infer relationships among mouse strains used in biomedical research
We hope to use CONDOR for… EVERYTHING
Acknowledgments: Miron Livny, Zach Miller, David Schwartz