Prof Dan Geiger, CBL,Technion Prof Assaf Schuster, DSL,Technion

Superlink-Online: Disease Gene Hunting Using Parallel Computing Over Hierarchy of Grids
Prof Dan Geiger, CBL, Technion
Prof Assaf Schuster, DSL, Technion
Prof Miron Livny, Condor, Madison
Genetics research institutes in Israel, EU, US
MSR HPC - Superlink Online

Purpose of disease gene hunting
• Why search?
  • Detection of diseases before birth
  • Risk assessment and corresponding lifestyle changes
  • Finding the mutant proteins and developing medicine
  • Understanding basic biological functions
• How to search?
  • Find families segregating the disease (linkage analysis), or collect unrelated healthy and affected persons (association analysis, or LD mapping)
  • Take a simple blood test from some individuals
  • Analyze the DNA in the lab
  • Compute the most likely location of the disease gene

Usage of our system in Israeli hospitals
• Rabin Hospital, by Motti Shochat’s group
  • New locus for mental retardation (2003)
  • Infantile bilateral striatal necrosis (2004)
• Soroka Hospital, by Ohad Birk’s group
  • Lethal congenital contractural syndrome (2004)
  • Congenital cataract (2005)
• Rambam Hospital, by Eli Shprecher’s group
  • Congenital recessive ichthyosis (2005)
  • CEDNIK syndrome (2005)
• Galil Ma’aravi Hospital, by Tzipi Falik’s group
  • Familial onychodysplasia and dysplasia of distal phalanges
  • Familial juvenile hypertrophy (2005)

Steps in gene hunting
• Linkage analysis (10^6 to 10^7 bp)
• Identify genes (10^4 to 10^5 bp)
• Resequencing (100 bp)

Recombination During Meiosis
[Figure: recombinant gametes produced by a male or female parent]

Family Pedigree

Familial Onychodysplasia and dysplasia of distal phalanges (ODP)
[Figure: family pedigree; labeled individuals include III-15, IV-7, and IV-10]

Chromosome Pair: Marker Information Added
[Figure: chromosome pair with markers M1 and M2]

Maximum Likelihood Evaluation
[Figure: loci M1, M2, D1, D2, M3, M4 with recombination fraction θ; genotypes shown for individuals III-15 and III-16]
• θ is the recombination fraction between the disease locus and the markers; θ = ½ means no linkage
• The computational problem: find a value of θ that maximizes Pr(data | θ)
• LOD score (to quantify how confident we are): Z(θ) = log10[Pr(data | θ) / Pr(data | θ = ½)]
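As a simplified illustration of the maximization and the LOD score, the sketch below assumes a two-point, phase-known setting in which each of n informative meioses is classified as recombinant or not, so Pr(data | θ) = θ^k (1 - θ)^(n - k) for k recombinants and θ is the recombination fraction. This is not Superlink's multipoint computation, which sums over all unobserved variables of the pedigree model; the function names are hypothetical.

```python
import math

def log10_likelihood(theta: float, n: int, k: int) -> float:
    """log10 Pr(data | theta) for n phase-known meioses with k recombinants (assumed model)."""
    if (theta == 0.0 and k > 0) or (theta == 1.0 and k < n):
        return float("-inf")  # the data are impossible under this theta
    ll = 0.0
    if k > 0:
        ll += k * math.log10(theta)
    if n - k > 0:
        ll += (n - k) * math.log10(1.0 - theta)
    return ll

def lod(theta: float, n: int, k: int) -> float:
    """Z(theta) = log10[Pr(data | theta) / Pr(data | theta = 1/2)]."""
    return log10_likelihood(theta, n, k) - log10_likelihood(0.5, n, k)

def max_lod(n: int, k: int):
    """Maximum-likelihood theta restricted to [0, 1/2] (closed form k/n, capped), and its LOD.

    Assumes n > 0.
    """
    theta_hat = min(k / n, 0.5)
    return theta_hat, lod(theta_hat, n, k)
```

For example, `max_lod(10, 0)` gives θ = 0 with a LOD of 10·log10(2), about 3.01, which crosses the commonly used linkage threshold of 3; with many recombinants the estimate is capped at θ = ½ and the LOD is 0.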

Results of Multipoint Analysis

The Bayesian network model
[Figure: Bayesian network over loci 1 to 4, with locus 2 as the disease locus; node labels of the form Lijf/Lijm, Xij, Yj, and Sijf/Sijm]
This model depicts the qualitative relations between the variables. We also need to specify the joint distribution over these variables.

The Computational Task
• Computing Pr(data | θ) for a specific value of θ takes time and space exponential in:
  • #variables: five per person, #markers, #gene loci
  • #values per variable: #alleles, non-typed persons
  • table dimensionality: cycles in pedigree
• Finding the best (elimination) order is equivalent to finding the best order of sum-product operations over high-dimensional matrices
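To make the ordering point concrete, the minimal sketch below tracks each factor only by its scope (its set of variables), sums variables out in a given order, and counts the table entries created; the largest table is exponential in its dimensionality, so the order matters. The greedy "smallest product table first" rule is one standard heuristic assumed here for illustration, not necessarily the ordering procedure used by Superlink, and all names are hypothetical.

```python
from math import prod

def table_size(domains, factors, var):
    """Entries in the product of all factors that mention `var`."""
    merged = set().union(*[f for f in factors if var in f])
    return prod(domains[u] for u in merged)

def eliminate(factors, var):
    """Multiply the factors that mention `var`, then sum `var` out (scopes only)."""
    touching = [f for f in factors if var in f]
    rest = [f for f in factors if var not in f]
    return rest + [set().union(*touching) - {var}]

def elimination_cost(domains, scopes, order):
    """Total number of table entries created when summing out variables in `order`."""
    factors = [set(s) for s in scopes]
    total = 0
    for var in order:
        total += table_size(domains, factors, var)
        factors = eliminate(factors, var)
    return total

def greedy_min_weight_order(domains, scopes):
    """Heuristic order: always eliminate the variable with the smallest product table."""
    factors = [set(s) for s in scopes]
    remaining = set(domains)
    order = []
    while remaining:
        # sorted() makes tie-breaking deterministic
        var = min(sorted(remaining), key=lambda v: table_size(domains, factors, v))
        factors = eliminate(factors, var)
        remaining.remove(var)
        order.append(var)
    return order
```

On a star-shaped toy model (binary hub `X` linked to binary leaves `L1` to `L4`), eliminating the hub first costs 62 entries, while eliminating the leaves first costs 18, which is also the order the greedy rule picks.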

Automatic Parallelization: Variable Conditioning
• Parallelization overhead is non-trivial
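Variable conditioning can be sketched as follows: fixing the value of one variable splits the sum Pr(data) = Σ_v Pr(data, var = v) into independent sub-sums, one per value, which can run as separate jobs and be added at the end. The toy three-variable network, its tables, and the use of a thread pool as a stand-in for grid jobs are assumptions for illustration; brute-force enumeration replaces a real variable-elimination engine.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy Bayesian network over binary variables: Pr(A) Pr(B|A) Pr(C|B).
VARS = ["A", "B", "C"]
FACTORS = [
    (("A",), {(0,): 0.6, (1,): 0.4}),
    (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}),
    (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}),
]

def sum_product(fixed):
    """Sum, over all variables not in `fixed`, of the product of all factors."""
    free = [v for v in VARS if v not in fixed]
    total = 0.0
    for values in itertools.product([0, 1], repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, values))}
        p = 1.0
        for scope, table in FACTORS:
            p *= table[tuple(assignment[v] for v in scope)]
        total += p
    return total

def conditioned_sum(var, evidence):
    """Pr(evidence) as the sum over v of Pr(evidence, var = v), one job per value."""
    jobs = [{**evidence, var: value} for value in (0, 1)]
    with ThreadPoolExecutor() as pool:  # stand-in for independent grid jobs
        parts = list(pool.map(sum_product, jobs))
    return sum(parts)
```

With the observation C = 1 as the data, conditioning on A yields two jobs returning about 0.132 and 0.168, which add up to the unconditioned result of 0.3. Each job redoes any work that does not depend on the conditioned variable, which hints at one source of the parallelization overhead.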

Using grids/resource pools
• Basic unit of execution: the batch job
  • Non-interactive mode: "enqueue, wait, execute, return"
  • Self-contained execution sandbox
• Weak (or no) quality of service
  • Random failures of execution machines
  • Hardware bugs may lead to incorrect results
  • Potentially unbounded execution/queue waiting time
  • Dynamic/abrupt changes of resource availability
  • High network delays (communication over WAN)

Requirements
• The system must be geneticist-friendly
  • Online access through a simple web interface
• Interactive experience
  • Low response time for short tasks
  • Fast computation of previously infeasible long tasks via parallel execution
  • Prompt user feedback
• Secure, reliable, stable, overload-resistant
• Allows submission of tasks by multiple users

Execution platforms
• Dedicated server: 2-4 CPUs
• Clusters of workstations: 1000s of CPUs
• Computational grids (EGEE-II, OSG): 100000s of CPUs
• Community grids (SETI@HOME, FOLDING@HOME): 1000000s of CPUs
• Tradeoff: larger grids provide more resources, but execution overhead grows with the grid size:
  • more complex resource management
  • higher resource volatility
  • higher network overheads
  • higher security overheads
  • rigid user policies
• Complicated scheduling software stack (EGEE-II example): Submit ► Local UI ► Resource Broker ► Local cluster ► Execute
• Scheduling delay on EGEE-II: anywhere from minutes to days

Task length distribution
• Task length is unknown upon submission
• From seconds to millennia
• Estimating task length? NP-hard
[Figure: task counts in length bins <3 minutes, <2 hours, <2 days, <2 weeks, <3 months, >3 months]

Try 1: Single FIFO queue
• Problem: delays of short tasks
[Figure: short tasks, drawn as "2x2+1=?", in a single queue]

Try 2: Priority queues
• Problem: task runtime is not known
[Figure: short tasks, drawn as "2x2+1=?", in priority queues]

Try 3: Multi-level feedback queue
• Problem: short tasks may be scheduled on high-latency resources
[Figure: short tasks, drawn as "2x2+1=?", in a multi-level feedback queue]
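A multi-level feedback queue can be sketched as a list of queues with growing time quanta: every task enters the top level, and a task that uses up its quantum without finishing is demoted one level, so short tasks finish early without anyone knowing runtimes in advance. The quanta, the single simulated worker, and the assumption that all tasks arrive at time 0 are illustrative choices, not the system's actual parameters.

```python
from collections import deque

def mlfq(tasks, quanta):
    """Simulate a multi-level feedback queue on one worker.

    tasks: dict name -> true runtime (hidden from the scheduler; used only to simulate).
    quanta: time slice per level, e.g. [1, 4, 16]; the last level runs tasks to completion.
    Returns dict name -> completion time.
    """
    queues = [deque() for _ in quanta]
    remaining = dict(tasks)
    for name in tasks:
        queues[0].append(name)  # every task starts at the top level
    clock, done = 0, {}
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)  # highest non-empty level
        name = queues[level].popleft()
        last = level == len(quanta) - 1
        run = remaining[name] if last else min(quanta[level], remaining[name])
        clock += run
        remaining[name] -= run
        if remaining[name] == 0:
            done[name] = clock
        else:
            queues[level + 1].append(name)  # used its whole quantum: demote
    return done
```

With tasks of length 20, 1, and 2 submitted in that order and quanta `[1, 4, 16]`, the short tasks finish at times 2 and 8 and the long one at 23; a single FIFO queue would finish them at 21 and 23 behind the long task.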

The Grid Execution Hierarchy [HPDC’06]
• Submission server
• Waterfall principle: if a task seems too hard, it can afford more overhead, so move it to a lower level
• Need:
  • Task-length estimation
  • Divisible tasks (variable conditioning)
  • Migration to larger pools
• Levels: Dedicated server ► Clusters of workstations ► Computational grids (EGEE-II) ► Community grids (Superlink@HOME)

Task Length Estimation
• Task complexity is easy to derive from the variable elimination order
  • The same goes for space requirements
  • Except for parallelization overhead
• Finding the best elimination order is hard
  • The longer you work on it, the better it gets
  • It can be trivially parallelized
• Heuristics, upon entering a new pool:
  • Refine bounds by improving the order, until reordering time reaches a certain fraction of the expected runtime, or of the pool time limit
  • If (expected runtime + current load > pool time limit), move to a larger pool
  • Divide tasks. Work.
  • If the pool time limit expired (runtime is longer than expected), move to a larger pool
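The per-pool heuristics can be sketched as a small control loop. The `Pool` record, the `estimate` callback (expected runtime after a given number of order-refinement rounds), the cost of one round, the 10% reordering fraction, and all names are hypothetical placeholders; the sketch mirrors only the placement rules listed above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Pool:
    name: str
    time_limit: float    # pool time limit (assumed units: seconds)
    current_load: float  # work already queued in this pool (same units)

def choose_pool(pools: List[Pool],
                estimate: Callable[[int], float],
                reorder_step_cost: float,
                reorder_fraction: float = 0.1) -> Optional[Pool]:
    """Walk down the hierarchy and return the first pool expected to finish the task.

    estimate(i) is the expected runtime after i rounds of elimination-order refinement.
    Returns None when no pool qualifies (the task is rejected).
    """
    if reorder_step_cost <= 0:
        raise ValueError("reorder_step_cost must be positive")
    rounds = 0
    for pool in pools:
        expected = estimate(rounds)
        spent = 0.0
        # Refine the order until reordering time reaches the given fraction of the
        # expected runtime or of the pool time limit, whichever comes first.
        while spent < reorder_fraction * min(expected, pool.time_limit):
            rounds += 1
            spent += reorder_step_cost
            expected = min(expected, estimate(rounds))
        if expected + pool.current_load <= pool.time_limit:
            return pool  # divide the task and run it here
        # otherwise: move to a larger pool, keeping the refined order
    return None
```

The last rule, migrating when a running task exceeds the pool time limit, happens at run time and is not modeled here. With made-up limits of 100, 10,000, and 1,000,000 seconds and an estimate that starts at 50,000 seconds and shrinks as the order improves, the task skips the first pool and lands in the second.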

Single Pool
[Figure: single-pool architecture. Web server with input validation, submission logic, and reporting logic; DB and file systems (FS); task queue handler, accept logic, notification handler, and system health monitor; Queue 1 with per-task execution logic and a migration handler feeding the batch system queue; results returned over the network via SSH]

Four Pools
[Figure: four-pool architecture. A shared web server, DB, submission logic, input validation, and reporting logic feed Pool 1 to Pool 4; each pool has its own file system, queue, notification handler, system health monitor, and batch system queue; tasks migrate between pools, and each pool returns output]

Task execution logic (simplified)
[Figure: flow chart with a first complexity estimation (reject, or run a simple task to completion), a second complexity estimation, and parallelization into jobs J1 ... Jn / J1 ... Jk, with "reject" and "retry if failed" paths leading to "done"]
• Condor DAGMan manages the execution flow
  • Automatic restart after failure
• Migration at any step of the DAG:
  • Checkpoint and stop DAGMan
  • Archive task data together with the DAGMan state
  • Move to another pool
  • Restart there
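The branching of the simplified execution logic can be sketched in plain Python. This is not DAGMan, which is driven by a DAG description and Condor job submissions; the step callbacks, the complexity thresholds, the retry count, and the use of `RuntimeError` as the failure signal are hypothetical. The sketch shows only reject / simple task / parallelize, with retry on failure and partial results summed as in variable conditioning.

```python
from typing import Callable, List, Optional

class Rejected(Exception):
    """The task is too complex for the system."""

def with_retries(step: Callable[[], float], retries: int = 3) -> float:
    """Automatic restart after failure: re-run `step` up to `retries` times."""
    last_error: Optional[Exception] = None
    for _ in range(retries):
        try:
            return step()
        except RuntimeError as err:  # assumed failure signal of a job
            last_error = err
    raise RuntimeError(f"step failed {retries} times") from last_error

def execute_task(first_estimate: Callable[[], float],
                 second_estimate: Callable[[], float],
                 split: Callable[[float], List[int]],
                 run_job: Callable[[Optional[int]], float],
                 simple_limit: float,
                 reject_limit: float,
                 retries: int = 3) -> float:
    c1 = first_estimate()
    if c1 > reject_limit:
        raise Rejected("first complexity estimation")
    if c1 <= simple_limit:
        # Simple task: run it whole, as a single job.
        return with_retries(lambda: run_job(None), retries)
    # Second, more thorough estimation (retried if it fails).
    c2 = with_retries(second_estimate, retries)
    if c2 > reject_limit:
        raise Rejected("second complexity estimation")
    # Parallelization: independent jobs, each retried on failure; partial results are summed.
    return sum(with_retries(lambda j=j: run_job(j), retries) for j in split(c2))
```

A job that fails once is transparently re-run, and the final result is the same as in a failure-free run.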

Current deployment
[Figure: the web server feeds queues Q1 to Q5, served by a dedicated server, a flock of two Technion Condor pools (~300 CPUs), a flock of three UW Condor pools (~3500 CPUs), EGEE-II (~15,000 CPUs), and SUPERLINK@HOME; reject paths lead out of the hierarchy]

Superlink-online portal

Task Submission

Superlink-online Experience
• The user submits her data for analysis
  • No specification of running time or parallelization
  • Secure web interface
  • Monitoring of partial results and running-time expectations
  • Cancellation
  • E-mail notifications on important events: completion, error, system failure
• Behind the scenes
  • Task runtime estimation and parallelization
  • System monitoring, failure recovery
  • Scheduling

Statistics 2006
• ~110 CPU-years for ~5600 tasks
• Several mutated genes found
• Israeli and international users
  • Soroka H., Be'er Sheva; Galil Ma'aravi H., Nahariya; Rabin H., Petah Tikva; Rambam H., Haifa; Beney Tzion H., Haifa; Sha'arey Tzedek H., Jerusalem; Hadassa H., Jerusalem; Afula H.
  • NIH; universities and research centers in the US, France, Germany, UK, Italy, Austria, Spain, Taiwan, Australia, and others
• Task complexity: 250 days on a single CPU reduced to 7 hours on ~300-700 CPUs
• Short tasks: a few seconds, even during severe overload

Statistics 2007 (since January 1st)
• > 64 different user institutes
• > 3000 tasks
• > 60 CPU-years
• Many rejects!

Performance on real datasets

Accumulated running time in pools
[Figure: accumulated running time per pool at levels (L3), (L2), (L2), (L1), with thresholds Tq=180, Tq=9600, Tq=172800]

Tasks handled by each level vs. CPU consumption by each pool

“Pool spilling”

Pool Occupancy (~400 Tasks)

Run times of Superlink (bioinfo.cs.technion.ac.il/superlink-online)

Issues with the current system
• Upper pools must be left empty to allow fast response for short tasks
  • Long tasks cannot use the empty upper pools
• Granularity tradeoff for automatic parallelization
  • Short jobs do not lose much in case of kill/failure (up to 30% in EGEE)
  • Scheduling overhead motivates long jobs

Creating Dedicated Grids (in testing)
• Send thin clients as "placeholders" to remote resources
  • Similar to Condor glide-ins
• Build a self-owned, dedicated "sub-grid"
• Perform application-specific scheduling on "captured" resources
  • Scheduling overhead from hours down to seconds
  • Dynamic/adaptive job granularity
  • 500 CPU-years in 2 months on EGEE
• Use BOINC (Berkeley Open Infrastructure for Network Computing)
  • Single-user, flat, master-worker infrastructure
  • Scalable
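The placeholder idea can be sketched as a pull-based master-worker loop: rather than submitting each job through the grid scheduling stack, the master keeps the remaining work and each captured resource (here a thread standing in for a remote thin client) repeatedly asks for the next chunk. For adaptive granularity the sketch swaps in guided self-scheduling (chunk = ceil(remaining / workers)), so early chunks are large and later ones shrink; this rule, the class, and the function names are assumptions for illustration, not BOINC's API or the system's actual policy.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor

class Master:
    """Hands out ranges of work items; chunk size shrinks as work runs out."""

    def __init__(self, total_items: int, workers: int, min_chunk: int = 1):
        self.next_item = 0
        self.total = total_items
        self.workers = workers
        self.min_chunk = min_chunk
        self.lock = threading.Lock()
        self.chunks = []  # sizes of handed-out chunks, in hand-out order

    def request_work(self):
        """Called by a placeholder client; returns (start, stop) or None when done."""
        with self.lock:
            remaining = self.total - self.next_item
            if remaining <= 0:
                return None
            # Guided self-scheduling: a fair share of what is left.
            size = min(remaining, max(self.min_chunk, math.ceil(remaining / self.workers)))
            start = self.next_item
            self.next_item += size
            self.chunks.append(size)
            return start, start + size

def placeholder_client(master: Master, work_fn) -> float:
    """Thin client: pull chunks from the master until none are left."""
    partial = 0.0
    while (rng := master.request_work()) is not None:
        partial += sum(work_fn(i) for i in range(*rng))
    return partial

def run(total_items: int, workers: int, work_fn):
    master = Master(total_items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(placeholder_client, master, work_fn) for _ in range(workers)]
        return sum(f.result() for f in futures), master.chunks
```

For 100 items and 4 clients the first chunk has 25 items and later chunks shrink toward 1, so a client that is killed late loses little work while early requests keep per-request overhead low.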

Questions?
Visit us at: http://bioinfo.cs.technion.ac.il/superlink-online
