Atomistic Protein Folding Simulations on the Submillisecond Timescale Using Worldwide Distributed Computing Qing Lu CMSC 838 Presentation
Overview • Overview of talk • Motivation • Challenge • Methods • Ensemble Dynamics • Folding@Home • Evaluation • Observations CMSC 838T – Presentation
Motivation • Atomistic simulation of protein folding • understand the dynamics of folding • simulate folding on its real timescale in full atomic detail • large-scale parallelization methods • Benefits • protein folding and disease • proteins self-assemble to function • misfolded proteins are associated with disease • nanotechnology • nanomachines • self-assembly on the nanoscale
Challenge • Difficulties • limited by current computational techniques • the fastest proteins fold in microseconds • one CPU: about 1 ns/day, i.e. about 30 years for a 10,000 ns fold • a 10,000-fold computational gap • 1,000 CPUs with ideal scaling: 1 microsecond/day • traditional parallelization schemes • hard to scale to a large number of processors • require extremely fast communication • complexity of coordination • expensive supercomputers • cost • time-sharing
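The gap quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes a target folding time of 10 microseconds (10,000 ns), which is the value implied by the 10,000-fold gap and the 30-year figure rather than one stated on the slide, and it assumes ideal linear scaling across CPUs. The function and variable names are illustrative.

```python
NS_PER_MICROSECOND = 1_000
DAYS_PER_YEAR = 365.25

def years_to_simulate(target_ns, ns_per_day_per_cpu, num_cpus=1):
    """Wall-clock years needed, assuming ideal linear scaling over CPUs."""
    days = target_ns / (ns_per_day_per_cpu * num_cpus)
    return days / DAYS_PER_YEAR

# Assumed target: a 10 microsecond fold (not stated explicitly on the slide).
target_ns = 10 * NS_PER_MICROSECOND

# One CPU at 1 ns/day, as quoted on the slide.
single_cpu_years = years_to_simulate(target_ns, ns_per_day_per_cpu=1.0)

# The gap is the ratio of simulated time needed to simulated time per CPU-day.
gap = target_ns / 1.0

# 1,000 CPUs at 1 ns/day each, if scaling were perfect.
microseconds_per_day = 1_000 * 1.0 / NS_PER_MICROSECOND

print(f"single CPU: {single_cpu_years:.1f} years")
print(f"computational gap: {gap:,.0f}-fold")
print(f"1,000 CPUs: {microseconds_per_day:.1f} microsecond/day")
```

Under these assumptions a single CPU needs a little over 27 years, which the slide rounds to 30.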
Methods • Ensemble dynamics • a new simulation algorithm • parallel simulation • Folding@Home • heterogeneous network over the Internet • large-scale distributed platform
Simulation of Dynamics • free energy barrier • progress from one state to another is a transition • thermal fluctuations push the system over the free energy barrier • previous approaches: sampling • may get stuck in metastable free energy minima • sampling is computationally expensive
Ensemble Dynamics • application scenario • waiting time for transitions dominates the total time • protein folding • transition: free energy barrier crossing • coupled simulations: transition coupling • Algorithm • start M independent simulations from the same initial condition • wait for the first simulation to cross the free energy barrier • on average this takes M times less time than a single simulation needs • restart all M simulations from the new location after the transition • Near-linear speedup in the number of processors • exponential kinetics: f(t) = 1 - exp(-k t) • if k t is small, f(t) ≈ k t • M simulations yield about M f(t) ≈ M k t folding events
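The speedup claim on this slide can be illustrated numerically. The sketch below is not a molecular dynamics simulation: it assumes each barrier crossing is a single exponential waiting time with rate k, as the slide does, and draws those waiting times directly. The function names, the rate, and the trial counts are illustrative choices.

```python
import math
import random

def first_crossing_time(rate_k, num_sims, rng):
    """Time until the first of num_sims independent runs crosses the barrier."""
    return min(rng.expovariate(rate_k) for _ in range(num_sims))

def mean_first_crossing(rate_k, num_sims, trials=10_000, seed=0):
    """Monte Carlo estimate of the mean first-crossing time."""
    rng = random.Random(seed)
    total = sum(first_crossing_time(rate_k, num_sims, rng) for _ in range(trials))
    return total / trials

def expected_events(rate_k, t, num_sims):
    """Expected crossings among num_sims runs of length t.

    Returns the exact value M * (1 - exp(-k t)) and the small-kt form M * k * t.
    """
    exact = num_sims * (1.0 - math.exp(-rate_k * t))
    linear = num_sims * rate_k * t
    return exact, linear

if __name__ == "__main__":
    k = 1.0
    for m in (1, 10, 100):
        estimate = mean_first_crossing(k, m)
        print(f"M={m:4d}  mean first crossing ~ {estimate:.4f}  (1/(M k) = {1 / (m * k):.4f})")
    print(expected_events(rate_k=1e-4, t=1.0, num_sims=10_000))
```

The minimum of M independent exponential waiting times is itself exponential with rate M k, which is why the estimated mean tracks 1/(M k) and the speedup is close to linear in M as long as the exponential assumption holds.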
Limitations • barrier crossing probability • relies on the assumption of exponential kinetics • correct transition detection • transition: free energy barrier crossing • detected as a large variance in energy, above a threshold • correct detection is not guaranteed • multiple possible transitions • not addressed • selection of the first transition
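One way to picture threshold-based detection is a sliding-window variance test on the energy trace. This is a minimal sketch under assumptions, not the detector used in the presented work: the window length, the threshold, and the function name are all hypothetical, and the trace is synthetic.

```python
from statistics import pvariance

def detect_transition(energies, window, threshold):
    """Return the index of the first frame whose trailing window of energies
    has a population variance above threshold, or None if there is none."""
    for end in range(window - 1, len(energies)):
        recent = energies[end - window + 1 : end + 1]
        if pvariance(recent) > threshold:
            return end
    return None

# Synthetic trace: a flat basin, then a jump to a different energy level.
trace = [0.0] * 10 + [10.0] * 10
print(detect_transition(trace, window=4, threshold=1.0))

# A trace with no jump is never flagged.
print(detect_transition([0.0] * 20, window=4, threshold=1.0))
```

The sketch also shows why detection is not guaranteed: a threshold set too low flags ordinary thermal noise as a transition, while one set too high misses small barrier crossings.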
Distributed Computing • Distributed simulations • M processors for each run • simulate folding in atomic detail on each processor • restart once a barrier crossing event occurs • Implementation: Folding@Home • worldwide distributed computing over the Internet • started in October 2000 • more than 200,000 participants • 10,000 CPU-years in the first 12 months
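The restart-on-crossing scheme can be sketched as a toy server loop. Everything here is a stand-in: `run_work_unit` replaces a real atomistic simulation with a single exponential waiting time, states are plain integers counting barriers crossed, and all names and parameters are hypothetical.

```python
import random

def run_work_unit(state, rate_k, duration, rng):
    """Hypothetical client job started from `state`.

    Returns (crossed, crossing_time) for one run of length `duration`.
    """
    crossing_time = rng.expovariate(rate_k)
    return crossing_time <= duration, crossing_time

def ensemble_round(state, num_clients, rate_k, duration, rng):
    """Give every client the same state; return the earliest crossing time, if any."""
    results = [run_work_unit(state, rate_k, duration, rng) for _ in range(num_clients)]
    crossings = [t for crossed, t in results if crossed]
    return min(crossings) if crossings else None

def fold(num_barriers, num_clients, rate_k, duration, seed=0):
    """Advance through num_barriers transitions, restarting all clients after each."""
    rng = random.Random(seed)
    state, elapsed, rounds = 0, 0.0, 0
    while state < num_barriers:
        rounds += 1
        first = ensemble_round(state, num_clients, rate_k, duration, rng)
        if first is None:
            elapsed += duration  # no client crossed; keep going from the same state
        else:
            elapsed += first     # first crossing wins
            state += 1           # all clients restart from the new state
    return state, elapsed, rounds

if __name__ == "__main__":
    print(fold(num_barriers=3, num_clients=50, rate_k=1.0, duration=1.0))
```

Because clients never talk to each other and only report back at the end of a work unit, the loop needs no fast interconnect, which is what makes it a fit for volunteer machines on the Internet.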
Folding@Home
Folding@Home • client-server architecture • the server assigns jobs (work units) to clients • a client sends back results after computation • ~100 KB of data transferred between client and server • why is ensemble dynamics good for Folding@Home? • CPU-intensive jobs: a few hours, often days • connection speed: a modem is good enough • well suited to Folding@Home
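A rough calculation shows why a modem is good enough. The sketch assumes a 56 kbit/s modem and a 3-hour work unit; neither number is on the slide, which only says "modem" and "a few hours, often days". Only the ~100 KB transfer size comes from the slide.

```python
def transfer_seconds(payload_bytes, link_bits_per_second):
    """Time to move a payload over a link, ignoring latency and protocol overhead."""
    return payload_bytes * 8 / link_bits_per_second

payload_bytes = 100 * 1000      # ~100 KB per work unit, from the slide
modem_bps = 56_000              # assumed 56 kbit/s dial-up modem
compute_seconds = 3 * 3600      # assumed 3-hour work unit

comm_seconds = transfer_seconds(payload_bytes, modem_bps)
comm_fraction = comm_seconds / (comm_seconds + compute_seconds)

print(f"transfer time: {comm_seconds:.1f} s")
print(f"share of total time spent communicating: {comm_fraction:.2%}")
```

Under these assumptions the transfer takes well under a minute against hours of computation, so communication is a negligible share of each work unit.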
Other@Home Work • SETI@Home • search for intelligent life outside Earth • data analysis of signals • FightAIDS@Home • find drug therapies for HIV • how drugs interact with various HIV mutations • distributed projects • divide and conquer • CPU-intensive jobs • small pieces of data (kilobytes) transferred • communication is not a major concern
Evaluation • Folding@Home • based on the Tinker molecular dynamics code • volunteer participants worldwide, over 400,000 CPUs • simulates folding and unfolding • folding rates • simulations of small proteins
Folding Rates
Folding & Unfolding
Observations • Sampling • too expensive to run for long timescales • wastes too much time lingering in local energy minima • Ensemble dynamics • speeds up simulations of dynamics • biological meaning of the simulation results? • results for folding of large proteins? • limitations: correct transition detection, transition probability • Folding@Home • a cheap way to achieve supercomputer-level computing power • huge distributed computing platform: over 400,000 CPUs • an efficient approach for CPU-intensive jobs • Complexity of problems and size of data increase rapidly • finding better algorithms is preferable to buying supercomputers