Massively Distributed Computing and an NRPGM Project on Protein Structure and Function • Computational Biology Lab, Physics Dept & Life Science Dept, National Central University
About Protein • Function • Storage, Transport, Messengers, Regulation… Everything that sustains life • Structural materials: shell, silk, spider silk, etc. • Structure • String of amino acids with 3D structure • Homology and Topology • Importance • Science, Health & Medicine • Industry – enzyme, detergent, etc. • An example – 3hvt.pdb
Problem: Structure & Function • Primary sequence → native state with 3D structure • Structure → function • Expensive and time consuming • Misfolding means malfunction • Mad cow disease (“prion” misfolds)
The Folding Problem • Complexity of mechanism & pathway is a huge challenge to science and computing technology
Molecular Dynamics (MD) • Molecular behavior determined by • Ensemble statistics • Newtonian mechanics • Experiment in silico • All-atom with water • Huge number of particles • Super-heavy-duty computation • Software for macromolecular MD available • CHARMm, AMBER, GROMACS
Basic Statistics for Protein MD Simulation • Atoms in a small protein plus surrounding water (N): 32,000 • Approximate number of interactions in force calculation (N^2/2): 0.5 x 10^9 • Machine instructions per force calculation: 1000 • Machine time per calculation (3 GHz CPU): 160 sec • Typical time-step size: 0.5 x 10^-15 sec • Total number of steps for 1 ms folding: 0.5 x 10^9 steps • Total machine time (160 sec x 0.5 x 10^9): 10^6 days
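The arithmetic behind these figures can be reproduced with a short sketch. It assumes that "1000 machine instructions" applies to each pairwise interaction and that a 3 GHz CPU retires one instruction per cycle; both readings are assumptions, and the function names are illustrative only.

```python
# Back-of-envelope cost of an all-atom MD run, using the slide's figures.
# ASSUMPTIONS: 1000 instructions per pairwise interaction; a 3 GHz CPU
# executing one instruction per cycle.

SECONDS_PER_DAY = 86_400


def seconds_per_step(n_atoms=32_000, instr_per_interaction=1_000, cpu_rate=3e9):
    """CPU seconds needed for one force evaluation (one MD time step)."""
    interactions = n_atoms ** 2 / 2          # N^2/2 pairwise interactions
    return interactions * instr_per_interaction / cpu_rate


def total_days(n_steps=0.5e9, **kwargs):
    """Total single-CPU time in days for n_steps time steps."""
    return seconds_per_step(**kwargs) * n_steps / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{seconds_per_step():.0f} s per step")
    print(f"{total_days():.2e} days in total")
```

With these inputs the sketch gives roughly 170 s per step, close to the quoted 160 s, and a total of about 10^6 days.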
How to overcome the factor of 1 million • A two-pronged approach • Faster or more CPUs • Bottleneck in protein folding is dictated by the Boltzmann distribution and can be overcome by large statistics (parallel computing NOT needed) • Our solution: massively distributed computing • We seek a factor of ~10,000 • Note: IBM’s solution is the Blue Gene machine w/ 10^6 CPUs • Shorten computation time • Many simulation steps needed b/c of the short time-scale of fast (vibrational) modes, ~10 fs • But time-scale of folding motion is slow, ~1 ns • Ideal solution: bypass or smooth out fast modes
Protein Studies by Massively Distributed Computing • A Project in the National Research Program on Genomic Medicine • Scientific • Protein folding, structure, function, protein-molecule interaction • Algorithm, force-field • Computing • Massive distributive computing • Education • Anyone with a PC can take part • Industry – collaborative development
Distributive Computing • Concept • Computation through internet • Utilize idle PC power (through screen-saver) • Advantage • Cheap way to acquire huge computation power • Perfectly suited to task • Huge number of runs needed to beat statistics • Parallel computation NOT needed • Massive data - good management necessary • Public education – anyone w/ PC can take part
Hardware Strategies • Parallel computation (not our approach) • PC cluster • IBM Blue Gene, 10^6 CPUs • Massive distributive computing • Grid computing (formal, and in the future) • Server to individual client (available now and inexpensive) • Examples: SETI, folding@home, genome@home • Our project: protein@CBL
Software Components • Dynamics of macromolecules • Molecular dynamics, all-atom or mean-field solvent • Computer codes • GROMACS (for distributive computing; freeware) • AMBER and others (for in-house computing; licensed) • Distributed Computing • COSM - a stable, reliable, and secure system for large-scale distributed processing (freeware)
Structure of COSM (network distribution) • Client: system tests (test all Cosm functions) and self-tests → connect to server → send Request → receive Assignment → run simulation → put Result → get Accept • Packets: Request, Assignment, Result, Accept
Structure at Server end (diagram labels) • Protein database • Temporary databank • Job analysis • Automatic temperature swaps by parallel tempering • Databank • Human intervention • Jobs • Exceptions • Send (COSM) to clients • Receive
Structure at Client end (diagram labels) • Server • Receive • MD run • If crash: restart • Return result • Delete files
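The client-side cycle (receive an assignment, run MD, restart after a crash, return the result, delete files) can be sketched as a small loop. This is only an illustration: `FakeServer`, `run_md`, and `client_loop` are hypothetical names invented here, and nothing below is the Cosm or PAC API.

```python
# Hypothetical sketch of the client cycle; none of these names come from
# Cosm or PAC. The server is simulated in memory.

class FakeServer:
    """Stands in for the server: hands out assignments, accepts results."""

    def __init__(self, jobs):
        self.jobs = list(jobs)
        self.results = {}

    def handle_request(self):
        # Request packet -> Assignment packet (None when no work is left)
        return self.jobs.pop(0) if self.jobs else None

    def handle_result(self, job_id, result):
        # Result packet -> Accept packet
        self.results[job_id] = result
        return True


def run_md(job, crash):
    """Placeholder for the MD run; raises to imitate a crashed run."""
    if crash:
        raise RuntimeError("simulated crash")
    return f"trajectory for {job['id']} at {job['temperature']} K"


def client_loop(server, max_restarts=3):
    """Request work until none is left; restart a crashed run a few times."""
    completed = []
    while True:
        job = server.handle_request()
        if job is None:
            break
        result = None
        for attempt in range(max_restarts + 1):
            try:
                # Jobs flagged "crash" fail on their first attempt only.
                crash = attempt == 0 and bool(job.get("crash"))
                result = run_md(job, crash)
                break
            except RuntimeError:
                continue  # restart the run
        if result is not None and server.handle_result(job["id"], result):
            completed.append(job["id"])  # local files would be deleted here
    return completed
```

A real client would also have to checkpoint the MD run so that a restart resumes rather than starts over; the sketch leaves that out.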
Multi-temperature Annealing • Project suited for multi-temperature runs – Parallel Tempering • Two configurations with energy and temperature (E1, T1) and (E2, T2) • Temperatures swapped with probability P = min{1, exp[-(E2 - E1)(1/kT1 - 1/kT2)]} • Mode of operation • Send same peptide at different temperatures to many clients; let run; collect; swap T’s by multiple parallel tempering; randomly redistribute peptides with new T’s to clients
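The swap rule can be written out directly. The sketch below is a minimal illustration of the stated probability, assuming energies in kJ/mol (the unit GROMACS reports); the function names are illustrative and not taken from PAC.

```python
import math
import random

# Boltzmann constant in kJ/(mol K). ASSUMPTION: energies are in kJ/mol.
K_B = 0.0083144626


def swap_probability(e1, t1, e2, t2, k=K_B):
    """P = min{1, exp[-(E2 - E1)(1/kT1 - 1/kT2)]}."""
    exponent = -(e2 - e1) * (1.0 / (k * t1) - 1.0 / (k * t2))
    if exponent >= 0:
        return 1.0          # also avoids overflow for large exponents
    return math.exp(exponent)


def maybe_swap(e1, t1, e2, t2, rng=random):
    """Return the temperatures of configurations 1 and 2 after a swap attempt."""
    if rng.random() < swap_probability(e1, t1, e2, t2):
        return t2, t1
    return t1, t2
```

When the configuration at the higher temperature has the lower energy, the swap is always accepted; otherwise it is accepted with a probability that falls off exponentially with the energy gap.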
Multi-temperature Annealing (II) (diagram) • Clients return runs at old temperatures to the server • Server swaps temperatures by multiple-“peptide” parallel tempering; results kept in databank • Clients receive new temperatures
Potential of Massive Distributive Computing • Simulation of folding a small peptide for 100 ns • Each run (10^5 simulation steps; 100 ps): ~100 min PC time • 1000 runs (100 ns) per “fold”: ~10^5 min • Approx. 70 days on a single PC running 24 h/day • Ideal client contributes 8 h/day • 100 clients: 70 x 3/100 = 2 days per fold • 10,000 clients: 50 folds/day (small peptide) • Mid-sized protein needs > 1 ms to fold • 10^6 days on a single PC • 10,000 clients: ~300 days • 10^6 clients (!!): ~3 days
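These throughput estimates follow from simple scaling, sketched below. The sketch assumes all runs are independent and perfectly distributed, so it ignores the wait for temperature swaps between runs; the names are illustrative.

```python
# Throughput estimate from the slide's figures.
# ASSUMPTION: runs are independent and evenly spread over clients.

MIN_PER_RUN = 100       # PC minutes per 100 ps run
RUNS_PER_FOLD = 1000    # 1000 runs = 100 ns
DONATED_HOURS = 8       # hours per day an ideal client contributes


def days_per_fold(n_clients, hours_per_day=DONATED_HOURS):
    """Wall-clock days needed to accumulate one 100 ns 'fold'."""
    total_hours = MIN_PER_RUN * RUNS_PER_FOLD / 60
    return total_hours / (n_clients * hours_per_day)


if __name__ == "__main__":
    print(f"single PC, 24 h/day: {days_per_fold(1, 24):.0f} days")
    print(f"100 clients: {days_per_fold(100):.1f} days per fold")
    print(f"10,000 clients: {1 / days_per_fold(10_000):.0f} folds per day")
```

The results land close to the rounded values quoted above: about 70 days on one PC, about 2 days per fold with 100 clients, and close to 50 folds per day with 10,000 clients.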
Schedule • Launched – August 2002 • Small PC-cluster – October 2002 • In-house runs to learn codes • Infrastructure for Distributive Computation • Installation of GROMACS & COSM – January-March 2003 • Test runs • Intra-laboratory test run – March-October 2003 • NCU test run – July-October 2003 • Launched on WWW – November 20 2003 • Scientific studies • Getting familiar w/ MD and folding of peptides • Looking for ways to increase MD time step
Current status of PAC • Last beta version Pac v0.9 • Released on July 15 • To CBL lab members & physics dept • About 25 clients • First alpha version Pac v1.0 released October 1 2003 • Current version Pac v1.2 • Released to public on 20 November 2003 • In search of clients • Portal in “Educities” http://www.educities.edu.tw/ • ~3,700 downloads, ~700 active clients • PCs in university administrative units • City halls and county government offices • Talks and visits to universities and high schools
Some current simulations • 1L2Y: (20 res.) NMR Structure of Trp-Cage Miniprotein Construct Tc5B; synthetic. • 1SOL: (20 res.) A Pip2 and F-Actin-Binding Site of Gelsolin, Residue 150-169. One helix. • 1ZDD: (35 res.) Disulfide-Stabilized Mini Protein A Domain. Two helices.
A small test case – 1SOL • Target peptide – 1SOL.pdb • 20 amino acids; 3-loop helix and 1 hairpin; 352 atoms; ~4000 bond interactions • Unit time step = 1 fs • Compare constant temperature and parallel tempering • Constant T @ 300 K • Parallel tempering with about 20 peptides; results returned to server for swapping after each “run”, or 10^5 time steps (100 ps)
Parallel-tempering (1SOL) [plot: temperature (K) vs. number of runs (in units of 100 ps)] • Swap probability P = min{1, exp[-(E2 - E1)(1/kT1 - 1/kT2)]}
Preliminary result on 1SOL [figure panels: initial structure; native conformation; constant temperature (20 ns); parallel tempering (1.6 ns)]
A second test case – 1L2Y • Simulation target – Trp-Cage • 20 amino acids, 2 helical loops • A short, synthetic, self-folding peptide • Has been simulated with AMBER • Folding mechanism not well understood
A case of swap history (1L2Y) [plot: temperature (K) vs. number of runs (in units of 100 ps)]
Preliminary result on 1L2Y (11 peptides) [figure panels: native state; initial state; PAC, 6 ns]
Speeding up simulation - Separating the fast from the slow modes • Fast modes associated with bonded interactions • Bond-stretching vibrations ~ 10-20 fs • Bond-angle bending vibrations ~ 20-40 fs • Slow modes associated with dihedral angles • Of the order of 0.1 ns • Alpha-helix folds in ~ 1-10 ns • Beta-sheet folds in ~ 10-100 ns • Native structure ~ 1 ms - 1 s
Bonded interactions • Bond stretching (atoms i, j; figure label b_ij) • Harmonic angle potential (atoms i, j, k; figure label θ0)
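For reference, the two bonded terms are commonly written as harmonic potentials. The forms below are the standard ones used in GROMACS-style force fields, offered as a sketch rather than as the exact expressions shown on the slide. Here b_ij is the equilibrium bond length and θ⁰_ijk the equilibrium angle; the force constants k^b_ij and k^θ_ijk are symbols assumed here and do not appear on the slide.

```latex
V_b(r_{ij}) = \tfrac{1}{2}\, k^{b}_{ij}\, \left(r_{ij} - b_{ij}\right)^{2}
\qquad
V_a(\theta_{ijk}) = \tfrac{1}{2}\, k^{\theta}_{ijk}\, \left(\theta_{ijk} - \theta^{0}_{ijk}\right)^{2}
```

Stiff force constants in these terms are what produce the fast vibrational modes and so limit the usable time step.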
Bond-stretching vibrations • Approximate oscillation or relaxation time ζ ≈ 10 fs for bonds involving a hydrogen atom (C-H)
Bond-stretching vibrations (II) • Std < 0.03 Å; very small compared with tolerance in structure • Most codes, including GROMACS and AMBER, have an option to freeze out this degree of freedom
Bond-angle bending vibrations • ζ ≈ 20 fs for bond angles involving a hydrogen atom (H-N-C)
Bond-angle bending vibrations (II) • Unique value with relatively small std (~ 3-5 degrees) • But cannot be frozen; looking for ways to “half-freeze”
Current and future efforts • Computing facility • expand the base of PAC clients; target 10,000 • Data management • efficient server-client protocol • efficient management and analysis of data when client number is large • Running simulations • optimum implementation of parallel tempering • reduce size of water box • Dealing with fast modes • freeze bond stretching • isolate bond-angle bending deg. of freedom for special treatment; new (heavy) code-writing • target time-step: > 20 fs; ultimately 100 fs
The Team • Funded by NRPGM/NSC • Computational Biology Laboratory, Physics Dept & Life Sciences Dept, National Central University • PI: Professor HC Lee (Phys & LS/NCU) • Jia-Lin Lo (PhD student) • Jun-Ping Yiu (MSc Res. Assistant) • Chien-Hao Wei (MSc RA) • Engin Lee (MSc student) • Dr. Richard Tseng (PDF, since May 2004) • Visiting scientist: physicist/computer specialist (TBA)
Website http://protein.ncu.edu.tw client_stats
Please visit http://protein.ncu.edu.tw and let your PC take part in this project while you sleep. Thank you.