From Sequence Analysis to Simulations: Applications of HPC in Modern Biology

R. Sankararamakrishnan, Department of Biological Sciences & Bioengineering, IIT-Kanpur. IIT-K REACH Symposium 2010, Oct 9th 2010.

Presentation Transcript


  1. From Sequence Analysis to Simulations: Applications of HPC in Modern Biology. R. Sankararamakrishnan, Department of Biological Sciences & Bioengineering, IIT-Kanpur. IIT-K REACH Symposium 2010, Oct 9th 2010

  2. Computers and Computing in Biology: Mathematical Biology, Biostatistics, Biomathematics, Quantitative Biology, Biophysics, Bioinformatics, Computational Biology

  3. Definitions. What is Bioinformatics? Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. What is Computational Biology? The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. (NIH definitions, http://www.bisti.nih.gov/)

  4. Explosive growth of biological data

  5. HPC Applications: Three examples • Evolutionary relationships among a given set of protein or DNA sequences • Drug discovery and design • Structure-function relationships of large biomolecular assemblies

  6. I. HPC in Phylogenetics

  7. Phylogeny and phylogenetic trees • Study of evolutionary relationships (among sequences/species) • Relationships between organisms that share a common ancestor • A phylogenetic tree is a graph representing the evolutionary history of sequences/species

  8. Phylogenetic trees can be represented in two different ways • Rooted tree: has a unique node (the root) and shows the direction of evolution • Unrooted tree: makes no assumption about common ancestry [Figure: rooted and unrooted trees of Human, Chimpanzee, Gorilla and Orangutan]

  9. Molecular phylogeny in a criminal investigation

  10. Maximum Likelihood Method – An Introduction David Mount (2002)

  11. Maximum Likelihood Method – An Introduction David Mount (2002)
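The transcript does not show how the likelihood of a given tree is evaluated. The sketch below is one standard way to do it, Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) substitution model. It is not necessarily the method on Mount's slides. The nested-list tree format, the function names, the branch lengths and the short alignment are all hypothetical, chosen only for illustration.

```python
import math

BASES = "ACGT"

def jc69_prob(x: str, y: str, t: float) -> float:
    """JC69 probability that base x is observed as base y after a branch of
    length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def partials(node, column: dict) -> dict:
    """Felsenstein pruning: for each base x, the probability of the bases seen
    at the leaves below `node`, given that `node` itself has base x.
    A leaf is a sequence name (str); an internal node is a list of
    (child, branch_length) pairs."""
    if isinstance(node, str):
        observed = column[node]
        return {x: (1.0 if x == observed else 0.0) for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        below = partials(child, column)
        for x in BASES:
            # Sum over the unknown base y at the child end of the branch.
            result[x] *= sum(jc69_prob(x, y, t) * below[y] for y in BASES)
    return result

def site_likelihood(tree, column: dict) -> float:
    """Likelihood of one alignment column; JC69 uses equal base frequencies."""
    root = partials(tree, column)
    return sum(0.25 * root[x] for x in BASES)

def log_likelihood(tree, alignment: dict) -> float:
    """Sites are assumed independent, so log-likelihoods add across columns."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += math.log(site_likelihood(tree, column))
    return total

# Hypothetical rooted tree ((Human, Chimpanzee), Gorilla) with made-up branch lengths.
tree = [([("Human", 0.05), ("Chimpanzee", 0.05)], 0.02), ("Gorilla", 0.08)]
alignment = {"Human": "ACGTTA", "Chimpanzee": "ACGTCA", "Gorilla": "ACGACA"}

if __name__ == "__main__":
    print(log_likelihood(tree, alignment))
```

An ML program repeats this evaluation for many candidate topologies and branch lengths and keeps the tree with the highest log-likelihood, which is why the search becomes so CPU-intensive.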

  12. For each unrooted tree, there will be many possible rooted trees

  13. Number of possible unrooted and rooted trees
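The slide gives no figures, but the standard counts are: for n labelled sequences there are (2n-5)!! distinct unrooted bifurcating trees (n ≥ 3) and (2n-3)!! rooted ones (n ≥ 2). An unrooted tree with n leaves has 2n-3 branches, and placing the root on any of them gives a different rooted tree, which is the point of slide 12. A minimal sketch with hypothetical function names:

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... ; taken as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n: int) -> int:
    """Number of unrooted, fully bifurcating trees on n labelled leaves."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return double_factorial(2 * n - 5)

def rooted_trees(n: int) -> int:
    """Number of rooted, fully bifurcating trees on n labelled leaves.
    Equals unrooted_trees(n) * (2n - 3): one rooted tree per branch."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return double_factorial(2 * n - 3)

if __name__ == "__main__":
    for n in (4, 10, 20, 25):
        print(n, unrooted_trees(n), rooted_trees(n))
```

For 10 sequences there are already 2,027,025 unrooted trees, and for 20 sequences about 2.2 × 10^20, which is why ML programs rely on heuristic tree searches rather than enumerating every topology.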

  14. Computing phylogenetic trees using the ML method • The maximum likelihood phylogeny problem is NP-hard • Very CPU-intensive • For trees containing more than 20 to 25 sequences, the problem can no longer be solved exactly • Efficient heuristic tree-search algorithms are required to reduce the size of the search space • Recently developed algorithms: IQPNNI, PHYML, GARLI, RAxML • None of these algorithms is guaranteed to find the ML tree; they only yield the best-known ML tree

  15. Parallelization strategy Ott et al. (2008)

  16. RAxML performance in some HPC platforms • 212 sequences, 566,470 base pairs • One of the largest datasets analyzed under ML • IBM BlueGene/L; 1024 CPUs • 7 distinct tree searches in 14 hours Ott et al. (2008)

  17. Phylogenetic analysis of plant channel proteins identified a new subfamily. Bansal and Sankararamakrishnan, BMC Struct. Biol. (2007); Gupta and Sankararamakrishnan, BMC Plant Biol. (2009)

  18. II. HPC in Drug Discovery & Drug Design

  19. Roles of Computation in Drug Discovery “Is there really a case where a drug that is on the market was designed by a computer?” “The reality is that the use of computers and computer methods permeates all aspects of drug discovery today” Jorgensen (2004)

  20. Computation in Drug Discovery “Drug discovery is complex: Successful teams and companies need to be congratulated, whereas search for one individual or computer program is counterproductive. There is not going to be a voila moment at the computer terminal. Instead, there is systematic use of wide-ranging computational tools to facilitate and enhance the drug discovery process” Jorgensen (2004)

  21. Structure-based Drug Design – An Introduction http://csb.stanford.edu/levitt/demo_lectures/lec7/Lecture7/Discovering_Drugs/pages/Structure_Based_Drug_Design.html http://www.biocryst.com/our_science

  22. [Figure] Image credit: Wim Hol (www.bmsc.washington.edu/WimHol/sbdd3.JPG)

  23. Drug targets and drug discovery: issues • Lead generation • Lead optimization • De novo design • Virtual screening • All drugs presently on the market are estimated to target fewer than 500 biomolecules • Docking & scoring issues: scoring function, solvent effects and protein flexibility. Bleicher et al. (2003)

  24. Four proteins: trypsin, HIV PR, CDK2 and AChE • Test set for each protein: 10,000 randomly selected compounds • 6,000 docking poses were selected for the top 1,000 compounds • These served as initial conformations for MD simulations • The combination of docking and MD showed higher and more stable enrichment performance than docking alone
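Slide 24 compares methods by "enrichment" but does not say which metric was used. A common one in virtual screening is the enrichment factor (EF): the fraction of known actives among the top-ranked x% of compounds, divided by the fraction of actives in the whole set. The sketch below assumes that definition, assumes lower scores are better (as with docking energies), and uses hypothetical names and data.

```python
def enrichment_factor(scores: dict, actives: set, fraction: float) -> float:
    """EF at a given fraction of a ranked screening list.
    scores: compound id -> score, LOWER is better (assumed, as for docking energies)
    actives: ids of compounds known to be active
    EF = (actives in top / size of top) / (all actives / all compounds)"""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, key=scores.get)        # best (lowest) score first
    n_top = max(1, round(fraction * len(ranked)))  # size of the top slice
    hits_top = sum(1 for cid in ranked[:n_top] if cid in actives)
    hits_total = sum(1 for cid in ranked if cid in actives)
    if hits_total == 0:
        raise ValueError("no known actives among the scored compounds")
    return (hits_top / n_top) / (hits_total / len(ranked))

if __name__ == "__main__":
    # Hypothetical scores for 10 compounds; c0 and c4 are the known actives.
    scores = {f"c{i}": -10.0 + i for i in range(10)}
    print(enrichment_factor(scores, {"c0", "c4"}, 0.1))
```

To compare protocols the way the slide describes, one would compute EF for the docking-only scores and again for the docking-plus-MD scores over the same compound set; an EF of 1 means no better than random selection.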

  25. A special-purpose computer, MDGRAPE-3, was used for the MD simulations • It is a cluster of personal computers • Each is equipped with 24 MDGRAPE-3 chips and has a peak speed of approximately 2 Tflops • 50 such computers were used • Average computational time for a single protein-ligand complex: 2.5 h • Calculations for all 6,000 protein-ligand conformations were completed in a week

  26. Steered MD in Drug Discovery (Jorgensen, 2010) • Steered molecular dynamics is used to compute the force required to extract inhibitors from enzymes • A small spring is connected to the ligand in the complex • The spring is pulled at constant velocity into the surrounding water • The force is determined from the extension of the spring and recorded as a function of time • Strongly bound inhibitors → higher peak forces • Weaker inhibitors → flatter profiles
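The force readout on slide 26 follows from Hooke's law. In constant-velocity pulling, the free end of the spring starts at the ligand's initial position and moves at speed v, so at time t the force on the ligand is k × ((x0 + v·t) − x(t)), where x(t) is the ligand's coordinate along the pulling direction. The sketch below computes that profile from a list of ligand positions. The function names, units and the two toy trajectories are hypothetical; a real study would take x(t) from the MD trajectory.

```python
def smd_force_profile(positions, k, v, dt):
    """Constant-velocity steered MD force at each saved frame.
    positions: ligand coordinate along the pulling direction per frame
    k: spring constant; v: pulling speed; dt: time between frames
    (consistent units assumed, e.g. kcal/mol/A^2, A/ps, ps)."""
    x0 = positions[0]
    # Spring extension = position of the pulled end minus ligand position.
    return [k * (x0 + v * i * dt - x) for i, x in enumerate(positions)]

def peak_force(positions, k, v, dt):
    """Largest force along the pulling path."""
    return max(smd_force_profile(positions, k, v, dt))

if __name__ == "__main__":
    # Toy trajectories: the "strong" ligand stays put for longer before it
    # is dragged out, so the spring stretches further and the peak is higher.
    strong = [0.0] * 6 + [4.0, 6.0]
    weak = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(peak_force(strong, 1.0, 1.0, 1.0), peak_force(weak, 1.0, 1.0, 1.0))
```

The toy example reproduces the qualitative trend on the slide: a ligand that resists the pull longer builds up a larger spring extension, and therefore a higher peak force, than one that follows the spring almost immediately.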

  27. Protein-protein interactions in programmed cell death: Bcl-2 family complex structures • Total number of atoms: ~50,000 to ~75,000 • Simulation period: 50 ns. Lama and Sankararamakrishnan, Proteins (2008); Lama and Sankararamakrishnan, Biochemistry (2010)

  28. III. Large Biomolecular Assemblies

  29. The first biomolecular simulation was performed in 1977

  30. MD simulations of channel proteins in bilayers • AQP1: 75,057 atoms • GlpF: 81,006 atoms • PfAQP: 81,503 atoms • A 30 ns production run was performed for each of the three systems • Each simulation took ~40 days of CPU time (total CPU time ~120 days). Alok Jain, Ravi Verma and R. Sankararamakrishnan, manuscript in preparation
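The CPU-time figures on slide 30 translate into a simulation throughput: 30 ns in ~40 days is about 0.75 ns/day per system. The helper below is hypothetical and does this bookkeeping. It also converts simulated time into integration steps, which assumes a 2 fs timestep; that is a common choice for all-atom MD but is not stated in the slides.

```python
def md_cost(sim_ns: float, timestep_fs: float, ns_per_day: float):
    """Return (integration steps, days of computing) for an MD run.
    1 ns = 1,000,000 fs; ns_per_day is the measured throughput."""
    steps = round(sim_ns * 1_000_000 / timestep_fs)
    days = sim_ns / ns_per_day
    return steps, days

if __name__ == "__main__":
    rate = 30 / 40                      # ~0.75 ns/day implied by slide 30
    steps, days = md_cost(30, 2.0, rate)  # 2 fs timestep is an assumption
    print(steps, days, 3 * days)        # three systems in total
```

The same arithmetic applies to the million-atom runs on the following slides: at the 1.1 ns/day quoted for the complete virus, a hypothetical 10 ns segment would take about 9 days.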

  31. Simulations reaching the million-atom mark • Complete virus: 1 million atoms (Freddolino et al., 2006) • Arrays of light-harvesting proteins: 1 million atoms (Chandler et al., 2008) • BAR domain proteins: 2.3 million atoms (Yin et al., 2009) • The flagellum: 2.4 million atoms (Kitao et al., 2006)

  32. Complete virus: 1 million atoms • Minimization and equilibration: cluster of 48 AMD Athlon 2600+ processors • Simulation: 256 Altix nodes at NCSA, UIUC; 1.1 ns/day (Freddolino et al., 2006)

  33. Functions of large molecular machines: fungal fatty acid synthase, 30S ribosome

  34. MD of the protein-conducting channel bound to the ribosome • Bacterial ribosomes are important targets for antibiotics • 2.7 million atoms • 50 ns simulation • Largest system simulated to date. Gumbart et al. (2009)

  35. [Figure: HPC at the center, linking phylogenetic analysis, drug design & discovery, and large biomolecular systems]

  36. HPC Platforms for Biology Applications • FPGA boards: field-programmable gate arrays are integrated circuits that can be programmed; FPGA boards implementing commonly used bioinformatics algorithms are available • Graphics-Processing Unit (GPU): all bioinformatics applications • Grid computing: many applications • Distributed computing: protein folding, drug docking • Cloud computing:

  37. Acknowledgements • Anjali Bansal • Dilraj Lama • Alok Jain • Tuhin Kumar Pal • Priyanka Srivastava • Vivek Modi • Ravi Kumar Verma • Krishna Deepak • Phani Deep DST, DBT, CSIR, MHRD