
Genomes to Life: a partnership between Biology and Computing

Gary Johnson and John Houghton, Office of Science. http://www.doegenomestolife.org/. Includes a brief overview of the Office of Advanced Scientific Computing Research: Mathematical, Information, and Computational Sciences.


Presentation Transcript


  1. Genomes to Life: a partnership between Biology and Computing. Gary Johnson, John Houghton, Office of Science. http://www.doegenomestolife.org/

  2. Office of Advanced Scientific Computing Research: Mathematical, Information, and Computational Sciences. A brief overview. http://www.sc.doe.gov/production/octr/mics/index.html

  3. MICS Mission: Discover, develop, and deploy the computational and networking advances that enable researchers in the scientific disciplines to analyze, model, simulate, and predict complex physical, chemical, and biological phenomena important to the Department of Energy (DOE).
     • Support a broad research portfolio in advanced scientific computing: applied mathematics, computer science, networking, and collaboratory software
     • Operate supercomputers, a high-performance network, and related facilities

  4. Program Strategy (diagram; labels listed as recovered from the slide)
     • Research to enable simulation of complex systems, distributed teams, and remote access to facilities
     • Basic Research: Applied Mathematics, Computer Science, Networking, Collaboratory Pilots
     • Grid-enabling research, Nanoscience, Topical Computing, Scientific Application Pilots, Collaboratory Tools
     • SciDAC: Integrated Software Infrastructure Centers; teams of mathematicians, computer scientists, application scientists, and software engineers
     • BES, BER, FES, HEP, NP: Materials, Chemical, Combustion, Accelerator, HEP, Nuclear, Fusion, Climate, Astrophysics, Computational Biology
     • High Performance Computing and Network Facilities for Science: National Energy Research Scientific Computing Center (NERSC), Advanced Computing Research Facilities, Energy Sciences Network (ESnet)

  5. Budget Request FY2003: $166,625,000
     • Chart categories: SBIR/STTR, Base Research, Facilities, Comp. Bio., SciDAC
     • Enhancements over FY2002:
       - Computational Biology: +$5.6M
       - SciDAC: +$5.3M
       - Facilities: +$1.3M

  6. Applied Mathematical Sciences
     • [Figure: Time to Solution versus Problem Size (increasing with number of processors), with one curve labeled "scalable" and one labeled "unscalable"]
     • Algorithms must be scalable. Ideally, as the problem size grows and the number of processors grows, the solution time does not!
     • From the "simple" to the complex:
       - Linear Solvers: Ax=b
       - Eigensolvers: Ax=λBx
       - Nonlinear Solvers: F(u,x,y,z)=0
       - PDE Solvers: F(u,u',u'',…,x,y,z,t)=0
     • Combustion: ~60 coupled, nonsymmetric, nonlinear, time-dependent PDEs on 10M mesh points. Time steps range from 10^-12 (for chemical reaction rates) to 10^-2 (for the speed of the flame front).
     • Protein Folding: Current simulations use 44 amino acids. An actual protein has ~300 amino acids. Run times using current techniques? Greater than the life of the universe!
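The scalability point on slide 6 can be illustrated with a small, idealized model. The sketch below is not from the presentation: the function name `time_to_solution`, the per-processor problem size, and the cost per operation are all hypothetical, and communication cost is ignored entirely. It assumes the problem size N grows in proportion to the processor count P, and compares an algorithm whose total work grows like N with one whose work grows like N^2.

```python
def time_to_solution(work_exponent, procs, points_per_proc=1000, seconds_per_op=1e-6):
    """Idealized weak-scaling model (illustrative assumption, not from the slides).

    The problem size N grows with the processor count. Total work is assumed
    to be N**work_exponent operations, shared evenly across processors with
    no communication overhead.
    """
    n = points_per_proc * procs              # problem size grows with processors
    total_ops = float(n) ** work_exponent    # assumed algorithmic cost
    return seconds_per_op * total_ops / procs


if __name__ == "__main__":
    for procs in (1, 10, 100, 1000):
        linear = time_to_solution(1.0, procs)     # O(N) work: "scalable"
        quadratic = time_to_solution(2.0, procs)  # O(N^2) work: "unscalable"
        print(f"P={procs:5d}  O(N) work: {linear:.4f} s   O(N^2) work: {quadratic:10.1f} s")
```

Under these assumptions the O(N) algorithm keeps a flat time to solution as processors are added, while the O(N^2) algorithm's time grows in proportion to P, which is the qualitative shape of the two curves in the slide's figure.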

  7. AMS Base Research Program
     • Objectives: Advance our understanding of science and technology by supporting research in basic applied mathematics and in computational research that facilitates the use of the latest high-performance computer systems.
     • Accomplishments:
       - Robust High-Performance Numerical Libraries
       - Adaptive Mesh Refinement (AMR)
       - Sustained Teraflop/s simulations
       - Level Set / Fast Marching Methods
       - Investment in Education: Computational Sciences Graduate Fellowship
     • Growth Opportunities:
       - Ultrascalable Algorithms (up to millions of PEs)
       - Mathematical Microscopy
       - These opportunities will be explored through Genomes to Life (with BER), Comp. Nanoscience (with BES), and Fusion Energy (FESAC-ASCAC workshop)
     • Ongoing Projects:
       - Applied Mathematics Research: Linear Algebra, Fluid Dynamics, Differential Eqs., Optimization, Grid Generation, Predictability Analysis & Uncertainty Quantification, Automated Reasoning
       - Advanced Numerical Algorithms: PETSc, Aztec, TAO, ADIFOR / ADIC, Hypre, CHOMBO, SuperLU, PICO

  8. Computer Science Research
     • Challenge: HPC for Science is (still, after fifteen years!)
       - Hard to use
       - Inefficient
       - Fragile
       - An unimportant vendor market
     • Vision: A comprehensive, integrated software environment which enables the effective application of high performance systems to critical DOE problems
     • Goal: Radical improvement in
       - Application Performance
       - Ease of Use
       - Time to Solution
     • HPC System Elements (diagram labels): Scientific Applications; System Admin; Software Development; Res. Mgt, Frameworks, PSEs, Scheduler, Compilers, Viz/Data, Chkpt/Rstrt, Debuggers, Math Libs, File Sys, Perf Tools, Runtime Tools; User Space Runtime Support; OS Bypass; OS Kernel; Node and System Hardware Arch

  9. Computer Science Technical Elements [Chart: five segments of 15%, 23%, 18%, 25%, and 19%; the segment labels did not survive in the transcript]

  10. Major Accomplishments
     • PVM: the first widely successful model for parallel computing
     • MPI: the lingua franca of today's parallel computing
     • MPICH: the open source version of MPI that is the basis for all vendor adaptations
     • Global Arrays: the distributed shared memory programming model that is at the core of NWChem, the motivating application for SciDAC
     • CTSS: the first interactive operating system for high performance computers
     • SUNMOS/Puma/Cougar: the most successful high performance parallel operating system
     • OSCAR: a partnership with industry, the most widely used open source toolkit for management of Linux clusters

  11. National Collaboratories: Why?
     • The nature of how large-scale science is done is changing
       - Distributed data, computing, people, instruments
       - Instruments integrated with large-scale computing
     • Human resources are seldom collocated with the resources needed for their science
     • Additional drivers
       - Large and international collaborations
       - Management of unique national user facilities
       - Large multi-laboratory science and engineering projects

  12. An End-to-End Problem for Applications: Many different types of objects need to be connected to and coordinated by the networks. [Diagram; the only recovered label is "Scientist"]

  13. Staff
     • Ed Oliver, Associate Director for Advanced Scientific Computing Research
     • Dan Hitchcock, Senior Scientific Advisor
     • Linda Twenty, Senior Budget & Financial Specialist
     • Walt Polansky, Acting Director MICS
     • Gary Johnson, ACRTs, Computational Biology
     • Fred Johnson, Computer Science
     • William (Buff) Miner, NERSC & Scientific Applications
     • Thomas Ndousse-Fetter, Network Research
     • Kimberly Rasar, Senior Info. Tech. (SciDAC)
     • Chuck Romine, Applied Mathematics
     • Mary Anne Scott, Collaboratories
     • George Seweryniak, ESnet
     • John van Rosendale, Computer Science: Visualization and Data Management
     • Vacancies (2)
     • Jane Hiegel
     • Susan Kilroy
     Phone: 301-903-5800. Fax: 301-903-7774. http://www.sc.doe.gov/production/octr/mics/index.html

  14. OASCR Advisory Committee
     • Committee Chair: Margaret Wright, NYU
     • Subcommittee Chairs:
       - Biology: Juan Meza, LBNL
       - Computing Infrastructure: Jill Dahlberg, General Atomics
     • Members in common with BERAC: Warren Washington, NCAR
     • Next Meeting: 2-3 May 2002, Crowne Plaza Hotel, 14th and K Streets, Washington, DC

  15. Genomes to Life Program History
     • Phased program startup
       - FY 2002: OBER
       - FY 2003: OASCR
     • Precursor activity
       - FN 01-21: Advanced Modeling and Simulation of Biological Systems
       - 9 Awards, $3M
     • Current solicitations
       - FN 02-13: Genomes to Life
     • Program planning
       - 5 workshops
       - Goal 4 roadmap
       - Update to GTL roadmap

  16. GTL Planning Activities
     • 7-8 August: GTL Computing Workshop
     • 6-7 September: Systems Biology & GTL Workshop
     • 22-23 January: Computing Infrastructure Workshop
     • 6-7 March: Computer Science for GTL Workshop
     • 18-19 March: Mathematics for GTL Workshop
     • 19 April: Draft Goal 4 Roadmap
     • Future: New Edition of the GTL Roadmap

  17. GTL Goal 4 Roadmap

  18. Genomes to Life Goals
     • Goal 1: Identify and Characterize the Molecular Machines of Life, the Multiprotein Complexes that Execute Cellular Functions and Govern Cell Form
     • Goal 2: Characterize Gene Regulatory Networks
     • Goal 3: Characterize the Functional Repertoire of Complex Microbial Communities in their Natural Environments at the Molecular Level
     • Goal 4: Develop the Computational Methods and Capabilities to Advance Understanding of Complex Biological Systems and Predict their Behavior

  19. Three Computing Domains • Bioinformatics/Data-Intensive Applications • Biophysics/Compute-Intensive Applications • Biosystems/Complex Systems Modeling

  20. Biology & Computing Perspectives

  21. Domain Challenges
     • Bioinformatics
       - Heterogeneous, large and growing data sets
       - Legacy systems that don't interoperate and don't scale
     • Biophysics
       - Already bumping up against computational resources
       - More computation, better algorithms, new theory
     • Biosystems
       - Too much data not to have models
       - Data-poor and biology-poor
       - Parts list short, but complex systems

  22. Initial Thoughts on Computational Infrastructure
