International Exascale Software Program • Abani K. Patra, J. Dongarra, P. Beckman … • Office of Cyberinfrastructure, National Science Foundation
Science Case For Exascale • DOE Workshops @ ~100 People • Climate Science (11/08) • High Energy Physics (12/08) • Nuclear Physics (1/09) • Fusion Energy (3/09) • Nuclear Energy (5/09) • Biology (8/09) • Basic Energy Science (8/09) • Joint National Security (10/09) • Computer Science • Mathematics • Computer Architecture • Strong science case for the continued escalation of high-end computing. • Broad consensus on the need to redesign and replace many of the algorithms and much of the software infrastructure that HPC has built on for more than a decade.
Questions? • What are the new applications that are emerging or likely to emerge in the coming decade? • How can NSF best stimulate development of exascale software applications? • How can useful software that has been developed as part of the exascale effort be sustained beyond the development period? • What systems software will be required? Distributed systems support, programming environments, runtime support, data management and user tools?
Questions? • What application support environments will be needed? Application packages, numeric and non-numeric library packages, problem-solving environments? • How can NSF aid or catalyze developments that make it possible to use the same tools, including compilers, debuggers and performance tools, on system scales all the way down to the typical researcher’s laptop or desktop? • What education and training actions should be considered to prepare researchers, students and educators for future cyberinfrastructure?
IESP Executive Committee • Jack Dongarra, Pete Beckman, Patrick Aerts, Frank Cappello, Thomas Lippert, Satoshi Matsuoka, Paul Messina, Anne Trefethen, Mateo Valero
Performance Development in the Top500 • [Chart: aggregate performance (SUM), #1 system (N=1) and #500 system (N=500) trends, spanning 100 Mflop/s to 100 Pflop/s, with Gordon Bell Prize winners marked]
Exponential growth in parallelism for the foreseeable future
Factors that Necessitate Redesign • Extreme parallelism and hybrid design • Preparing for million/billion-way parallelism • Tightening memory/bandwidth bottleneck • Limits on power/clock speed push designs toward multicore • Pressure to reduce communication will become much more intense • Memory per core and the byte-to-flop ratio will change • Necessary fault tolerance • MTTF will drop • Checkpoint/restart has limitations (see the sketch after this slide) • The software infrastructure does not exist today
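To make the checkpoint/restart limitation concrete, here is a minimal application-level checkpoint/restart sketch in Python (the file name, state layout, and checkpoint interval are illustrative assumptions, not anything prescribed by the IESP roadmap). At exascale the approach breaks down: draining 10-100 PB of aggregate memory to stable storage can take longer than the expected time between failures.

```python
import os
import pickle

CHECKPOINT = "state.ckpt"   # hypothetical checkpoint file name

def save_checkpoint(step, state):
    # Write the full application state to stable storage.
    # This global, synchronous write is the scaling bottleneck.
    with open(CHECKPOINT, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)

def load_checkpoint():
    # Resume from the last completed checkpoint, or start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            ckpt = pickle.load(f)
        return ckpt["step"], ckpt["state"]
    return 0, {"x": 0.0}

def run(total_steps=1000, ckpt_every=100):
    step, state = load_checkpoint()
    while step < total_steps:
        state["x"] += 1.0            # stand-in for one simulation step
        step += 1
        if step % ckpt_every == 0:
            save_checkpoint(step, state)
    return state

if __name__ == "__main__":
    print(run())
```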
Factors that Necessitate Redesign • Advances in most branches of science and engineering are critically dependent on increasingly complex multi-scale, multi-physics, data-driven computations and analysis. • Complexity of systems • Data-intensive scalable computing • Workflows, grids, clouds … • All of this complexity is dealt with by software and tools! • [Images: AMD Phenom die, http://www.amd.com/us-en/assets/content_type/DigitalMedia/43264A_hi_res.jpg; first cosmological simulations to include black hole physics, by Di Matteo et al. at Carnegie Mellon, funded by OCI and MPS/AST; optimal siting of an oil exploration platform estimated using simulation and optimization tools to maximize product]
Exascale Computing • Exascale systems are likely feasible by 2017 ± 2 years • 10-100 x 10^6 processing elements (cores or mini-cores), with chips of ~1,000 cores per socket; clock rates will grow more slowly; heterogeneous cores • 3D packaging is likely, perhaps with optics-based interconnects • 10-100 PB of aggregate memory • Hardware- and software-based fault management • Performance per watt: stretch goal of 100 GF/W, which still implies a 10-100 MW exascale system (see the estimate below) • Power, area and capital costs (?) will be significantly higher than for petascale • Google: "exascale computing study"
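As a back-of-the-envelope check on the stretch goal (an estimate, not a vendor figure): at 100 GF/W, sustaining one exaflop/s already requires on the order of 10 MW, and missing the goal by 10x pushes the machine toward 100 MW.

```latex
P \;=\; \frac{10^{18}\ \text{flop/s}}{100 \times 10^{9}\ \text{flop/s per W}}
  \;=\; 10^{7}\ \text{W} \;=\; 10\ \text{MW},
\qquad
10\ \text{GF/W} \;\Rightarrow\; 100\ \text{MW}.
```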
A Call to Action • Hardware has changed dramatically while the software ecosystem has remained stagnant • Previous approaches have not looked at co-design of multiple levels in the system software stack (OS, runtime, compiler, libraries, application frameworks) • Need to exploit new hardware trends (e.g., manycore, heterogeneity, memory-per-socket trends) that cannot be handled by the existing software stack • Emerging software technologies exist (e.g., UPC, Cilk, CUDA, HPCS languages) but have not been fully integrated with system software • Community codes are unprepared for this sea change in architecture
IESP Goal • Improve the world's simulation and modeling capability by improving the coordination and development of the HPC software environment • Build an international plan for developing the next generation of open source software for scientific high-performance computing
Purpose • The IESP software roadmap is a planning instrument designed to enable the international HPC community to improve, coordinate and leverage their collective investments and development efforts. • After needs are determined, the task will be to construct the organizational structures suitable to accomplish the work
International Community Effort • This needs to be an international collaboration because: • the scale of investment is large • international input on requirements is needed • no global evaluation of key missing components exists • hardware features are uncoordinated with software development • It's a "flat world" in software -- we need to harness all available intellectual resources: US, Asia, Europe …
Timeline • SC08 (Austin, TX): meeting to generate interest • Funding from DOE's Office of Science and NSF's Office of Cyberinfrastructure, with sponsorship by European and Asian partners • US meeting (Santa Fe, NM), April 6-8, 2009: 65 people; NSF Office of Cyberinfrastructure funding • European meeting (Paris, France), June 28-29, 2009: 70 people; outline report • Asian meeting (Tsukuba, Japan), October 18-20, 2009: draft roadmap; refine report • SC09 (Portland, OR): BOF to inform others; public comment; draft report presented
Four Goals for IESP • Strategy for determining requirements: clarity in scope is the issue • Comprehensive software roadmap: goals, challenges, barriers and options • Resource estimate and schedule: scale and risk relative to hardware and applications • A governance and project coordination model: is the community ready for a project of this scale, complexity and importance? "Can we be trusted to pull this off?"
Key Trends → Requirements on the X-stack
• Increasing concurrency → programming models, applications, and tools must address concurrency
• Reliability challenging → software must be resilient
• Power dominating designs → software and tools must manage power directly
• Heterogeneity in a node → software must address the change to heterogeneous nodes
• I/O and memory: ratios and breakthroughs → software must be optimized for new memory ratios and must solve the parallel I/O bottleneck
Priority Research Direction (one for each component)
• Key challenges: brief overview of the barriers and gaps
• Summary of research direction: what will you do to address the challenges?
• Potential impact on software component: what capabilities will result? What new methods and components will be developed?
• Potential impact on usability, capability, and breadth of community: how will this impact the range of applications that may benefit from exascale systems? What is the timescale in which that impact may be felt?
4.x <component> • Technology drivers • Alternative R&D strategies • Recommended research agenda • Cross-cutting considerations
4.2.4 Numerical Libraries
• Technology drivers: hybrid architectures; programming models/languages; precision; fault detection; energy budget; memory hierarchy; standards
• Alternative R&D strategies: message passing; global address space; message-driven work queue
• Recommended research agenda: hybrid and hierarchical software (e.g., linear algebra split across multicore / accelerator); autotuning; fault-oblivious and error-tolerant software; mixed arithmetic (see the mixed-precision sketch after this slide); architecture-aware libraries; energy-efficient implementations; algorithms that minimize communication
• Cross-cutting considerations: performance; fault tolerance; power management; architectural characteristics
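To illustrate the "mixed arithmetic" item above, here is a minimal mixed-precision iterative-refinement sketch in Python/NumPy (the function name, iteration count, and test matrix are illustrative assumptions): the expensive solve runs in single precision, and a few cheap double-precision residual corrections recover the accuracy.

```python
import numpy as np

def mixed_precision_solve(A, b, refinements=3):
    """Solve Ax = b with a single-precision solve plus
    double-precision iterative refinement (sketch only)."""
    A32 = A.astype(np.float32)
    # Expensive step in low precision: faster, less memory traffic.
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(refinements):
        r = b - A @ x                          # residual in double precision
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)              # correct the solution
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
    b = rng.standard_normal(n)
    x = mixed_precision_solve(A, b)
    print(np.linalg.norm(A @ x - b))           # residual should be tiny
```

A production library would factor A32 once (e.g., an LU factorization) and reuse the factors for every refinement step; the repeated solve here just keeps the sketch short.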
Priority Research Direction: Numerical Libraries
• Key challenges: scalability -- need algorithms with a minimal amount of communication; increasing the level of asynchronous behavior; fault-resistant software -- bit flipping and losing data (due to failures), with algorithms that detect and carry on, or detect, correct and carry on (for one or more errors); heterogeneous architectures; languages; accumulation of round-off errors
• Summary of research direction: fault-oblivious, error-tolerant software; hybrid and hierarchical algorithms (e.g., linear algebra split across multicore and GPU, self-adapting); mixed arithmetic; energy-efficient algorithms; algorithms that minimize communication; autotuning-based software (see the autotuning sketch after this slide); architecture-aware algorithms/libraries; standardization activities; asynchronous methods; overlap of data movement and computation; adaptivity to the architectural environment
• Potential impact on software component: efficient libraries of numerical routines, agnostic of platform and self-adapting to the environment; libraries will be affected by compilers, OS, runtime, programming environment, etc.; standards for fault tolerance, power management, hybrid programming and architectural characteristics
• Potential impact on usability, capability, and breadth of community: make systems more usable by a wider group of applications; enhance programmability
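As a toy illustration of the "autotuning-based software" direction (the kernel, candidate block sizes, and timing loop are assumptions made for illustration), an autotuner times several variants of a kernel on the target machine and keeps the fastest:

```python
import time
import numpy as np

def blocked_matmul(A, B, block):
    """Blocked matrix multiply; performance depends on the block size."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, block):
        for k in range(0, n, block):
            for j in range(0, n, block):
                C[i:i+block, j:j+block] += A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
    return C

def autotune(n=512, candidates=(32, 64, 128, 256)):
    """Time each candidate block size on this machine and pick the best."""
    rng = np.random.default_rng(0)
    A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    timings = {}
    for block in candidates:
        start = time.perf_counter()
        blocked_matmul(A, B, block)
        timings[block] = time.perf_counter() - start
    return min(timings, key=timings.get), timings

if __name__ == "__main__":
    best, timings = autotune()
    print("best block size:", best, timings)
```

Real autotuners (in the ATLAS and FFTW tradition) search far larger parameter spaces and cache the result per architecture, but the principle is the same: measure on the actual machine rather than guess.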
4.2.4 Numerical Libraries: Roadmap 2010-2019 • [Timeline chart covering structured grids, unstructured grids, FFTs, dense LA, sparse LA, Monte Carlo and optimization, with milestones for scaling to billion-way parallelism, fault tolerance, self-adapting for precision, energy awareness, self-adapting for performance, architectural transparency, complexity of system, language issues, heterogeneous software, and standards for fault tolerance, energy awareness, architectural characteristics and hybrid programming]
Next Steps • Revise and extend initial draft • Build management and collaboration plans • Work with all stakeholders to plan research activities • Next workshop(s) in the spring -- greater application focus • NSF/OCI -- exascale exploration awards