Technology Challenges for Scientific Computing
Dan Hitchcock, Advanced Scientific Computing Research
Daniel.Hitchcock@science.doe.gov
NCSA, April 20, 2010
Overview
• The evolution of semiconductors and microprocessors will pose significant challenges for all of scientific computing:
  • Decreasing clock speeds to save power (see the power sketch after this slide)
  • Increasing numbers of simpler cores per chip
  • Memory–CPU communication
  • Power constraints
  • End of constant memory-per-process weak scaling
  • Synchronization
• These problems are most apparent in the move to exascale, but they affect computing at all levels.
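A minimal back-of-envelope sketch of why designers trade clock speed for core count, assuming the standard dynamic-power relation P ≈ C·V²·f. The capacitance, voltage, and frequency values are illustrative assumptions, not numbers from this talk; only the qualitative conclusion (more, slower cores deliver similar throughput for less power) reflects the slide above.

```python
# Back-of-envelope dynamic power model: P ~ C * V^2 * f.
# All numeric values below are illustrative assumptions, not data from the talk.

def dynamic_power(cores, capacitance_nF, voltage_V, freq_GHz):
    """Total switching power (W) for `cores` identical cores."""
    return cores * capacitance_nF * 1e-9 * voltage_V**2 * freq_GHz * 1e9

# One fast core vs. four slower cores at reduced voltage,
# both delivering roughly the same aggregate instruction throughput.
fast   = dynamic_power(cores=1, capacitance_nF=1.0, voltage_V=1.2, freq_GHz=3.0)
slower = dynamic_power(cores=4, capacitance_nF=1.0, voltage_V=0.9, freq_GHz=0.75)

print(f"1 x 3.0 GHz core  : {fast:5.2f} W")
print(f"4 x 0.75 GHz cores: {slower:5.2f} W  (same nominal throughput, less power)")
```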
Identifying exascale applications and technology for DOE missions
• Town Hall Meetings, April–June 2007
• Scientific Grand Challenges Workshops, November 2008 – October 2009
  • Climate Science (11/08)
  • High Energy Physics (12/08)
  • Nuclear Physics (1/09)
  • Fusion Energy (3/09)
  • Nuclear Energy (5/09)
  • Biology (8/09)
  • Material Science and Chemistry (8/09)
  • National Security (10/09)
• Cross-cutting workshops
  • Architecture and Technology (12/09)
  • Architecture, Applied Mathematics and Computer Science (2/10)
• Meetings with industry (8/09, 11/09)
[Figure: mission imperatives and fundamental science driving the DOE Exascale Initiative]
The Fundamental Issue
[Figure: levels of communication — on-chip/CMP communication, intranode/SMP communication, internode/MPI communication]
Memory Technology: Bandwidth Costs Power
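To make "bandwidth costs power" concrete, here is a minimal sketch that converts sustained memory bandwidth into interface power using an assumed energy-per-bit figure. The pJ/bit values and bandwidth targets are illustrative assumptions, not numbers from these slides.

```python
# Power needed just to move data across a memory interface:
#   power (W) = bandwidth (bits/s) * energy per bit (J/bit)
# The pJ/bit and bandwidth values below are illustrative assumptions only.

def interface_power_watts(bandwidth_GB_s, energy_pJ_per_bit):
    bits_per_second = bandwidth_GB_s * 1e9 * 8
    return bits_per_second * energy_pJ_per_bit * 1e-12

for bw in (100, 1000, 4000):      # GB/s per node (assumed)
    for e in (20.0, 5.0):         # pJ/bit: conventional vs. improved memory (assumed)
        print(f"{bw:5d} GB/s at {e:4.1f} pJ/bit -> {interface_power_watts(bw, e):7.1f} W")
```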
Potential System Architectures
On-Chip Architecture: different approaches to on-chip clustering
• The cost of moving data long distances on chip motivates on-chip clustering (ratios worked out below):
  • 1 mm costs ~6 pJ (today and 2018)
  • 20 mm costs ~120 pJ (today and 2018)
  • A FLOP costs ~100 pJ today
  • A FLOP costs ~25 pJ in 2018
• Different architectural directions:
  • GPU: warps of hardware threads clustered around a shared register file
  • CMP: limited-area cache coherence
  • CMT: hardware multithreading clusters
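A quick arithmetic check of the numbers on this slide: today a 20 mm on-chip hop (~120 pJ) costs about as much energy as a FLOP (~100 pJ), but by 2018, with a FLOP at ~25 pJ and wire energy roughly flat, the same hop costs almost five FLOPs' worth of energy. The sketch below simply restates those ratios; treating the wire cost as unchanged over time is taken from the slide.

```python
# Energy figures taken from the slide above (in pJ); ratios computed from them.
FLOP_TODAY_PJ, FLOP_2018_PJ = 100.0, 25.0
MOVE_1MM_PJ, MOVE_20MM_PJ   = 6.0, 120.0   # roughly unchanged today vs. 2018

for label, flop_pj in (("today", FLOP_TODAY_PJ), ("2018 ", FLOP_2018_PJ)):
    print(f"{label}: 20 mm move = {MOVE_20MM_PJ / flop_pj:4.1f}x the energy of a FLOP, "
          f"1 mm move = {MOVE_1MM_PJ / flop_pj:4.2f}x")
```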
More Bad News
• I/O to disk will be relatively slower than it is today, so traditional checkpointing will not be practical (see the checkpoint-interval sketch below);
• Part of the file system may be on the node;
• There will be more silent errors;
• Weak-scaling approaches (constant memory per flop) probably will not work;
• Bulk synchronization will be very expensive.
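A minimal sketch of why relatively slower I/O hurts traditional checkpoint/restart, using Young's well-known approximation for the optimal checkpoint interval, τ ≈ sqrt(2 · t_checkpoint · MTBF). The memory sizes, I/O bandwidths, and MTBF below are illustrative assumptions, not figures from the talk; the point is that as checkpoint time grows relative to MTBF, an ever-larger fraction of machine time goes to writing checkpoints.

```python
import math

# Young's approximation for the optimal checkpoint interval:
#   tau_opt ~ sqrt(2 * t_checkpoint * MTBF)
# Time lost to checkpointing alone is roughly t_checkpoint / (tau_opt + t_checkpoint).
# All numbers below are illustrative assumptions.

def checkpoint_overhead(ckpt_bytes, io_bytes_per_s, mtbf_s):
    t_ckpt = ckpt_bytes / io_bytes_per_s
    tau = math.sqrt(2.0 * t_ckpt * mtbf_s)
    return t_ckpt, tau, t_ckpt / (tau + t_ckpt)

mtbf = 6 * 3600.0                          # assume a 6-hour system MTBF
for mem_pb, bw_tb_s in ((0.3, 1.0), (10.0, 1.0), (10.0, 10.0)):
    t_ckpt, tau, frac = checkpoint_overhead(mem_pb * 1e15, bw_tb_s * 1e12, mtbf)
    print(f"{mem_pb:5.1f} PB at {bw_tb_s:4.1f} TB/s: checkpoint {t_ckpt/60:6.1f} min, "
          f"interval {tau/60:6.1f} min, ~{100*frac:4.1f}% of time spent checkpointing")
```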
Approaches
• Locality, locality, locality!
• Billion-way concurrency;
• Uncertainty quantification, including hardware variability;
• Flops are free and data movement is expensive, so:
  • Remap multiphysics to put as much work per location on the same die;
  • Include embedded UQ to increase concurrency;
  • Include data analysis, where possible, for more concurrency;
  • Trigger output so that only important data moves off the machine (see the sketch below);
  • Reformulate to trade flops for memory use.
• Lightweight operating systems;
• What to do about cache coherence.
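A minimal sketch of the "trigger output to move only important data off the machine" idea: instead of writing every timestep, the simulation evaluates a cheap on-node trigger and ships data only when something interesting happens. The threshold-based trigger and the toy simulation here are hypothetical illustrations, not DOE or ASCR software.

```python
import numpy as np

# In-situ triggered output: keep cheap analysis on the node and move data
# off-machine only when a trigger fires. Real triggers would be application-specific.

def trigger_fires(field, threshold=5.0):
    """Cheap on-node test: does the field contain a feature worth shipping off-node?"""
    return np.abs(field).max() > threshold

rng = np.random.default_rng(0)
written = []
for step in range(100):
    field = rng.normal(scale=1.0 + 0.05 * step, size=(64, 64))  # toy simulation state
    if trigger_fires(field):
        # Stand-in for the expensive off-node write (parallel file system, archive, ...).
        written.append(step)

print(f"would write {len(written)} of 100 timesteps off-node"
      + (f"; first trigger at step {written[0]}" if written else ""))
```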
Possible Machine Abstract Models
Workshops
• Past
  • Exascale cross-cutting workshops
    • Architecture and Technology (12/09)
    • Architecture, Applied Mathematics and Computer Science (2/10)
  • Heterogeneous Multicore Consortium (1/10)
  • IESP, Oxford, April 2010
• Upcoming
  • Data Management, Analysis & Visualization PI Meeting: focus on the impact of exascale architectures and preparing for the co-design process; tentatively scheduled for August 17–19 in Santa Fe, NM
  • Best Practices Workshop on Power, September 2010
Current Funding Opportunity Announcements
• Applied Math
  • Advancing Uncertainty Quantification (UQ) in Modeling, Simulation and Analysis of Complex Systems: $3M/year for 3 years to fund 2–6 awards; closes April 26, 2010
    • Development of highly scalable approaches for uncertainty analysis in the modeling and simulation of complex natural and engineered systems.
• Computer Science
  • X-Stack Software Research: $10M/year for 3 years to fund 4–5 awards; closes April 2, 2010
    • Development of a scientific software stack that supports extreme-scale scientific computing, from operating systems to development environments.
  • Advanced Architectures and Critical Technologies for Exascale Computing: $5M/year for 3 years to fund 4–5 awards; closed March 26, 2010
    • Design of energy-efficient, resilient hardware and software architectures and technology for high-performance computing systems at exascale.
  • Scientific Data Management and Analysis at the Extreme Scale: $5M/year for 3 years for 10–15 awards; closed March 18, 2010
    • Management and analysis of extreme-scale scientific data in the context of petascale computers and/or exascale computers with heterogeneous multi-core architectures.
Websites
• www.science.doe.gov/ASCR
• www.exascale.org