Some Thoughts on Technology and Strategies for Petaflops
Possible paths to Petaflops • Traditional Commodity Clusters • Leverage Moore’s law on GP microprocessors • Interconnect and memory bandwidth problems • Type C machines • DARPA HPCS paths (e.g. Cascade etc.) • Embedded systems based Clusters • QCDOC one example • BG/L another example
Beyond Commodity Clusters • Improved design capability • Small groups can design SoCs • Small groups can gain access to state of the art fabrication capabilities • Design cycles are getting shorter thanks to increasing availability of off-the-shelf IP • Blue Logic, MIPS, etc. • QCDOC example
Hardware/Software Co-design • Application kernels • Simple “FORTRAN”-like C code: well-behaved basic blocks with performance requirement annotations • Compiler builds a performance model for each basic block • Decision point based on the performance estimate • Compile for GPU or synthesize logic/FPGA code • Generate glue code/runtime
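The decision point in the co-design flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the `BasicBlock` fields, the toy performance model (a single sustained operation rate), and the function names are all hypothetical and do not come from the slides. The target names follow the slide's wording.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    GPU = "compile for GPU"                # wording taken from the slide
    FPGA = "synthesize logic/FPGA code"    # wording taken from the slide


@dataclass
class BasicBlock:
    name: str
    ops_per_call: int            # hypothetical: operations in one execution of the block
    required_calls_per_s: float  # hypothetical performance requirement annotation


def estimate_calls_per_s(block: BasicBlock, ops_per_s: float) -> float:
    """Toy performance model: executions per second at a given sustained op rate."""
    return ops_per_s / block.ops_per_call


def choose_target(block: BasicBlock, gpu_ops_per_s: float) -> Target:
    """Decision point: keep the block on the GPU if the estimate meets the annotation."""
    estimate = estimate_calls_per_s(block, gpu_ops_per_s)
    return Target.GPU if estimate >= block.required_calls_per_s else Target.FPGA


def plan(blocks: list[BasicBlock], gpu_ops_per_s: float) -> dict[str, Target]:
    """Map every annotated basic block to a target."""
    return {b.name: choose_target(b, gpu_ops_per_s) for b in blocks}


if __name__ == "__main__":
    kernels = [
        BasicBlock("axpy", ops_per_call=2_000, required_calls_per_s=1e5),
        BasicBlock("dslash", ops_per_call=1_000_000, required_calls_per_s=1e4),
    ]
    for name, target in plan(kernels, gpu_ops_per_s=1e9).items():
        print(f"{name}: {target.value}")
```

A real flow would replace the single-rate model with a per-block estimate that accounts for memory traffic and would add the glue code/runtime generation step, which is omitted here.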
Special purpose SoCs • Network Processing Units (NPUs) • Core of fast IP switches and routers • Many companies producing 10 Gbps components and moving towards 40 Gbps parts • DSPs • Cell phone base stations, signal processing, and array-on-a-chip processors • Example: a 2 GHz, 175 million transistor, 64-processor DSP array at several hundred dollars a chip in quantities of 1,000
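For a rough sense of scale, the DSP array example can be turned into a peak-rate estimate. The slide gives only the clock rate and processor count; the assumption that each processor completes one operation per cycle is ours, as are the function names, so the result is an upper-bound sketch rather than a measured figure.

```python
def peak_ops_per_s(clock_hz: int, processors: int, ops_per_cycle: int = 1) -> int:
    """Peak operation rate of one chip; ops_per_cycle=1 is an assumption."""
    return clock_hz * processors * ops_per_cycle


def chips_for_target(target_ops_per_s: int, chip_ops_per_s: int) -> int:
    """Smallest whole number of chips whose combined peak reaches the target."""
    return -(-target_ops_per_s // chip_ops_per_s)  # integer ceiling division


if __name__ == "__main__":
    chip = peak_ops_per_s(clock_hz=2_000_000_000, processors=64)
    print(f"peak per chip: {chip / 1e9:.0f} Gops/s")
    print(f"chips for 1e15 ops/s at peak: {chips_for_target(10**15, chip)}")
```

Sustained rates on real applications would be lower than peak, so the chip count needed in practice would be correspondingly higher.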
Graphics Accelerators • NVIDIA GeForce4 example • > 100 M transistors • High-speed (QDR) RAM interface > 10 GB/s • Moving towards general-purpose processors • Cg programming language (programmable shaders) • Evolving to become faster than the main CPU on a commodity-based node • Pentium or Itanium2 processor becomes a service processor?
Extendable Cores • Possible target for HPC Hardware/Software Co-design • Provides a reconfigurable node platform • Xilinx Virtex-II Pro • Multiple PowerPC cores (1-4) • Millions of gates of FPGA • Clock rates lag high-performance chips • Other vendors producing similar things • MIPS cores, SPARClite cores, etc.
Billion Transistor Dies by 2005/6 • Design challenges and opportunities • Many 32-bit cores available at < 500,000 transistors • Several 64-bit cores available at < 2,000,000 transistors • Complete SoC libraries becoming available (e.g. Blue Logic, etc.) • Unprecedented opportunity for semi-custom node architectures based on SoC technologies
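The transistor budgets above imply a simple bound on how many cores could fit on a billion-transistor die. The sketch below uses the slide's figures as per-core upper limits; the `core_fraction` parameter (the share of the die spent on cores rather than memory, interconnect, and I/O) and the 25% value are hypothetical, introduced only for illustration.

```python
DIE_TRANSISTORS = 1_000_000_000  # billion-transistor die (from the slide)
CORE_32BIT = 500_000             # slide's upper limit for many 32-bit cores
CORE_64BIT = 2_000_000           # slide's upper limit for several 64-bit cores


def max_cores(die: int, per_core: int, core_fraction: float = 1.0) -> int:
    """Cores that fit if core_fraction of the die is spent on cores (assumed)."""
    return int(die * core_fraction) // per_core


if __name__ == "__main__":
    for label, size in (("32-bit", CORE_32BIT), ("64-bit", CORE_64BIT)):
        whole_die = max_cores(DIE_TRANSISTORS, size)
        quarter = max_cores(DIE_TRANSISTORS, size, core_fraction=0.25)
        print(f"{label}: {whole_die} cores on the whole die, "
              f"{quarter} with 25% of the die for cores")
```

Because the slide's figures are upper limits on core size, these counts are lower bounds on what the raw transistor budget allows, before memory and interconnect are accounted for.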
Design Tools are Improving • We can start to think in terms similar to desktop publishing from 20 years ago • Mass customization will become possible, but: • What design macros are needed? • How to involve algorithm and application developers in the design process? • How to connect with systems software (OS, runtime, libraries)?
Evolution of Commodity Clusters • [Diagram; labels: I/O, Commodity Network, GPU/Node, O(1000) nodes, GP services, SoCs, O(100K) nodes, Semi-custom or Reconfigurable, High-Performance Interconnect]
Systems Software for SoCs • Embedded Processor Systems Software • DSP: real-time OS/runtime, ~40K in on-chip flash ROM (shadow RAM), off-chip extensions for future • NPUs: real-time runtime support, typically < 100K, plus some general-purpose co-processors (Linux typically used in Juniper) • Graphics processors: on-chip runtime support, upgradeable via device drivers
A Few Recommendations • Comprehensive applications studies • To determine feasibility of acceleration via semi-custom SoC/CLoCs • To understand what OS functions are actually required for full HPC applications • Establish some design challenges • Pick several core algorithms (besides lattice gauge) and do some paper designs to validate the possible advantages of SoC based approaches • An augmented cluster testbed • GP Linux cluster with SoC/CLoC based compute backends