
Some Thoughts on Technology and Strategies for Petaflops






Presentation Transcript


  1. Some Thoughts on Technology and Strategies for Petaflops

  2. Possible paths to Petaflops • Traditional Commodity Clusters • Leverage Moore’s law on GP microprocessors • Interconnect and memory bandwidth problems • Type C machines • DARPA HPCS paths (e.g. Cascade etc.) • Embedded systems based Clusters • QCDOC one example • BG/L another example

  3. Beyond Commodity Clusters • Improved design capability • Small groups can design SoCs • Small groups can gain access to state of the art fabrication capabilities • Design cycles are getting shorter thanks to increasing availability of off-the-shelf IP • Blue Logic, MIPS, etc. • QCDOC example

  4. Hardware/Software Co-design • Application kernels • Simple “FORTRAN”-like C code: well-behaved basic blocks with performance requirement annotations • Compiler builds a performance model for each basic block • Decision point based on the performance estimate • Compile for GPU or synthesize logic/FPGA code • Generate glue code/runtime
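
The decision point on slide 4 can be illustrated with a minimal Python sketch. This is not the flow the slides describe in detail: the roofline-style estimate (performance capped by either peak compute or memory bandwidth) is a stand-in chosen here for the per-block performance model, and every name and number below (`BasicBlock`, `Target`, `choose_backend`, the 20 GFLOPS peak, the 10 GB/s bandwidth, the two example blocks) is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicBlock:
    """A well-behaved basic block with a performance requirement annotation."""
    name: str
    flops: float            # floating-point operations per execution
    bytes_moved: float      # memory traffic per execution, in bytes
    required_gflops: float  # the annotation the compiler must satisfy


@dataclass(frozen=True)
class Target:
    """A programmable target, described by two made-up machine parameters."""
    name: str
    peak_gflops: float
    bandwidth_gb_per_s: float


def estimate_gflops(block: BasicBlock, target: Target) -> float:
    """Roofline-style estimate (an assumption, not the slides' model)."""
    intensity = block.flops / block.bytes_moved  # flops per byte
    return min(target.peak_gflops, intensity * target.bandwidth_gb_per_s)


def choose_backend(block: BasicBlock, programmable: Target,
                   fallback: str = "fpga") -> str:
    """Decision point: compile for the programmable target if the estimate
    meets the annotation, otherwise fall back to synthesized logic."""
    if estimate_gflops(block, programmable) >= block.required_gflops:
        return programmable.name
    return fallback


# Hypothetical target and kernels, for illustration only.
GPU = Target("gpu", peak_gflops=20.0, bandwidth_gb_per_s=10.0)
blocks = [
    BasicBlock("dense_matmul_tile", flops=2.0e6, bytes_moved=2.4e4,
               required_gflops=15.0),
    BasicBlock("stencil_sweep", flops=8.0e5, bytes_moved=3.2e6,
               required_gflops=5.0),
]

plan = {b.name: choose_backend(b, GPU) for b in blocks}
print(plan)
```

In this sketch the compute-bound block stays on the programmable target, while the bandwidth-bound block misses its annotation and is routed to logic synthesis. A real flow would also need a performance model for the synthesized logic, which is omitted here.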

  5. Special purpose SoCs • Network Processing Units • Core of fast IP switches and routers • Many companies producing 10 Gbps components and moving towards 40 Gbps parts • DSPs • Cell phone base stations: signal processing and array-on-a-chip processors • Example: a 2 GHz, 175-million-transistor, 64-processor DSP array at several hundred dollars a chip in quantities of 1,000

  6. Graphics Accelerators • NVIDIA GeForce4 example • > 100 M transistors • High-speed (QDR) RAM interface > 10 GB/s • Moving towards general purpose processors • Cg programming language (programmable shaders) • Evolving to become faster than the main CPU on a commodity based node • Pentium or Itanium2 processor becomes a service processor?

  7. Extendable Cores • Possible target for HPC Hardware/Software Co-design • Provides a reconfigurable node platform • Xilinx Virtex-II Pro • Multiple PowerPC cores (1-4) • Millions of gates of FPGA • Clock rates lag high-performance chips • Other vendors producing similar things • MIPS cores, SPARClite cores, etc.

  8. Billion Transistor Dies by 2005/6 • Design challenges and opportunities • Many 32 bit cores available < 500,000 transistors • Several 64 bit cores available < 2,000,000 transistors • Complete SoC libraries becoming available (e.g. Blue Logic, etc.) • Unprecedented opportunity for semi-custom node architectures based on SoC technologies
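
The transistor budgets on slide 8 imply a rough upper bound on how many cores a billion-transistor die could hold. The sketch below works through that arithmetic in Python. The per-core figures are the slide's "less than" limits, so the results are conservative counts for cores of that size. The `core_fraction` parameter and the 25% example value are assumptions introduced here to show the effect of reserving die area for memory, interconnect, and I/O.

```python
DIE_TRANSISTORS = 1_000_000_000  # "billion transistor die" from the slide

# Upper limits on transistors per core, as quoted on the slide.
CORE_BUDGETS = {
    "32-bit core": 500_000,
    "64-bit core": 2_000_000,
}


def max_cores(die: int, per_core: int, core_fraction: float = 1.0) -> int:
    """Whole cores that fit when `core_fraction` of the die goes to cores.

    `core_fraction` is a hypothetical knob, not a figure from the slides.
    """
    return int(die * core_fraction) // per_core


for label, per_core in CORE_BUDGETS.items():
    whole_die = max_cores(DIE_TRANSISTORS, per_core)
    quarter_die = max_cores(DIE_TRANSISTORS, per_core, core_fraction=0.25)
    print(f"{label}: {whole_die} cores on the whole die, "
          f"{quarter_die} if only 25% of the die is cores")
```

Even with three quarters of the die assigned elsewhere, hundreds of small cores fit, which is the scale behind the slide's point about semi-custom node architectures.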

  9. Design Tools are Improving • We can start to think in terms similar to desktop publishing from 20 years ago • Mass customization will become possible, but: • What design macros are needed? • How to involve algorithm and application developers in the design process? • How to connect with systems software (OS, runtime, libraries)?

  10. Evolution of Commodity Clusters • [Diagram; recoverable labels: I/O • Commodity Network • GPU/Node … GPU/Node: O(1000) nodes, GP services • SoCs … SoCs: O(100K) nodes, semi-custom or reconfigurable • High-Performance Interconnect]

  11. Systems Software for SoCs • Embedded Processor Systems Software • DSP: real-time OS/runtime ~40K in on-chip flash ROM (shadow RAM), off-chip extensions for future • NPUs: real-time runtime support, typically < 100K, some general purpose co-processors (Linux typically used in Juniper) • Graphics processors: on-chip runtime support, upgradeable via device drivers

  12. A Few Recommendations • Comprehensive applications studies • To determine feasibility of acceleration via semi-custom SoC/CLoCs • To understand what OS functions are actually required for full HPC applications • Establish some design challenges • Pick several core algorithms (besides lattice gauge) and do some paper designs to validate the possible advantages of SoC based approaches • An augmented cluster testbed • GP Linux cluster with SoC/CLoC based compute backends
