
Case Study: Blue Gene P



Presentation Transcript


  1. Case Study: Blue Gene P • Successor of Blue Gene L • Largest machine at Lawrence Livermore National Lab • Available since end of 2008

  2. BGP Chip

  3. Roadrunner

  4. Jaguar: A Cray XT4 • Installed at DOE Oak Ridge National Lab • Position 5 on TOP500 • http://www.top500.org/list/2008/06/100 • Peak 260 TFlops, Linpack 205 TFlops • 7,832 quad-core 2.1 GHz AMD Opteron processors and 62 TB of memory (2 GB per core).
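
A quick back-of-the-envelope check of those figures (my own arithmetic, assuming 4 floating-point operations per cycle per Opteron core): 7,832 sockets × 4 cores × 2.1 GHz × 4 flops/cycle ≈ 263 TFlops, consistent with the quoted 260 TFlops peak, and 7,832 × 4 cores × 2 GB ≈ 62.7 TB of memory.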

  5. Cray XT4 Node • 4-way SMP • >35 Gflops per node • Up to 8 GB per node (2 – 8 GB direct-connect DDR 800 memory) • OpenMP support within the socket • [Node diagram: 6.4 GB/sec direct-connect HyperTransport, 12.8 GB/sec direct-connect memory, six 9.6 GB/sec links to the Cray SeaStar2+ interconnect]
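
The "OpenMP support within the socket" bullet is the programming-model consequence of this node design: typically one MPI process per socket with OpenMP threads across its cores. Below is a minimal sketch of such a threaded kernel in C with OpenMP (not from the slides; the array size and values are illustrative):

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];

        /* On an XT4 node, one MPI rank would own the socket and OpenMP
           threads would span its four cores. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            a[i] = (double)i;
            b[i] = 2.0 * i;
        }

        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("threads: %d, c[N-1] = %.1f\n", omp_get_max_threads(), c[N - 1]);
        return 0;
    }

Compiled with OpenMP enabled (e.g. -fopenmp), the same code carries over unchanged to the wider XT5 and XT6 nodes on the next slides.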

  6. Cray XT5 Node • 8-way SMP • >70 Gflops per node • Up to 32 GB of shared memory per node (2 – 32 GB) • OpenMP support • [Node diagram: 6.4 GB/sec direct-connect HyperTransport, 25.6 GB/sec direct-connect memory, six 9.6 GB/sec links to the Cray SeaStar2+ interconnect]

  7. Cray XT6 Node • Two 8- or 12-core AMD Opteron 6100 processors • >200 Gflops per node • Up to 64 GB of shared memory per node • OpenMP support • [Node diagram: 6.4 GB/sec direct-connect HyperTransport, 83.5 GB/sec direct-connect memory, six 9.6 GB/sec links to the Cray SeaStar2+ interconnect]

  8. XT6 Processor Choices

  9. XT6 Node Details: 24-core Magny-Cours • 2 multi-chip modules, 4 Opteron dies • 8 channels of DDR3 bandwidth to 8 DIMMs • 24 (or 16) computational cores, 24 MB of L3 cache • Dies are fully connected with HT3 • [Node diagram: four six-core ("Greyhound") dies, each with 6 MB of L3 cache and two DDR3 channels, connected by HT3 links, plus an HT1/HT3 link to the interconnect]
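
Because each of the four dies has its own 6 MB L3 and two DDR3 channels, the node is effectively a small NUMA machine, so data should be initialized by the threads that will later use it. A hedged sketch in C with OpenMP, assuming Linux-style first-touch page placement and threads pinned to dies (e.g. via OMP_PROC_BIND or the launcher's affinity options):

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1L << 24)   /* ~128 MB of doubles, spread over the dies */

    int main(void) {
        double *x = malloc(N * sizeof *x);
        if (!x) return 1;

        /* First touch: each thread initializes the chunk it will later use,
           so those pages end up in the memory attached to its own die. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            x[i] = 1.0;

        double sum = 0.0;
        #pragma omp parallel for schedule(static) reduction(+:sum)
        for (long i = 0; i < N; i++)
            sum += x[i];

        printf("sum = %.0f\n", sum);
        free(x);
        return 0;
    }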

  10. Cray SeaStar2+ Interconnect • Cray XT6 systems ship with the SeaStar2+ interconnect • Custom ASIC • Integrated NIC / router • MPI offload engine • Connectionless protocol • Link-level reliability • Proven scalability to 225,000 cores • [Block diagram: HyperTransport interface, DMA engine, PowerPC 440 processor, memory, 6-port router, blade control processor interface]
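
The DMA and MPI offload engines are what make non-blocking MPI pay off on this network: the NIC can progress a posted transfer while the host computes. A minimal, hedged illustration in C with plain MPI (nothing SeaStar-specific; the message size and ring pattern are made up):

    #include <mpi.h>
    #include <stdio.h>

    #define N 4096

    int main(int argc, char **argv) {
        int rank, size;
        double sendbuf[N], recvbuf[N], local = 0.0;
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < N; i++)
            sendbuf[i] = rank;

        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        /* Post the ring exchange; an offload-capable NIC can move the data
           while the loop below keeps the CPU busy. */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        for (int i = 0; i < N; i++)      /* independent local work */
            local += 0.5 * sendbuf[i];

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        if (rank == 0)
            printf("got data from rank %d, local = %.1f\n", left, local);

        MPI_Finalize();
        return 0;
    }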

  11. Cray Network Evolution • SeaStar • Built for scalability to 250K+ cores • Very effective routing and low contention switch • Gemini • 100x improvement in message throughput • 3x improvement in latency • PGAS Support, Global Address Space • Scalability to 1M+ cores • Aries • Ask me about it
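
Gemini's global-address-space support is normally reached through PGAS languages (UPC, Coarray Fortran) or one-sided libraries. As a portable stand-in, the sketch below uses MPI one-sided operations in C to show the style of communication being accelerated: a rank writes directly into a neighbour's memory with no matching receive (generic MPI, not a Gemini-specific API):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        double slot = -1.0;        /* remotely writable "global" location */
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Win_create(&slot, sizeof slot, sizeof slot, MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        double myval = (double)rank;
        /* One-sided put: deposit my rank into the right neighbour's slot. */
        MPI_Put(&myval, 1, MPI_DOUBLE, (rank + 1) % size, 0, 1, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        printf("rank %d: slot = %.0f\n", rank, slot);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }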

  12. New Axial Turbofan – 78% Efficient

  13. Scalability with Efficient Cooling • Vertical cooling eliminates hot and cold aisles • Graduated fin design delivers constant operating temperature • Laminar flow unobstructed by board logic • [Diagram: air flow through heat sinks with graduated fin counts of 42, 37, 24, and 18 fins]

  14. Cray XE6 Chassis Topology • [Diagram: chassis cabling along the X, Y, and Z torus dimensions]

  15. System Layout • Compute nodes • Login nodes • Network nodes • Boot/database nodes • I/O and metadata nodes

  16. K-Computer • Kobe, Japan • RIKEN Research Center • 10 Petaflop computer • Implemented by Fujitsu

  17. Why K?

  18. Specification • SPARC64 VIIIfx processors • 256 FP registers • SIMD extension with 2 SP or DP operations • 80,000 nodes • Entire system will be available in November 2012 • 640,000 cores • Performance 10 PFLOPS • Memory 1 PB (16 GB per node) • 9.89 MW power consumption (~10,000 suburban homes), 824.6 GFlops/kW
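
As a rough consistency check (my arithmetic, assuming 8 floating-point operations per cycle per core at 2 GHz, i.e. 16 GFlops per core): 640,000 cores × 16 GFlops ≈ 10.2 PFLOPS, and 80,000 nodes × 16 GB ≈ 1.3 PB of memory, matching the quoted figures; 9.89 MW spread over ~10,000 homes works out to roughly 1 kW per home.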

  19. K-Computer Network • 6D network, called Tofu (torus fusion) • Torus network • It can be divided into arbitrary rectangular chunks, each with a 3D torus topology. This allows good utilization (no fragmentation) and good application performance. • It even provides a 3D torus in case of a faulty node. • 10 GB/s bidirectional links • Entry bandwidth is 100 GB/s • Flexible packet length from 32 to 2048 bytes to reduce protocol overhead • Virtual cut-through routing, i.e. packets are buffered only if a link is blocked • Hardware support for collectives • Barriers and reductions
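
From an application's point of view, one of those rectangular chunks behaves like an ordinary 3D torus, which maps directly onto a periodic MPI Cartesian communicator. A hedged sketch in C (plain MPI, not the Tofu-specific interface; the 3D factorization is left to MPI_Dims_create):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int size, rank, coords[3], left, right;
        int dims[3] = {0, 0, 0}, periods[3] = {1, 1, 1};   /* wrap-around = torus */
        MPI_Comm torus;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Factor the ranks into a 3D grid and make every dimension periodic,
           mimicking the 3D-torus view of a Tofu partition. */
        MPI_Dims_create(size, 3, dims);
        MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &torus);

        MPI_Comm_rank(torus, &rank);             /* rank may be reordered */
        MPI_Cart_coords(torus, rank, 3, coords);
        MPI_Cart_shift(torus, 0, 1, &left, &right);   /* neighbours along X */

        printf("rank %d at (%d,%d,%d): X neighbours %d and %d\n",
               rank, coords[0], coords[1], coords[2], left, right);

        MPI_Comm_free(&torus);
        MPI_Finalize();
        return 0;
    }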

  20. European Exascale Projects (10^18 FLOP/s) • Projects towards exascale • DEEP • Started in December 2011, Research Centre Jülich • Collaboration with Intel, ParTec, and 8 other partners • Xeon cluster with a Booster of 512 Knights Corner processors • Knights Corner is the first processor in Intel's MIC series • Mont-Blanc • Started in October 2011, Barcelona Supercomputing Center • Collaboration with ARM, Bull, and 6 other partners • CRESTA • Led by EPCC (Edinburgh Parallel Computing Centre) • Collaboration with Cray • Focusing on applications and system software

  21. Summary • Scalable machines are distributed memory architectures • Examples • Blue Gene P • Roadrunner • Cray XT4/5/6 • K-Computer
