
The Blue Gene Experience


Presentation Transcript


  1. The Blue Gene Experience Manish Gupta IBM T. J. Watson Research Center Yorktown Heights, NY

  2. Blue Gene/L (2005) 136.8 Teraflop/s on LINPACK (64K processors)

  3. Blue Gene/L packaging hierarchy
     • System: 64 racks (64x32x32 torus): 180/360 TF/s, 32 TB
     • Rack: 32 node cards: 2.8/5.6 TF/s, 512 GB
     • Node card: 32 chips (4x4x2), 16 compute + 0-2 I/O cards: 90/180 GF/s, 16 GB
     • Compute card: 2 chips (1x2x1): 5.6/11.2 GF/s, 1.0 GB
     • Chip: 2 processors: 2.8/5.6 GF/s, 4 MB

  4. Blue Gene/L Compute ASIC
     • Low power processors
     • Chip-level integration
     • Powerful networks

  5. Blue Gene/L Networks

     3-Dimensional Torus
     • Interconnects all compute nodes (65,536)
     • Virtual cut-through hardware routing
     • 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
     • 1 µs latency between nearest neighbors, 5 µs to the farthest
     • Communications backbone for computations
     • 0.7/1.4 TB/s bisection bandwidth, 68 TB/s total bandwidth

     Global Collective
     • Interconnects all compute and I/O nodes (1,024)
     • One-to-all broadcast functionality
     • Reduction operations functionality
     • 2.8 Gb/s of bandwidth per link
     • Latency of one-way traversal 2.5 µs

     Low-Latency Global Barrier and Interrupt
     • Latency of round trip 1.3 µs

     Ethernet
     • Incorporated into every node ASIC
     • Active in the I/O nodes (1 I/O node per 8-64 compute nodes)
     • All external communication (file I/O, control, user interaction, etc.)

     Control Network

  6. RAS (Reliability, Availability, Serviceability)
     • System designed for reliability from top to bottom
     • System issues
       - Redundant bulk supplies, power converters, fans, DRAM bits, cable bits
       - Extensive data logging (voltage, temperature, recoverable errors, …) for failure forecasting
       - Nearly no single points of failure
     • Chip design
       - ECC on all SRAMs
       - All dataflow outside the processors is protected by error-detection mechanisms
       - Access to all state via a noninvasive back door
       - Low power, simple design leads to higher reliability
       - All interconnects have multiple error-detection and correction coverage
       - Virtually zero escape probability for link errors

  7. Blue Gene/L System Architecture (diagram)
     • Compute nodes (C-Node 0 … C-Node 63 within each pset) run the application on top of the compute node kernel (CNK).
     • I/O nodes (I/O Node 0 … I/O Node 1023) run Linux with a file-system client and the ciod daemon; each I/O node and its compute nodes form a pset (Pset 0 … Pset 1023), connected by the collective (tree) network.
     • A functional Gigabit Ethernet links the I/O nodes to the file servers and front-end nodes.
     • The service node, attached to a system console, runs the control system (CMCS), DB2, and LoadLeveler, and reaches the hardware over a control Gigabit Ethernet through the IDo chips via JTAG and I2C.
     • The torus network interconnects the compute nodes.

  8. Example performance graphs (Molecular dynamics)
