
SOS7: “Machines Already Operational” NSF’s Terascale Computing System


Presentation Transcript


  1. SOS7: “Machines Already Operational” NSF’s Terascale Computing System • SOS-7, March 4-6, 2003 • Mike Levine, PSC

  2. Outline • Overview of TCS, the US-NSF’s Terascale Computing System. • Answering 3 questions: • Is your machine living up to performance expectations? … • What is the MTBI? … • What is the primary complaint, if any, from users? • [See also PSC web pages & Rolf’s info.]

  3. Q1: Performance • Computational and communications performance is very good! • Alpha processors & ES45 servers: very good • Quadrics bandwidth & latency: very good • ~74% of peak on Linpack; >76% on LSMS • More work needed on disk IO. • This has been a very easy “port” for most users. • Easier than some Cray → Cray upgrades.
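As a quick back-of-envelope check (not part of the original slides), the ~74%-of-peak Linpack figure can be tied to the node counts given later in the deck; the sketch below assumes 750 ES45 nodes with four 1 GHz EV68s at 2 Gflop/s each.

```python
# Back-of-envelope check of the "~74% of peak on Linpack" figure.
# Assumes the configuration from the compute-node slide:
# 750 ES45 nodes x 4 EV68 CPUs x 2 Gflop/s per 1 GHz CPU.
nodes = 750
cpus_per_node = 4
gflops_per_cpu = 2.0          # EV68 @ 1 GHz, 2 floating-point results/cycle

peak_tflops = nodes * cpus_per_node * gflops_per_cpu / 1000.0
linpack_tflops = 0.74 * peak_tflops   # ~74% of peak, per the slide

print(f"Peak:    {peak_tflops:.1f} Tflop/s")     # ~6.0 Tflop/s
print(f"Linpack: {linpack_tflops:.2f} Tflop/s")  # ~4.4 Tflop/s
```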

  4. Q2: MTBI (Monthly Average) • Compare with theoretical prediction of 12 hrs. • Expect further improvement (fixing systematic problems).
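The slide does not say how the 12-hour theoretical MTBI was derived; the sketch below is one plausible reconstruction, assuming independent node failures so that the whole-machine interrupt rate scales with node count.

```python
# Illustration of how a whole-machine MTBI prediction scales with node count,
# under the common assumption of independent, exponentially distributed node
# failures (our reconstruction; the slide does not give the actual model).
nodes = 750
system_mtbi_hours = 12.0            # theoretical prediction quoted on the slide

# With N independent nodes, the system interrupt rate is N times the
# per-node rate, so MTBF_node ~= N * MTBI_system.
node_mtbf_hours = nodes * system_mtbi_hours
print(f"Implied per-node MTBF: {node_mtbf_hours:.0f} h "
      f"(~{node_mtbf_hours / (24 * 365):.1f} years)")   # ~1 year per node
```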

  5. Time Lost to Unscheduled Events • Purple bars (in the chart): nodes requiring cleanup • Worst case is ~3%
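For scale (an illustration, not a figure from the slides), ~3% of a 30-day month is roughly a day of lost time:

```python
# Rough conversion of the "~3% worst case" lost time into wall-clock hours,
# assuming a 30-day month (illustrative only; the slide reports percentages).
hours_per_month = 30 * 24
lost_fraction = 0.03
print(f"~{lost_fraction * hours_per_month:.0f} hours lost "
      f"in the worst month")   # ~22 hours
```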

  6. Q3: Complaints • #1: “I need more time” (not a complaint about performance) • Actual usage >80% of wall clock • Some structural improvements still in progress. • Not a whole lot more is possible! • Work needed on: • Rogue OS activity. [recall Prof. Kale’s comment] • MPI & global reduction libraries. [ditto] • System debugging and fragility. • IO performance. • We have delayed full disk deployment to avoid data corruption & instabilities. • Node cleanup: we detect & hold out problem nodes until staff clean them up. • All in all, the users have been VERY pleased. [ditto]
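The “rogue OS activity” and global-reduction items are related: a synchronizing collective runs at the pace of the slowest participant, so rare per-node OS interruptions are felt on almost every step at scale. The toy model below illustrates the effect with made-up parameters; it is not a measurement from TCS.

```python
# Toy model of why "rogue OS activity" hurts global reductions at scale
# (our illustration; parameters are invented, not measured on TCS).
# A reduction step completes only when the slowest of N processes arrives.
import random

def step_time(n_procs, base_ms=1.0, jitter_prob=0.01, jitter_ms=5.0):
    """Time for one globally synchronizing step across n_procs processes."""
    return max(
        base_ms + (jitter_ms if random.random() < jitter_prob else 0.0)
        for _ in range(n_procs)
    )

random.seed(0)
steps = 2000
for n in (4, 64, 750, 3000):
    avg = sum(step_time(n) for _ in range(steps)) / steps
    print(f"{n:5d} processes: avg step {avg:.2f} ms (ideal 1.00 ms)")
```

At small process counts the 1%-probability interruption is rarely seen; at thousands of processes at least one process is interrupted on nearly every step, so the whole collective slows down.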

  7. Full Machine Job • This system is capable of doing big science

  8. TCS (Terascale Computing System) & ETF • Sponsored by the U.S. National Science Foundation • Serving the “very high end” for US academic computational science and engineering • Designed to be used, as a whole, on single problems (recall the full-machine job) • Full range of scientific and engineering applications • Compaq AlphaServer SC hardware and software technology • #6 in Top 500; largest open facility in the world, Nov 2001 • TCS-1: in general production since April 2002 • Integrated into the PACI program (Partnerships for Advanced Computational Infrastructure) • DTF project to build and integrate multiple systems • NCSA, SDSC, Caltech, Argonne; multi-lambda, transcontinental interconnect • ETF aka TeraGrid (Extensible Terascale Facility) integrating TCS with DTF to form • A heterogeneous, extensible scientific/engineering cyberinfrastructure Grid

  9. Infrastructure: PSC TCS machine room (@ Westinghouse) • (Did not require a new building; just a pipe & wire upgrade; not maxed out.) • ~8k ft² total • Using ~2.5k ft² • Existing room (16 yrs old)

  10. Full System: Physical Structure (Floor Layout) • [Floor plan: control, disks, servers, switch, and compute-node areas] • Geometrical constraints invariant twixt US & Japan

  11. Terascale Computing System: Compute Nodes • 750 ES45 4-CPU servers • +13 inline spares • (+2 login nodes) • 4 EV68s/node @ 1 GHz = 2 Gf each [6 Tf total] • 4 GB memory/node [3.0 TB] • 3 × 18.2 GB local disks/node [41 TB, ~90 GB/s aggregate]: system, user temporary, fast snapshots • Tru64 Unix
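The bracketed system-wide totals follow from the per-node figures; a small sanity check (assuming exactly 750 compute nodes, spares excluded):

```python
# Recompute the bracketed system-wide totals from the per-node figures.
nodes = 750
tflops = nodes * 4 * 2.0 / 1000          # 4 EV68s/node x 2 Gflop/s each
memory_tb = nodes * 4 / 1000             # 4 GB memory/node
disk_tb = nodes * 3 * 18.2 / 1000        # 3 x 18.2 GB local disks/node

print(f"Peak compute: {tflops:.1f} Tf")      # 6.0 Tf
print(f"Total memory: {memory_tb:.1f} TB")   # 3.0 TB
print(f"Node disk:    {disk_tb:.1f} TB")     # ~41 TB
```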

  12. ES45 nodes • 5 nodes per cabinet • 3 local disks/node

  13. Terascale Computing System: Quadrics Network • 2 “rails” • Higher bandwidth (~250 MB/s/rail) • Lower latency: 2.5 µs put latency • 1 NIC/node/rail • Federated switch (per rail) • “Fat-tree” (bisection bandwidth ~0.2 TB/s) • User virtual-memory mapped • Hardware retry • Heterogeneous (Alpha Tru64 & Linux, Intel Linux)
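The ~0.2 TB/s bisection-bandwidth figure is consistent with a full-bisection fat-tree over two rails at ~250 MB/s per link; the arithmetic below is our reconstruction, not a statement of the actual switch topology.

```python
# Rough check of the quoted ~0.2 TB/s bisection bandwidth, assuming a
# full-bisection fat-tree in which half the nodes can talk to the other
# half simultaneously on both rails (our assumption).
nodes = 750
rails = 2
per_link_mb_s = 250.0                    # ~250 MB/s per rail per node

bisection_tb_s = (nodes / 2) * rails * per_link_mb_s / 1e6
print(f"~{bisection_tb_s:.2f} TB/s bisection bandwidth")   # ~0.19 TB/s
```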

  14. Central Switch Assembly • 20 cabinets in center • Minimize max internode distance • 3 out of 4 rows shown • 21st LL switch, outside (not shown)

  15. Quadrics wiring overhead (view towards ceiling)

  16. Terascale Computing System: Management & Control • Quadrics switch control: internal SBC & Ethernet • “Insight Manager” on PCs • Dedicated systems for cluster/node monitoring & control • RMS database • Ethernet & serial link

  17. Terascale Computing System: Interactive Nodes • Dedicated: 2 × ES45 • +8 on compute nodes (shared-function nodes) • User access • Gigabit Ethernet to WAN • Quadrics connected • /usr & indexed store (ISMS)

  18. Terascale Computing System: File Servers • 64, on compute nodes • 0.47 TB/server [30 TB total] • ~500 MB/s/server [~32 GB/s total] • Temporary user storage • Direct IO • /tmp • [Each server has 24 disks on 8 SCSI chains on 4 controllers to sustain full drive bandwidth.]
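Recomputing the aggregates from the per-server numbers (and reading “sustain full drive bw” as the per-drive sustained rate, which is our interpretation):

```python
# Aggregate file-server figures recomputed from the per-server numbers.
servers = 64
tb_per_server = 0.47
mb_s_per_server = 500.0
disks_per_server = 24

print(f"Capacity:  {servers * tb_per_server:.0f} TB")             # ~30 TB
print(f"Bandwidth: {servers * mb_s_per_server / 1000:.0f} GB/s")  # ~32 GB/s
# ~500 MB/s spread over 24 disks is ~21 MB/s per drive, roughly the
# sustained rate of a single disk of that era (our reading of the slide).
print(f"Per-disk:  {mb_s_per_server / disks_per_server:.0f} MB/s")
```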

  19. Terascale Computing System: Summary • 750+ ES45 compute nodes • 3000 EV68 CPUs @ 1 GHz • 6 Tf • 3 TB memory • 41 TB node disk, ~90 GB/s • Multi-rail fat-tree network • Redundant monitor/control • WAN/LAN accessible • File servers: 30 TB, ~32 GB/s • Buffer disk store, ~150 TB • Parallel visualization • Mass store, ~1 TB/hr, >1 PB • ETF coupled (heterogeneous)

  20. Terascale Computing System: Visualization • Intel/Linux • Newest software • ~16 nodes • Parallel rendering • HW/SW compositing • Quadrics connected • Image output • Web pages • [Diagram: TCS fabric, 340 GB/s (1520Q); Viz, Buffer Disk, and Application Gateways attached at 4.5 GB/s (20Q) and 3.6 GB/s (16Q); WAN coupled]

  21. Terascale Computing System: Buffer Disk & HSM • Quadrics coupled (~225 MB/s/link) • Intermediate between TCS & HSM • Independently managed • Private transport from TCS • Archive disk • >360 MB/s to tape • HSM: LSCi • [Diagram: TCS fabric, 340 GB/s (1520Q); Viz, Buffer Disk, and Application Gateways at 4.5 GB/s (20Q) and 3.6 GB/s (16Q); WAN/LAN & SDSC]

  22. Terascale Computing System: Application Gateways • Quadrics coupled (~225 MB/s/link) • Coupled to ETF backbone by GigE at 30 Gb/s • [Diagram: TCS fabric, 340 GB/s (1520Q); Viz, Buffer Disk, and Application Gateways at 4.5 GB/s (20Q) and 3.6 GB/s (16Q); multi-GigE to ETF backbone @ 30 Gb/s]
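The bandwidth labels repeated in the last three diagrams appear to be link counts times the ~225 MB/s per-link Quadrics rate quoted on this slide; reading “(NQ)” as a count of N Quadrics links is our assumption. The check below also converts the 30 Gb/s ETF coupling to bytes per second.

```python
# Check that the diagram labels are consistent with ~225 MB/s per Quadrics
# link, treating "(NQ)" as a count of N links (our assumption).
mb_s_per_link = 225.0
for label, links in [("TCS fabric (1520Q)", 1520),
                     ("20Q attachment", 20),
                     ("16Q attachment", 16)]:
    print(f"{label:18s}: {links * mb_s_per_link / 1000:6.1f} GB/s")
# -> ~342, 4.5 and 3.6 GB/s, matching the 340 / 4.5 / 3.6 GB/s labels.

# The ETF backbone coupling, 30 Gb/s of GigE, is ~3.75 GB/s:
print(f"{'ETF backbone':18s}: {30 / 8:6.2f} GB/s")
```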

  23. The Front Row • Yes, those are Pittsburgh sports’ colors.
