SOS7: “Machines Already Operational” NSF’s Terascale Computing System. SOS-7, March 4-6, 2003. Mike Levine, PSC
Outline • Overview of TCS, the US-NSF’s Terascale Computing System. • Answering 3 questions: • Is your machine living up to performance expectations? … • What is the MTBI? … • What is the primary complaint, if any, from users? • [See also PSC web pages & Rolf’s info.]
Q1: Performance • Computational and communications performance is very good! • Alpha processors & ES45 servers: very good. • Quadrics bw & latency: very good. • ~74% of peak on Linpack; >76% on LSMS. • More work needed on disk IO. • This has been a very easy “port” for most users. • Easier than some Cray-to-Cray upgrades.
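As a rough check on what ~74% of peak means in absolute terms, the sketch below multiplies the quoted efficiencies by the 6 Tf peak given in the system summary slides. The function name is illustrative and not from the talk; the LSMS result is a lower bound because the slide only says ">76%".

```python
# Peak figure taken from the system summary slides (3000 CPUs x 2 Gf = 6 Tf).
PEAK_TFLOPS = 6.0


def sustained_tflops(fraction_of_peak: float, peak_tflops: float = PEAK_TFLOPS) -> float:
    """Convert a fraction-of-peak efficiency into sustained Tflop/s."""
    if not 0.0 <= fraction_of_peak <= 1.0:
        raise ValueError("fraction_of_peak must be between 0 and 1")
    return fraction_of_peak * peak_tflops


linpack = sustained_tflops(0.74)  # ~74% of peak on Linpack
lsms = sustained_tflops(0.76)     # >76% on LSMS, so a lower bound
print(f"Linpack ~{linpack:.2f} Tf, LSMS >{lsms:.2f} Tf")
```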
Q2: MTBI (Monthly Average) • Compare with theoretical prediction of 12 hrs. • Expect further improvement (fixing systematic problems).
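The slide does not say how the 12-hour prediction was derived. A common first-order model, assumed here purely for illustration, treats nodes as failing independently at a constant rate, so that the system MTBI is the per-node MTBI divided by the node count. Under that assumption, a 12-hour system MTBI on 750 nodes corresponds to roughly one interrupt per node per year. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def system_mtbi_hours(node_mtbi_hours: float, n_nodes: int) -> float:
    """System MTBI when any single node interrupt interrupts the system
    and nodes fail independently at a constant rate (assumed model)."""
    return node_mtbi_hours / n_nodes


def implied_node_mtbi_hours(system_mtbi: float, n_nodes: int) -> float:
    """Invert the model: per-node MTBI implied by a system-level MTBI."""
    return system_mtbi * n_nodes


node_hours = implied_node_mtbi_hours(12.0, 750)
print(f"{node_hours:.0f} h per node (~{node_hours / HOURS_PER_YEAR:.2f} years)")
```

The same model shows why MTBI is hard to improve at scale: doubling the node count halves the system MTBI unless per-node reliability doubles as well.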
Time Lost to Unscheduled Events • Purple: nodes requiring cleanup • Worst case is ~3%
Q3: Complaints • #1: “I need more time” (not a complaint about performance) • Actual usage >80% of wall clock • Some structural improvements still in progress. • Not a whole lot more is possible! • Work needed on • Rogue OS activity. [recall Prof. Kale’s comment] • MPI & global reduction libraries. [ditto] • System debugging and fragility. • IO performance. • We have delayed full disk deployment to avoid data corruption & instabilities. • Node cleanup • We detect & hold out problem nodes until staff clean them. • All in all, the users have been VERY pleased. [ditto]
Full Machine Job • This system is capable of doing big science
TCS (Terascale Computing System) & ETF • Sponsored by the U.S. National Science Foundation • Serving the “very high end” for US academic computational science and engineering • Designed to be used, as a whole, on single problems (recall full machine job) • Full range of scientific and engineering applications • Compaq AlphaServer SC hardware and software technology • #6 in Top 500 (largest open facility in the world: Nov 2001) • TCS-1: in general production since April, 2002 • Integrated into the PACI program (Partnerships for Advanced Computational Infrastructure) • DTF project to build and integrate multiple systems • NCSA, SDSC, Caltech, Argonne. Multi-lambda, transcontinental interconnect • ETF, aka TeraGrid (Extensible Terascale Facility), integrating TCS with DTF, forming • A heterogeneous, extensible scientific/engineering cyberinfrastructure Grid
Infrastructure: PSC - TCS machine room (@ Westinghouse) (Did not require a new building; just a pipe & wire upgrade; not maxed out) • ~8k ft² • Use ~2.5k • Existing room (16 yrs old)
Full System: Physical Structure • Floor layout [Diagram labels: Control, Disks, Servers, Switch, Compute Nodes] • Geometrical constraints invariant twixt US & Japan
Terascale Computing System: Compute Nodes • 750 ES45 4-CPU servers • +13 inline spares • (+2 login nodes) • 4 EV68’s/node • 1 GHz = 2 Gf [6 Tf] • 4 GB memory [3.0 TB] • 3*18.2 GB disk [41 TB] • System • User temporary • Fast snapshots • [~90 GB/s] • Tru64 Unix
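The bracketed totals on this slide follow from the per-node figures. The sketch below reproduces them, assuming decimal units (1 TB = 1000 GB) and counting only the 750 production nodes, not the spares or login nodes. The per-node and per-disk bandwidth figures are derived here by simple division and are not stated in the talk.

```python
N_NODES = 750          # production compute nodes (spares and login nodes excluded)
CPUS_PER_NODE = 4      # EV68 CPUs per ES45 server
GFLOPS_PER_CPU = 2.0   # 1 GHz = 2 Gf
MEM_GB_PER_NODE = 4
DISKS_PER_NODE = 3
DISK_GB = 18.2
LOCAL_DISK_GB_S = 90.0  # aggregate local-disk bandwidth quoted as ~90 GB/s

cpus = N_NODES * CPUS_PER_NODE
peak_tf = cpus * GFLOPS_PER_CPU / 1000
mem_tb = N_NODES * MEM_GB_PER_NODE / 1000
disk_tb = N_NODES * DISKS_PER_NODE * DISK_GB / 1000

# Derived (not on the slide): share of the ~90 GB/s per node and per disk.
per_node_mb_s = LOCAL_DISK_GB_S * 1000 / N_NODES
per_disk_mb_s = per_node_mb_s / DISKS_PER_NODE

print(f"{cpus} CPUs, {peak_tf:.1f} Tf peak, {mem_tb:.1f} TB memory, {disk_tb:.1f} TB disk")
print(f"~{per_node_mb_s:.0f} MB/s per node, ~{per_disk_mb_s:.0f} MB/s per disk")
```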
ES45 nodes • 5 nodes per cabinet • 3 local disks /node
Terascale Computing System: Quadrics Network • 2 “rails” • Higher bandwidth (~250 MB/s/rail) • Lower latency (2.5 µs put latency) • 1 NIC/node/rail • Federated switch (/rail) • “Fat-tree” (bbw ~0.2 TB/s) • User virtual memory mapped • Hardware retry • Heterogeneous (Alpha Tru64 & Linux, Intel Linux) • [Diagram labels: Quadrics, Compute Nodes]
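The network figures can be related with two first-order models, sketched below. Both are assumptions made here, not methods from the talk: "bbw" is read as bisection bandwidth of a full-bisection fat-tree in which every node drives both rails, the put latency is read as 2.5 microseconds, and message time is modelled as fixed latency plus size over bandwidth. With those assumptions the bisection estimate comes out close to the quoted ~0.2 TB/s.

```python
N_NODES = 750
RAILS = 2
RAIL_MB_S = 250.0     # ~250 MB/s per rail; 1 MB/s equals 1 byte per microsecond
PUT_LATENCY_US = 2.5  # put latency, read as microseconds


def bisection_tb_s(n_nodes: int = N_NODES, rails: int = RAILS,
                   rail_mb_s: float = RAIL_MB_S) -> float:
    """Bandwidth across a cut splitting the nodes in half, assuming a
    full-bisection fat-tree where each node drives all rails at once."""
    return (n_nodes // 2) * rails * rail_mb_s / 1e6


def put_time_us(nbytes: int, rails: int = 1) -> float:
    """First-order transfer time: fixed latency plus size over bandwidth."""
    return PUT_LATENCY_US + nbytes / (rails * RAIL_MB_S)


def half_performance_bytes(rails: int = 1) -> float:
    """Message size at which latency equals the pure transfer time."""
    return PUT_LATENCY_US * rails * RAIL_MB_S


print(f"bisection ~{bisection_tb_s():.4f} TB/s")
print(f"1 MB put on one rail ~{put_time_us(1_000_000):.1f} us")
print(f"half-performance size ~{half_performance_bytes():.0f} bytes")
```

Messages well below the half-performance size are dominated by latency, which is one reason the talk lists latency and bandwidth separately.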
Central Switch Assembly • 20 cabinets in center • Minimize max internode distance • 3 out of 4 rows shown • 21st LL switch, outside (not shown)
Terascale Computing System: Management & Control • Quadrics switch control: • Internal SBC & Ethernet • “Insight Manager” on PC’s • Dedicated systems • Cluster/node monitoring & control • RMS database • Ethernet & serial link • [Diagram labels: Quadrics, Control, LAN, Compute Nodes]
Terascale Computing System: Interactive Nodes • Dedicated: 2*ES45 • +8 on compute nodes • Shared function nodes • User access • Gigabit Ethernet to WAN • Quadrics connected • /usr & indexed store (ISMS) • [Diagram labels: Interactive, Quadrics, Control, LAN, Compute Nodes, /usr, WAN/LAN]
Terascale Computing System: File Servers • 64, on compute nodes • 0.47 TB/server [30 TB] • ~500 MB/s [~32 GB/s] • Temporary user storage • Direct IO • /tmp • [Each server has 24 disks on 8 SCSI chains on 4 controllers; sustains full drive bw.] • [Diagram labels: Interactive, Quadrics, Control, LAN, Compute Nodes, File Servers, /tmp, /usr, WAN/LAN]
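The bracketed file-server totals are the per-server figures multiplied by 64. The sketch below reproduces them and adds two derived numbers that are not on the slide: the per-disk share of a server's ~500 MB/s, and a hypothetical estimate of how long writing the machine's full 3.0 TB of memory to /tmp would take if every server sustained its rated bandwidth. Decimal units are assumed throughout.

```python
N_SERVERS = 64
TB_PER_SERVER = 0.47
MB_S_PER_SERVER = 500.0
DISKS_PER_SERVER = 24


def total_capacity_tb() -> float:
    """Aggregate /tmp capacity across all file servers."""
    return N_SERVERS * TB_PER_SERVER


def total_bandwidth_gb_s() -> float:
    """Aggregate file-server bandwidth in GB/s."""
    return N_SERVERS * MB_S_PER_SERVER / 1000


def per_disk_mb_s() -> float:
    """Derived: bandwidth each of a server's disks must sustain."""
    return MB_S_PER_SERVER / DISKS_PER_SERVER


def dump_time_s(data_tb: float, bw_gb_s: float) -> float:
    """Idealised time to write data_tb at bw_gb_s, ignoring all overheads."""
    return data_tb * 1000 / bw_gb_s


print(f"{total_capacity_tb():.1f} TB, {total_bandwidth_gb_s():.0f} GB/s")
print(f"~{per_disk_mb_s():.1f} MB/s per disk")
print(f"3.0 TB memory dump ~{dump_time_s(3.0, total_bandwidth_gb_s()):.0f} s")
```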
Terascale Computing System: Summary • 750+ ES45 Compute Nodes • 3000 EV68 CPU’s @ 1 GHz • 6 Tf • 3 TB memory • 41 TB node disk, ~90 GB/s • Multi-rail fat-tree network • Redundant monitor/ctrl • WAN/LAN accessible • File servers: 30 TB, ~32 GB/s • Buffer disk store, ~150 TB • Parallel visualization • Mass store, ~1 TB/hr, > 1 PB • ETF coupled (hetero) • [Diagram labels: Interactive, Quadrics, Control, LAN, Compute Nodes, File Servers, /tmp, /usr, WAN/LAN]
Terascale Computing System: Visualization • Intel/Linux • Newest software • ~16 nodes • Parallel rendering • HW/SW compositing • Quadrics connected • Image output • Web pages + • [Diagram labels: TCS 340 GB/s (1520Q); Quadrics; 4.5 GB/s (20Q), 3.6 GB/s (16Q), 3.6 GB/s (16Q); Application Gateways, Viz, Buffer Disk; WAN coupled]
Terascale Computing System: Buffer Disk & HSM • Quadrics coupled (~225 MB/s/link) • Intermediate between TCS & HSM • Independently managed • Private transport from TCS • [Diagram labels: TCS 340 GB/s (1520Q); Quadrics; 4.5 GB/s (20Q), 3.6 GB/s (16Q), 3.6 GB/s (16Q); Application Gateways, Viz, Buffer Disk; Archive disk; >360 MB/s to tape; HSM - LSCi; WAN/LAN & SDSC]
Terascale Computing System: Application Gateways • Quadrics coupled (~225 MB/s/link) • Coupled to ETF backbone by GigE, 30 Gb/s • [Diagram labels: TCS 340 GB/s (1520Q); Quadrics; 4.5 GB/s (20Q), 3.6 GB/s (16Q), 3.6 GB/s (16Q); Application Gateways, Viz, Buffer Disk; Multi GigE to ETF Backbone @ 30 Gb/s]
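The diagram's bandwidth labels appear to be link counts multiplied by the ~225 MB/s per-link figure. The sketch below checks this, assuming that a label such as "(20Q)" means 20 Quadrics links; that reading is an interpretation, not something the slides state. It also converts the 30 Gb/s ETF backbone coupling into GB/s for comparison with the Quadrics-side figures.

```python
LINK_MB_S = 225.0  # ~225 MB/s per Quadrics link


def aggregate_gb_s(n_links: int, link_mb_s: float = LINK_MB_S) -> float:
    """Aggregate bandwidth of n_links parallel links, in GB/s."""
    return n_links * link_mb_s / 1000


def gbit_to_gbyte_per_s(gbit_s: float) -> float:
    """Convert Gb/s to GB/s (8 bits per byte, no protocol overhead)."""
    return gbit_s / 8


# Link counts read off the diagram labels (interpretation assumed).
for label, n_links in [("1520Q", 1520), ("20Q", 20), ("16Q", 16)]:
    print(f"{label}: ~{aggregate_gb_s(n_links):.1f} GB/s")

print(f"ETF backbone: {gbit_to_gbyte_per_s(30):.2f} GB/s")
```

On this reading the 1520-link figure comes to about 342 GB/s, which the slide rounds to 340 GB/s.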
The Front Row • Yes, those are Pittsburgh sports’ colors.