Reviewing Dash
• Unique characteristics:
  • A pre-production/evaluation “data-intensive” supercomputer based on SSD flash memory and virtual shared memory
  • Nehalem processors
• Integrating into TeraGrid:
  • Add to the TeraGrid Resource Catalog
  • Target friendly users interested in exploring its unique capabilities
  • Available initially for start-up allocations (March 2010)
  • As it stabilizes, and depending on user interest, evaluate more routine allocations at the TRAC level
  • Appropriate CTSS kits will be installed
  • Planned to support TeraGrid wide-area filesystem efforts (GPFS-WAN, Lustre-WAN)
Introducing Gordon (SDSC’s Track 2d System)
• Unique characteristics:
  • A “data-intensive” supercomputer based on SSD flash memory and virtual shared memory
  • Emphasizes memory and I/O over FLOPS
  • Designed to accelerate access to the massive databases being generated in all fields of science, engineering, medicine, and social science
  • Sandy Bridge processors
• Integrating into TeraGrid:
  • Will be added to the TeraGrid Resource Catalog
  • Appropriate CTSS kits will be installed
  • Planned to support TeraGrid wide-area filesystem efforts
• Coming summer 2011
The Memory Hierarchy
[Figure: the memory hierarchy, with flash SSD (O(TB) capacity, ~1000-cycle access latency) sitting between DRAM and disk]
• Potential 10x speedup for random I/O to large files and databases
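To make the hierarchy concrete, here is a minimal back-of-the-envelope sketch. Only the ~1000-cycle flash figure is from the slide; the clock rate and the DRAM and disk latencies are assumed, order-of-magnitude values, not measured Dash/Gordon numbers.

```python
# Rough random-access latencies across the hierarchy. Only the flash
# cycle count is the slide's figure; everything else is an assumption.

CLOCK_HZ = 2.5e9  # assumed core clock rate

ACCESS_CYCLES = {
    "DRAM":      100,         # assumption: ~100 cycles
    "Flash SSD": 1_000,       # figure from the slide
    "Disk":      25_000_000,  # assumption: ~10 ms seek at 2.5 GHz
}

for tier, cycles in ACCESS_CYCLES.items():
    latency_s = cycles / CLOCK_HZ
    iops = 1.0 / latency_s  # random operations per second per stream
    print(f"{tier:9s} {cycles:12,d} cycles  "
          f"{latency_s * 1e6:12.2f} us  {iops:14,.0f} random IOPS")
```

Even with these rough inputs, flash sits orders of magnitude above disk in random IOPS, which is why a conservative ~10x application-level speedup for random I/O to large files and databases is plausible.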
Gordon Architecture: “Supernode”
• 32 Appro Extreme-X compute nodes
  • Dual-processor Intel Sandy Bridge
  • 240 GFLOPS
  • 64 GB RAM
• 2 Appro Extreme-X I/O nodes
  • Intel SSD drives, 4 TB each
  • 560,000 IOPS
• ScaleMP vSMP virtual shared memory
  • 2 TB RAM aggregate
  • 8 TB SSD aggregate
[Diagram: compute nodes (240 GF, 64 GB RAM each) and 4 TB SSD I/O nodes tied together by vSMP memory virtualization]
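A quick sanity check that the per-node figures above add up to the stated supernode aggregates; this sketch uses only the slide’s numbers.

```python
# Supernode aggregates derived from the per-node figures on this slide.

COMPUTE_NODES = 32
GF_PER_NODE = 240        # GFLOPS per dual-socket Sandy Bridge node
RAM_GB_PER_NODE = 64

IO_NODES = 2
SSD_TB_PER_IO_NODE = 4

peak_tflops = COMPUTE_NODES * GF_PER_NODE / 1000
ram_tb = COMPUTE_NODES * RAM_GB_PER_NODE / 1024
ssd_tb = IO_NODES * SSD_TB_PER_IO_NODE

print(f"Peak compute:  {peak_tflops:.2f} TFLOPS")  # 7.68 TFLOPS
print(f"Aggregate RAM: {ram_tb:.0f} TB")           # matches "2 TB RAM aggregate"
print(f"Aggregate SSD: {ssd_tb} TB")               # matches "8 TB SSD aggregate"
```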
Gordon Architecture: Full Machine
• 32 supernodes = 1024 compute nodes
• Dual-rail QDR InfiniBand network
• 3D torus (4x4x4)
• 4 PB rotating-disk parallel file system
  • >100 GB/s
[Diagram: the 32 supernodes and disk arrays connected by the torus network]
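Combining this slide with the supernode slide gives the whole-machine totals; a sketch, with all inputs taken from the two slides.

```python
# Full-machine totals implied by the supernode and full-machine slides.

SUPERNODES = 32
NODES_PER_SN = 32   # compute nodes per supernode
RAM_TB_PER_SN = 2
SSD_TB_PER_SN = 8
TORUS = (4, 4, 4)

print(f"Compute nodes:   {SUPERNODES * NODES_PER_SN}")      # 1024
print(f"Total RAM:       {SUPERNODES * RAM_TB_PER_SN} TB")  # 64 TB
print(f"Total flash:     {SUPERNODES * SSD_TB_PER_SN} TB")  # 256 TB
x, y, z = TORUS
print(f"Torus positions: {x * y * z}")  # 64 locations on the 4x4x4 torus
```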
Comparing Dash and Gordon systems
• Doubling capacity halves accessibility to any random data on a given medium
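The bullet above is the capacity/accessibility trade-off: a device’s random IOPS are roughly fixed by its mechanics and controller, so doubling capacity halves the IOPS available per stored terabyte. A minimal sketch, with an assumed disk IOPS figure:

```python
# Illustration of the capacity/accessibility trade-off. The per-device
# IOPS figure is an assumed, order-of-magnitude value.

def iops_per_tb(device_iops: float, capacity_tb: float) -> float:
    """Random-access 'accessibility' of data stored on one device."""
    return device_iops / capacity_tb

DISK_IOPS = 150  # assumption: a 7,200 rpm disk, roughly

for capacity in (1, 2, 4, 8):  # TB per device
    print(f"{capacity} TB disk: {iops_per_tb(DISK_IOPS, capacity):7.1f} IOPS/TB")
```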
Data mining applications will benefit from Gordon
• De novo genome assembly from sequencer reads, and analysis of galaxies from cosmological simulations and observations
  • Will benefit from large shared memory
• Federations of databases, and interaction-network analysis for drug discovery, social science, biology, epidemiology, etc.
  • Will benefit from low-latency I/O from flash
Data-intensive predictive science will benefit from Gordon
• Solution of inverse problems in oceanography, atmospheric science, and seismology
  • Will benefit from a balanced system, especially large RAM per core and fast I/O
• Modestly scalable codes in quantum chemistry and structural engineering
  • Will benefit from large shared memory
We won the SC09 Data Challenge with Dash!
• With these numbers (IOR, 4 KB transfers):
  • RAMFS: 4 million+ IOPS on up to 0.75 TB of DRAM (one supernode’s worth)
  • Flash: 88K+ IOPS on up to 1 TB of flash (one supernode’s worth)
  • Sped up Palomar Transients database searches 10x to 100x
  • Best IOPS per dollar
• Since then we have boosted flash IOPS to 540K, hitting our 2011 performance targets
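For context, converting these IOPS figures to sustained random bandwidth at the 4 KB IOR transfer size; a sketch using only the slide’s numbers.

```python
# IOPS-to-bandwidth conversion for the Dash SC09 results (4 KB IOR).

IO_BYTES = 4 * 1024  # 4 KB transfer size used in the benchmark

results = [
    ("RAMFS (DRAM)",       4_000_000),  # one supernode's DRAM
    ("Flash (at SC09)",       88_000),  # one supernode's flash
    ("Flash (since tuned)",  540_000),  # post-SC09, 2011 target reached
]

for name, iops in results:
    gib_s = iops * IO_BYTES / 2**30
    print(f"{name:19s} {iops:9,d} IOPS  ~ {gib_s:5.2f} GiB/s random")
```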
Deployment Schedule
• Summer 2009 – present: internal evaluation and testing with internal apps (SSD and vSMP)
• Starting ~March 2010: Dash allocated via start-up requests by friendly TeraGrid users
• Summer 2010: expect to change status to an allocable system, starting ~October 2010, via TRAC requests, with preference given to applications that target Dash’s unique technologies
• October 2010 – June 2011: operate Dash as an allocable TeraGrid resource, available through the normal POPS/TRAC cycles, with appropriate caveats about preferred applications and friendly-user status; helps fill the SMP gap created by the Altix systems being retired in 2010
• March 2011 – July 2011: Gordon build and acceptance
• July 2011 – June 2014: operate Gordon as an allocable TeraGrid resource, available through the normal POPS/TRAC cycles
Consolidating Archive Systems
• SDSC has historically operated two archive systems: HPSS and SAM-QFS
• Due to budget constraints, we are consolidating to one: SAM-QFS
• We are currently migrating HPSS user data to SAM-QFS
• Migration timeline (reconstructed from the slide graphic):
  • Jul 2009: HPSS (R/W; 6 silos, 12 PB, 64 tape drives); SAM-QFS (R/W; 2 silos, 6 PB, 32 tape drives)
  • Mid 2010: HPSS becomes read-only; SAM-QFS split into Legacy (R) and Allocated (R/W); hardware unchanged
  • Mar 2011: SAM-QFS Legacy (R) and Allocated (R/W) continue; hardware unchanged
  • Jun 2013 onward: SAM-QFS (R); hardware TBD