“SKIF-GRID” SUPERCOMPUTING PROJECT OF THE UNION STATE OF RUSSIA AND BELARUS
SHORT OVERVIEW OF CURRENT STATUS
A. A. Moskovsky, Program Systems Institute, Russian Academy of Sciences
IKI–MSR Research Workshop, Moscow, 10–12 June 2009
Pereslavl-Zalessky
• Russian Golden Ring city, 857 years old
• Hometown of Great Dukes of Russia
• The first building site of Peter the Great's navy
• Ancient capital of the Russian Orthodox church
• 120 km from Moscow
“SKIF-GRID” PROJECT TIMELINE
• 2000-2004: SKIF project; SKIF K-1000 ranked #98 in the Top500
• June 2004: first proposal filed for the “SKIF-GRID” project
• March 2007: approved by the Government
• March 2008: SKIF-MSU supercomputer deployed (#36 in the June 2008 Top500)
• May 2008: “SKIF-Testbed” federation created
• March 2009: alliance agreement signed for SKIF Series 4 development
PROJECT ORGANIZATION: 2007-2008
Project directions:
• Grid technology
• Supercomputers
  • SW
  • HW
• Security
• Pilot projects – applications of HPC and grid technology
SKIF MSU
• Theoretical peak performance: 60 TFlops
• 47 TFlops Linpack
• Advanced clustering solutions:
  • Diskless computational nodes
  • Original blade design
«SKIF-Testbed», a.k.a. “SKIF-Polygon”
• Federation of HPC centers, ~100 TFlops
• 4 computers in the current Top500:
  • MSU (#35 in the Top500)
  • South Urals State University
  • Tomsk State University
  • Ufa State Technical University
Middleware platform – UNICORE 6.1
• X.509 certificates for security
• Certificate Authority at Pereslavl-Zalessky (PyCA)
• Site platform:
  • UNICORE 6.1
  • Java 1.5
  • Linux
  • Torque
• Experimental sites: UNICORE is complemented with additional services/modules
Applications (2007-2008)
• HPC applications:
  • Drug design (MSU Belozersky Institute, SRCC, Chelyabinsk SU)
  • Inverse problems in soil remote sensing (SRCC)
  • Computational chemistry (MSU Chemistry Department)
• Geophysical data services
• Mammography database prototype (N.N. Semenov Chemical Physics Institute, RAS)
• Text mining (PSI RAS)
• Engineering (South Ural University …)
• Space Research Institute …
• …
2009-2010: second phase of the SKIF-GRID project
SKIF-Aurora
SKIF Series 4: original R&D goals
• Highest performance density (as many CPUs per 1U as possible):
  • Lower latency
  • Fewer cables and connectors, hence better reliability
  • More heat emitted per 1U, so a new cooling technology is needed
• Improved interconnect: better scalability, bandwidth, and latency than the best available solutions provide (e.g., Infiniband QDR)
• New approach to monitoring and management of the supercomputer
• Combining standard CPUs and accelerators in the computational nodes of the supercomputer
Summer 2008: SKIF Series 4 Know-How!
Program Systems Institute of RAS
• Italian-Russian cooperation
• «SKIF Series 4» == «SKIF-AURORA Project»
• Designed by an alliance of Eurotech, PSI RAS, and RSC SKIF, with support from Intel
• To be presented at ISC'09
SKIF-Aurora distinctive features
• No moving parts
• Liquid cooling for power efficiency
• x86_64 processors (Intel Nehalem)
• 3-D torus interconnect
• Redundant management/monitoring subsystem
• FPGA on board (optional)
• SSD disks (optional)
• QDR Infiniband
SKIF-Aurora
• 32 nodes per chassis (64 CPUs in 6U)
• Up to 8 chassis per rack
• Up to 512 CPUs and 2048 cores per rack
• 10 kW per chassis
• Scalable due to the 3-D torus
• Goal: 500 TFlops from 21 racks in 2009 (see the check below)
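A quick consistency check of these figures, assuming dual-socket nodes and quad-core Nehalem-EP CPUs at 2.93 GHz sustaining 4 double-precision flops per core per cycle (the slides do not name the exact SKU, so the clock rate is an assumption):

$$
\begin{aligned}
\text{CPUs per rack} &= 8~\text{chassis} \times 32~\text{nodes} \times 2~\text{CPUs} = 512\\
\text{cores per rack} &= 512 \times 4 = 2048\\
R_{\text{peak per rack}} &\approx 2048 \times 2.93\,\text{GHz} \times 4 \approx 24~\text{TFlops}\\
R_{\text{peak, 21 racks}} &\approx 21 \times 24 \approx 504~\text{TFlops}
\end{aligned}
$$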
SKIF-AURORA: designed by the alliance of Eurotech, PSI RAS, and RSC SKIF
Division of work:
• Level 3 of the management system; interconnect (3D torus: firmware, routing, drivers, MPI-2 …); FPGA as accelerator
• PCBs, mechanics, power supply, cooling; levels 1 and 2 of the management system
3-D torus interconnect implementation
• Only the QCD-specific part is implemented by the Italian team
• Russian teams are to upgrade the network to a general-purpose interconnect (MPI 2.0), due to appear in fall 2009
[Diagram: the non-standard system interconnect (3-D torus) links the per-node FPGAs, while a standard subsidiary Infiniband interconnect links the CPUs]
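As an illustration of what a general-purpose MPI 2.0 interconnect buys applications, here is a minimal sketch in C that places ranks on a 3-D torus using the standard MPI Cartesian-topology calls. This is generic MPI, not the SKIF-Aurora driver stack, and the grid shape is left for the library to choose:

```c
/* Minimal sketch: placing MPI ranks on a 3-D torus via standard
 * MPI-2 Cartesian topology calls. Generic MPI only; the actual
 * SKIF-Aurora firmware/routing layer is not shown. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Let MPI factor the process count into a 3-D grid. */
    int dims[3] = {0, 0, 0};
    MPI_Dims_create(nprocs, 3, dims);

    /* periods[i] = 1 wraps every dimension around: a torus. */
    int periods[3] = {1, 1, 1};
    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods,
                    /* reorder = */ 1, &torus);

    int rank, coords[3];
    MPI_Comm_rank(torus, &rank);   /* rank may have been reordered */
    MPI_Cart_coords(torus, rank, 3, coords);

    /* Nearest neighbours along the X dimension (always defined on
     * a torus, since the dimension wraps around). */
    int left, right;
    MPI_Cart_shift(torus, 0, 1, &left, &right);

    printf("rank %d at (%d,%d,%d), x-neighbours: %d and %d\n",
           rank, coords[0], coords[1], coords[2], left, right);

    MPI_Comm_free(&torus);
    MPI_Finalize();
    return 0;
}
```

With reorder set to 1, the MPI library may renumber ranks so that Cartesian neighbours land on physically adjacent nodes; that is precisely where a native 3-D torus pays off over a switched fabric.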
R&D directions using FPGA
• Collective MPI operations using FPGA (see the baseline sketch below)
• FPGA to facilitate support of PGAS languages (UPC, Titanium, etc.)
• FPGA+CPU hybrid computing
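For the first item, the following C sketch shows the kind of host-side collective an FPGA offload would target, written as a plain MPI_Allreduce; the FPGA-assisted variant is the research goal and has no standard API, so only the software baseline appears here:

```c
/* Sketch: the kind of collective an in-network FPGA offload targets,
 * shown as an ordinary host-side MPI_Allreduce. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank contributes one partial sum ... */
    double local = (double)(rank + 1);
    double global = 0.0;

    /* ... and every rank receives the total. On a torus this
     * reduction traverses many hops; doing the arithmetic in the
     * network FPGAs would remove a host round-trip at each hop. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over all ranks = %g\n", global);

    MPI_Finalize();
    return 0;
}
```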
Conclusions
• The project is based on collaboration between international teams
• It harnesses shared expertise and results
• It aims to develop a family of petascale-level supercomputers with innovative techniques:
  • Higher density of CPUs (flops per volume)
  • Efficient water-cooling system
  • Scalable, powerful 3D-torus interconnect
  • Etc.
THANKS
SKIF-GRID web site: http://skif-grid.botik.ru