Combining the Power of Computer and Computational Sciences to Fly to Peta-Scale: a Case Study
Hiroshi Nakashima, Academic Center for Computing and Media Studies, Kyoto University
special thanks to: Y. Omura & H. Usui (RISH, Kyoto U.)
Contents
• Introduction: Combining CS² Power
  • Why Need to Fly to Peta-Scale?
  • What Kind of Power to Be Combined?
• Case Study: Plasma Simulation on DM Systems
  • Why Plasma Simulation?
  • Why for DM Systems?
  • How for DM Systems?
  • How Efficient?
• Fly from Case Study
  • Took off Successfully?
  • How Can We Fly Higher?
• Conclusions
Why Need to Combine CS² Power? / Fly to Peta: How High? (1/2)
T2K Open Supercomputer in Kyoto: Rpeak/Rmax = 61.2/50.5 TFLOPS (#34)
• system: node-group × 70 + 288-port switch × 6
• node-group: node × 6 + 24-port switch × 2
• node: (socket + memory bank) × 4 + IB × 4
• socket: core × 4 + L3
• core: (mul + add) × 2 + (L1 + L2)
→ already large enough (16 cores × 416 nodes = 6,656 cores)
→ already layered deeply and complicatedly enough (core → socket → node → node-group → system)
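As a sanity check on the Rpeak figure (the 2.3 GHz clock of the nodes' quad-core Opterons is our assumption; the slide does not state it):

  6,656 cores × 4 flops/cycle/core × 2.3 GHz ≈ 61.2 TFLOPS = Rpeak

where the 4 flops per cycle follow from the (mul + add) × 2 units per core.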
Why Need to Combine CS² Power? / Fly to Peta: How High? (2/2)
A peta-scale system should be:
• much larger (1,000,000 cores ≈ 6,656 cores × 150)
• much more deeply and complicatedly layered (core → core-group → socket → socket-group → node → node-group → node-supergroup → system)
BTW, how large is Peta?
• 1 Peta meter ≈ 0.1 light-year (more than 100 times the Sun–Pluto distance)
• 1 Peta second > 30 million years
• 1 Peta kg > 1/2 × the mass of Deimos
• 1 Peta Hz > the frequency of violet light
Why Need to Combine CS² Power? / What Are Combined to Fly?
Computational scientists have deep knowledge of:
• physics, chemistry, biology, ...
• their own problems, algorithms, programs, ...
• (sometimes) their own supercomputers
and (often?) have Nature/Science papers.
Computer scientists have deep knowledge of:
• a wide variety of computers, software, tools, ...
• a wide variety of algorithms, techniques, tricks, ...
• (sometimes) a few scientific problems
but never dream of authoring a Nature/Science paper.
Combining the two, computational scientists gain a much more efficient way to fully exploit peta-scale computing power, hence more Nature/Science papers and a chance to win a Nobel Prize; computer scientists gain a chance to co-author a Nature/Science paper and to attend the Nobel Prize ceremony.
Case Study: Plasma Simulation on DM / Why Plasma Simulation?
A big user group of plasma simulation insisted that our new system should include a power- and money-hungry subsystem of large-scale shared-memory nodes (128 cores, 1 TB, 1.28 TFLOPS) for their memory-hungry SM-parallel application.
I failed to persuade them to build an Open-Supercomputer-only system. So I swore revenge on them by coding a much more efficient DM-parallel program to run on the Open Supercomputer.
Case Study: Plasma Simulation on DM / Why for DM Systems? (1/2)
Simulate the movement of a large number of charged particles (e.g., more than 1 billion) in a large-scale electromagnetic field (e.g., a 2000 × 2000 × 2000 grid modeling the magnetosphere).
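For readers who have not seen a particle-in-cell (PIC) code, a minimal sketch of the per-particle work in one time step follows. It is illustrative only: the 1-D electrostatic simplification, the `Particle` layout, and the update order are our assumptions, not the group's actual simulator.

```c
#include <stddef.h>

typedef struct { double x, v, q, m; } Particle;

/* One simplified PIC step on a 1-D periodic grid:
 * gather the field at each particle, accelerate, then move. */
void pic_step(Particle *p, size_t np, const double *E, size_t ngrid,
              double dx, double dt)
{
    for (size_t i = 0; i < np; i++) {
        size_t cell = (size_t)(p[i].x / dx) % ngrid;  /* locate grid cell */
        double a = p[i].q * E[cell] / p[i].m;         /* field -> accel.  */
        p[i].v += a * dt;                             /* accelerate       */
        p[i].x += p[i].v * dt;                        /* move particle    */
        if (p[i].x < 0.0)          p[i].x += ngrid * dx;  /* periodic */
        if (p[i].x >= ngrid * dx)  p[i].x -= ngrid * dx;  /* wrap     */
    }
    /* ...then scatter charge back onto the grid and re-solve for E. */
}
```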
Case Study: Plasma Simulation on DM / Why for DM Systems? (2/2)
• Particle-only parallelization is very simple, especially on SM systems,
• but memory for the particles runs short on SM,
• and memory for the grid points runs short even on DM, if every node replicates the whole field (see the estimate below).
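To make the memory pressure concrete, a back-of-the-envelope estimate; the data layouts (6 doubles per grid point for the field, 6 doubles per particle) and the roughly 32 GB of memory per T2K node are our assumptions, not figures from the slide:

```c
/* Rough memory estimate for the problem sizes quoted above. */
#include <stdio.h>

int main(void)
{
    double grid_pts  = 2000.0 * 2000.0 * 2000.0;  /* 8e9 grid points      */
    double particles = 1e9;                       /* "> 1 billion"        */
    double field_gb  = grid_pts  * 6 * 8 / 1e9;   /* ~384 GB of field     */
    double part_gb   = particles * 6 * 8 / 1e9;   /* ~48 GB of particles  */
    printf("field: %.0f GB, particles: %.0f GB\n", field_gb, part_gb);
    /* ~384 GB of field cannot be replicated on ~32 GB DM nodes, so the
       field itself must be domain-decomposed, not just the particles. */
    return 0;
}
```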
Case Study: Plasma Simulation on DMHow for DM Systems ? (1/3) 03 13 23 33 03 13 23 33 02 12 22 32 02 00 10 12 11 20 31 22 02 30 33 32 03 01 11 21 31 01 11 21 31 21 01 23 13 32 OhHelp: One-handedHelp 00 10 20 30 00 10 20 30 22 primary subspaces secondary subspaces • uniform block decomposition • well-balanced: #particle-in-subspace #p / #nodes (1 + ) simulate primary particles neighboring comm. only • each node helps another node having dense subspace • balanced #particles • balanced subspace size • simple boundary comp/comm • well-balancedstable ss ass.
Case Study: Plasma Simulation on DM / How for DM Systems? (2/3)
Secondary space assignment (a sketch follows below):
• move particles from the heaviest node to the lightest, so that the lightest comes up to the average #particles
• a donor gives particles away even if it drops below the average; it gets help from somebody afterward
[Figure: nodes 00–33 ordered by load against the average-#particles line]
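A minimal sketch of this heaviest-to-lightest pairing; the array names, the example loads, and the single-level simplification are our assumptions (the real OhHelp bookkeeping also records which subspace each helper serves):

```c
/* Greedy secondary assignment sketch: the currently heaviest node
 * repeatedly tops the currently lightest one up to the average,
 * even if the donor itself dips below the average for a while. */
#include <stdio.h>

#define NNODES 4

int main(void)
{
    long load[NNODES] = {900, 50, 30, 20}; /* particles per primary subspace */
    long total = 0;
    for (int i = 0; i < NNODES; i++) total += load[i];
    long avg = total / NNODES;             /* target load per node */

    for (;;) {
        int hi = 0, lo = 0;
        for (int i = 1; i < NNODES; i++) {
            if (load[i] > load[hi]) hi = i;
            if (load[i] < load[lo]) lo = i;
        }
        if (load[lo] >= avg) break;        /* everyone at/above average */
        long move = avg - load[lo];        /* bring lightest up to avg; */
        load[hi] -= move;                  /* donor may go below avg    */
        load[lo] += move;                  /* and get help later on     */
        printf("node %d helps node %d with %ld particles\n", lo, hi, move);
    }
    return 0;
}
```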
Case Study: Plasma Simulation on DM / How for DM Systems? (3/3)
Well-balancing check with the primary/secondary tree (a sketch follows below):
• check recursively from the leaves to the root; the assignment is OK if no overflow is detected
• each inner node must hold all primaries not covered by its children, and covers secondaries up to the well-balancing limit
• the root must hold all primaries, and covers secondaries up to the well-balancing limit
[Figure: the primary/secondary tree over nodes 00–33]
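The check might be realized as in the following sketch; the `Tree` structure, the per-node `limit`, and the rule that overflow surviving at the root means failure are reconstructed from the bullet points above, so treat them as assumptions rather than the actual OhHelp code:

```c
#include <stdio.h>

typedef struct Tree {
    struct Tree *child[2];  /* NULL for leaves */
    long primaries;         /* primary particles this node must keep   */
    long limit;             /* well-balancing limit, e.g. avg * (1+ε)  */
} Tree;

/* Returns (particles in subtree) - (capacity in subtree).  A positive
 * result is overflow that must be absorbed higher up; the assignment
 * is well-balanced iff the value at the root is <= 0. */
long check(const Tree *t)
{
    long balance = t->primaries - t->limit;
    for (int i = 0; i < 2; i++)
        if (t->child[i])
            balance += check(t->child[i]);
    return balance;
}

int main(void)
{
    Tree a = {{0, 0}, 900, 300}, b = {{0, 0}, 100, 300};
    Tree root = {{&a, &b}, 0, 300};
    /* 1000 particles vs. 900 total capacity: overflow of 100. */
    printf("overflow: %ld\n", check(&root));
    return 0;
}
```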
Case Study: Plasma Simulation on DM / How Efficient?
[Speedup chart: performance at 16–128 processors on an HPC2500 and on 4 nodes (64 cores) of the T2K Open Supercomputer, comparing the original, unbalanced, and OhHelp-balanced versions; the labeled speedups are ×1.66, ×3.20, ×4.02, ×8.76, ×10.7, and ×11.71, with the balanced version best at ×11.71]
Fly from Case Study / Took off Successfully?
The plasma simulation group now:
• appreciates OhHelp and the Open Supercomputer (but has not published Nature/Science papers yet)
• is planning to port its codes to the Open Supercomputer
• hopes for our help in recoding a variety of simulators.
We supercomputer guys now:
• are happy with accomplishing the revenge
• are generously pursuing cooperative research with them (hoping at least to get an SC paper)
• but cannot find time to do everything they want.
Fly from Case Study / How Can We Fly Higher?
• Plasma guys have a wide variety of simulators, and other guys have other varieties of other simulators.
• We supercomputer guys have OhHelp, which has to be adapted to each simulator by modifying not only OhHelp itself but also the simulator.
• Expectedly, we will find other computer-scientific tricks for other types of simulators.
→ Parallelization Method Library: generated from a method skeleton and an application-specific stub, and linked to the simulators (a sketch follows below).
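One way such a skeleton-plus-stub library could look in C; every name here (`MethodStub`, `count`, `migrate`, `rebalance_once`) is hypothetical, invented to illustrate the separation of concerns, not the interface of an existing library:

```c
/* Application-specific stub: the simulator supplies these callbacks
 * so the library never touches its data structures directly. */
typedef struct {
    long (*count)(void *app, int subspace);               /* load query */
    void (*migrate)(void *app, int from, int to, long n); /* load move  */
} MethodStub;

/* Method skeleton: one OhHelp-style balancing pass in which the
 * heaviest subspace hands its surplus over the average to the
 * lightest one, via the stub callbacks. */
void rebalance_once(void *app, const MethodStub *stub, int nsub)
{
    long total = 0, hiload = -1, loload = -1;
    int hi = 0, lo = 0;
    for (int i = 0; i < nsub; i++) {
        long c = stub->count(app, i);
        total += c;
        if (hiload < 0 || c > hiload) { hi = i; hiload = c; }
        if (loload < 0 || c < loload) { lo = i; loload = c; }
    }
    long surplus = hiload - total / nsub;   /* excess over the average */
    if (surplus > 0 && hi != lo)
        stub->migrate(app, hi, lo, surplus);
}
```

The point of the split is that the balancing policy lives once in the skeleton, while each simulator rewrites only the two stub callbacks; this is the sense in which the library is "generated from a method skeleton and an application-specific stub".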
Conclusions
• Flying to peta-scale needs CS² collaboration:
  • computer guys offer various (non-numerical) tricks;
  • computational guys take the opportunity to play in a larger, real-world application field.
• We took off with OhHelp:
  • simple but efficient load balancing for plasma simulations;
  • (non-numerical) computer-scientific tricks can greatly improve numerical simulations;
  • we can fly higher with parallelization method libraries.
• Other ways to gain altitude:
  • adapting linear-equation solvers to applications with respect to memory layout;
  • a parallel scripting language for exploring large parameter spaces.