Modeling Billion-Node Torus Networks Using Massively Parallel Discrete-Event Simulation Ning Liu, Christopher Carothers liun2@cs.rpi.edu
Outline • Background • Torus model, traffic model • BG/L • ROSS: Massively Parallel Simulator • Experiment results • Future work
Background • CODES: Enabling Co-Design of Multilayer Exascale Storage Architectures • CODES GOAL: Develop a simulation framework for evaluating exascale storage design challenges. • Hardware Models • Storage Software Models • Storage System Architecture • Exascale I/O Workload Models • Simulation Framework - Integrate models and storage software into simulation framework
Torus Network • Blue Gene and Cray XT supercomputer families adopt a 3-D torus • Upcoming Blue Gene/Q will have a 5-D torus network • Provide low latencies and high bandwidth at a moderate cost to construct.
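As a small illustration of the topology, the sketch below computes the neighbors of a node and the minimal hop count between two nodes in a torus of any dimension. The function names `torus_neighbors` and `torus_hops` and the coordinate-tuple representation are assumptions made for this example; they are not taken from the talk.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the 2 * len(dims) neighbors of coord, with wraparound links."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # modulo closes each ring
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(src: Coord, dst: Coord, dims: Coord) -> int:
    """Minimal hop count: shortest ring distance per dimension, summed."""
    total = 0
    for s, d, size in zip(src, dst, dims):
        delta = abs(s - d)
        total += min(delta, size - delta)  # go whichever way is shorter
    return total
```

The same two functions cover a 3-D torus (as on Blue Gene/L) and a 5-D torus (as planned for Blue Gene/Q); only the length of `dims` changes.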
Torus Traffic and Routing • Using Markovian models • Each node continuously generates packets • Select random destination • Packet size fixed • Dynamic routing vs. static routing • Avoid deadlocks • BGL eager/rendezvous protocols
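The traffic and routing bullets above can be sketched as follows. This is a minimal illustration under stated assumptions: exponential inter-arrival gaps stand in for the "Markovian" traffic model, and dimension-order routing stands in for "static routing". All function names are hypothetical, and the talk does not say that the authors' model is implemented this way.

```python
import random
from typing import Tuple

Coord = Tuple[int, ...]


def next_interarrival(rng: random.Random, rate: float) -> float:
    """Gap until the node's next packet (exponential gaps are an assumption)."""
    return rng.expovariate(rate)


def random_destination(rng: random.Random, src: Coord, dims: Coord) -> Coord:
    """Pick a uniformly random node other than src.

    Assumes the torus has more than one node; otherwise this never returns.
    """
    while True:
        dst = tuple(rng.randrange(size) for size in dims)
        if dst != src:
            return dst


def next_hop(cur: Coord, dst: Coord, dims: Coord) -> Coord:
    """Static dimension-order routing: correct the lowest mismatched
    dimension first, moving the shorter way around that ring."""
    for axis, size in enumerate(dims):
        if cur[axis] != dst[axis]:
            forward = (dst[axis] - cur[axis]) % size
            step = 1 if forward <= size - forward else -1
            nxt = list(cur)
            nxt[axis] = (cur[axis] + step) % size
            return tuple(nxt)
    return cur  # already at the destination
```

Because `next_hop` depends only on the current node and the destination, every packet between the same pair of nodes follows the same path; a dynamic scheme would also consult link or queue state.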
Discrete Event Model • Logical Process (LP): Node • Events • Packet_generate_event • Packet_send_event • Packet_arrival_event • Packet_process_event
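To show how the four event types might chain together, here is a minimal sequential sketch on a 1-D ring. The chaining (generate schedules send, send schedules arrival at the next hop, arrival schedules either another send or a final process event), the delays, and all names are assumptions for illustration; the slide lists only the event names.

```python
import heapq
import itertools
import random

GENERATE, SEND, ARRIVAL, PROCESS = "generate", "send", "arrival", "process"


def simulate(num_nodes=8, end_time=100.0, mean_gap=10.0, hop_delay=1.0, seed=1):
    """Run a sequential event loop and return how many events of each type ran."""
    rng = random.Random(seed)
    tie = itertools.count()  # unique tiebreaker keeps heap ordering well defined
    queue = []
    counts = {GENERATE: 0, SEND: 0, ARRIVAL: 0, PROCESS: 0}

    def schedule(time, kind, node, dst):
        heapq.heappush(queue, (time, next(tie), kind, node, dst))

    for node in range(num_nodes):
        schedule(rng.expovariate(1.0 / mean_gap), GENERATE, node, None)

    while queue:
        time, _, kind, node, dst = heapq.heappop(queue)
        if time > end_time:
            break
        counts[kind] += 1
        if kind == GENERATE:
            dst = rng.randrange(num_nodes - 1)
            if dst >= node:
                dst += 1  # uniform over all nodes except the sender
            schedule(time, SEND, node, dst)
            schedule(time + rng.expovariate(1.0 / mean_gap), GENERATE, node, None)
        elif kind == SEND:
            forward = (dst - node) % num_nodes
            step = 1 if forward <= num_nodes - forward else -1
            schedule(time + hop_delay, ARRIVAL, (node + step) % num_nodes, dst)
        elif kind == ARRIVAL:
            # Either the packet has reached its destination or it is forwarded.
            schedule(time, PROCESS if node == dst else SEND, node, dst)
        # PROCESS: the packet leaves the network; nothing further is scheduled.
    return counts
```

In a parallel simulator each node would be its own logical process, and an arrival event scheduled on a node owned by another processor is what the later slides call a remote event.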
Simulation Testbed: BGL • 32-bit IBM PowerPCs running at only 700 MHz • 1 GB memory per node • 1,024 dual-processor nodes per rack • 16-rack, 32,768-processor system located at Rensselaer’s Computational Center for Nanotechnology Innovations (CCNI) • Confusion? We simulate the BGL torus on top of BGL itself
ROSS: Parallel Simulator • Serial/Conservative/Optimistic Simulation • Using Jefferson’s Time Warp event scheduling mechanism • Reverse Computation
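Reverse computation can be illustrated with a pair of handlers: the forward handler applies an event, and the reverse handler undoes it when an optimistic (Time Warp) simulator rolls the event back. The sketch below is a generic illustration, not the ROSS API; `NodeState`, `ArrivalEvent`, and both handler names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    packets_received: int = 0
    busy_until: float = 0.0


@dataclass
class ArrivalEvent:
    time: float
    service: float
    saved_busy_until: float = 0.0  # scratch space filled by the forward handler


def forward_arrival(state: NodeState, ev: ArrivalEvent) -> None:
    """Apply the event. Destructive updates save the value they overwrite."""
    ev.saved_busy_until = state.busy_until
    state.busy_until = max(state.busy_until, ev.time) + ev.service
    state.packets_received += 1  # invertible update: nothing needs saving


def reverse_arrival(state: NodeState, ev: ArrivalEvent) -> None:
    """Undo forward_arrival exactly, in the opposite order of its updates."""
    state.packets_received -= 1
    state.busy_until = ev.saved_busy_until
```

Events must be reversed in the opposite order from which they were applied. The appeal of this approach over full state checkpointing is that only the overwritten values are stored, which matters when the model holds millions or billions of nodes.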
Validation Using Little’s Law Little's Law: the average number of customers in the store, L, equals the effective arrival rate, λ, times the average time that a customer spends in the store, W: L = λW
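One way such a check could be run on simulation output is sketched below: given the injection and delivery time of every packet, compute L by sweeping the timeline and compare it with λ times W computed from the same records. The function name and the record format are assumptions for this example, and the talk does not describe the authors' actual procedure.

```python
def littles_law_check(records, horizon):
    """records: list of (arrival_time, departure_time) within [0, horizon].

    Returns (L, lam, W): time-average number in system, arrival rate,
    and mean time in system.
    """
    n = len(records)
    lam = n / horizon                                # arrival rate
    w = sum(dep - arr for arr, dep in records) / n   # mean time in system

    # Compute L independently by integrating the number in system over time.
    events = sorted(
        [(arr, +1) for arr, _ in records] + [(dep, -1) for _, dep in records]
    )
    area, in_system, last = 0.0, 0, 0.0
    for t, delta in events:
        area += in_system * (t - last)
        in_system += delta
        last = t
    return area / horizon, lam, w


L, lam, W = littles_law_check([(0.0, 2.0), (1.0, 4.0), (3.0, 5.0)], horizon=10.0)
assert abs(L - lam * W) < 1e-12
```

For the torus model, the "customers" are packets in flight, λ is the aggregate injection rate, and W is the mean packet latency, so a mismatch between the two sides would point to lost or double-counted packets.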
Latency Comparison: BGL vs. Simulation • Using MPI_Send()/MPI_Recv() • Collected data from 1,024-node torus in a 1x32x32 node configuration
Performance Metrics • The performance study examines the impact of processor/core count on four primary metrics: • (i) committed event-rate, • (ii) percentage of remote events, • (iii) efficiency and • (iv) secondary rollbacks.
Million-Node Torus Scalability • Packet injection rate 10 pkt/ms • Peak event rate of 4.78 billion events/sec
Remote Event Rate • Each packet selects its destination at random, which creates a difficult scenario for parallel event scheduling
Billion-Node Torus Scalability • Consumes 2 TB of memory • Total number of generated packets is O(10^11) • Total number of events scheduled is O(10^13) • Packet injection rate 200 pkt/ns & 400 pkt/ns • Higher rollback probability • Larger event population leads to increased queuing overheads
Future work • Application workload models: Application I/O kernel models, I/O characterization models • I/O aggregator node models • I/O network models: network cards, switches, and topologies • I/O storage node models: storage software • I/O storage software: models and prototype system software • I/O controller models: RAID and enterprise storage devices • Disk models: HDDs and SSDs
Future work • Increase the fidelity of torus network model • Dynamic routing • Virtual channels • Different torus traffic model • Tree network model based on Blue Gene families • MPI_Alltoall(), MPI_Bcast(), MPI_Reduce() • Complex I/O workload drivers, like PHASTA
Related Work • Heidelberger’s use of the YAWNS protocol to model the Blue Gene/L torus network on a per cycle basis appears to be one of the most accurate models created to date. • Min and Ould Khaoua proposed a torus network model based on circuit switching.
Conclusions • Near-linear speedup for our torus model • Peak event rate on 32K cores is 4.78 billion events/sec • Demonstrated the ability to model a million-node and a billion-node torus network on Blue Gene/L • Conducted comparison tests between the actual Blue Gene torus network and our model using MPI_Send()/MPI_Recv()
Thank you for your attention! Questions?