RESAM Laboratory, Univ. Lyon 1, France • led by Prof. B. Tourancheau • Laurent Lefèvre, CongDuc Pham, Pascale Primet • PhD students: Patrick Geoffray, Roland Westrelin
Research interests • High-performance communication systems • Myrinet-based clusters, cluster management • BIP, MPI-BIP, BIP-SMP • Distributed Shared Memory systems • DOSMOS system • Network support for Multimedia and Cooperative applications • QoS, multicast • CoTool environment • Parallel simulation, synchronization algorithms, communication network models • CSAM tools
Parallel and Distributed Simulation of Communication Networks (towards cluster-based solution) C.D. Pham RESAM laboratory Univ. Lyon 1, France cpham@lhpca.univ-lyon1.fr
Outline • Introduction • Discrete Event Simulation (DES) • Parallel DES and the synchronization problems • Conservative protocols • Architecture of a conservative LP • The Chandy-Misra-Bryant protocol • The lookahead ability • Optimistic protocols • Architecture of an optimistic LP • Time Warp
Outline, more... • CSAM, a tool for ATM network models • kernel characteristics • results • Cluster-based solutions • Myrinet, BIP, BIP-SMP, MPI/BIP, MPI/BIP-SMP • Fast Ethernet, Gamma? • Gigabit Ethernet?
Introduction Discrete Event Simulation (DES) Parallel DES and synchronization problems
Discrete Event Simulation (DES) • assumption that a system changes its state at discrete points in simulation time • [Figure: system states S1, S2, S3 change only at arrival events a1 to a4 and departure events d1 to d3, NOT at regular time steps 0, t, 2t, ..., 6t]
DES concepts • fundamental concepts: • system state (variables) • state transitions (events) • simulation time: totally ordered set of values representing time in the system being modeled • the system state can only be modified upon reception of an event • modeling can be • event-oriented • process-oriented
Life cycle of a DES • a DES system can be viewed as a collection of simulated objects and a sequence of event computations • each event computation contains a time stamp indicating when that event occurs in the physical system • each event computation may: • modify state variables • schedule new events into the simulated future • events are stored in a local event list • events are processed in time stamp order • usually, the simulation terminates when no events are left
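The life cycle above maps directly onto a priority queue. Below is a minimal sequential kernel in Python. It assumes that events are plain names handled by user-supplied callbacks. `run_des`, `Handler` and the event names are hypothetical and do not come from any existing tool. The usage replays the first events of the A/B example with the send processing time and link delay of 5.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple

# A handler receives the current simulation time and a schedule(delay, name)
# callback it may use to create new events in the simulated future.
Handler = Callable[[float, Callable[[float, str], None]], None]


def run_des(initial: List[Tuple[float, str]],
            handlers: Dict[str, Handler]) -> List[Tuple[float, str]]:
    """Minimal sequential DES kernel; returns the (time, event) trace."""
    seq = itertools.count()          # tie-breaker: equal time stamps stay FIFO
    event_list: list = []            # local event list, a min-heap on time stamp
    for t, name in initial:
        heapq.heappush(event_list, (t, next(seq), name))

    trace = []
    while event_list:                # no more events = termination
        now, _, name = heapq.heappop(event_list)   # smallest time stamp first
        trace.append((now, name))

        def schedule(delay: float, new_name: str, now: float = now) -> None:
            # new events are only scheduled into the simulated future
            if delay < 0:
                raise ValueError("cannot schedule an event in the past")
            heapq.heappush(event_list, (now + delay, next(seq), new_name))

        handler = handlers.get(name)
        if handler is not None:
            handler(now, schedule)
    return trace


# Usage: the first events of the A/B model (send processing = 5, link delay = 5).
handlers: Dict[str, Handler] = {
    "A receives P1": lambda now, schedule: schedule(5, "A sends P1 to B"),
    "A sends P1 to B": lambda now, schedule: schedule(5, "B receives P1 from A"),
}
trace = run_des([(5, "A receives P1")], handlers)
print(trace)
# [(5, 'A receives P1'), (10, 'A sends P1 to B'), (15, 'B receives P1 from A')]
```

Handlers only ever add events with a non-negative delay. The heap therefore always yields the smallest pending time stamp, and the trace is non-decreasing in time.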
A simple DES model • [Figure: nodes A and B connected by a link] • link model: delay = 5 • send processing time = 5 • receive processing time = 1 • packet arrivals: P1 at 5, P2 at 12, P3 at 22 • local event list, in time stamp order: <e1,5> A receives packet P1; <e2,10> A sends P1 to B; <e3,12> A receives packet P2; <e4,15> B receives P1 from A; <e5,16> B sends ACK(P1) to A; <e6,17> A sends P2 to B; <e7,21> A receives ACK(P1); <e9,22> A receives packet P3; <e8,23> B receives P2 from A
Why does it work? • events are processed in time stamp order • an event at time t can only generate future events with a time stamp greater than or equal to t (no event in the past) • generated events are inserted into the event list, sorted by time stamp • the event with the smallest time stamp is always processed first • hence causality constraints are implicitly maintained
Why change? It's so simple! • models become larger and larger • the simulation time is overwhelming or the simulation is just intractable • examples: • parallel programs with millions of lines of code • mobile networks with millions of mobile hosts • ATM networks with hundreds of complex switches • multicast models with thousands of sources • the ever-growing Internet • and much more...
Some figures to convince... • ATM network models • simulation at the cell level • 200 switches • 1000 traffic sources at 50 Mbit/s • 155 Mbit/s links • 1 simulation event per cell arrival • more than 26 billion events to simulate 1 second! 30 hours if 1 event is processed in 1 µs • simulation time increases as link speed increases • usually more than 1 event per cell arrival • how scalable is traditional simulation?
Parallel simulation - principles • execution of a discrete event simulation on a parallel or distributed system with several physical processors • the simulation model is decomposed into several sub-models that can be executed in parallel • spatial partitioning • temporal partitioning • radically different from simple simulation replications
Parallel simulation - pros & cons • pros • reduction of the simulation time • increase of the model size • cons • causality constraints are difficult to maintain • need for special mechanisms to synchronize the different processors • increases the complexity of both the model and the simulation kernel • challenges • ease of use, transparency
Parallel simulation - example • [Figure: the model split into logical processes (LPs) exchanging packets and events in parallel]
A simple PDES model • LPs A and B • link model: delay = 5 • send processing time = 5 • receive processing time = 1 • packet arrivals: P1 at 5, P2 at 12, P3 at 22 • local event list of LP A: <e1,5> A receives packet P1; <e2,10> A sends P1 to B; <e3,12> A receives packet P2; <e6,17> A sends P2 to B; <e9,22> A receives packet P3; <e7,21> A receives ACK(P1) • local event list of LP B: <e4,15> B receives P1 from A; <e5,16> B sends ACK(P1); <e8,23> B receives P2 from A • causality error: LP A has already processed e9 (time 22) when e7 (time 21) arrives from LP B, a local causality violation
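To make the violation concrete, here is a Python sketch of an LP that processes its own events as fast as it can, with no synchronization at all. `GreedyLP` is a hypothetical, illustrative name. The sketch replays LP A's time stamps from the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GreedyLP:
    """Hypothetical LP that runs its events with no synchronization at all."""
    name: str
    lvt: float = 0.0                 # local virtual time: time of the last processed event
    violations: List[float] = field(default_factory=list)

    def process(self, ts: float) -> None:
        if ts < self.lvt:
            # the event lies in this LP's simulated past: causality error
            self.violations.append(ts)
        else:
            self.lvt = ts


lp_a = GreedyLP("A")
for ts in (5, 10, 12, 17, 22):       # e1, e2, e3, e6, e9: A's own events
    lp_a.process(ts)
lp_a.process(21)                     # e7: ACK(P1) from LP B arrives too late
print(lp_a.lvt, lp_a.violations)     # 22 [21]
```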
Synchronization problems • fundamental concepts • each Logical Process (LP) can be at a different simulation time • local causality constraint: events in each LP must be executed in time stamp order • synchronization algorithms • Conservative: avoids local causality violations by waiting until it's safe • Optimistic: allows local causality violations, but provisions are made to recover from them at runtime
Conservative protocols Architecture of a conservative LP The Chandy-Misra-Bryant protocol The lookahead ability
Architecture of a conservative LP • LPs communicate by sending messages with non-decreasing time stamps • each LP keeps one static FIFO channel per LP it receives messages from • each FIFO channel (input channel, IC) has a clock ci that ticks according to the time stamp of the topmost message, if any; otherwise it keeps the time stamp of the last message • [Figure: LPA with input channels from LPB, LPC and LPD; channel clocks c1 = tB1, c2 = tC3, c3 = tD3]
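A minimal Python sketch of one input channel and its clock follows. It assumes time stamps are plain numbers and that an empty, never-used channel starts at time 0. `InputChannel` is a hypothetical name.

```python
from collections import deque
from typing import Deque


class InputChannel:
    """FIFO input channel (IC) of a conservative LP, with its clock c_i."""

    def __init__(self) -> None:
        self.queue: Deque[float] = deque()
        self.last_ts = 0.0           # time stamp of the last message received

    def push(self, ts: float) -> None:
        # senders emit non-decreasing time stamps on each channel
        if ts < self.last_ts:
            raise ValueError("time stamps must be non-decreasing on a channel")
        self.queue.append(ts)
        self.last_ts = ts

    def pop(self) -> float:
        return self.queue.popleft()

    @property
    def clock(self) -> float:
        # topmost message if any, otherwise the last message's time stamp
        return self.queue[0] if self.queue else self.last_ts


ic = InputChannel()
ic.push(3)
ic.push(6)
print(ic.clock)   # 3
ic.pop()
ic.pop()
print(ic.clock)   # 6: the channel is empty and keeps the last time stamp
```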
A simple conservative algorithm • each LP has to process events in time stamp order to avoid local causality violations • The Chandy-Misra-Bryant algorithm:
    while (simulation is not over) {
        determine the IC_i with the smallest clock c_i
        if (IC_i is empty)
            wait for a message
        else {
            remove the topmost event from IC_i
            process the event
        }
    }
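Safe but has to block • [Figure: LPA, LPB, LPC, LPD; the receiving LP holds IC1 = 3, 6, 10; IC2 = 1, 4, 7; IC3 = 5, with 9 arriving on IC3 later] • execution trace (event processed / IC with the smallest clock): 1 / IC2, 3 / IC1, 4 / IC2, 5 / IC3, BLOCK / IC3, 6 / IC1, 7 / IC2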
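The loop and the trace can be reproduced with a short Python sketch. It assumes the receiving LP holds IC1 = 3, 6, 10, IC2 = 1, 4, 7 and IC3 = 5, as read from the figure. It also assumes that the message stamped 9 reaches IC3 only while the LP is blocked. `cmb_trace` and its `arrivals` mechanism are hypothetical, and ties between equal clocks are broken by channel name.

```python
from collections import deque
from typing import Dict, List, Tuple


def cmb_trace(initial: Dict[str, List[int]],
              arrivals: List[Tuple[str, int]],
              max_steps: int = 100) -> List[Tuple[object, str]]:
    """Run the conservative loop of one LP; `arrivals` are messages that
    (hypothetically) reach the LP, one at a time, each time it blocks."""
    ics = {name: deque(ts) for name, ts in initial.items()}
    last = {name: 0 for name in initial}      # time stamp of the last message taken
    pending = deque(arrivals)
    trace: List[Tuple[object, str]] = []

    def clock(name: str) -> int:
        # topmost message if any, otherwise the last message's time stamp
        return ics[name][0] if ics[name] else last[name]

    for _ in range(max_steps):
        name = min(sorted(ics), key=clock)    # IC with the smallest clock
        if not ics[name]:
            trace.append(("BLOCK", name))     # not safe: wait for a message
            if not pending:
                break
            channel, ts = pending.popleft()
            ics[channel].append(ts)
        else:
            ts = ics[name].popleft()          # remove topmost event and process it
            last[name] = ts
            trace.append((ts, name))
    return trace


trace = cmb_trace({"IC1": [3, 6, 10], "IC2": [1, 4, 7], "IC3": [5]},
                  arrivals=[("IC3", 9)])
print(trace)
```

The first seven entries match the trace on the slide. The run then ends with a further BLOCK on IC2: with no more messages, the LP blocks again on its now-empty channel, whose clock (7) is the smallest.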
Blocks and even deadlocks! • S sends all messages to B • [Figure: source S, LPs A and B, and merge point M forming a cycle; M is BLOCKED]
How to solve deadlock: null-messages • null-messages for artificial propagation of simulation time • What frequency? • [Figure: the same cycle; null-messages stamped 10 propagate simulation time and M is UNBLOCKED]
How to solve deadlock: null-messages • a null-message indicates a Lower Bound Time Stamp • minimum delay between links is 4 • LP C initially at simulation time 0 • LP C sends a null-message with time stamp 4 • LP A sends a null-message with time stamp 8 • LP B sends a null-message with time stamp 12 • LP C can process the event with time stamp 7 • [Figure: LPs A, B and C in a cycle, with pending events 7, 9, 10 and 11]
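The null-message exchange can be replayed with a small Python sketch. It assumes a ring C → A → B → C in which each LP forwards a null-message stamped with its input clock plus the lookahead of its outgoing link. It ignores the LPs' own pending events. `null_messages_until_safe` is a hypothetical helper. With a uniform lookahead of 4 it reproduces the stamps 4, 8 and 12. With a lookahead of 1, many more null-messages are needed before LP C may process its event at 7.

```python
from typing import Dict, List, Tuple


def null_messages_until_safe(ring: List[str], lookahead: Dict[str, int],
                             target: int, start: int = 0) -> List[Tuple[str, int]]:
    """ring[0] is the blocked LP holding an event at time `target`;
    lookahead[x] is the minimum delay on the link leaving LP x."""
    if sum(lookahead[name] for name in ring) <= 0:
        raise ValueError("zero-lookahead cycle: null-messages cannot advance time")
    stamp = start                     # ring[0] starts at this simulation time
    sent: List[Tuple[str, int]] = []
    i = 0
    while True:
        sender = ring[i]
        stamp += lookahead[sender]    # LBTS = sender's input clock + lookahead
        sent.append((sender, stamp))
        i = (i + 1) % len(ring)
        if i == 0 and stamp >= target:    # ring[0]'s input clock now covers target
            return sent


print(null_messages_until_safe(["C", "A", "B"], {"C": 4, "A": 4, "B": 4}, target=7))
# [('C', 4), ('A', 8), ('B', 12)]
print(len(null_messages_until_safe(["C", "A", "B"], {"C": 1, "A": 1, "B": 1}, target=7)))
# 9
```

The guard against a zero-lookahead cycle reflects why lookahead matters. If the lookaheads around a cycle sum to zero, null-messages never advance simulation time.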
The lookahead ability • null-messages are sent by an LP to indicate a lower bound on the time stamps of the future messages it will send • null-messages rely on the « lookahead » ability • communication link delays • server processing time (FIFO) • lookahead is very dependent on the application model and needs to be explicitly identified
Lookahead for concurrent processing • [Figure: LPA, LPB, LPC and LPD with their pending events; LPA is at time TA with lookahead LA, separating safe events (below TA+LA) from unsafe events]
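Under the same lookahead assumption, an LP can split its pending events into safe and unsafe ones. No message from an upstream LP j can carry a time stamp below T_j + L_j. Here is a Python sketch with hypothetical names and values.

```python
from typing import Dict, List, Tuple


def split_safe(pending: List[int], upstream: Dict[str, Tuple[int, int]]):
    """upstream[j] = (T_j, L_j): current time and lookahead of upstream LP j.
    Returns the safe bound, then the safe and unsafe pending events."""
    # every future message from LP j carries a time stamp >= T_j + L_j
    bound = min(t + la for t, la in upstream.values())
    # events at the bound itself are treated as safe, assuming simultaneous
    # events may be processed in any order
    safe = sorted(ts for ts in pending if ts <= bound)
    unsafe = sorted(ts for ts in pending if ts > bound)
    return bound, safe, unsafe


print(split_safe([3, 8, 12], {"LPA": (5, 2), "LPB": (6, 4)}))
# (7, [3], [8, 12])
```

A larger lookahead raises the bound, so more events become safe and can be processed concurrently.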
What if lookahead is small? • a null-message indicates a Lower Bound Time Stamp • LP C initially at simulation time 0 • LP C sends a null-message with time stamp 1, then 5 • LP A sends a null-message with time stamp 2, then 6 • LP B sends a null-message with time stamp 3, then 7 • LP C can process the event with time stamp 7 • [Figure: the same cycle of LPs A, B and C, with pending events 7, 9, 10 and 11]
Conservative: pros & cons • pros • simple, easy to implement • good performance when lookahead is large (communication networks, FIFO queues) • cons • pessimistic in many cases • large lookahead is essential for performance • no transparent exploitation of parallelism • performance may drop even with small changes in the model (adding preemption, adding one small-lookahead link…)
Optimistic protocols Architecture of an optimistic LP Time Warp
Architecture of an optimistic LP • LPs send timestamped messages, not necessarily in non-decreasing time stamp order • no static communication channels between LPs, so dynamic creation of LPs is easy • each LP processes events as they are received, with no need to wait for safe events • local causality violations are detected and corrected at runtime • [Figure: LPA receiving messages tB1, tB2, tC3, tC4, tC5 and tD4 from LPB, LPC and LPD]
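Processing events as they arrive • what to do with late messages? • [Figure: LPA processes messages from LPB, LPC and LPD with time stamps 11, 13, 18, 22, 25, 28 and 36 as they arrive; a message stamped 32 arrives after 36 has already been processed]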
Time Warp: rollback? How? • late messages (stragglers) are handled with a rollback mechanism • undo false/incorrect local computations: • state saving: save the state variables of an LP • reverse computation • undo false/incorrect remote computations: • anti-messages: anti-messages and (real) messages annihilate each other • process late messages • re-process previous messages: processed events are NOT discarded!
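Here is a minimal Python sketch of the rollback mechanism. It assumes state saving before every event and one output message per event, sent a hypothetical `OUTPUT_DELAY` later. It also assumes aggressive cancellation: anti-messages are sent immediately, and re-execution sends fresh messages. The state is just a running sum. `TimeWarpLP` is illustrative and is not taken from any existing kernel. The usage replays the straggler 32 arriving after event 36 has been processed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OUTPUT_DELAY = 5   # hypothetical: every event sends one message 5 time units later


@dataclass
class TimeWarpLP:
    """Minimal Time Warp LP: the state is the running sum of message values."""
    lvt: int = 0
    state: int = 0
    processed: List[Tuple[int, int]] = field(default_factory=list)  # (ts, value)
    saved: List[Tuple[int, int]] = field(default_factory=list)      # (lvt, state) before each event
    sent: List[int] = field(default_factory=list)                   # output ts of each event
    anti_messages: List[int] = field(default_factory=list)

    def _execute(self, ts: int, value: int) -> None:
        self.saved.append((self.lvt, self.state))   # state saving, period 1
        self.lvt, self.state = ts, self.state + value
        self.processed.append((ts, value))
        self.sent.append(ts + OUTPUT_DELAY)

    def receive(self, ts: int, value: int) -> None:
        if ts >= self.lvt:                          # in order: process optimistically
            self._execute(ts, value)
            return
        # straggler: roll back every processed event later than ts
        k = next(i for i, (t, _) in enumerate(self.processed) if t > ts)
        undone = self.processed[k:]
        self.lvt, self.state = self.saved[k]        # restore the saved state
        self.anti_messages.extend(self.sent[k:])    # cancel remote effects
        del self.processed[k:]
        del self.saved[k:]
        del self.sent[k:]
        # process the straggler, then re-process the undone events (not discarded)
        for t, v in sorted([(ts, value)] + undone):
            self._execute(t, v)


lp = TimeWarpLP()
for ts in (11, 13, 18, 22, 25, 28, 36):
    lp.receive(ts, 1)
lp.receive(32, 10)                 # late message: 36 was already processed
print(lp.anti_messages)            # [41]
print([t for t, _ in lp.processed])  # [11, 13, 18, 22, 25, 28, 32, 36]
print(lp.state, lp.lvt)            # 17 36
```

The undone event 36 is re-processed rather than discarded. Its earlier output (41) is cancelled by an anti-message before being sent again.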
A pictured view of a rollback • the real rollback distance depends on the state-saving period: a short period reduces rollback overhead but increases state-saving overhead • [Figure: processed and unprocessed events with saved state points; on a straggler, the LP restores an earlier state point and sends anti-messages for the messages it had already sent]
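The trade-off can be quantified with a back-of-the-envelope Python sketch. It assumes a state point is saved before every `period`-th event. It also assumes a rollback restores the latest state point not after the straggler, then re-executes ("coasts forward") up to it. The numbers are illustrative, not measurements, and `rollback_cost` is a hypothetical helper.

```python
def rollback_cost(straggler_index: int, processed: int, period: int):
    """Events 0..processed-1 were executed; a state is saved before every
    `period`-th event (0, period, 2*period, ...). A straggler must be placed
    before event `straggler_index`."""
    restore = (straggler_index // period) * period   # latest state point not after it
    rolled_back = processed - restore                # real rollback distance
    coast_forward = straggler_index - restore        # re-executed only to rebuild state
    snapshots = -(-processed // period)              # ceil(processed / period)
    return rolled_back, coast_forward, snapshots


for period in (1, 10, 50):
    print(period, rollback_cost(straggler_index=95, processed=100, period=period))
# 1 (5, 0, 100)
# 10 (10, 5, 10)
# 50 (50, 45, 2)
```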
Reception of an anti-message • may initiate a rollback if the corresponding positive message has already been processed • may annihilate the corresponding positive message if it is still unprocessed • may wait in the input queue if the corresponding positive message has not been received yet • [Figure: an input queue holding messages 22, 25, 28, 36, 43 and 45; anti-messages 25 (causing a rollback), 43 and 48 arrive]
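The three cases can be sketched in Python as follows. As a simplification, messages are identified by their time stamp, and the values are only loosely read from the figure. Real kernels would match a message and its anti-message by a unique identifier. The function names are hypothetical.

```python
from typing import List, Set


def on_anti_message(msg: int, processed: Set[int], unprocessed: Set[int],
                    waiting: List[int]) -> str:
    """Messages are identified here by their time stamp (a simplification)."""
    if msg in processed:
        # the caller must roll back to msg first; the positive message then
        # becomes unprocessed and is annihilated
        return "rollback"
    if msg in unprocessed:
        unprocessed.discard(msg)      # message and anti-message annihilate
        return "annihilate"
    waiting.append(msg)               # positive message not received yet
    return "wait"


def on_positive_message(msg: int, unprocessed: Set[int], waiting: List[int]) -> str:
    if msg in waiting:
        waiting.remove(msg)           # its anti-message arrived first
        return "annihilate"
    unprocessed.add(msg)
    return "enqueue"


processed, unprocessed, waiting = {22, 25, 28}, {36, 43, 45}, []
print(on_anti_message(25, processed, unprocessed, waiting))   # rollback
print(on_anti_message(43, processed, unprocessed, waiting))   # annihilate
print(on_anti_message(48, processed, unprocessed, waiting))   # wait
print(on_positive_message(48, unprocessed, waiting))          # annihilate
```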
Need for a Global Virtual Time • Motivations • an indicator that the simulation time advances • reclaim memory (fossil collection) • Basically, GVT is the minimum of • all LPs' local simulation times • the time stamps of messages in transit • GVT guarantees that • events below GVT are definitive events (e.g. I/O can be committed) • no rollback can occur before GVT • state points before GVT can be reclaimed • anti-messages before GVT can be reclaimed
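Here is a Python sketch of the GVT definition above and of fossil collection, with hypothetical names and values. One assumption goes beyond the slide. The latest state point below GVT is kept, so that an LP can still restore its state if a rollback to exactly GVT occurs.

```python
from typing import Dict, Iterable, List, Tuple


def compute_gvt(lvts: Dict[str, int], in_transit: Iterable[int]) -> int:
    # minimum of all LPs' local times and of messages still in transit
    return min([*lvts.values(), *in_transit])


def fossil_collect(saved: List[Tuple[int, str]], gvt: int) -> List[Tuple[int, str]]:
    """saved: (time, state) pairs sorted by time. No rollback can go before
    GVT, so older state points are reclaimed, except the latest one below
    GVT, which is kept as the restore point for a rollback to GVT."""
    older = [s for s in saved if s[0] < gvt]
    return older[-1:] + [s for s in saved if s[0] >= gvt]


gvt = compute_gvt({"LPA": 40, "LPB": 35, "LPC": 50}, in_transit=[32, 47])
print(gvt)                                            # 32
print(fossil_collect([(10, "s10"), (20, "s20"), (30, "s30"), (38, "s38")], gvt))
# [(30, 's30'), (38, 's38')]
```

The in-transit message stamped 32 is what holds GVT below every LP's local time. Ignoring it could let an LP reclaim a state it still needs.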
A pictured view of the GVT • [Figure: timelines of LPA, LPB, LPC and LPD; as the GVT advances from the old GVT to the new GVT, conditional events (c) below it become definitive events (D)]
Optimistic overheads • Periodic state savings • states may be large, very large! • copies are very costly • Periodic GVT computations • costly in a distributed architecture, • may block computations, • Rollback thrashing • cascaded rollback, no advancement! • Memory! • memory is THE limitation
Optimistic: pros & cons • pros • exploits all the parallelism in the model; lookahead is less important • transparent to the end-user • interactive simulations can be enabled • can hopefully be general-purpose • cons • very complex, needs lots of memory • large overheads (state saving, GVT, rollbacks…)
Optimizations, variations Conservative Optimistic Mixed approaches, adaptive approaches,
Conservative: outline • Add more information to reduce the number of null-messages • special messages: carrier null-messages [Cai90] • topology information [DeVries90] • time/delay information: Bounded Lag [Lubachevsky89] • time window: CTW [Ayani92] • In general, one tries to add more knowledge of the model into the simulator, which may then not be general-purpose
Optimistic: outline • Reduce rollback-related overhead • lazy cancellation, lazy re-evaluation • limit optimism (time window, blocking) • Reduce memory consumption • fast GVT algorithms, hardware support • incremental state saving, reverse computation • cancelback, artificial rollback • In general, one tries to reduce the optimism to avoid too many speculative computations
Mixed/adaptive approaches • General framework that (automatically) switches to conservative or optimistic • Adaptive approaches may determine at runtime the amount of conservatism or optimism • [Figure: performance of conservative, optimistic and mixed approaches]
Parallel simulation today • Lots of algorithms have been proposed • variations on conservative and optimistic • adaptive approaches • Few end-users • impossible to compete with sequential simulators in terms of user interface, generality, ease of use, etc. • Research mainly focuses on • applications, ultra-large-scale simulations • tools and execution environments (clusters) • composability issues
CSAM (Pham, UCBL) • CSAM: Conservative Simulator for ATM network Model • Simulation at the cell-level • Conservative and/or sequential • C++ programming-style, predefined generic model of sources, switches, links… • New models can be easily created by deriving from base classes • Configuration file that describes the topology
CSAM - Kernel characteristics • Exploits the lookahead of communication links: transparent for the user • Virtual Input Channels • reduces overhead for event manipulation, • reduces overhead for null-messages handling. • Cyclic event execution • Message aggregation • static aggregation size, • asymmetric aggregation size on CLUMPS, • sender-initiated, • receiver-initiated.
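CSAM's own implementation is not shown here. As a generic illustration only, static-size, sender-initiated aggregation could look like the following Python sketch, where `AggregatingSender` and the message format are hypothetical. It assumes the sender flushes its buffer before blocking, so that buffered messages never hold back the receivers.

```python
from typing import Callable, List, Tuple

Message = Tuple[int, str]   # (time stamp, payload); hypothetical format


class AggregatingSender:
    """Sender-initiated aggregation with a static size (hypothetical sketch)."""

    def __init__(self, size: int, transmit: Callable[[List[Message]], None]) -> None:
        self.size = size              # static aggregation size
        self.transmit = transmit      # ships one network packet
        self.buffer: List[Message] = []

    def send(self, ts: int, payload: str) -> None:
        self.buffer.append((ts, payload))
        if len(self.buffer) >= self.size:
            self.flush()

    def flush(self) -> None:
        # assumption: also called before the sender blocks, otherwise buffered
        # messages (including null-messages) could delay the receivers
        if self.buffer:
            self.transmit(self.buffer)
            self.buffer = []


packets: List[List[Message]] = []
sender = AggregatingSender(size=3, transmit=packets.append)
for ts in range(1, 6):
    sender.send(ts, "cell")
sender.flush()
print([len(p) for p in packets])   # [3, 2]
```

A larger static size means fewer network packets, but messages wait longer in the buffer. In a conservative kernel, that delay can increase blocking at the receivers.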
Test case: 78-switch ATM network • Distance-Vector Routing with dynamic link cost functions • Connection setup, admission control protocols
CSAM - Some results... Routing protocol’s reconfiguration time
CSAM - Some results... End-to-end delays