Towards Adaptive Caching for Parallel and Distributed Simulation
Abhishek Chugh & Maria Hybinette
Computer Science Department, The University of Georgia
WSC-2004
Simulation Model Assumptions
• Collection of Logical Processes (LPs)
• Assume LPs do not share state variables
• Communicate by exchanging time-stamped messages
[Figure: an airspace simulation partitioned into LPs, e.g., Atlanta and Munich]
Problem & Goal
• Problem: Inefficiency in PDES: redundant computations
• Observation: Computations repeat in:
  • Long runs of simulations
  • Cyclic systems
  • Communication network simulations
• Goal: Increase efficiency by reusing computations
Cache Approach
• Cache computations and re-use them when they repeat, instead of re-computing.
[Figure: LPs exchanging messages, with repeated computations served from a cache]
Approach: Adaptive Caching
• Cache computations and re-use them when they repeat, instead of re-computing.
• Generic caching mechanism, independent of simulation engine and application
• Caveat: Different factors impact the effectiveness of caching
• Proposal: An adaptive approach
[Figure: LPs exchanging messages through a cache]
Factors Affecting Caching Effectiveness
• Cache size
• Cost of looking up in the cache and updating it
• Execution time of the computation
• Probability of a hit: hit rate
Effective Caching Cost
E(Cost_use_cache) = hit_rate * Cost_lookup_hit + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert)
Caching is Not Always a Good Idea
E(Cost_use_cache) = hit_rate * Cost_lookup_hit + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert)
• Caching loses when the hit rate is low, or the computation is very fast
• Caching is worthwhile only when Cost_use_cache < Cost_computation
How Much Speedup is Possible?
Neglecting cache warm-up and fixed costs:
Expected Speedup = Cost_computation / Cost_use_cache
Upper bound (hit_rate = 1) = Cost_computation / Cost_lookup
In our experiments, Cost_computation / Cost_lookup ≈ 3.5
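To make the two formulas concrete, here is a minimal C sketch (not from the paper; every cost constant is an illustrative assumption) that plugs sample numbers into the expected-cost and speedup expressions above:

/* Worked example of the expected-cost and speedup formulas.
 * All cost constants below are hypothetical. */
#include <stdio.h>

int main(void) {
    double hit_rate    = 0.70;  /* assumed hit rate                  */
    double lookup_hit  = 1.0;   /* cost of a cache query that hits   */
    double lookup_miss = 1.0;   /* cost of a cache query that misses */
    double computation = 3.5;   /* cost of running the computation   */
    double insert      = 0.5;   /* cost of inserting the new result  */

    double use_cache = hit_rate * lookup_hit
                     + (1.0 - hit_rate) * (lookup_miss + computation + insert);
    printf("E(Cost_use_cache) = %.2f\n", use_cache);               /* 2.20 */
    printf("Expected speedup  = %.2f\n", computation / use_cache); /* 1.59 */
    /* Caching pays off here because E(Cost_use_cache) < Cost_computation. */
    return 0;
}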
Related Work
• Function caching: Replace application-level function calls with cache queries
  • Introduced by Bellman (1957); Michie (1968)
• Incremental computations:
  • Pugh & Teitelbaum (1989); Liu & Teitelbaum (1995)
• Sequential discrete event simulation:
  • Staged Simulation: Walsh & Sirer (2003): function caching + currying (breaking up computations), re-ordering, and pre-computation
• Decision tool techniques for PADS: Multiple runs of similar simulations
  • Simulation Cloning: Hybinette & Fujimoto (1998); Chen, Turner, et al. (2002); Straßburger (2000)
  • Updateable Simulations: Ferenci et al. (2002)
• Related optimization techniques:
  • Lazy Re-Evaluation: West (1988)
Overview of Adaptive Caching
Execution time:
1. Warm-up execution phase, for each function:
  • Monitor: hit rate, query time, function run time
  • Determine the utility of using the cache
2. Main execution phase, for each function:
  • Use the cache (or not) depending on the results from 1
  • Randomly sample: hit rate, query time, function run time
  • Revise the decision if conditions change (see the sketch below)
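One plausible reading of the two phases above, as a hedged C sketch; the struct and function names are assumptions, not the authors' API:

/* Per-function statistics gathered during warm-up and refreshed by
 * random sampling in the main phase. Names are hypothetical. */
typedef struct {
    double hit_rate;     /* observed fraction of cache hits        */
    double query_time;   /* mean cost of a cache lookup            */
    double insert_time;  /* mean cost of inserting a new result    */
    double run_time;     /* mean cost of executing the computation */
    int    use_cache;    /* current decision for this function     */
} cache_stats_t;

/* Re-evaluate whether caching pays off, using the expected-cost
 * formula from the earlier slide. Called at the end of warm-up and
 * whenever sampled conditions change. */
void revise_decision(cache_stats_t *s) {
    double expected = s->hit_rate * s->query_time
                    + (1.0 - s->hit_rate)
                      * (s->query_time + s->run_time + s->insert_time);
    s->use_cache = (expected < s->run_time);
}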
What’s New
• Decision to use the cache is made dynamically
  • In response to unpredictable local conditions for each LP at execution time
• Relieves the user of having to know whether something is worth caching
  • The adaptive method automatically identifies caching opportunities and rejects poor caching choices
• Easy-to-use caching API
  • Independent of application and simulation kernel
  • Cache middleware
• Distributed cache
  • Each LP maintains its own independent cache
Pseudo-Code Example
// LP CODE WITH CACHING
LP_init() {
    cacheInitialize(argc, argv);
}

Proc(state, msg, MyPE) {
    retval = cacheCheckStart(currentstate, event);
    if (retval == NULL) {
        /* original LP code: compute new state and events to be scheduled */
        /* allow the cache to save the results */
        cacheCheckEnd(newstate, newevents);
    } else {
        newstate  = retval.state;
        newevents = retval.events;
    }
    schedule(newevents);
}
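Design note (an inference from the cache-implementation slide below, not stated on this slide): cacheCheckStart presumably keys its lookup on the (current state, incoming event) pair; on a miss it returns NULL so control falls through to the original event-handling code, and cacheCheckEnd then records the resulting state and output events for later reuse.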
Caching Middleware
[Diagram: Simulation Application → Cache Middleware → Simulation Kernel]
Caching Middleware (Hit)
[Diagram: the middleware checks the cache, keyed on the state/message pair; on a cache hit, the cached result is returned directly to the application]
Caching Middleware (Miss)
[Diagram: on a cache miss, or when the cache lookup is too expensive, the event is processed by the simulation kernel and the new state & message are inserted into the cache]
Cache Implementation
• Hash table with separate chaining
• Input: current state & message
• Output: new state and output message(s)
• Hash function: djb2 (by Dan Bernstein; also used in Perl)
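A hedged sketch of this lookup path: djb2 is the well-known Bernstein string hash, but the entry layout and chained lookup below are illustrative assumptions, not the paper's code.

#include <stddef.h>
#include <string.h>

/* djb2: hash = hash * 33 + byte, starting from 5381. */
unsigned long djb2(const unsigned char *buf, size_t len) {
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++)
        hash = ((hash << 5) + hash) + buf[i];
    return hash;
}

/* One chained hash-table entry: key is the serialized (state, message)
 * pair, value holds the cached new state and output message(s). */
typedef struct entry {
    void         *key;
    size_t        key_len;
    void         *value;
    struct entry *next;     /* separate chaining on collisions */
} entry_t;

entry_t *cache_lookup(entry_t **buckets, size_t nbuckets,
                      const void *key, size_t key_len) {
    unsigned long h = djb2((const unsigned char *)key, key_len) % nbuckets;
    for (entry_t *e = buckets[h]; e != NULL; e = e->next)
        if (e->key_len == key_len && memcmp(e->key, key, key_len) == 0)
            return e;       /* hit */
    return NULL;            /* miss: caller computes and inserts */
}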
Memory Management
• Distributed cache: one per LP
• Pre-allocate a memory pool for the cache in each LP during the initialization phase (sketch below)
• Upper limit is parameterized
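A minimal sketch of the pre-allocated pool, assuming a simple bump-pointer allocator; the strategy and names are assumptions, not the paper's design:

#include <stdlib.h>

/* Per-LP cache pool, allocated once at LP initialization. */
typedef struct {
    char  *base;   /* pool memory                         */
    size_t size;   /* parameterized upper limit, in bytes */
    size_t used;   /* bump pointer                        */
} lp_pool_t;

int pool_init(lp_pool_t *p, size_t limit) {
    p->base = malloc(limit);
    p->size = limit;
    p->used = 0;
    return p->base != NULL;
}

void *pool_alloc(lp_pool_t *p, size_t n) {
    if (p->used + n > p->size)
        return NULL;               /* pool exhausted: stop caching */
    void *out = p->base + p->used;
    p->used += n;
    return out;
}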
Experiments
• 3 sets of experiments with P-Hold:
  1. Proof of concept (no adaptive caching): hit rate
  2. Impact of cache size and simulation running time on speedup (no caching vs. caching)
  3. Adaptive caching with regard to the cost of event computation
• 16-processor SGI Origin 2000, using 4 processors
• “Curried” out time stamps
Hit Rate versus Progress
• As expected, the hit rate increases as cache size increases
• Maximum hit rate for a large cache
• The hit rate sets an upper bound on speedup
Speedup vs. Cache Size
• Speedup improves as the size of the cache increases
• Beyond a size of 9,000 KB, speedup declines and levels off
• Better performance for simulations whose computations have higher latency
Speedup vs. Cost_computation
• Non-adaptive caching suffers a slowdown (speedup of 0.82) for low-latency computations; speedup improves to 1 as the computational latency approaches 1.5 msec
Speedup vs. Cost_computation
• Adaptive caching tracks the cost of consulting the cache against the cost of running the actual computation
• Adaptive caching's speedup is 1 for small computational latencies (it selects performing the computation instead of consulting the cache)
Summary & Future Work
Summary:
• Middleware implementation that requires no major structural revision of application code
• Best-case speedup approaches 3.5; worst-case speedup is 1 (speedup is limited by a hit rate of 70%)
• With randomly generated information (such as time stamps), caching may become ineffective unless precautions are taken
Future Work:
• Function caching instead of LP caching
• Look at series of functions to jump forward
• Adaptive replacement strategies
Closing
“A sword wielded poorly will kill its owner” -- Ancient Proverb
Pseudo-Code Example (Function-Level Caching)
// ORIGINAL LP CODE
LP_init() {
}

Proc(state, msg, MyPE) {
    val1 = fancy_function(msg->param1, state->key_part);
    val2 = fancier_function(msg->param3);
    state->key_part = val1 + val2;
}

// LP CODE WITH CACHING
LP_init() {
    cache_init(FF1, SIZE1, 2, fancy_function);
    cache_init(FF2, SIZE2, 1, fancier_function);
}

Proc(state, msg, MyPE) {
    val1 = cache_query(FF1, msg->param1, state->key_part);
    val2 = cache_query(FF2, msg->param3);
    state->key_part = val1 + val2;
}
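A reading of this API (an assumption; the slide does not define it): cache_init(ID, size, arity, fn) appears to register fn with its own cache of the given size and argument count, and cache_query(ID, args...) then memoizes calls to that function. This matches the "function caching instead of LP caching" direction listed under future work.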