Towards Adaptive Caching for Parallel and Distributed Simulation

  1. Towards Adaptive Caching for Parallel and Distributed Simulation. Abhishek Chugh & Maria Hybinette, Computer Science Department, The University of Georgia. WSC-2004

  2. Simulation Model Assumptions • Collection of Logical Processes (LPs) • Assume LPs do not share state variables • Communicate by exchanging time-stamped messages (Figure: airspace example, with LPs for Atlanta, Munich, and the airspace between them.)

  3. Problem & Goal • Problem: Inefficiency in PDES: redundant computations • Observation: Computations repeat: • Long runs of simulations • Cyclic systems • Communication network simulations • Goal: Increase efficiency by reusing computations

  4. Cache Approach • Cache computations and re-use them when they repeat, instead of re-computing. (Figure: a network of LPs exchanging messages.)

  5. Approach: Adaptive Caching • Cache computations and re-use them when they repeat, instead of re-computing. • Generic caching mechanism independent of simulation engine and application • Caveat: different factors impact the effectiveness of caching • Proposal: an adaptive approach (Figure: LPs exchanging messages through the cache.)

  6. Factors Affecting Caching Effectiveness • Cache size • Cost of looking up entries in the cache and updating the cache • Execution time of the computation • Probability of a hit: the hit rate

  7. Effective Caching Cost E(Cost_use_cache) = hit_rate * Cost_lookup_hit + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert)

  8. Caching is Not Always a Good Idea E(Cost_use_cache) = hit_rate * Cost_lookup_hit + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert) • Hit rate low, or • Very fast computation • Caching is worthwhile only when E(Cost_use_cache) < Cost_computation
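
To make the cost model on slides 7-8 concrete, here is a minimal C sketch; the struct layout and function names are illustrative, not the paper's API.

      /* Per-function statistics monitored at run time. */
      typedef struct {
          double hit_rate;          /* observed probability of a cache hit  */
          double cost_lookup_hit;   /* time for a lookup that hits          */
          double cost_lookup_miss;  /* time for a lookup that misses        */
          double cost_insert;       /* time to insert a new entry           */
          double cost_computation;  /* time to run the original computation */
      } cache_stats;

      /* E(Cost_use_cache) = hit_rate * Cost_lookup_hit
                           + (1 - hit_rate) * (Cost_lookup_miss
                                               + Cost_computation + Cost_insert) */
      double expected_cache_cost(const cache_stats *s)
      {
          return s->hit_rate * s->cost_lookup_hit
               + (1.0 - s->hit_rate) * (s->cost_lookup_miss
                                        + s->cost_computation + s->cost_insert);
      }

      /* Caching pays off only when the expected cost beats recomputing. */
      int should_use_cache(const cache_stats *s)
      {
          return expected_cache_cost(s) < s->cost_computation;
      }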

  9. How Much Speedup is Possible? Neglecting cache warm-up and fixed costs: Expected Speedup = Cost_computation / E(Cost_use_cache). Upper bound (hit_rate = 1): Cost_computation / Cost_lookup. In our experiments, Cost_computation / Cost_lookup ≈ 3.5.
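
To spell out the bound with the slide's own numbers: with hit_rate = 1, every event costs one (hit) lookup instead of one computation, so the ceiling on speedup is Cost_computation / Cost_lookup ≈ 3.5, which matches the best-case speedup reported in the Summary (slide 28).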

  10. Related Work • Function caching: replace application-level function calls with cache queries • Introduced by Bellman (1957); Michie (1968) • Incremental computations: • Pugh & Teitelbaum (1989); Liu & Teitelbaum (1995) • Sequential discrete event simulation: • Staged Simulation: Walsh & Sirer (2003): function caching + currying (breaking up computations), re-ordering, and pre-computation • Decision-tool techniques for PADS: multiple runs of similar simulations • Simulation cloning: Hybinette & Fujimoto (1998); Chen, Turner, et al. (2002); Strassburger (2000) • Updateable simulations: Ferenci et al. (2002) • Related optimization techniques • Lazy re-evaluation: West (1988)

  11. Overview of Adaptive Caching At execution time: • Warm-up execution phase, for each function: • Monitor: hit rate, query time, function run time • Determine the utility of using the cache • Main execution phase, for each function: • Use the cache (or not), depending on the results from the warm-up phase • Randomly sample: hit rate, query time, function run time • Revise the decision if conditions change (see the sketch below)
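
A speculative C sketch of this two-phase scheme, reusing cache_stats and should_use_cache from the sketch after slide 8; the phase length, the sampling rule, and all names here are assumptions, not the paper's implementation.

      #define WARMUP_EVENTS 1000   /* warm-up phase length (assumed)      */
      #define SAMPLE_PERIOD  500   /* re-sample every Nth event (assumed; */
                                   /* the paper samples randomly, a fixed */
                                   /* period keeps this sketch simple)    */

      typedef struct {
          long        event_count; /* events processed by this LP so far  */
          int         use_cache;   /* current decision for this function  */
          cache_stats stats;       /* monitored hit rate and costs        */
      } lp_monitor;

      /* Called once per event: during warm-up, and on periodic samples
         afterwards, re-evaluate whether the cache is worth consulting.
         (The timing code that feeds stats is omitted.) */
      int decide_use_cache(lp_monitor *m)
      {
          m->event_count++;
          if (m->event_count <= WARMUP_EVENTS ||
              m->event_count % SAMPLE_PERIOD == 0) {
              m->use_cache = should_use_cache(&m->stats);
          }
          return m->use_cache;
      }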

  12. What’s New • The decision to use the cache is made dynamically • in response to unpredictable local conditions for each LP at execution time • Relieves the user of having to know whether something is worth caching • the adaptive method automatically identifies caching opportunities and rejects poor caching choices • Easy-to-use caching API • independent of application and simulation kernel • cache middleware • Distributed cache • each LP maintains its own independent cache

  13. Pseudo-Code Example

      // ORIGINAL LP CODE
      LP_init() {
          cacheInitialize( argc, argv );   // cache middleware call added to the original
      }

  15. Pseudo-Code Example

      // ORIGINAL LP CODE
      LP_init() {
          cacheInitialize( argc, argv );   // cache middleware call added to the original
      }

      Proc( state, msg, MyPE ) {
          retval = cacheCheckStart( currentstate, event );
          if ( retval == NULL ) {
              /* original LP code: compute the new state and the
                 events to be scheduled */
              /* allow the cache to save the results */
              cacheCheckEnd( newstate, newevents );
          } else {
              /* cache hit: reuse the stored results */
              newstate  = retval.state;
              newevents = retval.events;
          }
          schedule( newevents );
      }

  17. Implementation

  18. Caching Middleware (Figure: layered architecture; the cache middleware sits between the simulation application and the simulation kernel.)

  19. Caching Middleware (Hit) (Figure: the application hands the current state/message to the middleware, which checks the cache; on a hit, the cached result is returned without running the computation.)

  20. Caching Middleware (Miss) (Figure: on a cache miss, or when the cache lookup is too expensive, the original computation runs and the new state & message are inserted into the cache.)

  21. Cache Implementation • Hash table with separate chaining • Input: current state & message • Output: new state and output message(s) • Hash function: djb2 (by Dan Bernstein; also used in Perl); see the sketch below
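
A minimal C sketch of this design, assuming the (state, message) key is serialized to a flat byte buffer; the entry layout and function names are illustrative, not the paper's code. djb2 itself is the well-known multiply-by-33 hash, adapted here from NUL-terminated strings to a length-delimited buffer.

      #include <stdlib.h>
      #include <string.h>

      /* djb2 (Dan Bernstein): h = h * 33 + byte, starting from 5381. */
      static unsigned long djb2(const unsigned char *buf, size_t len)
      {
          unsigned long h = 5381;
          while (len--)
              h = ((h << 5) + h) + *buf++;
          return h;
      }

      /* One entry: key = serialized (state, message); value = serialized
         (new state, output messages). Chained on hash collisions. */
      typedef struct entry {
          struct entry *next;
          size_t key_len, val_len;
          unsigned char data[];        /* key bytes, then value bytes */
      } entry;

      #define NBUCKETS 4096            /* table size (assumed) */
      typedef struct { entry *buckets[NBUCKETS]; } lp_cache;

      /* Lookup: walk the chain, compare keys byte-for-byte. */
      static entry *cache_lookup(lp_cache *c, const unsigned char *key, size_t klen)
      {
          for (entry *e = c->buckets[djb2(key, klen) % NBUCKETS]; e; e = e->next)
              if (e->key_len == klen && memcmp(e->data, key, klen) == 0)
                  return e;            /* hit */
          return NULL;                 /* miss: compute, then insert */
      }

      /* Insert after a miss; the memory would come from the per-LP
         pool described on slide 22. */
      static void cache_insert(lp_cache *c, const unsigned char *key, size_t klen,
                               const unsigned char *val, size_t vlen)
      {
          entry *e = malloc(sizeof *e + klen + vlen);
          if (e == NULL) return;       /* pool exhausted: skip the insert */
          e->key_len = klen; e->val_len = vlen;
          memcpy(e->data, key, klen);
          memcpy(e->data + klen, val, vlen);
          unsigned long b = djb2(key, klen) % NBUCKETS;
          e->next = c->buckets[b];
          c->buckets[b] = e;
      }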

  22. Memory Management • Distributed cache; one for each LP • Pre-allocate memory pool for cache in each LP during initialization phase • Upper limit parameterized
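
A sketch of such a pre-allocated pool with a parameterized upper limit; the bump-pointer scheme and all names are assumptions (the paper does not specify the allocator).

      #include <stdlib.h>

      typedef struct {
          char  *base;    /* pool memory, allocated once at LP initialization */
          size_t limit;   /* parameterized upper bound on cache memory        */
          size_t used;    /* bytes handed out so far                          */
      } cache_pool;

      int cache_pool_init(cache_pool *p, size_t limit)
      {
          p->base  = malloc(limit);
          p->limit = limit;
          p->used  = 0;
          return p->base != NULL;
      }

      /* Bump-pointer allocation; returns NULL once the limit is reached,
         at which point the cache simply stops growing. */
      void *cache_pool_alloc(cache_pool *p, size_t n)
      {
          if (p->used + n > p->limit)
              return NULL;
          void *mem = p->base + p->used;
          p->used += n;
          return mem;
      }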

  23. Experiments • Three sets of experiments with P-Hold: • Proof of concept (no adaptive caching): hit rate • Impact of cache size and simulation running time on speedup (caching vs. no caching) • Adaptive caching with respect to the cost of event computation • 16-processor SGI Origin 2000, using 4 processors • Time stamps “curried” out of the cached computations

  24. Hit Rate versus Progress • As expected, the hit rate increases as the cache size increases • The hit rate reaches its maximum for large caches • The hit rate sets an upper bound on speedup

  25. Speedup vs. Cache Size • Speedup improves as the size of the cache increases • Beyond a cache size of 9,000 KB, speedup declines and levels off • Performance is better for simulations whose computations have higher latency

  26. Speedup vs. Cost_computation • Non-adaptive caching yields a slowdown (speedup of 0.82) for low-latency computations, improving to 1 as the computational latency approaches 1.5 msec

  27. Speedup vs. Cost_computation • Adaptive caching tracks the cost of consulting the cache against the cost of running the actual computation • Adaptive caching holds speedup at 1 for small computational latencies (it selects performing the computation instead of consulting the cache)

  28. Summary & Future Work Summary: • Middleware implementation that requires no major structural revision of application code • Best-case speedup approaches 3.5; worst-case speedup is 1 (speedup is limited by the 70% hit rate) • With randomly generated information (such as time stamps), caching may become ineffective unless precautions are taken Future Work: • Function caching instead of LP caching • Look at series of functions to jump forward • Adaptive replacement strategies

  29. Closing “A sword wielded poorly will kill its owner” -- Ancient Proverb

  30. Pseudo-Code Example

      // ORIGINAL LP CODE
      LP_init() {
          // ...
      }

      Proc( state, msg, MyPE ) {
          val1 = fancy_function( msg->param1, state->key_part );
          val2 = fancier_function( msg->param3 );
          state->key_part = val1 + val2;
      }

  32. Pseudo-Code Example

      // ORIGINAL LP CODE
      LP_init() {
          // ...
      }

      Proc( state, msg, MyPE ) {
          val1 = fancy_function( msg->param1, state->key_part );
          val2 = fancier_function( msg->param3 );
          state->key_part = val1 + val2;
      }

      // LP CODE WITH CACHING
      LP_init() {
          cache_init( FF1, SIZE1, 2, fancy_function );
          cache_init( FF2, SIZE2, 1, fancier_function );
      }

      Proc( state, msg, MyPE ) {
          val1 = cache_query( FF1, msg->param1, state->key_part );
          val2 = cache_query( FF2, msg->param3 );
          state->key_part = val1 + val2;
      }
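
As a rough illustration of what cache_init / cache_query could expand to for a two-argument function, here is a speculative direct-mapped memo table; the table layout, sizes, and djb2-style key mixing are assumptions, not the paper's implementation (which uses separate chaining, per slide 21).

      #define FF1_SLOTS 1024                 /* table size (assumed) */

      typedef struct { int a, b, val, valid; } memo_entry;
      static memo_entry ff1_table[FF1_SLOTS];

      /* Memoized call: on a hit, reuse the stored result; on a miss,
         compute and overwrite the slot (a trivial replacement policy). */
      int cache_query_ff1(int a, int b, int (*fn)(int, int))
      {
          unsigned long h = 5381;
          h = h * 33 + (unsigned long)a;     /* djb2-style mixing */
          h = h * 33 + (unsigned long)b;
          memo_entry *e = &ff1_table[h % FF1_SLOTS];

          if (e->valid && e->a == a && e->b == b)
              return e->val;                 /* hit */

          e->a = a; e->b = b;                /* miss: compute and cache */
          e->val = fn(a, b);
          e->valid = 1;
          return e->val;
      }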

  33. Approach • Cache computations and re-use them when they repeat, instead of re-computing. (Figure: a network of LPs.)
