Smart Data Structures


Presentation Transcript


  1. Smart Data Structures Jonathan Eastep David Wingate Anant Agarwal 06/3/2012

  2. Multicores are Complex! • The Problem • System complexity is skyrocketing! • Multicore architecture is a moving target • The best algorithm and its settings depend on the architecture and workload • Application inputs and workloads can be dynamic • Online tuning is necessary but typically absent

  3. The Big Picture • Developed a dynamic optimization framework to auto-tune software and minimize programmer burden • The framework is based on online machine learning technologies • Demonstrated the framework by designing “Smart Data Structures” for parallel programs • The framework is general; it could apply to systems such as clouds, operating systems, and runtimes

  4. Smart Data Structures • Smart Data Structures are parallel data structures that self-optimize to minimize programmer burden • They use online machine learning to adapt to changing app or system needs and achieve the best performance • A library of Smart Data Structures is open-sourced on GitHub (GPL) • github.com/mit-carbon/Smart-Data-Structures • Publications: [1], [2], [3], [4]

  5. A Sketch of the Benefits of SDS • Use a Smart Lock to optimize a master-worker program • Measure the rate of completed work items • Emulate dynamic frequency scaling due to Intel Turbo Boost® • Workload 1: Worker 0 @ 3GHz, others @ 2GHz • Workload 2: Worker 3 @ 3GHz, others @ 2GHz
  [Chart: throughput in (items per second) / 1e6 for the Baseline, Smart Lock, and Ideal configurations under each workload, highlighting the gap the Smart Lock closes toward Ideal]

  6. Outline • Smart Data Structures • Anatomy of a Smart Data Structure • Implementation Example • Research Challenges and Solutions • Online Machine Learning Algorithm • Empirical Benchmark Results • Empirical Scalability Studies • Future Directions • Conclusions

  7. What are Smart Data Structures? • Self-aware computing applied to data structures • Data structures that self-optimize using online learning • We can optimize knobs in other systems too
  [Diagram: a conventional Queue next to a Smart Queue. Each serves threads t1, t2, …, tn through the same interface (add, remove, peek) over a storage + algorithm implementation with knobs. In the Queue, the knobs are hand-tuned per system and per app, and remain static; in the Smart Queue, online learning tunes the knobs automatically at runtime, so they are self-tuned.]

  8. Smart Data Structure Library • A C++/C library of popular parallel data structures • Supported, with the ML optimization type each uses: • Smart Lock: lock acquisition scheduling • Smart Queue, Smart SkipList, Smart PairHeap, Smart Stack: tuning flat combining • Future work: • Smart DHT: dynamic load-balancing
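
For orientation, a minimal usage sketch. The SmartQueue below is a trivial stand-in built on a mutex and std::deque so the example runs; the names and methods are hypothetical, so consult the GitHub repository above for the library's actual API.

```cpp
// Hypothetical usage sketch: a stand-in SmartQueue, not the library's API.
#include <cstdio>
#include <deque>
#include <mutex>

template <typename T>
class SmartQueue {
public:
    void enqueue(const T& v) {
        std::lock_guard<std::mutex> g(m_);
        q_.push_back(v);
    }
    bool dequeue(T* out) {
        std::lock_guard<std::mutex> g(m_);
        if (q_.empty()) return false;
        *out = q_.front();
        q_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::deque<T> q_;
};

int main() {
    SmartQueue<int> q;
    q.enqueue(42);          // ordinary calls; any self-tuning happens
    int v;                  // behind the interface, invisible to the caller
    if (q.dequeue(&v)) std::printf("dequeued %d\n", v);
    return 0;
}
```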

  9. Smart Queue, SkipList, PairHeap, Stack • The implementation should leverage the best-performing prior work • Which designs are best? Determined with experiments • Result: the Flat Combining data structures from Hendler et al. • This is contrary to conventional wisdom • Reason: the FC algorithm minimizes synchronization overheads by combining data structure ops and applying multiple ops at once

  10. Flat Combining Primer
  [Animation: threads alternate between working and posting requests (enq a, enq b, enq c, enq d) to per-thread request records. One thread acquires the lock on the serial data structure and becomes the combiner: it repeatedly scans the request records and applies pending ops to the serial structure, while a scancount (3 → 2 → 1 → 0) counts down the number of scans it performs before releasing the lock.]
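
A minimal sketch of the pattern, simplified from Hendler et al.'s design (names and sizes here are illustrative, not the library's code): each thread publishes its request in a per-thread record, then either becomes the combiner or spins until its request is served.

```cpp
// Simplified flat-combining skeleton (illustrative only).
#include <atomic>
#include <deque>
#include <mutex>

struct Request {
    std::atomic<bool> pending{false};
    int value = 0;                      // operand for an enqueue
};

constexpr int kMaxThreads = 64;
Request    records[kMaxThreads];        // one publication record per thread
std::mutex fc_lock;                     // guards the serial structure
std::deque<int> serial_queue;           // the underlying serial queue

void enqueue(int tid, int v) {
    records[tid].value = v;
    records[tid].pending.store(true, std::memory_order_release);
    while (records[tid].pending.load(std::memory_order_acquire)) {
        if (fc_lock.try_lock()) {       // won the lock: become the combiner
            for (int t = 0; t < kMaxThreads; ++t) {
                if (records[t].pending.load(std::memory_order_acquire)) {
                    serial_queue.push_back(records[t].value);  // apply the op
                    records[t].pending.store(false, std::memory_order_release);
                }
            }
            fc_lock.unlock();
        }                               // lost the lock: spin until served
    }
}
```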

  11. Smart Queue, SkipList, PairHeap, Stack • Here the application of learning is to auto-tune a performance-critical knob called the scancount: the number of scans over the request records, which dynamically tunes the time spent combining
  [Diagram: threads t1, t2, …, tn issue enqueue / dequeue / peek calls through the Smart Queue's interface into thread request records; a combiner holding the lock applies them to the serial queue; reinforcement learning (of a discrete variable) sets the scancount knob.]

  12. Why Does the Scancount Matter? • The scancount controls how long threads spend as the combiner • Increasing the scancount allows the combiner to do more data structure ops within the same lock acquisition • But increasing the scancount also increases the latency of the combiner's own op • It is good to increase the scancount up to a point; beyond that, the added latency can hurt performance • Smart Data Structures use online learning to find the ideal scancount at any given time
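
Concretely, here is the combiner phase with the scancount factored out as the tunable knob. This is a sketch reusing the illustrative skeleton after slide 10, not the library's code.

```cpp
// Combiner phase with a tunable scancount. A larger scancount lets one
// lock acquisition serve more ops; a smaller one bounds the latency of
// the combiner's own op. That trade-off is what the learning navigates.
void combine(int scancount) {
    for (int scan = 0; scan < scancount; ++scan) {
        for (int t = 0; t < kMaxThreads; ++t) {
            if (records[t].pending.load(std::memory_order_acquire)) {
                serial_queue.push_back(records[t].value);
                records[t].pending.store(false, std::memory_order_release);
            }
        }
    }
}
// A Smart Data Structure replaces a fixed scancount with a value that
// the learning thread re-tunes continuously as the load changes.
```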

  13. SDS Implementation • Goal: minimize application disruption • Reward signal: internal lightweight statistics (e.g. throughput in ops/s) or an external application-specific signal (e.g. an application performance monitor such as Heartbeats) • The number of learning threads is one by default; it runs the learning engines for all Smart Data Structures
  [Diagram: application threads t1, t2, …, tn call the Smart Data Structure (storage + algorithm behind an add / remove / peek interface, e.g. a Smart Queue); a separate learning thread reads the reward and adjusts the knobs.]
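
A sketch of what that background learning thread might look like with the internal throughput reward. The names are invented for illustration, and the stub stands in for the engine described on slide 18.

```cpp
// Illustrative learning-thread loop: measure a throughput reward over a
// short window, consult the learning engine, install its new setting.
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<long> ops_completed{0};   // incremented by the data structure
std::atomic<int>  scancount{8};       // the knob that combiners read

int next_setting(double reward) {
    (void)reward;   // stub: the real engine runs the policy-gradients
    return 8;       // update sketched under slide 18
}

void learning_thread() {
    long last = 0;
    for (;;) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1)); // window
        long now = ops_completed.load(std::memory_order_relaxed);
        double reward = static_cast<double>(now - last);  // ops per window
        last = now;
        scancount.store(next_setting(reward), std::memory_order_relaxed);
    }
}
```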

  14. SDS Implementation • A machine learning co-optimization framework • Supports joint optimization: multiple knobs at once • Supports discrete, Gaussian, boolean, and permutation knobs • Designed explicitly to support systems other than SDS
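
To illustrate the four knob types, a hypothetical set of declarations; the deck does not show the framework's actual API, so these names are invented.

```cpp
// Hypothetical knob declarations; the real framework's API may differ.
#include <array>

struct DiscreteKnob    { int low = 1, high = 96; };         // e.g. a scancount
struct GaussianKnob    { double mean = 0.0, stddev = 1.0; };// continuous value
struct BooleanKnob     { bool enabled = false; };           // on/off feature
struct PermutationKnob { std::array<int, 4> order{0, 1, 2, 3}; }; // an ordering

// Joint optimization: the engine samples a setting for every registered
// knob, observes one shared reward, and updates all policies together.
struct KnobSet {
    DiscreteKnob    scancount;
    BooleanKnob     combining_enabled;
    PermutationKnob acquisition_order;
};
```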

  15. Major SDS Research Challenges • Quality challenges: • How do you find knob settings with the best long-term effects? • How do you measure whether a knob setting is helping? • Timeliness challenges: • How do you optimize quickly enough to not miss opportunities? • How do you manage a potentially intractable search space?

  16. Addressing the Quality Challenges • How do you find settings with the best long-term effects? • Leverage machine learning technologies for planning • Use online RL to adapt to workload or phase changes • How do you measure whether a knob setting is helping? • An extensible reward-signal interface for performance monitoring • The Heartbeats framework for application-specific performance evaluation

  17. Addressing Timeliness Challenges • How do you optimize fast enough to not miss opportunities? • Choose a fast gradient-based machine learning algorithm • Use a learning helper thread to decouple learning from the app threads • How do you manage a potentially intractable search space? • Relax the potentially exponential discrete action space into a continuous one • Use a stochastic soft-max policy, which enables gradient-based learning

  18. Reinforcement Learning Algorithm • Goal: optimize the rate of reward (e.g. heart rate) • Method: the Policy Gradients algorithm • Online, model-free, handles exponential knob spaces • Learn a stochastic policy that gives a probability distribution over settings for each knob • Sample settings for each knob from the policy, try them empirically, and listen to the performance feedback signal • Improve the policy using a method analogous to gradient ascent • I.e. estimate the gradient of the reward with respect to the policy and step the policy in the gradient direction toward maximum reward • Balance exploration vs. exploitation and keep the policy differentiable via a stochastic soft-max policy
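
As a sketch, a minimal REINFORCE-style policy-gradients tuner for one discrete knob (e.g. the scancount), assuming a soft-max policy over a fixed set of settings. This illustrates the approach on the slide; it is not the library's actual engine.

```cpp
// Policy gradients over a discrete knob with a stochastic soft-max policy.
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

class SoftmaxPolicy {
public:
    explicit SoftmaxPolicy(int num_settings)
        : theta_(num_settings, 0.0), probs_(num_settings, 0.0), rng_(12345) {}

    // Sample a knob setting from the policy. Stochasticity provides
    // exploration; the soft-max keeps the policy differentiable.
    int sample() {
        double m = *std::max_element(theta_.begin(), theta_.end());
        double z = 0.0;
        for (size_t k = 0; k < theta_.size(); ++k)
            z += (probs_[k] = std::exp(theta_[k] - m));
        for (double& p : probs_) p /= z;
        std::discrete_distribution<int> dist(probs_.begin(), probs_.end());
        return dist(rng_);
    }

    // Gradient step: for a soft-max policy,
    //   d log pi(a) / d theta_k = 1[k == a] - pi_k,
    // so stepping theta along (reward - baseline) * grad climbs the
    // estimated gradient of expected reward.
    void update(int action, double reward, double alpha = 0.05) {
        baseline_ += 0.1 * (reward - baseline_);   // running reward baseline
        double advantage = reward - baseline_;     // reduces gradient variance
        for (size_t k = 0; k < theta_.size(); ++k) {
            double indicator = (static_cast<int>(k) == action) ? 1.0 : 0.0;
            theta_[k] += alpha * advantage * (indicator - probs_[k]);
        }
    }

private:
    std::vector<double> theta_, probs_;
    std::mt19937 rng_;
    double baseline_ = 0.0;
};
// Usage per epoch: int s = policy.sample(); run with setting s for a
// window; then policy.update(s, measured_throughput);
```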

  19. How Does SDS Perform? • Full sweep over SDS and load settings: compare against a Static Oracle • Result: near-ideal performance in many cases • Result: the Quality Challenge is met
  [Chart (14 threads): throughput of SDS Dynamic vs. the Static Oracle and the Static Average]

  20. What if the Workload Changes Rapidly? • Inject changes in the data structure “load” (i.e. the computation performed between ops) • Sweep over SDS, random load schedules, and change frequencies • Result: good benefit even when the load changes every 10μs • Result: the Quality and Timeliness Challenges are met
  [Chart (14 threads): throughput of SDS Dynamic vs. the Dynamic Oracle and the Dynamic Average]

  21. Future Directions • Extend this work to a common framework to coordinate tuning across all system layers • E.g.: application -> runtime -> OS -> HW • Scalable, decentralized optimization methods

  22. Conclusions • Developed a framework to dynamically tune systems and minimize programmer burden via online machine learning • Demonstrated the framework through a case study of self-tuning “Smart Data Structures” • Now looking at uses in systems beyond data structures • Reinforcement learning will play an increasingly important role in the development of future software and hardware • Contact: jonathan dot eastep at gmail

  23. Presentation References
  [1] J. Eastep, D. Wingate, M. D. Santambrogio, and A. Agarwal, “Smartlocks: Lock Acquisition Scheduling for Self-Aware Synchronization,” 7th IEEE International Conference on Autonomic Computing (ICAC ’10), 2010. Best Student Paper Award.
  [2] J. Eastep, D. Wingate, M. D. Santambrogio, and A. Agarwal, “Smartlocks: Self-Aware Synchronization through Lock Acquisition Scheduling,” MIT CSAIL Technical Report MIT-CSAIL-TR-2009-055, November 2009.
  [3] J. Eastep, D. Wingate, and A. Agarwal, “Smart Data Structures: A Reinforcement Learning Approach to Multicore Data Structures,” 8th IEEE International Conference on Autonomic Computing (ICAC ’11), 2011.
  [4] J. Eastep, “Smart Data Structures: An Online Machine Learning Approach to Multicore Data Structures,” Doctoral Dissertation, MIT, May 2011.
