Smart Data Structures Jonathan Eastep David Wingate Anant Agarwal 06/3/2012
Multicores are Complex! • The Problem • System complexity is skyrocketing! • Multicore architecture is a moving target • The best algorithm and algorithm settings depend • Application inputs and workloads can be dynamic • Online tuning is necessary but typically absent
The Big Picture • Developed a dynamic optimization framework that auto-tunes software to minimize programmer burden • The framework is based on online machine learning • Demonstrated the framework by designing “Smart Data Structures” for parallel programs • The framework is general; it could apply to other systems such as clouds, operating systems, and runtimes
Smart Data Structures • Smart Data Structures are parallel data structures that self-optimize to minimize programmer burden • They use online machine learning to adapt to changing application or system needs and achieve the best performance • A library of Smart Data Structures is open-sourced on GitHub (GPL) • github.com/mit-carbon/Smart-Data-Structures • Publications: [1], [2], [3], [4]
A Sketch of the Benefits of SDS • Use a Smart Lock to optimize a master-worker program • Measure the rate of completed work items • Emulate the dynamic frequency scaling caused by Intel Turbo Boost® • Workload 1: Worker 0 @ 3 GHz, others @ 2 GHz • Workload 2: Worker 3 @ 3 GHz, others @ 2 GHz • [Figure: throughput in items per second (/1e6) for Ideal, Smartlock, and Baseline, with the gap annotated]
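The Smart Lock's lever in this experiment is the choice of which waiting thread acquires the lock next. The C++ sketch below illustrates that idea under stated assumptions: each thread has a numeric priority, and on release the lock is handed directly to the highest-priority waiter. In a Smart Lock the priorities would come from the online learner (a learner might, for instance, favor the faster worker); here they are set by hand. `PriorityLock`, `set_priority`, and `demo_handoff_order` are hypothetical names, not the library's API.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of lock acquisition scheduling: on release, the lock is
// handed to the waiting thread with the highest priority. A Smart Lock would
// have an online learner set these priorities; here they are set by hand.
class PriorityLock {
public:
    explicit PriorityLock(std::size_t num_threads)
        : priority_(num_threads, 0.0), waiting_(num_threads, false) {}

    void set_priority(std::size_t tid, double p) {
        std::lock_guard<std::mutex> g(m_);
        priority_[tid] = p;
    }

    void lock(std::size_t tid) {
        std::unique_lock<std::mutex> g(m_);
        if (!held_) {  // uncontended: take the lock immediately
            held_ = true;
            return;
        }
        waiting_[tid] = true;
        ++num_waiting_;
        // Sleep until the releasing thread hands ownership to us.
        cv_.wait(g, [&] { return granted_ == static_cast<long>(tid); });
        granted_ = -1;  // handoff consumed; held_ stays true
    }

    void unlock() {
        std::lock_guard<std::mutex> g(m_);
        long best = -1;
        for (long t = 0; t < static_cast<long>(waiting_.size()); ++t) {
            if (waiting_[t] && (best < 0 || priority_[t] > priority_[best])) best = t;
        }
        if (best < 0) {  // no waiters: release the lock
            held_ = false;
            return;
        }
        waiting_[best] = false;
        --num_waiting_;
        granted_ = best;  // hand the lock directly to the chosen waiter
        cv_.notify_all();
    }

    std::size_t num_waiting() {
        std::lock_guard<std::mutex> g(m_);
        return num_waiting_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<double> priority_;
    std::vector<bool> waiting_;
    std::size_t num_waiting_ = 0;
    bool held_ = false;
    long granted_ = -1;
};

// Usage: thread 0 holds the lock while threads 1..3 queue behind it;
// returns the order in which the waiters acquired the lock.
std::vector<std::size_t> demo_handoff_order() {
    PriorityLock lk(4);
    lk.set_priority(1, 1.0);
    lk.set_priority(2, 3.0);
    lk.set_priority(3, 2.0);
    std::vector<std::size_t> order;
    lk.lock(0);
    std::vector<std::thread> workers;
    for (std::size_t tid = 1; tid <= 3; ++tid) {
        workers.emplace_back([&lk, &order, tid] {
            lk.lock(tid);
            order.push_back(tid);  // protected by lk
            lk.unlock();
        });
    }
    while (lk.num_waiting() < 3) std::this_thread::yield();
    lk.unlock();  // hands off to the highest-priority waiter
    for (auto& w : workers) w.join();
    return order;
}
```

Handing ownership directly to the chosen waiter, rather than releasing the lock and letting waiters race for it, is what makes the acquisition order controllable; the cost is that every release pays for a scan over the waiters.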
Outline • Smart Data Structures • Anatomy of a Smart Data Structure • Implementation Example • Research Challenges and Solutions • Online Machine Learning Algorithm • Empirical Benchmark Results • Empirical Scalability Studies • Future Directions • Conclusions
What are Smart Data Structures? • Self-aware computing applied to data structures • Data structures that self-optimize using online learning • We can optimize knobs in other systems too • [Figure: a conventional data structure (e.g., Queue) next to a Smart Data Structure (e.g., Smart Queue). Both expose an interface (add, remove, peek) to threads t1 … tn and contain an algorithm, storage, and knobs. The conventional structure's knobs are hand-tuned and static, per system and per app; the Smart Data Structure's knobs are self-tuned automatically, at runtime, by online learning.]
Smart Data Structure Library • C++/C Library of Popular Parallel Data Structures • ML Optimization Type: • Lock Acquisition Scheduling • Tuning Flat Combining • Dynamic Load-Balancing • Supported: • Smart Lock • Smart Queue • Smart SkipList • Smart PairHeap • Smart Stack • Future Work: • Smart DHT
Smart Queue, SkipList, PairHeap, Stack • The implementation should leverage the best-performing prior work • Which are the best? We determined this with experiments. • Result: the Flat Combining data structures of Hendler et al. • This is contrary to conventional wisdom • Reason: the FC algorithm minimizes synchronization overhead by combining data structure ops and applying multiple ops at once
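Flat Combining Primer • [Figure: threads post requests (enq a, enq b, enq c, enq d) to their request records; the thread holding the lock becomes the combiner and applies the pending requests to the serial data structure, making up to scancount passes over the records, while the other threads are shown as working]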
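The primer can be made concrete with a simplified flat-combining queue. This is a minimal C++ sketch, not the library's implementation: it assumes one request record per thread, a single `std::mutex` acquired with `try_lock`, a `std::deque` as the serial queue, and a hypothetical `scancount` constructor parameter giving the number of passes the combiner makes over the records. Refinements from Hendler et al.'s design, such as dynamically adding and removing publication records, are omitted. `FCQueue` and `Request` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical flat-combining queue, simplified from Hendler et al.:
// one request record per thread, one lock, and a serial std::deque.
class FCQueue {
    enum class Op { None, Enq, Deq };
    struct Request {
        std::atomic<Op> op{Op::None};  // pending operation; None when served
        int arg = 0;                   // value to enqueue
        std::optional<int> result;     // dequeued value, if any
    };

public:
    // scancount = number of passes the combiner makes over all records.
    FCQueue(std::size_t num_threads, int scancount)
        : records_(num_threads), scancount_(scancount) {
        assert(scancount >= 1);  // one pass always serves the combiner's own request
    }

    void enqueue(std::size_t tid, int v) { submit(tid, Op::Enq, v); }
    std::optional<int> dequeue(std::size_t tid) { return submit(tid, Op::Deq, 0); }

private:
    std::optional<int> submit(std::size_t tid, Op op, int arg) {
        Request& r = records_[tid];
        r.arg = arg;
        r.op.store(op, std::memory_order_release);  // publish the request
        while (r.op.load(std::memory_order_acquire) != Op::None) {
            if (lock_.try_lock()) {  // this thread becomes the combiner
                for (int pass = 0; pass < scancount_; ++pass) combine_pass();
                lock_.unlock();
            } else {
                std::this_thread::yield();  // someone else is combining
            }
        }
        return r.result;
    }

    // One scan over every record, applying pending ops to the serial queue.
    void combine_pass() {
        for (Request& r : records_) {
            Op op = r.op.load(std::memory_order_acquire);
            if (op == Op::None) continue;
            if (op == Op::Enq) {
                queue_.push_back(r.arg);
                r.result.reset();
            } else if (queue_.empty()) {
                r.result.reset();
            } else {
                r.result = queue_.front();
                queue_.pop_front();
            }
            r.op.store(Op::None, std::memory_order_release);  // mark served
        }
    }

    std::vector<Request> records_;
    int scancount_;
    std::mutex lock_;
    std::deque<int> queue_;  // only touched while holding lock_
};
```

Each extra pass lets the combiner pick up requests that arrived during its session, which is the ops-per-lock versus combiner-latency trade-off that the scancount controls.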
Smart Queue, SkipList, PairHeap, Stack • Here learning is applied to auto-tune a performance-critical knob called the scancount • Scancount: the number of scans over the request records; tuning it dynamically tunes the time spent combining • [Figure: a Smart Queue exposing enqueue, dequeue, and peek to threads t1 … tn; thread request records, a lock, and a serial queue, with the scancount knob set by reinforcement learning (of a discrete variable)]
Why Does the Scancount Matter? • Scancount controls how long a thread spends as the combiner • Increasing scancount lets the combiner perform more data structure ops per lock acquisition • But increasing scancount also increases the latency of the combiner’s own op • Increasing scancount helps up to a point; beyond that, the added latency hurts performance • Smart Data Structures use online learning to find the best scancount at any given time
SDS Implementation • Goal: minimize application disruption • Reward: internal lightweight statistics (e.g., throughput in ops/s) or an external, application-specific reward signal (e.g., Heartbeats) • By default there is one learning thread; it runs the learning engines for all SDS • [Figure: application threads t1 … tn use a Smart Data Structure (e.g., Smart Queue) with interface (add, remove, peek), algorithm, and storage; a learning thread runs online learning on a reward from internal statistics or from an external application performance monitor such as Heartbeats]
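For the internal-statistics option, a reward can be as simple as a shared operation counter that application threads increment and the learning thread samples to compute throughput in ops/s. The sketch below assumes a relaxed atomic counter and a caller-chosen sampling time; `ThroughputMonitor` is a hypothetical name, and neither the library's statistics code nor the Heartbeats API is reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical internal reward signal: data-structure throughput in ops/s.
class ThroughputMonitor {
public:
    using Clock = std::chrono::steady_clock;

    explicit ThroughputMonitor(Clock::time_point start = Clock::now())
        : last_time_(start) {}

    // Called by application threads after each completed operation.
    void record_op() { ops_.fetch_add(1, std::memory_order_relaxed); }

    // Called periodically by the learning thread: ops/s since the last sample.
    double sample_reward(Clock::time_point now = Clock::now()) {
        std::uint64_t total = ops_.load(std::memory_order_relaxed);
        std::chrono::duration<double> dt = now - last_time_;
        double reward = dt.count() > 0.0 ? (total - last_total_) / dt.count() : 0.0;
        last_total_ = total;
        last_time_ = now;
        return reward;
    }

private:
    std::atomic<std::uint64_t> ops_{0};
    std::uint64_t last_total_ = 0;  // only touched by the learning thread
    Clock::time_point last_time_;   // only touched by the learning thread
};
```

A learning thread would call `sample_reward()` on a fixed period and feed the result into its policy update. Relaxed ordering suffices because the counter is only a statistic, not a synchronization point.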
SDS Implementation • Machine learning co-optimization framework • Supports joint optimization of multiple knobs • Supports discrete, Gaussian, boolean, and permutation knobs • Designed explicitly to support systems other than SDS
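To illustrate how a single stochastic policy can cover the four knob types listed above, the sketch below samples one setting per type from that knob's own parameters. The parameterizations are assumptions chosen for illustration: a soft-max for discrete knobs, a sigmoid-Bernoulli for boolean knobs, a normal distribution for Gaussian knobs, and a Plackett-Luce style draw (repeated soft-max without replacement) for permutation knobs. The function names are hypothetical, and the framework's actual representations may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical per-knob-type sampling under a stochastic policy.

// Discrete knob: soft-max over one preference weight per setting (non-empty).
std::size_t sample_discrete(const std::vector<double>& theta, std::mt19937& rng) {
    double mx = *std::max_element(theta.begin(), theta.end());
    std::vector<double> w;
    for (double t : theta) w.push_back(std::exp(t - mx));  // numerically stable
    std::discrete_distribution<std::size_t> d(w.begin(), w.end());
    return d(rng);
}

// Boolean knob: Bernoulli with probability sigmoid(theta).
bool sample_boolean(double theta, std::mt19937& rng) {
    std::bernoulli_distribution d(1.0 / (1.0 + std::exp(-theta)));
    return d(rng);
}

// Gaussian knob: continuous setting drawn from N(mean, stddev^2), stddev > 0.
double sample_gaussian(double mean, double stddev, std::mt19937& rng) {
    std::normal_distribution<double> d(mean, stddev);
    return d(rng);
}

// Permutation knob (e.g., an ordering of threads): repeatedly pick one of the
// remaining items by soft-max over their scores (Plackett-Luce style).
std::vector<std::size_t> sample_permutation(const std::vector<double>& scores,
                                            std::mt19937& rng) {
    std::vector<std::size_t> remaining(scores.size());
    std::iota(remaining.begin(), remaining.end(), std::size_t{0});
    std::vector<std::size_t> perm;
    while (!remaining.empty()) {
        std::vector<double> sub;
        for (std::size_t i : remaining) sub.push_back(scores[i]);
        std::size_t k = sample_discrete(sub, rng);
        perm.push_back(remaining[k]);
        remaining.erase(remaining.begin() + static_cast<std::ptrdiff_t>(k));
    }
    return perm;
}
```

Under this assumption, joint optimization amounts to sampling every knob from its own distribution on each trial and crediting the shared reward to all of them.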
Major SDS Research Challenges • Quality challenges: How do you find knob settings with the best long-term effects? How do you measure whether a knob setting is helping? • Timeliness challenges: How do you optimize quickly enough not to miss opportunities? How do you manage a potentially intractable search space?
Addressing Other Quality Challenges • How do you find settings with the best long-term effects? • Leverage reinforcement learning (RL), a machine learning technology for planning • Use online RL to adapt to workload or phase changes • How do you measure whether a knob setting is helping? • Extensible reward signal interface for performance monitoring • Heartbeats framework for application-specific performance evaluation
Addressing Timeliness Challenges • How do you optimize fast enough not to miss opportunities? • Choose a fast, gradient-based machine learning algorithm • Use a learning helper thread to decouple learning from application threads • How do you manage a potentially intractable search space? • Relax the potentially exponential discrete action space into a continuous one • Use a stochastic soft-max policy, which enables gradient-based learning
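The soft-max relaxation can be written out for one discrete knob with K candidate settings (for example, K candidate scancount values). Assuming one real-valued preference θ_k per setting, the policy and the gradient of its log-probability are:

```latex
\pi_\theta(a = k) = \frac{e^{\theta_k}}{\sum_{j=1}^{K} e^{\theta_j}},
\qquad
\frac{\partial \log \pi_\theta(a)}{\partial \theta_k} = \mathbf{1}[a = k] - \pi_\theta(a = k)
```

Every setting keeps nonzero probability, so the policy keeps exploring, and because the policy is differentiable in θ, the learner can follow gradient estimates instead of searching the discrete space. If, as assumed here, the joint policy over several knobs factors into independent per-knob soft-maxes, the number of parameters grows with the sum of the knob sizes rather than their product.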
Reinforcement Learning Algorithm • Goal: optimize the rate of reward (e.g., heart rate) • Method: Policy Gradients algorithm • Online, model-free, handles exponential knob spaces • Learn a stochastic policy that gives a probability distribution over the settings of each knob • Sample settings for each knob from the policy, try them empirically, and listen to the performance feedback signal • Improve the policy using a method analogous to gradient ascent • I.e., estimate the gradient of the reward with respect to the policy and step the policy in the gradient direction to maximize reward • Balance exploration vs. exploitation and make the policy differentiable via a stochastic soft-max policy
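The loop described above can be sketched as a REINFORCE-style policy-gradient learner for a single discrete knob. This is a minimal C++ sketch under stated assumptions: a soft-max policy over K settings, a running-average reward baseline, a fixed step size of 0.1, and a synthetic reward function standing in for measured throughput or heart rate. `SoftmaxPolicy`, `learn`, the step size, and the baseline rate are hypothetical; the published algorithm's exact update and hyperparameters may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical soft-max policy over K discrete settings of one knob,
// trained online with a REINFORCE-style policy-gradient update.
class SoftmaxPolicy {
public:
    SoftmaxPolicy(std::size_t k, double step, unsigned seed)
        : theta_(k, 0.0), step_(step), rng_(seed) {}

    // pi(a = k) = exp(theta_k) / sum_j exp(theta_j), computed stably.
    std::vector<double> probs() const {
        double mx = *std::max_element(theta_.begin(), theta_.end());
        std::vector<double> p(theta_.size());
        double z = 0.0;
        for (std::size_t i = 0; i < p.size(); ++i) {
            p[i] = std::exp(theta_[i] - mx);
            z += p[i];
        }
        for (double& x : p) x /= z;
        return p;
    }

    // Sample a setting to try; every setting keeps nonzero probability (exploration).
    std::size_t sample() {
        std::vector<double> p = probs();
        std::discrete_distribution<std::size_t> d(p.begin(), p.end());
        return d(rng_);
    }

    // Step along the estimated gradient:
    // theta_k += step * (reward - baseline) * (1[a == k] - pi(k)).
    void update(std::size_t a, double reward) {
        std::vector<double> p = probs();
        double advantage = reward - baseline_;
        for (std::size_t k = 0; k < theta_.size(); ++k)
            theta_[k] += step_ * advantage * ((k == a ? 1.0 : 0.0) - p[k]);
        baseline_ += 0.05 * (reward - baseline_);  // running-average baseline
    }

    // Most probable setting under the current policy (exploitation).
    std::size_t best() const {
        std::vector<double> p = probs();
        return static_cast<std::size_t>(std::max_element(p.begin(), p.end()) - p.begin());
    }

private:
    std::vector<double> theta_;
    double step_;
    double baseline_ = 0.0;
    std::mt19937 rng_;
};

// Online loop: sample a setting, measure its reward, update the policy.
std::size_t learn(const std::function<double(std::size_t)>& reward,
                  std::size_t k, int steps) {
    SoftmaxPolicy policy(k, 0.1, 1234);
    for (int i = 0; i < steps; ++i) {
        std::size_t a = policy.sample();  // try a setting empirically
        policy.update(a, reward(a));      // listen to the feedback signal
    }
    return policy.best();
}
```

A reward function that peaks at one setting (for example, a toy model of throughput versus scancount) can be passed to `learn` to check that the policy moves toward that setting. Subtracting a baseline that does not depend on the current action leaves the gradient estimate unbiased while reducing its variance.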
How Does SDS Perform? • Full sweep over SDS and load, compared against a Static Oracle • Result: near-ideal performance in many cases • Result: the Quality Challenge is met • [Figure († 14 threads): Static Oracle vs. SDS Dynamic vs. Static Avg]
What if the Workload Changes Rapidly? • Inject changes in the data structure “load” (i.e., the computation performed between ops) • Sweep over SDS, random load schedules, and change frequencies • Result: good benefit even when the load changes every 10 μs • Result: the Quality and Timeliness Challenges are met • [Figure († 14 threads): Dynamic Average vs. SDS Dynamic vs. Dynamic Oracle]
Future Directions • Extend this work to a common framework to coordinate tuning across all system layers • E.g.: application -> runtime -> OS -> HW • Scalable, decentralized optimization methods
Conclusions • Developed a framework to dynamically tune systems and minimize programmer burden via online machine learning • Demonstrated the framework through a case study of self-tuning “Smart Data Structures” • Now looking at uses in systems beyond data structures • jonathan dot eastep at gmail • Reinforcement Learning will play an increasingly important role in the development of future software and hardware
Presentation References
[1] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks: Lock Acquisition Scheduling for Self-Aware Synchronization,” 7th IEEE International Conference on Autonomic Computing (ICAC’10), 2010. Best Student Paper Award.
[2] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks: Self-Aware Synchronization through Lock Acquisition Scheduling,” MIT CSAIL Technical Report MIT-CSAIL-TR-2009-055, November 2009.
[3] J. Eastep, D. Wingate, A. Agarwal, “Smart Data Structures: A Reinforcement Learning Approach to Multicore Data Structures,” 8th IEEE International Conference on Autonomic Computing (ICAC’11), 2011.
[4] J. Eastep, “Smart Data Structures: An Online Machine Learning Approach to Multicore Data Structures,” Doctoral Dissertation, MIT, May 2011.