Smart Data Structures

Jonathan Eastep, David Wingate, Anant Agarwal
06/3/2012

Multicores are Complex!
• The problem:
  • System complexity is skyrocketing!
  • Multicore architecture is a moving target
  • The best algorithm and algorithm settings depend on the architecture and on the application's inputs and workload
  • Application inputs and workloads can be dynamic
  • Online tuning is necessary but typically absent

The Big Picture
• Developed a dynamic optimization framework that auto-tunes software and minimizes programmer burden
• The framework is based on online machine learning
• Demonstrated the framework by designing “Smart Data Structures” for parallel programs
• The framework is general; it could also apply to systems such as clouds, operating systems, and runtimes

Smart Data Structures
• Smart Data Structures are parallel data structures that self-optimize, minimizing programmer burden
• They use online machine learning to adapt to changing application or system needs and achieve the best performance
• A library of Smart Data Structures is open source on GitHub (GPL): github.com/mit-carbon/Smart-Data-Structures
• Publications: [1], [2], [3], [4]

A Sketch of the Benefits of SDS
• Use a Smart Lock to optimize a master-worker program
• Measure the rate of completed work items
• Emulate dynamic frequency scaling due to Intel Turbo Boost®
  • Workload 1: Worker 0 @ 3 GHz, others @ 2 GHz
  • Workload 2: Worker 3 @ 3 GHz, others @ 2 GHz
[Figure: completed work items per second (/1e6) for Ideal, Smartlock, and Baseline]
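Lock acquisition scheduling, the idea behind the Smart Lock in this example, can be pictured as a lock that, on release, hands ownership to the waiting thread with the highest priority. The Python sketch below is illustrative only: `PriorityHandoffLock`, its caller-supplied thread ids, and its fixed priorities are hypothetical. In a Smartlock [1] the priorities would be learned online (presumably to favor, e.g., the faster 3 GHz worker) rather than passed in by the caller.

```python
import threading


class PriorityHandoffLock:
    """Hypothetical sketch of lock acquisition scheduling (not the SDS API).

    When the holder releases the lock, ownership is handed directly to the
    waiting thread with the highest priority. A Smartlock would learn these
    priorities online; here they are fixed values supplied by the caller.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._held = False
        self._next_owner = None   # thread id chosen on the last hand-off
        self._waiting = {}        # thread id -> priority of each waiter

    def acquire(self, tid, priority):
        with self._cond:
            if not self._held:
                # Uncontended: take the lock immediately.
                self._held = True
                return
            self._waiting[tid] = priority
            self._cond.notify_all()          # wake anyone in wait_for_waiters()
            while self._next_owner != tid:
                self._cond.wait()
            self._next_owner = None          # hand-off consumed; tid now holds the lock

    def release(self):
        with self._cond:
            if self._waiting:
                # Scheduling decision: the highest-priority waiter goes next.
                nxt = max(self._waiting, key=self._waiting.get)
                del self._waiting[nxt]
                self._next_owner = nxt       # lock stays held; ownership moves to nxt
                self._cond.notify_all()
            else:
                self._held = False

    def wait_for_waiters(self, n):
        """Block until at least n threads are queued (useful for demos)."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._waiting) >= n)
```

If one thread holds the lock while threads with priorities 1.0, 5.0, and 3.0 queue up, they acquire it in the order 5.0, 3.0, 1.0 regardless of arrival order; a learned policy would simply keep updating those priority values.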

Outline
• Smart Data Structures
• Anatomy of a Smart Data Structure
• Implementation Example
• Research Challenges and Solutions
• Online Machine Learning Algorithm
• Empirical Benchmark Results
• Empirical Scalability Studies
• Future Directions
• Conclusions

What are Smart Data Structures?
• Self-aware computing applied to data structures
• Data structures that self-optimize using online learning
• We can optimize knobs in other systems too
[Figure: a conventional data structure (e.g., Queue) next to a Smart Data Structure (e.g., Smart Queue). Both have an interface (add, remove, peek), an algorithm, and storage, and are used by threads t1 … tn. The conventional structure's knobs are hand-tuned per system and per app and stay static; the Smart Data Structure's knobs are self-tuned automatically at runtime by online learning.]

Smart Data Structure Library
• C++/C library of popular parallel data structures
• ML optimization types:
  • Lock acquisition scheduling
  • Tuning flat combining
  • Dynamic load-balancing
• Supported: Smart Lock, Smart Queue, Smart SkipList, Smart PairHeap, Smart Stack
• Future work: Smart DHT

Smart Queue, SkipList, PairHeap, Stack
• The implementation should build on the best-performing prior work
• Which are the best? We determined this with experiments
• Result: the Flat Combining (FC) data structures of Hendler et al.
• This is contrary to conventional wisdom
• Reason: the FC algorithm minimizes synchronization overhead by combining data structure ops and applying multiple ops at once

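Flat Combining Primer
[Figure: threads publish requests (enq a, enq b, enq c, enq d) in per-thread request records; one thread takes the lock and becomes the combiner while the other threads are shown as working, and the combiner scans the request records (scancount passes), applying the pending requests to the serial data structure.]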
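The primer can be sketched in Python as follows. This is a minimal, illustrative model assuming one global lock and one publication record per thread id; `FlatCombiningQueue` and `execute` are hypothetical names, and the sketch omits the many optimizations of real flat-combining implementations. Because of Python's GIL it shows the protocol, not the performance. The `scancount` parameter is the number of passes the combiner makes over the request records.

```python
import threading
from collections import deque


class FlatCombiningQueue:
    """Illustrative flat-combining FIFO queue (hypothetical names, not the SDS API)."""

    def __init__(self, num_threads, scancount=1):
        self._queue = deque()                       # the serial data structure
        self._lock = threading.Lock()               # held by the current combiner
        self._records = [None] * num_threads        # per-thread published request
        self._results = [None] * num_threads        # per-thread result slot
        self._done = [threading.Event() for _ in range(num_threads)]
        self.scancount = scancount                  # passes over the records per combine

    def _apply(self, op, arg):
        # Apply one request to the serial queue; only the combiner calls this.
        if op == "enq":
            self._queue.append(arg)
            return None
        return self._queue.popleft() if self._queue else None   # "deq"

    def _combine(self):
        # Scan all request records `scancount` times, serving every pending request.
        for _ in range(self.scancount):
            for tid, req in enumerate(self._records):
                if req is not None:
                    self._records[tid] = None
                    self._results[tid] = self._apply(*req)
                    self._done[tid].set()

    def execute(self, tid, op, arg=None):
        """Publish (op, arg) for thread `tid` and return its result."""
        if op not in ("enq", "deq"):
            raise ValueError(f"unknown op {op!r}")
        self._done[tid].clear()
        self._records[tid] = (op, arg)              # publish the request
        while not self._done[tid].is_set():
            if self._lock.acquire(blocking=False):  # become the combiner
                try:
                    self._combine()
                finally:
                    self._lock.release()
            else:
                self._done[tid].wait(timeout=1e-4)  # another thread is combining
        return self._results[tid]
```

With a single thread the caller is always the combiner; with several threads, whichever thread wins the lock serves every pending request it finds, which is where the savings in synchronization come from.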

Smart Queue, SkipList, PairHeap, Stack
• Here, learning is applied to auto-tune a performance-critical knob called the scancount
• Scancount: the number of scans the combiner makes over the request records; tuning it dynamically tunes the time spent combining
• The knob is tuned with reinforcement learning (of a discrete variable)
[Figure: a Smart Queue built from thread request records, a lock, and a serial queue, with interface enqueue, dequeue, and peek, used by threads t1 … tn; reinforcement learning tunes the scancount knob.]

Why Does the Scancount Matter?
• The scancount controls how long a thread spends as the combiner
• Increasing the scancount lets the combiner perform more data structure ops within a single lock acquisition
• But increasing the scancount also increases the latency of the combiner’s own op
• Raising the scancount helps up to a point; beyond that, the added latency hurts performance
• Smart Data Structures use online learning to find the best scancount at any given time

SDS Implementation
• Goal: minimize disruption to the application
• Reward signal: either internal lightweight statistics (e.g., throughput in ops/s) or an external, application-specific reward signal
• There is one learning thread by default; it runs the learning engines for all SDSs
[Figure: application threads t1 … tn use a Smart Data Structure (e.g., Smart Queue: interface add, remove, peek; algorithm; storage); internal stats or an external application performance monitor (e.g., Heartbeats) supply the reward to the online learning running in the learning thread.]
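A minimal Python sketch of the internal-statistics path, assuming the reward is throughput computed from a shared op counter: application threads bump the counter, and a single helper thread periodically converts it to ops/s and hands that reward to every registered learning engine. `OpCounter`, `LearningThread`, `period_s`, and the `learners` callback list are hypothetical names, not the SDS library's API.

```python
import threading
import time


class OpCounter:
    """Lightweight internal statistic: number of completed data-structure ops."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def record(self, n=1):
        with self._lock:
            self._count += n

    def read(self):
        with self._lock:
            return self._count


class LearningThread(threading.Thread):
    """Single helper thread that feeds a throughput reward to all learners."""

    def __init__(self, counter, period_s=0.01):
        super().__init__(daemon=True)
        self.counter = counter
        self.period_s = period_s
        self.learners = []                    # callables that accept a reward (ops/s)
        self._stop_event = threading.Event()  # not `_stop`: Thread uses that name

    def run(self):
        last_count, last_t = self.counter.read(), time.monotonic()
        # wait() returns False on timeout, so this loops once per period until stopped.
        while not self._stop_event.wait(self.period_s):
            count, now = self.counter.read(), time.monotonic()
            reward = (count - last_count) / (now - last_t)   # throughput in ops/s
            last_count, last_t = count, now
            for learn in self.learners:
                learn(reward)

    def stop(self):
        self._stop_event.set()
        self.join()
```

Keeping the learning work on its own thread means application threads only pay for an increment, which matches the slide's goal of minimizing disruption.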

SDS Implementation
• Machine learning co-optimization framework
• Supports joint optimization of multiple knobs
• Supports discrete, Gaussian, boolean, and permutation knobs
• Designed explicitly to support systems other than SDS
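One way to picture the four knob types is as four small stochastic policies, each with parameters a learner could adjust: a soft-max over values for a discrete knob, a mean and standard deviation for a Gaussian knob, a sigmoid probability for a boolean knob, and (our own assumption) a Plackett-Luce model, i.e., repeated soft-max picks without replacement, for a permutation knob. Sampling each knob independently is one simple way to form a joint setting. All class names and every knob name except `scancount` are hypothetical; this is not the framework's actual parameterization.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DiscreteKnob:
    """Soft-max distribution over a fixed list of values."""

    def __init__(self, values):
        self.values = list(values)
        self.logits = [0.0] * len(self.values)   # learnable parameters

    def sample(self, rng):
        return rng.choices(self.values, weights=softmax(self.logits))[0]


class GaussianKnob:
    """Continuous knob drawn from N(mean, std**2)."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std          # learnable parameters

    def sample(self, rng):
        return rng.gauss(self.mean, self.std)


class BooleanKnob:
    """On/off knob; P(True) = sigmoid(logit)."""

    def __init__(self):
        self.logit = 0.0                         # learnable parameter

    def sample(self, rng):
        return rng.random() < 1.0 / (1.0 + math.exp(-self.logit))


class PermutationKnob:
    """Ordering of items via Plackett-Luce: soft-max picks without replacement."""

    def __init__(self, items):
        self.items = list(items)
        self.scores = [0.0] * len(self.items)    # learnable parameters

    def sample(self, rng):
        remaining = list(range(len(self.items)))
        order = []
        while remaining:
            probs = softmax([self.scores[i] for i in remaining])
            pick = rng.choices(remaining, weights=probs)[0]
            remaining.remove(pick)
            order.append(self.items[pick])
        return order


def sample_joint(knobs, rng):
    """Draw one joint setting by sampling every knob independently."""
    return {name: knob.sample(rng) for name, knob in knobs.items()}


# Example (knob names other than "scancount" are hypothetical).
rng = random.Random(0)
knobs = {
    "scancount": DiscreteKnob([1, 2, 4, 8]),
    "continuous_param": GaussianKnob(10.0, 1.0),
    "use_feature": BooleanKnob(),
    "thread_order": PermutationKnob(["t0", "t1", "t2"]),
}
setting = sample_joint(knobs, rng)
```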

Major SDS Research Challenges
• Quality challenges:
  • How do you find knob settings with the best long-term effects?
  • How do you measure whether a knob setting is helping?
• Timeliness challenges:
  • How do you optimize quickly enough not to miss opportunities?
  • How do you manage a potentially intractable search space?

Addressing Other Quality Challenges
• How do you find settings with the best long-term effects?
  • Leverage a machine learning technology designed for planning
  • Use online reinforcement learning (RL) to adapt to workload or phase changes
• How do you measure whether a knob setting is helping?
  • Extensible reward signal interface for performance monitoring
  • Heartbeats framework for application-specific performance evaluation
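The extensible reward interface can be pictured as any object that reports a rate. Below is a small Python stand-in for an application-specific monitor: the application calls `beat()` after each unit of work, and `rate()` reports beats per second over a sliding window. `RewardSignal`, `HeartbeatMonitor`, `window`, and the injectable `clock` are hypothetical; this is not the Heartbeats framework's API.

```python
import time
from collections import deque


class RewardSignal:
    """Hypothetical reward interface: a learner only needs rate()."""

    def rate(self):
        raise NotImplementedError


class HeartbeatMonitor(RewardSignal):
    """Application-specific reward: beats per second over a sliding window."""

    def __init__(self, window=20, clock=time.monotonic):
        self._times = deque(maxlen=window)   # timestamps of the most recent beats
        self._clock = clock

    def beat(self):
        """Call once per completed unit of application work."""
        self._times.append(self._clock())

    def rate(self):
        if len(self._times) < 2:
            return 0.0
        span = self._times[-1] - self._times[0]
        # n timestamps bound n - 1 intervals.
        return (len(self._times) - 1) / span if span > 0 else 0.0
```

The injectable clock keeps the monitor testable; with a window of 3 and beats at 1.0 s, 1.5 s, and 2.0 s, the reported rate is 2 beats per second.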

Addressing Timeliness Challenges
• How do you optimize fast enough not to miss opportunities?
  • Choose a fast, gradient-based machine learning algorithm
  • Use a learning helper thread to decouple learning from the application threads
• How do you manage a potentially intractable search space?
  • Relax the potentially exponential discrete action space into a continuous one
  • Use a stochastic soft-max policy, which enables gradient-based learning
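The relaxation can be illustrated in a few lines of Python: a discrete knob with K settings is represented by K continuous logits, the soft-max turns them into a sampling distribution, and the gradient of the log-probability of the chosen setting with respect to the logits is one-hot(action) minus the probabilities. The counts at the end assume, hypothetically, ten knobs with eight settings each and a policy factored into one soft-max per knob; that factoring is our assumption about how the exponential joint space stays tractable.

```python
import math


def softmax(logits):
    """Map continuous logits to a probability distribution over discrete settings."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, action):
    """Gradient of log softmax(logits)[action] w.r.t. the logits: onehot - probs."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


# Size of the search space vs. number of parameters (hypothetical sizes).
NUM_KNOBS, VALUES_PER_KNOB = 10, 8
joint_settings = VALUES_PER_KNOB ** NUM_KNOBS   # 8**10 = 1,073,741,824 joint settings
factored_params = NUM_KNOBS * VALUES_PER_KNOB   # 80 logits with one soft-max per knob
```

Because every setting keeps a nonzero probability under the soft-max, the policy keeps exploring while its gradient pushes probability toward better settings.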

Reinforcement Learning Algorithm
• Goal: optimize the rate of reward (e.g., heart rate)
• Method: policy gradients algorithm
  • Online, model-free, and handles exponentially large knob spaces
• Learn a stochastic policy that gives a probability distribution over the settings of each knob
• Sample a setting for each knob from the policy, try the settings empirically, and observe the performance feedback signal
• Improve the policy using a method analogous to gradient ascent
  • i.e., estimate the gradient of the reward with respect to the policy parameters and step the policy in the gradient direction to maximize reward
• A stochastic soft-max policy balances exploration vs. exploitation and makes the policy differentiable
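The loop described above (sample, try, observe, step along the estimated gradient) can be sketched as a REINFORCE-style policy-gradient learner for a single discrete knob. This is a generic textbook variant with a running-average reward baseline, not necessarily the exact algorithm used in SDS; the candidate `SCANCOUNTS` and the `TOY_REWARD` values are invented stand-ins for measured throughput.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient(reward_fn, num_actions, steps=2000, lr=0.1,
                    baseline_decay=0.05, seed=0):
    """REINFORCE with a running-average baseline for one discrete knob."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(num_actions), weights=probs)[0]  # sample a setting
        r = reward_fn(a)                                       # try it, observe reward
        advantage = r - baseline
        for i in range(num_actions):
            # d log pi(a) / d logit_i = 1[i == a] - probs[i]
            logits[i] += lr * advantage * ((1.0 if i == a else 0.0) - probs[i])
        baseline += baseline_decay * (r - baseline)            # track average reward
    return softmax(logits)


# Hypothetical knob: candidate scancounts with invented, noiseless rewards.
SCANCOUNTS = [1, 2, 4, 8, 16]
TOY_REWARD = [1.0, 2.0, 4.0, 2.5, 1.0]

probs = policy_gradient(lambda a: TOY_REWARD[a], len(SCANCOUNTS))
best_scancount = SCANCOUNTS[probs.index(max(probs))]
```

Subtracting the baseline does not change the expected gradient but reduces its variance, which matters once the rewards are noisy throughput measurements rather than the fixed toy values used here.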

How Does SDS Perform?
• Full sweep over SDSs and loads, compared against a Static Oracle
• Result: near-ideal performance in many cases
• Result: the quality challenge is met
[Figure: SDS Dynamic vs. Static Oracle and Static Avg; † 14 threads]

What if the Workload Changes Rapidly?
• Inject changes in the data structure “load” (i.e., the post-computation performed between ops)
• Sweep over SDSs, random load schedules, and load-change frequencies
• Result: good benefit even when the load changes every 10 μs
• Result: the quality and timeliness challenges are met
[Figure: SDS Dynamic vs. Dynamic Oracle and Dynamic Average; † 14 threads]

Future Directions
• Extend this work into a common framework that coordinates tuning across all system layers
  • E.g., application -> runtime -> OS -> HW
• Scalable, decentralized optimization methods

Conclusions
• Developed a framework that dynamically tunes systems and minimizes programmer burden via online machine learning
• Demonstrated the framework through a case study of self-tuning “Smart Data Structures”
• Now looking at uses in systems beyond data structures
• Reinforcement learning will play an increasingly important role in the development of future software and hardware
• Contact: jonathan dot eastep at gmail

Presentation References
[1] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks: Lock Acquisition Scheduling for Self-Aware Synchronization,” 7th IEEE International Conference on Autonomic Computing (ICAC’10), 2010. Best Student Paper Award.
[2] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks: Self-Aware Synchronization through Lock Acquisition Scheduling,” MIT CSAIL Technical Report MIT-CSAIL-TR-2009-055, November 2009.
[3] J. Eastep, D. Wingate, A. Agarwal, “Smart Data Structures: A Reinforcement Learning Approach to Multicore Data Structures,” 8th IEEE International Conference on Autonomic Computing (ICAC’11), 2011.
[4] J. Eastep, “Smart Data Structures: An Online Machine Learning Approach to Multicore Data Structures,” Doctoral Dissertation, MIT, May 2011.
