
Self-Learning, Adaptive Computer Systems

Intel Collaborative Research Institute: Computational Intelligence. Yoav Etsion, Technion CS & EE; Dan Tsafrir, Technion CS; Shie Mannor, Technion EE; Assaf Schuster, Technion CS.


Presentation Transcript


  1. Intel Collaborative Research Institute Computational Intelligence
  Self-Learning, Adaptive Computer Systems
  Yoav Etsion, Technion CS & EE; Dan Tsafrir, Technion CS; Shie Mannor, Technion EE; Assaf Schuster, Technion CS

  2. Adaptive Computer Systems
  • Complexity of computer systems keeps growing
  • We are moving towards heterogeneous hardware
  • Workloads are getting more diverse
  • Process variability affects the performance/power of different parts of the system
  • Human programmers and administrators cannot handle this complexity
  • The goal: adapt to workload and hardware variability

  3. Predicting System Behavior
  • When a human observes the workload, she can typically identify cause and effect
  • The workload carries inherent semantics
  • The problem is extracting them automatically…
  • Key issues for machine learning:
    • Huge datasets (performance counters, execution traces)
    • Extremely fast response times are needed (in most cases)
    • Rigid space constraints on ML algorithms

  4. Memory + Machine Learning: Current state of the art
  • Architectures are tuned for structured data
  • Managed using simple heuristics:
    • Spatial and temporal locality
    • Frequency and recency (ARC)
    • Block and stride prefetchers
  • Real data is not well structured
  • The programmer must transform the data
  • Unrealistic for program-agnostic management (swapping, prefetching)

  5. Memory + Machine Learning: Multiple learning opportunities
  • Identify patterns using machine learning
  • Bring data to the right place at the right time
  • The memory hierarchy forms a pyramid: caches / DRAM / PCM / SSD / HDD
  • Different levels require different learning strategies
    • Top: smaller, faster, costlier [prefetching into caches]
    • Bottom: bigger, slower, cheaper [fetching from disk]
  • Both hardware and software support are needed

  6. Research track: Predicting Latent Faults in Data Centers (Moshe Gabel, Assaf Schuster)

  7. Latent Fault Detection
  • Failures and misconfigurations happen in large datacenters
  • Do they cause performance anomalies?
  • A sound statistical framework to detect latent faults
  • Practical: non-intrusive, unsupervised, no domain knowledge required
  • Adaptive: no parameter tuning, robust to system/workload changes

  8. Latent Fault Detection
  • Applied to a real-world production service of 4.5K machines
  • Over 20% of machine/software failures were preceded by latent faults
    • Slow response times; network errors; long disk access times
  • Failures predicted 14 days in advance, with 70% precision and 2% FPR
  • Latent Fault Detection in Large Scale Services, DSN 2012

  9. Research track: Task Differentials: Dynamic, inter-thread predictions using memory access footsteps (Adi Fuchs, Yoav Etsion, Shie Mannor, Uri Weiser)

  10. Motivation
  • We are in the age of parallel computing
  • Programming paradigms are shifting towards task-level parallelism
  • Tasks are supported by libraries such as TBB and OpenMP
  • Implicit forms of task-level parallelism include GPU kernels and parallel loops
  • Task behavior tends to be highly regular, making it a target for learning and adaptation

  Taken from the PARSEC fluidanimate TBB implementation:

      ...
      GridLauncher<InitDensitiesAndForcesMTWorker> &id =
          *new (tbb::task::allocate_root())
              GridLauncher<InitDensitiesAndForcesMTWorker>(NUM_TBB_GRIDS);
      tbb::task::spawn_root_and_wait(id);
      GridLauncher<ComputeDensitiesMTWorker> &cd =
          *new (tbb::task::allocate_root())
              GridLauncher<ComputeDensitiesMTWorker>(NUM_TBB_GRIDS);
      tbb::task::spawn_root_and_wait(cd);
      ...

  11. How do things currently work?
  • The programmer codes a parallel loop
  • Software maps multiple tasks to one thread
  • Hardware sees a single sequence of instructions
  • Hardware prefetchers try to identify patterns between consecutive memory accesses
  • There is no notion of program semantics, i.e. that execution consists of a sequence of tasks, not instructions
  • [Diagram: tasks A, B, C, D, E interleaved onto one instruction stream]

  12. Task Address Set
  • Given the memory trace of task instance A, the task address set TA is the set of unique addresses, ordered by access time:

    Trace: START TASK INSTANCE(A)
           R 0x7f27bd6df8
           R 0x61e630
           R 0x6949cc
           R 0x7f77b02010
           R 0x6949cc
           R 0x61e6d0
           R 0x61e6e0
           W 0x7f77b02010
           STOP TASK INSTANCE(A)

    TA: 0x7f27bd6df8, 0x61e630, 0x6949cc, 0x7f77b02010, 0x61e6d0, 0x61e6e0

  13. Address Differentials
  • Motivation: task-instance address sets are, on their own, usually meaningless:

    TA: 7F27BD6DF8  61E630   6949CC  7F77B02010  61E6D0  61E6E0
    TB: 7F27BD6DF8  DBFA10   6A1D0C  7F7835F23A  61E898  61DFD0
    TC: 7F27BD6DF8  1560DF0  6AF04C  7F78BBC464  61EA60  61D8C0

    The same element-wise difference maps each instance to the next:
    TB = TA + (0, 8000480, 54080, 8770090, 456, -1808)
    TC = TB + (0, 8000480, 54080, 8770090, 456, -1808)

  • Differences tend to be compact and regular, and can thus represent state transitions

  14. Address Differentials
  • Given instances A and B with address sets TA and TB of equal length n, the differential vector is defined element-wise: D(A,B)[i] = TB[i] - TA[i], for i = 1..n
  • Example:

    TA: 10000  60000  8000000  7F00000  FE000
    TB: 10020  60060  8000008  7F00040  FE060

    D(A,B) = (32, 96, 8, 64, 96)

  15. Differentials Behavior: Mathematical intuition
  • Differentials are beneficial when redundancy is high
  • Per-application distribution functions provide intuition about vector repetitions
  • A non-uniform CDF implies highly regular patterns
  • A uniform CDF implies noisy patterns whose differential behavior cannot be exploited
  • [Plot: example non-uniform vs. uniform CDFs]

  16. Differentials Behavior: Mathematical intuition
  • Given N distinct vectors, a straightforward dictionary needs R = log2(N) bits per entry
  • The entropy H = -SUM_i p_i * log2(p_i) of the vector distribution is the theoretical lower bound on the representation, and H <= R
  • Example: assuming 1000 vector instances drawn from 4 possible values, R = log2(4) = 2 bits
  • The Differential Entropy Compression Ratio (DECR), which compares H against R, is used as the repetition criterion

  17. Possible differential application: cache-line prefetching
  • First attempt: a prefix-based predictor: given a differential prefix, predict the suffix
  • Example: A and B have finished running, so D(A,B) = (0, 8000480, 54080, 8770090, 456, -1808) is stored
  • Now C is running; once its accesses match the stored prefix (0, 8000480), the remaining differentials (54080, 8770090, 456, -1808), and thus C's remaining addresses 6AF04C, 7F78BBC464, 61EA60, 61D8C0, are predicted

  18. Possible differential application: cache-line prefetching
  • Second attempt: a PHT (pattern history table) predictor: based on the last X differentials, predict the next differential
  • Example stream of whole-vector differentials, repeating with period three:
    • (32, 96, 8, 64, 96)
    • (32, 96, 8, 64, 96)
    • (10, 16, 0, 16, 32)
    • (32, 96, 8, 64, 96)
    • (32, 96, 8, 64, 96)
    • (10, 16, 0, 16, 32)
    • (32, 96, 8, 64, 96)
    • (32, 96, 8, 64, 96)

  19. Possible differential application: cache-line prefetching
  • Prefix policy: the differential DB is a prefix tree; prediction is performed once the observed differential prefix is unique
  • PHT policy: the differential DB holds the history table; prediction is performed at task start, based on the history pattern

  20. Possible differential application: cache-line prefetching
  • The predictors are compared against two models: Base (no prefetching) and Ideal (a theoretical predictor that accurately predicts every repeating differential)

  21. Future work
  • Hybrid policies: which policy to use when? (PHT is better for complete vector repetitions; prefix is better for partial vector repetitions, i.e. suffixes)
  • A regular-expression-based policy for pattern matching, beyond the "ideal" model
  • Predicting other functional features using differentials (e.g. branch prediction, PTE prefetching)

  22. Conclusions (so far…)
  • When we look at the data, patterns emerge…
  • There is considerable headroom for optimizing computer systems
  • Existing predictions are based on heuristics:
    • A machine that does not respond within 1s is considered dead
    • Memory prefetchers look only for block and strided accesses
  • Goal: use ML, not heuristics, to uncover behavioral semantics
