Intel Collaborative Research Institute Computational Intelligence. Self-Learning, Adaptive Computer Systems. Yoav Etsion, Technion CS & EE; Dan Tsafrir, Technion CS; Shie Mannor, Technion EE; Assaf Schuster, Technion CS.
Adaptive Computer Systems • The complexity of computer systems keeps growing • We are moving towards heterogeneous hardware • Workloads are getting more diverse • Process variability affects the performance/power of different parts of the system • Human programmers and administrators cannot handle this complexity • The goal: adapt to workload and hardware variability
Predicting System Behavior • When a human observes the workload, she can typically identify cause and effect • The workload carries inherent semantics • The problem is extracting them automatically • Key issues with machine learning: • Huge datasets (performance counters; execution traces) • Need for extremely fast response times (in most cases) • Rigid space constraints for the ML algorithms
Memory + Machine Learning: Current state of the art • Architectures are tuned for structured data • Managed using simple heuristics • Spatial and temporal locality • Frequency and recency (e.g., ARC) • Block and stride prefetchers • Real data is not well structured • The programmer must transform the data • Unrealistic for program-agnostic management (swapping, prefetching)
Memory + Machine Learning: Multiple learning opportunities • Identify patterns using machine learning • Bring data to the right place at the right time • The memory hierarchy forms a pyramid • Caches / DRAM, PCM / SSD, HDD • Different levels require different learning strategies • Top: smaller, faster, costlier [prefetching to caches] • Bottom: bigger, slower, cheaper [fetching from disk] • Need both hardware and software support
Research track: Predicting Latent Faults in Data Centers. Moshe Gabel, Assaf Schuster
Latent Fault Detection • Failures and misconfigurations happen in large datacenters • They cause performance anomalies • A sound statistical framework to detect latent faults • Practical: non-intrusive, unsupervised, no domain knowledge required • Adaptive: no parameter tuning, robust to system/workload changes
Latent Fault Detection • Applied to a real-world production service of 4.5K machines • Over 20% of machine/software failures were preceded by latent faults • Slow response times; network errors; long disk access times • Predicts failures 14 days in advance, with 70% precision and a 2% false-positive rate • Latent Fault Detection in Large Scale Services, DSN 2012
Research track: Task Differentials: Dynamic, inter-thread predictions using memory access footsteps. Adi Fuchs, Yoav Etsion, Shie Mannor, Uri Weiser
Motivation • We are in the age of parallel computing • Programming paradigms are shifting towards task-level parallelism • Tasks are supported by libraries such as TBB and OpenMP • Implicit forms of task-level parallelism include GPU kernels and parallel loops • Task behavior tends to be highly regular, making it a target for learning and adaptation

...
GridLauncher<InitDensitiesAndForcesMTWorker> &id =
    *new (tbb::task::allocate_root())
        GridLauncher<InitDensitiesAndForcesMTWorker>(NUM_TBB_GRIDS);
tbb::task::spawn_root_and_wait(id);
GridLauncher<ComputeDensitiesMTWorker> &cd =
    *new (tbb::task::allocate_root())
        GridLauncher<ComputeDensitiesMTWorker>(NUM_TBB_GRIDS);
tbb::task::spawn_root_and_wait(cd);
...

Taken from: PARSEC fluidanimate, TBB implementation
How do things currently work? • The programmer codes a parallel loop • Software maps multiple tasks to one thread • Hardware sees only a sequence of instructions • Hardware prefetchers try to identify patterns between consecutive memory accesses • There is no notion of program semantics, i.e. that the execution consists of a sequence of tasks, not of instructions
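The per-access heuristics mentioned above can be illustrated with a minimal sketch (not any specific hardware design): a classic stride prefetcher that watches consecutive addresses and, once the same stride repeats, prefetches one access ahead. It has no notion of task boundaries, which is exactly the limitation the slide points at.

```python
class StridePrefetcher:
    """Toy stride prefetcher: per-access heuristic, no task semantics."""

    def __init__(self):
        self.last_addr = None
        self.last_stride = None

    def access(self, addr):
        """Return a predicted next address, or None if no stable stride."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.last_stride and stride != 0:
                prediction = addr + stride  # stride confirmed: prefetch ahead
            self.last_stride = stride
        self.last_addr = addr
        return prediction

pf = StridePrefetcher()
preds = [pf.access(a) for a in [100, 164, 228, 292]]
# strides are 64, 64, 64; predictions start once the stride repeats:
# preds == [None, None, 292, 356]
```

Any irregular, task-dependent pattern defeats this heuristic, which motivates learning at the task level instead.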
Task Address Set • Given the memory trace of task instance A, the task address set TA is the unique set of addresses it touches, ordered by first access time (repeated addresses appear once):

Trace:
START TASK INSTANCE(A)
R 0x7f27bd6df8
R 0x61e630
R 0x6949cc
R 0x7f77b02010
R 0x6949cc
R 0x61e6d0
R 0x61e6e0
W 0x7f77b02010
STOP TASK INSTANCE(A)

TA: 0x7f27bd6df8, 0x61e630, 0x6949cc, 0x7f77b02010, 0x61e6d0, 0x61e6e0
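Extracting a task address set from a trace can be sketched as follows; the trace encoding (a list of read/write tuples) is illustrative, not an actual tool interface.

```python
def task_address_set(trace):
    """Unique addresses of one task instance, ordered by first access."""
    seen = set()
    ordered = []
    for op, addr in trace:          # op is 'R' or 'W'; repeats are dropped
        if addr not in seen:
            seen.add(addr)
            ordered.append(addr)
    return ordered

# The trace of task instance A from the slide:
trace_A = [
    ('R', 0x7f27bd6df8), ('R', 0x61e630), ('R', 0x6949cc),
    ('R', 0x7f77b02010), ('R', 0x6949cc), ('R', 0x61e6d0),
    ('R', 0x61e6e0), ('W', 0x7f77b02010),
]
T_A = task_address_set(trace_A)
# 8 accesses collapse into the 6-entry address set shown on the slide
```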
Address Differentials • Motivation: task instance address sets are, by themselves, usually meaningless:

TA: 7F27BD6DF8 61E630 6949CC 7F77B02010 61E6D0 61E6E0
TB: 7F27BD6DF8 DBFA10 6A1D0C 7F7835F23A 61E898 61DFD0
TC: 7F27BD6DF8 1560DF0 6AF04C 7F78BBC464 61EA60 61D8C0

• Element-wise, however, TB − TA = TC − TB = (+0, +8000480, +54080, +8770090, +456, −1808) • Such differences tend to be compact and regular, and can therefore represent state transitions
Address Differentials • Given instances A and B, the differential vector is defined element-wise over their address sets: Δ(A,B)[i] = TB[i] − TA[i] • Example (addresses in hex, differences in decimal):

TA: 10000 60000 8000000 7F00000 FE000
TB: 10020 60060 8000008 7F00040 FE060
Δ(A,B): 32, 96, 8, 64, 96
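The element-wise definition above is a one-liner; this sketch reproduces the slide's example.

```python
def differential(T_a, T_b):
    """Differential vector between two task instances' address sets."""
    assert len(T_a) == len(T_b), "address sets must align element-wise"
    return [b - a for a, b in zip(T_a, T_b)]

T_A = [0x10000, 0x60000, 0x8000000, 0x7F00000, 0xFE000]
T_B = [0x10020, 0x60060, 0x8000008, 0x7F00040, 0xFE060]
d = differential(T_A, T_B)
# reproduces the slide's example: [32, 96, 8, 64, 96]
```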
Differentials Behavior: Mathematical intuition • Using differentials is beneficial in cases of high redundancy • An application's distribution function provides intuition about vector repetitions • A non-uniform CDF implies highly regular patterns • A uniform CDF implies noisy patterns, whose differential behavior cannot be exploited [figures: non-uniform vs. uniform CDFs]
Differentials Behavior: Mathematical intuition • Given N distinct vectors, a straightforward dictionary representation needs R = log2(N) bits per instance • The entropy H = −Σi pi·log2(pi) is the theoretical lower bound on representation size, based on the distribution of the vectors • Example: 1000 vector instances drawn from 4 possible values give R = log2(4) = 2 bits • The Differential Entropy Compression Ratio (DECR), relating R to H, is used as the repetition criterion
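The R-vs-H intuition can be checked numerically. The slide does not give the exact DECR formula, so the H/R ratio below is an assumed illustration (closer to 0 means more repetition); the skewed distribution is likewise invented for the example.

```python
import math
from collections import Counter

def naive_bits(vectors):
    """R: bits per instance for a flat dictionary over distinct vectors."""
    return math.log2(len(set(vectors)))

def entropy_bits(vectors):
    """H: Shannon entropy of the empirical vector distribution."""
    n = len(vectors)
    return -sum((c / n) * math.log2(c / n) for c in Counter(vectors).values())

# 1000 instances over 4 possible differential vectors, heavily skewed
vectors = [(0, 64)] * 910 + [(0, 32)] * 30 + [(8, 8)] * 30 + [(4, 4)] * 30
R = naive_bits(vectors)      # log2(4) = 2 bits, as in the slide's example
H = entropy_bits(vectors)    # well below 2 bits for a skewed distribution
decr = H / R                 # assumed form of the repetition criterion
```

A uniform distribution over the 4 vectors would give H = R = 2 bits, i.e. no exploitable repetition, matching the CDF intuition on the previous slide.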
Possible differential application: cache-line prefetching • First attempt: a prefix-based predictor; given a differential prefix, predict the suffix • Example: A and B have finished running, so Δ(A,B) = (0, 8000480, 54080, 8770090, 456, −1808) is stored • Now C is running:

TB: 7F27BD6DF8 DBFA10 6A1D0C 7F7835F23A 61E898 61DFD0
TC: 7F27BD6DF8 1560DF0 ... (first deltas relative to TB match the prefix 0, 8000480)

• The remaining deltas (54080? 8770090? 456? −1808?) are applied to TB to predict C's next addresses: 6AF04C? 7F78BBC464? 61EA60? 61D8C0?
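The prefix policy can be sketched as follows. A linear scan over stored vectors stands in for the prefix tree; class and method names are illustrative, not the paper's implementation.

```python
class PrefixPredictor:
    """Store observed differential vectors; given a prefix, predict the
    suffix once exactly one stored vector matches (i.e. the prefix is
    unique)."""

    def __init__(self):
        self.vectors = []                # stand-in for a prefix tree

    def record(self, diff):
        self.vectors.append(tuple(diff))

    def predict_suffix(self, prefix):
        prefix = tuple(prefix)
        matches = [v for v in self.vectors if v[:len(prefix)] == prefix]
        if len(matches) == 1:
            return list(matches[0][len(prefix):])
        return None                      # ambiguous or unseen prefix

p = PrefixPredictor()
p.record([0, 8000480, 54080, 8770090, 456, -1808])   # Δ(A,B) from the slide
# C starts running; its first two deltas relative to B match the prefix:
suffix = p.predict_suffix([0, 8000480])
# predicted remaining deltas: [54080, 8770090, 456, -1808]
```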
Possible differential application: cache-line prefetching • Second attempt: a PHT (pattern history table) predictor; based on the last X differentials, predict the next differential • Example sequence of differentials:

32 96 8 64 96
32 96 8 64 96
10 16 0 16 32
32 96 8 64 96
32 96 8 64 96
10 16 0 16 32
32 96 8 64 96
32 96 8 64 96
Possible differential application: cache-line prefetching • Prefix policy: the differential DB is a prefix tree; a prediction is made once the differential prefix becomes unique • PHT policy: the differential DB holds the history table; a prediction is made at task start, based on the history pattern
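The PHT policy can be sketched with a dictionary standing in for the hardware history table; names and the history length of 2 are illustrative assumptions.

```python
class PHTPredictor:
    """Map the last `history_len` whole differential vectors to the next
    one; predict at task start from the current history pattern."""

    def __init__(self, history_len=2):
        self.history_len = history_len
        self.history = []
        self.table = {}                  # stand-in for the hardware PHT

    def observe(self, diff):
        diff = tuple(diff)
        key = tuple(self.history[-self.history_len:])
        if len(key) == self.history_len:
            self.table[key] = diff       # learn: history pattern -> next vector
        self.history.append(diff)

    def predict(self):
        key = tuple(self.history[-self.history_len:])
        return self.table.get(key)       # None until the pattern repeats

A = (32, 96, 8, 64, 96)
B = (10, 16, 0, 16, 32)
pht = PHTPredictor(history_len=2)
for v in [A, A, B, A, A, B, A, A]:       # the repeating sequence from the slide
    pht.observe(v)
nxt = pht.predict()
# the last two vectors are (A, A), which previously preceded B
```

Unlike the prefix policy, this predicts before the task touches any memory, which is why the slide pairs it with prediction "upon task start".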
Possible differential application: cache-line prefetching • The predictors were compared against two models: Base (no prefetching) and Ideal (a theoretical predictor that accurately predicts every repeating differential)
Future work • Hybrid policies: which policy to use when? (PHT is better for complete vector repetitions; prefix is better for partial vector repetitions, i.e. suffixes) • A regular-expression-based policy (for pattern matching beyond the "ideal" model) • Predicting other functional features using differentials (e.g. branch prediction, PTE prefetching)
Conclusions (so far) • When we look at the data, patterns emerge • There is quite a large headroom for optimizing computer systems • Existing predictions are based on heuristics • A machine that does not respond within 1s is considered dead • Memory prefetchers look only for block and stride accesses • Goal: use ML, not heuristics, to uncover behavioral semantics