A Machine Learning Approach for Thread Mapping on Transactional Memory Applications

Source: 18th International Conference on High Performance Computing • Authors: Castro, M.; Goes, L.F.W.; Ribeiro, C.P.; Cole, M.; Cintra, M.; Mehaut, J. • 100062512 張光瑜, 100065801 談得聖

Outline • Introduction • Software Transactional Memory (STM) • Thread Mapping • Machine Learning • The ID3 Algorithm • Result & Conclusion & Future Works

Part 1 Introduction

Transaction • A sequence of instructions that performs a single logical function. • The concept was originally used in database systems.

Software Transactional Memory (STM) • Programmers write code as transactions. • The STM library guarantees that each transaction executes atomically and in isolation, regardless of any data races that may occur.

Cache Memory • When the data in the main memory is used, it’s copied into the cache. • From the application perspective, it can be viewed as a way to share data efficiently.

Thread Mapping • For example, you can place threads that communicate often on cores that share some level of cache. • Doing so avoids the high latency of accessing main memory.

Thread Mapping (cont’d) • Many strategies exist for mapping. • However, there is no single one that provides good performance for all different applications and platforms.

The Best & Worst Strategies for each Case

The Goal • Given an application, predict which mapping strategy is the best.

Part 2 Machine Learning

Machine Learning • Static phase: • The goal is to build up a predictor. • Here, the predictor is a decision tree. • Dynamic phase: • Use the predictor to decide which mapping strategy is going to be used.

Machine Learning (cont’d)

Decision Tree

Static Phase • Three steps: • Application profiling • Data pre-processing • Learning process

Application Profiling • Features: • Category A: the interaction between the application and the STM system. • Transaction time ratio • Abort ratio • Category B: STM mechanisms • Conflict detection: eager, lazy • Resolution strategy: suicide, backoff

Application Profiling (cont’d) • Features: • Category C: the interaction between the application and the platform • Last Level Cache Miss Ratio • Target Variable T: thread mapping strategies • Linux, Compact, Scatter, Round-Robin

Different Thread Mapping Strategies
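
To make the idea of a mapping strategy concrete, here is a minimal sketch of how Compact and Scatter placements could be computed. It assumes a hypothetical machine in which core IDs are numbered consecutively within each socket and cores on the same socket share cache. The function names, the topology, and the exact placement rules are illustrative assumptions, not definitions taken from the slides. Round-Robin is omitted because it depends on cache-level details that are not given here.

```python
def compact_mapping(n_threads, n_sockets, cores_per_socket):
    """Assumed 'Compact': fill one socket before the next, so that
    neighbouring threads land on cores that share cache."""
    total = n_sockets * cores_per_socket
    return [t % total for t in range(n_threads)]


def scatter_mapping(n_threads, n_sockets, cores_per_socket):
    """Assumed 'Scatter': place consecutive threads on different sockets,
    so that neighbouring threads avoid sharing cache."""
    total = n_sockets * cores_per_socket
    mapping = []
    for t in range(n_threads):
        slot = t % total                  # wrap around if threads > cores
        socket = slot % n_sockets         # alternate between sockets
        core_in_socket = slot // n_sockets
        mapping.append(socket * cores_per_socket + core_in_socket)
    return mapping


# Hypothetical machine: 2 sockets with 4 cores each, 4 threads.
print(compact_mapping(4, 2, 4))
print(scatter_mapping(4, 2, 4))
```

On Linux, a computed core ID could then be applied to the calling thread with `os.sched_setaffinity`; that step is left out so the sketch stays platform independent.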

Data Pre-Processing • Since we are building a decision tree, the features must be categorical or discrete. • Features in categories A & C are ratios and are converted into: • Low (0.0; 0.33) • Medium (0.33; 0.66) • High (0.66; 1.0)
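
The conversion of a ratio into the three levels can be sketched as follows. The slide does not say which level a value exactly on a boundary (0.33 or 0.66) belongs to, so this sketch assumes the lower bound of each interval is inclusive. The function name `discretize` is illustrative.

```python
def discretize(ratio):
    """Map a ratio in [0, 1] to 'Low', 'Medium' or 'High'.

    Assumption: boundary values belong to the higher interval,
    i.e. 0.33 is 'Medium' and 0.66 is 'High'.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if ratio < 0.33:
        return "Low"
    if ratio < 0.66:
        return "Medium"
    return "High"


# Example: an abort ratio of 0.5 falls into the middle bucket.
print(discretize(0.5))
```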

Part 3 The ID3 Algorithm


Decision Tree

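ID3 (Iterative Dichotomiser 3) • Quinlan (1979) • Based on Shannon's (1949) information theory

ID3 (Iterative Dichotomiser 3) • Information theory: if an event has $k$ possible outcomes with probabilities $p_i$, the information (entropy) obtained once the event occurs is $I = -\sum_{i=1}^{k} p_i \log_2 p_i$, where outcomes with $p_i = 0$ contribute nothing. • Example 1: let $k = 4$, $p_1 = p_2 = p_3 = p_4 = 0.25$; then $I = -(0.25 \log_2 0.25) \times 4 = 2$. • Example 2: let $k = 4$, $p_1 = 0$, $p_2 = 0.5$, $p_3 = 0$, $p_4 = 0.5$; then $I = -(0.5 \log_2 0.5) \times 2 = 1$.

ID3 (Iterative Dichotomiser 3) • For every attribute, calculate the entropy of the data set after splitting on that attribute. • Split the set into subsets using the attribute for which this entropy is minimum (or, equivalently, information gain is maximum). • Make a decision tree node containing that attribute. • Recurse on the subsets using the remaining attributes.

Stop condition: • Every record in the subset already belongs to the same class. • No new attribute can be found to split the subset any further. • The subset has no unprocessed records left.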
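
The two examples above can be checked with a few lines of code. This is a minimal sketch; the function name `entropy` is illustrative, and outcomes with zero probability are skipped because they contribute nothing to the sum.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)), skipping p == 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Example 1: four equally likely outcomes.
example_1 = entropy([0.25, 0.25, 0.25, 0.25])
# Example 2: only two of the four outcomes can occur.
example_2 = entropy([0.0, 0.5, 0.0, 0.5])
print(example_1, example_2)
```

The more evenly the probability is spread over the outcomes, the higher the entropy, which is why Example 1 yields the larger value.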
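
The steps and stop conditions above can be put together in a compact sketch of ID3. This is a generic textbook-style version, not the authors' implementation. The tree representation (nested dicts), the function names, and the toy training data are all assumptions made for illustration; the toy rows are invented and say nothing about which mapping actually suits which application.

```python
import math
from collections import Counter


def entropy_of(labels):
    """Entropy (in bits) of the class distribution in a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy_of(subset)
    return entropy_of(labels) - remainder


def id3(rows, labels, attrs):
    if not labels:                      # stop: no records left
        return None
    if len(set(labels)) == 1:           # stop: all records in one class
        return labels[0]
    if not attrs:                       # stop: no attribute left to split on
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    node = {"attr": best, "children": {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node["children"][value] = id3(
            [rows[i] for i in idx], [labels[i] for i in idx], rest
        )
    return node


def predict(tree, row, default=None):
    """Walk the tree; return `default` for an attribute value never seen."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], default)
    return tree


# Invented toy data, for illustration only.
rows = [
    {"abort_ratio": "High", "llc_miss": "Low"},
    {"abort_ratio": "High", "llc_miss": "High"},
    {"abort_ratio": "Low", "llc_miss": "Low"},
    {"abort_ratio": "Low", "llc_miss": "High"},
]
labels = ["Compact", "Compact", "Scatter", "Scatter"]
tree = id3(rows, labels, ["abort_ratio", "llc_miss"])
print(tree)
```

In the toy data the class depends only on `abort_ratio`, so that attribute has the highest information gain and becomes the root, and both branches stop immediately because each subset contains a single class.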

Decision Tree

Cross-Validation • Leave-one-out • The accuracies on SMP-24 and SMP-16 were 86% and 72%, respectively.

Prediction • The dynamic phase. • Three steps: • The application starts running with the default thread mapping and is profiled during an initial warm-up interval. • The profiled data is then given to the predictor to choose a mapping strategy. • The application switches to the chosen mapping strategy.
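
Leave-one-out cross-validation holds out one sample at a time, trains on all the others, and checks whether the held-out sample is predicted correctly. The sketch below is generic: `train` and `predict` are placeholder callables that any learner could supply, and the majority-class learner and the labels in the demo are invented purely to make the example runnable.

```python
from collections import Counter


def leave_one_out_accuracy(rows, labels, train, predict):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = train(train_rows, train_labels)
        if predict(model, rows[i]) == labels[i]:
            correct += 1
    return correct / len(rows)


# Demo with a trivial learner that always predicts the majority class.
def train_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


demo_rows = [{}, {}, {}, {}]
demo_labels = ["A", "A", "A", "B"]
accuracy = leave_one_out_accuracy(
    demo_rows, demo_labels, train_majority, predict_majority
)
print(accuracy)
```

With small training sets, this scheme uses almost all of the data for training in every round, at the cost of training one model per sample.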
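
The three steps of the dynamic phase can be outlined as a small driver function. Everything passed in here is a hypothetical placeholder: the profiling hook, the discretization rule, the predictor, and the mapping hook stand in for components the slides do not specify, and the decision rule in the demo is invented only so that the sketch runs.

```python
def dynamic_phase(profile_warmup, discretize, predict_strategy, apply_mapping):
    """Profile during warm-up, predict a strategy, then switch to it."""
    raw = profile_warmup()                       # step 1: warm-up profile
    features = {
        name: discretize(value) if isinstance(value, float) else value
        for name, value in raw.items()           # ratios become Low/Medium/High
    }
    strategy = predict_strategy(features)        # step 2: ask the predictor
    apply_mapping(strategy)                      # step 3: change the mapping
    return strategy


# Demo with stand-in components (all values and rules are made up).
applied = []
chosen = dynamic_phase(
    profile_warmup=lambda: {"abort_ratio": 0.8, "conflict_detection": "eager"},
    discretize=lambda r: "Low" if r < 0.33 else "Medium" if r < 0.66 else "High",
    predict_strategy=lambda f: "Scatter" if f["abort_ratio"] == "High" else "Linux",
    apply_mapping=applied.append,
)
print(chosen, applied)
```

Passing the components in as callables keeps the control flow separate from how profiling and thread pinning are actually done on a given platform.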

Part 4 Result & Conclusion & Future Works

Average Speedup

Result & Conclusion • The ML-based approach improves performance by 11.35% and 18.46% compared to the worst case, and by 3.21% and 6.37% over the Linux strategy. • The ML-based approach is within 1% of the oracle performance.

Future work • Add more features to improve prediction accuracy. • Integrate the approach into an existing STM system so that it runs automatically. • Try other learning algorithms such as neural networks or support vector machines.

Thanks for your attention.
