Logistic Regression and Perceptron Prediction of Instruction Branches

Joshua Ferguson

Overview
• Motivation
• Branch prediction background
• Machine learning background
• Methodology
• Results

Motivation
• CPUs account for around 30% of server power usage while idle, and that share grows with utilization*
• Branch misprediction causes the CPU to execute unnecessary instructions
• A simple experiment on an Intel M 1.6 GHz CPU found that approximately 8% of branch instructions were mispredicted, even while idle
*Luiz André Barroso and Urs Hölzle, "The Case for Energy-Proportional Computing," IEEE, 2007.

Branch Prediction
[Figure: memory hierarchy from the workload through the L3, L2, and L1 caches to the registers, which produce the results]

Branch Prediction cont…
• If-then statements (conditional branches) throw this flow off, because the next instruction is not known until the condition is resolved
• By default, the CPU executes whichever path it predicts the branch will take
• Common techniques keep a simple buffer of recent branch outcomes
• Others use limited pattern matchers
Example history: T N T N T T (T = branch taken, N = branch not taken)
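One classic form of the "simple buffer" idea is a 2-bit saturating counter per branch. The sketch below is a stand-in chosen for illustration, not a description of the predictor in the Intel CPU tested; the names `sat_counter`, `sat_update`, and `count_mispredictions` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken.
   A single surprise outcome moves the state one step, so one odd branch
   does not immediately flip a strongly held prediction. */
typedef struct {
    unsigned char state; /* 0..3 */
} sat_counter;

bool sat_predict(const sat_counter *c) {
    return c->state >= 2;
}

void sat_update(sat_counter *c, bool taken) {
    if (taken && c->state < 3)
        c->state++;
    else if (!taken && c->state > 0)
        c->state--;
}

/* Replays a T/N history string (e.g. "TNTNTT") through one counter that
   starts in state 'init' and returns how many predictions were wrong. */
size_t count_mispredictions(const char *pattern, unsigned char init) {
    sat_counter c = { init };
    size_t misses = 0;
    for (const char *p = pattern; *p; p++) {
        bool taken = (*p == 'T');
        if (sat_predict(&c) != taken)
            misses++;
        sat_update(&c, taken);
    }
    return misses;
}
```

Replaying the example history from a weakly-taken start, `count_mispredictions("TNTNTT", 2)` returns 2: the counter never leaves the taken states, so both not-taken branches are mispredicted.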

Machine Learning
The CPU is trying to learn patterns, so why not use modern machine learning techniques? Most scale poorly, especially under the tight resource constraints of a CPU. Nonetheless, I wanted to try a few out.

Machine Learning cont…
• Logistic Regression
• Perceptron
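For reference, the textbook forms of the two models are written out below for a feature vector $\mathbf{x}$ describing one instruction and a taken/not-taken label $y$. The notation (weights $\mathbf{w}$, bias $b$, learning rate $\eta$) is assumed for illustration; the slides themselves give no formulas.

```latex
% Logistic regression: probability that the branch is taken
P(\text{taken} \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Fitted offline by minimizing cross-entropy over N samples, y_i \in \{0, 1\}
\mathcal{L}(\mathbf{w}, b) = -\sum_{i=1}^{N} \Big[ y_i \log \sigma(\mathbf{w}^\top \mathbf{x}_i + b)
  + (1 - y_i) \log\big(1 - \sigma(\mathbf{w}^\top \mathbf{x}_i + b)\big) \Big]

% Perceptron (Rosenblatt): hard threshold, labels y \in \{-1, +1\}
\hat{y} = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)

% Online update, applied only on a mistake, i.e. when y (\mathbf{w}^\top \mathbf{x} + b) \le 0
\mathbf{w} \leftarrow \mathbf{w} + \eta\, y\, \mathbf{x}, \qquad b \leftarrow b + \eta\, y
```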

Methodology
• Generate workload
• Trace CPU metrics
• Analyze and rank ML algorithms

Methodology cont… Generate Workload
• Jakart – Java-based HTTP request suite that runs scripts of HTTP requests
• Its scripts aren't very customizable, and would make the request patterns painfully obvious

Methodology cont… Generate Workload
• SPECpower – perfect solution
• Provides interesting variation in CPU workload

Methodology cont… Trace CPU Metrics
• Intel VTune
  • Only provides graphs and summary data, no trace usable for research
• Performance Profiling for Machine Learning
  • Abandoned project, only runs on Pentium 4s
• AMD CodeAnalyst
  • Only provides summary data, no trace

Methodology cont… Trace CPU Metrics
• Performance API (PAPI)
  • University of Tennessee, Knoxville
  • Library of calls that read the hardware counters in model-specific registers, which store information such as:
    • number of branch instructions encountered
    • branches mispredicted
    • L1/L2 cache misses, hits, and accesses

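Methodology cont… Trace CPU Metrics
• Unfortunately, sampling is limited to the resolution of the hardware's sleep counter
• Hundreds of branches pass between successive measurements
• Capabilities vary from one CPU to another
Code excerpt (one sampling thread per PAPI event):

    /* Main.c: struct thread_args and papi_thread() are defined elsewhere in the project */
    pthread_t BRCN;
    struct thread_args BRCN_args;
    BRCN_args.metric_type = PAPI_BR_CN;  /* conditional branch instructions */
    pthread_create(&BRCN, NULL, papi_thread, (void *)&BRCN_args);

    /* PAPI_thread.c: copy the running counts into values[], then reset the counters */
    long long values[1];
    PAPI_read_counters(values, 1);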

Methodology cont… Trace CPU Metrics
• The Journal of Instruction-Level Parallelism hosts public traces with data values and memory addresses
• Traces of integer and floating-point workloads, as well as a web server workload

Analysis
• Prepare data
  • Instruction addresses, bit-shifted so only the high-order bits remain (unsigned int)
  • Whether each instruction is a branch, call, or return (Booleans)
  • Whether it branches and, if so, the bit-shifted target address (Boolean and unsigned int)
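A minimal sketch of that encoding step is below, assuming 32-bit addresses and a shift of 16 bits so that each address fits in a uint16 (matching the buffer-size estimate on the next slide). The struct layouts, field names, and `ADDR_SHIFT` are hypothetical and not the trace file format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical raw trace record (field names are illustrative only). */
typedef struct {
    uint32_t pc;       /* instruction address */
    bool is_branch;
    bool is_call;
    bool is_return;
    bool taken;        /* meaningful only for branches */
    uint32_t target;   /* branch target address, if any */
} trace_record;

/* Encoded features: 4 Booleans + 2 uint16 values per instruction. */
typedef struct {
    uint16_t pc_hi;
    uint16_t target_hi;
    bool is_branch, is_call, is_return, taken;
} features;

/* Assumption: keep only the upper 16 bits of each 32-bit address. */
#define ADDR_SHIFT 16

features encode(const trace_record *r) {
    features f;
    f.pc_hi = (uint16_t)(r->pc >> ADDR_SHIFT);
    f.is_branch = r->is_branch;
    f.is_call = r->is_call;
    f.is_return = r->is_return;
    /* Taken flag and target are only populated for branches. */
    f.taken = r->is_branch && r->taken;
    f.target_hi = r->is_branch ? (uint16_t)(r->target >> ADDR_SHIFT) : 0;
    return f;
}
```

Dropping the low-order bits discards where inside a 64 KiB region an instruction sits, which is the "only high-level info remains" trade-off described above.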

Analysis cont…
• Train each algorithm on a subset of the data, then test its error rate on the main data file
• Logistic regression must train offline
  • Trained on 10,000 samples, tested on 40,000
• The perceptron can train online
  • Keeps a running buffer of the past 100 values
  • Requires a buffer of (4 × Boolean + 2 × uint16) × 100 = (4 × 1 + 2 × 16) × 100 = 3,600 bits ≈ 3.6k bits (450 bytes), assuming 1-bit Booleans

Analysis cont… Baselines
• Running history buffer
  • Predict the statistically likely outcome
  • If at least 25%, 50%, or 75% of the history took the branch (one variant per threshold), predict taken
• Previous outcome
  • If the last branch was taken, predict taken; otherwise predict not taken
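Both baselines fit in a few dozen lines. The sketch below is one way to write them, assuming a ring buffer capped at a hypothetical `HIST_MAX`, an "at least pct%" threshold rule, and a not-taken prediction when no history exists yet; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define HIST_MAX 128  /* assumed upper bound on the history length */

/* Running history buffer: a ring buffer of the last 'len' branch outcomes. */
typedef struct {
    bool outcomes[HIST_MAX];
    size_t len;    /* configured history length, 1..HIST_MAX */
    size_t count;  /* outcomes stored so far (at most len) */
    size_t next;   /* next write slot; once full, this is the oldest entry */
    size_t taken;  /* how many stored outcomes were "taken" */
} history;

void history_init(history *h, size_t len) {
    if (len < 1) len = 1;
    if (len > HIST_MAX) len = HIST_MAX;
    h->len = len;
    h->count = h->next = h->taken = 0;
}

void history_push(history *h, bool taken) {
    if (h->count == h->len) {
        if (h->outcomes[h->next]) h->taken--;  /* evict the oldest outcome */
    } else {
        h->count++;
    }
    h->outcomes[h->next] = taken;
    if (taken) h->taken++;
    h->next = (h->next + 1) % h->len;
}

/* Predict taken if at least pct% of the buffered outcomes were taken.
   Assumption: an empty buffer predicts not taken. */
bool history_predict(const history *h, unsigned pct) {
    if (h->count == 0) return false;
    return 100 * h->taken >= (size_t)pct * h->count;
}

/* Replay a T/N string through the threshold baseline; return the error count. */
size_t threshold_errors(const char *pattern, size_t len, unsigned pct) {
    history h;
    history_init(&h, len);
    size_t errors = 0;
    for (const char *p = pattern; *p; p++) {
        bool taken = (*p == 'T');
        if (history_predict(&h, pct) != taken) errors++;
        history_push(&h, taken);
    }
    return errors;
}

/* Previous-outcome baseline: predict whatever the last branch did.
   Assumption: before any branch is seen, predict not taken. */
size_t last_outcome_errors(const char *pattern) {
    bool last = false;
    size_t errors = 0;
    for (const char *p = pattern; *p; p++) {
        bool taken = (*p == 'T');
        if (last != taken) errors++;
        last = taken;
    }
    return errors;
}
```

On the example history T N T N T T, `last_outcome_errors("TNTNTT")` returns 5 (an alternating pattern defeats it), while `threshold_errors("TNTNTT", 4, 50)` returns 3.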

Results: Baseline
[Figure: error % vs. buffer history length for the floating-point and integer workloads]

Results: Logistic Regression
[Figure: error % vs. epsilon value for the integer and floating-point workload traces; a higher epsilon means a closer match to the training data]

Results: Perceptron
• A flat 33.9% error rate using the inventor's (Rosenblatt's) algorithm
• A disappointing result, especially for an online algorithm
• There is no real way to adjust how closely it fits the training data, so the model loses generality
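Rosenblatt's rule is short enough to show in full. The sketch below is a minimal online version, assuming bipolar (+1/−1) input features, integer weights, and a fixed learning rate of 1; `NF`, `perceptron`, and `perceptron_step` are hypothetical names, and the inputs are not the feature set used in the experiment.

```c
#include <stdbool.h>

#define NF 4  /* hypothetical number of bipolar (+1/-1) input features */

typedef struct {
    int w[NF];
    int bias;
} perceptron;

/* Weighted sum of the inputs; its sign is the prediction. */
int perceptron_output(const perceptron *p, const int x[NF]) {
    int s = p->bias;
    for (int j = 0; j < NF; j++)
        s += p->w[j] * x[j];
    return s;
}

/* Predict taken when the weighted sum is positive. */
bool perceptron_predict(const perceptron *p, const int x[NF]) {
    return perceptron_output(p, x) > 0;
}

/* One online step: predict, then apply the Rosenblatt update if the example
   is misclassified (t * sum <= 0, so a zero sum also triggers an update).
   t is +1 for taken, -1 for not taken. Returns true if the prediction made
   before the update was wrong. */
bool perceptron_step(perceptron *p, const int x[NF], int t) {
    int s = perceptron_output(p, x);
    bool predicted_taken = s > 0;
    if (t * s <= 0) {
        for (int j = 0; j < NF; j++)
            p->w[j] += t * x[j];
        p->bias += t;
    }
    return predicted_taken != (t > 0);
}
```

Fed two alternating inputs, one always taken and one always not taken, this sketch mispredicts only the first two steps; after those two updates the weights separate the inputs. The rule has no regularization or tolerance parameter to tune.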

Final Thoughts
• In the literature, solid CPU traces are commonly obtained under AIX, IBM's proprietary OS; for research in this area, it seems a necessity
• Implementing logistic regression in a language low-level enough to execute efficiently is a challenge
• SPECpower can be combined with PAPI to test higher-level workload learners, possibly running at the OS level and controlling ACPI states, rather than just branch prediction at the register level
Thanks!
