Logistic Regression and Perceptron Prediction of Instruction Branches
Joshua Ferguson
Overview
• Motivation
• Branch Prediction background
• Machine Learning background
• Methodology
• Results
Motivation
• CPUs account for around 30% of server power usage while idle, and that percentage scales up with utilization.*
• Instruction branch misprediction causes unnecessary instruction execution on the CPU.
• A simple experiment on an Intel M 1.6 GHz CPU found that approximately 8% of branch instructions were mispredicted, even while idle.
* Luiz André Barroso and Urs Hölzle, The Case for Energy-Proportional Computing, IEEE 2007
Branch Prediction
[Figure: diagram labeled Workload, L3, L2, L1, Registers, Results]
Branch Prediction cont…
• If-then statements throw this off.
• By default, the CPU executes whichever branch it predicts will be taken.
• Common techniques use a simple buffer of recent branch history.
• Others use limited pattern matchers.
[Figure: example outcome sequence T N T N T T, where T = Branch Taken and N = Branch Not-Taken]
Machine Learning
• The CPU is trying to learn patterns, so why not use modern machine learning techniques?
• Most scale poorly, especially under the resource constraints that CPUs have.
• Nonetheless, I wanted to try a few out.
Machine Learning cont…
• Logistic Regression
• Perceptron
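For reference, the two models named on this slide can be written in their standard textbook form. The notation (feature vector x, weights w, bias b, label y in {0, 1}, learning rate η) is assumed here for illustration and does not come from the original slides.

```latex
% Logistic regression: probability that the branch is taken (y = 1)
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Perceptron: hard-threshold prediction and mistake-driven update
\hat{y} =
\begin{cases}
  1 & \text{if } \mathbf{w}^\top \mathbf{x} + b > 0 \\
  0 & \text{otherwise}
\end{cases}
\qquad
\mathbf{w} \leftarrow \mathbf{w} + \eta\,(y - \hat{y})\,\mathbf{x},
\quad
b \leftarrow b + \eta\,(y - \hat{y})
```

Both models score a branch with the same linear term. Logistic regression turns that score into a probability, while the perceptron only keeps its sign.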
Methodology
• Generate workload
• Trace CPU metrics
• Analyze and rank ML algorithms
Methodology cont… Generate Workload
• Jakart – Java-based HTTP request suite. Runs scripts of HTTP requests.
• Scripts aren't very customizable, and would make patterns painfully obvious.
Methodology cont… Generate Workload
• SPECpower – Perfect solution
• Provides interesting variation in CPU workload
Methodology cont… Trace CPU metrics
• Intel – VTune
  • Only provides graphs and summary data, no trace for research
• Performance Profiling for Machine Learning
  • Abandoned project, only runs on Pentium 4s
• AMD – CodeAnalyst
  • Only provides summary data, no trace
Methodology cont… Trace CPU metrics
• Performance API (PAPI)
  • University of Tennessee, Knoxville
  • Library of calls to model-specific registers (MSRs) that store information like:
    • Number of branch instructions encountered
    • Branches mispredicted
    • L1/L2 cache miss/hit/access
Methodology cont… Trace CPU metrics
• Unfortunately, limited to the resolution of the hardware's sleep counter.
• Hundreds of branches would pass between each measurement.
• Capabilities for any specific CPU can vary.
Main.c
    /* One thread per PAPI metric; this one counts conditional branch instructions. */
    pthread_t BRCN;
    struct thread_args BRCN_args;
    *BRCN_args.metric_type = PAPI_BR_CN;
    pthread_create(&BRCN, NULL, papi_thread, (void *)&BRCN_args);
PAPI_thread.c
    /* Read the current hardware counter values. */
    PAPI_read_counters();
Methodology cont… Trace CPU metrics
• Journal of Instruction-Level Parallelism hosts public traces with data values and memory addresses.
• Traces from Int and FP operations, as well as a WebServer workload
Analysis
• Prepare data
  • Bit-shifted instruction addresses, so only high-level info remains
    • Unsigned int
  • Whether each instruction is a branch, call, or return
    • Booleans
  • If it branches, the bit-shifted target address
    • Boolean and unsigned int
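The data preparation can be sketched roughly as follows. This is a minimal illustration, not the author's code: the 32-bit raw address width, the shift amount `ADDR_SHIFT`, and the names `sample_t`, `coarse_address`, and `make_sample` are all assumptions, since the slides do not give them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shift amount: the slides do not say how many low bits were dropped. */
#define ADDR_SHIFT 16u

/* One prepared sample: two coarse addresses plus four Boolean flags. */
typedef struct {
    uint16_t pc_hi;      /* bit-shifted instruction address */
    uint16_t target_hi;  /* bit-shifted target address, 0 when the branch is not taken */
    bool is_branch;
    bool is_call;
    bool is_return;
    bool taken;          /* whether the instruction actually branched */
} sample_t;

/* Drop the low ADDR_SHIFT bits so only the high-level part of an (assumed 32-bit) address remains. */
static uint16_t coarse_address(uint32_t addr)
{
    return (uint16_t)(addr >> ADDR_SHIFT);
}

/* Build one sample from raw trace fields. */
static sample_t make_sample(uint32_t pc, uint32_t target,
                            bool is_branch, bool is_call, bool is_return, bool taken)
{
    assert(!(is_call && is_return)); /* an instruction cannot be both a call and a return */

    sample_t s = {
        .pc_hi = coarse_address(pc),
        .target_hi = 0,
        .is_branch = is_branch,
        .is_call = is_call,
        .is_return = is_return,
        .taken = taken,
    };
    if (taken)
        s.target_hi = coarse_address(target);
    return s;
}
```

With these assumptions, `make_sample(0x12345678, 0xABCD0000, true, false, false, true)` keeps only `0x1234` and `0xABCD` as the two address features.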
Analysis cont…
• Train each algorithm on a subset of the data, then test for error rate on the main data file
• Logistic Regression must train offline
  • Trained on 10,000 samples, tested on 40,000
• Perceptron can train online
  • Keeps a running buffer of the past 100 values
  • Requires a buffer size of (4*Boolean + 2*uint16)*100
  • 3.6 kbit, counting each Boolean as 1 bit (36 bits per entry)
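The perceptron's running buffer and its size estimate can be sketched as a fixed-size ring buffer. This is only an illustration: the entry layout and the names `entry_t`, `history_t`, `history_push`, `history_get`, and `packed_bits` are hypothetical, and the bit count assumes each Boolean is packed into a single bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HISTORY_LEN 100u                      /* from the slide: the past 100 values */
#define BITS_PER_ENTRY (4u * 1u + 2u * 16u)   /* 4 Booleans at 1 bit each + 2 uint16 */

/* Hypothetical entry layout: two coarse addresses and four flags. */
typedef struct {
    uint16_t pc_hi;
    uint16_t target_hi;
    bool is_branch, is_call, is_return, taken;
} entry_t;

typedef struct {
    entry_t slots[HISTORY_LEN];
    size_t next;   /* slot the next push will overwrite */
    size_t count;  /* number of valid entries, at most HISTORY_LEN */
} history_t;

static void history_init(history_t *h)
{
    assert(h != NULL);
    h->next = 0;
    h->count = 0;
}

/* Append an entry, overwriting the oldest one once the buffer is full. */
static void history_push(history_t *h, entry_t e)
{
    assert(h != NULL);
    h->slots[h->next] = e;
    h->next = (h->next + 1) % HISTORY_LEN;
    if (h->count < HISTORY_LEN)
        h->count++;
}

/* Return the entry `age` steps back, where 0 is the most recent. */
static entry_t history_get(const history_t *h, size_t age)
{
    assert(h != NULL && age < h->count);
    return h->slots[(h->next + HISTORY_LEN - 1 - age) % HISTORY_LEN];
}

/* Ideal bit-packed size of the whole buffer: 36 bits * 100 entries. */
static unsigned packed_bits(void)
{
    return BITS_PER_ENTRY * HISTORY_LEN;
}
```

Note that the C struct above is not bit-packed, so its in-memory size is larger than the 3,600-bit figure. `packed_bits()` reflects the idealized estimate that a hardware implementation would aim for.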
Analysis cont… Baselines
• Running history buffer
  • Choose the statistically likely outcome
  • If the share of taken branches in the history reaches the threshold (25%, 50%, or 75%), then branch
• Previous outcome
  • If the last branch was taken, then take; otherwise pass
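The two baselines can be sketched as below. The function names and the exact comparison (predict taken when the taken share is at least the threshold) are assumptions. The slides only give the 25%, 50%, and 75% thresholds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Baseline 1: predict taken when the share of taken outcomes in the
 * history buffer is at least threshold_pct percent. */
static bool predict_from_history(const bool *history, size_t len, unsigned threshold_pct)
{
    assert(history != NULL && len > 0 && threshold_pct <= 100);
    size_t taken = 0;
    for (size_t i = 0; i < len; i++)
        if (history[i])
            taken++;
    /* Integer form of taken / len >= threshold_pct / 100. */
    return taken * 100 >= (size_t)threshold_pct * len;
}

/* Baseline 2: repeat whatever the previous branch did. */
static bool predict_previous(bool last_taken)
{
    return last_taken;
}

/* Error rate of baseline 1 over a trace, predicting each outcome
 * from the `window` outcomes that precede it. */
static double history_error_rate(const bool *trace, size_t n, size_t window, unsigned threshold_pct)
{
    assert(trace != NULL && window > 0 && n > window);
    size_t wrong = 0;
    for (size_t i = window; i < n; i++)
        if (predict_from_history(&trace[i - window], window, threshold_pct) != trace[i])
            wrong++;
    return (double)wrong / (double)(n - window);
}
```

For the sequence T N T N T T, four of six outcomes are taken, so this sketch predicts taken at the 25% and 50% thresholds and not-taken at 75%.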
Results Baseline
[Figure: Error % versus Buffer History Length for the Floating Point and Integer workloads]
Results Logistic Regression
[Figure: Error % versus Epsilon Value for the Integer and Floating Point workload traces. A higher epsilon means a more accurate match with the training data.]
Results Perceptron
• Flat 33.9% error rate using the inventor's algorithm (Rosenblatt)
• A disappointing result, especially for an online algorithm
• No capability to really change how accurately it fits the training data, thus causing the model to lose generality
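For context, the classic Rosenblatt update rule can be sketched as an online learner like the one below. This is a textbook version under stated assumptions, not the implementation behind the 33.9% figure: the input count `N_INPUTS`, the learning rate, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_INPUTS 4  /* hypothetical input count, e.g. the four Boolean flags of a sample */

typedef struct {
    double w[N_INPUTS];
    double bias;
    double rate;  /* learning rate (assumed; not given in the slides) */
} perceptron_t;

static void perceptron_init(perceptron_t *p, double rate)
{
    assert(p != NULL && rate > 0.0);
    for (size_t i = 0; i < N_INPUTS; i++)
        p->w[i] = 0.0;
    p->bias = 0.0;
    p->rate = rate;
}

/* Predict taken when the weighted sum is strictly positive. */
static bool perceptron_predict(const perceptron_t *p, const double x[N_INPUTS])
{
    double sum = p->bias;
    for (size_t i = 0; i < N_INPUTS; i++)
        sum += p->w[i] * x[i];
    return sum > 0.0;
}

/* One online step: predict, then adjust the weights only on a mistake.
 * Returns the prediction made before the update. */
static bool perceptron_update(perceptron_t *p, const double x[N_INPUTS], bool taken)
{
    bool guess = perceptron_predict(p, x);
    double err = (double)taken - (double)guess;  /* +1, 0, or -1 */
    if (err != 0.0) {
        for (size_t i = 0; i < N_INPUTS; i++)
            p->w[i] += p->rate * err * x[i];
        p->bias += p->rate * err;
    }
    return guess;
}
```

Because the update fires only on mistakes and has no regularization or margin parameter, there is little to tune beyond the learning rate, which is consistent with the slide's remark about not being able to control how closely the model fits the training data.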
Final Thoughts
• Obtaining solid CPU traces is commonly done in the literature using AIX, IBM's proprietary OS. For research in this area, this OS seems a necessity.
• Implementing logistic regression in a language low-level enough to execute effectively is a challenge.
• SPECpower can be combined with PAPI to test higher-level workload learners, possibly existing at the OS level and controlling ACPI states, rather than just branch prediction in the register.
Thanks!