Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning
Katherine E. Coons, Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, Doug Burger, Kathryn S. McKinley
Motivation
• Programmer time is expensive
• Time-to-market is short
• Compiler is a key component for performance
• Performance depends on hard-to-tune heuristics
  • Function inlining, hyperblock formation, loop unrolling, instruction scheduling, register allocation
Machine learning can help
Machine Learning for Compilers
• Learning to schedule (NIPS ’97, PLDI ’04)
• Meta Optimization (PLDI ’03)
• Automatically tuning inlining heuristics (Supercomputing ’05)
• Predicting unroll factors (CGO ’05)
• Machine learning for iterative optimization (CGO ’06)
Focus on feature selection, learn something about the problem
Overview
[Roadmap diagram: feature selection (data mining) reduces an initial feature set to a reduced feature set via correlation analysis and lasso regression; NEAT (reinforcement learning) uses the reduced features to learn specialized and general solutions; grouping blocks via clustering and classification (unsupervised and supervised learning) yields classifier solutions.]
Compiling for TRIPS
[Figure: source code is compiled into a control flow graph of hyperblocks (HB1–HB4); each hyperblock becomes a dataflow graph of instructions (add, mul) with register inputs and outputs (R1, R2); the dataflow graph is mapped onto the execution substrate, a grid of register, data cache, execution, and control tiles.]
TRIPS Scheduling Overview
[Figure: static placement, dynamic issue. The scheduler takes a block's dataflow graph (ld, add, mul, br instructions with register inputs R1, R2) and the topology of register (R), data cache (D), execution, and control tiles, and produces a placement of each instruction onto a tile. There are 128! scheduling possibilities per block.]
Spatial Path Scheduling

Schedule(block, topology) {
  initialize known anchor points
  while (not all instructions scheduled) {
    for (each instruction i in the open list) {
      for (each available location n) {
        calculate placement cost for (i, n)
        keep track of the n with the minimum placement cost
      }
      keep track of the i with the highest minimum placement cost
    }
    schedule the i with the highest minimum placement cost
  }
}

[Figure: the "calculate placement cost for (i, n)" step is computed by a function approximator, a neural network whose input nodes are feature values, combined through weighted links (0.7, 1.6, 0.2, -0.3, 1.1, 0.4, 0.2, -0.2, 0.1, 1.0, 1.1, 0.9) and hidden nodes into a single output node, the placement cost. Legend: output node, hidden node, input node.]
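For concreteness, here is a minimal Python sketch of the greedy loop above. Everything in it is illustrative: placement_cost is a simple weighted sum standing in for the learned function approximator, and the instructions, locations, and features arguments are hypothetical stand-ins for the scheduler's real data structures.

```python
def placement_cost(instr, loc, weights, features):
    # Stand-in for the function approximator: a weighted sum of feature values.
    return sum(w * f for w, f in zip(weights, features(instr, loc)))

def schedule(instructions, locations, weights, features):
    """Greedy spatial path scheduling sketch: repeatedly place the instruction
    whose best (minimum-cost) placement is the most expensive among all
    candidates, i.e. the most constrained instruction is placed first."""
    placement = {}
    unscheduled = list(instructions)
    free = set(locations)
    while unscheduled:
        best_i, best_loc, best_min = None, None, float("-inf")
        for i in unscheduled:
            # Cheapest free location for this instruction.
            loc, cost = min(((n, placement_cost(i, n, weights, features)) for n in free),
                            key=lambda pair: pair[1])
            if cost > best_min:
                best_i, best_loc, best_min = i, loc, cost
        placement[best_i] = best_loc
        free.discard(best_loc)   # one instruction per slot, for this sketch only
        unscheduled.remove(best_i)
    return placement
```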
Overview
[Roadmap diagram, repeated before the feature selection section.]
Feature Selection
• Features important for reinforcement learning
• Implemented 64 features (illustrated in the sketch below):
  • Loop features (nesting depth)
  • Block features (fullness)
  • Instruction features (latency)
  • Tile features (row)
  • Instruction/tile features (critical path length)
• Reduced feature set size
  • Correlation
  • Lasso regression
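A small illustrative sketch of how such a feature vector might be organized for one (instruction, tile) pair. The groupings follow the slide, but every field and attribute name below (loop.depth, block.capacity, and so on) is a hypothetical stand-in, not the paper's implementation.

```python
def extract_features(loop, block, instr, tile):
    """Illustrative feature vector for one (instruction, tile) pair.
    All attribute names are hypothetical placeholders."""
    return {
        # loop features
        "nesting_depth": loop.depth,
        # block features
        "block_fullness": len(block.instructions) / block.capacity,
        # instruction features
        "latency": instr.latency,
        "is_load": float(instr.opcode == "ld"),
        # tile features
        "tile_row": tile.row,
        # instruction/tile features (depend on both the instruction and the tile)
        "critical_path_length": block.critical_path_through(instr, tile),
    }
```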
Feature Selection via the Lasso
• Goal: rank features by their effect on performance when used in the placement cost
• Feature coefficients act as performance predictors
• Dimensionality reduction: keep the subset of variables that exhibits the strongest effects
• The L1 penalty forces many lasso coefficients to zero
Lasso Input Data Generation
[Figure: for a block's dataflow graph (add, mul instructions with register inputs/outputs R1, R2) placed on the topology, feature values are computed (critical path length, latency, link utilization, tile utilization, max resource usage, local inputs, remote siblings, ...) and combined with a coefficient vector (0.7, 1.6, 0.2, -0.3, 0.4, 1.1, 0.2, 0.2, 0.1, -1.0, 0.9, 1.1, 0.6, 0.2) to give, e.g., placement cost = 1.7.]

PC(i, l) = Σ_{k=1..n} coeff_k · FV_k
where i is the instruction being placed, l is the location under consideration, n is the number of features, PC(i, l) is the placement cost for i at l, and FV_k is the k-th feature value.
Feature Prioritization
[Figure: each data point pairs one coefficient vector (coeff0, coeff1, coeff2, ...) with the speedup measured when scheduling with it (1.0, 0.7, 0.9, 0.9, 0.8, 0.6, 1.0, 1.1, ...). Fitting the lasso over many such points yields a prioritized feature list: tile number, local inputs, criticality, remote inputs, link utilization, remote siblings, loop-carried dependence, critical path length, is load, ...]
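A minimal sketch of this prioritization step using scikit-learn's Lasso. The data below is synthetic stand-in data (random coefficient vectors paired with simulated speedups); in the talk each speedup comes from actually scheduling and running the benchmarks.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Each data point: one placement-cost coefficient vector (one entry per feature)
# paired with the speedup observed when scheduling with it. Synthetic here.
n_points, n_features = 200, 64
X = rng.normal(size=(n_points, n_features))            # coefficient vectors tried
true_effect = np.zeros(n_features)
true_effect[:5] = [0.4, -0.3, 0.2, 0.1, -0.1]          # pretend only 5 features matter
y = 1.0 + X @ true_effect + rng.normal(scale=0.05, size=n_points)  # speedups

# The L1 penalty drives most coefficients to exactly zero; the surviving
# coefficients rank the features by their effect on performance.
lasso = Lasso(alpha=0.05).fit(X, y)
priority = np.argsort(-np.abs(lasso.coef_))
print("features ranked by |lasso coefficient|:", priority[:10])
```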
Feature Selection Overview
Initial features (64) → [lasso regression] → prioritized features (64) → [prune correlated features] → pruned features (52) → [prune based on lasso priority] → final feature set (11)
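The correlation-pruning step might look like the sketch below, which drops any feature highly correlated with one already kept; the 0.9 threshold is an illustrative choice, not the value used in the talk.

```python
import numpy as np

def prune_correlated(feature_matrix, names, threshold=0.9):
    """Drop features that are highly correlated with an earlier, kept feature.
    feature_matrix: rows are samples, columns are features (in priority order)."""
    corr = np.abs(np.corrcoef(feature_matrix, rowvar=False))
    kept = []
    for j in range(len(names)):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```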
Overview
[Roadmap diagram, repeated before the NEAT section.]
NEAT (NeuroEvolution of Augmenting Topologies)
• Genetic algorithm that evolves neural networks
• Modifies the topology of the network as well as its weights
• Standard crossover and mutation operators
• "Complexification" operators: add-node and add-link mutations
[Figure legend: output node, hidden node, input node; add-node mutation; add-link mutation]
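A toy sketch of the two complexification operators, not the API of any actual NEAT implementation. The genome is a list of connection dicts; add-link wires up two previously unconnected nodes, and add-node splits an existing connection into two by inserting a hidden node (the first new link gets weight 1.0, the second inherits the old weight, as in NEAT).

```python
import random

# Toy genome: nodes is a list of node ids, connections is a list of dicts
# {"src", "dst", "weight", "enabled"}.

def add_link(connections, nodes):
    """Complexification: add a new weighted connection between two nodes."""
    src, dst = random.sample(nodes, 2)
    if not any(c["src"] == src and c["dst"] == dst for c in connections):
        connections.append({"src": src, "dst": dst,
                            "weight": random.uniform(-1.0, 1.0), "enabled": True})

def add_node(connections, nodes, next_id):
    """Complexification: split an enabled connection src->dst into src->new->dst."""
    conn = random.choice([c for c in connections if c["enabled"]])
    conn["enabled"] = False
    nodes.append(next_id)
    connections.append({"src": conn["src"], "dst": next_id,
                        "weight": 1.0, "enabled": True})
    connections.append({"src": next_id, "dst": conn["dst"],
                        "weight": conn["weight"], "enabled": True})
    return next_id + 1
```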
Why NEAT?
• Popular, publicly available, well supported
  • Nine different implementations
  • Active user group of about 350
• Domain-independent
• Makes large search spaces tractable
• Complexification reduces training time
• Inherently favors parsimony
• Relatively little parameter tuning required
• Solutions are reusable
Training NEAT
[Figure: the training loop. Schedule each benchmark using each network in the population → run the program → assign each network a fitness (geomean of speedup) → evolve the population via crossover and add-link / add-node mutations. Legend: input node, hidden node, output node.]
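A schematic of that loop in Python. The schedule_and_run and evolve callbacks are hypothetical hooks: the first stands in for compiling and running a benchmark with a given network and returning its cycle-count speedup, the second for NEAT's selection, crossover, and add-link/add-node mutations.

```python
from math import prod

def geomean(xs):
    return prod(xs) ** (1.0 / len(xs))

def train(population, benchmarks, schedule_and_run, evolve, generations=100):
    """Schematic NEAT training loop: every network schedules every benchmark,
    its fitness is the geomean of the resulting speedups, and the population
    is then evolved for the next generation."""
    for _ in range(generations):
        fitnesses = []
        for net in population:
            speedups = [schedule_and_run(net, bench) for bench in benchmarks]
            fitnesses.append(geomean(speedups))
        population = evolve(population, fitnesses)
    return population
```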
Example Network
[Figure: an evolved network that computes the placement cost from eight input features — is load, is store, local inputs, criticality, tile utilization, remote siblings, critical path length, loop-carried dependence — through hidden nodes connected by weighted links (0.1, 0.5, 0.5, 0.8, 0.7, -1.2, -1.1, 1.7, 0.7, -4.1, 0.9). Legend: output node, hidden node, input node.]
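To show how such a network turns feature values into a single placement cost, here is a small evaluator for the toy genome format sketched earlier. It assumes an acyclic genome and a sigmoid activation on hidden and output nodes; the pictured network's exact topology and weights are not reproduced.

```python
import math

def evaluate(connections, input_values, output_id):
    """Forward pass over a toy NEAT-style genome (same connection format as the
    earlier sketch). input_values maps input node ids to feature values; the
    value of the output node is returned as the placement cost."""
    enabled = [c for c in connections if c["enabled"]]
    incoming = {}
    for c in enabled:
        incoming.setdefault(c["dst"], []).append(c)

    values = dict(input_values)            # node id -> activation

    def value(node):
        if node not in values:
            total = sum(c["weight"] * value(c["src"]) for c in incoming.get(node, []))
            values[node] = 1.0 / (1.0 + math.exp(-total))   # sigmoid
        return values[node]

    return value(output_id)
```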
Overview
[Roadmap diagram, repeated before the block-grouping section.]
Grouping Blocks
• Different blocks may require different placement features/heuristics
  • 12% speedup with specialized heuristics
  • Less than 1% speedup with general heuristics
• Choose a heuristic based on block characteristics (see the sketch below)
  • Cluster blocks that perform well with the same networks
  • Classify blocks based on their characteristics
  • Learn different solutions for different groups
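A sketch of that two-step grouping under illustrative assumptions and with synthetic data: blocks are clustered by how well each trained network performs on them, then a decision tree learns to predict a block's cluster from static block characteristics, so unseen blocks can be routed to a specialized heuristic at compile time.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n_blocks, n_networks, n_block_feats = 300, 20, 6

# Synthetic stand-ins: per-block speedup under each candidate network, and
# static block characteristics the compiler can compute without running anything.
speedup_per_network = rng.normal(1.0, 0.1, size=(n_blocks, n_networks))
block_characteristics = rng.normal(size=(n_blocks, n_block_feats))

# 1. Cluster blocks that perform well with the same networks.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(speedup_per_network)

# 2. Classify blocks into those groups from characteristics alone, so each new
#    block can be assigned one of the specialized heuristics.
classifier = DecisionTreeClassifier(max_depth=4).fit(block_characteristics, groups)
print(classifier.predict(block_characteristics[:5]))
```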
Overview
[Roadmap diagram, repeated before the experimental results.]
Experimental Setup
• All tests performed on the TRIPS prototype system
• Fitness: geomean of speedup in cycles
• 64 features before feature selection, 11 after
• Population size = 264 networks
• 100 generations per NEAT run
• Compared with a simulated annealing scheduler
• 47 small benchmarks
  • SPEC2000 kernels
  • EEMBC benchmarks
  • Signal processing kernels from the GMTI radar suite
  • Vector add, fast Fourier transform
Feature Selection Results
[Chart: training across four benchmarks with the initial feature set vs. the lasso-selected features.]
Simulated Annealing vs. NEAT
[Chart: speedup over the programmer-designed heuristic (geomean of speedup) for 47 specialized solutions; annotations mark a 2% improvement and an 8% improvement.]
General Solutions and Classification
• General solution across all 47 benchmarks
  • Geomean of speedup = 1.00 after 100 generations
  • Required approximately one month
• Classification
  • Three classes, trained two
  • Geomean of speedup = 1.03 after 4 generations
  • Required approximately two days
  • New benchmarks see little speedup
Conclusions
• Feature selection is important
  • Incorporate performance metrics
• NEAT useful for optimizing compiler heuristics
  • Well supported, little parameter tuning
  • Very useful for specialized solutions
  • More work needed to find good general solutions
• Open questions
  • What can learned heuristics teach us?
  • Can we simultaneously learn different heuristics?
  • How can we learn better general heuristics?