Learning A Better Compiler Predicting Unroll Factors using Supervised Classification And Integrating CPU and L2 Cache Voltage Scaling using Machine Learning
Predicting Unroll Factors • Loop unrolling is sensitive to the unroll factor • Current solution: expert design • Difficult: hand-tuned heuristics • Must be rewritten frequently • Instead, predict parameters with machine learning • Easy: data collection takes ~1 week • No human time required • Learning algorithm does not change when the compiler changes
Loop Unrolling • Combines multiple loop iterations into one loop body • Fewer iterations → less branching • Enables other transformations: • Exposes adjacent memory locations • Allows instruction reordering across iterations
Unroll Factors • How many iterations to combine? • Too few? • Provides little benefit • Too many? • Increased cache pressure • Increased live ranges → register pressure
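The transformation on the two slides above can be sketched in Python rather than at the compiler's IR level. This is only an illustration of the idea: a factor-4 unrolled loop does four additions per loop test, with a cleanup loop for leftover iterations.

```python
def sum_rolled(xs):
    """Baseline: one addition and one loop test per element."""
    total = 0
    for x in xs:
        total += x
    return total

def sum_unrolled_4(xs):
    """Unrolled by a factor of 4: four additions per loop test.
    A cleanup loop handles iterations left over when len(xs) is
    not a multiple of 4."""
    total = 0
    n = len(xs)
    i = 0
    while i + 4 <= n:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    while i < n:  # remainder loop
        total += xs[i]
        i += 1
    return total

data = list(range(10))
assert sum_rolled(data) == sum_unrolled_4(data) == 45
```

Both versions compute the same sum; the unrolled one simply amortizes the branch over four body copies, at the cost of a larger body, which is exactly the trade-off the unroll factor controls.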
Classification Problems • Input: a vector of features • E.g. nest depth, # of branches, # of ops • Output: a class • E.g. unroll factor, 1–8 • No prior knowledge required • Meaning of features/classes • Relevance of features • Relationships between features
Nearest Neighbors • Paper describes Kernel Density Estimator • All dimensions normalized to [0,1] • Given a test point p: • Consider training points “close” to p • Within fixed distance, e.g. 0.3 • Majority vote among qualifying training points
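The fixed-distance majority vote described above can be sketched in a few lines. This is a simplified stand-in for the paper's kernel density estimator: it assumes features already normalized to [0,1] and uses plain Euclidean distance with a hard radius cutoff.

```python
from collections import Counter
import math

def fixed_radius_classify(train, p, radius=0.3):
    """train: list of (feature_vector, label) pairs, features in [0,1].
    Majority vote among training points within `radius` of test point p."""
    votes = Counter()
    for x, label in train:
        if math.dist(x, p) <= radius:  # Euclidean distance
            votes[label] += 1
    if not votes:
        return None  # no training point is close enough
    return votes.most_common(1)[0][0]

# Hypothetical training data: feature vectors with unroll-factor labels
train = [([0.10, 0.20], 2), ([0.15, 0.25], 2), ([0.90, 0.80], 8)]
print(fixed_radius_classify(train, [0.12, 0.22]))  # → 2
```

A true kernel density estimator would weight each neighbor by a kernel function of its distance rather than counting all points inside the radius equally.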
Support Vector Machine • Assume two classes (easily generalized) • Transform data • Make classes linearly separable • Find the line that maximizes the separation margin • For a test point: • Perform the transformation • Classify based on the learned line
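The final classification step is just a sign test against the learned hyperplane. The sketch below shows only that step; the training step (finding the max-margin `w`, `b`) is omitted, and the weights here are made up for illustration.

```python
def classify(w, b, x):
    """Assign a class by which side of the learned hyperplane
    w·x + b = 0 the point x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return +1 if score >= 0 else -1

# Hypothetical learned line x0 + x1 - 1 = 0 (w and b chosen for illustration)
w, b = [1.0, 1.0], -1.0
print(classify(w, b, [0.9, 0.8]))  # → 1
print(classify(w, b, [0.1, 0.2]))  # → -1
```

For non-linearly-separable data, the "transform data" step maps points into a higher-dimensional space first; the sign test is then applied in that space.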
Some Features • # operands • # operations • # floating point ops • # memory ops • # branches • # instructions • # implicit instructions • Live range size • Critical path length • Known trip count • Loop nest level • Instruction fan-in in DAG • Language: C, Fortran • & more (38 total)
Big Idea: Easy Maintenance • Performance improvements are modest • Sometimes worse, sometimes much better • Usually little change • Requires no re-tuning when the compiler changes • Gathering data takes ~1 week, no human time • General mechanism • Can be applied to all parameters • No model of the system needed • Can be applied to new transformations where expert knowledge is unavailable
Integrated CPU and L2 Cache Voltage Scaling using Machine Learning
Dynamic Voltage Control • Monitor system • When activity is low, reduce power • Also reduces computational capacity • May need more energy if work takes longer
Multiple Clock Domains • Adjust separate components independently • Better performance/power • E.g. CPU-bound application may be able to decrease power to memory and cache without affecting performance • More complex DVM policy
Motivation • Applications go through phases • Frequency/voltages should change too • Focus on core, L2 cache • Consume large fraction of total power • Best policy may change over time • On battery: conserve power • Plugged in: maximize performance
Learning a DVM Policy • Compiler automatically instruments code • Inserts sampling code to record performance counters • Code is instrumented only to gather training data • Use machine learning to create the policy • Implement the policy in a microcontroller
ML Parameters • Features • Clock cycles per instruction • L2 accesses per instruction • Memory accesses per instruction • Select voltage to minimize: • Total energy • Energy*delay
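The "select voltage to minimize" objective can be sketched as a search over candidate settings. The energy and delay numbers below are hypothetical placeholders; a real policy would predict them from the counters listed above (CPI, L2 accesses/instruction, memory accesses/instruction).

```python
def pick_setting(settings, metric="energy_delay"):
    """settings: list of (voltage, predicted_energy, predicted_delay).
    Returns the voltage minimizing total energy or the energy*delay
    product, depending on the chosen objective."""
    if metric == "energy":
        key = lambda s: s[1]
    else:  # energy*delay product
        key = lambda s: s[1] * s[2]
    return min(settings, key=key)[0]

# Hypothetical candidates: (voltage in V, energy in J, delay in s)
candidates = [(1.2, 10.0, 1.0), (1.0, 8.0, 1.4), (0.8, 7.0, 2.0)]
print(pick_setting(candidates, "energy"))        # → 0.8
print(pick_setting(candidates, "energy_delay"))  # → 1.2
```

Note how the two objectives disagree: the lowest voltage minimizes raw energy, but the slowdown it causes makes a higher voltage the better choice under energy*delay. This is the "may need more energy if work takes longer" trade-off from the earlier slide.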
Machine Learning Algorithm • Automatically learns a set of if-then rules • E.g.: If (L2PI >= 1) and (CPI <= 0) then f_cache = 1 GHz • Compact, expressive • Can be implemented in hardware
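A learned rule set of this shape is easy to express as a small decision function. The thresholds and frequencies below are illustrative, not the paper's actual learned rules; L2PI is L2 accesses per instruction and CPI is cycles per instruction, as on the previous slide.

```python
def cache_freq_policy(l2pi, cpi):
    """Hypothetical if-then rule set in the style of the learned policy.
    Returns a cache frequency in GHz."""
    if l2pi >= 1.0 and cpi <= 1.0:
        return 1.0   # cache-heavy, CPU not stalled: keep cache fast
    if l2pi < 0.1:
        return 0.5   # few L2 accesses: slow the cache to save power
    return 0.8       # default setting

print(cache_freq_policy(1.2, 0.8))   # → 1.0
print(cache_freq_policy(0.05, 2.0))  # → 0.5
```

Because each rule is just threshold comparisons on a few counters, the whole policy can be evaluated by comparators and a small lookup in hardware, which is what makes the microcontroller implementation practical.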
Results • Compared to independently managing core and L2: • Saves 22% on average, 46% max • Learns effective rules from few features • Compiler modifications instrument code • Learned policy offline • Implemented policy in microcontroller
Conclusion • Machine learning derives models from data automatically • Allows easy maintenance of heuristics • Creates models that are more effective than hand-tuned