Code Compaction for UniCore on Link-Time Optimization Platform

Code Compaction for UniCore on Link-Time Optimization Platform
Zhang Jiyu, Compilation Toolchain Group, MPRC

Compilation Process

Our Optimization Process

CLOU is a Link-time Optimizer for UniCore
[Figure: object files (code, data, meta) → linking → translation to IR → CFG construction & optimizations → layout; assembling → executable. A graph modified from Diablo.]

Code Compaction based on CLOU
• Motivation of code compaction
  • Limited memory and energy resources for embedded systems
  • Code density affects both memory and energy consumption
• Goal: reducing code size without losing performance
• Code compaction at different levels
  1. Typical optimizations for code size reduction at link time
  2. Hot/cold code splitting
  3. New mixed code generation method

Typical Optimizations for Code Size Reduction
• Redundant code elimination
  • Computations whose results have been computed previously and are guaranteed to be available at that point
• Unreachable code elimination
  • Code fragments that have no control flow path from the entry node
  • Many of them follow useless comparisons
• Dead code elimination
  • Computations whose results are never used
• Peephole optimization
• Procedural abstraction (might lead to performance loss)
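As an illustration of unreachable code elimination, the following is a minimal sketch, not CLOU's implementation. It assumes the control flow graph is a plain dictionary from block name to successor names; the block names and the helper functions `reachable_blocks` and `eliminate_unreachable` are hypothetical.

```python
from collections import deque

def reachable_blocks(cfg, entry):
    """Return the set of blocks with a control flow path from `entry`.

    `cfg` maps a block name to the list of its successor block names.
    """
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for succ in cfg.get(block, []):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen

def eliminate_unreachable(cfg, entry):
    """Drop every block that cannot be reached from `entry`."""
    keep = reachable_blocks(cfg, entry)
    return {block: succs for block, succs in cfg.items() if block in keep}

# Hypothetical CFG: the comparison ending "entry" was found to be useless,
# so the edge entry -> "else" is gone and "else" has become unreachable.
cfg = {
    "entry": ["then"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
pruned = eliminate_unreachable(cfg, "entry")
print(sorted(pruned))
```

A breadth-first walk from the entry node is enough here because unreachability only depends on the existence of a path, not on edge frequencies.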

Experiments for Typical Optimizations for Code Size Reduction
• Benchmark: Mediabench
• Code size reduction
  • Average: 12.8%
  • Max: 22.3%
• Performance improvement
  • Average: 2.4%
  • Max: 4.2%

Hot/Cold Code Splitting
[Figure: three layouts (1, 2, 3) of a code region consisting of code, a condition, hot code, cold code, and more code]
• Less code transferred from remote to local, from disk to memory, or from memory to cache
• Question: might it be too conservative or lead to performance loss?
• Hot and cold code are split through basic block reordering

Hot/Cold Code Splitting
• PH: a popular greedy approach
• Structural analysis based basic block reordering
  • Most of a program can be decomposed into several typical structures
  • Cost model for each structure
  • Minimal-cost layout → optimal layout for each local structure based on profiling information

Basic Block Reordering
• Cost model
  • Different kinds of control flow edges have different costs
  • For a specific order, [Equation: cost of the order over its control flow edges]
• A list can be obtained for each structure:
  f(structure, frequencies of all edges) → the best order of basic blocks for the local structure
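The mapping f from a structure and its edge frequencies to a best block order can be sketched as follows. This is an illustration under stated assumptions, not the cost model from the slides: the two edge costs (`FALL_THROUGH_COST`, `TAKEN_BRANCH_COST`), the block names, and the frequencies are all hypothetical, and the search is a brute-force enumeration over one small local structure rather than the O(N log N) method.

```python
from itertools import permutations

# Assumed per-edge costs (hypothetical values, not taken from the slides).
FALL_THROUGH_COST = 0   # target is laid out directly after the source
TAKEN_BRANCH_COST = 1   # any other edge needs a taken branch

def layout_cost(order, edge_freq):
    """Sum frequency * edge cost over all edges for one block order."""
    position = {block: i for i, block in enumerate(order)}
    cost = 0
    for (src, dst), freq in edge_freq.items():
        falls_through = position[dst] == position[src] + 1
        cost += freq * (FALL_THROUGH_COST if falls_through else TAKEN_BRANCH_COST)
    return cost

def best_order(entry, others, edge_freq):
    """Try every order that starts with `entry` and keep the cheapest."""
    candidates = ((entry,) + perm for perm in permutations(others))
    return min(candidates, key=lambda order: layout_cost(order, edge_freq))

# Hypothetical if-then-else structure with profiled edge frequencies.
edge_freq = {
    ("cond", "hot"): 90,
    ("cond", "cold"): 10,
    ("hot", "join"): 90,
    ("cold", "join"): 10,
}
order = best_order("cond", ["hot", "cold", "join"], edge_freq)
print(order, layout_cost(order, edge_freq))
```

With these numbers the cheapest order keeps the hot path as straight-line code and pushes the cold block to the end, which is the hot/cold split the reordering aims for.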

Experiments
• Complexity: O(N log N), where N is the number of basic blocks
• Experiment results (not using other link-time optimizations)
[Figures: normalized cycle counts; normalized cache miss rate]

Mixed Code Generation
• Dual-width Instruction Set
  • 32-bit ISA: more powerful
  • 16-bit ISA: more compact
    • Less coding space for operations
    • Smaller register fields
    • Smaller immediate fields
• Example: the same addition in both encodings
  32-bit:
    add r0, r0, 0xff800000
  16-bit:
    str r2, [addr]
    mov r2, 0xff
    lsl r2, #1
    add r2, #1
    lsl r2, #23
    add r0, r2
    ld r2, [addr]
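The arithmetic behind the 16-bit sequence can be checked with a small sketch. It models registers as 32-bit unsigned integers and omits the spill and reload of r2 around the sequence; the function names are hypothetical. The sketch uses a final shift of 23 bits, since 0x1ff shifted left by 23 is the amount that reproduces 0xff800000 in a 32-bit register.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit unsigned values

def build_constant():
    """Build 0xff800000 from an 8-bit immediate, one 16-bit step at a time."""
    r2 = 0xFF                     # mov r2, 0xff
    r2 = (r2 << 1) & MASK32       # lsl r2, #1   -> 0x1fe
    r2 = (r2 + 1) & MASK32        # add r2, #1   -> 0x1ff
    r2 = (r2 << 23) & MASK32      # lsl r2, #23  -> 0xff800000
    return r2

def add_32bit(r0):
    """One 32-bit instruction: add r0, r0, 0xff800000."""
    return (r0 + 0xFF800000) & MASK32

def add_16bit(r0):
    """The 16-bit sequence: build the constant in r2, then add r0, r2."""
    return (r0 + build_constant()) & MASK32

print(hex(build_constant()))
```

Counting the save and restore of r2, the 16-bit version takes seven instructions (14 bytes) against one 32-bit instruction (4 bytes), which is why a wide immediate is a poor candidate for the short encoding.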

Mixed Code Generation
• Related work in dual-width instruction set design and mixed code generation
  • Coarse-grained function-level mixed code generation
    • By BX in ARM and JALX in MIPS
  • Simple fine-grained instruction-level mixed code generation
    • By BX in ARM and JALX in MIPS
    • By a single specific mode-changing instruction
  • Specialized coding
    • A one-leading instruction word indicates one 32-bit instruction; a zero-leading instruction word indicates two 16-bit instructions
  • 16-bit ISA extensions
• Problem: always leads to performance loss

Potential Benefit
• Analysis of programs in Mediabench
  • 27851 different instructions in all programs
  • ⌈log2(27851)⌉ = 15
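The figure of 15 is the base-2 logarithm of 27851 rounded up. One reading, offered here as an assumption rather than a statement from the slides, is that 15 bits are enough to give every distinct instruction its own index, which fits inside a 16-bit word. The variable names below are hypothetical.

```python
import math

distinct_instructions = 27851  # count reported for the Mediabench programs

# Smallest number of bits whose code space covers every distinct instruction.
bits_needed = math.ceil(math.log2(distinct_instructions))

# Same result using integer arithmetic only.
bits_needed_int = (distinct_instructions - 1).bit_length()

print(bits_needed, 2 ** bits_needed)
```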

Two Main Kinds of Frequent Instructions
• Two-operand instructions
  • mov rd, rm or short immediate
  • cmp rn, rm or short immediate
• Branch/Jump
  [Figure: distribution of immediate offsets of branch instructions]

The Idea of Mode-Changing Instruction Set (MC)
• Extend the 32-bit ISA with a small MC instruction set (using the reserved coding space); each MC instruction will
  • Change the CPU mode
  • Perform its own normal operation
• Scan for suitable 32-bit instructions to be encoded into 16-bit instructions
[Figure: a mixed code fragment with MC instructions]
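The scan step can be sketched as a search for runs of consecutive instructions that have a 16-bit form. This is a rough illustration only: it assumes a per-instruction flag saying whether a 16-bit encoding exists, a hypothetical `min_run` threshold below which switching modes is not attempted, and a size model that ignores alignment and how the mode is switched back. None of these details come from the slides.

```python
def find_compressible_runs(compressible, min_run=2):
    """Return (start, end) pairs for maximal runs of compressible instructions.

    `compressible[i]` is True when instruction i has a 16-bit encoding.
    `end` is exclusive. Runs shorter than `min_run` are skipped.
    """
    runs = []
    start = None
    for i, ok in enumerate(compressible):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(compressible) - start >= min_run:
        runs.append((start, len(compressible)))
    return runs

def mixed_code_size(compressible, min_run=2):
    """Estimate size in bytes: 2 per instruction inside a run, 4 otherwise."""
    short = sum(end - start for start, end in find_compressible_runs(compressible, min_run))
    return 4 * (len(compressible) - short) + 2 * short

# Hypothetical instruction stream: True marks a 16-bit-encodable instruction.
flags = [False, True, True, True, False, True, False]
runs = find_compressible_runs(flags)
print(runs, mixed_code_size(flags), 4 * len(flags))
```

In this example the isolated compressible instruction at index 5 stays in 32-bit form, because a run of one is below the assumed threshold.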

Modification to Micro Architecture
[Figures: mixed code execution in the Unicore-I pipeline; improved mixed code execution in the Unicore-I pipeline]
• No extra cycles
• One more 16-bit instruction-fetch buffer
• An MC-decoder

Mixed Code Generation
[Figure: tool flow with the components Instruction Analyzer, Mode-Changing Instructions, Link-Time Optimizer, programs, mixed-coded program, and Simulator]

Experiment Results • Normalized code size (results not using other link-time optimizations)

Conclusion
• Code compaction on Link-Time Optimization Platform
  • Compiler optimizations applied at link time
    • Typical optimizations for code size reduction
  • Program layout optimization
    • Hot/cold code splitting through basic block reordering
  • Machine code generation
    • Mixed code generation
• Experiment results
  • Average code size reduction: 32.9%
  • Average performance improvement: 9.1%

Thank you

Instruction Analysis
[Figure: instruction format type classifications]

Experiment Results
[Figures: normalized dynamic instruction numbers; normalized cycle counts]
