
Code Compaction for UniCore on Link-Time Optimization Platform



Presentation Transcript


  1. Code Compaction for UniCore on Link-Time Optimization Platform Zhang Jiyu, Compilation Toolchain Group, MPRC

  2. Compilation Process

  3. Our Optimization Process

  4. CLOU is a Link-Time Optimizer for UniCore
  [Figure: input object files (code, data, meta) go through linking, translation to IR, CFG construction & optimizations, then layout and assembling into an executable. Diagram modified from Diablo.]

  5. Code Compaction Based on CLOU
  • Motivation of code compaction
    • Limited memory and energy resources for embedded systems
    • Code density affects both memory and energy consumption
  • Goal: reducing code size without losing performance
  • Code compaction at different levels:
  1. Typical optimizations for code size reduction at link time
  2. Hot/cold code splitting
  3. A new mixed code generation method

  6. Typical Optimizations for Code Size Reduction
  • Redundant code elimination: computations whose results have been computed previously and are guaranteed to still be available at that point
  • Unreachable code elimination: code fragments to which there is no control-flow path from the entry node; many of these follow useless comparisons
  • Dead code elimination: computations whose results are never used
  • Peephole optimization
  • Procedural abstraction (might lead to performance loss)
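The dead-code elimination listed above can be sketched as a backward liveness pass over a toy three-address IR. Everything here (the tuple representation, the `eliminate_dead_code` name) is an illustrative assumption, not CLOU's actual data structure:

```python
# Sketch of dead-code elimination on a toy three-address IR.
# An instruction is dead if its destination is never used later
# (side-effect-free instructions assumed for simplicity).

def eliminate_dead_code(instrs, live_out):
    """instrs: list of (dest, srcs) tuples; live_out: names live at exit."""
    live = set(live_out)
    kept = []
    for dest, srcs in reversed(instrs):   # walk backward, tracking liveness
        if dest in live:
            kept.append((dest, srcs))
            live.discard(dest)            # dest is (re)defined here
            live.update(srcs)             # its operands become live
        # else: result never used -> instruction is dead, drop it
    kept.reverse()
    return kept

prog = [
    ("t1", ["a", "b"]),   # t1 = a + b
    ("t2", ["a"]),        # t2 = -a      (never used: dead)
    ("c",  ["t1"]),       # c  = t1 * 2
]
print(eliminate_dead_code(prog, live_out={"c"}))  # t2's definition is removed
```

A full link-time pass would iterate this to a fixed point across the whole CFG, but the per-block idea is the same.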

  7. Experiments for Typical Optimizations for Code Size Reduction
  • Benchmark: Mediabench
  • Code size reduction: average 12.8%, max 22.3%
  • Performance improvement: average 2.4%, max 4.2%

  8. Hot/Cold Code Splitting
  [Figure: three functions, each with a condition separating hot code from cold code; splitting moves the cold code out of line.]
  • Less code transferred from remote to local, from disk to memory, or from memory to cache
  • Question: might this be too conservative, or lead to performance loss?
  • Hot/cold code is split through basic block reordering
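The splitting itself can be sketched as a simple profile-driven partition: hot blocks stay in the fall-through path, cold blocks (error handling, rare paths) move behind them. The threshold and names below are illustrative assumptions, not the talk's actual heuristic:

```python
# Sketch of hot/cold code splitting from profile counts
# (threshold and block names are hypothetical).

def split_hot_cold(blocks, counts, threshold):
    """blocks: block names in original order; counts: profiled execution
    counts. Returns a layout with hot blocks first, cold blocks last."""
    hot  = [b for b in blocks if counts.get(b, 0) >= threshold]
    cold = [b for b in blocks if counts.get(b, 0) < threshold]
    return hot + cold   # cold code leaves the fall-through path

layout = split_hot_cold(
    ["entry", "check", "error", "loop", "exit"],
    {"entry": 1000, "check": 1000, "error": 2, "loop": 980, "exit": 1000},
    threshold=100,
)
print(layout)  # the rarely-executed error block is pushed behind the hot path
```

Packing hot blocks together is what yields the cache and memory-transfer benefits the slide lists: fewer cache lines and pages hold rarely-executed code.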

  9. Hot/Cold Code Splitting
  • PH (Pettis-Hansen): a popular greedy approach
  • Structural-analysis-based basic block reordering:
  • Most of a program can be decomposed into a few typical structures
  • A cost model for each structure
  • Minimal-cost layout → optimal layout for each local structure, based on profiling information

  10. Basic Block Reordering
  • Cost model: different kinds of control-flow edges have different costs
  • For a specific order, a layout cost can be computed for each structure
  • f(structure, frequencies of all control-flow edges) → the best order of basic blocks for the local structure
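A minimal version of this cost model can be made concrete: assume a fall-through edge costs nothing and a taken branch costs one unit per execution (these weights, and the exhaustive enumeration, are simplifying assumptions; the real pass uses per-structure closed-form orders rather than brute force):

```python
from itertools import permutations

# Toy per-structure cost model: fall-through edges are free,
# taken branches cost one unit per execution (assumed weights).

def layout_cost(order, edges):
    """edges: {(src, dst): frequency}. An edge is a fall-through iff
    dst immediately follows src in the chosen order."""
    pos = {b: i for i, b in enumerate(order)}
    return sum(freq for (src, dst), freq in edges.items()
               if pos[dst] != pos[src] + 1)   # not fall-through -> taken branch

def best_order(blocks, edges, entry):
    """Enumerate all layouts starting at the entry block; pick the cheapest."""
    orders = [o for o in permutations(blocks) if o[0] == entry]
    return min(orders, key=lambda o: layout_cost(o, edges))

# An if-then-else structure where the 'else' side (C) is hot:
edges = {("A", "B"): 10, ("A", "C"): 90, ("B", "D"): 10, ("C", "D"): 90}
print(best_order(["A", "B", "C", "D"], edges, entry="A"))  # hot path falls through
```

With these frequencies, the minimal-cost layout places the hot branch target C directly after A, so the 90-count path runs entirely on fall-throughs; the cold block B ends up last, which is exactly the hot/cold split from the previous slides.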

  11. Experiments
  • Complexity: O(N log N), N = number of basic blocks
  • Experiment results (without other link-time optimizations): normalized cycle counts and normalized cache miss rates

  12. Mixed Code Generation
  • Dual-width instruction set
  • 32-bit ISA: more powerful
  • 16-bit ISA: more compact, with less coding space for operations, smaller register fields, and smaller immediate fields
  • Example: one 32-bit instruction versus a seven-instruction 16-bit sequence (spilling r2, building the constant, restoring r2):
    32-bit:  add r0, r0, 0xff800000
    16-bit:  str r2, [addr]
             mov r2, 0xff
             lsl r2, #1
             add r2, #1
             lsl r2, 24
             add r0, r2
             ld  r2, [addr]

  13. Mixed Code Generation
  • Related work in dual-width instruction set design and mixed code generation:
  • Coarse-grained, function-level mixed code generation (BX in ARM, JALX in MIPS)
  • Simple fine-grained, instruction-level mixed code generation (BX in ARM, JALX in MIPS; or a single dedicated mode-changing instruction)
  • Specialized encoding: a one-leading instruction word encodes one 32-bit instruction; a zero-leading instruction word encodes two 16-bit instructions
  • 16-bit ISA extensions
  • Problem: these approaches always lead to performance loss

  14. Potential Benefit
  • Analysis of the programs in Mediabench: 27851 distinct instructions across all programs
  • log2(27851) ≈ 15, so 15 bits suffice to index every distinct instruction
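The slide's back-of-the-envelope estimate is just a ceiling of a base-2 logarithm; spelled out:

```python
import math

# How many bits are needed to give every distinct instruction
# observed in Mediabench a unique index?
distinct = 27851                       # count reported on the slide
bits = math.ceil(math.log2(distinct))  # log2(27851) ~= 14.77
print(bits)  # 15
```

In other words, a 16-bit word has room to name every instruction that actually occurs, which is the motivation for a compact 16-bit encoding of the frequent cases.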

  15. Two Main Kinds of Frequent Instructions
  • Two-operand instructions: mov rd, rm (or a short immediate); cmp rn, rm (or a short immediate)
  • Branch/jump instructions
  [Figure: distribution of immediate offsets of branch instructions.]

  16. The Idea of a Mode-Changing Instruction Set (MC)
  • Extend the 32-bit ISA with a small MC instruction set (using the reserved coding space); an MC instruction both:
  • Changes the CPU mode
  • Performs its own normal operation
  • Scan for suitable 32-bit instructions to be encoded as 16-bit instructions
  • The result: a mixed code fragment with MC instructions
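The "scan for suitable 32-bit instructions" step can be sketched as a predicate over each instruction's fields. The constraints below (opcodes limited to the frequent cases, registers r0-r7, 8-bit immediates) are illustrative assumptions in the spirit of slides 14-15, not the actual UniCore 16-bit encoding:

```python
# Sketch of scanning for 32-bit instructions that a mode-changing
# region could re-encode in 16 bits (constraints are hypothetical).

def fits_16bit(instr):
    """instr: (opcode, regs, imm). True if it could be encoded compactly."""
    opcode, regs, imm = instr
    if opcode not in {"mov", "add", "cmp", "str", "ldr"}:
        return False                       # only frequent two-operand forms
    if any(r > 7 for r in regs):
        return False                       # 16-bit forms: 3-bit register fields
    return imm is None or 0 <= imm < 256   # 8-bit immediate field

def scan_for_compaction(instrs):
    """Return indices of instructions an MC-covered region could compact."""
    return [i for i, ins in enumerate(instrs) if fits_16bit(ins)]

code = [
    ("add", [0, 0], 0xFF800000),  # large immediate: must stay 32-bit
    ("mov", [2, 2], 0xFF),        # compactable
    ("cmp", [1, 3], None),        # compactable
    ("ldr", [9, 1], None),        # high register: must stay 32-bit
]
print(scan_for_compaction(code))  # -> [1, 2]
```

Runs of compactable instructions are the candidates an MC instruction would cover; isolated ones may not be worth a mode change, which is why the MC instruction also performing a normal operation matters.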

  17. Modification to the Microarchitecture
  [Figures: mixed code execution in the UniCore-I pipeline, and the improved mixed code execution.]
  • No extra cycles
  • One extra 16-bit instruction-fetch buffer
  • An MC decoder

  18. Mixed Code Generation
  [Figure: toolchain flow. The program is profiled by a simulator and an instruction analyzer to select mode-changing instructions; the link-time optimizer then produces the mixed-code program.]

  19. Experiment Results
  • Normalized code size (results without other link-time optimizations)

  20. Conclusion
  • Code compaction on a link-time optimization platform:
  • Compiler optimizations applied at link time: typical optimizations for code size reduction
  • Program layout optimization: hot/cold code splitting through basic block reordering
  • Machine code generation: mixed code generation
  • Experiment results: average code size reduction 32.9%, average performance improvement 9.1%

  21. Thank you

  22. Instruction Analysis
  [Figure: instruction format type classifications.]

  23. Experiment Results
  [Figures: normalized dynamic instruction counts and normalized cycle counts.]
