Code Optimization
Witawas Srisa-an
CSCE 496: Embedded Systems Design and Implementation
Agenda
• Talk about possible exam ideas
• Code optimization techniques
  • Not everyone has reconfigurable processors!
• Credits
  • Most of the slides in this lecture are based on slides created by Profs. Raj Rajkumar and Priya Narasimhan of the ECE Department at Carnegie Mellon
Code Optimization
• Programmers can improve program performance by writing better code
• Improve data structures and/or algorithms
  • Merge vs. bubble sort
• Reorganize code or provide flags to help compilers
• Last option is to write in assembly
Better Algorithms
• Merge vs. bubble sort
  • Which one runs faster?
  • Which one causes more cache misses?
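One way to explore the first question is to count comparisons. The sketch below is an illustration rather than part of the original slides: the function names and the comparison counters are assumptions made for the example. It does not measure cache misses. It only shows that bubble sort sweeps adjacent elements in place, while this merge sort needs a scratch buffer as large as the input.

```c
#include <stddef.h>
#include <string.h>

/* Bubble sort: about n*(n-1)/2 comparisons, sorts in place.
   Returns the number of comparisons made (counter added for illustration). */
static unsigned long bubble_sort(int *a, size_t n)
{
    unsigned long cmps = 0;
    for (size_t pass = 0; pass + 1 < n; pass++) {
        for (size_t k = 0; k + 1 < n - pass; k++) {
            cmps++;
            if (a[k] > a[k + 1]) {
                int t = a[k];
                a[k] = a[k + 1];
                a[k + 1] = t;
            }
        }
    }
    return cmps;
}

/* Top-down merge sort: O(n log n) comparisons, but it needs a scratch
   buffer tmp with room for n ints. Returns the number of comparisons. */
static unsigned long merge_sort(int *a, int *tmp, size_t n)
{
    if (n < 2)
        return 0;
    size_t mid = n / 2;
    unsigned long cmps = merge_sort(a, tmp, mid);
    cmps += merge_sort(a + mid, tmp, n - mid);

    size_t i = 0, j = mid, out = 0;
    while (i < mid && j < n) {
        cmps++;
        tmp[out++] = (a[j] < a[i]) ? a[j++] : a[i++];
    }
    while (i < mid)
        tmp[out++] = a[i++];
    while (j < n)
        tmp[out++] = a[j++];
    memcpy(a, tmp, n * sizeof a[0]);
    return cmps;
}
```

Calling both on the same reversed array and comparing the returned counts gives a feel for how the gap grows with n. Measuring cache misses would need a hardware counter or a simulator, which this sketch does not attempt.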
Common Optimization Techniques
• Sub-expression elimination
• Dead code elimination
• Induction variables
• Strength reduction
• Loop unrolling
• In-lining
Common Techniques (cont.)
• Sub-expression elimination
    myfunction:
        index1 = 8 * i
        x = a[index1]
        temp = 8 * i
        index2 = 4 * j
        t = a[index2]
        a[temp] = t
        temp2 = 4 * j
        a[temp2] = x
        goto myfunction
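In the slide's fragment, 8 * i and 4 * j are each computed twice while i and j do not change in between. A rough C rendering of the before and after might look like this. The function names are made up for the example, and the slide's enclosing loop (goto myfunction) is left out.

```c
/* Before: a direct transcription of the slide's loop body.
   8 * i and 4 * j are each evaluated twice. */
static void swap_naive(int *a, int i, int j)
{
    int index1 = 8 * i;
    int x = a[index1];
    int temp = 8 * i;       /* same value as index1 */
    int index2 = 4 * j;
    int t = a[index2];
    a[temp] = t;
    int temp2 = 4 * j;      /* same value as index2 */
    a[temp2] = x;
}

/* After: each sub-expression is computed once and reused. */
static void swap_cse(int *a, int i, int j)
{
    int index1 = 8 * i;
    int index2 = 4 * j;
    int x = a[index1];
    a[index1] = a[index2];
    a[index2] = x;
}
```

An optimizing compiler will usually do this on its own. Writing it by hand mostly matters when the compiler cannot prove the repeated expression is unchanged, for example across a call it cannot see into.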
Common Techniques (cont.)
• Dead code elimination
    int i = 0;
    i = i + 1;
    if (i == 0)
        j = j * 8;
    else
        j = j * 10;
• Use ASSERT and #ifdef to advise the compiler about dead code
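The fragment above can be sketched as compilable C. The first function keeps the slide's shape, where i is always 1 at the test and the i == 0 branch can never run. The second is what remains once that branch is removed. The third shows the #ifdef and assert idea. ENABLE_RANGE_CHECK is a hypothetical macro invented for this example, and the standard assert macro stands in for the slide's ASSERT.

```c
#include <assert.h>

/* Shape from the slide: the i == 0 branch is unreachable. */
static int scale_original(int j)
{
    int i = 0;
    i = i + 1;
    if (i == 0)
        j = j * 8;      /* dead: i is always 1 here */
    else
        j = j * 10;
    return j;
}

/* What is left after dead code elimination. */
static int scale_reduced(int j)
{
    return j * 10;
}

/* Helping the compiler explicitly. */
static int checked_scale(int j)
{
#ifdef ENABLE_RANGE_CHECK   /* hypothetical macro: code vanishes when undefined */
    if (j > 1000)
        return -1;
#endif
    assert(j >= 0);         /* compiled out when NDEBUG is defined */
    return j * 10;
}
```

With #ifdef the excluded code never reaches the compiler at all, so removal does not depend on how good the optimizer is.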
Common Techniques (cont.)
• Induction variables and strength reduction
    i = 0
    j = 0
    label:
        j = j + 1
        i = 4 * j
        a[i * 2] = b[i]
        if (i < 1000) goto label
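Here i always equals 4 * j and the index i * 2 always equals 8 * j, so both can be kept up to date with additions instead of multiplications. The sketch below is one possible C rendering. The function names and the LIMIT constant are assumptions for the example. Note that the last iteration runs with i equal to 1000, so b must hold at least 1001 elements and a at least 2001.

```c
enum { LIMIT = 1000 };

/* Direct transcription of the slide: two multiplications per iteration. */
static void copy_mul(int *a, const int *b)
{
    int i = 0, j = 0;
    do {
        j = j + 1;
        i = 4 * j;
        a[i * 2] = b[i];
    } while (i < LIMIT);
}

/* Strength-reduced: i and k are induction variables advanced by
   constant steps, and j is no longer needed. */
static void copy_add(int *a, const int *b)
{
    int i = 0;  /* tracks 4 * j */
    int k = 0;  /* tracks 8 * j, i.e. i * 2 */
    do {
        i += 4;
        k += 8;
        a[k] = b[i];
    } while (i < LIMIT);
}
```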
Optimization Techniques (cont.)
• In-lining
    main:
        addi $s0, $t1, 0
        addi $s1, $t2, 0
        jal  mult
        addi $t3, $v0, 0
    mult:
        addi $sp, $sp, -12
        sw   $s1, 0($sp)
        sw   $s0, 4($sp)
        sw   $ra, 8($sp)
        sllv $v0, $s0, $s1
        lw   $s1, 0($sp)
        lw   $s0, 4($sp)
        lw   $ra, 8($sp)
        addi $sp, $sp, 12
        jr   $ra
• What's wrong with this picture?
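The routine above does a single shift of useful work, wrapped in stack adjustment, register saves and restores, and a jump in each direction. A C-level sketch of the same situation follows. The function names are invented for the example. The second caller has the helper's body substituted by hand, which is what in-lining amounts to.

```c
/* Out-of-line helper, mirroring the slide's routine: one shift of real work. */
static unsigned shift_left(unsigned value, unsigned amount)
{
    return value << amount;
}

/* Calls the helper once per element. Unless the compiler inlines it on its
   own, every iteration pays the call and return overhead. */
static unsigned sum_shifted_call(const unsigned *v, int n, unsigned amount)
{
    unsigned total = 0;
    for (int k = 0; k < n; k++)
        total += shift_left(v[k], amount);
    return total;
}

/* Same loop with the helper's body written in place of the call. */
static unsigned sum_shifted_inlined(const unsigned *v, int n, unsigned amount)
{
    unsigned total = 0;
    for (int k = 0; k < n; k++)
        total += v[k] << amount;
    return total;
}
```

In-lining trades code size for speed, so it tends to pay off for small, frequently called routines like this one. The shift amount is assumed to be smaller than the width of unsigned.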
Optimization Techniques (cont.)
• Loop unrolling
  • Eliminate branches (why?)
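As a small illustration of loop unrolling, the sketch below sums an array with and without a four-way unrolled loop. The function names and the unroll factor of 4 are choices made for the example. The unrolled version runs the loop test and backward branch roughly a quarter as often, which is one plausible answer to the "why?": each branch costs instructions and can stall a pipelined processor.

```c
#include <stddef.h>

/* Rolled: one loop test and one branch per element. */
static long sum_rolled(const int *v, size_t n)
{
    long s = 0;
    for (size_t k = 0; k < n; k++)
        s += v[k];
    return s;
}

/* Unrolled by 4: one loop test and one branch per four elements,
   plus a short clean-up loop for the leftover 0 to 3 elements. */
static long sum_unrolled4(const int *v, size_t n)
{
    long s = 0;
    size_t k = 0;
    for (; k + 4 <= n; k += 4)
        s += (long)v[k] + v[k + 1] + v[k + 2] + v[k + 3];
    for (; k < n; k++)
        s += v[k];
    return s;
}
```

The cost is larger code, which can hurt on systems with a small instruction cache or tight memory.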
Architecture Dependent Optimizations
X = Y * 64
Convert 8-bit RGB to 8-bit YCC
    Y  =  0.299R + 0.587G + 0.114B
    Cb = -0.169R - 0.331G + 0.500B + 128
    Cr =  0.500R - 0.419G - 0.082B + 128
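Both examples can be rewritten to avoid expensive operations on a processor with cheap shifts and no floating-point unit. The sketch below is one possible approach, not necessarily the one intended in the lecture. The multiplication by 64 becomes a left shift by 6. The conversion uses integer coefficients obtained by scaling the slide's constants by 256 and rounding (an assumption of this example, so results can differ from the floating-point formulas by about one step). The type and function names are hypothetical.

```c
#include <stdint.h>

/* X = Y * 64 as a shift: 64 is 2 to the power 6. */
static uint32_t times64(uint32_t y)
{
    return y << 6;
}

typedef struct { uint8_t y, cb, cr; } ycc_t;  /* hypothetical result type */

/* Fixed-point RGB to YCC. Each coefficient is the slide's constant times 256,
   rounded: 0.299 -> 77, 0.587 -> 150, 0.114 -> 29, 0.169 -> 43,
   0.331 -> 85, 0.500 -> 128, 0.419 -> 107, 0.082 -> 21.
   The +128 offset is added as 128 << 8 before the shift so the shifted
   value is never negative. */
static ycc_t rgb_to_ycc(uint8_t r, uint8_t g, uint8_t b)
{
    int32_t R = r, G = g, B = b;
    ycc_t out;
    out.y  = (uint8_t)((77 * R + 150 * G + 29 * B) >> 8);
    out.cb = (uint8_t)((-43 * R - 85 * G + 128 * B + (128 << 8)) >> 8);
    out.cr = (uint8_t)((128 * R - 107 * G - 21 * B + (128 << 8)) >> 8);
    return out;
}
```

Whether this is faster than the floating-point form depends on the target. It usually helps on cores without hardware floating point.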
Architecture Dependent Optimizations (cont.)
[Figure: processor datapath block diagram. Labeled blocks: address register, address incrementer, register bank, barrel shifter on the B bus, 32-bit ALU (A bus, B bus, ALU bus), write buffer (holds address and data), memory address register, write data register, read data/instruction register, Dout[31:0], Data[31:0], RAM]
Summary
• No magic bullet
  • Optimizations sometimes don't work
  • Programmers need to help
  • Various techniques may require prior knowledge of the hardware