The Philips TriMedia: A VLIW Architecture
By Jurjen Westra
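TM-1 Block Diagram
[Figure: TM-1 block diagram. The VLIW CPU, with a 32 KB instruction cache (I$) and a 16 KB data cache (D$), is connected to the Main Memory Interface (to SDRAM), Image Coprocessor, VLD Coprocessor, Video In, Video Out, Audio In, Audio Out, Timers, I2C Interface, Sync Serial Interface and PCI Interface.]
The TriMedia CPU has 128 general-purpose 32-bit registers.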
VLIW means relying on compiler techniques
Only cache misses are handled at run time.
Compiler:
• Scheduling / instruction-level parallelism
• Operation guarding
• Speculation
• Profiling for recompiling
• Grafting (loop unrolling)
• Alias analysis
Traditional Scheduling vs. VLIW Scheduling
[Figure: operations A, B, C and D scheduled traditionally and scheduled for a VLIW]
[Figure: the instruction cache feeds issue slots 1 to 5, which dispatch operations to execution units 1 to 27]
But not all issue slots have access to all (types of) execution units!
Functional unit   Latency (cycles)   Number of issue slots
CONST             -                  5
ALU               -                  5
SHIFTER           -                  2
FALU              3                  2
DSPALU            2                  2
DSPMUL            3                  2
BRANCH            3                  3
IFMUL             3                  2
FCOMP             -                  1
DMEM              3                  2
DMEMSPEC          3                  1
FTOUGH            17/16              1
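Guarding
C code:
    if (R2 > R3) R4 = R4 + R5;
    else         R4 = R4 + R6;
Assembly:
    igtr R7 R2 R3          ileq R8 R2 R3          …   …
    IF R7 add R4 R4 R5     IF R8 add R4 R4 R6     …   …   ...
A guarded operation only writes its result when its guard register (R7 or R8) is true, so the else part needs its own guard holding the negated condition.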
Characteristics (1)
• Custom ops => loss of VLIW character
• Big or little endian
• R0 and R1 always hold the values 0 and 1, respectively
• No integer status flags; instead, case-specific bit patterns are written to registers (e.g. igtr R7 R2 R3 leaves its result in R7)
• 32 interrupt vectors
• Interrupts are delayed
Characteristics (2)
• 11-cycle read-miss penalty
• 3-cycle write-miss penalty
• Functional units require a 1-cycle recovery time
• Byte-addressable; 8-, 16- and 32-bit loads and stores
• Register file supports up to 5 writes per cycle (latency)
• Register file supports up to 15 reads per cycle (5 issue slots × 2 source operands + 1 guard)
• Paging (64 bytes)
• Instructions are compressed; instruction length is 2-23 bytes
Example: MPEG-2 decoder
• DVD "Batman" bitstream (4-9 Mbit/s)
• 7% instruction-cache misses
• 27% data-cache misses
• CPI (clock cycles per VLIW instruction): 1.37
• Total performance: 2.9 ops/clock