SHARC: ‘S’uper ‘H’arvard ‘ARC’hitecture • Nagendra Doddapaneni
Overview • Harvard Architecture • Super Harvard Architecture • TigerSHARC processor
Outline • Background • Harvard Architecture • Why? • What? • Modern CPU Chip Design • Super Harvard Architecture • TigerSHARC Processor
Background • von Neumann Architecture • Single storage for instructions and data • Digital Signal Processors • Specialized microprocessor designed specifically for digital signal processing, generally in real time
Why Harvard Architecture? • von Neumann bottleneck (‘memory bound’) • DSP applications • In a von Neumann architecture, the processor can do only one of the following at a time • Either read an instruction • Or read/write data from/to memory
What is Harvard Architecture? • Physically separate storage and signal pathways for instructions and data • The next instruction can be fetched while the current instruction executes • Program memory can be small and wide • Data memory can be large and narrower
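The benefit of separate pathways can be sketched with a toy cycle count. This is a minimal model, not a description of any real processor: it assumes every bus transfer takes one cycle, that execution time is hidden behind the transfers, and the function names are invented for illustration.

```python
def von_neumann_cycles(num_instructions: int, data_accesses: int = 1) -> int:
    """Shared bus: each instruction fetch and each data access needs its own cycle."""
    return num_instructions * (1 + data_accesses)


def harvard_cycles(num_instructions: int, data_accesses: int = 1) -> int:
    """Separate buses: the next fetch overlaps the current instruction's data access."""
    if num_instructions == 0:
        return 0
    # Each instruction occupies the data bus for at least one cycle while the
    # next instruction is fetched in parallel on the instruction bus.
    per_instruction = max(1, data_accesses)
    # One extra cycle for the very first fetch, which has nothing to overlap with.
    return 1 + num_instructions * per_instruction


if __name__ == "__main__":
    n = 1000
    print("von Neumann:", von_neumann_cycles(n))
    print("Harvard:    ", harvard_cycles(n))
```

Under these assumptions, a program that touches data once per instruction needs roughly half as many bus cycles on the Harvard model.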
Modern CPU chip design • Incorporates features from both architectures • ‘On-chip’ cache memory is divided into an instruction cache and a data cache; the Harvard architecture is used when the CPU accesses cache memory. • On a cache miss, ‘off-chip’ main memory is accessed using the von Neumann architecture; main memory is not separated into data and instruction sections.
Super Harvard Architecture • An instruction cache stores instructions, leaving both the instruction bus and the data bus free to fetch operands • Harvard Architecture + cache = Extended Harvard Architecture or Super Harvard Architecture
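Why the cache matters can be sketched for a multiply-accumulate loop that needs two operands per step, for example a sample and a coefficient. The model below is an assumption made for illustration: the coefficient lives in program memory, the loop body is a single instruction, each bus transfer takes one cycle, and the function names are hypothetical.

```python
def harvard_mac_cycles(taps: int) -> int:
    """No cache: the instruction bus carries the instruction AND the
    coefficient, so every tap costs two cycles on that bus."""
    return 2 * taps


def super_harvard_mac_cycles(taps: int) -> int:
    """With an instruction cache: only the first pass fetches the instruction
    over the bus; later passes hit the cache and cost one cycle each."""
    if taps == 0:
        return 0
    first_pass = 2          # cache miss: instruction fetch + coefficient fetch
    later_passes = taps - 1  # cache hits: both buses fetch operands together
    return first_pass + later_passes


if __name__ == "__main__":
    for taps in (1, 16, 64):
        print(taps, harvard_mac_cycles(taps), super_harvard_mac_cycles(taps))
```

In this sketch the cached loop approaches one cycle per tap as the loop gets longer.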
TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller • Applications
TigerSHARC Processor Architecture • Three 128-bit data buses • 2 IALUs • 2 Computational Blocks, each with • ALU (floating-point and integer) • Shifter • Multiplier • CLU
TigerSHARC Instruction Parallelism and SIMD Operation • The core can execute one to four 32-bit instructions simultaneously, encoded in a single instruction line (VLIW). • Whether instructions can execute in parallel depends on • The instruction-line resources each requires • The source and destination registers used • Supports SIMD operations by using both Computational Blocks in parallel. • Each Computational Block can execute four 16-bit or eight 8-bit SIMD computations in parallel.
TigerSHARC Integer ALU • 31 32-bit general registers + 1 status register + 8 dedicated registers for circular buffers • Performs integer ALU operations and data addressing • ALU instructions: ADD, SUB, ARS, LRS (right shifts only), ROT (left and right), AND NOT, NOT, OR, XOR, ABS, MIN, MAX, CMP • Status flags: zero (Z), negative (N), overflow (V), carry (C) • Instruction conditions: EQ, LT, LE, NEQ, NLT, NLE • Instruction options: unsigned (U), circular buffer (CB), bit reverse (BR), computed jump (CJMP) • Address-related operations: data address generation, circular buffers, bit reverse, UREG moves, DAB control.
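The two addressing options named above, circular buffer and bit reverse, can be sketched generically. This shows the textbook form of each technique, not the exact IALU semantics: the function names and the base/length parameters are assumptions made for illustration.

```python
def circular_next(address: int, step: int, base: int, length: int) -> int:
    """Advance an address by `step`, wrapping inside [base, base + length)."""
    return base + (address - base + step) % length


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (the reordering used by FFTs)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit to the top
        index >>= 1
    return result


if __name__ == "__main__":
    # Walk a hypothetical 8-word delay line starting at address 0x100.
    addr = 0x106
    for _ in range(4):
        print(hex(addr))
        addr = circular_next(addr, 1, base=0x100, length=8)
    # Bit-reversed order for an 8-point FFT.
    print([bit_reverse(i, 3) for i in range(8)])
```

Circular addressing keeps a filter's delay line in a fixed region of memory without copying, and bit-reversed addressing produces the input or output ordering of a radix-2 FFT.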
TigerSHARC Computational Blocks: X and Y Register File • Register File Syntax • Each block has thirty-two 32-bit data registers • Each register can store 4x8-bit, 2x16-bit, or 1x32-bit words. • Registers can be combined into dual or quad groups. These groups can store 8-, 16-, 32-, 40-, or 64-bit words.
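How one 32-bit register can hold four 8-bit, two 16-bit, or one 32-bit word can be sketched with plain bit packing. The lane order used here (lane 0 in the least significant bits) and the function names are assumptions, not taken from the processor documentation.

```python
from typing import List


def pack_lanes(values: List[int], lane_bits: int) -> int:
    """Pack equally sized lanes into one 32-bit word, lane 0 in the low bits."""
    lanes = 32 // lane_bits
    if len(values) != lanes:
        raise ValueError(f"expected {lanes} values for {lane_bits}-bit lanes")
    mask = (1 << lane_bits) - 1
    word = 0
    for i, value in enumerate(values):
        word |= (value & mask) << (i * lane_bits)
    return word


def unpack_lanes(word: int, lane_bits: int) -> List[int]:
    """Split a 32-bit word back into its lanes, lane 0 first."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(32 // lane_bits)]


if __name__ == "__main__":
    word = pack_lanes([0x11, 0x22, 0x33, 0x44], 8)
    print(hex(word))
    print([hex(v) for v in unpack_lanes(word, 16)])
```

The same 32 bits are simply read back with a different lane width, which is what lets one register file serve 8-, 16-, and 32-bit operations.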
TigerSHARC Computational Blocks: X and Y Register File • Register File Syntax (figure)
Volatile registers in each block • 24 Volatile Data registers in each block • XR0 – XR23 • YR0 – YR23 • 2 ALU summation registers in each block • XPR0, XPR1, YPR0, YPR1 • 5 MAC accumulate registers in each block • XMR0 – XMR3, YMR0 – YMR3 • XMR4, YMR4 – Overflow registers
TigerSHARC X and Y ALU • 2x64-bit input paths • 2x64-bit output paths • 8-, 16-, 32-, or 64-bit fixed-point addition/subtraction • 32- or 64-bit fixed-point logical operations • 32- or 40-bit floating-point operations
Sample ALU Instruction • Example of 16 bit addition • XYSR1:0 = R31:30 + R25:24 • Performs addition in X and Y Compute Blocks
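What a 16-bit SIMD addition on a 64-bit register pair computes can be sketched as four independent lane additions. This is an illustrative model only: it assumes wrap-around arithmetic with no saturation, and `simd_add16` is a hypothetical helper, not a processor instruction.

```python
LANE_BITS = 16
LANES = 4  # a 64-bit register pair holds four 16-bit lanes
LANE_MASK = (1 << LANE_BITS) - 1


def simd_add16(a: int, b: int) -> int:
    """Add two 64-bit values as four independent 16-bit lanes.

    A carry out of one lane is discarded and never reaches the next lane.
    """
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift
    return result


if __name__ == "__main__":
    a = 0x0001_FFFF_0003_0004
    b = 0x0001_0001_0001_0001
    print(hex(simd_add16(a, b)))              # lane-wise result
    print(hex((a + b) & 0xFFFF_FFFF_FFFF_FFFF))  # ordinary 64-bit add, for contrast
```

Running the same operation in both the X and Y compute blocks would apply it to two such register pairs at once, giving eight 16-bit additions in total.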
TigerSHARC Multiplier • Operates on fixed-point, floating-point, and complex numbers. • Fixed-point numbers • 32x32-bit with 32- or 64-bit results • Four 16x16-bit multiplies with 4x16- or 4x32-bit results • Floating-point numbers • 32x32-bit with 32-bit result • 40x40-bit with 40-bit result • Complex numbers • 32x32-bit with 32-bit result • Fixed-point only • Results stored in MR register
TigerSHARC Multiplier
XR0 = R1*R2;;
XR1:0 = R3*R5;;
XMR1:0 = R3*R5;; //uses XMR4 overflow
XR2 = MR3:2, XMR3:2 = R3*R5;;
XR3:2 = MR1:0, XMR1:0 = R3*R5;;
XFR0 = R1*R2;;
XFR1:0 = R3:2*R5:4;; //40 bit multiply //32 bit mantissa
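The quad 16x16-bit multiply mode with 32-bit results can be sketched in Python. This is a model under stated assumptions: the operands are treated as signed integers (not fractional), lane 0 sits in the least significant bits, and the helper names are invented for illustration.

```python
from typing import List


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def quad_mul16(a: int, b: int) -> List[int]:
    """Multiply four signed 16-bit lanes pairwise; return four products, lane 0 first.

    Every product of two signed 16-bit values fits in a signed 32-bit result.
    """
    products = []
    for lane in range(4):
        x = to_signed((a >> (16 * lane)) & 0xFFFF, 16)
        y = to_signed((b >> (16 * lane)) & 0xFFFF, 16)
        products.append(x * y)
    return products


if __name__ == "__main__":
    print(quad_mul16(0x0002_FFFF_8000_0003, 0x0003_0002_8000_0004))
```

Keeping the full 32-bit products avoids the overflow that would occur if each result were squeezed back into 16 bits, which is the trade-off between the 4x16-bit and 4x32-bit result formats.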
TigerSHARC Shifter • Operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands • Shifts and rotates bits • Bit manipulation operations, such as bit set, clear, toggle, and test • Bit FIFO operations to support bit streams
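The bit manipulation and rotate operations listed above can be sketched on a 32-bit word. These are generic definitions of the operations, assuming a 32-bit operand; the function names are illustrative and are not the shifter's mnemonics.

```python
WORD_MASK = 0xFFFFFFFF  # model a 32-bit operand


def bit_set(word: int, n: int) -> int:
    """Force bit n to 1."""
    return (word | (1 << n)) & WORD_MASK


def bit_clear(word: int, n: int) -> int:
    """Force bit n to 0."""
    return word & ~(1 << n) & WORD_MASK


def bit_toggle(word: int, n: int) -> int:
    """Invert bit n."""
    return (word ^ (1 << n)) & WORD_MASK


def bit_test(word: int, n: int) -> bool:
    """Return True if bit n is 1."""
    return ((word >> n) & 1) == 1


def rotate_left(word: int, count: int) -> int:
    """Rotate a 32-bit word left; bits shifted out re-enter on the right."""
    count %= 32
    return ((word << count) | (word >> (32 - count))) & WORD_MASK


if __name__ == "__main__":
    w = bit_set(0b1000, 0)
    print(bin(w), bit_test(w, 3), hex(rotate_left(0x80000001, 1)))
```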
TigerSHARC CLU • Communications Logic Unit (CLU) instructions are designed to support algorithms used in communications applications • Algorithms supported are • Viterbi decoding (minimal distance decoding algorithm) • Turbo-code decoding (variant of Viterbi decoding) • De-spreading for Code Division Multiple Access (CDMA) systems (used for recovering a signal spread across a wide pseudo-noise bandwidth)
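Of the algorithms listed, de-spreading is the simplest to sketch. The example below is a textbook direct-sequence model, not the CLU instruction set: bits and chips are represented as +1/-1, the 8-chip pseudo-noise code is made up for the demonstration, and the function names are hypothetical.

```python
from typing import List


def spread(bits: List[int], pn_code: List[int]) -> List[int]:
    """Multiply each data bit (+1/-1) by every chip of the PN code."""
    return [bit * chip for bit in bits for chip in pn_code]


def despread(chips: List[int], pn_code: List[int]) -> List[int]:
    """Correlate each code-length block with the PN code and take the sign."""
    n = len(pn_code)
    bits = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        correlation = sum(c * p for c, p in zip(block, pn_code))
        bits.append(1 if correlation >= 0 else -1)
    return bits


if __name__ == "__main__":
    pn = [1, -1, 1, 1, -1, -1, 1, -1]  # made-up code for illustration
    data = [1, -1, -1, 1]
    received = spread(data, pn)
    received[0] = -received[0]  # corrupt one chip to mimic channel noise
    print(despread(received, pn) == data)
```

The correlation sum is a run of multiply-accumulate steps over short values, which is the kind of inner loop a dedicated communications unit is meant to speed up.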
TigerSHARC Program Sequencer • Supplies instruction addresses to memory • The instruction alignment buffer (IAB) caches up to five fetched instruction lines waiting to execute • Extracts an instruction line from the IAB and distributes it to the appropriate core component for execution • Determines flow control for instructions such as JMP and CALL • Reduces branch delays using branch prediction and the branch target buffer (BTB)
TigerSHARC Buses • DRAM is divided into 6 blocks of 4 Mbits each • The 6 blocks connect to four 128-bit-wide internal buses through a crossbar connection • The internal bus architecture provides a total memory bandwidth of 32 Gbytes/s • Per cycle, the core and I/O can access • twelve 32-bit data words • four 32-bit instructions
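The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above. The clock rate it derives is an inference, not a figure from the slides: it assumes every bus moves a full 128 bits on every cycle and that 1 Gbyte means 10^9 bytes.

```python
BUS_COUNT = 4
BUS_WIDTH_BITS = 128
DATA_WORDS_PER_CYCLE = 12
INSTRUCTIONS_PER_CYCLE = 4
WORD_BITS = 32
QUOTED_BANDWIDTH_BYTES_PER_S = 32e9  # "32 Gbytes/sec", assuming 1 Gbyte = 1e9 bytes

# Total bits the four buses can move in one cycle.
bits_per_cycle = BUS_COUNT * BUS_WIDTH_BITS
bytes_per_cycle = bits_per_cycle // 8

# The quoted per-cycle access counts should fill the buses exactly.
accessed_bits = (DATA_WORDS_PER_CYCLE + INSTRUCTIONS_PER_CYCLE) * WORD_BITS

# Clock rate implied by the quoted bandwidth (an inference, see assumptions above).
implied_clock_hz = QUOTED_BANDWIDTH_BYTES_PER_S / bytes_per_cycle

# Total on-chip memory: 6 blocks of 4 Mbits.
total_memory_mbits = 6 * 4

if __name__ == "__main__":
    print("bytes per cycle:", bytes_per_cycle)
    print("accesses fill the buses:", accessed_bits == bits_per_cycle)
    print("implied clock (MHz):", implied_clock_hz / 1e6)
    print("total memory (Mbits):", total_memory_mbits)
```

Twelve data words plus four instructions is sixteen 32-bit words, exactly the 512 bits that four 128-bit buses carry per cycle, so the slide's numbers are mutually consistent.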
TigerSHARC DMA Controller • On-chip, with 14 DMA channels • Provides zero-overhead data transfers • Operates independently of, and invisibly to, the DSP core
Summary • What is Harvard Architecture? • What is Super Harvard Architecture? • TigerSHARC processor architecture • How is TigerSHARC ‘faster’ for targeted DSP applications?
Questions? Thank You.