SHARC ‘S’uper ‘H’arvard ‘ARC’hitecture

SHARC ‘S’uper ‘H’arvard ‘ARC’hitecture • Nagendra Doddapaneni

Overview • Harvard Architecture • Super Harvard Architecture • TigerSHARC processor

Outline • Background • Harvard Architecture • Why? • What? • Modern CPU Chip Design • Super Harvard Architecture • TigerSHARC Processor

Background • von Neumann Architecture • Single storage for instructions and data • Digital Signal Processors • Specialized microprocessors designed specifically for digital signal processing, generally in real time

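Why Harvard Architecture? • von Neumann bottleneck (‘memory bound’) • DSP applications • In a von Neumann architecture, the CPU at any moment is • Either fetching an instruction • Or reading/writing data from/to memory • It cannot do both at once, because instructions and data share one memory and one bus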

Harvard Architecture (cont…)

What is Harvard Architecture? • Physically separate storage and signal pathways for instructions and data • The next instruction can be fetched while the current instruction executes • Program memory can be small and wide • Data memory can be large and narrower
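A minimal sketch of the fetch overlap, assuming a toy machine in which every memory access takes one cycle and every instruction makes exactly one data access. The function names and the one-cycle cost model are illustrative assumptions, not figures for any real processor.

```python
def von_neumann_cycles(n_instructions: int) -> int:
    """Single shared bus: the instruction fetch and the data access take turns."""
    return 2 * n_instructions


def harvard_cycles(n_instructions: int) -> int:
    """Separate buses: fetching instruction i+1 overlaps the data access of i."""
    if n_instructions == 0:
        return 0
    # One cycle to fetch the first instruction, then one cycle per
    # instruction (its data access plus the overlapped fetch of the next one).
    return n_instructions + 1
```

Under these assumptions, 100 instructions cost 200 cycles on the shared bus and 101 cycles with separate buses.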

Modern CPU chip design • Incorporate features from both architectures • ‘On chip’ cache memory – divided into instruction cache and data cache. Harvard architecture used when CPU accesses cache memory. • On a cache miss, ‘off chip’ main memory is accessed using von Neumann architecture. Main memory is not separated into data and instruction sections.

Super Harvard Architecture • Cache used to store instructions, leaving both instruction bus and data bus free to fetch operands • Harvard Architecture + cache = Extended Harvard Architecture or Super Harvard Architecture
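Why the cache matters for DSP loops can be sketched with a FIR filter tap, which needs two operands per multiply-accumulate (one coefficient, one sample). The model below is a deliberately simplified assumption: each bus moves one word per cycle, and once the loop's instructions sit in the cache, the instruction bus is free to carry a second operand. The function names are hypothetical.

```python
def cycles_per_tap(instructions_cached: bool) -> int:
    """Cycles needed to fetch the two operands of one multiply-accumulate."""
    operands = 2  # one coefficient + one sample
    # Without a cache the instruction bus is busy fetching the instruction,
    # so only the data bus can bring in operands.
    buses_free_for_operands = 2 if instructions_cached else 1
    return -(-operands // buses_free_for_operands)  # ceiling division


def fir_cycles(taps: int, instructions_cached: bool) -> int:
    """Operand-fetch cycles for one output sample of a FIR filter."""
    return taps * cycles_per_tap(instructions_cached)
```

In this toy model a 64-tap filter needs 128 operand-fetch cycles without the cache and 64 with it.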

TigerSHARC Processor • Processor Architecture • Instruction Parallelism and SIMD Operation • Integer ALU • Computational blocks • X and Y Register File • X and Y ALU • Multiplier • Shifter • CLU • Program Sequencer • I J and K buses • DMA Controller • Applications

TigerSHARC Processor Architecture • Three 128-bit data buses • 2 IALUs (integer ALUs) • 2 Computational Blocks • ALU (floating-point and integer) • Shifter • Multiplier • CLU

TigerSHARC Instruction Parallelism and SIMD Operation • The core can execute one to four 32-bit instructions simultaneously, encoded in a single instruction line (VLIW). • Whether instructions can execute in parallel depends on • The instruction-line resources each instruction requires • The source and destination registers used • Supports SIMD operations through the use of both Computational Blocks in parallel. • Each Computational Block can execute four 16-bit or eight 8-bit SIMD computations in parallel.
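The kind of check behind "can these execute in parallel?" can be sketched as follows. This is a toy model, not the actual TigerSHARC issue rules: the unit names (`X_ALU`, `X_MUL`) are hypothetical, and the only constraints modeled are the one-to-four slot limit, one instruction per execution unit, and no two instructions writing the same register. The real restrictions are defined in the processor's programming reference.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    unit: str        # hypothetical resource name, e.g. "X_ALU"
    dest: frozenset  # registers this instruction writes


def can_issue_together(line) -> bool:
    """Return True if the instructions fit in one instruction line (toy rules)."""
    if not 1 <= len(line) <= 4:
        return False  # an instruction line holds one to four instructions
    units = [instr.unit for instr in line]
    if len(set(units)) != len(units):
        return False  # two instructions need the same resource
    written = set()
    for instr in line:
        if written & instr.dest:
            return False  # two instructions write the same register
        written |= instr.dest
    return True
```

For example, an ALU instruction writing `XR0` and a multiplier instruction writing `XR1` pass the check, while two ALU instructions in the same line do not.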

TigerSHARC Integer ALU • 31 32-bit general registers + 1 status register + 8 dedicated registers for circular buffers • Performs integer ALU operations and data addressing • ALU instructions: ADD, SUB, ARS, LRS (right shifts only), ROT (left and right), AND NOT, NOT, OR, XOR, ABS, MIN, MAX, CMP • Status flags: zero (Z), negative (N), overflow (V), carry (C) • Instruction conditions: EQ, LT, LE, NEQ, NLT, NLE • Instruction options: unsigned (U), circular buffer (CB), bit reverse (BR), computed jump (CJMP) • Address related operations: data address generation, circular buffers, bit reverse, UREG moves, DAB control.
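Two of the addressing modes listed above, circular buffers and bit reversal, follow well-known arithmetic that can be sketched in a few lines. The function names and parameters are illustrative; how the IALU registers hold the base and length is not modeled here.

```python
def circular_next(addr: int, modifier: int, base: int, length: int) -> int:
    """Advance an address by `modifier`, wrapping inside [base, base + length)."""
    return base + (addr - base + modifier) % length


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (used for FFT data ordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result
```

Stepping past the end of an 8-word buffer at `0x100` wraps back to `0x100`, and the 3-bit reversal of the indices 0..7 gives the order 0, 4, 2, 6, 1, 5, 3, 7.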

TigerSHARC Computational Blocks: X and Y Register File • Register File Syntax • Each block has 32 data registers of 32 bits each • Each register can store 4x8-bit, 2x16-bit or 1x32-bit words. • Registers can be combined into dual or quad groups. These groups can store 8-, 16-, 32-, 40- or 64-bit words.
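How several narrow words share one 32-bit register, and how two registers form a 64-bit dual group, can be sketched with shifts and masks. The lane order (lane 0 in the low bits) and the helper names are assumptions made for this illustration.

```python
def pack_lanes(values, lane_bits: int) -> int:
    """Pack 8-, 16- or 32-bit values into one 32-bit word, lane 0 lowest."""
    lanes = 32 // lane_bits
    if len(values) != lanes:
        raise ValueError(f"expected {lanes} values of {lane_bits} bits")
    mask = (1 << lane_bits) - 1
    word = 0
    for i, value in enumerate(values):
        word |= (value & mask) << (i * lane_bits)
    return word


def unpack_lanes(word: int, lane_bits: int):
    """Split a 32-bit word back into its unsigned lanes."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(32 // lane_bits)]


def join_pair(high: int, low: int) -> int:
    """Combine two 32-bit registers into one 64-bit dual-register value."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
```

The same 32 bits can be read as four bytes or as two 16-bit words, depending only on the lane width chosen by the instruction.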

TigerSHARC Computational Blocks: X and Y Register File • Register File Syntax

Volatile registers in each block • 24 Volatile Data registers in each block • XR0 – XR23 • YR0 – YR23 • 2 ALU summation registers in each block • XPR0, XPR1, YPR0, YPR1 • 5 MAC accumulate registers in each block • XMR0 – XMR3, YMR0 – YMR3 • XMR4, YMR4 – Overflow registers

TigerSHARC X and Y ALU • 2x64 bit input paths • 2x64 bit output paths • 8, 16, 32, or 64 bit addition/subtraction - Fixed-point • 32 or 64 bit logical operations - fixed-point • 32 or 40 bit floating-point operations

Sample ALU Instruction • Example of 16 bit addition • XYSR1:0 = R31:30 + R25:24 • Performs addition in X and Y Compute Blocks
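What the sample instruction computes can be sketched as four independent 16-bit additions on a 64-bit register pair, carried out once in the X block and once in the Y block. Wraparound on overflow is an assumption of this sketch (the slide does not say whether the result saturates), and the function names are hypothetical.

```python
LANE_MASK = 0xFFFF


def add_shorts(a: int, b: int) -> int:
    """Add four independent 16-bit lanes of two 64-bit values (wraparound)."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        # Masking drops the carry so it never leaks into the next lane.
        result |= (lane_sum & LANE_MASK) << shift
    return result


def xy_add_shorts(x_operands, y_operands):
    """Run the same lane-wise addition in both compute blocks."""
    return add_shorts(*x_operands), add_shorts(*y_operands)
```

Each block works on its own copy of the named registers, so one instruction yields eight 16-bit sums in total.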

TigerSHARC Multiplier • Operates on fixed-point, floating-point and complex numbers. • Fixed-point numbers • 32x32 bit with 32 or 64 bit results • Four 16x16 bit multiplications with 4x16 or 4x32 bit results • Floating-point numbers • 32x32 bit with 32 bit result • 40x40 bit with 40 bit result • Complex numbers • 32x32 bit with 32 bit result • Fixed-point only • Results stored in MR register

TigerSHARC Multiplier
XR0 = R1*R2;;
XR1:0 = R3*R5;;
XMR1:0 = R3*R5;; //uses XMR4 overflow
XR2 = MR3:2, XMR3:2 = R3*R5;;
XR3:2 = MR1:0, XMR1:0 = R3*R5;;
XFR0 = R1*R2;;
XFR1:0 = R3:2*R5:4;; //40 bit multiply //32 bit mantissa
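The fixed-point complex multiply listed for the multiplier can be sketched in ordinary arithmetic. The packing used here is an assumption for illustration only: each 32-bit operand is taken to hold a signed 16-bit real part in its low half and a signed 16-bit imaginary part in its high half. The result is returned as two full-precision integers rather than in any particular register format.

```python
def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def unpack_complex(word: int):
    """Split a 32-bit word into (real, imaginary); real in the low half (assumed)."""
    return to_signed16(word), to_signed16(word >> 16)


def complex_multiply(a_word: int, b_word: int):
    """Return (real, imaginary) of the product of two packed complex numbers."""
    a_re, a_im = unpack_complex(a_word)
    b_re, b_im = unpack_complex(b_word)
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re
```

With this packing, `(2 << 16) | 1` represents 1 + 2j and `(4 << 16) | 3` represents 3 + 4j; their product is -5 + 10j.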

TigerSHARC Shifter • Operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands • Shifts and rotates bits • Bit manipulation operations such as bit set, clear, toggle and test • Bit FIFO operations to support bit streams

TigerSHARC CLU (Communications Logic Unit) • CLU instructions are designed to support algorithms used in communications applications • Algorithms supported are • Viterbi Decoding (minimal distance decoding algorithm) • Turbo-code Decoding (variant of Viterbi decoding) • De-spreading for Code Division Multiple Access (CDMA) systems (recovering a signal that was spread over a wide bandwidth by a pseudo-noise code)
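De-spreading, the last algorithm in the list, amounts to correlating the received chips with the pseudo-noise (PN) code. The sketch below is a generic textbook version using +1/-1 values, not the CLU instruction itself; the 8-chip code in the usage note and the function names are made up for the example.

```python
def spread(bits, pn_code):
    """Spread each +1/-1 data bit over the whole PN code (one chip per code element)."""
    return [bit * chip for bit in bits for chip in pn_code]


def despread(chips, pn_code):
    """Recover data bits by correlating each code-length block with the PN code."""
    n = len(pn_code)
    if len(chips) % n != 0:
        raise ValueError("chip count must be a multiple of the code length")
    bits = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        correlation = sum(c * p for c, p in zip(block, pn_code))
        bits.append(1 if correlation >= 0 else -1)
    return bits
```

With the code `[1, -1, 1, 1, -1, -1, 1, -1]`, spreading the bits `[1, -1, -1, 1]` and de-spreading the result returns the original bits, even if a single chip is flipped by noise, because the correlation sum still has the right sign.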

TigerSHARC Program Sequencer • Supplies instruction addresses to memory • The instruction alignment buffer (IAB) caches up to five fetched instruction lines waiting to execute • Extracts an instruction line from the IAB and distributes it to the appropriate core component for execution • Determines flow control for instructions like JMP, CALL • Reduces branch delays using branch prediction and a branch target buffer (BTB)

TigerSHARC architecture at a glance

TigerSHARC Buses • DRAM divided into 6 blocks of 4 Mbits • The 6 blocks connect to four 128-bit wide internal buses through a crossbar connection • Internal bus architecture provides a total memory bandwidth of 32 Gbytes/sec • Each cycle, the core and I/O can access • twelve 32-bit data words • four 32-bit instructions
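The bus figures on this slide can be cross-checked with simple arithmetic. The sketch assumes that the quoted bandwidth counts all four buses transferring on every cycle and that "Gbytes" means 10^9 bytes; the clock rate it derives is implied by those assumptions, not stated on the slide.

```python
BUS_COUNT = 4
BUS_WIDTH_BITS = 128
DATA_WORDS_PER_CYCLE = 12
INSTRUCTIONS_PER_CYCLE = 4
WORD_BITS = 32


def bus_bytes_per_cycle() -> int:
    """Bytes moved per cycle when all four buses are busy."""
    return BUS_COUNT * BUS_WIDTH_BITS // 8


def access_bytes_per_cycle() -> int:
    """Bytes per cycle implied by twelve data words plus four instructions."""
    return (DATA_WORDS_PER_CYCLE + INSTRUCTIONS_PER_CYCLE) * WORD_BITS // 8


def implied_clock_hz(bandwidth_bytes_per_s: float = 32e9) -> float:
    """Clock rate at which the buses would deliver the quoted bandwidth."""
    return bandwidth_bytes_per_s / bus_bytes_per_cycle()
```

Both ways of counting give 64 bytes per cycle, and 32 Gbytes/sec divided by 64 bytes corresponds to a 500 MHz clock under these assumptions.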

TigerSHARC DMA Controller • On-chip, with 14 DMA channels • Provides zero-overhead data transfers • Operates independently of, and invisibly to, the DSP’s core

TigerSHARC Applications

Summary • What is Harvard Architecture? • What is Super Harvard Architecture? • TigerSHARC processor architecture • How is TigerSHARC ‘faster’ for targeted DSP applications?

Questions? Thank You.
