1 / 18

VLIW Digital Signal Processor

VLIW Digital Signal Processor. Michael Chang . Alison Chen . Candace Hobson . Bill Hodges. Introduction. Functionality ISA Implementation Functional blocks Circuit analysis Testing Off Chip Memory Status. Things to look for. Design Tradeoffs Register file size Multiple word sizes

brigid
Download Presentation

VLIW Digital Signal Processor

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. VLIW Digital Signal Processor Michael Chang . Alison Chen . Candace Hobson . Bill Hodges

  2. Introduction • Functionality • ISA • Implementation • Functional blocks • Circuit analysis • Testing • Off Chip Memory • Status

  3. Things to look for • Design Tradeoffs • Register file size • Multiple word sizes • Instruction set and implementation • Data forwarding • Software-controlled on chip cache • Shared Address/Data bus for off-chip data memory

  4. Instruction Set Architecture • 24-bit instruction words pack 3 sub-instructions: • Ex: SUB R5 R3, LDM R3 R6, BNEZ R1 • Register file - 8 registers • 3 bit encoding * 5 Reg. IDs = 15 bits per IW • Simple but useful Instruction Set • Multiply, Add/Subtract, Branch, Jump, Load Memory, Load intermediate, Load CCM • 2 Branch delay slots

  5. Microarchitecture • In order, 4 stage pipeline • IF, ID, EX, WB • 3 cycle pipeline stage • Data forwarding • Eliminate RAW hazards (ELEC 320, 425) • 5 forwarding paths • Control Logic • PLA controls pipeline • Initialize pipeline, reset Program Counter • Cycle through three cycles of pipeline stage

  6. Implementation Double Wide Silicon Floorplan

  7. Array Multiplier Ripple-Carry Adder Longest Paths: Add/Subtract: 10.74 ns through MSB Multiply: 15.87 ns through 10th product term ALU Design

  8. Compiler Controlled Memory (CCM) • Small on chip software controlled cache • Similar to Commercial DSPs • Predictable access time in real time • Benefits over off chip memory: • Double bandwidth • Software configurability • Reduced register “spill”/ “fill” pressure • Easily extendable

  9. Implementation of CCM • 4 12-bit lines of memory on chip (8 words) • Two registers, R6 and R7, for loading and storing • Two instructions, LDC and STC • 9-bit instruction • Three bit opcode • Five bit word line • Single bit determines single/double access • Example instruction: LDC 1 00001 (Reads CCM Line 1 into R6 and R7)

  10. ORCA Test Vector Generation Process • Goal: Greater accuracy and shorter time to verify chip functionality Assembly Code assembler Vector translator Binary Code IRSIM vectors

  11. ORCA Vector Suite • Goal: Create functional vectors to isolate specific chip cells to aid in post-silicon debug. • Register File • Compiler Controlled Memory • ALU • Branch • Data Forwarding • Pipeline

  12. ORCA Obsbus State Machine • Goal: Increase internal test signals to the IO’s by implementing a MUX. The MUX is controlled by output signals generated from a state machine. • 16:1 MUX, 6 output observability pins, 1 input observability pin • Allows observation of up to 96 internal signals using 7 pins. • The state machine changes state on each toggle of the input pin. • < IRSIM Obsbus PLA OUTPUT HERE>

  13. Obsbus Signals • Goal: Track an instruction execution through each of the pipeline stages. Fetch Decode Execute Write Back RegF output Forwarding signals OpCodes ALU input/output CCM input/output OpCodes RegF inputs Program Counter Branch Address OpCodes

  14. Off Chip Memory • Instruction memory • Regular static RAM (used previously in 422) • 8 bit addressing, 8 bit data reads • 28 = 256 words possible = 85 VLIW instructions • 70 ns read time • One read every cycle • Output address on clock A, latch data on clock B • One read/cycle * 8bits * 3 cycles/pipeline state = 24 bit VLIW

  15. Off Chip Memory, continued • Data memory (DS1609) • Shared Address/Data bus • PLA carefully designed to control memory • Uses worst case propagation delays • Timed signals using two out of phase clocks • Default PLA output latching on clock B • External latching on clock A to properly time signals • 50 ns read time

  16. Current Status • Functionality of major blocks tested • Instruction Fetch in final stages • ALU instructions implemented and working, including data forwarding • Memory instructions just need to be routed • Crystal and HSPICE analysis fifty percent complete • Global power, clock, and pin routing allocated in floorplan

  17. Conclusion • Solid fundamental ISA gives a nice “baby DSP” • Modular implementation of fundamental blocks • Design Decisions are well justified • Register file size • Instruction word length • Implementation balances timing and space • Access to off chip memory

  18. Questions?

More Related