TI C6701 VLIW MIMD

gloria

  1. TI C6701 VLIW MIMD

  2. Presentation Outline
     • Introduction / Overview
     • Differentiating Features
     • Assembly Syntax
     • Instruction Flow
     • Pipelining and Optimization
     • Conclusion

  3. Introduction
     • TI’s C6000 family
     • VLIW architectures
     • Flexibility from software

  4. Characteristics Chart
       Architecture                             VLIW
       FPU                                      Yes
       MFLOPs (Peak)                            1000
       16x16 MACs (MMAC/s)                      334
       8x8 MACs (MMAC/s)                        334
       MIPS (Peak)                              1336
       MOPS (Peak)                              336
       Memory Bus Bandwidth (MB/s)              332
       1K FP cfft (µsec)                        108
       1K 16 bit cfft (µsec)                    108
       1K FP dot product (µsec)                 3.07
       1K 16 bit dot product (µsec)             3.07
       512² x FP Conv3x3 (msec)                 7.11
       512² x 8 bit Conv3x3 (msec)              7.11
       512² x 8 bit Erosion/Dilation (msec)     3.62
     Figure 1: TI Data Sheet

  5. Basic Overview
     • Eight 32-bit instructions are fetched per clock cycle; together they are called a fetch packet
     • Two multipliers and six ALUs for execution
     • Two general-purpose register files (A and B)
     • Eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)
     • Two load-from-memory data paths per register file (LD1a, LD1b, LD2a, LD2b)
     • Two data address paths (DA1 and DA2)
     • Two register file data cross paths (1X and 2X)
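
The peak figures in the characteristics chart can be related to these unit counts with simple arithmetic. The sketch below assumes that peak MIPS equals eight instructions per cycle times the clock rate, and that each of the two multipliers completes one MAC per cycle; the clock rate it derives is an inference, not a value stated on the slides.

```python
# Assumption: peak MIPS = instructions per cycle x clock rate (MHz).
# The derived clock rate is inferred from the chart, not stated on the slides.
PEAK_MIPS = 1336            # from the characteristics chart
INSTRUCTIONS_PER_FETCH = 8  # eight 32-bit instructions per fetch packet
MULTIPLIERS = 2             # .M1 and .M2

fetch_packet_bits = INSTRUCTIONS_PER_FETCH * 32
clock_mhz = PEAK_MIPS / INSTRUCTIONS_PER_FETCH
peak_mmacs = MULTIPLIERS * clock_mhz

print(fetch_packet_bits)  # 256
print(clock_mhz)          # 167.0
print(peak_mmacs)         # 334.0
```

The derived 334 MMAC/s agrees with the 16x16 MAC figure in the chart, which suggests the assumption is reasonable.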

  6. Architecture Overview

  7. Differentiating Features
     The features that differentiate the TI from other VLIW architectures are:
     • Execute packets of variable length (one to eight 32-bit instructions)
     • Predication in all instructions
     • Pipelining of the branch functions

  8. Assembly Syntax
     • Label
     • Parallel Bars
     • Conditions
     • Instruction
     • Functional Unit
     • Operands
     • Comments
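
To make the field order concrete, here is a minimal Python sketch that splits one assembly line into the seven fields listed above. The regular expression, the function name `parse_line`, and the two sample lines are illustrative assumptions, not part of the presentation; real assembler syntax has more cases than this handles.

```python
import re

# Hypothetical pattern: label: || [condition] instruction .unit operands ; comment
LINE = re.compile(
    r"^\s*(?:(?P<label>\w+):)?\s*"        # optional label ending in a colon
    r"(?P<parallel>\|\|)?\s*"             # optional parallel bars
    r"(?:\[(?P<condition>!?\w+)\])?\s*"   # optional condition, e.g. [!B0]
    r"(?P<instruction>[A-Za-z]\w*)\s*"    # mnemonic
    r"(?P<unit>\.[LSMD][12]X?)?\s*"       # optional functional unit, e.g. .L1
    r"(?P<operands>[^;]*?)\s*"            # operands up to the comment
    r"(?:;\s*(?P<comment>.*))?$"          # optional comment after a semicolon
)

def parse_line(line):
    """Return a dict of the syntax fields, or None if the line does not match."""
    match = LINE.match(line)
    return match.groupdict() if match else None

first = parse_line("loop:   [!B0] ADD .L1 A1, A2, A3 ; accumulate")
second = parse_line("|| MPY .M2 B1, B2, B3")
print(first["label"], first["condition"], first["instruction"], first["unit"])
print(second["parallel"], second["instruction"], second["operands"])
```

Fields that are absent from a line come back as `None`, which mirrors the fact that only the instruction itself is mandatory in this sketch.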

  9. Assembly Example

  10. Instruction Flow
     • Eight functional units, in two separate groups of four
     • Each group has a separate data path and its own share of the general-purpose registers
     • The paired units are named .L1 and .L2, .M1 and .M2, .S1 and .S2, and .D1 and .D2
     • The .L units are responsible for
         • Logical operations
         • Data packing and unpacking
         • Some arithmetic

  11. Instruction Flow
     • 32 general-purpose registers
     • 64-bit loads using the LDDW instruction
     • LD1a carries the least-significant 32 bits and LD1b carries the most-significant 32 bits
     • The .D units are cross-connected, so data can come from either register file regardless of which side supplied the data address
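
The split of a 64-bit double word across the two load paths can be sketched in Python as follows. The function name `split_double_word` and the sample value are hypothetical; the sketch only models the bit arithmetic, not the hardware.

```python
def split_double_word(value64):
    """Split a 64-bit value into (low32, high32) halves."""
    low = value64 & 0xFFFFFFFF            # half carried on LD1a, per the slide
    high = (value64 >> 32) & 0xFFFFFFFF   # half carried on LD1b, per the slide
    return low, high

low, high = split_double_word(0x1122334455667788)
print(hex(low), hex(high))  # 0x55667788 0x11223344
```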

  12. Instruction Flow
     • Fetch packets are aligned on 256-bit boundaries
     • Important! An execute packet can’t cross the fetch packet boundary
     • Execute packets are formed from the p-bit (bit 0) of each instruction: when it is set, the next instruction executes in parallel with the current one
     • A maximum of eight instructions can execute in parallel
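
The grouping rule can be sketched in Python. This is a simplified model that assumes the p-bit is bit 0 of each 32-bit instruction word and that a set p-bit chains the next instruction into the same execute packet; the function name `split_execute_packets` and the placeholder instruction words are hypothetical.

```python
def split_execute_packets(fetch_packet):
    """Group eight instruction words into execute packets using bit 0 (p-bit)."""
    if len(fetch_packet) != 8:
        raise ValueError("a fetch packet holds exactly eight instructions")
    if fetch_packet[-1] & 1:
        # A set p-bit on the last word would chain into the next fetch packet.
        raise ValueError("execute packet would cross the fetch packet boundary")
    packets, current = [], []
    for word in fetch_packet:
        current.append(word)
        if word & 1 == 0:   # p-bit clear: this instruction ends the packet
            packets.append(current)
            current = []
    return packets

# Placeholder words: only bit 0 is meaningful in this sketch.
p_bits = [1, 1, 0, 0, 1, 0, 0, 0]
words = [(index << 1) | p for index, p in enumerate(p_bits)]
sizes = [len(packet) for packet in split_execute_packets(words)]
print(sizes)  # [3, 1, 2, 1, 1]
```

With all p-bits clear the eight instructions execute one per cycle; with the first seven set they form a single eight-wide execute packet.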

  13. Architecture Overview

  14. Pipelining & Optimization
     • The C6701 hardware cannot look ahead and schedule instructions
     • The number of instructions in each execute packet is the key to optimizing the code
     • The number of extra cycles after an instruction issues before its result can be used is called its number of delay slots
     • Multi-cycle instructions have more delay slots, which strongly affects how the surrounding code must be scheduled
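
A small Python sketch of delay-slot accounting follows. The delay-slot counts in the table (0 for a single-cycle ADD, 1 for MPY, 4 for LDW, 5 for a branch) are taken from TI's C6000 instruction set documentation rather than from the slides, and the function name `result_ready_cycle` is hypothetical.

```python
# Delay-slot counts as assumed from TI's C6000 instruction set documentation.
DELAY_SLOTS = {"ADD": 0, "MPY": 1, "LDW": 4, "B": 5}

def result_ready_cycle(mnemonic, issue_cycle):
    """Return the first cycle in which a later instruction can use the result."""
    return issue_cycle + DELAY_SLOTS[mnemonic] + 1

for mnemonic in ("ADD", "MPY", "LDW", "B"):
    print(mnemonic, result_ready_cycle(mnemonic, 0))
# ADD 1
# MPY 2
# LDW 5
# B 6
```

For the branch, the "result" is the first execute packet at the branch target, which is why five delay slots leave room for useful work before the jump takes effect.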

  15. Pipelining & Optimization
     • An execute packet may contain NOPs
     • Placing NOPs in parallel with a multi-cycle instruction lets the next execute packet use that instruction's result
     • If a cross path is used during a multi-cycle instruction, that cross path cannot be used again until the instruction has finished
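
The role of NOPs can be illustrated with a simplified Python scheduler. It assumes one instruction per execute packet, emits a separate multi-cycle NOP instead of placing NOPs in parallel, and uses delay-slot counts from TI's C6000 documentation; the function name `pad_with_nops` and the three-instruction program are hypothetical.

```python
# Delay-slot counts as assumed from TI's C6000 instruction set documentation.
DELAY_SLOTS = {"ADD": 0, "MPY": 1, "LDW": 4}

def pad_with_nops(program):
    """Insert NOPs so every source register is ready before it is read.

    program is a list of (mnemonic, destination, sources) tuples.
    """
    ready = {}   # register name -> first cycle its value can be read
    scheduled = []
    cycle = 0
    for mnemonic, dest, sources in program:
        needed = max((ready.get(src, 0) for src in sources), default=0)
        if needed > cycle:
            scheduled.append(f"NOP {needed - cycle}")
            cycle = needed
        scheduled.append(f"{mnemonic} {dest}")
        ready[dest] = cycle + DELAY_SLOTS[mnemonic] + 1
        cycle += 1
    return scheduled

program = [
    ("LDW", "A1", ["A4"]),         # load a value into A1
    ("MPY", "A2", ["A1", "A1"]),   # needs the loaded value
    ("ADD", "A3", ["A2", "A3"]),   # needs the product
]
schedule = pad_with_nops(program)
print(schedule)  # ['LDW A1', 'NOP 4', 'MPY A2', 'NOP 1', 'ADD A3']
```

An optimized schedule would fill those NOP cycles with independent instructions, which is the hand-scheduling effort the slides describe.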

  16. Execution Pipeline

  17. AD vs. TI vs. Motorola

  18. Conclusion
     • The C6701 allows scheduling of instructions in the assembly code
     • Unfortunately, a good understanding of the hardware is still necessary to be able to schedule instructions in an optimized way
     • Thank You