How to realize high-performance compute with Multicore DSP
C667x Target Applications (Non-Telecom) • HPC, Imaging and Medical • Mission Critical • Test and Automation • Emerging Others • Video Infrastructure • Infrastructure Audio • Innovations • Emerging Broadband
RF and Communication Applications (Govt & Public Safety, Avionics, Military & Defense) • Applications • ISR (Intelligence/Surveillance/Reconnaissance) • SIGINT/COMINT/Signal Generators • Military Communications • SDR (JTRS): Manpack/LMR/Fixed • Comm. Infra: VoIP/Video Gateways • Satellite/Avionics Communications • Ground Receivers/Repeaters • Weather Radar • FAA: Civil Aviation/Govt Comm. • Conventional PS: TETRA/APCO/E911 • Wireless Infrastructure • Comm. Infra: VoIP/Video Gateways • Emerging Broadband (OFDM/LTE/WiMAX) • Utilities/Transport/Smart Grid • Key Customer Careabouts • Long-Term Partnership • Financial Stability • Strong Roadmap and R&D • Floating-Point Performance • Size, Weight, and Power (SWaP) • I/O Bandwidth • Longevity of Supply (10+ yrs)
RF and Comm. Product Requirements (End Product Need / DSP Requirement) • Needs raw performance in terms of MIPS/GHz/MMACS • Floating-point-capable ISA to achieve “precision” and high GFLOPS • Large on-chip RAM • Reduce accesses to slow external memory • High-speed external memory interface • Large addressable memory • Efficient DMA architecture • Wireless-specific accelerators and TCP/IP offload • Support multiple waveforms • Common platform for TDMA/CDMA/OFDMA • Multi-channel VoIP/video capability • Support FEC and modulation • TCP/IP networking support
Imaging Product Requirements (End Product Need / DSP Requirement) • High-BW interface • RF front end and telecom ports • Connect multiple DSPs on a board, e.g. in an ATCA card • High-BW backplane and network connectivity • Needs multiple high-speed interfaces • PCIe, Serial RapidIO • OBSAI/CPRI interface • Gigabit Ethernet, etc. • Memory Error Correction & Checking (ECC) • Efficient low-power DSPs • Support extended temperature ranges from -40°C to 105°C, and other temperature ranges • Reliability in mission-critical designs • Low-power design • Dev and Debug Tools • Multicore S/W frameworks • Signal/image processing functions • VoIP library • Audio/video codecs • Ease of use
Introducing “KeyStone Architecture” (C66x) • NEW Multicore DSP: C66x, building on the C64x+ and C67x cores • The best combination of performance (GHz) and power consumption in the industry: 16 GFLOPS & 32 GMACS per core @ 1GHz • Next-Generation C66x DSP Core • Fixed- and floating-point core @ 1.25 GHz • 4x the C64x+ MAC throughput (32 MACs/cycle) • 4x the C67x floating-point MAC throughput (8 MACs/cycle) • 16 FLOP/cycle compared to 6 FLOP/cycle • The 8-core C6678, based on the C66x core, delivers 320 GMACS/160 GFLOPS @ 1.25GHz/core (effectively a 10GHz DSP) • 100% code compatible with all C64x (fixed) & C67x (floating) devices • Similar power profile to the C64x core • Supported by Code Composer Studio IDE • C64x+ Core (Fixed Point): lowest power, highest performance DSP core • C67x Core (Floating Point): industry’s lowest power FP DSP core, high precision and wide dynamic range
Unmatched Performance [Chart: BDTImark2000™ scores for floating-point processors and for fixed-point processors]
TI Multicore KeyStone Architecture [Block diagram: C66x and ARM processing cores, Multicore Navigator, Multicore Shared Memory Controller and shared memory, system management (debug, clocking, power), TeraNet 2, application accelerators, high-speed I/O, HyperLink 50] Network on Chip: the first network-on-chip infrastructure to unleash full multicore entitlement • Highest Integration • Cost & Power • Common Architecture • Portable Software • Scalable • Tailored Solutions • Navigator • Innovative Multi-core • Floating Point • Development Time • Tools & Debugging • R&D Efficiency • Quality Software • Solutions & Libraries
Product Highlights: C6670 and C6678
C6670 (Performance Optimized Core) • Multicore Navigator • Next Generation C66x Core • 4 C66x Cores @ 1GHz - 1.2GHz • Memory Architecture • 4MB Local L2 (1MB per Core) • 2MB Multicore Shared Memory • Communication Accelerators • TCP3e (Turbo Encode) – Up to 550Mbps • TCP3d (Turbo Decode) – Up to 600Mbps • FFTC – 2048-point FFT every 4.6µs • VCP2 for voice channel decoding
C6678 (Power Optimized Core) • Multicore Navigator • Next Generation C66x Core • Up to 8 C66x Cores @ 1GHz - 1.25GHz • Available Options: 1, 2, 4, and 8 Core Devices • Memory Architecture • 4MB Local L2 (512KB per Core) • 4MB Multicore Shared Memory • Power Optimized Core • <10W at 1GHz nominal temp
[Block diagrams: C6670 with 4 x CorePac (C66x DSP, L1, L2), Multicore Navigator, communications coprocessors (4x VCP2, 3x TCP3d, 2x RAC, 1x TAC, 3x FFTC, BCP), network coprocessors (packet accelerator, crypto, GbE switch), TeraNet, MSMC with 2MB shared memory, DDR3-64b, HyperLink, SRIO x4, PCIe x2, AIF2 x6, SGMII x2, I2C, SPI, UART, EDMA; C6678 with 8 x CorePac, Multicore Navigator, network coprocessors, TeraNet, MSMC with 4MB shared memory, DDR3-64b, HyperLink, SRIO x4, PCIe x2, EMIF16, TSIP x2, SGMII x2, I2C, SPI, UART, EDMA; both with power management, SysMon, and debug]
TI Confidential – NDA Restrictions
Innovation & Integration via C6678 DSP Highlights • Multicore Navigator: data transfer engine architected to move data between system elements without CPU overhead, for maximum system efficiency • C66x Core: next-generation fixed-/floating-point DSP core with clock speeds from 1GHz to 1.25GHz and options of up to 8 cores • Memory Architecture • 0.5 MB of local memory per core • 4 MB of shared memory • Enhanced memory architecture through an enhanced Multicore Shared Memory Controller • Bottleneck-free, fast on- and off-chip memory access, including a DDR3-1333 (64-bit) interface • L1/L2/L3 ECC • TeraNet: switch fabric with 2 Tbps of bandwidth, allowing maximum data transfer between system components to realize full system entitlement • Network Coprocessor and Accelerators: a cost-effective implementation to offload the TCP/IP and secure networking functions from the DSP • HyperLink: ultra-high-speed (up to 50 Gbaud), low-latency serial interface that connects to other DSPs and FPGAs in the system • Improved Debug: software development and debug support through CCS • Peripherals and I/O Interfaces: high-bandwidth peripherals that operate independently (NOT shared), allowing simultaneous data transfer to prevent bottlenecks, featuring: • RapidIO v2.1: 4 lanes @ 5Gbps with 1x, 2x and 4x support • PCIe x2: 2 lanes, running independently of RapidIO
[Block diagram: C6678 with 8 x CorePac (C66x DSP, L1, L2), Multicore Navigator, network coprocessors (packet accelerator, crypto, GbE switch, SGMII x2), TeraNet, MSMC with 4MB shared memory, DDR3-64b, HyperLink, SRIO x4, PCIe x2, EMIF16, TSIP x2, I2C, SPI, UART, EDMA, power management, SysMon, debug]
TMDXEVM6678L EVM: single-wide AMC form factor, C6678 • Code Composer Studio™ IDE: Design, Code and Build, Debug, Analyze, Tune • H/W Development Tools • CCSv5 allows designers of all experience levels to move quickly through application development (www.ti.com/ccstudio) • Time-limited FREE evaluation versions available for download; includes C667x Simulator • EVM Kit includes • BIOS 6.x • BIOS-MCSDK / LINUX-MCSDK 2.0 (NDK, PDK, LIB, etc.) • Sample programs and out-of-box demo (OOB), e.g. • I/O Benchmark, Image Processing Pipeline, and High Performance DSP Utility Application (HUA) • User Guide, Starter Guide, Tech Ref Guide, App Notes, etc. • TMDXEVM6678L: EVM with XDS100 emulation, $399 • TMDXEVM6678LE: EVM with XDS560V2 emulation, $599 • TMDXEVM6678LXE: EVM with XDS560V2 emulation, encryption enabled, $599 • TMDSEMU560v2STM-UE: XDS560v2 System Trace Emulator with 128Mb system trace buffer and Ethernet/USB support • Optional PCIe adapter card to connect the C6678 EVM to a standard PCIe slot in a desktop PC
TI’s Multicore Hardware Ecosystem [Diagram: standardized boards (PCI Express with Gen 2, Advanced Mezzanine Card (AMC), ATCA), chassis/system, custom, and other form factors]
TI’s Multicore Software Ecosystem [Stack diagram: Customer Application; Multicore Entitlement; Layer 2+ IP Network Stack; Layer 1 LTE and Layer 1 UMTS; TI Runtime: TI’s Device Entitlement Libraries, TI BIOS, Linux, OSE(ck), TI Layer 1 Libraries]
Multicore Tools and Software (MC-SDK) • Tools • Codegen with OpenMP support • Emulator/Debugger • Simulator • Profiler / DVT • 3rd-party tools • Software • BIOS/Linux SDK • Multicore Demonstration • 6.x DSP BIOS • Platform Abstraction • Basic Networking • Inter-core communication • Application-Specific Libraries • Audio/Video CODECs • VoIP Components • WiMAX Toolkit, LTE Toolkit • DSPLib • Others [Diagram: host computer running Code Composer Studio™ (Eclipse-based editor/IDE, compiler/linker (Codegen), profiler, debugger, remote debug, SoC analyzer, third-party plug-ins such as Polycore, ENEA Optima, and 3L) connected through XDS560 V2 / XDS560 Trace emulators to the target board running the customer application on the Multicore Software Development Kit (demo apps, multicore BIOS and Linux with boot loader, platform development kit, inter-core communication, DSPLIB, IMGLIB, NDK, audio/video/speech codecs) for full silicon entitlement]
KeyStone Multicore Software – Libraries & Codecs
Libraries • Voice and Fax • Line Echo Cancellation • Voice Activity Detection • Others • Available free from TI • Digital Signal Processing • FFT • Adaptive Filtering • Filtering and Convolution • Others • Available free from TI • Image Processing • Edge Detection • Boundary • Morphology • Others • Available free from TI • Vision Lib (object only) • 50+ royalty-free kernels: background modeling & subtraction; object feature extraction; tracking, recognition; low-level pixel processing • MATLAB • Image processing • Math operations • Vision Analytics • Security/Cryptography • AES, SHA1, 3DES
Codecs • Audio • MPEG1 Layer2 • AAC LC/HE • AC3 2.0/5.1 • Sample Rate Conversion • Voice • G.711, G.722 • G.723, G.729 • CDMA, AMR (NB/WB), EVRC-B • Others • Video • H.263 • H.264 • MPEG2 • MPEG4 • VC1/WMV9 Decode • Others • Fax • T.38 • Fax Modem
High-Performance and Multicore Processor • High Value: KeyStone Architecture; High Performance at the Right Power & Price; Low-Cost EVM; Open & Affordable Tools • Easy to Use: Training; Product Collateral; Drivers & Example Code; User Community • Quick to Market: Enabler Software; Quick-Start Hardware; Benchmarks & Functional Understanding; Frameworks & Abstraction; Generic Libraries; Application Libraries
Getting Started – More Information/Links • Product Folders: • C66X Informational Wiki Page • All C6000 Multicore DSPs • TMS320C6670 • TMS320C6678 • EVMs and Software Tools: • TMS320C6678 EVM • TMS320C6670 EVM • AMC to PCIe Adapter Card • Multicore Software Development Kit for BIOS & Linux • MCSDK Wiki • CCS v5 Wiki • C66x Linux Wiki • DSP Signal Processing Library (DSPLIB) • Image and Video Processing Library (IMGLIB) • LTE/WiMAX Toolkit – Discuss with BDM • Technical Support • TI E2E Community (Online Support) • Product Training
Online Video Training: http://focus.ti.com/docs/training/catalog/events/event.jhtml?sku=OLT110027
Mission Critical DSP Market: “What Customers Like about TI” • Undisputed #1 DSP and SoC supplier • Strong growth for 8 years in a row, even in 2009 • Higher R&D spending than the DSP revenue of most competitors • KeyStone SoC architecture secures future success • Rich Product Portfolio & Strong Roadmap • 2 families with multiple devices and growing: Nyquist (C6670), Shannon (C6678/C6674/C6672) • 40nm -> 28nm • Tools/Software & Compilers • 3rd-Party Eco-System • Multiple Design Wins Pre-Announcement • Secure Supply: no DSP product discontinuation (end of life) • History of delivering on promises (power, GHz, ...) • Field Experience: completeness of system analysis, architecture, internal switch, ... • Customer Support • Business Model: long-term relationships with key customers; actively seek and incorporate customer feedback in roadmap devices [Chart: TI SoC architecture revenue, 2002 and 2009, across Layer 1 (PHY), Layer 2 (MAC), Layer 3/4, and Layer 3+ (IP network); macro, pico, and femto; software radio]
C6678 (Shannon) “Lightning” Half-Length PCIe Card Feature Set • TI TMS320C6678 (8-core) x 4 • C66x Core Frequency: 1.25GHz • DDR3 Memory • Data Frequency: 1600MHz • Data Bus Width: 64-bit • Serial RapidIO Gen-2 Interface • PCIe Gen-2 Interface • 10/100/1000Mbps Ethernet w/ SGMII • HyperLink50 Interface • 1024 MB DDR3-1333 on board • PLX PEX8624 PCIe Gen-2 Switch • Serial RapidIO daisy-chain • Ethernet daisy-chain • Each DSP device is linked to the PCIe switch by x2 lanes • Dual DSPs linked by HyperLink50 • Power: 54 W max
What is HyperLink? “High-speed, low-latency, and low-pin-count communication interface” • Low pin count (24 pins) • Point-to-point connection • Interconnect • DSP-to-DSP • DSP-to-FPGA • SerDes for data transfer • x1 and x4 modes for Tx and Rx • 12.5 GBaud/lane • Effectively 8b9b encoding • LVCMOS sideband signals for flow control & power management: errors/events/timeouts • Simple packet-based transfer protocol for memory-mapped access • Read/write to DSP/FPGA local memory: discrete memory access of any byte-aligned width up to 64 bits; burst transfer modes • Write (maximum burst size 256 bytes) • Write Request ---> • Data Packet ---> • Read (maximum burst size 256 bytes) • Read Request ---> • Read Response <--- • Interrupt Request <--> • Up to 64 memory-mapped regions, each region up to 256MB
Universal Parallel Port (uPP) • What is it? • Parallel bus with two independent channels (separate data buses) • I/O speeds up to 75 MHz with 8- to 16-bit data width per channel • 1- or 2-channel parallel interface operating in RX, TX, or FD mode • Supports double data rate mode of operation (bandwidth does not change/increase) • Application • Each channel can interface cleanly with high-speed ADCs and/or DACs with up to 16-bit data width (per channel) • Useful as a low-cost interface with FPGAs; can run up to 120 MByte/s per channel in single-channel or bi-directional mode (240 MByte/s for both channels in unidirectional mode) • Can also be used to interface two C6655/57 devices or to connect C6655/57 with C674x or OMAP-L13x family devices • Other benefits • Internal DMA: leaves the CPU and EDMA free • Simple protocol with few control pins (configurable: 2-4 per channel) • Multiple data packing formats for 9- to 15-bit data widths • Interleave mode (single channel only) • Simple interface: I/O queued by software • Throughput Estimates: Note: max. clock of 50 MHz in (*) configuration