Wireless Communication Extensions for DSPs and General Purpose Processors

Presentation Transcript


  1. Wireless Communication Extensions for DSPs and General Purpose Processors
     Sridhar Rajagopal
     COMP 625, April 17, 2000

  2. Motivation
     • Wireless, the Next Wave after Multimedia
     • Highly Compute-Intensive Algorithms
     • Real-Time Requirements
     • Design Based on Time-to-Market
     Sridhar Rajagopal

  3. Outline
     • Processor Core with Reconfigurable Support
     • Permutation Based Interleaved Memory
     • Processor Architecture: EPIC
     • Instruction Set Extensions
     • Truncated Multipliers
     • Software Support Needed

  4. Characteristics of Wireless Algorithms
     • Massive Parallelism
     • Bit-Level Computations
     • Matrix-Based Operations
     • Memory Intensive
     • Complex-Valued Data
     • Approximate Computations

  5. What’s wrong with Current Architectures for these applications?

  6. Problems with Current Architectures
     • UltraSPARC, C6x, MMX, IA-64
     • Not Enough MIPS/FLOPS
     • Unable to Fully Exploit Parallelism
     • Bit-Level Computations
     • Memory Bottlenecks
     • No Specialized Instructions for Wireless Communications

  7. Why Reconfigurable?
     • Adapt Algorithms to Environment
     • Seamless and Continuous Data Processing during Handoffs
     [Figure: home-area wireless LAN, outdoor CDMA cellular network, high-speed office wireless LAN]

  8. [Figure: OSI protocol stack. OSI Layers 3-7: user interface, translation, synchronization, transport, network. OSI Layer 2: data link layer (converts frames to bits). OSI Layer 1: physical layer (hardware; raw bit stream). Annotation: Reconfigurable Support.]

  9. Different Protocols
     • MPEG-4, H.723: Voice, Multimedia
     • Convolutional, Turbo: Channel Coding
     [Figure: source coding, channel coding, source decoding, channel decoding, multiuser detection, channel estimation]

  10. A New Architecture
     [Figure: blocks labeled main memory, processor core (GPP/DSP), cache, queues (Q), crossbar, real-time I/O bit stream, reconfigurable logic, RF unit, and add-on PCMCIA network interface card]

  11. Why Reconfigurable?
     • Process Initial Bit-Level Computations
     • Optimize for Fast I/O Transfer
     [Figure: RF unit, reconfigurable logic, real-time I/O bit stream]

  12. Reconfigurable Support
     [Figure: Garp architecture at UC Berkeley: sequencer, configuration caches, control blocks, 64-bit datapath, two 64-bit data buses, one 64-bit address bus, Boolean values, fast I/O]

  13. Reconfigurable Support
     • Wide Path to Memory
        • Data Transfer
        • Minimize Load Times
     • Configuration Caches
        • Recently Displaced Configurations (5 cycles)
        • Can Hold 4 Full-Size Configurations
     • Independent Execution
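
A toy Python model can make the configuration-cache bullets concrete. The 4-slot capacity and the 5-cycle reload come from the slide; the LRU replacement policy, the 1000-cycle cost of a full load from memory, and the kernel names are hypothetical assumptions, so the numbers only illustrate the trend.

```python
from collections import OrderedDict

CACHE_SLOTS = 4      # from the slide: holds 4 full-size configurations
HIT_CYCLES = 5       # from the slide: reload of a recently displaced configuration
MISS_CYCLES = 1000   # hypothetical cost of a full configuration load from memory


class ConfigCache:
    """Toy configuration cache; LRU replacement is an assumption, not the Garp design."""

    def __init__(self, slots: int = CACHE_SLOTS):
        self.slots = slots
        self.entries = OrderedDict()  # configuration id -> None, least recently used first

    def load(self, config_id: str) -> int:
        """Make config_id the active configuration and return its load cost in cycles."""
        if config_id in self.entries:
            self.entries.move_to_end(config_id)   # mark as most recently used
            return HIT_CYCLES
        if len(self.entries) >= self.slots:
            self.entries.popitem(last=False)      # evict the least recently used entry
        self.entries[config_id] = None
        return MISS_CYCLES


# Hypothetical kernel names: three kernels take turns on the array.
cache = ConfigCache()
schedule = ["chan_est", "mud", "decode"] * 3
print(sum(cache.load(k) for k in schedule))  # 3 misses + 6 hits = 3030
```

With five kernels cycling through the four slots, LRU evicts each configuration just before it is needed again, so every load in that schedule misses.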

  14. Reconfigurable Support
     • Access to Same Memory System as Processor
        • Minimize Overhead
     • When Idle
        • Load Configurations
        • Transfer Data

  15. Operation
     • Load Configuration
        • If in configuration cache, minimal time
     • Copy Initial Data with Coprocessor Move Instructions
     • Start Execution
     • Issue Wait that Interlocks while Active
     • Copy Registers Back at Kernel Completion
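
The five steps can be sketched as a small host-side protocol in Python. Everything here is hypothetical: `ReconfigurableArray`, its method names, and the use of a Python callable as the "configuration" are stand-ins that only mirror the order of operations on the slide, not real timing or a real coprocessor interface.

```python
from typing import Callable, List, Optional

Kernel = Callable[[List[int]], List[int]]  # hypothetical: a configuration modeled as a function


class ReconfigurableArray:
    """Hypothetical model of the array; it mirrors the order of the steps, not timing."""

    def __init__(self) -> None:
        self.config: Optional[Kernel] = None  # currently loaded configuration
        self.regs: List[int] = []             # array-side registers
        self.active = False

    def load_configuration(self, kernel: Kernel) -> None:
        self.config = kernel                  # fast when it hits the configuration cache

    def move_in(self, values: List[int]) -> None:
        self.regs = list(values)              # stands in for coprocessor move instructions

    def start(self) -> None:
        if self.config is None:
            raise RuntimeError("no configuration loaded")
        self.active = True

    def wait(self) -> None:
        # Hardware would interlock (stall) the processor while the array is active;
        # this model simply runs the kernel to completion here.
        if self.active and self.config is not None:
            self.regs = self.config(self.regs)
            self.active = False

    def move_out(self) -> List[int]:
        if self.active:
            raise RuntimeError("array still active; call wait() first")
        return list(self.regs)


def run_kernel(array: ReconfigurableArray, kernel: Kernel, inputs: List[int]) -> List[int]:
    array.load_configuration(kernel)  # 1. load configuration
    array.move_in(inputs)             # 2. copy initial data
    array.start()                     # 3. start execution
    array.wait()                      # 4. wait, interlocking while active
    return array.move_out()           # 5. copy registers back


# Hypothetical kernel: invert every bit of each 8-bit input word.
print(run_kernel(ReconfigurableArray(), lambda r: [x ^ 0xFF for x in r], [0x0F, 0xF0]))  # [240, 15]
```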

  16. Memory Interface
     • Access to Main Memory and L1 Data Cache
     • Large, Fast Memory Store
     • Memory Prefetch Queues for Sequential Accesses
        • Read-aheads and Write-behinds
     [Figure: instruction cache, processor core (GPP/DSP), L1 data cache, main memory, queues (Q), crossbar, FPGA]
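
A minimal Python sketch of the read-ahead half of a prefetch queue shows why it pays off only for sequential accesses. The queue depth of 4 and the restart-on-mismatch policy are assumptions for illustration; write-behind buffering is not modeled.

```python
from collections import deque


class ReadAheadQueue:
    """Toy read-ahead (prefetch) queue for sequential loads; depth is hypothetical."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.pending = deque()  # addresses already requested from memory, in order

    def read(self, addr: int) -> bool:
        """Return True if addr was already prefetched, then top the queue back up."""
        hit = bool(self.pending) and self.pending[0] == addr
        if hit:
            self.pending.popleft()
        else:
            self.pending.clear()  # not the expected next address: restart the stream
        nxt = self.pending[-1] + 1 if self.pending else addr + 1
        while len(self.pending) < self.depth:
            self.pending.append(nxt)  # keep `depth` consecutive addresses in flight
            nxt += 1
        return hit


seq = ReadAheadQueue()
print(sum(seq.read(a) for a in range(100)))              # 99: only the first load misses
strided = ReadAheadQueue()
print(sum(strided.read(a) for a in range(0, 1000, 10)))  # 0: stride 10 never matches
```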

  17. Permutation Based Interleaved Memory (PBI)
     • High Memory Bandwidth Needed
     • Stride-Insensitive Memory System for Matrices
     • Multiple Banks
     • Sustained Peak Throughput (95%)
     [Figure: main memory, L1 data cache]
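
The need for a stride-insensitive scheme is easy to see with conventional low-order interleaving, where the bank is the address modulo the number of banks. In this Python sketch (8 banks is an illustrative assumption), walking down a column of a row-major matrix with 64 columns is a stride of 64, a multiple of 8, so every access lands in the same bank and the accesses serialize.

```python
M = 8  # number of banks (illustrative; any power of two shows the same effect)


def low_order_bank(addr: int) -> int:
    """Conventional low-order interleaving: bank = address mod M."""
    return addr % M


def banks_touched(stride: int, count: int = M) -> int:
    """Distinct banks used by `count` consecutive accesses with the given stride."""
    return len({low_order_bank(i * stride) for i in range(count)})


for stride in (1, 2, 3, 4, 8):
    print(stride, banks_touched(stride))
# 1 8
# 2 4
# 3 8
# 4 2
# 8 1
```

Odd strides spread over all banks, but every power-of-two stride loses bank parallelism, which is exactly the matrix-access pattern PBI targets.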

  18. PBI Scheme
     • $N$: address length (bits)
     • $M = 2^n$ banks
     • $2^{N-n}$ words in each bank
     • To access a word:
        • $n$-bit bank number
        • $(N-n)$-bit address within the bank (high-order bits)
     • Calculation of the $n$-bit bank number

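  19. Calculate Bank Number
     • Use all $N$ address bits to get an $n$-bit vector
     • $Y = AX$ (over GF(2)), where $A$ is an $n \times N$ matrix of 0s and 1s
     • $Y = A_h X_h \oplus A_l X_l$, where $X_h$ holds the high-order $N-n$ address bits, $X_l$ the low-order $n$ bits, $A_h$ is $n \times (N-n)$ and $A_l$ is $n \times n$ with rank $n$ (so that the pair $(Y, X_h)$ identifies each address uniquely)
     • Each bit of $Y$: an $N$-bit parity circuit with $\lceil \log_k N \rceil$ levels of XOR gates (fan-in $k$)
     [Figure: the N-bit address feeds n parity circuits, one per row of A (rows 0 to n-1); the n parity bits drive a decoder that produces $2^n$ bank-select signals]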
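
A minimal Python sketch of the bank-number calculation, assuming tiny sizes ($N = 6$, $n = 3$) and one particular matrix: $A_h$ and $A_l$ both set to the identity, so bank bit $r$ is address bit $r$ XOR address bit $r+n$. This choice of $A$ is an illustrative assumption, not the matrix from the talk or from Sohi's study.

```python
N, n = 6, 3          # hypothetical sizes: 6-bit word addresses, n = 3 bank bits
M = 1 << n           # M = 2^n banks

# Rows of the n x N matrix A as N-bit masks. Assumed choice: row r picks address
# bits r and r + n, i.e. A_l = identity (rank n) and A_h = identity, so Y = X_l XOR X_h.
A_ROWS = [(1 << r) | (1 << (r + n)) for r in range(n)]


def parity(x: int) -> int:
    """XOR of all bits of x: what one parity circuit computes."""
    return bin(x).count("1") & 1


def bank_number(addr: int) -> int:
    """Y = A X over GF(2): bit r of Y is the parity of (row r of A) AND address."""
    return sum(parity(addr & row) << r for r, row in enumerate(A_ROWS))


def word_in_bank(addr: int) -> int:
    """Address inside the selected bank: the high-order N - n bits, X_h."""
    return addr >> n


def banks_touched(stride: int) -> int:
    """Distinct banks used by M consecutive accesses with the given stride."""
    return len({bank_number(i * stride) for i in range(M)})


for stride in (1, 2, 4, 8, 9):
    print(stride, banks_touched(stride))
# 1 8
# 2 8
# 4 8
# 8 8
# 9 1
```

Power-of-two strides now spread across all eight banks, and because $A_l$ has full rank, every address maps to a distinct (bank, word) pair. No single matrix removes every conflict, though: with this $A$, stride 9 puts all eight accesses in bank 0.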

  20. Interleaved Memory Model
     [Figure: address source, input buffers, memory banks M(0), M(1), …, M(M-1), data sequencer, output buffers, data sink]

  21. Processor Core
     • 64-bit EPIC Architecture with Extensions (IA-64/C6x)
     • Statically Determined Parallelism; Exploit ILP
     • Execution Time Predictability
     [Figure: processor core (GPP/DSP), cache, queues (Q), crossbar, FPGA]

  22. EPIC Principle
     • Explicitly Parallel Instruction Computing
     • Evolution of VLIW Computing
     • Compiler: Key Role
     • Architecture to Assist the Compiler
        • Better cope with the dynamic factors that limited VLIW parallelism

  23. Aspects of EPIC
     • Designing the Plan of Execution (POE) at Compile Time
     • Permitting the Compiler to Play the Statistics
        • Conditional Branches, Memory References
     • Communicating the POE to the Hardware
        • Static Scheduling
        • Branch Information

  24. Architecture Features in EPIC
     • Static Scheduling
        • MultiOp
        • Non-Unit Assumed Latency (NUAL)
     • The Branch Problem
        • Predicated Execution
        • Control Speculation
        • Predicated Code Motion
     • The Memory Problem
        • Cache Specifiers
        • Data Speculation
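
Predicated execution can be illustrated by if-conversion. Python has no predicated instructions, so this sketch only mimics the data flow, on an example chosen for illustration (hard decisions on soft values): a compare writes a predicate, both alternatives are computed, and the predicate selects which result is kept, so there is no branch on the data value. The loop itself still branches, which a real EPIC compiler would handle separately.

```python
def hard_decision_branchy(soft):
    """Map soft values to +1/-1 with a conditional branch per sample."""
    out = []
    for s in soft:
        if s >= 0:
            out.append(1)
        else:
            out.append(-1)
    return out


def hard_decision_predicated(soft):
    """Same result after if-conversion (illustrative model of predication)."""
    out = []
    for s in soft:
        p = int(s >= 0)                     # compare sets a predicate: 1 = true, 0 = false
        out.append(p * 1 + (1 - p) * (-1))  # both alternatives computed; p selects one
    return out


print(hard_decision_predicated([0.7, -1.2, 0.0, -0.1]))  # [1, -1, 1, -1]
```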

  25. Instruction Set Extensions
     • To Accelerate Bit-Level Computations in Wireless
     • Real/Complex Integer-Bit Multiplications
        • Used in Multiuser Detection, Decoding
     • Bit-Bit Multiplications
        • Used in Outer-Product Updates
        • Correlation, Channel Estimation
     • Complex Integer-Integer Multiplications
        • Useful in Other Signal Processing Applications
        • Speech, Video, …

  26. Architecture Support
     • Support via Instruction Set Extensions
     • Minimal ALU Modifications Necessary
     • Transparent to Register Files/Memory
     • Additional 8-bit Special Purpose Registers

  27. Integer-Bit Multiplications
     • $D[i] = D[i] + b[j]\,C[j]$, e.g. cross-correlation
     • Register renaming?
     [Figure: 64-bit registers C and A; an 8-bit register b selects add or subtract (+/-) for each lane; result in 64-bit register D]
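
A minimal Python sketch of the integer-bit multiply-accumulate, assuming a 64-bit register viewed as 8 signed 8-bit lanes and the encoding "bit 1 means +1, bit 0 means -1". Under that encoding each lane needs only an adder/subtractor, not a multiplier. Mapping the slide's indices onto lane k of C and D is an assumption.

```python
LANES = 8  # assumed: a 64-bit register viewed as 8 lanes of 8 bits


def wrap8(x: int) -> int:
    """Wrap to a signed 8-bit lane, as a fixed-width ALU lane would."""
    return (x + 128) % 256 - 128


def int_bit_mac(d, c, b):
    """Lane k: d[k] + c[k] if bit k of the 8-bit value b is 1, else d[k] - c[k].

    Assumed encoding: bit 1 represents +1 and bit 0 represents -1.
    """
    return [wrap8(d[k] + c[k] if (b >> k) & 1 else d[k] - c[k]) for k in range(LANES)]


print(int_bit_mac([0] * 8, [1, 2, 3, 4, 5, 6, 7, 8], 0b10101010))
# [-1, 2, -3, 4, -5, 6, -7, 8]
```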

  28. 8-bit to 64-bit Conversions
     • $D = D + b\,b^T$, e.g. auto-correlation
     • b1 = b(1:8), b(1:8), …, b(1:8) (the 8 bits of b repeated eight times)
     • b2 = b(1), …, b(1), …, b(8), …, b(8) (each bit of b repeated eight times)
     [Figure: 8-bit register b (b(1) … b(8)) expanded into 64-bit register A; step labels 1.1, 1.2, 2.1]
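
Both conversions can be sketched with plain integer bit operations. The mapping of b(k) to bit k-1 of the byte (so b(1) is the least significant bit) is an assumption; the slide does not fix a bit order.

```python
def replicate_byte(b: int) -> int:
    """b1: the 8-bit pattern b(1:8) copied into all eight bytes of a 64-bit word."""
    if not 0 <= b < 256:
        raise ValueError("b must fit in 8 bits")
    return b * 0x0101010101010101  # no carries: each copy stays inside its own byte


def spread_bits(b: int) -> int:
    """b2: bit k of b copied into all eight bits of byte k (b(1) x8, ..., b(8) x8)."""
    out = 0
    for k in range(8):
        if (b >> k) & 1:
            out |= 0xFF << (8 * k)
    return out


b = 0b00000101  # hypothetical input: b(1) = 1, b(3) = 1, all other bits 0
print(hex(replicate_byte(b)))  # 0x505050505050505
print(hex(spread_bits(b)))     # 0xff00ff
```

Bit position 8k + i then holds b(i+1) in b1 and b(k+1) in b2, so a single bitwise operation between the two words pairs every bit of b with every other bit: the 64 entries of $b\,b^T$.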

  29. Bit-Bit Multiplications
     • $D = D + b\,b^T$, e.g. auto-correlation
     • b1*b2 computed with an Ex-NOR: 64-bit register A = b1, 64-bit register B = b2, 64-bit register C = b1*b2
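
Why an Ex-NOR implements multiplication: if bit 1 represents +1 and bit 0 represents -1 (an assumed encoding, the usual one for antipodal symbols), the product of two values is +1 exactly when the bits agree, which is XNOR. This Python sketch checks all four cases and shows the 64-bit word-wide form.

```python
MASK64 = (1 << 64) - 1


def xnor64(a: int, b: int) -> int:
    """Bitwise Ex-NOR of two 64-bit words: 1 wherever a and b agree."""
    return ~(a ^ b) & MASK64


def to_sign(bit: int) -> int:
    """Assumed encoding: bit 1 represents +1, bit 0 represents -1."""
    return 1 if bit else -1


# Under that encoding, XNOR of the bits is exactly the product of the +/-1 values.
for x in (0, 1):
    for y in (0, 1):
        print(to_sign(x), "*", to_sign(y), "=", to_sign(xnor64(x, y) & 1))
```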

  30. Increment/Decrement
     • $D = D + b\,b^T$, e.g. auto-correlation
     [Figure: 64-bit register D; an 8-bit register b1*b2 selects +1 or -1 (+/-) for each lane; result in 64-bit register (D + b1*b2)]
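
A minimal Python sketch of the increment/decrement step, assuming D is an 8 x 8 matrix of signed 8-bit counters, one 64-bit register (8 lanes) per row, with bit 1 representing +1 and bit 0 representing -1. Each lane only adds or subtracts 1, so the whole outer-product update $D = D + b\,b^T$ needs no multiplier. The row-by-row schedule and bit order are assumptions.

```python
LANES = 8  # assumed: 64-bit register D viewed as 8 lanes of 8 bits


def wrap8(x: int) -> int:
    """Wrap to a signed 8-bit lane, as a fixed-width ALU lane would."""
    return (x + 128) % 256 - 128


def inc_dec(d, prod_byte: int):
    """Lane k adds 1 if bit k of the 8-bit product slice is set (+1), else subtracts 1."""
    return [wrap8(d[k] + (1 if (prod_byte >> k) & 1 else -1)) for k in range(LANES)]


def autocorr_update(D, b: int):
    """D = D + b b^T, one row (one inc_dec) at a time.

    Row i uses XNOR(b(i), b(j)) for every j, which equals the +/-1 product.
    """
    rows = []
    for i in range(LANES):
        fill = 0xFF if (b >> i) & 1 else 0x00     # bit i of b copied across a byte
        row_bits = ~(b ^ fill) & 0xFF             # XNOR with every bit of b
        rows.append(inc_dec(D[i], row_bits))
    return rows


D = autocorr_update([[0] * 8 for _ in range(8)], 0b00000101)
print(D[0])  # [1, -1, 1, -1, -1, -1, -1, -1]
```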

  31. Complex-Valued Data Processing
     • Is it easy to add?
     • Is it worth additional ALU support?
     • Typically supported by software
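
Since complex arithmetic is typically handled in software, a short Python sketch shows the cost being weighed against extra ALU support. The textbook form uses four real multiplies and two additions; Gauss's well-known rearrangement trades one multiply for three extra additions. Complex values are represented here as (real, imaginary) integer pairs.

```python
def cmul4(a, b):
    """(ar + j*ai)(br + j*bi) with 4 real multiplies and 2 adds/subtracts."""
    ar, ai = a
    br, bi = b
    return (ar * br - ai * bi, ar * bi + ai * br)


def cmul3(a, b):
    """Same product with 3 real multiplies and 5 adds/subtracts (Gauss's trick)."""
    ar, ai = a
    br, bi = b
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return (k1 - k3, k1 + k2)


print(cmul4((3, 2), (1, 4)), cmul3((3, 2), (1, 4)))  # (-5, 14) (-5, 14)
```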

  32. Truncated Multipliers
     • Many Applications Need Approximate Computations
     • Adaptive Algorithms: $Y = Y + \mu\,(Y \cdot C)$
     • Truncate Lower Bits
     • Truncated Multipliers: Half the Area, Half the Delay
     • Can Do 2 Truncated Multiplies in Parallel with Regular ALU Multipliers
     [Figure: regular multiplier vs. truncated multiplier; Multiplier 1, Multiplier 2]
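
A bit-level Python sketch of a truncated multiplier: for two n-bit operands, only the partial-product bits $a_i b_j$ with $i + j \ge n$ are formed, and the upper n bits of their sum are returned. For n = 8 that is 28 of the 64 partial-product bits, which is where the roughly-half-area figure comes from. This version adds no correction constant (practical designs often do), so its result is never above the exact upper half and is at most n - 1 units below it.

```python
def exact_high(a: int, b: int, n: int) -> int:
    """Upper n bits of the exact 2n-bit product of two n-bit operands."""
    return (a * b) >> n


def truncated_high(a: int, b: int, n: int) -> int:
    """Truncated multiply: form only partial-product bits a_i*b_j with i + j >= n.

    The low-order columns are never generated, so their carries into the upper
    half are lost; no correction constant is added in this sketch.
    """
    total = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(n - i, n):
                if (b >> j) & 1:
                    total += 1 << (i + j)
    return total >> n


n = 8
kept = sum(1 for i in range(n) for j in range(n) if i + j >= n)
print(kept, "of", n * n, "partial-product bits formed")  # 28 of 64
print(exact_high(255, 255, n), truncated_high(255, 255, n))  # 254 247
```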

  33. Software Support
     • Greater Interaction between Compilers and Architectures
        • EPIC
        • Reconfigurable Logic
     • Compiler Needs to Find and Exploit Bit-Level Computations
     • Reconfigurable Logic Programming

  34. Area Estimates
     • Area Increase of 20% over an IA-64-Size Processor, Due to Reconfigurable Support
     • Instruction Set Extensions Need Minimal Hardware Support
     • Parallel Interleaved Memory Banks Will Need Larger Area

  35. Other Uses
     • Reconfigurable Logic
        • For accelerating loops of general purpose processors
     • Bit-Level Support
        • For other voice, video and multimedia applications

  36. Conclusions
     • Processor Core with Reconfigurable Support Developed for Wireless Applications
     • Instruction Set Extensions Added for Accelerating Performance of the Algorithms
     • Integration of Wireless Appliances with General Purpose Processors
     • Expected Large Impact on Performance of Wireless Algorithms (to be quantified by simulation; see Future Work)

  37. Future Work
     • Simulations for Finding Performance Improvements
     • Other Processor Architectures
        • Bit-Slice Architectures
        • Out-of-Order Architectures

  38. References
     • T.J. Callahan, J.R. Hauser, J. Wawrzynek, "The Garp Architecture and C Compiler," IEEE Computer, April 2000, pp. 62-69. http://brass.cs.berkeley.edu
     • M.S. Schlansker, B.R. Rau, "EPIC: Explicitly Parallel Instruction Computing," IEEE Computer, Feb. 2000, pp. 37-45.
     • G.S. Sohi, "High-Bandwidth Interleaved Memories for Vector Processors - A Simulation Study," IEEE Transactions on Computers, Vol. 42, No. 1, Jan. 1993, pp. 34-44.

  39. Acknowledgements
     • Vijay Pai
     • Partha Ranganathan
     • Joseph Cavallaro