DSPs for future wireless systems Sridhar Rajagopal
Motivation [Figure: block diagram of a wireless mobile device with an RF unit, A/D and D/A converters, a programmable baseband communications processor, and higher layers, labeled "Add-on PCMCIA Network Interface Card"] • Mobile: switch between standards and between parameters • Base-station: varying number of users with different parameters
The problem [Figure: GPP, DSP, FPGA, and VLSI compared along performance, power, and flexibility]
An approach for the solution • Algorithms are well understood at the VLSI level • Can design real-time systems • Pushing it higher in the chain • Current DSPs are not powerful enough for our application • Using the IMAGINE simulator to see which architecture features would be useful in a future DSP for such applications
History of my work • Distant past: Algorithms (multiuser channel estimation, multiuser detection) • Recent past: VLSI and FPGA (task-partitioning, parallelism, pipelining; conventional arithmetic, on-line arithmetic) • Recent and near future: DSP and IMAGINE (instruction set extensions, co-processor support, functional unit design and usage)
Contents • Programmable architecture design using the IMAGINE simulator • Multiuser estimation and detection implementation • Performance comparisons and results • Other extensions for possible integration • Conclusions
The IMAGINE architecture and simulator [Figure: Imagine Stream Processor block diagram with a host processor, stream controller, microcontroller, network interface, stream register file, a streaming memory system backed by four SDRAMs, and ALU clusters 0 to 7] • IMAGINE is a media signal processor
Why the IMAGINE simulator? • Great for media processing algorithms • Has a VLIW-based cluster: allows DSP comparisons • A good base architecture: 1024-pt FFT • RSIM, SimpleScalar…: more general-purpose architecture simulators
What does the simulator give us? • Execution time for the different parts of the code • Functional unit utilization • Insights into the bottlenecks • Flexibility to add and remove functional units already present or design your own • Graphical view of the schedule on the functional units
Down-side • Two-level C++ programming • StreamC: transfers streams of data between main memory and the stream register file (SRF) • KernelC: transfers streams from the SRF to the ALU clusters • Code is optimized to the number of ALU clusters and the size of the data • Compiler may fail register allocation if there are too many variables or the functional units are modified
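The two-level split described on this slide can be pictured with a plain Python sketch. It is not StreamC or KernelC syntax; the names `stream_program`, `scale_kernel`, `NUM_CLUSTERS`, and `SRF_WORDS`, and the SRF capacity itself, are hypothetical and chosen only for illustration. The outer function plays the stream-level role (moving strips of data between memory and the SRF), and the inner function plays the kernel-level role (feeding SRF data to the clusters).

```python
NUM_CLUSTERS = 8   # the block diagram shows ALU clusters 0 to 7
SRF_WORDS = 64     # hypothetical SRF capacity, picked only for this example


def scale_kernel(srf_stream, gain):
    """Kernel level: consume a stream from the SRF, one element per cluster per step."""
    out = []
    for base in range(0, len(srf_stream), NUM_CLUSTERS):
        batch = srf_stream[base:base + NUM_CLUSTERS]  # one element for each cluster
        out.extend(gain * x for x in batch)           # same operation on every cluster
    return out


def stream_program(memory, gain):
    """Stream level: move SRF-sized strips from memory and run the kernel on each."""
    result = []
    for base in range(0, len(memory), SRF_WORDS):
        srf = memory[base:base + SRF_WORDS]           # memory -> SRF
        result.extend(scale_kernel(srf, gain))        # SRF -> clusters -> results
    return result


scaled = stream_program(list(range(200)), 2)
```

In this sketch a data length that is not a multiple of `NUM_CLUSTERS` leaves the last batch partly empty, which hints at why code ends up tuned to the cluster count and the data size.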
Typical workload representation (Base-station) • Wireless LAN: equalization, FFT, Viterbi decoding • W-CDMA: channel estimation, multiuser detection, Viterbi/Turbo decoding • If you felt that life was too easy: multiple antennas, long spreading codes, space-time codes
Estimation/Detection (64,32 sizes) • Multiuser estimation: kernels 1, 2, 3 • Massaging matrices for detection: kernels 4, 5 • Multiuser detection: kernels 6, 7
Kernels • 1. Update: update Rbb, Rbr • 2. Mmult: multiply Rbb * A • 3. Iterate: gradient descent • 4. MmultL: calculate L • 5. MmultC: calculate C • 6. Mf: matched filter • 7. Pic: one parallel interference cancellation stage
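Kernels 2 and 3 can be read as one step of a gradient-descent solve for A in Rbb * A = Rbr. The slides do not give the update equation, so the form used here, A ← A − μ (Rbb · A − Rbr), the step size `mu`, and the function names are assumptions for illustration only. This update converges when Rbb is symmetric positive definite and 0 < μ < 2/λmax, where λmax is the largest eigenvalue of Rbb.

```python
def matmul(x, y):
    """Plain triple-loop matrix multiply on lists of rows."""
    rows, inner, cols = len(x), len(y), len(y[0])
    return [[sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def gradient_step(rbb, rbr, a, mu):
    """One assumed iteration: A <- A - mu * (Rbb * A - Rbr)."""
    prod = matmul(rbb, a)  # the matrix product named in kernel 2
    return [[a[i][j] - mu * (prod[i][j] - rbr[i][j]) for j in range(len(a[0]))]
            for i in range(len(a))]


def estimate(rbb, rbr, mu, iterations):
    """Start from zeros and repeat the gradient step a fixed number of times."""
    a = [[0.0] * len(rbr[0]) for _ in range(len(rbr))]
    for _ in range(iterations):
        a = gradient_step(rbb, rbr, a, mu)
    return a


# Toy 2x2 system whose exact solution is A = [[1], [2]].
a_hat = estimate([[2.0, 0.0], [0.0, 4.0]], [[2.0], [8.0]], mu=0.2, iterations=100)
```

Each iteration costs one matrix multiply plus an element-wise update, which is why the multiply kernel dominates the operation count.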
Kernel 2 (mmult) for 3 +, 2 * (3 adders, 2 multipliers) • Divider not being utilized • Adders have limited FU utilization • O(N³) *, O(N³) + • Multipliers 100% in loop • Replace the divider (/) with a multiplier (*)
Kernel 2 (mmult) for 3 +, 3 * (3 adders, 3 multipliers) • Better adder utilization • Needs sufficient registers for scaling [register allocation may fail] • Code may also need slight tuning of variables for optimization
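The utilization pattern reported for the two configurations can be reproduced with an idealized throughput bound. This is a back-of-the-envelope model, not simulator output: it assumes every adder and multiplier can start one operation per cycle and ignores latency, register pressure, and memory traffic. The function name and the choice N = 32 (one of the sizes quoted earlier) are illustrative assumptions.

```python
from math import ceil


def mmult_unit_bounds(n, adders, multipliers):
    """Idealized bound for an n x n by n x n multiply on the given unit counts."""
    multiplies = n ** 3   # one product per (i, j, k) triple
    additions = n ** 3    # one accumulate per product
    # The busier class of unit sets the minimum cycle count.
    cycles = max(ceil(multiplies / multipliers), ceil(additions / adders))
    return {
        "cycles": cycles,
        "multiplier_utilization": multiplies / (multipliers * cycles),
        "adder_utilization": additions / (adders * cycles),
    }


two_mult = mmult_unit_bounds(32, adders=3, multipliers=2)
three_mult = mmult_unit_bounds(32, adders=3, multipliers=3)
```

Under these assumptions the two multipliers are fully busy while the three adders sit idle about a third of the time; a third multiplier balances the two unit types. A real schedule also depends on the compiler and on register availability.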
FU utilization on each cluster [Figure: functional unit utilization on each cluster] • Time for detection at 128 Kbps for each of 32 users at 500 MHz: 4000 cycles
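For scale, the cycle count can be converted to wall-clock time and set beside the bit period at the quoted data rate. This is only a unit conversion. It assumes 1 Kbps = 1000 bit/s, and the slide does not state how much data the 4000 cycles cover, so no real-time conclusion is drawn here. The helper names are hypothetical.

```python
def cycles_to_seconds(cycles, clock_hz):
    """Convert a cycle count to seconds at the given clock frequency."""
    return cycles / clock_hz


def bit_period_seconds(bits_per_second):
    """Duration of one bit at the given data rate."""
    return 1.0 / bits_per_second


detection_time = cycles_to_seconds(4000, 500e6)  # 4000 cycles at 500 MHz
bit_period = bit_period_seconds(128e3)           # one bit at 128 Kbps (assumed 128000 bit/s)
```

With these inputs, `detection_time` is 8 microseconds and `bit_period` is about 7.8 microseconds.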
Comparisons with DSPs [Figure: execution time in seconds (log scale, 10^-6 to 10^-2) versus number of users (0 to 35) for a single DSP implementation, a 2 DSP implementation, and our architecture based on Imagine, with the target data rate of 128 Kbps/user marked]
Current work • Evaluating performance of wireless communication algorithms such as estimation, detection, and decoding on this architecture • Studying bottlenecks and the functional unit design needed to attain real-time performance • The insights gained from the design can also be applied to other processors such as DSPs