
A Programmable Coprocessor Architecture for Wireless Applications

Yuan Lin, Nadav Baron, Hyunseok Lee, Scott Mahlke, Trevor Mudge. Advanced Computer Architecture Lab, University of Michigan. Sept. 2004.


Presentation Transcript


  1. A Programmable Coprocessor Architecture for Wireless Applications
     Yuan Lin, Nadav Baron, Hyunseok Lee, Scott Mahlke, Trevor Mudge
     Advanced Computer Architecture Lab, University of Michigan
     Sept. 2004

  2. Introduction
     • Growing need to support multiple wireless protocols
     • Software-defined radio: implementing DSP algorithms in software rather than in hardware
     • ASIC: high performance, low flexibility
     • Processor: high flexibility, low performance
     • Objective: achieve real-time performance with processor flexibility and programmability

  3. Performance Requirements
     • UWB: 200 Mbps
     • Hiperlan2: 36 Mbps
     • 802.11b: 11 Mbps

  4. DSP Algorithm Characteristics
     • Streaming data
       • Short variable liveness
       • High data throughput
     • High data-level parallelism
     • Low control-flow overhead
       • Counted loops
       • Few data-dependent branches

  5. Proposed Coprocessor Architecture: MAPP
     • Stream data
       • Macro-pipeline architecture
       • No cache structure
     • High data-level parallelism
       • Vector architecture
     • Low control-flow overhead
       • No branch predictors
     • Programmability to support multiple protocols

  6. MAPP Architectural Diagram
     [Figure. Block labels: ARM Core, Instruction Cache, Data Cache, VPP Controller, Vector Processing Pipeline, PPU (x3)]

  7. PPU Architectural Diagram
     [Figure. Block labels: Pipeline Processing Unit (PPU), VPP Controller, Internal Instruction Buffer, Vector Register File, Vector ALU, In Queue (Data In), Out Queue (Data Out)]

  8. Mapping DSP Algorithms: Viterbi ACS
     [Figure. Worked example with vectors s0, s1, bm0, bm1, v0, v1, a comparison mask (l/e/g), and the result s'; trellis labels S0, S1, S'2 and a mux. The numeric values cannot be recovered from the transcript.]
     Instruction sequence:
       vadd v0, s0, bm0
       vadd v1, s1, bm1
       cmp v0, v1
       move{le} s', v1
       move{g} s', v2
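
The add-compare-select (ACS) step on this slide can be sketched in plain Python to show what the vector instructions compute element by element. This is a minimal illustration, not the MAPP implementation: the function name `acs` and the list-based vectors are hypothetical, and the sketch assumes the smaller accumulated metric survives, as in a conventional Viterbi decoder that uses distance metrics. The slide's own predicated moves name registers v1 and v2, which the transcript does not explain, so the selection rule here is an assumption.

```python
def acs(s0, s1, bm0, bm1):
    """Element-wise add-compare-select over equal-length metric vectors.

    s0, s1   : path metrics of the two predecessor states (per lane)
    bm0, bm1 : branch metrics for the two incoming transitions (per lane)
    Returns (survivors, mask), where mask[i] is True when the first
    candidate was kept in lane i.
    """
    v0 = [s + b for s, b in zip(s0, bm0)]      # vadd v0, s0, bm0
    v1 = [s + b for s, b in zip(s1, bm1)]      # vadd v1, s1, bm1
    mask = [a <= b for a, b in zip(v0, v1)]    # cmp v0, v1
    # Predicated select: keep the smaller candidate in each lane (assumption).
    survivors = [a if keep_first else b
                 for a, b, keep_first in zip(v0, v1, mask)]
    return survivors, mask


if __name__ == "__main__":
    survivors, mask = acs([0, 4], [8, 2], [0, 4], [4, 0])
    print(survivors, mask)
```

Every lane runs the same four operations with no data-dependent branch, which is why the step maps naturally onto a vector unit with predicated moves.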

  9. Increase Area/Power Efficiency
     • Data slice architecture
       • Most DSP algorithms do not need 32-bit precision
       • Viterbi decoding operates on 8-bit data
       • Filters may need 16-bit precision
     • Partial processor execution
       • Statically determined code
       • Turn off architectural units that are not used
       • Energy saving, no area saving
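
One way to picture the data slice idea is a 32-bit datapath whose carry chain can be cut at 8-bit or 16-bit boundaries, so the same hardware performs four 8-bit, two 16-bit, or one 32-bit addition. The sketch below models that behaviour in Python. It is an illustration only: `sliced_add`, its parameters, and the wraparound behaviour on overflow are assumptions, not details taken from the slides.

```python
def sliced_add(a, b, slice_bits, word_bits=32):
    """Add two packed words lane by lane, dropping carries between lanes.

    slice_bits=8  -> four independent 8-bit adds
    slice_bits=16 -> two independent 16-bit adds
    slice_bits=32 -> one ordinary 32-bit add (modulo 2**32)
    """
    if word_bits % slice_bits != 0:
        raise ValueError("slice width must divide the word width")
    lane_mask = (1 << slice_bits) - 1
    result = 0
    for shift in range(0, word_bits, slice_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        # Keep only the low slice_bits: the carry out of a lane is discarded.
        result |= (lane_sum & lane_mask) << shift
    return result


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(width, hex(sliced_add(0x0000FFFF, 0x00000001, width)))
```

With 8-bit or 16-bit slices the carry out of the low lanes is dropped, while the 32-bit setting lets it propagate into the upper half.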

  10. Vector Cluster Diagram (4x8 bit data slice)
      [Figure. Four slices, each with an In Queue, Register File, ALU, and Out Queue, joined by a 4x4 Local Interconnect Network]

  11. Performance Results

  12. Simplistic Power Analysis
      • Based on ARM9 data in 0.13 µm
      • Viterbi decoder (K=7): 0.75 W ~ 1 W
        • 64x4 8-bit ALU: ~240 mW
        • 12 KB memory: ~310 mW
        • Clock: ~200 mW
        • Others: ~250 mW
      • ASIC implementations: 7.65 mW ~ 0.7 W (with different throughputs)
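
As a quick arithmetic check, the four component estimates on this slide can be summed and compared with the quoted range for the Viterbi decoder. The sketch assumes those four figures are the breakdown of the decoder estimate; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
# Approximate component estimates from the slide, in milliwatts.
VITERBI_COMPONENTS_MW = {
    "alu_64x4_8bit": 240,
    "memory_12kb": 310,
    "clock": 200,
    "others": 250,
}


def total_power_mw(components):
    """Sum per-component power estimates given in milliwatts."""
    return sum(components.values())


if __name__ == "__main__":
    total = total_power_mw(VITERBI_COMPONENTS_MW)
    print(f"Estimated total: {total} mW ({total / 1000} W)")
```

The components add up to about 1 W, the upper end of the 0.75 W ~ 1 W range. The ASIC figures are quoted at different throughputs, so a direct ratio against them would not be a like-for-like comparison.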

  13. Conclusion & Future Work
      • Programmable coprocessor architecture
        • Can support multiple protocols
        • Meets real-time computational requirements
        • Reasonable power consumption
      • Future work
        • Realistic power model simulation
        • Implement complete protocols
        • Algorithm behavior studies
        • Shrink processor area
