
LOAD BALANCING SWITCH

Project poster. By: Maxim Fudim, Oleg Schtofenmaher. Supervisor: Walter Isaschar. Winter-Spring 2008.


Presentation Transcript


  1. Project Poster • By: Maxim Fudim, Oleg Schtofenmaher • Supervisor: Walter Isaschar • Winter-Spring 2008 • Load Balancing Switch

  2. Abstract • Software-only solutions are too slow for real-time processing • Power dissipation limits clock frequencies • Greater computing power is needed • H/W accelerators can speed up S/W processes • Multi-core, multi-threaded systems are the future

  3. Project Goals • Multiprocessor environment for parallel processing of a vector data stream • Maximal throughput • Configurable hardware • Expandable design • Statistics reporting

  4. System Specifications • SW over transparent HW • Interface over PCI • 1 Mbit/s input stream • Vectors of 8 to 1024 chunks • Variable number of processors • System spans multiple FPGAs

  5. Problem • How to manage the data stream? • How to manage multiple parallel units? • How to achieve full and efficient utilization of resources?

  6. Solution (Top Level) • Board-level Load Balancing Switch • Single system input and output over PCI • Vectors distributed among classes • Local buffers for each chip's data

  7. Solution (Chip Level) • Chip-level Load Balancing Switch • Converts shared resources into “personal” workspaces • VPUs organized in clusters • Load monitoring for each unit • Smart arbitration • Flexible, easy configuration
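The slide pairs per-unit load monitoring with smart arbitration. As a minimal Python sketch of one such policy, assume the switch keeps a queued-chunk counter per VPU and sends each new vector to the least-loaded VPU. The names `VpuState`, `dispatch` and `complete` are hypothetical, and the slides do not state that this is the policy the LBS actually implements.

```python
from dataclasses import dataclass


@dataclass
class VpuState:
    """Hypothetical load counter that a per-VPU monitor might keep."""
    name: str
    queued_chunks: int = 0  # chunks assigned to this VPU but not yet processed


def dispatch(vpus: list[VpuState], vector_chunks: int) -> VpuState:
    """Send a vector to the least-loaded VPU; ties go to the lowest index."""
    target = min(vpus, key=lambda v: v.queued_chunks)
    target.queued_chunks += vector_chunks
    return target


def complete(vpu: VpuState, chunks: int) -> None:
    """Release load when a VPU reports that it finished a vector."""
    vpu.queued_chunks -= chunks


cluster = [VpuState(f"vpu{i}") for i in range(3)]
for size in [200, 8, 8, 1024, 16]:
    print(size, "->", dispatch(cluster, size).name)
# 200 -> vpu0, 8 -> vpu1, 8 -> vpu2, 1024 -> vpu1, 16 -> vpu2
```

Counting queued chunks rather than queued vectors lets a single 1024-chunk vector weigh more than many short ones, which matters when vector sizes span 8 to 1024 chunks.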

  8. Solution: Tree Distribution Switch [Diagram: the SW/HW interface feeds Class of Service distribution, which branches to three LBS arbitration units serving nine clusters of VPUs]

  9. Three-Level Architecture • Packet management level (Classes): data type, size and priority • Processing-unit organization level (Clusters): processor speed, quantity and resources • Fine-tuning level (VPUs): algorithm, HW acceleration
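The three levels nest naturally as a containment hierarchy. The following Python sketch models them as plain data classes. The class and field names (`ServiceClass`, `Cluster`, `Vpu`, `max_vector_chunks` and so on) are hypothetical illustrations of the slide's parameters, not the project's actual configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class Vpu:
    """Fine-tuning level: one vector processing unit."""
    algorithm: str
    hw_accelerated: bool = False


@dataclass
class Cluster:
    """Processing-unit level: VPUs grouped by speed, quantity and resources."""
    name: str
    vpus: list[Vpu] = field(default_factory=list)


@dataclass
class ServiceClass:
    """Packet-management level: traffic sharing a data type, size range and priority."""
    data_type: str
    max_vector_chunks: int
    priority: int
    clusters: list[Cluster] = field(default_factory=list)

    def vpu_count(self) -> int:
        return sum(len(c.vpus) for c in self.clusters)


fast = Cluster("fast", [Vpu("generic", hw_accelerated=True) for _ in range(2)])
basic = Cluster("basic", [Vpu("generic") for _ in range(4)])
service = ServiceClass("data", max_vector_chunks=1024, priority=1, clusters=[fast, basic])
print(service.vpu_count())  # 6
```

Keeping the levels separate means a class can be retuned (priority, size range) without touching how its clusters or VPUs are built.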

  10. Implementation • Multi-chip system connected over two buses • Input and controls over the Main Bus • Output streamed over buses linking neighboring chips • Local FIFOs for every chip/class • Classifier for packet management • SW-configurable controls • Cluster-organized VPUs with input/output arbitration • Watchdogs and statistics gathering
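The classifier and the per-class local FIFOs can be modeled in a few lines. This Python sketch assumes a pluggable classification function and one FIFO per class index. `Classifier`, `by_size` and the 64-chunk short/long threshold are all hypothetical, chosen only for illustration.

```python
from collections import deque


class Classifier:
    """Hypothetical classifier: routes each packet to the FIFO of its class."""

    def __init__(self, class_of):
        self.class_of = class_of  # function mapping a packet to a class index
        self.fifos: dict[int, deque] = {}

    def push(self, packet) -> int:
        cls = self.class_of(packet)
        self.fifos.setdefault(cls, deque()).append(packet)
        return cls

    def pop(self, cls: int):
        """Next packet of a class in arrival order, or None if its FIFO is empty."""
        fifo = self.fifos.get(cls)
        return fifo.popleft() if fifo else None


def by_size(packet) -> int:
    # Assumed policy: vectors under 64 chunks are "short" (class 0), others class 1.
    return 0 if packet["chunks"] < 64 else 1


c = Classifier(by_size)
for vid, n in [(1, 8), (2, 1024), (3, 16)]:
    c.push({"id": vid, "chunks": n})
print(c.pop(0)["id"], c.pop(0)["id"], c.pop(0))  # 1 3 None
```

Because each class has its own FIFO, a burst of long vectors in one class cannot block short vectors queued in another.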

  11. Board-Level Diagram [Diagram: a PROCStar II board carrying four Stratix II 180 FPGAs (LBS1-LBS4), each with two DDR2 banks, linked by Ring Buses; a Classifier and the Main Bus carry data in and controls; per-LBS registers; input vectors arrive and output reports leave over the PCI bus, which connects to a S/W emulator or a H/W DSP system]

  12. Single FPGA Top Diagram [Diagram: one Stratix II 180 FPGA (LBS 1-4) containing the Load Balancing Switch (LBS), a Bus Control Block, an I/O-LBS Control Block, DDR2 controls for Bank A and Bank B, and sixteen NIOS clusters arranged around the data flow]

  13. Data Packet Format • Packet: Header, Data (1 to N 32-bit words), Tail • Header fields: SW/HW Control (1 bit), Unused, Nios Number, Data Length N, Vector ID/Command Type, Type (1 bit: Data/Command), Version (4 bits); the diagram also marks 8-, 16- and 32-bit widths • Tail: Sync Data
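The slide names the header fields, but the extracted diagram does not pin down every field's width or position. This Python sketch packs and unpacks a header under a HYPOTHETICAL 64-bit layout. It takes the 1-bit SW/HW Control, the 1-bit Type and the 4-bit Version from the slide. It then assumes 8, 16 and 32 bits for Nios Number, Data Length and Vector ID/Command Type, 2 unused bits, MSB-first field order and big-endian byte order. All of these assignments are assumptions, as are the names `Header`, `pack_header` and `unpack_header`.

```python
import struct
from typing import NamedTuple

# HYPOTHETICAL header layout, most significant field first (not confirmed by the slide):
# bits 63-60 version | 59 kind | 58 sw_hw | 57-56 unused | 55-48 nios | 47-32 length | 31-0 vector_id
FIELDS = [("version", 4), ("kind", 1), ("sw_hw", 1), ("unused", 2),
          ("nios", 8), ("length", 16), ("vector_id", 32)]


class Header(NamedTuple):
    version: int
    kind: int       # Type bit: 0 = data, 1 = command (assumed encoding)
    sw_hw: int      # SW/HW control bit
    nios: int       # target Nios number
    length: int     # N, number of 32-bit data words that follow
    vector_id: int  # vector ID, or command type for command packets


def pack_header(h: Header) -> bytes:
    values = {**h._asdict(), "unused": 0}
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return struct.pack(">Q", word)  # one big-endian 64-bit word (assumed)


def unpack_header(raw: bytes) -> Header:
    (word,) = struct.unpack(">Q", raw)
    values = {}
    for name, width in reversed(FIELDS):  # peel fields off from the LSB end
        values[name] = word & ((1 << width) - 1)
        word >>= width
    del values["unused"]
    return Header(**values)


h = Header(version=1, kind=0, sw_hw=0, nios=3, length=200, vector_id=42)
print(pack_header(h).hex())  # 100300c80000002a
assert unpack_header(pack_header(h)) == h
```

Range-checking each field before shifting it in prevents an oversized value from silently corrupting its neighbors. A hardware header parser would typically need the same guarantee.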

  14. LBS Class Top-Level View [Diagram: inside a Stratix II FPGA, an Input Reader takes data from the FIFO input port onto the input data bus; three Cluster Arbiters and three NIOS II Systems; a Main Controller unit with a Statistics Reporter handles control and status; an Output Writer drives the muxed output data bus to the FIFO output port; a Busses Control Block]

  15. Organization of VPUs (Vector Processing Units) • NIOS VPUs grouped into clusters • Fixed number of clusters • Parameterized number of NIOS VPUs per cluster • Parameterized control and distribution logic • Various NIOS configurations • Static/dynamic priority arbitration
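Slide 15 lists static and dynamic priority arbitration, and the performance slides state that measurements used round-robin (RR) arbitration. This Python sketch models the behavior of both schemes, one grant per call. It is not the project's HDL. In hardware each grant would be computed combinationally per cycle. `RoundRobinArbiter` and `fixed_priority_grant` are hypothetical names.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Rotating priority: the search for a requester starts just after the last grant."""

    def __init__(self, n: int):
        self.n = n
        self.last = n - 1  # makes requester 0 the first to be considered

    def grant(self, requests: List[bool]) -> Optional[int]:
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx  # granted unit drops to lowest priority next time
                return idx
        return None  # no requests this cycle


def fixed_priority_grant(requests: List[bool]) -> Optional[int]:
    """Static priority: the lowest-numbered requester always wins."""
    for idx, req in enumerate(requests):
        if req:
            return idx
    return None


arb = RoundRobinArbiter(3)
print([arb.grant([True, True, True]) for _ in range(4)])  # [0, 1, 2, 0]
print(fixed_priority_grant([False, True, True]))  # 1
```

Round-robin guarantees every requesting VPU is served within `n` grants. Static priority is simpler but can starve low-priority units under sustained load.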

  16. LBS Units Description • VPUs: NIOS system • Single processor with input/output buffers • HW-accelerated system • Shared-resource system with mutex • Multiprocessor system with several ports to the cluster

  17. Resource Usage • Resource usage data for a 6-VPU system • VPU resource usage figures are for basic VPUs and may be reduced by advanced configurations and policies

  18. Performance of LBS • Theoretical throughput: 100 MHz × 64 bit = 6.4 Gbit/s • Arbitration and routing latency: 2-4 cycles on average • Effective bandwidth utilization: 60% for short vectors, up to 98% for long vectors • Real throughput: 1 Mbit/s to 400 Mbit/s • Bottlenecks: PCI and slow algorithms
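The throughput figures on this slide follow from simple arithmetic. This Python sketch computes the bus-level rate for a given effective utilization. It assumes utilization scales the raw bus rate linearly. `effective_gbps` is a hypothetical helper name.

```python
CLOCK_HZ = 100e6      # LBS clock frequency, from the slide
BUS_WIDTH_BITS = 64   # LBS data bus width, from the slide


def effective_gbps(utilization: float) -> float:
    """Bus-level throughput in Gbit/s for a given effective utilization (0..1)."""
    return CLOCK_HZ * BUS_WIDTH_BITS * utilization / 1e9


print(f"theoretical: {effective_gbps(1.0):.2f} Gbit/s")     # 6.40
print(f"short vectors: {effective_gbps(0.60):.2f} Gbit/s")  # 3.84
print(f"long vectors: {effective_gbps(0.98):.3f} Gbit/s")   # 6.272
```

Even the 60% case leaves the switch's bus ceiling near 3.84 Gbit/s, roughly ten times the measured 400 Mbit/s maximum. That gap is consistent with the slide's statement that PCI and slow algorithms, rather than the switch itself, limit real throughput.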

  19. Performance for Short Vectors • Time and throughput for 1000 vectors of 4 chunks each • VPU performance figures are for basic VPUs with round-robin (RR) arbitration; for a given workload, performance may be improved by defining advanced configurations and policies after performance analysis

  20. Performance for Medium Vectors • Time and throughput for 1000 vectors of 200 chunks each • VPU performance figures are for basic VPUs with round-robin (RR) arbitration; for a given workload, performance may be improved by defining advanced configurations and policies after performance analysis

  21. Performance for Long Vectors • Time and throughput for 100 vectors of 1000 chunks each • VPU performance figures are for basic VPUs with round-robin (RR) arbitration; for a given workload, performance may be improved by defining advanced configurations and policies after performance analysis

  22. System performance – missing TOAs

  23. System performance – noise levels
