Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks, Kapre et al.

Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks, Kapre et al. RC Reading Group – 3/29/2006 Presenter: Ilya Tabakh

Agenda • Introduction • Background • Topology • Packet Switched • Time Multiplexed • Application • Methodology • Results • Conclusions • Wrap-up • Questions

Introduction • Dedicated spatial interconnect links on a configured FPGA network can be inefficient for sparse communication patterns • Overlaying virtual networks on top of the physical networks can help address this issue

Time-Multiplexed Pros • Can take advantage of global route information Cons • Offline computation can be compute intensive • Must allocate resources for communication schedule and all possible communication between operators

Packet-Switched Pros • No offline setup and no resources needed for storing a communication schedule • Routes are made only for operators that are actually communicating Cons • Switches are more complex • Routes can be less efficient

Novel Contributions of work • Demonstration of efficient and scalable static and dynamic FPGA overlay networks • Quantification of difference between offline scheduling and online routing • Quantification of performance impacts due to balancing interconnects and computing • Characterization of area and performance tradeoffs between time-multiplexed and packet-switched • Quantification of performance difference between time-multiplexed and packet-switched under varying application communication loads.

NoC • Early days – on-chip buses • Later necessary to investigate scalable, high-performance, low-overhead on-chip networks • Networks are required since buses scale poorly • As the number of PEs increases, communication increases and more bandwidth is needed

Communication Patterns • Need to know in order to choose network to use • Configured switching is inefficient for apps that underutilize links • Circuit switching is efficient for larger messages on shorter networks • Need to know characteristics in order to make appropriate choice

Packet Switched How they improve on past work in FPGA-based overlay networks • Allow arbitrary topologies • Use real applications and realistic PE architectures to generate traffic payloads • Network runs at 166 MHz, much faster than most prior networks, which run at 25-50 MHz

Time Multiplexed • Use a greedy router similar to the one used in the Virtual Wires project • Virtual Wires overcame pin limitation by time sharing each physical wire among logical wires and pipelining • This paper attempts to explore the entire design space as opposed to one system size or config
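
A greedy offline router of this kind can be sketched as follows. This is a minimal illustration, not the paper's or Virtual Wires' actual algorithm: it assumes each message already has a fixed path of links, that a message occupies one link per cycle in pipeline fashion, and that messages are placed in list order at the earliest conflict-free start cycle. The name `greedy_schedule` and the link labels are hypothetical.

```python
def greedy_schedule(routes):
    """Assign each message the earliest start cycle with no link conflicts.

    routes: list of paths, each path a list of link names (hypothetical labels).
    A message that starts at cycle t uses path[hop] at cycle t + hop.
    Returns the list of start cycles, in the same order as routes.
    """
    busy = set()   # (link, cycle) pairs already reserved
    starts = []
    for path in routes:
        t = 0
        # Slide the start time forward until every hop finds its link free.
        while any((link, t + hop) in busy for hop, link in enumerate(path)):
            t += 1
        for hop, link in enumerate(path):
            busy.add((link, t + hop))
        starts.append(t)
    return starts


routes = [["a", "b"], ["a", "c"], ["d", "b"]]
starts = greedy_schedule(routes)
```

Because the whole communication pattern is known before runtime, the scheduler can see every conflict in advance; the resulting start cycles are what a time-multiplexed network would store as its schedule.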

Performance Analysis Several important quantities of the network have to be defined • PE Input Serialization: a bound on the cycle count for input • PE Output Serialization: a bound on the cycle count for output • Network Bisection: maximum number of messages that can cross the network on a given cycle • Network Latency: number of cycles required to cross the network
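
The four quantities above can be combined into a simple lower bound on routing time. The sketch below is an assumption-laden illustration, not the paper's formula: it assumes each PE sends and receives at most one message per cycle, that the network is split into two halves by a hypothetical `side_of` map, and that the overall bound is the largest of the individual bounds.

```python
import math
from collections import Counter


def cycle_lower_bound(messages, side_of, bisection_width, latency):
    """Lower bound on cycles needed to deliver all messages.

    messages: list of (src_pe, dst_pe) pairs.
    side_of: dict mapping each PE to 0 or 1 (which half of the network).
    bisection_width: messages that can cross between halves per cycle.
    latency: cycles for one message to cross the network.
    """
    if not messages:
        return 0
    # Output serialization: the busiest sender needs one cycle per message.
    out_serial = max(Counter(src for src, _ in messages).values())
    # Input serialization: the busiest receiver needs one cycle per message.
    in_serial = max(Counter(dst for _, dst in messages).values())
    # Bisection: crossing messages share the limited cross-section links.
    crossing = sum(1 for src, dst in messages if side_of[src] != side_of[dst])
    bisection_cycles = math.ceil(crossing / bisection_width)
    return max(out_serial, in_serial, bisection_cycles, latency)


messages = [(0, 1), (0, 2), (0, 3), (1, 3)]
side_of = {0: 0, 1: 0, 2: 1, 3: 1}
bound = cycle_lower_bound(messages, side_of, bisection_width=1, latency=2)
```

In this example PE 0 sends three messages and three messages cross the bisection, so both serialization and bisection limit the schedule to at least three cycles.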

Butterfly Fat Trees • Most FPGA NoCs have focused on meshes • BFTs achieve higher performance at equivalent chip size • Routing functions programmed in the split primitives determine path • Single address bit is used to make a routing decision at each switch • Time-multiplexed merge contains a context memory which stores computed routing
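
The one-bit-per-switch routing decision can be illustrated on a simplified binary tree. This sketch is not the full BFT (which has multiple up-links per switch); it assumes PEs are numbered so that the binary destination address, read from the most significant bit down, selects the left or right output at each level. The function names are hypothetical.

```python
def levels_to_climb(src_addr, dst_addr):
    """Levels a packet must rise before source and destination share a subtree."""
    return (src_addr ^ dst_addr).bit_length()


def split_decisions(dst_addr, levels):
    """Left/right choice at each split on the way down, one address bit per level."""
    decisions = []
    for level in range(levels - 1, -1, -1):
        bit = (dst_addr >> level) & 1
        decisions.append("right" if bit else "left")
    return decisions


climb = levels_to_climb(0b001, 0b101)
path = split_decisions(0b101, climb)
```

Here PE 1 sending to PE 5 climbs three levels, then descends right, left, right, with each split inspecting exactly one bit of the destination address.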

Packet Switched • Primitives have input queues • Split primitives compute the routing decision in a single cycle based on the destination address • Arbitration is done by selecting packets based on input queue occupancies • Network with floorplanned and pipelined primitives can operate as high as 180 MHz
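
Occupancy-based arbitration in a merge primitive might look like the sketch below. The slide only says that selection depends on input queue occupancies; the specific policy here (serve the fullest queue, break ties toward the lowest port) and the class name `Merge` are assumptions for illustration.

```python
from collections import deque


class Merge:
    """Merge primitive with one FIFO per input and occupancy-based arbitration."""

    def __init__(self, n_inputs=2):
        self.queues = [deque() for _ in range(n_inputs)]

    def push(self, port, packet):
        self.queues[port].append(packet)

    def step(self):
        """Forward one packet from the fullest queue, or None if all are empty."""
        # max() returns the first maximum, so ties go to the lowest port.
        port = max(range(len(self.queues)), key=lambda i: len(self.queues[i]))
        if not self.queues[port]:
            return None
        return self.queues[port].popleft()


m = Merge()
m.push(0, "a")
m.push(1, "b")
m.push(1, "c")
order = [m.step() for _ in range(4)]
```

Port 1 holds two packets, so it is served first; after that the queues are tied and port 0 wins, which is why "a" leaves before "c".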

Time Multiplexed • Statically scheduled prior to runtime • Switching primitives contain context memory • Context memory requires 1 bit of storage per cycle • Network capable of operating at 166 MHz • Greedy routing algorithm used
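
The role of the context memory can be sketched for a two-input merge: one stored bit per cycle says which input to forward, so no queues or arbitration are needed at runtime. This is an illustrative model under that assumption; `run_tm_merge` and the per-cycle input lists are hypothetical.

```python
def run_tm_merge(context_bits, input0, input1):
    """Replay a precomputed schedule on a two-input merge.

    context_bits: one bit per cycle; 0 forwards input0, 1 forwards input1.
    input0, input1: per-cycle values arriving on each input (None if idle).
    """
    out = []
    for cycle, select in enumerate(context_bits):
        source = input1 if select else input0
        out.append(source[cycle])
    return out


context = [0, 1, 1, 0]
out = run_tm_merge(context, ["a", None, None, "d"], [None, "b", "c", None])
```

The context memory here is four bits for a four-cycle schedule, which shows why longer schedules cost more area.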

Area and Latency of Switching

Application • A real-life application was mapped onto the networks • ConceptNet – common-sense reasoning knowledge base represented as a graph • Start with an initial set of nodes, send activation from each node to its neighbors along weighted edges • Time-multiplexed network run at 100% activity; packet-switched network run at activity levels between 1% and 100% • Limitations • Nodes limited to 128 edges of fanout or fanin • Can only process a single edge per cycle
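
One step of the activation spreading described above can be sketched as follows. The update rule (each neighbor accumulates activation times edge weight) and the tiny example graph are assumptions for illustration; the slides only state that activation is sent along weighted edges.

```python
def spread_activation(graph, activation):
    """Send activation from each active node to its neighbors, one step.

    graph: dict mapping node -> list of (neighbor, weight) edges.
    activation: dict mapping currently active node -> activation level.
    Returns the activation received by each neighbor.
    """
    received = {}
    for node, level in activation.items():
        for neighbor, weight in graph.get(node, []):
            # Each edge traversal corresponds to one message on the network.
            received[neighbor] = received.get(neighbor, 0.0) + level * weight
    return received


graph = {
    "cat": [("animal", 0.5), ("pet", 1.0)],
    "dog": [("pet", 0.5)],
}
result = spread_activation(graph, {"cat": 1.0, "dog": 2.0})
```

Only edges leaving active nodes generate messages, which is why the communication load depends on the activity level.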

Methodology • Java based infrastructure • simulates the packet switched network • computes schedules for time multiplexed network • Used smallest set of ConceptNet predicates • Java infrastructure generates VHDL netlist • Hand coded VHDL for ConceptNet PEs • Created custom multipliers instead of using onboard for speed

Methodology (cont.) • Synthesis and place-and-route using Synplicity Compiler v8.0 • Xilinx ISE v8.1i to obtain operating frequency and slice count • Long wires that constrain performance are further pipelined based on post place-and-route timing analysis • Lots of manual intervention to prepare the system

Results Three quantitative comparisons are provided to characterize the tradeoffs between packet switched and time multiplexed networks • Routing of identical topologies • Impact of area with identical area constraints • Examine performance while varying activity level (Activity Factors)

Routing identical topologies • Small numbers of PEs induce a light communication load • As the number of PEs increases, communication increases and offline routing starts to outperform online routing • Online routing requires up to 63% more cycles than offline routing for larger networks

Impact of Area • A couple of things to consider when talking about area • PE vs. Interconnect Tradeoff • Area-Time Tradeoff

PE vs. Interconnect Tradeoff Sometimes the network performs better with fewer PEs but more capacity in the network.

Area-Time Tradeoff • Packet switched and time multiplexed networks may use significantly different amounts of area due to differences in switch sizes • At smaller areas time multiplexing requires more cycles • At higher cycle counts time multiplexing requires more area for context • Performance is limited by 128 edge fanin or fanout limit

Activity Factors • Packet switching takes up to 8x as many cycles to route • At some activity factor below 100%, packet switching should be able to outperform time multiplexing for the same area

Conclusions • Demonstrated implementations of packet-switched and time-multiplexed FPGA overlay networks operating at 166 MHz • Online routing requires up to 63% more cycles than offline scheduling for equivalent topologies • Packet switching is up to 2x faster for small areas • Time multiplexing is up to 8x faster for large areas

Conclusions (cont.) Packet switching offers better performance for activity factors below 30% at 32K slices and below 5% at 100K slices

Future Work • Mapping larger communication graphs with smaller fanout limitations to fully test networks • Compress context memory for time-multiplexing • Improve efficiency of packet switching • Extend work to multiple-chip networks

Wrap-up • Paper takes a look at trade-offs involved in FPGA networks • Thought it was a good look at design decisions and gave actual guidance to the designer • Describes interesting alternative to mesh network (BFTs)

!? Questions ?!
