This chapter provides an overview of the design principles and tradeoffs in creating a switching fabric for network processing systems, with a focus on scalability and high throughput. Various fabric types, such as shared bus, shared memory, fully interconnected, crossbar, and multistage fabrics, are discussed. Queueing strategies, including input queueing, output queueing, and virtual queueing, are also explored.
ECE 526 – Network Processing Systems Design Switching Fabric D. E. Comer: chapter 10
Overview • Computation • A single CPU is not fast enough for processing packets • Multiple optimized processors (e.g., Network Processors) • Multiple ports, each powered by NPs • Storage • Memory is the bottleneck (memory wall) • On-chip, heterogeneous memories • Multi-threading to hide long memory access latency • Communication • Bus: data, control and address • Memory: can also be used for communication ECE 526
[Figure: Router data path, showing a port's Network Processor with Processing Engines, an Interconnect, the Network Interface, and I/O] • How can ports be connected? • "System backplane" • Switching Fabric: • Hardware mechanism serving as backplane by providing a path for data flow among the ports ECE 526
Switching Fabric Requirement • Connectivity: • interconnection between ports and control processor • High throughput • Low cost • Scalability • Scale with data rate on input/output (one port) • Scale with packet rate on input/output (one port) • Scale with number of input/output ports • Feature • Support transfer of unicast, multicast, and broadcast packets Conflicting requirements: tradeoffs necessary ECE 526
Switching Fabric Types • Synchronous vs. asynchronous • Sync: regular traffic (e.g., fixed cells); async: variable size (e.g., packets) • In practice synchronous fabrics are used • Packets are divided into fixed-sized cells • Multiplexing: • Time Division: single or few paths shared among many ports • Space Division: many paths used for less delay • Stage: • Single or multiple ECE 526
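As a rough illustration of the cell-based approach used by synchronous fabrics, the sketch below segments a variable-size packet into fixed-size cells and pads the last one. The 64-byte cell size and function names are assumptions for illustration, not values from the text.

```python
# Sketch: segmenting a variable-size packet into fixed-size cells
# for a synchronous fabric. CELL_SIZE is an assumed value.

CELL_SIZE = 64  # bytes per cell (assumed; real fabrics vary)

def segment(packet: bytes, cell_size: int = CELL_SIZE) -> list[bytes]:
    """Split a packet into fixed-size cells, zero-padding the last one."""
    cells = []
    for offset in range(0, len(packet), cell_size):
        cell = packet[offset:offset + cell_size]
        if len(cell) < cell_size:
            cell = cell.ljust(cell_size, b"\x00")  # pad the final cell
        cells.append(cell)
    return cells

# Example: a 150-byte packet becomes three 64-byte cells.
print(len(segment(b"\xab" * 150)))  # -> 3
```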
Switching Fabric Design Spectrum • Design spectrum • Evaluation criteria • Low cost • Performance • Scalability • Design options • Granularity of data transferred • Queuing to handle contention ECE 526
Shared Bus Fabric • Disadvantage: limited aggregate data rate; the bus hardware needs to operate N times faster than a single port • What granularity is best for bus access: packets or cells? ECE 526
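A rough worked example of the N-times-faster requirement; the port count and line rate below are assumed numbers chosen only for illustration.

```python
# Sketch: the shared bus must carry the aggregate traffic of all ports,
# so its data rate must be roughly N times a single port's rate.
# The numbers below are assumed for illustration.

ports = 16            # N input/output ports (assumed)
port_rate_gbps = 10   # line rate per port in Gbit/s (assumed)

bus_rate_gbps = ports * port_rate_gbps
print(f"Bus must sustain about {bus_rate_gbps} Gbit/s "
      f"({ports}x the per-port rate)")   # -> 160 Gbit/s
```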
Granularity for Bus • Packet • Pro: simple hardware • Con: delay, since a small packet must wait behind a large packet • Fixed-size block • Pro: multiple transfers can proceed concurrently (interleaved) • Con: overhead from switching senders ECE 526
Shared Memory Fabric • Input ports deposit packets in memory; output ports retrieve them • Problem: cost of memory interfaces ECE 526
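A minimal sketch of the shared-memory idea: the common memory is organized as one FIFO per output port, input ports write into the destination's queue, and each output port drains its own queue. The class and method names are assumptions for illustration, not a specific design from the text.

```python
from collections import deque

# Sketch: shared memory organized as one FIFO per output port.
# Input ports write into the queue of the destination port;
# output ports read from their own queue.

class SharedMemoryFabric:
    def __init__(self, num_ports: int):
        self.queues = [deque() for _ in range(num_ports)]

    def enqueue(self, out_port: int, packet: bytes) -> None:
        """Called by an input port after looking up the destination."""
        self.queues[out_port].append(packet)

    def dequeue(self, out_port: int):
        """Called by an output port when it is ready to transmit."""
        q = self.queues[out_port]
        return q.popleft() if q else None

fabric = SharedMemoryFabric(num_ports=4)
fabric.enqueue(2, b"pkt-A")
print(fabric.dequeue(2))  # -> b'pkt-A'
```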
Fully Interconnected Fabric • Separate physical data path between each port pair • Advantage and disadvantage? ECE 526
Fully Interconnected Fabric • Advantage • Up to N concurrent transfers are possible • Disadvantage • Port contention: two input ports send to the same output simultaneously • Worst case: only one transfer • Expensive: • N*M wires • Each output port needs contention circuitry • Question: is it possible to allow N concurrent transfers without N*M wires? • Observation: for each output port, only one of its N wires is in use at any time ECE 526
Crossbar Fabric • Switched interconnect • Allows multiple simultaneous connections • Controller decides on the assignment of active connections • Pros and cons? ECE 526
Crossbar Fabric • Pros: • Uses N + M wires to achieve up to min(N, M) simultaneous transfers • High aggregate data rate • Cons: requires N * M crosspoint switches ECE 526
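A minimal sketch of what a crossbar controller has to do each cycle: given requested (input, output) connections, grant a conflict-free subset in which no input and no output appears twice. This greedy matcher is an illustrative assumption, not the specific scheduling algorithm the slides refer to.

```python
# Sketch: greedy crossbar scheduling for one fabric cycle.
# Each input may request one output; the controller grants a set of
# connections in which each input and each output is used at most once.

def schedule(requests):
    """requests: list of (input_port, output_port) pairs."""
    used_in, used_out, grants = set(), set(), []
    for inp, out in requests:
        if inp not in used_in and out not in used_out:
            grants.append((inp, out))
            used_in.add(inp)
            used_out.add(out)
    return grants

# Inputs 0 and 2 both want output 1: only one is granted this cycle.
print(schedule([(0, 1), (1, 3), (2, 1)]))  # -> [(0, 1), (1, 3)]
```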
Contention • What happens if multiple input ports send to the same output port? • Causes “port contention” • Is this a problem of the fabric architecture? • How can contention be addressed? • Queues that buffer packets ECE 526
Input Queueing • Incoming packet traffic is bursty • Head-of-line blocking • The first packet in the queue is destined for output port s, which is busy • The second packet is destined for output port t, which is idle, but it cannot pass the first • Solutions for head-of-line blocking • Multiple queues for each port • Change the queue from FCFS to random access • Virtual queueing: • Uses an estimate of processing time to reserve the packet's position in the queue • Requires complex coordination • What is the right size of the input queue? • Upper bound: determined by the port rate ECE 526
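The sketch below is a simplified, assumed model contrasting a single FIFO input queue, where a packet headed to a busy output blocks a later packet headed to an idle output, with one queue per destination (virtual output queues), which avoids head-of-line blocking.

```python
from collections import deque

# Sketch: head-of-line blocking with a single FIFO vs. one
# virtual output queue (VOQ) per destination. Simplified model.

busy_outputs = {1}                  # output 1 is busy this cycle
arrivals = [("p1", 1), ("p2", 2)]   # (packet, destination output)

# Single FIFO: p1 (to busy output 1) blocks p2 (to idle output 2).
fifo = deque(arrivals)
pkt, dest = fifo[0]
sent_fifo = [] if dest in busy_outputs else [fifo.popleft()]

# VOQs: each destination has its own queue, so p2 is not blocked.
voqs = {}
for pkt, dest in arrivals:
    voqs.setdefault(dest, deque()).append((pkt, dest))
sent_voq = [voqs[d].popleft() for d in voqs if d not in busy_outputs]

print(sent_fifo)  # -> []            (head-of-line blocking)
print(sent_voq)   # -> [('p2', 2)]   (p2 departs despite p1 waiting)
```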
Output Queueing • Bursts from inputs to one output are buffered on the output side • Queue size • Too small -> packets are dropped • Too large -> packets are kept after they are no longer valid • A voice packet arrives at its destination too late • A TCP packet has already been retransmitted by the sender ECE 526
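A minimal sketch of the "too small" side of the sizing trade-off: a bounded output queue that drops arrivals when full. MAX_DEPTH and the class name are assumed for illustration.

```python
from collections import deque

# Sketch: a bounded output queue that drops arrivals when full.
# MAX_DEPTH is an assumed value; choosing it is the sizing trade-off.

MAX_DEPTH = 4

class OutputQueue:
    def __init__(self, max_depth: int = MAX_DEPTH):
        self.q = deque()
        self.max_depth = max_depth
        self.dropped = 0

    def enqueue(self, packet) -> bool:
        if len(self.q) >= self.max_depth:
            self.dropped += 1        # queue too small for this burst
            return False
        self.q.append(packet)
        return True

oq = OutputQueue()
for i in range(6):                   # a 6-packet burst into a 4-slot queue
    oq.enqueue(f"pkt{i}")
print(len(oq.q), oq.dropped)         # -> 4 2
```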
Queuing Summary • Variety of queuing designs possible • Input queuing: • Packets are buffered on the input side until the output is available • Problem: head-of-line blocking • Output queuing: • Bursts from inputs to one output are buffered on the output side • Problem: queue size, switch fabric speedup • Virtual queuing: • Partial output queues are maintained on the input side • Each port is assigned a rate or virtual time for sending • Problem: requires complex coordination ECE 526
Multistage Fabrics • None of the fabrics so far really scales • Full interconnect: cost grows quadratically with the number of ports • Crossbar: limitations in the controller • Bus: interconnect needs to be N times faster than a port • Shared memory: memory interfaces are costly and not scalable • Multistage fabrics: • Multiple "steps" between input and output • Provide multiple data paths, but not as many as a full interconnect • Packets can be queued or not at each stage • Nice properties: • Scalability • Self-routing ECE 526
Switching Elements • Switching element is basic component: • Two inputs, two outputs • Two different settings • Straight through • Crossover • Setting is determined by “label” • Multiple switching elements make up larger fabric • Various topologies possible: • Banyan • Benes • Delta • Etc. ECE 526
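A minimal sketch of a 2-input, 2-output switching element whose setting (straight through or crossover) follows from one bit of the packet's routing label. The 0 = upper / 1 = lower convention and the function name are assumptions for illustration.

```python
# Sketch: a 2x2 switching element. One bit of the routing label selects
# the output: bit 0 -> upper output, bit 1 -> lower output. If the chosen
# output matches the input the packet arrived on, the element is set
# "straight through"; otherwise it is set "crossover".

def route_element(in_port: int, label_bit: int):
    """Return (out_port, setting) for one 2x2 element."""
    out_port = label_bit                        # 0 = upper, 1 = lower
    setting = "straight" if out_port == in_port else "crossover"
    return out_port, setting

print(route_element(in_port=0, label_bit=1))    # -> (1, 'crossover')
print(route_element(in_port=1, label_bit=1))    # -> (1, 'straight')
```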
Banyan Switch • From 2-port switch to 4-port switch: • Can be applied recursively ECE 526
Banyan Switch • 8-port: • Recursive extension • Self-routing: • Based on output port “label” each stage can make local decision • Recursive structure will lead to correct destination ECE 526
Nice Features of Banyan • Scalability • Can be operated at a lower rate than a shared bus • Applies to an arbitrary number of input ports • Modular design: built from cheap 2-input switches • Self-routing • Each output port, including internal ports, is identified by a unique binary label • The sender prefixes the appropriate label to the packet • At each stage, the switch extracts the next bit and uses it to determine the path • Internal packet queues are allowed but not required • With queues, throughput increases; a packet moves forward whenever it is allowed • Without queues, the design is cheaper; a path is established first, then the packet is sent ECE 526
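The sketch below traces self-routing through an n-stage Banyan fabric: at stage i the element examines bit i of the destination label and forwards the packet to its upper or lower output accordingly. Consuming the most significant bit first and the port-numbering convention are assumptions; real designs differ in detail.

```python
# Sketch: self-routing decisions in an 8-input (3-stage) Banyan fabric.
# Each stage consumes one bit of the 3-bit destination label,
# most significant bit first (an assumed convention).

def self_route(dest_port: int, stages: int = 3):
    """Return the per-stage output choice (0 = upper, 1 = lower)."""
    label = format(dest_port, f"0{stages}b")   # e.g. 5 -> '101'
    return [int(bit) for bit in label]

# A packet addressed to output 5 takes lower, upper, lower.
print(self_route(5))   # -> [1, 0, 1]
```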
Banyan Switch • Scalability due to recursive structure • Contention can occur • How? • Usually due to overloading of one output • Other multistage switch fabrics: • Different recursive rules • Additional stages: distribution to avoid blocking • Practical note: • Standards define Switch Fabric Interfaces ECE 526
Exercise 1 • Draw a Banyan network for 8 inputs in full detail, and label each intermediate connection with the label prefix used to reach it. ECE 526
Exercise 2 • Derive the number of stages required in a Banyan switch with 2^n inputs. How many 2-input switches are required? How many queues are required if queues are allowed? ECE 526
Exercise 3 • What is the maximum number of transfers that can occur in an 8-input Banyan switch in the best case? Does the answer change with or without buffers? (A transfer is defined as a connection from a switch input to a switch output.) • What is the maximum number of transfers that can occur in an 8-input Banyan switch in the worst case? Does the answer change with or without buffers? ECE 526
Summary • Switching fabric provides connections inside a single network system • Two basic approaches • Time-division has the lowest cost • Space-division has the highest performance • Multistage designs compromise between the two • Banyan fabric is an example of a multistage fabric