This presentation discusses various scheduling algorithms for input-queued IP routers, including optimal, heuristic, and packet-mode algorithms. It also covers topics such as networks of routers, CIOQ routers, and multicast traffic.
Scheduling algorithms for input-queued IP routers Emilio Leonardi in collaboration with: P. Giaccone, M. Ajmone Marsan, A. Bianco, M. Mellia, F. Neri Dipartimento di Elettronica Telecommunication Network Group http://www.tlc-networks.polito.it Politecnico di Torino (Italy) Budapest, March 2006
Outline • IP routers • OQ routers • IQ routers • Scheduling • Optimal algorithms • Heuristic algorithms • Packet-mode algorithms • Networks of routers • CIOQ routers • Multicast traffic • Conclusions
Note The slides marked RWP are reproduced with permission of Prof. Nick McKeown from the Electrical Engineering and Computer Science Dept. of Stanford University (CA, USA)
“The Internet is a mesh of routers” core router access router enterprise router
“The Internet is a mesh of routers” Access router: • high number of ports at low speed (kbps/Mbps) • several access protocols (modem, ADSL, cable) Enterprise router: • medium number of ports at high speed (Mbps) • several services (IP classification, filtering) Core router: • moderate number of ports at very high speed (Mbps/Gbps) • very high throughput
Basic functions • Routing • computation of the output port of an incoming packet • uses the routing tables computed by the routing protocols • can be a complex procedure: • very large routing tables • dynamic variation of routes in the Internet
Basic functions • Switching • transfer of packets from input ports to output ports • solution of the contentions for output ports • queueing • where to store • scheduling • what to transfer
Faster and faster • Need for high performance routers • to accommodate the bandwidth demands for new users and new services • to support QoS • to reduce costs
Packet processing and link speed • The increase in electronic packet processing power cannot keep up with the increase in link speed [Figure: packet processing power (Moore’s law, 2x / 18 months) and link speed (fiber capacity in Gbit/s, 2x / 7 months) versus year, 1985 to 2000, on a log scale from 0.1 to 10000, across the TDM and DWDM eras] Source: SPEC95Int & David Miller, Stanford. RWP
Memory access time [Figure: memory access time improves 1.1x / 18 months, against Moore’s law at 2x / 18 months] RWP
Moore’s law It’s hard to keep up with Moore’s law: • the bottleneck is memory speed Moore’s law is too slow: • routers need to improve faster than Moore’s law RWP
Router capacity exceeds Moore’s law Growth in capacity of commercial routers: • 1992 ~ 2 Gb/s • 1995 ~ 10 Gb/s • 1998 ~ 40 Gb/s • 2001 ~ 160 Gb/s • 2003 ~ 640 Gb/s Average growth rate: 2.2x / 18 months RWP
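The average growth rate quoted above can be checked from the first and last data points alone. The sketch below assumes steady exponential growth between 1992 and 2003; the function name `growth_per_period` is only illustrative.

```python
def growth_per_period(c_start, c_end, years, period_months=18):
    """Average multiplicative growth per period, assuming steady exponential growth."""
    periods = years * 12 / period_months
    return (c_end / c_start) ** (1 / periods)

# 2 Gb/s in 1992 and 640 Gb/s in 2003, as listed on the slide
rate = growth_per_period(2, 640, 2003 - 1992)
print(round(rate, 1))  # growth factor per 18 months
```

Eleven years correspond to about 7.3 periods of 18 months, and a 320-fold increase over that span works out to roughly 2.2x per period.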
Single packet processing • The time to process one packet is becoming shorter and shorter • worst case: 40-byte packets (ACKs) travelling over the Internet • 3.2 µs at 100 Mbps • 320 ns at 1 Gbps • 32 ns at 10 Gbps • 3.2 ns at 100 Gbps • 320 ps at 1 Tbps
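The per-packet time budget is simply the packet size in bits divided by the line rate. A minimal sketch, using integer arithmetic in picoseconds to avoid rounding noise (the helper name `packet_time_ps` is an assumption of this example):

```python
def packet_time_ps(size_bytes, rate_bps):
    """Transmission time of one packet in picoseconds (integer arithmetic)."""
    return size_bytes * 8 * 10**12 // rate_bps

line_rates = {
    "100 Mbps": 10**8,
    "1 Gbps": 10**9,
    "10 Gbps": 10**10,
    "100 Gbps": 10**11,
    "1 Tbps": 10**12,
}

# Worst case from the slide: 40-byte packets
budget = {name: packet_time_ps(40, bps) for name, bps in line_rates.items()}
for name, ps in budget.items():
    print(f"{name}: {ps} ps")
```

A 40-byte packet is 320 bits, so each tenfold increase in line rate cuts the available time by a factor of ten, from 3 200 000 ps (3.2 µs) at 100 Mbps down to 320 ps at 1 Tbps.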
Hardware architecture physical structure logical structure
Hardware architecture Main elements • line cards • support input/output transmissions • store packets • adapt packets to the internal format of the switching fabric • support data link protocols • classify packets • schedule packets • support security • switching fabric • transfers packets from input ports to output ports
Hardware architecture Main elements • control processor/network processor • runs routing protocols • computes routing tables • manages the overall system • forwarding engines • compute the packet destination (lookup) • inspect packet headers • rewrite packet headers
Interconnections among main elements - I control processor switching fabric forwarding engine forwarding engine line card line card 1 N
Interconnections among main elements - II control processor switching fabric line card & forwarding engine line card & forwarding engine 1 N
Cell-based routers ISM: Input-Segmentation Module ORM: Output-Reassembly Module packet: variable-size data unit cell: fixed-size data unit ORM ISM 1 ISM ORM N Cell switch (fabric) cells packets packets cells 1 N
Switching fabric • Our assumptions: • bufferless • to reduce internal hardware complexity • non-blocking • it is always possible to transfer in parallel from input to output ports any non-conflicting set of cells
Switching fabric 1 2 inputs 3 4 4 2 3 1 outputs • Examples: • crossbar • rearrangeable Clos network • Benes network • Batcher-Banyan network (self-routing) • Switching constraints • at most one cell for each input and for each output can be transferred
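The switching constraint can be stated as a simple check on a set of (input, output) transfers: no input and no output may appear twice. The following is a minimal sketch; the function name and the example sets are illustrative and not taken from the slides.

```python
def is_conflict_free(transfers):
    """True if every input and every output appears at most once."""
    inputs = [i for i, _ in transfers]
    outputs = [j for _, j in transfers]
    return len(set(inputs)) == len(inputs) and len(set(outputs)) == len(outputs)

# A full permutation of a 4x4 switch: all four cells can move in parallel
ok = is_conflict_free([(1, 4), (2, 2), (3, 3), (4, 1)])

# Two inputs contend for output 2: not allowed in the same time slot
clash = is_conflict_free([(1, 2), (3, 2)])
print(ok, clash)
```

A non-blocking fabric is one that can realize any set of transfers for which this check returns true.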
Switching fabric • We do not discuss switching fabrics with internal buffers • e.g.: crossbars with buffer at each crosspoint
Generic switching architecture Sin Sout Output 1 Input 1 switching fabric Sout Sin Output N Input N output queues input queues
Speedup • The speedup determines the switch performance: • Sin = reading speed from input queues • Sout = writing speed to output queues • maximum speedup factor: S = max(Sin, Sout)
Performance comparison • The performance of different switching systems can be studied • with analytical models • introducing simplifying assumptions, but obtaining general results • with simulation models • obtaining more detailed results
Traffic description • $A_{ij}(n) = 1$ if a packet arrives at time n at input i, with destination reachable through output j • $\lambda_{ij} = E[A_{ij}(n)]$ • An arrival process is admissible if: • $\sum_i \lambda_{ij} < 1$ • $\sum_j \lambda_{ij} < 1$ • that is, no input and no output are overloaded on average • note that OQ switches exhibit finite delays only for admissible traffic • traffic matrix: $\Lambda = [\lambda_{ij}]$
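The admissibility condition can be checked directly on a traffic matrix of average arrival rates. The sketch below assumes the condition is a strict inequality on every row sum (load of an input) and every column sum (load of an output); the function name and the example matrices are illustrative.

```python
def is_admissible(rates):
    """rates[i][j] = average arrival rate from input i to output j."""
    n = len(rates)
    inputs_ok = all(sum(row) < 1 for row in rates)
    outputs_ok = all(sum(rates[i][j] for i in range(n)) < 1 for j in range(n))
    return inputs_ok and outputs_ok

# Each input sends 0.4 to each of two outputs: all loads are 0.8
balanced = [[0.4, 0.4],
            [0.4, 0.4]]

# Both inputs send mostly to output 0, which receives 1.2 on average
hotspot = [[0.6, 0.1],
           [0.6, 0.1]]

print(is_admissible(balanced), is_admissible(hotspot))
```

In the second matrix no input is overloaded, yet output 0 is, so the traffic is not admissible and no switch architecture can sustain it.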
Traffic scenarios • Uniform traffic • Bernoulli i.i.d. arrivals • usual testbed in the literature • “easy to schedule” • Diagonal traffic • Bernoulli i.i.d. arrivals • critical to schedule, since only two matchings are good
Traffic scenarios • LogDiagonal traffic • Bernoulli i.i.d. arrivals • more critical than uniform, less than diagonal traffic
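As a rough illustration of these scenarios, the sketch below builds rate matrices for uniform and diagonal traffic at a given input load. The slides do not give the exact definitions: the 2/3 and 1/3 split used for the diagonal case is an assumption of this example, chosen so that only two matchings (the main diagonal and its cyclic shift) carry traffic. LogDiagonal traffic is not sketched because its definition is not stated here.

```python
from fractions import Fraction

def uniform_traffic(n, load):
    """Every input spreads its load evenly over all n outputs."""
    return [[load / n for _ in range(n)] for _ in range(n)]

def diagonal_traffic(n, load):
    """Assumed split (n >= 2): 2/3 of the load to output i, 1/3 to output (i+1) mod n."""
    rates = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        rates[i][i] = load * Fraction(2, 3)
        rates[i][(i + 1) % n] = load * Fraction(1, 3)
    return rates

rho = Fraction(9, 10)          # load offered by each input
uni = uniform_traffic(4, rho)
diag = diagonal_traffic(4, rho)

for row in diag:
    print([str(x) for x in row])
```

Both matrices load every input and every output at the same level, so they are equally admissible; they differ only in how many distinct matchings a scheduler can usefully pick.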
Output Queued (OQ) switches • Sin = 1 Sout = N • used for low bandwidth routers • no coordination among ports • work-conserving • best average delays • complete control of delays • support of QoS scheduling
Output Queued (OQ) switch speedup N Output 1 Input 1 switching fabric Output N Input N
OQ performance Uniform traffic Note: OQ is optimal from the point of view of average delay and throughput OQ
Simple Input Queued (IQ) switches Input 1 Output 1 switching fabric Input N Output N • Sin = 1 Sout = 1 • 1 FIFO queue for each input port • throughput limitations • due to head-of-line (HOL) blocking • scheduling • to solve contentions for the same output
Simple IQ switch performance Uniform traffic Simple IQ OQ
Improving simple IQ switches • Window/bypass schedulers • the first w cells of each queue contend for outputs • HOL blocking is reduced, not eliminated • w = 1 means FIFO at each input • higher complexity • the scheduler deals with wN cells • non-FIFO queues
Improving IQ switches • Virtual output queueing (VOQ) • one queue for each input/output pair • N queues at each input • $N^2$ queues in the whole switch • eliminates HOL blocking • used in high-bandwidth routers • scheduling implemented in hardware at very high speed
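A small data-structure sketch may help show why VOQ removes HOL blocking. The class below keeps one FIFO per input/output pair; the class and method names are hypothetical and the scheduler is left out, since the matching is passed in from outside.

```python
from collections import deque

class VOQSwitch:
    """N x N switch with one FIFO queue per (input, output) pair."""

    def __init__(self, n):
        self.n = n
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]

    def enqueue(self, i, j, cell):
        self.voq[i][j].append(cell)

    def queue_lengths(self):
        return [[len(q) for q in row] for row in self.voq]

    def transfer(self, matching):
        """matching maps input -> output and must not reuse any output."""
        assert len(set(matching.values())) == len(matching)
        delivered = {}
        for i, j in matching.items():
            if self.voq[i][j]:
                delivered[j] = self.voq[i][j].popleft()
        return delivered

sw = VOQSwitch(2)
sw.enqueue(0, 0, "a")   # input 0 -> output 0
sw.enqueue(0, 1, "b")   # input 0 -> output 1, arrived after "a"
sw.enqueue(1, 0, "c")   # input 1 -> output 0

# Input 1 gets output 0; input 0 can still send "b" to output 1
delivered = sw.transfer({0: 1, 1: 0})
print(delivered)
```

With a single FIFO at input 0, cell "b" would wait behind "a" whenever "a" loses the contention for output 0. With separate queues it can be served in the same slot.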
IQ switches with VOQ input constraints output constraints 1 Input 1 Output 1 switching fabric N 1 Input N Output N N scheduler Note: from now on, we always assume VOQ at the switch inputs
Scheduling in IQ switches • Scheduling can be modeled as a matching problem in a bipartite graph • the edge from node i to node j refers to packets at input i and directed to output j • the weight of the edge can be • binary (not empty/empty queue) • queue length • HOL cell waiting time, or cell age • some other metric indicating the priority of the HOL cell to be served
Scheduling in IQ switches Request Graph Matching (or Permutation) inputs outputs scheduler
Scheduling in IQ switches scheduler
Request Matrix
3 5 0 0
2 0 0 4
4 5 0 0
0 0 8 2
Permutation
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
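The relation between the request matrix and the permutation on this slide can be verified mechanically. The sketch below assumes both 16-value sequences are 4x4 matrices read row by row, with rows as inputs and columns as outputs; the helper names are illustrative.

```python
request = [[3, 5, 0, 0],
           [2, 0, 0, 4],
           [4, 5, 0, 0],
           [0, 0, 8, 2]]

permutation = [[0, 1, 0, 0],
               [0, 0, 0, 1],
               [1, 0, 0, 0],
               [0, 0, 1, 0]]

def is_permutation_matrix(p):
    """Exactly one 1 in every row and every column, zeros elsewhere."""
    n = len(p)
    unit = [0] * (n - 1) + [1]
    return (all(sorted(row) == unit for row in p)
            and all(sorted(col) == unit for col in zip(*p)))

def matching_weight(req, p):
    """Sum of the request weights on the selected input/output pairs."""
    return sum(req[i][j] * p[i][j]
               for i in range(len(req)) for j in range(len(req)))

pairs = [(i, row.index(1)) for i, row in enumerate(permutation)]
weight = matching_weight(request, permutation)
print(pairs, weight)
```

Every selected pair corresponds to a non-empty queue, so the permutation is a valid matching on the request graph, and its total weight is 5 + 4 + 4 + 8 = 21.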
Implementing schedulers • Scheduling is a complex task • a scheduling algorithm can be implemented in hardware if: • it shows good performance for a wide range of traffic patterns • it can be efficiently parallelized • it can be efficiently pipelined • it requires few iterations (or clock cycles) • it requires limited control information
Scheduling uniform traffic • A number of algorithms give 100% throughput when traffic is uniform • For example: • TDM and a few variants • iSLIP (see later) Example of TDM for a 4x4 switch RWP
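A TDM scheduler can be sketched as a fixed cyclic sequence of permutations. The rule used below, where input i is connected to output (i + slot) mod N, is one common choice and is an assumption of this example rather than the exact pattern shown in the figure.

```python
def tdm_matching(n, slot):
    """Output assigned to each input in the given time slot."""
    return [(i + slot) % n for i in range(n)]

n = 4
frame = [tdm_matching(n, t) for t in range(n)]
for t, m in enumerate(frame):
    print(f"slot {t}: " + ", ".join(f"{i}->{j}" for i, j in enumerate(m)))

# Count how often each input/output pair is served within one frame
served = {(i, j): 0 for i in range(n) for j in range(n)}
for m in frame:
    for i, j in enumerate(m):
        served[(i, j)] += 1
```

Each pair is served exactly once every N slots, so every VOQ receives a service rate of 1/N. That matches uniform traffic, where each input/output pair carries less than 1/N of a cell per slot under admissible load.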
Birkhoff - von Neumann theorem Any doubly stochastic matrix $\Lambda$ can be expressed as a convex combination of permutation matrices $\pi_n$: $\Lambda = \sum_n \alpha_n \pi_n$ with $\alpha_n \ge 0$ and $\sum_n \alpha_n = 1$
Scheduling non-uniform traffic • Thanks to the Birkhoff - von Neumann theorem, if the traffic is known and admissible, 100% throughput can be achieved by a TDM using: • for a fraction of time $\alpha_1$ matching $M_1$ ($\pi_1$) • for a fraction of time $\alpha_2$ matching $M_2$ ($\pi_2$) • ... • for a fraction of time $\alpha_k$ matching $M_k$ ($\pi_k$)
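The decomposition behind this TDM schedule can be computed greedily: find a perfect matching on the positive entries, give it a weight equal to its smallest entry, subtract, and repeat. The sketch below is one straightforward way to do this, not necessarily the procedure the authors have in mind. It assumes the input is exactly doubly stochastic and uses exact fractions; all function names are illustrative.

```python
from fractions import Fraction

def perfect_matching(support, n):
    """Augmenting-path search; support[i] lists the outputs allowed for input i."""
    match_out = [-1] * n          # match_out[j] = input currently matched to output j

    def augment(i, seen):
        for j in support[i]:
            if j not in seen:
                seen.add(j)
                if match_out[j] == -1 or augment(match_out[j], seen):
                    match_out[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, set()):
            return None
    perm = [0] * n
    for j, i in enumerate(match_out):
        perm[i] = j
    return perm                   # perm[i] = output matched to input i

def bvn_decompose(matrix):
    """Return a list of (coefficient, permutation) terms summing to the matrix."""
    n = len(matrix)
    m = [[Fraction(x) for x in row] for row in matrix]
    terms = []
    while any(x > 0 for row in m for x in row):
        support = [[j for j in range(n) if m[i][j] > 0] for i in range(n)]
        perm = perfect_matching(support, n)
        if perm is None:
            raise ValueError("matrix is not doubly stochastic")
        coeff = min(m[i][perm[i]] for i in range(n))
        for i in range(n):
            m[i][perm[i]] -= coeff   # at least one entry drops to zero
        terms.append((coeff, perm))
    return terms

half = Fraction(1, 2)
lam = [[half, half, 0],
       [half, 0, half],
       [0, half, half]]
terms = bvn_decompose(lam)
for coeff, perm in terms:
    print(coeff, perm)
```

Each term says for which fraction of the time the TDM frame should use a given matching. An admissible matrix whose row and column sums are below 1 would first have to be padded up to a doubly stochastic one, a step this sketch leaves out.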
Maximum Size Matching • Maximum Size Matching (MSM) • among all the possible matchings, selects the one with the highest number of edges • the MSM is generally not unique • the best MSM algorithm requires $O(N^{2.5})$ iterations, and cannot be implemented efficiently, since it is based on a flow augmenting-path algorithm
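To make the augmenting-path idea concrete, the sketch below computes a maximum size matching with the simple one-path-at-a-time method. This is not the $O(N^{2.5})$ algorithm mentioned on the slide: it is a slower textbook variant, chosen here only because it is short. The function name and the example request matrices are illustrative.

```python
def maximum_size_matching(requests):
    """requests[i][j] is truthy if input i has at least one cell for output j."""
    n_in, n_out = len(requests), len(requests[0])
    match_out = [-1] * n_out      # match_out[j] = input matched to output j

    def augment(i, seen):
        # Try to match input i, re-routing earlier inputs if necessary
        for j in range(n_out):
            if requests[i][j] and j not in seen:
                seen.add(j)
                if match_out[j] == -1 or augment(match_out[j], seen):
                    match_out[j] = i
                    return True
        return False

    size = sum(augment(i, set()) for i in range(n_in))
    matching = {i: j for j, i in enumerate(match_out) if i != -1}
    return size, matching

# Input 0 can reach both outputs, input 1 only output 0
size, matching = maximum_size_matching([[1, 1],
                                        [1, 0]])
print(size, matching)

# Queue lengths act as binary requests (non-zero means non-empty)
size4, _ = maximum_size_matching([[3, 5, 0, 0],
                                  [2, 0, 0, 4],
                                  [4, 5, 0, 0],
                                  [0, 0, 8, 2]])
```

In the 2x2 example a greedy choice of input 0 to output 0 would leave input 1 unserved. The augmenting step moves input 0 to output 1 so that both inputs are matched, and this backtracking is what makes the algorithm hard to parallelize or pipeline.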