
Scheduling algorithms for input-queued IP routers

This presentation discusses various scheduling algorithms for input-queued IP routers, including optimal, heuristic, and packet-mode algorithms. It also covers topics such as networks of routers, CIOQ routers, and multicast traffic.


Presentation Transcript


  1. Scheduling algorithms for input-queued IP routers Emilio Leonardi in collaboration with: P. Giaccone, M. Ajmone Marsan, A. Bianco, M. Mellia, F. Neri Dipartimento di Elettronica Telecommunication Network Group http://www.tlc-networks.polito.it Politecnico di Torino (Italy) Budapest, March 2006

  2. Outline • IP routers • OQ routers • IQ routers • Scheduling • Optimal algorithms • Heuristic algorithms • Packet-mode algorithms • Networks of routers • CIOQ routers • Multicast traffic • Conclusions

  3. Note The slides marked RWP are reproduced with permission of Prof. Nick McKeown from the Electrical Engineering and Computer Science Dept. of Stanford University (CA, USA)

  4. Outline • IP routers • OQ routers • IQ routers • Scheduling • Optimal algorithms • Heuristic algorithms • Packet-mode algorithms • Networks of routers • CIOQ routers • Multicast traffic • Conclusions

  5. “The Internet is a mesh of routers” core router access router enterprise router

  6. “The Internet is a mesh of routers” Access router: • high number of ports at low speed (kbps/Mbps) • several access protocols (modem, ADSL, cable) Enterprise router: • medium number of ports at high speed (Mbps) • several services (IP classification, filtering) Core router: • moderate number of ports at very high speed (Mbps/Gbps) • very high throughput

  7. Basic functions • Routing • computation of the output port of an incoming packet • uses the routing tables computed by the routing protocols • can be a complex procedure: • very large routing tables • dynamic variation of routes in the Internet

  8. Basic functions • Switching • transfer of packets from input ports to output ports • solution of the contentions for output ports • queueing • where to store • scheduling • what to transfer

  9. Faster and faster • Need for high performance routers • to accommodate the bandwidth demands for new users and new services • to support QoS • to reduce costs

  10. Packet processing and link speed • Increase of electronic packet processing power cannot accommodate the increase in link speed [Figure: packet processing power (Moore’s law, 2x / 18 months) versus link speed (fiber capacity in Gbit/s, 2x / 7 months, from TDM to DWDM), 1985 to 2000. Source: SPEC95Int & David Miller, Stanford.] RWP

  11. Memory access time • Memory access time: 1.1x / 18 months • Moore’s law: 2x / 18 months RWP

  12. Moore’s law It’s hard to keep up with Moore’s law: • the bottleneck is memory speed Moore’s law is too slow: • routers need to improve faster than Moore’s law RWP

  13. Router capacity exceeds Moore’s law Growth in capacity of commercial routers: • 1992 ~ 2 Gb/s • 1995 ~ 10 Gb/s • 1998 ~ 40 Gb/s • 2001 ~ 160 Gb/s • 2003 ~ 640 Gb/s Average growth rate: 2.2x / 18 months RWP
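The quoted average can be checked from the first and last data points on this slide. The snippet below is a quick sketch that assumes compound growth between 1992 and 2003; the function name is illustrative and not part of the presentation.

```python
def growth_per_period(c_start, c_end, years, period_months=18):
    """Compound growth factor per period between two capacity values."""
    periods = years * 12 / period_months
    return (c_end / c_start) ** (1 / periods)

# Data points from the slide: ~2 Gb/s in 1992, ~640 Gb/s in 2003.
rate = growth_per_period(2, 640, 2003 - 1992)
print(f"{rate:.1f}x / 18 months")  # prints 2.2x / 18 months
```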

  14. Single packet processing • The time to process one packet is becoming shorter and shorter • worst case: 40-Byte packets (ACKs) travelling over the Internet • 3.2 µs at 100 Mbps • 320 ns at 1 Gbps • 32 ns at 10 Gbps • 3.2 ns at 100 Gbps • 320 ps at 1 Tbps
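These figures follow from the packet size alone: a 40-byte packet is 320 bits, and the time available per packet is 320 bits divided by the link rate. A minimal sketch of that arithmetic (the function name is an assumption for illustration):

```python
def transmission_time(size_bytes, rate_bps):
    """Seconds needed to transmit a packet of size_bytes at rate_bps."""
    return size_bytes * 8 / rate_bps

# Link rates listed on the slide, with a 40-byte worst-case packet.
for label, rate in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9),
                    ("100 Gbps", 100e9), ("1 Tbps", 1e12)]:
    print(f"{label:>8}: {transmission_time(40, rate) * 1e9:g} ns")
```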

  15. Hardware architecture physical structure logical structure

  16. Hardware architecture Main elements • line cards • support input/output transmissions • store packets • adapt packets to the internal format of the switching fabric • support data link protocols • classify packets • schedule packets • support security • switching fabric • transfers packets from input ports to output ports

  17. Hardware architecture Main elements • control processor/network processor • runs routing protocols • computes routing tables • manages the overall system • forwarding engines • compute the packet destination (lookup) • inspect packet headers • rewrite packet headers

  18. Interconnections among main elements - I [Diagram: control processor, switching fabric, forwarding engines, and line cards 1 to N shown as separate elements]

  19. Interconnections among main elements - II [Diagram: control processor, switching fabric, and combined line card & forwarding engine units 1 to N]

  20. Cell-based routers • packet: variable-size data unit • cell: fixed-size data unit • ISM: Input-Segmentation Module • ORM: Output-Reassembly Module [Diagram: at each of the ports 1 to N, an ISM turns incoming packets into cells for the cell switch (fabric), and an ORM turns cells back into outgoing packets]

  21. Switching fabric • Our assumptions: • bufferless • to reduce internal hardware complexity • non-blocking • it is always possible to transfer in parallel from input to output ports any non-conflicting set of cells

  22. Switching fabric [Diagram: inputs 1 to 4 connected to outputs 1 to 4] • Examples: • crossbar • rearrangeable Clos network • Benes network • Batcher-Banyan network (self-routing) • Switching constraints • at most one cell for each input and for each output can be transferred

  23. Switching fabric • We do not discuss switching fabrics with internal buffers • e.g.: crossbars with buffer at each crosspoint

  24. Generic switching architecture [Diagram: inputs 1 to N with input queues read at speed Sin, the switching fabric, and outputs 1 to N with output queues written at speed Sout]

  25. Speedup • The speedup determines the switch performance: • Sin = reading speed from input queues • Sout = writing speed to output queues • maximum speedup factor: S = max(Sin,Sout)

  26. Performance comparison • The performance of different switching systems can be studied • with analytical models • introducing simplifying assumptions, but obtaining general results • with simulation models • obtaining more detailed results

  27. Traffic description • $A_{ij}(n) = 1$ if a packet arrives at time $n$ at input $i$, with destination reachable through output $j$ • $\lambda_{ij} = E[A_{ij}(n)]$ • An arrival process is admissible if: • $\sum_i \lambda_{ij} \le 1$ • $\sum_j \lambda_{ij} \le 1$ • that is, no input and no output are overloaded on average • note that OQ switches exhibit finite delays only for admissible traffic • traffic matrix: $\Lambda = [\lambda_{ij}]$
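The admissibility condition amounts to checking every row sum and every column sum of the traffic matrix. A minimal sketch, assuming the matrix is given as a list of rows indexed by input; the function name, the tolerance, and the example matrices are illustrative and not taken from the slides:

```python
def is_admissible(rates, eps=1e-12):
    """True if no input (row) and no output (column) is overloaded on average."""
    n = len(rates)
    rows_ok = all(sum(rates[i]) <= 1 + eps for i in range(n))
    cols_ok = all(sum(rates[i][j] for i in range(n)) <= 1 + eps for j in range(n))
    return rows_ok and cols_ok

# Uniform traffic at load 0.9 on a 4x4 switch: every entry is 0.9 / 4.
uniform = [[0.9 / 4] * 4 for _ in range(4)]
# Two inputs both sending 0.6 to output 0: that output is overloaded.
overloaded = [[0.6, 0.0], [0.6, 0.0]]

print(is_admissible(uniform), is_admissible(overloaded))
```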

  28. Traffic scenarios • Uniform traffic • Bernoulli i.i.d. arrivals • usual testbed in the literature • “easy to schedule” • Diagonal traffic • Bernoulli i.i.d. arrivals • critical to schedule, since only two matchings are good

  29. Traffic scenarios • LogDiagonal traffic • Bernoulli i.i.d. arrivals • more critical than uniform, less than diagonal traffic

  30. Outline • IP routers • OQ routers • IQ routers • Scheduling • Optimal algorithms • Heuristic algorithms • Packet-mode algorithms • Networks of routers • CIOQ routers • Multicast traffic • Conclusions

  31. Output Queued (OQ) switches • Sin = 1 Sout = N • used for low bandwidth routers • no coordination among ports • work-conserving • best average delays • complete control of delays • support of QoS scheduling

  32. Output Queued (OQ) switch [Diagram: inputs 1 to N and outputs 1 to N connected by a switching fabric with speedup N]

  33. OQ performance [Plot: OQ performance under uniform traffic] Note: OQ is optimal from the point of view of average delay and throughput

  34. Outline • IP routers • OQ routers • IQ routers • Scheduling • Optimal algorithms • Heuristic algorithms • Packet-mode algorithms • Networks of routers • CIOQ routers • Multicast traffic • Conclusions

  35. Simple Input Queued (IQ) switches [Diagram: inputs 1 to N connected to outputs 1 to N through the switching fabric] • Sin = 1 Sout = 1 • 1 FIFO queue for each input port • throughput limitations • due to head of the line (HOL) blocking • scheduling • to solve contentions for the same output

  36. Head of the Line (HOL) Blocking RWP

  37. Simple IQ switch performance [Plot: simple IQ compared with OQ under uniform traffic]

  38. Improving simple IQ switches • Window/bypass schedulers • the first w cells of each queue contend for outputs • HOL blocking is reduced, not eliminated • w = 1 means FIFO at each input • higher complexity • the scheduler deals with wN cells • non-FIFO queues
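The effect of the window size can be seen on a two-input example. The sketch below is a simplified model, not the scheduler from the presentation: it assumes inputs are visited in index order and that each input greedily takes the first cell within its window whose output is still free.

```python
def window_schedule(queues, w):
    """Return {input: output} for the cells transferred in one time slot.

    queues[i] lists the output ports requested by the cells waiting at
    input i, head of the line first. Only the first w cells may contend.
    """
    taken = set()   # outputs already assigned in this slot
    match = {}
    for i, queue in enumerate(queues):
        for out in queue[:w]:
            if out not in taken:
                taken.add(out)
                match[i] = out
                break
    return match

# Both HOL cells want output 0; input 1 also holds a cell for idle output 2.
queues = [[0, 1], [0, 2]]
print(window_schedule(queues, 1))  # {0: 0}        input 1 is HOL-blocked
print(window_schedule(queues, 2))  # {0: 0, 1: 2}  the window bypasses the blocked cell
```

With w = 1 the cell for output 2 waits behind a blocked head-of-line cell even though output 2 is idle; with w = 2 it is served, at the cost of non-FIFO service.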

  39. Improving IQ switches • Virtual output queueing (VOQ) • one queue for each input/output pair • N queues at each input • $N^2$ queues in the whole switch • eliminates HOL blocking • used in high-bandwidth routers • scheduling implemented in hardware at very high speed

  40. IQ switches with VOQ [Diagram: each input 1 to N holds N virtual output queues; a scheduler selects the transfers through the switching fabric to outputs 1 to N, subject to input constraints and output constraints] Note: from now on, we always assume VOQ at the switch inputs

  41. Outline • IP routers • OQ routers • IQ routers • Scheduling • Optimal algorithms • Heuristic algorithms • Packet-mode algorithms • Networks of routers • CIOQ routers • Multicast traffic • Conclusions

  42. Scheduling in IQ switches • Scheduling can be modeled as a matching problem in a bipartite graph • the edge from node i to node j refers to packets at input i and directed to output j • the weight of the edge can be • binary (not empty/empty queue) • queue length • HOL cell waiting time, or cell age • some other metric indicating the priority of the HOL cell to be served

  43. Scheduling in IQ switches [Diagram: the scheduler turns the request graph between inputs and outputs into a matching (or permutation)]

  44. Scheduling in IQ switches [Diagram: the scheduler turns the request matrix into a permutation]
      Request matrix:
        3 5 0 0
        2 0 0 4
        4 5 0 0
        0 0 8 2
      Permutation:
        0 1 0 0
        0 0 0 1
        1 0 0 0
        0 0 1 0
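The example on this slide can be checked mechanically, reading the 16 values of each matrix as 4 rows of 4 (rows are inputs, columns are outputs). The sketch below verifies that the permutation respects the switching constraints and computes the weight it serves. The slide does not say how the permutation was chosen; the brute-force search over all permutations is added here only for comparison, and all function names are illustrative.

```python
from itertools import permutations

requests = [
    [3, 5, 0, 0],
    [2, 0, 0, 4],
    [4, 5, 0, 0],
    [0, 0, 8, 2],
]
permutation = [
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]

def is_permutation_matrix(m):
    """Exactly one 1 in every row and every column, all other entries 0."""
    n = len(m)
    binary = all(x in (0, 1) for row in m for x in row)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(m[i][j] for i in range(n)) == 1 for j in range(n))
    return binary and rows_ok and cols_ok

def matching_weight(req, perm):
    """Sum of the request weights selected by the permutation matrix."""
    return sum(r * p for req_row, perm_row in zip(req, perm)
               for r, p in zip(req_row, perm_row))

def best_weight(req):
    """Largest weight over all n! permutations (only viable for small n)."""
    n = len(req)
    return max(sum(req[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(is_permutation_matrix(permutation))
print(matching_weight(requests, permutation), best_weight(requests))
```

For this matrix the slide's permutation serves a total weight of 21, which is also the largest weight any permutation can reach.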

  45. Implementing schedulers • Scheduling is a complex task • a scheduling algorithm can be implemented in hardware if: • it shows good performance for a wide range of traffic patterns • it can be efficiently parallelized • it can be efficiently pipelined • it requires few iterations (or clock cycles) • it requires limited control information

  46. Scheduling uniform traffic • A number of algorithms give 100% throughput when traffic is uniform • For example: • TDM and a few variants • iSLIP (see later) Example of TDM for a 4x4 switch RWP
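The TDM figure is not part of this transcript, so the following is only a sketch of one common TDM choice, assumed here: a cyclic shift in which input i is connected to output (i + t) mod N in slot t. Over N consecutive slots every input/output pair is served exactly once, which is why a fixed schedule of this kind suffices for uniform traffic.

```python
def tdm_matching(n, slot):
    """Output connected to each input in the given slot (cyclic shift)."""
    return [(i + slot) % n for i in range(n)]

n = 4
for slot in range(n):
    # Each printed list is a permutation of the outputs 0..n-1.
    print(slot, tdm_matching(n, slot))
```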

  47. Birkhoff - von Neumann theorem Any doubly stochastic matrix $\Lambda$ can be expressed as a convex combination of permutation matrices $\pi_n$: $\Lambda = \sum_n a_n \pi_n$ with $a_n \ge 0$ and $\sum_n a_n = 1$

  48. Scheduling non-uniform traffic • Thanks to the Birkhoff - von Neumann theorem, if the traffic is known and admissible, 100% throughput can be achieved by a TDM using: • for a fraction of time $a_1$ matching $M_1$ ($\pi_1$) • for a fraction of time $a_2$ matching $M_2$ ($\pi_2$) • for a fraction of time $a_k$ matching $M_k$ ($\pi_k$)
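One way to obtain the coefficients and matchings is a greedy decomposition: pick a permutation whose entries are all positive in the remaining matrix, subtract as much of it as possible, and repeat. The sketch below assumes a doubly stochastic input given as exact fractions and finds each permutation by brute force, which is only reasonable for small N; the function name and the example matrix are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def bvn_decompose(matrix):
    """Greedy Birkhoff - von Neumann decomposition.

    Returns (coefficient, perm) pairs, where perm[i] is the output matched
    to input i. Assumes `matrix` is doubly stochastic.
    """
    n = len(matrix)
    rest = [[Fraction(x) for x in row] for row in matrix]
    terms = []
    while any(x > 0 for row in rest for x in row):
        # A permutation using only strictly positive remaining entries.
        perm = next(p for p in permutations(range(n))
                    if all(rest[i][p[i]] > 0 for i in range(n)))
        coeff = min(rest[i][perm[i]] for i in range(n))
        for i in range(n):
            rest[i][perm[i]] -= coeff
        terms.append((coeff, perm))
    return terms

half = Fraction(1, 2)
lam = [[half, half, 0],
       [half, 0, half],
       [0, half, half]]
terms = bvn_decompose(lam)
for coeff, perm in terms:
    print(coeff, perm)
```

Each coefficient is the fraction of time slots in which the TDM frame uses the corresponding matching. Every step zeroes at least one entry, so the loop ends after at most $N^2$ terms.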

  49. Outline • IP routers • OQ routers • IQ routers • Scheduling • Optimal algorithms • Heuristic algorithms • Packet-mode algorithms • Networks of routers • CIOQ routers • Multicast traffic • Conclusions

  50. Maximum Size Matching • Maximum Size Matching (MSM) • among all the possible matchings, selects the one with the highest number of edges • MSM is generally not unique • the best MSM algorithm requires $O(N^{2.5})$ iterations, and cannot be implemented efficiently, since it is based on an augmenting-path flow algorithm
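To illustrate the augmenting-path idea, here is a minimal sketch of the simple augmenting-path method for bipartite matching. It is not the $O(N^{2.5})$ algorithm mentioned above: this basic variant is $O(N^3)$ on a dense request graph, and it is shown only to make clear why earlier decisions may have to be undone. The function name and the examples are illustrative.

```python
def maximum_size_matching(requests):
    """Maximum size matching on the request graph via augmenting paths.

    requests[i][j] > 0 means input i has cells queued for output j.
    Returns match_of_output: for each output, its matched input or None.
    """
    n = len(requests)
    match_of_output = [None] * n

    def try_augment(i, visited):
        # Look for a free output for input i, rematching other inputs if needed.
        for j in range(n):
            if requests[i][j] > 0 and j not in visited:
                visited.add(j)
                owner = match_of_output[j]
                if owner is None or try_augment(owner, visited):
                    match_of_output[j] = i
                    return True
        return False

    for i in range(n):
        try_augment(i, set())
    return match_of_output

# Input 0 first takes output 0, then is moved to output 1 so input 1 fits.
print(maximum_size_matching([[1, 1], [1, 0]]))  # [1, 0]
```

The rematching step is inherently sequential, which is part of what makes this family of algorithms hard to parallelize or pipeline in hardware.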
