iSLIP Switch Scheduler Ali Mohammad Zareh Bidoki April 2002


Presentation Transcript


  1. iSLIP Switch Scheduler • Ali Mohammad Zareh Bidoki • April 2002

  2. Table of Contents • The place of buffers in crossbar switches • Examples of fabrics • PIM • iSLIP (in the Cisco 12000, 5 Gb/s router and the Tiny Tera, 0.5 Tb/s) • RRM • WFA • PP_VOQ • Multicasting • A 2.5 Tb/s router

  3. The place of Buffer in Crossbar • Output Buffer • Shared Buffer • Input buffer

  4. Interconnects: Two basic techniques • Input Queueing: usually a non-blocking switch fabric (e.g. crossbar) • Output Queueing: usually a fast bus

  5. Interconnects: Input Queueing with Crossbar • Memory b/w = 2R [Figure: crossbar between Data In and Data Out, with an arbiter supplying the configuration]

  6. Input Queueing: Head of Line Blocking [Figure: delay versus load; the load axis is marked at 58.6% and 100%]

  7. Head of Line Blocking
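The 58.6% figure on the previous slide is the well-known saturation limit of a FIFO input-queued switch for large N under uniform random traffic (2 − √2 ≈ 0.586). The sketch below is a small simulation of that effect, not something taken from the slides: the function name, the random tie-breaking among contending inputs, and the always-backlogged inputs are all assumptions made for illustration.

```python
import random

def fifo_saturation_throughput(n_ports, n_slots, seed=0):
    """Estimate the saturated throughput of an N x N switch with one FIFO per input."""
    rng = random.Random(seed)
    # hol[i] is the destination of the cell at the head of input i's FIFO.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(n_slots):
        # Group inputs by the output that their head-of-line cell wants.
        contenders = {}
        for i, dest in enumerate(hol):
            contenders.setdefault(dest, []).append(i)
        # Each contended output serves one input at random; the losers stay blocked.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # the next cell moves to the head
            served += 1
    return served / (n_ports * n_slots)

thr_2 = fifo_saturation_throughput(2, 20000)
thr_32 = fifo_saturation_throughput(32, 5000)
print(f"N=2: {thr_2:.3f}  N=32: {thr_32:.3f}")
```

For N = 2 the estimate should sit near 0.75, and for larger N it should drift down toward the 58.6% limit, because a blocked head-of-line cell also holds back cells behind it that could have gone to idle outputs.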

  8. Virtual Output Queuing [Figure: input port 1 holds one input queue per output (port 1 queue, port 2 queue, ..., port n queue); a queue scheduler feeds the crossbar switch fabric, which connects to ports 1 through n]

  9. Input Queueing: Virtual Output Queues

  10. Input Queueing: Virtual Output Queues [Figure: delay versus load, with the load axis marked at 100%]

  11. Which is better? • Virtual output Queue (input queue). • Ideal Output queue.

  12. Input Queueing: Virtual Output Queues • Complex! [Figure: input-queued switch with virtual output queues and an arbiter]

  13. VOQ • Arbiter • Input memory management
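To make the two VOQ ingredients concrete (input memory management and the information handed to the arbiter), here is a minimal sketch of one input port. The class and method names are hypothetical and chosen only for this illustration; a real line card would manage cell buffers in hardware.

```python
from collections import deque

class InputPort:
    """One input port holding a separate FIFO (a VOQ) for every output."""

    def __init__(self, n_outputs):
        self.voq = [deque() for _ in range(n_outputs)]

    def enqueue(self, cell, output):
        """Store an arriving cell in the queue for its destination output."""
        self.voq[output].append(cell)

    def requests(self):
        """Request vector sent to the arbiter: True where a VOQ is non-empty."""
        return [len(q) > 0 for q in self.voq]

    def dequeue(self, output):
        """Called when the arbiter has matched this input to `output`."""
        return self.voq[output].popleft()

port = InputPort(3)
port.enqueue("A", 2)
port.enqueue("B", 0)
port.enqueue("C", 2)
print(port.requests())
```

Because a cell for output 0 never waits behind a cell for output 2, head-of-line blocking disappears; the cost is that the arbiter now has to choose among up to N requests per input.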

  14. Problem Definition (bipartite)

  15. Maximum or Maximal matching

  16. Maximum or Maximal matching • Maximum matching • Maximizes instantaneous throughput • Can cause starvation • Time complexity is very high in hardware (O(n^3)) • Maximal matching • No connection can be added to the current match without altering existing connections • More practical (e.g. WFA, PIM, iSLIP, DRR, RRM)
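The difference between the two notions shows up even on a 2x2 switch. The sketch below is illustrative only: `greedy_maximal` is a simple one-pass scan (not one of the schedulers named on the slide), and `maximum_matching` uses the textbook augmenting-path method; both function names are assumptions.

```python
def greedy_maximal(requests):
    """Maximal matching: scan the requests once and keep every edge that does not conflict."""
    used_in, used_out, match = set(), set(), []
    for i, j in requests:
        if i not in used_in and j not in used_out:
            match.append((i, j))
            used_in.add(i)
            used_out.add(j)
    return match

def maximum_matching(requests, n):
    """Maximum-size matching found with augmenting paths."""
    adj = [[j for (i, j) in requests if i == u] for u in range(n)]
    owner = [None] * n  # owner[j] is the input currently matched to output j

    def augment(u, seen):
        for j in adj[u]:
            if j in seen:
                continue
            seen.add(j)
            # Take output j if it is free, or if its owner can move elsewhere.
            if owner[j] is None or augment(owner[j], seen):
                owner[j] = u
                return True
        return False

    for u in range(n):
        augment(u, set())
    return sorted((i, j) for j, i in enumerate(owner) if i is not None)

# Input 0 has cells for outputs 0 and 1; input 1 has a cell for output 0 only.
requests = [(0, 0), (0, 1), (1, 0)]
maximal = greedy_maximal(requests)
maximum = maximum_matching(requests, 2)
print(maximal, maximum)
```

The greedy pass stops at one connection, (0, 0), which is maximal because nothing can be added without moving it. The maximum matching rearranges that connection and serves both inputs.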

  17. Matching Algorithms • We will discuss three different matching algorithms: 1. PIM - Parallel Iterative Matching 2. RRM – Round-Robin Matching 3. iSLIP – Iterative Serial-Line IP (based on PIM and RRM) • Each algorithm is evaluated by four parameters: 1. Latency (throughput). 2. Starvation freedom. 3. Speed. 4. Implementation.

  18. PIM - Parallel Iterative Matching • The basic matching algorithm. Each iteration of the algorithm follows these three steps: 1. Request - Each unmatched input sends a request to every output for which it has a queued cell. 2. Grant - If an unmatched output receives any requests, it grants to one by randomly selecting a request uniformly over all requests. 3. Accept - If an input receives a grant, it accepts one by selecting an output randomly among those that granted to this input. • When no new matching can be found, the algorithm stops.
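The three steps can be sketched in a few lines. This is a software illustration under assumptions, not the hardware design: `requests[i][j]` is taken to be True when input i has a queued cell for output j, ports are numbered from 0, and the function name `pim` is hypothetical.

```python
import random

def pim(requests, rng):
    """Run PIM until no new match is found; returns in_match[i] = output or None."""
    n = len(requests)
    in_match = [None] * n
    out_match = [None] * n
    while True:
        # Steps 1 and 2: every unmatched output picks one requesting unmatched input at random.
        grants = {}  # input -> list of outputs that granted to it
        for j in range(n):
            if out_match[j] is not None:
                continue
            reqs = [i for i in range(n) if in_match[i] is None and requests[i][j]]
            if reqs:
                grants.setdefault(rng.choice(reqs), []).append(j)
        if not grants:
            break  # no new matching can be found
        # Step 3: every input that received grants accepts one of them at random.
        for i, outs in grants.items():
            j = rng.choice(outs)
            in_match[i] = j
            out_match[j] = i
    return in_match

requests = [
    [True,  True,  False, False],
    [True,  False, True,  False],
    [False, True,  False, True],
    [True,  False, False, False],
]
match = pim(requests, random.Random(1))
print(match)
```

Each pass with at least one grant adds at least one connection, so the loop terminates, and it ends only when no unmatched input still requests an unmatched output, which is exactly the maximal-matching condition.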

  19. PIM • Each iteration eliminates, on average, at least ¾ of the remaining connections • Converges in O(log N) iterations • No input queue is starved of service • No memory or state is used • At the beginning of each cell time, the match begins over, independently of the matches that were made in previous cell times • PIM does not perform well for a single iteration: it limits the throughput to approximately 63%, only slightly higher than for a FIFO switch. • This is because the probability that an input will remain ungranted is $((N-1)/N)^N$; hence, as N increases, the throughput tends to $1 - 1/e \approx 63\%$ • Hard to implement in hardware
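The 63% limit follows directly from the ungranted probability quoted above: if every output grants one of N inputs uniformly at random, a given input is missed by all N outputs with probability ((N-1)/N)^N. A quick numerical check (the function name is an assumption for this sketch):

```python
import math

def pim_single_iteration_throughput(n):
    """Fraction of inputs that receive at least one grant: 1 - ((N-1)/N)^N."""
    return 1 - ((n - 1) / n) ** n

for n in (2, 4, 16, 64, 1024):
    print(n, round(pim_single_iteration_throughput(n), 4))

limit = 1 - 1 / math.e
print("limit", round(limit, 4))
```

The value starts at 0.75 for N = 2 and decreases toward 1 - 1/e as N grows.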

  20. RRM – Round-Robin Matching [Figure: grant pointers g1, g2, g4 and accept pointers a1, a3, a4 on round-robin wheels numbered 1 to 4] 1. Request - Each unmatched input sends a request to every output for which it has a queued cell. 2. Grant - If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request was granted. The pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input.

  21. RRM – Round-Robin Matching [Figure: next animation frame of the round-robin pointer diagram] 1. Request - Each unmatched input sends a request to every output for which it has a queued cell. 2. Grant - If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request was granted. The pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. 3. Accept - If an input receives a grant, it accepts the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The pointer a_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the accepted output.

  22. RRM – Round-Robin Matching [Figure: next animation frame of the round-robin pointer diagram; the three steps are unchanged from the previous slide]
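Both the grant and accept steps rely on the same primitive: starting at a pointer, pick the next requester in round-robin order, then move the pointer one past the winner. A minimal sketch follows, assuming ports are numbered from 0 (the slides number them from 1) and using a hypothetical function name.

```python
def round_robin_pick(requests, pointer):
    """Return the first index at or after `pointer` (wrapping modulo N) whose request bit is set."""
    n = len(requests)
    for offset in range(n):
        k = (pointer + offset) % n
        if requests[k]:
            return k
    return None  # nobody is requesting

# Output arbiter whose pointer sits at input 2; inputs 0 and 3 are requesting.
g = 2
granted = round_robin_pick([True, False, False, True], g)
# RRM then moves the pointer one location beyond the granted input, modulo N.
g = (granted + 1) % 4
print(granted, g)
```

Here input 3 wins because it is the first requester at or after the pointer, and the pointer wraps around to 0, which makes input 3 the lowest priority next time.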

  23. RRM – Round-Robin Matching • RRM is not starvation free. In the following example, we assume there are always cells waiting to be transferred. The destination is always the same. [Figure: first cycle; grant pointers g1, g2, g3 and accept pointers a1, a2, a3 on round-robin wheels numbered 1 to 3]

  24. RRM – Round-Robin Matching (starvation example, first cycle, continued) [Figure: next animation frame]

  25. RRM – Round-Robin Matching (starvation example, first cycle, continued) [Figure: next animation frame]

  26. RRM – Round-Robin Matching (starvation example, first cycle, continued) [Figure: next animation frame]

  27. RRM – Round-Robin Matching (starvation example, first cycle, continued) [Figure: next animation frame]

  28. RRM – Round-Robin Matching (starvation example, second cycle) [Figure: next animation frame]

  29. RRM – Round-Robin Matching (starvation example, second cycle, continued) [Figure: next animation frame]

  30. RRM – Round-Robin Matching (starvation example, second cycle, continued) [Figure: final animation frame] At this point the sequence of events repeats itself: outputs 1 and 3 will always grant input 1, while output 2 will always grant input 1 at the first iteration of the first cycle, but input 1 will select output 1 indefinitely, leaving output 2 to grant either input 2 or input 3. Thus the cell from input 1 to output 2 will never be transferred. The iSLIP algorithm was developed to solve this starvation problem.

  31. RRM • RRM overcomes two problems of PIM: • Complexity • Unfairness • The round-robin arbiters are much simpler and can run faster than random arbiters. • The rotating priority helps the algorithm share bandwidth equally and more fairly among requesting connections. • Its throughput is about 63%

  32. 2x2 switch with the RRM algorithm under heavy load • Synchronization of the output arbiters leads to a throughput of just 50%.
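The 50% figure can be reproduced with a small single-iteration RRM simulation. This sketch makes two assumptions that are not on the slide: every VOQ is permanently backlogged, and all pointers start at the same position, so the arbiters begin in the synchronized state. Function names are hypothetical.

```python
def pick(bits, pointer):
    """First set index at or after `pointer`, wrapping modulo N; None if no bit is set."""
    n = len(bits)
    for off in range(n):
        k = (pointer + off) % n
        if bits[k]:
            return k
    return None

def rrm_slot(requests, g, a):
    """One single-iteration RRM cell time; updates g and a in place and returns the matches."""
    n = len(requests)
    grants = [[False] * n for _ in range(n)]  # grants[i][j]: output j granted input i
    for j in range(n):
        i = pick([requests[r][j] for r in range(n)], g[j])
        if i is not None:
            grants[i][j] = True
            g[j] = (i + 1) % n  # RRM moves the pointer even if the grant is later rejected
    matches = []
    for i in range(n):
        j = pick(grants[i], a[i])
        if j is not None:
            matches.append((i, j))
            a[i] = (j + 1) % n
    return matches

n = 2
requests = [[True] * n for _ in range(n)]  # every VOQ always has a cell
g, a = [0] * n, [0] * n                    # assumed: all pointers start together
sizes = [len(rrm_slot(requests, g, a)) for _ in range(1000)]
throughput = sum(sizes) / (n * len(sizes))
print(throughput)
```

In every cell time both outputs grant the same input, that input accepts only one of them, and both grant pointers then advance together, so the arbiters never separate and only one of the two outputs is used.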

  33. Performance

  34. Synchronization

  35. iSLIP – Iterative Serial-Line IP [Figure: grant pointers g1, g2, g4 and accept pointers a1, a3, a4 on round-robin wheels numbered 1 to 4] 2. Grant - If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output notifies each input whether or not its request was granted. The pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input if and only if the grant is accepted in Step 3 of the first iteration.

  36. iSLIP – Iterative Serial-Line IP [Figure: next animation frame of the pointer diagram; the Grant step text is unchanged from the previous slide]
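The only change from RRM is when the grant pointer moves. The sketch below implements a single-iteration iSLIP cell time under illustrative assumptions: ports are numbered from 0, every VOQ is permanently backlogged, all pointers start at 0, and the function names are hypothetical.

```python
def pick(bits, pointer):
    """First set index at or after `pointer`, wrapping modulo N; None if no bit is set."""
    n = len(bits)
    for off in range(n):
        k = (pointer + off) % n
        if bits[k]:
            return k
    return None

def islip_slot(requests, g, a):
    """One single-iteration iSLIP cell time; updates g and a in place and returns the matches."""
    n = len(requests)
    grants = [[False] * n for _ in range(n)]  # grants[i][j]: output j granted input i
    for j in range(n):
        i = pick([requests[r][j] for r in range(n)], g[j])
        if i is not None:
            grants[i][j] = True  # the grant pointer is NOT moved yet
    matches = []
    for i in range(n):
        j = pick(grants[i], a[i])
        if j is not None:
            matches.append((i, j))
            a[i] = (j + 1) % n
            g[j] = (i + 1) % n  # only an accepted grant advances the output pointer
    return matches

def match_sizes(n, slots):
    """Match size per cell time for a saturated n x n switch with all pointers starting at 0."""
    requests = [[True] * n for _ in range(n)]
    g, a = [0] * n, [0] * n
    return [len(islip_slot(requests, g, a)) for _ in range(slots)]

print(match_sizes(2, 4))
print(match_sizes(4, 6))
```

Starting from fully synchronized pointers, one more output pointer breaks away in each cell time: the 2x2 switch reaches a full match in the second slot and the 4x4 switch in the fourth, after which every output is busy in every cell time.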

  37. iSLIP properties • Property 1. Lowest priority is given to the most recently made connection. • If input i successfully connects to output j, both a_i and g_j are updated and the connection from input i to output j becomes the lowest priority connection in the next cell time. • Property 2. No connection is starved. This is because an input will continue to request an output until it is successful. The output will serve at most N-1 other inputs first, waiting at most N cell times to be accepted by each input. Therefore, a requesting input is always served in less than N^2 cell times. • Property 3. Under heavy load, all queues with a common output have the same throughput. This is a consequence of Property 2: the output pointer moves to each requesting input in a fixed order, thus providing each with the same throughput.

  38. iSLIP properties • Simple to implement in hardware • Starvation free • Its throughput is about 100% under uniform traffic • It is fair • As the load increases, the number of synchronized arbiters decreases (see Figure), leading to a large match. • Under uniform 100% offered load, the iSLIP arbiters adapt to a time-division multiplexing scheme. • It converges in O(1) (single iteration)

  39. Bursty Arrivals

  40. Burstiness Reduction • Results indicate that iSLIP reduces the average burst length, and will tend to be more burst-reducing as the offered load increases. • This is because the probability of switching between multiple connections increases as the utilization increases. • As the load increases, the contention increases and bursts are interleaved at the output. In fact, if the offered load exceeds approximately 70%, the average burst length drops to exactly one cell.

  41. Burstiness Reduction

  42. Multiple Iteration • The pointer g_i to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input if and only if the grant is accepted in Step 3 of the first iteration. • Note that pointers g_i and a_i are only updated for matches found in the first iteration. • It converges in O(log N) iterations
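A sketch of the multi-iteration variant shows both points on this slide: later iterations only fill in unmatched ports, and they leave the pointers alone. The request pattern, 0-based port numbering, and function names are assumptions made for this illustration.

```python
def pick(bits, pointer):
    """First set index at or after `pointer`, wrapping modulo N; None if no bit is set."""
    n = len(bits)
    for off in range(n):
        k = (pointer + off) % n
        if bits[k]:
            return k
    return None

def islip(requests, g, a, iterations):
    """Multi-iteration iSLIP for one cell time; pointers move only for first-iteration matches."""
    n = len(requests)
    in_match, out_match = [None] * n, [None] * n
    for it in range(iterations):
        grants = [[False] * n for _ in range(n)]
        for j in range(n):
            if out_match[j] is None:
                # Only unmatched inputs still request in later iterations.
                col = [requests[i][j] and in_match[i] is None for i in range(n)]
                i = pick(col, g[j])
                if i is not None:
                    grants[i][j] = True
        for i in range(n):
            if in_match[i] is not None:
                continue
            j = pick(grants[i], a[i])
            if j is not None:
                in_match[i], out_match[j] = j, i
                if it == 0:  # pointer updates are restricted to the first iteration
                    a[i] = (j + 1) % n
                    g[j] = (i + 1) % n
    return [(i, j) for i, j in enumerate(in_match) if j is not None]

# Inputs 0 and 1 both have cells for outputs 0 and 1; input 2 is idle.
requests = [[True, True, False], [True, True, False], [False, False, False]]

g1, a1 = [0, 0, 0], [0, 0, 0]
one_iter = islip(requests, g1, a1, iterations=1)

g2, a2 = [0, 0, 0], [0, 0, 0]
two_iter = islip(requests, g2, a2, iterations=2)
print(one_iter, two_iter, g2, a2)
```

With one iteration both outputs grant input 0 and output 1 goes unused. The second iteration lets output 1 pick up input 1, yet the pointers end in the same state as after a single iteration, which preserves the starvation-freedom argument of the single-iteration algorithm.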

  43. Multiple Iteration

  44. All with 4 iterations

  45. Implementation

  46. Implementation (2N arbiters)

  47. Implementation (N arbiters) • Each arbiter is used for both input and output arbitration. In this case, each arbiter contains two registers to hold pointers g_i and a_i.

  48. Implementation

  49. Priority in iSLIP

  50. Why is iSLIP good for high speed? • Input buffers are separate • A separate scheduler for each input and output • Each works independently
