
Packet Scheduling/Arbitration in Virtual Output Queues: Maximal Matching Algorithms (Part II)


shiro

Presentation Transcript


  1. Packet Scheduling/Arbitration in Virtual Output Queues: Maximal Matching Algorithms (Part II)

  2. Pointer Desynchronization • Performance: RRM < iSlip < FIRM • The three algorithms differ only in how they update their pointers • Observation: iSlip and FIRM can effectively desynchronize their output pointers • Pointer desynchronization works best when it is forced by design rather than left to emerge from the pointer-update rule

  3. Static Round Robin Matching (SRR): To Achieve FULL Desynchronization • Initialization. The input pointers are set to 0. The output pointers are set to an initial pattern in which no two pointers are equal. • The 3 steps of one iteration are: • Request. Each input sends a request to every output for which it has a queued cell. • Grant. If an output receives any requests, it chooses the one that appears next in a fixed round-robin schedule starting from the highest-priority element. The output notifies each input whether or not its request was granted. The pointer to the highest-priority element of the round-robin schedule is always incremented by one (modulo N), whether or not there is a grant.

  4. SRR (Cont’d) • Accept. If an input receives any grants, it accepts the one that appears next in a fixed round-robin schedule starting from the highest-priority element. The pointer to the highest-priority element of the round-robin schedule is incremented (modulo N) to one location beyond the accepted one. • In DSRR (an improved version of SRR), the input pointers are also desynchronized. • Rotating DSRR (RDSRR): • Addresses the unfairness among inputs that DSRR shows under a special traffic model. • Outputs alternate between searching clockwise and anticlockwise when deciding grants.
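A minimal Python sketch of one SRR time slot, assuming a single iteration per slot and a boolean request matrix in place of explicit request messages (the function and parameter names are illustrative, not from the slides). It follows the two pointer rules above: a grant pointer always advances by one, and an accept pointer moves one past the accepted output.

```python
from typing import List, Optional

def srr_iteration(requests: List[List[bool]],
                  grant_ptr: List[int],
                  accept_ptr: List[int]) -> List[Optional[int]]:
    """One request-grant-accept iteration of Static Round Robin matching.

    requests[i][j] is True when input i has a cell queued for output j.
    grant_ptr[j] is output j's highest-priority input; accept_ptr[i] is
    input i's highest-priority output. Both lists are updated in place.
    Returns match[i] = output matched to input i, or None.
    """
    n = len(requests)
    # Grant: each output picks the first requesting input at or after its pointer.
    grants: List[List[int]] = [[] for _ in range(n)]  # grants[i] = outputs granting i
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if requests[i][j]:
                grants[i].append(j)
                break
        # SRR rule: the grant pointer always advances by one, grant or not.
        grant_ptr[j] = (grant_ptr[j] + 1) % n
    # Accept: each input picks the first granting output at or after its pointer.
    match: List[Optional[int]] = [None] * n
    for i in range(n):
        if grants[i]:
            j = min(grants[i], key=lambda out: (out - accept_ptr[i]) % n)
            match[i] = j
            accept_ptr[i] = (j + 1) % n  # one location beyond the accepted output
    return match

if __name__ == "__main__":
    reqs = [[True] * 3 for _ in range(3)]
    print(srr_iteration(reqs, [0, 1, 2], [0, 0, 0]))  # desynchronized: [0, 1, 2]
    print(srr_iteration(reqs, [0, 0, 0], [0, 0, 0]))  # synchronized: [0, None, None]
```

With the duplicate-free initial pattern [0, 1, 2], every output grants a different input and all three inputs are matched in one iteration; with synchronized output pointers, all outputs grant input 0 and only one match is made.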

  5. Simulation Results: 32x32 switch under uniform traffic (plot: relative average delay, 0 to 70, vs. normalized load, 0.1 to 1; curves for iSlip, FIRM, SRR, DSRR, RDSRR)

  6. Simulation Results: 32x32 switch under uniform bursty traffic (plot: relative average delay, 0 to 45, vs. normalized load, 0.1 to 1; curves for iSlip, FIRM, SRR, DSRR, RDSRR)

  7. Simulation Results: 32x32 switch under hotspot traffic (plot: relative average delay on a log scale, 10^0 to 10^4, vs. normalized load, 0.05 to 0.55; curves for iSlip, FIRM, SRR, DSRR, RDSRR)

  8. Simulation Results: 32x32 switch under unbalanced traffic (plot: average delay on a log scale, 10^0 to 10^4, vs. normalized load, 0.1 to 1; curves for iSlip, FIRM, SRR, DSRR, RDSRR)

  9. Stability Property • A VOQ switch is considered stable if it approaches a steady state in which the expected length of each VOQ is bounded. A scheduling algorithm that keeps the switch stable under every admissible traffic pattern achieves 100% throughput. • Under various traffic patterns, RDSRR is more stable than iSlip and FIRM.

  10. Stability Property (Cont’d): 32x32 switch under unbalanced traffic (plot: output throughput, 0.94 to 1.01, vs. normalized load, 0.8 to 1; curves for iSlip, FIRM, RDSRR)

  11. 3-Phase & 2-Phase Algorithms • iSlip & FIRM are 3-phase algorithms: Request-Grant-Accept • DRRM is a 2-phase algorithm: Grant-Accept • Each input sends one grant • Each output sends one accept • 2-FIRM is the 2-phase version of FIRM

  12. DRRM (Dual Round Robin Matching)

  13. 3-Phase & 2-Phase Algorithms: 32x32 switch under uniform traffic (plot: relative average delay, 0 to 70, vs. normalized load, 0.1 to 1; curves for iSlip, DRRM, FIRM, 2-FIRM)

  14. 3-Phase & 2-Phase Algorithms: 32x32 switch under hotspot traffic (plot: relative average delay on a log scale, 10^0 to 10^4, vs. normalized load, 0.05 to 0.55; curves for iSlip, DRRM, FIRM, 2-FIRM)

  15. 3-Phase & 2-Phase Algorithms • In general, the traffic pattern changes over time • When the temporary non-uniformity is on the input side, the 3-phase scheme performs better • When the temporary non-uniformity is on the output side, the 2-phase scheme performs better

  16. 2-stage Maximum Size Matching Algorithm: Description • The 2-stage algorithm works as follows: 1. The pointers at both the input and output sides are kept fully desynchronized. 2. Each iteration has 3 steps: Step 1: Each input sends a request to every output for which it has a queued cell. Step 2: Each input sends a grant to one output, choosing the non-empty VOQ that appears next starting from its highest-priority output. Each output sends a grant to one of the requests received in Step 1, choosing the one that appears next starting from its highest-priority input. OutputCount = number of outputs receiving grants from inputs. InputCount = number of inputs receiving grants from outputs.

  17. 2-stage Maximum Size Matching Algorithm: Description • Step 3: If OutputCount ≥ InputCount, each output selects, among the grants received in Step 2, the one that appears next starting from its highest-priority input, and sends an accept. • Otherwise, each input selects, among the grants received in Step 2, the one that appears next starting from its highest-priority output, and sends an accept. • In short, the algorithm decides in each time slot whether to use the 2-phase or the 3-phase scheme, based on which one makes more matches.
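Steps 2 and 3 can be sketched in Python as follows. Assumptions: requests are a boolean matrix, "appears next starting from the highest-priority element" is modeled as the smallest round-robin distance from a pointer, a tie between OutputCount and InputCount is resolved in favor of the output side, and pointer updates are left out because the slides only state that the pointers stay fully desynchronized. All names are hypothetical.

```python
from typing import Dict, List, Tuple

def first_from(ptr: int, candidates: List[int], n: int) -> int:
    """Return the candidate that appears next in round-robin order from ptr."""
    return min(candidates, key=lambda c: (c - ptr) % n)

def two_stage(requests: List[List[bool]], in_ptr: List[int],
              out_ptr: List[int]) -> List[Tuple[int, int]]:
    """Return the (input, output) pairs matched in one time slot."""
    n = len(requests)
    # Step 2a: each input grants the next non-empty VOQ from its pointer.
    grants_to_out: Dict[int, List[int]] = {}  # output -> inputs that granted it
    for i in range(n):
        voqs = [j for j in range(n) if requests[i][j]]
        if voqs:
            grants_to_out.setdefault(first_from(in_ptr[i], voqs, n), []).append(i)
    # Step 2b: each output grants the next requesting input from its pointer.
    grants_to_in: Dict[int, List[int]] = {}  # input -> outputs that granted it
    for j in range(n):
        reqs = [i for i in range(n) if requests[i][j]]
        if reqs:
            grants_to_in.setdefault(first_from(out_ptr[j], reqs, n), []).append(j)
    output_count, input_count = len(grants_to_out), len(grants_to_in)
    # Step 3: accept on whichever side yields more matches (tie -> outputs).
    if output_count >= input_count:
        # 2-phase: each output accepts one of the inputs that granted it.
        return sorted((first_from(out_ptr[j], ins, n), j)
                      for j, ins in grants_to_out.items())
    # 3-phase: each input accepts one of the outputs that granted it.
    return sorted((i, first_from(in_ptr[i], outs, n))
                  for i, outs in grants_to_in.items())

if __name__ == "__main__":
    reqs = [[True, True], [True, False]]
    print(two_stage(reqs, in_ptr=[1, 0], out_ptr=[0, 0]))  # [(0, 1), (1, 0)]
```

In the example, both outputs grant input 0, so the 3-phase side could make only one match, while the inputs' grants reach two different outputs; Step 3 therefore picks the 2-phase side and matches both inputs.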

  18. 2-stage Maximum Size Matching Algorithm: Hardware Implementation (block diagram components: 1st and 2nd groups of inputs, state of input queues (N² bits), grant arbiters 1…N, accept arbiters 1…N, output counter, input counter, comparator, 2 physical lines from the comparator, decision register)

  19. Performance Evaluation: Simulation Study Uniform Traffic

  20. Performance Evaluation: Simulation Study (panels: 2-stage over iSlip; SRR over iSlip)

  21. Performance Evaluation: Simulation Study Bursty Traffic

  22. Performance Evaluation: Simulation Study (panels: 2-stage over iSlip; SRR over iSlip)

  23. Performance Evaluation: Simulation Study Hotspot Traffic

  24. Performance Evaluation: Simulation Study (panels: 2-stage over iSlip; SRR over iSlip)

  25. Performance Evaluation: Simulation Study Unbalanced Traffic

  26. Performance Evaluation: Simulation Study (panels: 2-stage over iSlip; SRR over iSlip)

  27. A new algorithm – RDESRR • Real Desynchronized Round Robin Model (RDESRR) • Based on the 2-phase RRM model (Request and Grant) • Adds a small shared memory that every output can read and write (called Share Bits) • The memory holds 1 bit per input • If the bit is set, the corresponding input has already been granted by an output • If the bit is not set, the output may grant the corresponding input

  28. RDESRR Conceptual model (diagram: inputs 0 to 3, outputs 0 to 3 with their round-robin schedules, and the Share Bits)

  29. RDESRR model • 2 phases only • Request. Each input sends a request to every output for which it has a queued cell. • Grant. If an output receives any requests, it chooses the one that appears next in a fixed round-robin schedule starting from the highest-priority element. The output checks whether the corresponding share bit is set; if it is not, the output sets the bit and notifies the input that its request was granted. Otherwise, the output moves on to the next request until all requests have been examined. The pointer gi to the highest-priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.
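A Python sketch of one RDESRR time slot under two assumptions beyond the slides: the Share Bits are a list that the outputs visit one at a time in index order (standing in for first-come-first-serve access), and an output whose requesting inputs are all already claimed leaves its pointer unchanged, as it does when it receives no request. The name rdesrr_slot is hypothetical. Because a set share bit stops any second grant to the same input, each input holds at most one grant and no accept phase is needed.

```python
from typing import List, Optional

def rdesrr_slot(requests: List[List[bool]],
                grant_ptr: List[int]) -> List[Optional[int]]:
    """One RDESRR time slot (Request + Grant); returns match[i] = output or None.

    requests[i][j]: input i has a cell for output j.
    grant_ptr[j]: output j's highest-priority input (pointer gi), updated in place.
    """
    n = len(requests)
    share_bits = [False] * n  # one bit per input, reset at every time slot
    match: List[Optional[int]] = [None] * n
    # Assumption: outputs take turns in index order (models FCFS share-bit access).
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if requests[i][j] and not share_bits[i]:
                share_bits[i] = True        # claim input i for output j
                match[i] = j                # notify input i of the grant
                grant_ptr[j] = (i + 1) % n  # one beyond the granted input
                break
        # If no grant was issued, grant_ptr[j] stays unchanged (an assumption
        # when requests existed but every requesting input was already claimed).
    return match

if __name__ == "__main__":
    ptr = [0, 0, 0, 0]                    # fully synchronized pointers
    full = [[True] * 4 for _ in range(4)]
    print(rdesrr_slot(full, ptr), ptr)    # [0, 1, 2, 3] [1, 2, 3, 0]
```

Even with all grant pointers starting at the same input, the share bits steer each output to a different input, so a fully loaded 4x4 switch is completely matched in one slot.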

  30. RDESRR Demo - Request • Step 1: Request (diagram: inputs 0 to 3 sending requests to outputs 0 to 3)

  31. RDESRR Demo – Add a shared memory at the outputs • Add a small shared memory that every output can read and write (called Share Bits) • Step 2: Grant (diagram: inputs 0 to 3, outputs 0 to 3 with their round-robin schedules, and the Share Bits)

  32. RDESRR Demo – Output checks the share bits • The output checks whether the corresponding bit is set • Step 2: Grant (diagram: inputs 0 to 3, outputs 0 to 3 with their round-robin schedules, and the Share Bits)

  33. RDESRR Demo – When a share bit is occupied • If the bit is not set, the output sets it and notifies the input that its request was granted • Share bits are claimed first come, first served • Step 2: Grant (diagram: inputs 0 to 3, outputs 0 to 3 with their round-robin schedules, and the Share Bits)

  34. RDESRR Demo – Output looks for the next request • If the bit is set, the output looks for the next request until all requests have been examined • Step 2: Grant (diagram: inputs 0 to 3, outputs 0 to 3 with their round-robin schedules, and the Share Bits)

  35. RDESRR Demo – All share bits are allocated • Once every share bit is allocated, every input has received a grant • Step 2: Grant (diagram: inputs 0 to 3, outputs 0 to 3 with their round-robin schedules, and the Share Bits)

  36. RDESRR Demo – Pointer update/Share bit reset • The pointer gi to the highest-priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input • If no request is received, the pointer stays unchanged • Share bits are also reset (diagram: updated output pointers and cleared Share Bits)

  37. SIM Results • Ran the test for a 32x32-port switch in SIM using the option -l 1000000

  38. Input Queueing: Longest Queue First or Oldest Cell First • Weight = Queue Length (LQF) or Waiting Time (OCF) • Maximum weight matching achieves 100% throughput (diagram: 4x4 example with edge weights of 1 and 10)

  39. Input Queueing: Why is serving long/old queues better than serving the maximum number of queues? (plots: average occupancy per VOQ # under uniform traffic and under non-uniform traffic) • When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput. • When traffic is non-uniform, some queues become longer than others. • A good algorithm keeps the queue lengths matched and services a large number of queues.

  40. Maximum/Maximal Weight Matching • Maximum Weight Matching: 100% throughput for admissible traffic (uniform or non-uniform) • OCF (Oldest Cell First): w = cell waiting time • LQF (Longest Queue First): w = input queue (VOQ) occupancy • LPF (Longest Port First): w = total queue length at the source (input) port + total queue length of all VOQs destined to the destination (output) port • Maximal Weight Matching (practical algorithms) • iOCF • iLQF • iLPF (removes the comparators from the critical path of iLQF)
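To make the weight definitions concrete, here is an exhaustive maximum weight matching in Python for a tiny switch. It is only a reference sketch: trying all N! permutations is impractical beyond a handful of ports (an O(N³) assignment method such as the Hungarian algorithm would be used instead), and the practical maximal algorithms listed above exist precisely to avoid this cost. Weights are assumed to be LQF occupancies or OCF ages, with 0 meaning an empty VOQ; the function name is hypothetical.

```python
from itertools import permutations
from typing import List, Tuple

def max_weight_matching(w: List[List[int]]) -> Tuple[int, List[Tuple[int, int]]]:
    """Exhaustive maximum weight matching for a small N x N switch.

    w[i][j] is the weight of VOQ (i, j): queue occupancy for LQF or
    head-of-line cell age for OCF. Zero means the VOQ is empty (no edge).
    Returns (total weight, list of (input, output) pairs).
    """
    n = len(w)
    best_weight, best_match = -1, []
    for perm in permutations(range(n)):  # perm[i] = output assigned to input i
        # Drop pairs whose VOQ is empty; what remains is a valid matching.
        pairs = [(i, perm[i]) for i in range(n) if w[i][perm[i]] > 0]
        total = sum(w[i][j] for i, j in pairs)
        if total > best_weight:
            best_weight, best_match = total, pairs
    return best_weight, best_match

if __name__ == "__main__":
    print(max_weight_matching([[10, 1], [1, 0]]))  # (10, [(0, 0)])
```

With queue lengths [[10, 1], [1, 0]], the maximum weight matching serves only the 10-cell VOQ (0, 0), whereas a maximum size matching would serve the two 1-cell queues (0, 1) and (1, 0): the long-queue-first trade-off described on the previous slide.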

  41. Maximal Weight Matching Algorithms: iLQF • Request. Each unmatched input sends a multi-bit request word to each output for which it has a queued cell, indicating the number of cells it has queued to that output. • Grant. If an unmatched output receives any requests, it chooses the largest-valued request. Ties are broken randomly. • Accept. If an unmatched input receives one or more grants, it accepts the one to which it made the largest-valued request. Ties are broken randomly.
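A Python sketch of iLQF following the three steps above, assuming the request word carries the VOQ occupancy as a plain integer and that iterations stop early once no new match is made. The function name and the injected random generator (used only for tie-breaking) are illustrative choices, not part of the slides.

```python
import random
from typing import List, Optional

def ilqf(occupancy: List[List[int]], iterations: int,
         rng: random.Random) -> List[Optional[int]]:
    """iLQF sketch: returns match[i] = output for input i, or None.

    occupancy[i][j] is the number of cells in VOQ (i, j); it doubles as the
    request word. Ties in Grant and Accept are broken with rng.
    """
    n = len(occupancy)
    match: List[Optional[int]] = [None] * n
    out_matched = [False] * n
    for _ in range(iterations):
        # Grant: each unmatched output grants its largest request from unmatched inputs.
        grants: List[List[int]] = [[] for _ in range(n)]  # grants[i] = outputs granting i
        for j in range(n):
            if out_matched[j]:
                continue
            reqs = [i for i in range(n) if match[i] is None and occupancy[i][j] > 0]
            if reqs:
                top = max(occupancy[i][j] for i in reqs)
                winner = rng.choice([i for i in reqs if occupancy[i][j] == top])
                grants[winner].append(j)
        # Accept: each input accepts the grant matching its largest request.
        progress = False
        for i in range(n):
            if grants[i]:
                top = max(occupancy[i][j] for j in grants[i])
                j = rng.choice([j for j in grants[i] if occupancy[i][j] == top])
                match[i], out_matched[j] = j, True
                progress = True
        if not progress:
            break  # no new matches: the maximal matching has converged
    return match

if __name__ == "__main__":
    occ = [[5, 0, 0], [3, 9, 0], [0, 2, 1]]
    print(ilqf(occ, iterations=3, rng=random.Random(0)))  # [0, 1, 2]
```

When the longest VOQ is unique, a single iteration always matches it: its output grants it as the largest request, and its input accepts it for the same reason.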

  42. Maximal Weight Matching Algorithms: iLQF • The i-LQF algorithm has the following properties: • Property 1. Independent of the number of iterations, the longest input queue is always served. • Property 2. As with i-SLIP, the algorithm converges in at most N iterations (typically on the order of logN). • Property 3. For an inadmissible offered load, an input queue may be starved.

  43. Maximal Weight Matching Algorithms: iOCF • The i-OCF algorithm works in a similar fashion to i-LQF and has the following properties: • Property 1. Independent of the number of iterations, the cell that has been waiting the longest in the input queues (it must be at the head of its queue) is always served. • Property 2. As with i-LQF, the algorithm converges in at most N iterations. • Property 3. No input queue can be starved indefinitely. • Property 4. It is difficult to keep time stamps on the cells.

  44. iLQF - Implementation

  45. iLPF - Implementation • Complicated hardware

  46. Other research efforts • Packet-based arbitration • Exhaustive-based arbitration • Numerous other efforts

  47. Packet Scheduling/Arbitration in Virtual Output Queues: Randomized Algorithms and Others

  48. Scheduler Crossbar 1,1 inputs 1 i,j . . . . N N,N outputs 1 . . . . N Input-Queued Packet Switch (i i,j < 1 ; j i,j < 1) Xi,j

  49. Bipartite Graph and Matrix (diagram: bipartite graph between inputs 1 to 3 and outputs 1 to 3, shown with its 3x3 matrix, whose rows read 1 0 0 / 1 1 1 / 1 1 0)
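A maximum size matching on such a request matrix can be found with augmenting paths (Kuhn's algorithm), sketched here in Python. Reading the slide's matrix with rows as inputs and columns as outputs is an assumption about the figure, and the algorithm is a standard graph technique rather than one proposed in these slides; the function name is hypothetical.

```python
from typing import List, Optional

def maximum_size_matching(r: List[List[int]]) -> List[Optional[int]]:
    """Kuhn's augmenting-path algorithm on an N x N request matrix.

    r[i][j] == 1 means input i has a cell for output j (an edge i -> j).
    Returns match[i] = output matched to input i, or None.
    """
    n = len(r)
    owner: List[Optional[int]] = [None] * n  # owner[j] = input matched to output j

    def augment(i: int, seen: List[bool]) -> bool:
        for j in range(n):
            if r[i][j] and not seen[j]:
                seen[j] = True
                # Take output j if free, or if its current owner can move elsewhere.
                if owner[j] is None or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        augment(i, [False] * n)
    match: List[Optional[int]] = [None] * n
    for j, i in enumerate(owner):
        if i is not None:
            match[i] = j
    return match

if __name__ == "__main__":
    print(maximum_size_matching([[1, 0, 0], [1, 1, 1], [1, 1, 0]]))  # [0, 2, 1]
```

For this matrix a perfect matching exists (input 1 to output 1, input 2 to output 3, input 3 to output 2 in the slide's 1-based labels); reaching it requires moving input 2 off output 2 along an augmenting path, which a single greedy pass would not do.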

  50. Stability of Scheduling Definition:Let Xi,j(t) be the number of packets queued at input i for output j at time-slot t. Then an algorithm is stable iff:
