Belief-Propagation Assisted Scheduling in Input-Queued Switches S. Atalla¹, D. Cuda², P. Giaccone¹, M. Pretti² ¹Politecnico di Torino ²Italian National Research Council Hot Interconnects 2010, August 2010
Outline • Background motivations • System model • Basic belief-propagation algorithm for MWM • Assisted scheduling • Belief-propagation for assisted scheduling • Performance evaluation • Hardware implementation • Conclusions
Background motivations • Internet traffic is steadily increasing • Routers and switches must process a growing amount of data ever faster • Input-Queued (IQ) switches can be considered a reference architecture • Memory speed = line rate • IQ switches require suitable scheduling algorithms that • Ensure good performance (throughput, delay) • Run fast (a few ns to take each scheduling decision) • Are implementable in hardware (HW)
System model • N×N crossbar with Virtual Output Queuing (VOQ) • one FIFO queue for each input-output pair • total of N² queues • Synchronous architecture: • time is slotted • fixed-size packets
Scheduling algorithm • At each timeslot, the scheduler selects a set of head-of-line packets compatible with the crossbar constraint: • At most one packet can be transferred from each input port and to each output port • equivalent to choosing a matching in a bipartite graph • Scheduler input: VOQ lengths q_ij • Scheduler output: matching described through binary variables: x_ij = 1 iff input i transfers a packet to output j [Figure: switch with ports 0 to 3; the scheduler (MWM, iSLIP, iLQF, …) maps the VOQ lengths q_ij to the matching variables, e.g., x_00 = 1, x_33 = 0]
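As a minimal illustration of the crossbar constraint and of the weight that MWM maximizes, the sketch below checks that a binary matrix x is a valid matching and computes the sum of q_ij over the pairs with x_ij = 1. The function names and the 4×4 queue lengths are hypothetical, chosen only for illustration.

```python
from typing import List

Matrix = List[List[int]]

def is_valid_matching(x: Matrix) -> bool:
    """Crossbar constraint: at most one 1 per row (input) and per column (output)."""
    n = len(x)
    rows_ok = all(sum(x[i]) <= 1 for i in range(n))
    cols_ok = all(sum(x[i][j] for i in range(n)) <= 1 for j in range(n))
    return rows_ok and cols_ok

def matching_weight(q: Matrix, x: Matrix) -> int:
    """Weight of a matching: sum of the VOQ lengths q_ij selected by x_ij = 1."""
    n = len(q)
    return sum(q[i][j] * x[i][j] for i in range(n) for j in range(n))

# Hypothetical 4x4 example: q[i][j] is the length of VOQ (i, j).
q = [[3, 0, 1, 0],
     [0, 2, 0, 4],
     [1, 0, 0, 2],
     [0, 5, 0, 0]]
x = [[1, 0, 0, 0],   # input 0 -> output 0
     [0, 0, 0, 1],   # input 1 -> output 3
     [0, 0, 0, 0],   # input 2 idle
     [0, 1, 0, 0]]   # input 3 -> output 1
print(is_valid_matching(x), matching_weight(q, x))  # True 12
```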
Scheduling algorithm dichotomy • Maximum Weight Matching (MWM) is • Optimal in terms of performance • Difficult to implement in HW • O(N³) operations, difficult to parallelize • Heuristic algorithms mimicking MWM • E.g., iSLIP, iLQF, WFA (and many others) • Efficient to implement in HW • e.g., iSLIP was implemented in the Cisco 12000 series • Possible traffic losses under critical traffic patterns
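For very small N, the MWM can be found by exhaustive search, which is a convenient reference when experimenting with heuristics. This sketch is not the O(N³) algorithm mentioned above: it enumerates all N! permutations, relying on the fact that, with non-negative queue lengths, some perfect matching is always a maximum weight matching. The function name is hypothetical.

```python
from itertools import permutations
from typing import List, Tuple

def brute_force_mwm(q: List[List[int]]) -> Tuple[Tuple[int, ...], int]:
    """Return (perm, weight), where input i is matched to output perm[i].

    Exponential in N: only meant as a reference for tiny switches.
    """
    n = len(q)
    best = max(permutations(range(n)),
               key=lambda p: sum(q[i][p[i]] for i in range(n)))
    return best, sum(q[i][best[i]] for i in range(n))

q = [[3, 2],
     [2, 0]]
print(brute_force_mwm(q))  # ((1, 0), 4)
```

On this example, serving the single longest queue (q_00 = 3) gives weight 3, while the MWM pairs input 0 with output 1 and input 1 with output 0 for weight 4.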
Basic belief-propagation for MWM • Recently, the Belief-Propagation (BP) algorithm has been proposed to solve the MWM problem [1,2] • BP algorithms are message-passing algorithms originally conceived to study Graphical Models (GMs) • GMs combine graph theory and probability theory • BP is exact for MWM over bipartite graphs (see [1]), but • To ensure convergence, the MWM must be unique • Small random noise can be added to the queue lengths • Convergence takes O(N³/ε) • ε: difference in weight between the two heaviest matchings • not known a priori
[1] M. Bayati, D. Shah, and M. Sharma, “Max-product for maximum weight matching: Convergence, correctness, and LP duality,” IEEE Transactions on Information Theory, vol. 54, no. 3, pp. 1241–1251, Mar. 2008.
[2] M. Bayati, B. Prabhakar, D. Shah, and M. Sharma, “Iterative scheduling algorithms,” in Proc. IEEE INFOCOM 2007, 2007, pp. 445–453.
Basic belief-propagation for MWM [Figure: step-by-step message exchange between inputs 0 to 3 and outputs 0 to 3]
Basic belief-propagation for MWM • After convergence, each output is matched to the input associated with the largest message
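The message exchange and the final decision can be sketched as follows. This is a minimal synchronous version under stated assumptions: it uses the single-message form of the tree example later in this deck (a message equals the edge weight minus max(0, best competing incoming message)), starts from all-zero messages, and matches each output to the input sending it the largest message. Function names are hypothetical, and the exact formulation in [1] and in the authors' work may differ.

```python
from typing import List, Tuple

Matrix = List[List[int]]

def bp_mwm(w: Matrix, iterations: int) -> Tuple[Matrix, Matrix]:
    """Synchronous BP for MWM on an N x N complete bipartite graph.

    Assumed update rule:
      m_{i->j} = w_ij - max(0, max_{k != j} m_{k->i})   (input i  -> output j)
      m_{j->i} = w_ij - max(0, max_{k != i} m_{k->j})   (output j -> input i)
    All messages start at 0 and are updated in parallel at each iteration.
    """
    n = len(w)
    in_to_out = [[0] * n for _ in range(n)]  # in_to_out[i][j] = m_{i->j}
    out_to_in = [[0] * n for _ in range(n)]  # out_to_in[j][i] = m_{j->i}
    for _ in range(iterations):
        new_in = [[w[i][j] - max([0] + [out_to_in[k][i] for k in range(n) if k != j])
                   for j in range(n)] for i in range(n)]
        new_out = [[w[i][j] - max([0] + [in_to_out[k][j] for k in range(n) if k != i])
                    for i in range(n)] for j in range(n)]
        in_to_out, out_to_in = new_in, new_out
    return in_to_out, out_to_in

def decide(in_to_out: Matrix) -> List[int]:
    """Match each output j to the input i whose message m_{i->j} is largest."""
    n = len(in_to_out)
    return [max(range(n), key=lambda i: in_to_out[i][j]) for j in range(n)]

w = [[5, 1],
     [2, 4]]
msgs, _ = bp_mwm(w, iterations=3)
print(msgs)          # [[5, -2], [-1, 4]]
print(decide(msgs))  # [0, 1]: output 0 <- input 0, output 1 <- input 1
```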
Assisted scheduling • Our major contribution is the introduction of the concept of assisted scheduling: • Instead of the queue lengths, scheduling algorithms are modified to use the messages computed by BP as weights • We show that BP-assisted scheduling boosts the performance of existing schedulers while keeping backward compatibility
Assisted scheduling • We introduce the Belief-Propagation Message-Processing (BP-MP) module between the VOQs and the scheduler • BP-MP computes message values as a function of the queue lengths Q(t), based on a BP algorithm • The scheduler works in the usual way, but its decisions are based on the messages F(t) computed by the BP-MP module instead of on Q(t) • F(t) can be seen as a correction of the VOQ lengths Q(t) [Figure: VOQs → BP-MP (a few iterations I) → Scheduler]
Assisted scheduling • The basic BP algorithm has been improved with: • Relaxation of the MWM uniqueness constraint • BP no longer needs to converge • No random noise • Finite (and small) number of iterations • Integer number representation • Memory • Self-asynchronous update
Messages for assisted scheduling • BP-MP runs for a fixed (and small) number of iterations I • Messages are bounded • Messages are • represented through integer numbers • in the same numerical range as the queue lengths (around log2 Qmax bits)
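A short argument suggests why such messages stay bounded. It assumes, beyond what the slide states, the single-message update in which each message equals an edge weight minus max(0, best competing incoming message), queue lengths 0 ≤ w_ij ≤ Q_max, and messages initialized to 0. By induction on the iteration n, every message then stays in [−Q_max, Q_max], so about log2 Q_max bits plus a sign bit suffice:

```latex
\begin{aligned}
&\text{Assume } 0 \le w_{ij} \le Q_{\max} \text{ and } \big|m^{(n-1)}\big| \le Q_{\max} \text{ for every message.}\\
&m^{(n)}_{i\to j} = w_{ij} - \max\Big(0,\ \max_{k\neq j} m^{(n-1)}_{k\to i}\Big),
\qquad 0 \le \max\Big(0,\ \max_{k\neq j} m^{(n-1)}_{k\to i}\Big) \le Q_{\max}\\
&\Rightarrow\ -Q_{\max} \le w_{ij} - Q_{\max} \le m^{(n)}_{i\to j} \le w_{ij} \le Q_{\max}.
\end{aligned}
```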
Memory for assisted scheduling • Queue lengths exhibit a strong temporal correlation, which is reflected in the message dynamics • A queue length can change by at most 1 in each timeslot • Memory: messages are initialized to the last computed messages • Memory speeds up convergence
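Memory can be sketched as a BP-MP object that keeps its messages between timeslots, so each timeslot starts from the messages of the previous one instead of from zero. The class name, its parameters, the use of the input-to-output messages as F(t), and the update rule m = w − max(0, best competing message) are all assumptions of this sketch, not the authors' design.

```python
from typing import List

Matrix = List[List[int]]

class BPMessageProcessor:
    """Hypothetical BP-MP with optional message memory across timeslots."""

    def __init__(self, n: int, iterations: int, memory: bool = True):
        self.n, self.iterations, self.memory = n, iterations, memory
        self.reset()

    def reset(self) -> None:
        self.in_to_out = [[0] * self.n for _ in range(self.n)]  # m_{i->j}
        self.out_to_in = [[0] * self.n for _ in range(self.n)]  # m_{j->i}

    def step(self, q: Matrix) -> Matrix:
        """Run I synchronous iterations on the queue lengths q; return F(t)."""
        if not self.memory:
            self.reset()  # without memory, every timeslot restarts from zero
        n = self.n
        for _ in range(self.iterations):
            new_in = [[q[i][j] - max([0] + [self.out_to_in[k][i] for k in range(n) if k != j])
                       for j in range(n)] for i in range(n)]
            new_out = [[q[i][j] - max([0] + [self.in_to_out[k][j] for k in range(n) if k != i])
                        for i in range(n)] for j in range(n)]
            self.in_to_out, self.out_to_in = new_in, new_out
        return [row[:] for row in self.in_to_out]

q = [[5, 1], [2, 4]]
bp = BPMessageProcessor(n=2, iterations=1)
print(bp.step(q))  # [[5, 1], [2, 4]]   first timeslot starts from zero messages
print(bp.step(q))  # [[4, -4], [-2, 2]] second timeslot continues from stored messages
```

With I = 1 and memory, two timeslots perform the same work as two consecutive BP iterations, which is one way to read the claim that memory speeds up convergence.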
Self-asynchronous update for assisted scheduling • Studies on BP have shown that updating messages in a random sequential order (asynchronous update) is beneficial for convergence • Not easy to implement in HW • Self-asynchronous update: • exploits the randomness of the arrival process • updates only the messages associated with queues that have changed since the previous timeslot • mimics asynchronous update
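Self-asynchronous update can be sketched as a masked BP iteration: a flag e_ij marks the VOQs whose length changed since the previous timeslot, only the two messages on a flagged edge (i, j) are recomputed, and all other messages keep their stored values. Treating exactly these two messages as "associated with" queue (i, j), and the update rule m = w − max(0, best competing message), are assumptions of this sketch; function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]

def change_flags(q_now: Matrix, q_prev: Matrix) -> List[List[bool]]:
    """e_ij = True iff the length of VOQ (i, j) changed since the previous timeslot."""
    n = len(q_now)
    return [[q_now[i][j] != q_prev[i][j] for j in range(n)] for i in range(n)]

def self_async_iteration(q: Matrix, e: List[List[bool]],
                         in_to_out: Matrix, out_to_in: Matrix) -> Tuple[Matrix, Matrix]:
    """One BP iteration that recomputes only the messages on flagged edges.

    in_to_out[i][j] = m_{i->j}, out_to_in[j][i] = m_{j->i}. New values are
    computed from the previous iteration's messages; unflagged messages are kept.
    """
    n = len(q)
    new_in = [row[:] for row in in_to_out]
    new_out = [row[:] for row in out_to_in]
    for i in range(n):
        for j in range(n):
            if e[i][j]:
                new_in[i][j] = q[i][j] - max([0] + [out_to_in[k][i] for k in range(n) if k != j])
                new_out[j][i] = q[i][j] - max([0] + [in_to_out[k][j] for k in range(n) if k != i])
    return new_in, new_out

q_prev = [[5, 1], [2, 4]]
q_now = [[5, 1], [3, 4]]            # only VOQ (1, 0) received a packet
e = change_flags(q_now, q_prev)
print(e)                            # [[False, False], [True, False]]
new_in, new_out = self_async_iteration(q_now, e, [[5, 1], [2, 4]], [[5, 2], [1, 4]])
print(new_in, new_out)              # [[5, 1], [-1, 4]] [[5, -2], [1, 4]]
```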
Scheduling algorithms • iLQF vs. BP-assisted iLQF (BP-iLQF) • Distributed greedy algorithm • Each input (each output) is equipped with an arbiter which selects the output (input) associated with the longest queue • Greedy MWM (GMWM) vs. BP-assisted GMWM (BP-GMWM) • centralized scheduling, iterating N times • at each iteration, it selects the unmatched input/output pair associated with the longest queue • iSLIP • as iLQF, but sending only binary information (queue empty/not empty)
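GMWM, as described above, fits in a few lines. In this sketch, pairs with non-positive weight are never selected (an assumption beyond the slide, so that empty queues are not scheduled), and the function name is hypothetical. A BP-assisted GMWM would call the same function on the messages F(t) instead of the queue lengths Q(t).

```python
from typing import List, Optional

def gmwm(weights: List[List[int]]) -> List[Optional[int]]:
    """Greedy MWM: up to N rounds, each picking the heaviest pair whose input
    and output are both still unmatched. Returns match[i] = output or None."""
    n = len(weights)
    match: List[Optional[int]] = [None] * n
    used_outputs = set()
    for _ in range(n):
        candidates = [(weights[i][j], i, j)
                      for i in range(n) if match[i] is None
                      for j in range(n) if j not in used_outputs and weights[i][j] > 0]
        if not candidates:
            break
        _, i, j = max(candidates)  # heaviest remaining input/output pair
        match[i] = j
        used_outputs.add(j)
    return match

print(gmwm([[5, 1], [2, 4]]))  # [0, 1]
print(gmwm([[3, 2], [2, 0]]))  # [0, None]
```

The second call shows the greedy gap: GMWM serves q_00 = 3 and stops, whereas the MWM of the same matrix (inputs 0 and 1 crossed to outputs 1 and 0) has weight 4.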
Performance evaluation settings • Simulation settings: • Traffic patterns: Critical traffic pattern
Performance evaluation: results • BP-assisted scheduling improves performance (I = 3) [Figure legend: self-asynchronous, asynchronous and synchronous update; memory vs. no memory]
Hardware design: general overview • The BP-MP consists of 2N modules running in parallel (input modules, IM, and output modules, OM) that exchange forward and backward messages • IM and OM perform the same operations • When n = I (n: current iteration), each IM sends F(t) to the scheduler [Figure: VOQ → BP-MP → Scheduler]
Hardware design: IM details • Flags associated with the VOQs at input i • Self-asynchronous: e_ij = 1 if w_ij(t) ≠ w_ij(t−1), else e_ij = 0 • N registers of size log2 Qmax • Memory: registers storing the messages computed during the previous timeslot • Max operation: tournament implementation with ⌈log2(N−1)⌉ stages and N−2 comparisons • Subtraction operation • c is used to select between 0 and the result of the subtraction operation • When n = I, messages are sent to the scheduler
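The tournament max can be illustrated with a small software model of a comparator tree: for N−1 = 7 competing messages (N = 8), a pairwise tournament needs ⌈log2 7⌉ = 3 stages and N−2 = 6 comparisons. The `im_message` helper, which chains the max, the comparison with 0 and the subtraction, is a hypothetical reading of the IM datapath, not the authors' circuit.

```python
import math
from typing import List, Tuple

def tournament_max(values: List[int]) -> Tuple[int, int, int]:
    """Maximum via a tree of 2-input comparators.

    Returns (maximum, number of stages, number of comparisons); for n values
    this uses n - 1 comparisons and ceil(log2 n) stages.
    """
    level = list(values)
    stages = comparisons = 0
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            nxt.append(max(level[k], level[k + 1]))  # one comparator
            comparisons += 1
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes to the next stage
        level = nxt
        stages += 1
    return level[0], stages, comparisons

def im_message(w_ij: int, competing: List[int]) -> int:
    """Hypothetical IM datapath: max of the N - 1 competing messages,
    clamped at 0, subtracted from the queue length w_ij."""
    best, _, _ = tournament_max(competing)
    return w_ij - max(0, best)

N = 8
competing = [3, 1, 4, 1, 5, 9, 2]          # N - 1 = 7 incoming messages
print(tournament_max(competing))           # (9, 3, 6)
print(math.ceil(math.log2(N - 1)), N - 2)  # 3 6
print(im_message(4, competing))            # -5
```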
Conclusion • We proposed BP-assisted scheduling to boost the performance of existing scheduling algorithms while keeping backward compatibility • BP runs for a few iterations • We simplified and improved the basic BP algorithm: • Relaxation of the MWM uniqueness constraint • Integer messages (backward compatibility) • Message memory • Self-asynchronous update • We provided a high-level description of a possible HW implementation of the BP-MP: • BP-MP can be efficiently implemented in HW and is compatible with existing implementations
Thank you for your attention! Any questions?
Example: MWM computation over a tree • Node “1” must decide whether or not to add edge (1,2) to the matching • Node “1” takes its decision based only on the information provided by the nodes in its neighborhood • E.g., node “2” sends two messages to “1”: • $m_{2\to1}(1)$: weight of the MWM of the sub-tree rooted at “2” comprising (2,1), given that (2,1) is part of the MWM rooted at “1” • $m_{2\to1}(0)$: weight of the MWM of the sub-tree rooted at “2”, given that (2,1) is not part of the MWM rooted at “1” [Figure: tree rooted at node 1, whose neighbors are 2, 6 and 7; node 2 is also adjacent to 3, 4 and 5; edges labeled with weights such as w21, w61, w71, w32, w42. Take or not to take (2,1)?]
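Example: MWM computation over a tree Message definitions: • If (2,1) is part of the MWM, then (3,2), (4,2), (5,2) cannot be in the MWM: $m_{2\to1}(1) = w_{21} + \sum_{k\in\{3,4,5\}} m_{k\to2}(0)$ • If (2,1) is not part of the MWM, then at most one (or none) among (3,2), (4,2), (5,2) can be part of the MWM: $m_{2\to1}(0) = \max\Big(\sum_{k\in\{3,4,5\}} m_{k\to2}(0),\ \max_{\ell\in\{3,4,5\}}\big[m_{\ell\to2}(1) + \sum_{k\in\{3,4,5\}\setminus\{\ell\}} m_{k\to2}(0)\big]\Big)$ • The number of exchanged messages can be reduced by combining the two into a single message: $m_{2\to1} = m_{2\to1}(1) - m_{2\to1}(0) = w_{21} - \max\big(0,\ \max_{k\in\{3,4,5\}} m_{k\to2}\big)$ [Figure: the sub-tree rooted at node 2 (children 3, 4, 5), drawn with and without edge (2,1)]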
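Example: MWM computation over a tree Node “1” decision: • Node “1” adds edge (1,2) to the MWM if: $m_{2\to1}(1) + \sum_{k\in\{6,7\}} m_{k\to1}(0) > \max\Big(m_{2\to1}(0) + \sum_{k\in\{6,7\}} m_{k\to1}(0),\ \max_{\ell\in\{6,7\}}\big[m_{2\to1}(0) + m_{\ell\to1}(1) + \sum_{k\in\{6,7\}\setminus\{\ell\}} m_{k\to1}(0)\big]\Big)$ or equivalently $m_{2\to1} > \max\big(0,\ \max_{\ell\in\{6,7\}} m_{\ell\to1}\big)$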
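The tree recursion can be checked numerically. The sketch below assumes hypothetical edge weights for the tree in the figure (node 1 adjacent to 2, 6, 7; node 2 adjacent to 1, 3, 4, 5, taking the third edge at node 2 to be (5,2)), computes the combined messages m_{c→v} = w_cv − max(0, messages from the other neighbors of c) recursively, and applies the decision rule m_{c→v} > max(0, other messages received by v). All names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical weights for the example tree.
EDGES: Dict[Tuple[int, int], int] = {
    (2, 1): 4, (6, 1): 3, (7, 1): 1,
    (3, 2): 2, (4, 2): 5, (5, 2): 1,
}

def weight(a: int, b: int) -> int:
    return EDGES[(a, b)] if (a, b) in EDGES else EDGES[(b, a)]

def neighbours(a: int) -> List[int]:
    return [y if x == a else x for (x, y) in EDGES if a in (x, y)]

def message(src: int, dst: int) -> int:
    """Combined message m_{src->dst} = m_{src->dst}(1) - m_{src->dst}(0)."""
    others = [message(c, src) for c in neighbours(src) if c != dst]
    return weight(src, dst) - max([0] + others)

def decision(node: int) -> Optional[int]:
    """Neighbour whose edge `node` adds to the MWM, or None."""
    msgs = {c: message(c, node) for c in neighbours(node)}
    best = max(msgs, key=lambda c: msgs[c])
    others = [v for c, v in msgs.items() if c != best]
    return best if msgs[best] > max([0] + others) else None

print(message(2, 1))               # -1
print(decision(1), decision(2))    # 6 4
```

With these weights m_{2→1} = 4 − max(0, 2, 5, 1) = −1, so node 1 does not take (2,1) and picks (6,1); together with (4,2), chosen by node 2, this is the maximum weight matching of the tree (weight 8).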
Graphical models • BP algorithms are message-passing algorithms originally conceived to study Graphical Models (GMs) • GMs are a “marriage” between probability theory and graph theory • GMs are becoming a powerful tool in several fields of science (AI, speech recognition, coding/decoding, bioinformatics) to compute marginal probabilities (sum-product algorithm) and maximum a posteriori assignments (max-product algorithm) • Both algorithms are usually referred to simply as “BP”, since max-product uses the same message-passing scheme as sum-product, with sums replaced by maximizations
[Figure: block diagram VOQ → BP-MP → Scheduler]
Scheduler: iLQF • If the MWM is unique, BP-assisted iLQF, running with the messages computed by BP as weights, computes exactly the MWM
Performance evaluation: results • BP-assisted scheduling improves performance (I = 3) • Average delays: the delays of BP-iLQF/BP-GMWM are at most 1.37 times those of iLQF/GMWM [Figure legend: self-asynchronous, asynchronous and synchronous update; memory vs. no memory]