A Deterministic Single-path Routing Scheme for 2-Level Generalized Fat-trees • Wickus Nienaber, Santosh Mahapatra, Xin Yuan • Department of Computer Science, Florida State University
Motivation • Fat-tree topologies are widely deployed in HPC environments. • Deterministic single-path routing is often used, particularly in InfiniBand networks. • In this work, we propose a new deterministic single-path routing scheme for 2-level generalized fat-trees that optimizes for permutation communications.
Generalized 2-level fat-tree topology • 2-level fat-trees can be characterized by three parameters • n: number of machines connected to each switch • m: number of upper level switches • r: number of lower level switches
Generalized 2-level fat-tree topology • Cross-bisection bandwidth (CBB) ratio: the ratio m : n of uplinks to machine links at each lower level switch • Full bisection bandwidth fat-trees: m = n • Slimmed fat-trees: m < n • Fatted fat-trees: m > n
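The three parameters and the three fat-tree classes can be captured in a few lines of Python. This is a minimal sketch: the class name `FatTree` and its methods are hypothetical, and the classification assumes the CBB ratio compares the m uplinks of a lower level switch with its n machine links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTree:
    """Hypothetical container for the T(n+m, r) parameters."""
    n: int  # machines connected to each lower level switch
    m: int  # upper level switches (one uplink per lower level switch each)
    r: int  # lower level switches

    @property
    def machines(self) -> int:
        # Every lower level switch hosts n machines.
        return self.n * self.r

    def kind(self) -> str:
        # Assumption: classify by comparing uplinks (m) with machine links (n).
        if self.m == self.n:
            return "full bisection bandwidth"
        return "slimmed" if self.m < self.n else "fatted"


tree = FatTree(n=4, m=4, r=8)
print(tree.machines, tree.kind())
```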
Routing issues • How to route traffic to maximize performance for permutation patterns? • Permutation pattern: each input port can connect to an arbitrary output port, and each port is used at most once in the pattern. • Non-blocking interconnects provide non-blocking communication for any permutation pattern. • Single-path deterministic routing: exactly one path for each source-destination pair. • Fat-trees with deterministic routing are not nonblocking, although the topology itself may be nonblocking.
Performance metrics • Worst-case permutation load (WORSTR): the maximum link load, taken over all permutation patterns • Average-case permutation load: the maximum link load, averaged over all permutation patterns • This corresponds to Torsten Hoefler's effective bisection bandwidth • We develop a single-path routing scheme that achieves the optimal worst-case permutation load. • This routing scheme also provides high average permutation performance in comparison to existing routing schemes.
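Both metrics can be computed by brute force on a tiny fat-tree. The sketch below is illustrative only: it assumes machine x sits on lower level switch x // n, counts load only on links between lower level and top level switches, and takes the routing function as a parameter `route(src, dst)` that returns a top level switch index. The helper names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def max_link_load(perm, n, route):
    """Maximum switch-to-switch link load when machine s sends to perm[s]."""
    load = Counter()
    for s, d in enumerate(perm):
        if s // n == d // n:
            continue  # same lower level switch: no top level switch involved
        t = route(s, d)
        load[("up", s // n, t)] += 1    # lower level switch -> top level switch
        load[("down", t, d // n)] += 1  # top level switch -> lower level switch
    return max(load.values(), default=0)


def worst_case_load(n, r, route):
    """Exhaustive worst case over all (n*r)! permutations; tiny trees only."""
    return max(max_link_load(p, n, route) for p in permutations(range(n * r)))


# Tiny example: n = 2 machines per switch, m = 2 top level switches, r = 3.
m = 2
dest_mod = lambda s, d: d % m  # destination-mod-k style rule
print(worst_case_load(2, 3, dest_mod))  # 2, i.e. n (e.g. 0 -> 2 and 1 -> 4)
```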
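The lower bound of the worst-case permutation load • The worst-case permutation load of any single-path routing scheme for any T(n+m, r) is at most n, because every link between a lower level switch and a top level switch carries traffic either from the n machines of one lower level switch or to the n machines of one lower level switch.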
Worst-case permutation load for existing routing schemes • Destination-mod-k (D-mod-k) and source-mod-k routing • In D-mod-k, traffic for (s, d) is routed through top level switch d mod m. • For many fat-trees, the worst-case permutation load is n. • Example in T(4+4, 8): the pairs (0, 4), (1, 8), (2, 12), (3, 16) all go through top level switch 0, because every destination is 0 mod 4, so the uplink from the first lower level switch to switch 0 carries all four flows.
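The T(4+4, 8) example can be checked directly. This is a small sketch, assuming machine x is attached to lower level switch x // n; the function name `d_mod_k` is chosen here for illustration.

```python
from collections import Counter


def d_mod_k(src, dst, m):
    """Destination-mod-k: the top level switch depends only on the destination."""
    return dst % m


n, m, r = 4, 4, 8
pairs = [(0, 4), (1, 8), (2, 12), (3, 16)]

# Count flows per uplink, keyed by (lower level switch, top level switch).
uplink_load = Counter((s // n, d_mod_k(s, d, m)) for s, d in pairs)
print(uplink_load)  # Counter({(0, 0): 4})
```

All four sources sit on lower level switch 0 and all four destinations are congruent to 0 mod 4, so a single uplink carries n = 4 flows.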
A routing scheme with optimal worst-case permutation load • Basic idea • If a link carries traffic from X sources (or to X destinations), its load is no more than X for any permutation pattern. • If each link carries traffic either from at most X sources or to at most X destinations, the maximum link load is at most X. • With destination-mod-k, some of the links carry traffic from n sources to more than n destinations, e.g. the SD pairs (0, 100), (0, 200), (0, 300), … and (1, 101), (1, 201), (1, 301), …
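The basic idea can be written as a one-line bound. The notation below is introduced only for illustration: $S(\ell)$ and $D(\ell)$ are the sets of sources and destinations whose routed paths use link $\ell$, and $\mathrm{load}_P(\ell)$ is the number of SD pairs of permutation $P$ routed over $\ell$.

```latex
% In a permutation P every source sends to exactly one destination and every
% destination receives from exactly one source, so the pairs crossing link l
% have distinct sources in S(l) and distinct destinations in D(l):
\[
  \mathrm{load}_P(\ell) \;\le\; \min\bigl(|S(\ell)|,\; |D(\ell)|\bigr).
\]
% Hence, if every link satisfies |S(l)| <= X or |D(l)| <= X, then
\[
  \max_{P}\,\max_{\ell}\ \mathrm{load}_P(\ell) \;\le\; X .
\]
```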
A routing scheme with optimal worst-case permutation load • Algorithm OPT: • Partition the n ports in each lower level switch into $\sqrt{m}$ groups. • Each group has $n/\sqrt{m}$ members. • Communication from group i to group j goes through top level switch $i\sqrt{m} + j$. • OPT has a worst-case permutation load of $n/\sqrt{m}$. • Example in T(4+4, 8) with two groups per switch (Group 0 and Group 1): (0, 4), (1, 8) go through switch 0 (group 0 to group 0); (2, 12), (3, 16) go through switch 2 (group 1 to group 0).
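A minimal sketch of the group-based routing rule follows. It is not the authors' implementation: it assumes machine x uses port x % n of lower level switch x // n, that groups are contiguous blocks of ports, and that the number of groups is the integer square root of m (which reproduces the T(4+4, 8) example). The function name `opt_top_switch` is hypothetical.

```python
import math
from collections import Counter


def opt_top_switch(src, dst, n, m):
    """Top level switch for (src, dst) under the group-to-group rule."""
    k = math.isqrt(m)        # assumed number of groups per lower level switch
    size = math.ceil(n / k)  # members per group (contiguous ports, assumed)
    i = (src % n) // size    # source group
    j = (dst % n) // size    # destination group
    return i * k + j


n, m = 4, 4
pairs = [(0, 4), (1, 8), (2, 12), (3, 16)]
for s, d in pairs:
    print((s, d), "-> top switch", opt_top_switch(s, d, n, m))

# Uplink load per (lower level switch, top level switch) for these pairs.
uplinks = Counter((s // n, opt_top_switch(s, d, n, m)) for s, d in pairs)
print(max(uplinks.values()))  # 2
```

The four pairs that overloaded one uplink under destination-mod-k are now split across switches 0 and 2, two flows each.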
A routing scheme with optimal worst-case permutation load • What if $\sqrt{m}$ is not an integer? • The algorithm will only use $\lfloor\sqrt{m}\rfloor^2$ top level switches (e.g. for m = 8, only 4 top level switches will be used). • We use a heuristic to re-balance SD pairs onto the unloaded links to improve the average-case performance.
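The number of top level switches that stay idle before re-balancing is easy to compute. This sketch assumes the count of used switches is the square of the integer square root of m, which matches the m = 8 example; the helper names are hypothetical and the re-balancing heuristic itself is not shown.

```python
import math


def used_top_switches(m):
    """Top level switches used by the group-to-group rule (assumed floor(sqrt(m))**2)."""
    return math.isqrt(m) ** 2


def idle_top_switches(m):
    """Indices of top level switches left unused before any re-balancing."""
    return list(range(used_top_switches(m), m))


print(used_top_switches(8))  # 4
print(idle_top_switches(8))  # [4, 5, 6, 7]
```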
Average case performance • Three types of permutations: • Bisection patterns • Full permutation patterns • Dissemination patterns (Bruck’s all-to-all pattern) • Use the average of a large number of random patterns to approximate the average case performance. • Starting from 1000 random patterns • Double the number of random samples until the 99% confidence interval is no more than 1% of the average.
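The sampling procedure for full permutation patterns can be sketched as below. Several details are assumptions rather than the authors' method: the confidence interval uses a normal approximation (z = 2.576), "no more than 1% of the average" is read as the interval half-width, machine x sits on lower level switch x // n, and the routing rule is passed in as `route(src, dst)`.

```python
import math
import random
import statistics
from collections import Counter

Z99 = 2.576  # normal-approximation z value for a two-sided 99% interval


def max_link_load(perm, n, route):
    """Maximum switch-to-switch link load when machine s sends to perm[s]."""
    load = Counter()
    for s, d in enumerate(perm):
        if s // n == d // n:
            continue  # traffic stays inside one lower level switch
        t = route(s, d)
        load[("up", s // n, t)] += 1
        load[("down", t, d // n)] += 1
    return max(load.values(), default=0)


def average_load(n, r, route, start=1000, seed=0):
    """Double the sample count until the 99% CI half-width is <= 1% of the mean."""
    rng = random.Random(seed)
    nodes = list(range(n * r))
    samples, target = [], start
    while True:
        while len(samples) < target:
            perm = nodes[:]
            rng.shuffle(perm)  # one random full permutation pattern
            samples.append(max_link_load(perm, n, route))
        mean = statistics.fmean(samples)
        half = Z99 * statistics.stdev(samples) / math.sqrt(len(samples))
        if half <= 0.01 * mean:
            return mean, len(samples)
        target *= 2


mean, count = average_load(4, 8, lambda s, d: d % 4)  # destination-mod-k, m = 4
print(f"average max link load {mean:.3f} over {count} patterns")
```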
Conclusion • Existing single path routing schemes are not ideal for permutation communications in 2-level fat-trees. • Our proposed routing scheme achieves optimal worst-case permutation performance. • Our scheme also achieves high average performance for permutation communications.