Infiniband subnet management • Discuss the Infiniband subnet management system • Discuss fat trees and subnet management in an Infiniband network with a fat tree topology • References • A. Bermudez, R. Casado, F.J. Quiles, T. M. Pinkston, J. Duato, “Evaluation of a Subnet Management Mechanism for Infiniband Networks”, ICPP 2003. • A. Vishnu, A. R. Mamidala, H. Jin, D. K. Panda, “Performance Modeling of Subnet Management on Fat Tree Infiniband Networks using OpenSM”, Workshop on System Management Tools on Large Scale Parallel Systems, held in conjunction with IPDPS 2005. • X. Lin, Y. Chung, T. Huang, “A Multiple LID Routing Scheme for Fat-Tree-Based Infiniband Networks”, IPDPS 2004.
Infiniband devices and entities related to subnet management • Devices: Channel Adapters (CAs, e.g. Host Channel Adapters), switches, routers • Subnet manager (SM): discovers, configures, activates and manages the subnet • A subnet management agent (SMA) in every device generates and responds to control packets (subnet management packets, SMPs) and configures local components for subnet management • The SM exchanges control packets with the SMAs through the subnet management interface (SMI).
Subnet management packets (SMP) • 256 bytes of data • Use the unreliable datagram service on the management virtual lane (VL 15) • LID routed: switches use their forwarding (lookup) tables • Used after the subnet is set up, e.g. to check the status of an active port • Direct routed: the packet carries the output port for each intermediate hop • Used for subnet discovery, before the subnet is set up
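The difference between the two routing modes can be pictured with a toy model. This is only an illustrative sketch: the helper names `forward_lid` and `forward_direct`, the table layout, and all LID and port numbers are assumptions made for the example, not the Infiniband packet format.

```python
def forward_lid(forwarding_table, dest_lid):
    """LID routed: the switch looks up the output port for the destination LID."""
    return forwarding_table[dest_lid]


def forward_direct(port_path, hop):
    """Direct routed: the packet itself lists the output port for each hop."""
    return port_path[hop]


# Hypothetical switch table, filled in by the SM once the subnet is set up.
table = {5: 2, 9: 4}             # destination LID -> output port
print(forward_lid(table, 9))     # the switch chooses the port from its table

# Hypothetical direct route: leave by port 1, then port 3, then port 2.
path = [1, 3, 2]
print([forward_direct(path, hop) for hop in range(len(path))])
```

A direct routed packet needs no table in any switch, which is why it can be used before the forwarding tables exist.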
Subnet management packets (SMP) • Define the operation to be performed by the SM • Get: get information about a CA, switch, or port • Set: set an attribute of a port (e.g. its LID) • GetResp: response to a Get or Set • Trap: informs the SM about the state of a local node • An SMA keeps resending a Trap message until it receives a TrapRepress packet. • Topology information can be obtained by a sweep and by periodic Traps.
Subnet Management phases: • Topology discovery: sending direct routed SMPs to every port and processing the responses. • Path computation: computing valid paths between each pair of end nodes • Path distribution: configuring the forwarding tables
Subnet discovery • The SM starts by sending a direct routed Get SMP to its local node. Upon receiving the response, the SM sends SMPs with increasing depth, reaching one hop further each time.
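The sweep can be sketched as a breadth-first search over direct routes. This is a minimal sketch under stated assumptions: the `TOPOLOGY` dictionary, the node names, and the `send_get` stand-in are hypothetical, and node names play the role of the unique identifiers a real SM would use to recognize a node it has already seen.

```python
from collections import deque

# Hypothetical subnet: node -> {local port: neighbor node}.
TOPOLOGY = {
    "SM_host": {1: "sw0"},
    "sw0": {1: "SM_host", 2: "sw1", 3: "hostA"},
    "sw1": {1: "sw0", 2: "hostB"},
    "hostA": {1: "sw0"},
    "hostB": {1: "sw1"},
}


def send_get(start, port_path):
    """Stand-in for a direct routed Get SMP: follow the listed output ports
    and report the node reached and its port numbers, as a GetResp would."""
    node = start
    for port in port_path:
        node = TOPOLOGY[node][port]
    return node, sorted(TOPOLOGY[node])


def discover(start):
    """Probe the local node first, then extend every known route by one hop."""
    found = {}               # node -> direct route (list of ports) from the SM
    queue = deque([[]])      # the empty route addresses the SM's local node
    while queue:
        path = queue.popleft()
        node, ports = send_get(start, path)
        if node in found:
            continue         # already reached through a route that is no longer
        found[node] = path
        for port in ports:
            queue.append(path + [port])
    return found


routes = discover("SM_host")
for node, path in routes.items():
    print(node, path)
```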
Path computation: • Compute paths between all pairs of nodes • For irregular topologies: • Up/Down routing does not work directly • It needs both the incoming interface and the destination, while Infiniband forwards on the destination only • Potential solution: • find all possible paths • at each node, remove every transition from a down link to an up link • find one output port for each destination • Why does that still work? Not clear to me. • Other solutions: destination renaming • Fat tree topology: • The best that can be achieved is also not clear.
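The constraint that a switch may hold only one output port per destination can be made concrete with a small sketch. Note the assumptions: this uses plain shortest paths found by breadth-first search, not Up/Down routing, and it does not check deadlock freedom; the `TOPOLOGY` dictionary and the convention that switch names start with "sw" are hypothetical.

```python
from collections import deque

# Hypothetical subnet: node -> {output port: neighbor}. Switch names start with "sw".
TOPOLOGY = {
    "sw0": {1: "sw1", 2: "sw2", 3: "hostA"},
    "sw1": {1: "sw0", 2: "sw2", 3: "hostB"},
    "sw2": {1: "sw0", 2: "sw1", 3: "hostC"},
    "hostA": {1: "sw0"},
    "hostB": {1: "sw1"},
    "hostC": {1: "sw2"},
}


def forwarding_tables(topology):
    """Return {switch: {destination: output port}} along shortest paths."""
    tables = {n: {} for n in topology if n.startswith("sw")}
    hosts = [n for n in topology if not n.startswith("sw")]
    for dest in hosts:
        # Search outward from the destination. The first time a switch is
        # reached, its port facing the node it was reached from is the
        # single output port it will use for this destination.
        seen = {dest}
        queue = deque([dest])
        while queue:
            node = queue.popleft()
            for neighbor in topology[node].values():
                if neighbor in seen or not neighbor.startswith("sw"):
                    continue
                seen.add(neighbor)
                port = next(p for p, n in topology[neighbor].items() if n == node)
                tables[neighbor][dest] = port
                queue.append(neighbor)
    return tables


tables = forwarding_tables(TOPOLOGY)
for switch, table in tables.items():
    print(switch, table)
```

Because each table is keyed by destination alone, two packets for the same destination leave a switch by the same port regardless of where they entered, which is exactly the information Up/Down routing would need and does not get.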
Path distribution: • Ordering issue: while the forwarding tables are only partially updated, the network may be in an inconsistent state, which may result in deadlock during this period. • Traditional solution: allow no data packets for a period of time • Alternative: deadlock-free reconfiguration schemes.
Fat Tree: • A way to build large scale clusters
Fat Tree: • Routing in a complete fat tree: an up phase and a down phase (always contention free).
Fat Tree: • The complete fat tree has a scalability problem • The root has a very large nodal degree • How can a fat tree be built from nodes with a constant nodal degree? • m-port n-tree FT(m, n) • m is the number of ports per switch • n+1 is the height of the tree • The tree consists of 2*(m/2)^n processing nodes and (2n-1)*(m/2)^(n-1) switches.
How is an m-port n-tree FT(m, n) connected? • m is the number of ports per switch • n+1 is the height of the tree • The tree consists of 2*(m/2)^n processing nodes and (2n-1)*(m/2)^(n-1) switches. • A processing node is labeled as P(p_0 … p_{n-1}) • p_0 = 0..m-1, p_i (i != 0) = 0..m/2-1 • A switch is labeled as SW<w_0…w_{n-2}, l> • l = 0..n-1 • When l = 0, w_i = 0..m/2-1 • When l != 0, w_0 = 0..m-1, w_i (i != 0) = 0..m/2-1
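The labeling can be checked against the node and switch counts by enumerating the labels. This is a small sketch, assuming an even m and n >= 2; the function names `processing_nodes` and `switches` are chosen for the example.

```python
from itertools import product


def processing_nodes(m, n):
    """Labels P(p_0 ... p_{n-1}): p_0 in 0..m-1, the other digits in 0..m/2-1."""
    half = m // 2
    return list(product(range(m), *[range(half)] * (n - 1)))


def switches(m, n):
    """Labels SW<w_0 ... w_{n-2}, l> for levels l = 0..n-1 (assumes n >= 2)."""
    half = m // 2
    labels = []
    for level in range(n):
        # At level 0 every digit is in 0..m/2-1; elsewhere w_0 is in 0..m-1.
        first_digit = range(half) if level == 0 else range(m)
        for w in product(first_digit, *[range(half)] * (n - 2)):
            labels.append((w, level))
    return labels


m, n = 4, 3
nodes, sws = processing_nodes(m, n), switches(m, n)
assert len(nodes) == 2 * (m // 2) ** n
assert len(sws) == (2 * n - 1) * (m // 2) ** (n - 1)
print(len(nodes), len(sws))  # 16 20
```

For FT(4, 3) this gives 4 switches at level 0 and 8 at each of levels 1 and 2, which adds up to the 20 predicted by (2n-1)*(m/2)^(n-1).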
How is the tree connected? • Let SW<w, l>_k be the k-th port of SW<w, l>. • SW<w, l>_k and SW<w', l'>_k' are connected iff • l' = l + 1 • w_0…w_{n-3} = w'_0…w'_{l-1} w'_{l+1}…w'_{n-2} • k = w'_l, k' = w_{n-2} + m/2 • Question: which switches are connected to SW<001, l>? How are the ports connected?
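The connection rule can be turned into a generator of switch-to-switch links. This is a sketch of one reading of the rule, assuming an even m and n >= 2: the lower label w' is built by inserting k at position l of w_0…w_{n-3}, and the range of k is taken from the range allowed for digit w'_l. Links to processing nodes are not covered by the rule above and are left out; the function name `links` is hypothetical.

```python
from itertools import product


def links(m, n):
    """Yield ((w, l, k), (w2, l + 1, k2)) port pairs between switches of FT(m, n)."""
    half = m // 2
    for level in range(n - 1):                       # l' = l + 1 must exist
        first_digit = range(half) if level == 0 else range(m)
        for w in product(first_digit, *[range(half)] * (n - 2)):
            # k = w'_l, so k takes the values allowed for digit l of w'.
            for k in (range(m) if level == 0 else range(half)):
                w2 = w[:level] + (k,) + w[level:n - 2]   # insert k at position l
                k2 = w[n - 2] + half                      # k' = w_{n-2} + m/2
                yield (w, level, k), (w2, level + 1, k2)


edges = list(links(4, 3))
print(len(edges))
for upper, lower in edges[:4]:
    print(upper, "<->", lower)
```

Under this reading every switch at level l+1 uses ports m/2..m-1 toward level l, one port per distinct switch, and ports 0..m/2-1 remain for the level below.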
Fat tree properties: • Multiple routes between two nodes • Deterministic routing: one path between two nodes; how should the paths be mapped? • What is a good mapping? • When the traffic pattern is unknown, common practice is to minimize the maximum load on a link. • Do we know how to do it? • Not clear, even when there is no restriction on the routing. It is likely that an optimal solution exists for a particular fat tree topology • In Infiniband, destination-based routing puts some restrictions on which paths can be used.
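The "maximum load on a link" criterion can be made concrete by counting how many routes cross each link. This is a minimal sketch under assumptions: load is taken to be the number of source-destination routes using a directed link, and the two-level example with leaves `L0`, `L1` and roots `R0`, `R1` is hypothetical.

```python
from collections import Counter


def max_link_load(paths):
    """paths: iterable of routes, each a list of directed links (from, to).
    Returns the largest number of routes sharing one link, and all loads."""
    load = Counter(link for path in paths for link in path)
    return max(load.values()), load


# Mapping A sends both flows through root R0; mapping B spreads them over R0 and R1.
mapping_a = [
    [("L0", "R0"), ("R0", "L1")],
    [("L0", "R0"), ("R0", "L1")],
]
mapping_b = [
    [("L0", "R0"), ("R0", "L1")],
    [("L0", "R1"), ("R1", "L1")],
]
print(max_link_load(mapping_a)[0])  # 2
print(max_link_load(mapping_b)[0])  # 1
```

Comparing candidate mappings this way only evaluates a given choice; it does not by itself find the mapping with the smallest maximum load.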