On Scalable Storage Area Network (SAN) Fabric Design Algorithm
Bong-Jun Ko (Columbia University), Kang-Won Lee (IBM T. J. Watson Research), Seraphin Calo (IBM T. J. Watson Research)
Motivation
• SAN is becoming a popular solution as the amount of data grows rapidly in enterprise computing environments.
  • Replaces physical connections between hosts and storage devices with a high-bandwidth Fibre Channel switching network.
  • Enables data/resource sharing across multiple hosts.
  • Increases the reliability and resiliency of the storage system.
• A scalable SAN design solution is needed.
  • SAN design is currently done manually.
  • A large-scale SAN may consist of hundreds of servers and devices.
  • Finding a low-cost solution is challenging.
Background
• Components of a SAN
  • Servers
  • Storage devices
  • SAN fabric
    • Arbitrated loop
    • Switch fabric
• SAN system design procedure
  • Application requirement analysis (e.g., required storage, I/O rates)
  • Physical constraints analysis (e.g., geographic location)
  • Server/storage planning (e.g., port assignment, interoperability)
  • SAN fabric design
  • Zone planning and output generation
SAN Fabric Design
• Design considerations
  • Fabric cost
  • Resilience to node or link failure
  • Future growth requirements and scalability
  • Ease of maintenance for the human administrator
• SAN fabric configuration: mesh-based vs. core-edge-based
General SAN fabric design problem
• Input:
  • A set of host ports, {i}, and a set of device ports, {j}.
  • A set of flows, F = {fij}, where fij is the bandwidth requirement from host port i to device port j.
  • A set of switch types (number of ports, cost) that can be used.
• Output:
  • A set of switches S and a set of links L that interconnect the host, device, and switch ports.
• Constraints:
  • Only the given types of switches are used.
  • For each flow, there exists some path from the host port to the device port.
  • The aggregate bandwidth of the flows on each link does not exceed the link bandwidth.
• Optimization goal: minimize the cost of the SAN fabric (switches + links).
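The per-link bandwidth constraint can be checked mechanically once a candidate design is given. The sketch below is only an illustration: it assumes each flow has already been assigned a single path (a list of links), that link bandwidth is normalized to 1.0, and the names `overloaded_links`, `flows`, and `routes` are hypothetical, not taken from the slides.

```python
from collections import defaultdict

def overloaded_links(flows, routes, link_bw=1.0):
    """Return {link: load} for every link whose aggregate flow exceeds link_bw.

    flows  : {(host_port, device_port): bandwidth}          (the set F = {fij})
    routes : {(host_port, device_port): [link, link, ...]}  (assumed single path per flow)
    """
    load = defaultdict(float)
    for key, bw in flows.items():
        for link in routes[key]:
            load[link] += bw  # every link on the path carries the whole flow
    return {link: total for link, total in load.items() if total > link_bw}

# Hypothetical two-switch fabric: both flows share the inter-switch link s1-s2.
flows = {("h1", "d1"): 0.75, ("h2", "d2"): 0.5}
routes = {
    ("h1", "d1"): [("h1", "s1"), ("s1", "s2"), ("s2", "d1")],
    ("h2", "d2"): [("h2", "s1"), ("s1", "s2"), ("s2", "d2")],
}
print(overloaded_links(flows, routes))  # {('s1', 's2'): 1.25}
```

An empty result means the candidate design satisfies the bandwidth constraint; the path-existence and switch-type constraints would need separate checks.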
Core-edge SAN fabric design problem
• Additional constraints:
  • Only one specific type of switch is used at each level (level = number of hops from the core switch).
  • Flows are merged at host-side edge switches and split at device-side edge switches.
  • The number of edge levels is bounded.
• Optimization goal: minimize the cost of the SAN fabric switches.
[Figure: core-edge fabric with level 1 (host side), level 0 (core), and level 1 (device side)]
Challenges
• Fundamental constraints in assigning flows to switches
  • Bandwidth limit of a link (or a port)
  • Number of ports in a switch
• Numerous ways to assign flows in multiple levels
• Example: f1=0.4, f2=0.3, f3=0.2, f4=…=f14=0.1
• Q: Which one costs less?
[Figure: two alternative ways of grouping flows f1 … f14 onto 8-port switches]
Challenges
• Fundamental constraints in assigning flows to switches
  • Bandwidth limit of a link (or a port)
  • Number of ports in a switch
• Numerous ways to assign flows in multiple levels
• Example: f1 = … = f20 = 0.05
• Q: Which one costs less?
[Figure: alternative ways of grouping flows f1 … f20 onto 8-port and 16-port switches]
Our Approach
• Multi-stage, multi-level bin packing
• Decompose the problem space
  • Core-switch level minimization
    • Goal: minimize the number of ports required at the core level.
    • Pack flows into logical flow groups based on bandwidth.
  • Edge-switch level minimization
    • Goal: minimize the total cost of the edge switch fabric.
    • Pack flow groups into physical switches at each level based on number of ports.
• Effectively decouples the bandwidth and number-of-ports constraints.
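Bandwidth Packing
• Flows sorted by decreasing bandwidth: f0=0.7 > f1=0.5 > f2=0.2 > … > fn=0.01
• f0 is placed in the first flow group (load 0.7).
• f1 is placed in a second flow group (load 0.5).
• f2 is added to the first flow group alongside f0 (load 0.9); the second group stays at 0.5.
• The remaining flows, down to fn=0.01, are placed in the same manner.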
Bandwidth Packing
• Result:
  • The aggregate bandwidth of any flow group does not exceed the link bandwidth.
  • No two flow groups can be merged together.
  • A group of k flows occupies k input ports and 1 output port.
  • The number of flow groups generated is the number of ports required in the core switch.
[Figure: flows f0, f1, f2, … packed into flow groups b1, b2, …, bm]
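The packing steps shown on the Bandwidth Packing slides are consistent with a first-fit-decreasing heuristic; that the authors use exactly this heuristic is an assumption. A minimal sketch, with link bandwidth normalized to 1.0 (also an assumption) and a hypothetical function name:

```python
def pack_flows_by_bandwidth(flows, link_bw=1.0, eps=1e-9):
    """First-fit decreasing: pack flow bandwidths into flow groups.

    Each group's aggregate bandwidth stays within link_bw (eps absorbs
    floating-point rounding). Returns a list of groups, each a list of bandwidths.
    """
    groups = []  # groups[i] holds the bandwidths of the flows in group i
    loads = []   # loads[i] is the aggregate bandwidth of groups[i]
    for bw in sorted(flows, reverse=True):
        if bw > link_bw + eps:
            raise ValueError(f"flow {bw} exceeds the link bandwidth {link_bw}")
        for i, load in enumerate(loads):
            if load + bw <= link_bw + eps:  # first group with enough room
                groups[i].append(bw)
                loads[i] += bw
                break
        else:  # no existing group fits: open a new one
            groups.append([bw])
            loads.append(bw)
    return groups

groups = pack_flows_by_bandwidth([0.5, 0.2, 0.7])
print(groups)       # [[0.7, 0.2], [0.5]]
print(len(groups))  # 2 flow groups, i.e. 2 core-switch ports
```

With first fit, the flow that opened a later group did not fit in any earlier group, which is why no two resulting groups can be merged without exceeding the link bandwidth.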
Mapping Flow Groups into Physical Switches
• Flow groups sorted by decreasing size: 16, 13, 10, 7, 6, 4, 3, 3; switches s1, s2, …, sm with 16 ports each.
• 16, 13, and 10 are each placed in a separate switch.
• 7 is placed in a fourth switch.
• 6 is packed together with 10.
• 4 is packed with 7; the two remaining 3s go with 13 and with 7 and 4.
• Final packing: {16}, {13, 3}, {10, 6}, {7, 4, 3}
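The placement of the sizes 16, 13, 10, 7, 6, 4, 3, 3 into 16-port switches shown above can be reproduced with the same first-fit-decreasing idea, this time applied to port counts rather than bandwidth. This is a sketch under the assumption that each number is the port requirement of one flow group; the function name is hypothetical.

```python
def pack_groups_into_switches(port_counts, switch_ports=16):
    """First-fit decreasing on port counts.

    Returns a list of switches, each a list of the flow-group sizes it hosts.
    """
    switches = []
    for ports in sorted(port_counts, reverse=True):
        if ports > switch_ports:
            raise ValueError(f"group needs {ports} ports, switch has {switch_ports}")
        for sw in switches:
            if sum(sw) + ports <= switch_ports:  # first switch with enough free ports
                sw.append(ports)
                break
        else:  # no switch has room: add another one
            switches.append([ports])
    return switches

print(pack_groups_into_switches([16, 13, 10, 7, 6, 4, 3, 3]))
# [[16], [13, 3], [10, 6], [7, 4, 3]]
```

Four 16-port switches suffice here, with only the last one left partly unused (14 of 16 ports).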
Mapping Flow Groups into Physical Switches
• Higher allocation: fewer lower-level switches
• Lower allocation: fewer higher-level switches
• Q: Which one is better?
[Figure: two alternative allocations of flow groups across switch levels, using 8-port and 16-port switches]
Go High or Low?
• The cost of a switch grows faster than linearly with its number of ports, e.g., list prices (as of Aug 2004):
  • IBM 3534 (8 ports): $5,136
  • IBM 2106 (16 ports): $15,511
• "Bottom-Up" approach
  • Start with the lowest possible assignment.
  • Re-assign flows to higher-level switches.
  • Pack flow groups in the lower level based on the reduced port counts.
  • Merge lower-level switches whenever it saves cost.
  • Repeat the merging recursively along the switch hierarchy.
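The superlinear pricing can be made concrete with the two list prices quoted above. The short calculation below is only arithmetic on those figures; it ignores link costs and any ports that would be consumed by inter-switch links, and the variable names are illustrative.

```python
# List prices quoted above (as of Aug 2004), keyed by port count.
SWITCH_PRICES = {8: 5136, 16: 15511}

# Price per port for each switch type.
per_port = {ports: price / ports for ports, price in SWITCH_PRICES.items()}
print(per_port)  # {8: 642.0, 16: 969.4375}

# Raw cost of two 8-port switches versus one 16-port switch.
two_small = 2 * SWITCH_PRICES[8]
saving = SWITCH_PRICES[16] - two_small
print(two_small, saving)  # 10272 5239
```

On these prices a port on the 16-port switch costs roughly 1.5 times a port on the 8-port switch, which is why it can pay to replace one large switch with two small ones.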
Reducing Edge Switch Cost
[Figure: two versions of the edge switch hierarchy, annotated with port counts for 8-port and 16-port switches]
• Replaced one 16-port switch with two 8-port switches: cost reduced!
Future Work
• Performance analysis
  • Compare with other approaches, e.g., an IP solver
  • Derive an analytical bound
• Quantify adaptability to future growth
  • Open question: How different are the two trees?
• Incorporate into the IBM SAN design tool