SDSC's Data Oasis: Balanced performance and cost-effective Lustre file systems
Lustre User Group 2013 (LUG13)
Rick Wagner, San Diego Supercomputer Center
Jeff Johnson, Aeon Computing
April 18, 2013
Data Oasis
• High-performance, high-capacity Lustre-based parallel file system
• 10GbE I/O backbone for all of SDSC’s HPC systems, supporting multiple architectures
• Integrated by Aeon Computing using their EclipseSL platform
• Scalable, open platform design
• Driven by the 100 GB/s bandwidth target for Gordon
• Motivated by $/TB and $/GB/s: ~$1.5M bought 4 MDS + 64 OSS, delivering 4 PB and 100 GB/s (worked through in the sketch below)
• 6.4 PB capacity and growing
• Currently Lustre 1.8.7
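The cost metrics follow directly from the figures on this slide. A minimal Python sketch of that arithmetic (the ~$1.5M, 4 PB, and 100 GB/s numbers come from the slide; the per-unit results are simply derived from them):

# Back-of-envelope cost metrics for Data Oasis (aggregate figures from the slide above)
cost_usd = 1_500_000        # ~$1.5M for 4 MDS + 64 OSS
capacity_tb = 4_000         # 4 PB raw, expressed in TB
bandwidth_gbs = 100         # 100 GB/s aggregate bandwidth target (Gordon)

print(f"Cost per TB:   ${cost_usd / capacity_tb:,.0f}")     # ~$375 per TB
print(f"Cost per GB/s: ${cost_usd / bandwidth_gbs:,.0f}")   # ~$15,000 per GB/s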
Data Oasis Heterogeneous Architecture
• Serves three distinct network architectures: Trestles (IB cluster), Gordon (IB cluster), and Triton (10GbE & IB cluster)
• Trestles reaches storage through a Mellanox 5020 IB/Ethernet bridge (12 GB/s); Gordon through 64 Lustre LNET routers (100 GB/s); Triton through Juniper 10GbE switches (XX GB/s)
• Redundant Arista 7508 10GbE switches for reliability and performance
• Four metadata servers: Trestles scratch, Gordon scratch, Gordon & Trestles project, Triton scratch
• 64 OSSes (object storage servers) provide 100 GB/s performance and >4 PB raw capacity (72 TB and 108 TB server building blocks)
• 132 TB JBODs (just a bunch of disks) provide capacity scale-out
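Dividing the aggregate targets across the 64 OSSes gives a feel for what each building block has to deliver. A small Python sketch (the per-server numbers are derived from the slide's aggregates, not separately measured):

# Per-OSS share of the aggregate targets (derived, not measured)
n_oss = 64
aggregate_bw_gbs = 100        # GB/s across the whole file system
aggregate_raw_pb = 4          # >4 PB raw capacity

bw_per_oss = aggregate_bw_gbs / n_oss               # ~1.6 GB/s per OSS
raw_per_oss_tb = aggregate_raw_pb * 1000 / n_oss    # ~62.5 TB per OSS; the 72/108 TB boxes exceed this
print(f"{bw_per_oss:.2f} GB/s and {raw_per_oss_tb:.1f} TB raw per OSS")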
Data Oasis Servers
• MDS pair (active + backup): Myri10GbE NICs, LSI controllers, RAID 10 (2x6)
• OSS: Myri10GbE NIC, LSI controllers, 4x RAID 6 (7+2) on 2 TB drives
• OSS + JBOD: Myri10GbE NIC, LSI controllers, 4x RAID 6 (8+2) on 3 TB drives, with the JBOD attached for capacity
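The RAID 6 layouts give up two drives per group to parity. A short Python sketch of the raw-versus-usable arithmetic (the 4-groups-per-server drive counts are an assumption inferred from the "4x" notation and the 72 TB raw OSS figure on the previous slide, not stated outright):

# Raw vs. usable capacity for a server built from several RAID 6 groups
def raid6_capacity_tb(data_drives, parity_drives, drive_tb, groups):
    """Return (raw, usable) TB across `groups` RAID 6 arrays of data+parity drives."""
    raw = groups * (data_drives + parity_drives) * drive_tb
    usable = groups * data_drives * drive_tb
    return raw, usable

# OSS: assumed 4 groups of RAID 6 (7+2) on 2 TB drives
print(raid6_capacity_tb(7, 2, 2, 4))   # (72, 56) -> matches the 72 TB raw OSS on the slide
# OSS+JBOD tier: assumed 4 groups of RAID 6 (8+2) on 3 TB drives
print(raid6_capacity_tb(8, 2, 3, 4))   # (120, 96)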
Trestles Architecture
• QDR InfiniBand compute fabric (324 compute nodes); GbE management and GbE public networks
• Round-robin login across 2 login nodes; mirrored NFS; redundant front end
• Data movers (4x) and NFS servers (4x), shared between Gordon & Trestles, connect to XSEDE & R&E networks through the SDSC network
• Management nodes (2x) on the Ethernet management network
• Storage path: QDR InfiniBand switch → IB/Ethernet bridge switch (12x links) → redundant Arista 7508 10GbE switches (2x MLAG) → Data Oasis Lustre PFS (4 PB)
• Link rates: QDR 40 Gb/s, 10GbE, GbE
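The ~12 GB/s quoted for Trestles' storage path is consistent with the 12 bridge links, if each 10GbE link delivers roughly 1 GB/s of Lustre traffic. A hedged Python sketch (the per-link delivered rate is my assumption; the 12-link and 12 GB/s figures are from the slides):

# Trestles -> Data Oasis path through the IB/Ethernet bridge
bridge_links = 12              # "12x" links on the architecture diagram
gbs_per_10gbe_link = 1.0       # assumed delivered GB/s per 10GbE link (wire rate is 1.25 GB/s)
print(bridge_links * gbs_per_10gbe_link, "GB/s")   # ~12 GB/s, matching the heterogeneous-architecture slide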
Gordon Network Architecture
• Dual-rail QDR InfiniBand 3D torus (rails 1 and 2) connecting 1,024 compute nodes and the I/O nodes (128x); GbE management and GbE public networks
• Dual 10GbE storage links per I/O node; round-robin login across 4 login nodes; mirrored NFS (NFS servers, 4x); redundant front end
• Data movers (4x) and management nodes (2x) sit on the public and management edge & core Ethernet, connected to XSEDE & R&E networks through the SDSC network
• Storage path: 64 2x10GbE links from the I/O nodes → Arista 10GbE switch → Data Oasis Lustre PFS (4 PB)
• Link rates: QDR 40 Gb/s, 10GbE, GbE
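Gordon's I/O nodes double as the Lustre LNET routers. Assuming the "64 2x10GbE" label means 64 I/O nodes with two 10GbE storage links each (that reading is mine), the raw Ethernet capacity comfortably exceeds the 100 GB/s delivered target; a minimal Python sketch:

# Gordon I/O nodes as LNET routers: raw 10GbE capacity vs. the 100 GB/s target
io_nodes = 64                  # assumed reading of the "64 2x10GbE" label
links_per_node = 2
wire_rate_gbs = 1.25           # 10GbE wire rate in GB/s
raw_total = io_nodes * links_per_node * wire_rate_gbs
print(raw_total, "GB/s raw vs. 100 GB/s delivered target")   # 160.0 GB/s raw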
Issues & The Future • LNET “death spiral” • LNET tcp peers stop communicating, packets back up • We need to upgrade to Lustre 2.x soon • Can’t wait for MDS SMP improvements & DNE • Design drawback: juggling data is a pain • Client virtualization testing • SR-IOV very promising for o2ib clients • Watching the Fast Forward program • Gordon’s architecture ideally suited to burst buffers • HSM • Really want to tie Data Oasis to SDSC Cloud