FERMI SAN Effort
Lisa Giacchetti, Ray Pasetes
GFS information contributed by Jim Annis
HEPiX/HEPNT
Overview
• Motivation
• Current Problems
• Future goals
• Evaluation
  • CXFS
  • SANergy
  • GFS
• Current Status
Motivation
• Current problems
  • Unbalanced use of the central UNIX cluster
  • Large datasets need to be shared across a large distributed compute environment
  • Current solutions lack sufficient throughput
• Future goals
  • Linux analysis cluster with an SMP feel
Evaluation: CXFS
• Currently SGI-only
• Currently requires RAID
• True(er) SAN FS
• Commitment to a Linux port
Equipment
• 1 Origin 2200
• 2 Origin 200s
• 1 Brocade Fibre Channel switch
• 1 SGI (Clariion) RAID, ~1 TB raw
Evaluation: SANergy
• Heterogeneous solution: Solaris, WinNT, Win2K, IRIX, Tru64, MacOS, AIX
• Works with RAID or JBOD
• Pseudo SAN FS with an NFS look
• Linux port planned (11/00, both MDC and client)
Equipment
• 1 Sun SPARCstation 20: RAID management box
• 1 Ultra 60, 3 Linux boxes, 1 NT4 box: MDCs and clients
• 1 O2200 (client only)
• 1 16-port Brocade switch
• 1 MetaStor E4400 RAID, ~720 GB raw
Evaluation: GFS
• Open source (GPL'd)
• Sistina Software (spun out of the University of Minnesota)
• High-performance 64-bit files and file system
• Distributed, server-less metadata
• Data synchronization via global, disk-based locks
• Journaling and node cast-out
• Three major pieces:
  • The network storage pool driver
  • The file system
  • The locking modules
Evaluation: GFS (equipment)
• System integrator: Linux NetworX (cluster control box)
• Compute nodes: Linux NetworX
  • Dual 600 MHz Pentium III, ASUS motherboard
  • 1 GB RAM, 2x36 GB EIDE disks
  • Qlogic 2100 HBA
• Ethernet: Cisco Catalyst 2948G
• Fibre Channel: Gadzoox Capellix 3000
• Global disk: DotHill SANnet 4200
  • Dual Fibre Channel controllers
  • 10x73 GB Seagate Cheetah SCSI disks
• Software: Linux 2.2.16, Qlogic drivers, GFS V3.0, Condor
Current Status: CXFS
Config: 1 file system; 2 9-disk 36 GB RAID-5 LUNs striped together; each system with 1 HBA
• 3 simultaneous writes and 3 simultaneous reads of 1 GB files at 64 KB blocks (6 different files)
  • Read: 11.5 / 11.6 / 11.9 MB/s
  • Write: 36.5 / 28.4 / 28.4 MB/s
  • SGI Clariion RAID biases toward writes
  • Aggregate: 128 MB/s of 200 MB/s possible (64% utilization)
• Peak single write, 2 GB file, 64 KB blocks: 45 MB/s
• Peak single read, 2 GB file, 64 KB blocks: 28 MB/s
• Simultaneous writes to the same file: 0.165 MB/s
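The aggregate and utilization figures on this slide follow directly from the six per-stream numbers; a short Python sketch makes the arithmetic explicit (the 200 MB/s ceiling is the slide's own figure for the available hardware bandwidth):

```python
# Per-stream throughputs reported on the CXFS slide (MB/s).
reads = [11.5, 11.6, 11.9]   # three simultaneous 1 GB reads, 64 KB blocks
writes = [36.5, 28.4, 28.4]  # three simultaneous 1 GB writes, 64 KB blocks

aggregate = sum(reads) + sum(writes)      # total concurrent throughput
capacity = 200.0                          # available bandwidth per the slide (MB/s)
utilization = aggregate / capacity * 100

print(f"aggregate   = {aggregate:.1f} MB/s")   # ~128 MB/s
print(f"utilization = {utilization:.0f}%")     # ~64%
```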
Current Status: CXFS
• Stability issues
  • Cluster can hang when unmounting file systems
  • A problem on one machine can affect all nodes, requiring a reboot of the entire cluster
  • A simple reboot often does not work; a hard reset is needed
• Java GUI
  • Occasionally hangs
  • Occasionally reports erroneous cluster status
Current Status: SANergy
• Equipment almost in place
• MetaStor hardware RAID tested without SANergy
  • Pleased with performance
  • Worked as central disk store for an AFS file server
  • Same hardware used for the CXFS test
  • Config: 2 9-disk RAID-5 LUNs
  • Results: 95+ MB/s read; 90+ MB/s write
  • Limited by the HBA
• Software yet to be received
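The "limited by the HBA" claim is plausible on a back-of-the-envelope basis. The slide does not state the link speed, so this sketch assumes a 1 Gbit/s Fibre Channel HBA (typical of this hardware generation): the 1.0625 Gbaud line rate less 8b/10b encoding overhead leaves roughly 106 MB/s of payload per direction, and 95 MB/s is about 90% of that.

```python
# Assumption: 1 Gb Fibre Channel HBA (not stated on the slide).
signal_rate = 1.0625e9                 # 1 Gb FC line rate, baud
payload_bits = signal_rate * 8 / 10    # 8b/10b encoding: 8 data bits per 10 on the wire
link_mb_s = payload_bits / 8 / 1e6     # usable payload per direction, MB/s

print(f"link limit ~ {link_mb_s:.0f} MB/s")                      # ~106 MB/s
print(f"95 MB/s read uses ~ {95 / link_mb_s:.0%} of the link")   # ~89%
```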
Current Status: GFS
Config: 5 machines, 1 5-disk RAID-5 LUN
• 2 simultaneous reads and 1 simultaneous write of 1 GB files at 64 KB blocks
  • Write: 5.1 MB/s
  • Read: 30.0 / 30.0 MB/s
• Aggregate: 65 MB/s of 90 MB/s possible (72% utilization)