
First experiences with large SAN storage in a Linux cluster





  1. First experiences with large SAN storage in a Linux cluster. Jos van Wezel, Institute for Scientific Computing, Karlsruhe, Germany. jvw@iwr.fzk.de

  2. Overview
  • The GridKa center
  • SAN and parallel storage
    • hardware
    • software
  • Performance test results
  • NFS load balancing
  Jos van Wezel / ACAT03

  3. GridKa in a nutshell
  • Test environment for LHC (ALICE, ATLAS, CMS, LHCb)
  • Test and development environment for CrossGrid, LCG, ...
  • Production platform for BaBar, CDF, D0, Compass
  • Tier 1 for LHC after 2007
  • 2003
    • 500 CPUs
    • 120 TB disk storage
    • 350 TB tape storage
    • 2 Gb feed
  • 2007 (est.)
    • 4000 CPUs
    • 1200 TB disk storage
    • 3500 TB tape storage
    • ? Gb feed
  Jos van Wezel / ACAT03

  4. SAN components
  • Disk racks
    • 5 x 140 x 146 GB FC disks
    • 5 x 4-port LSI controllers
    • 70 x 1 TB LUNs (see the quick capacity check below)
  • File servers
    • 2 x 2.4 GHz Xeon
    • 1.5 GB memory, 18 GB disk
    • QLogic 2312 HBA
    • 2 x Broadcom 1 Gb Ethernet
  • Fibre Channel switch
    • 128 x 2 Gb ports
    • non-blocking fabric
  Jos van Wezel / ACAT03
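The 1 TB LUN size is consistent with the RAID5 (7+P) groups described in the test setup (slide 9) built from the 146 GB FC disks. A quick back-of-the-envelope check, assuming decimal gigabytes:

```python
# Back-of-the-envelope check of the LUN size: a RAID5 (7+P) group built from
# 146 GB FC disks offers 7 data disks' worth of usable capacity.
DISK_GB = 146        # capacity of one FC disk
DATA_DISKS = 7       # RAID5 (7+P): 7 data disks + 1 parity disk
LUNS = 70            # LUNs presented to the file servers

lun_tb = DISK_GB * DATA_DISKS / 1000   # usable capacity per LUN, decimal TB
print(f"per LUN: ~{lun_tb:.2f} TB, all LUNs: ~{lun_tb * LUNS:.0f} TB")
# per LUN: ~1.02 TB, all LUNs: ~72 TB
```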

  5. Storage Area Network
  Advantages
  • Easier administration: all storage is visible on all file servers
  • Expansion or exchange during production: less downtime
  • Load balancing / redundancy: several paths to the same storage
  • Very fast: 2 Gb/s (200 MB/s)
  • Scalable: add switches and controllers
  • Small overhead and CPU load: the HBA handles the protocol
  Disadvantages
  • Expensive: about $1000 per HBA or port
    • prices will drop with iSCSI
    • a (S)ATA + IP solution would need more (host) controllers and a separate LAN
  • No standardized (fabric) management
  With an estimated 4000 disks in 2007 there is no manageable alternative; expect a (SCSI) disk to fail roughly every 2 weeks (see the rough estimate below).
  Jos van Wezel / ACAT03
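The disk-failure remark can be sanity-checked with a simple fleet calculation; the per-disk MTBF used below is an assumed vendor figure of roughly 1.2 million hours, not a number from the talk:

```python
# Rough estimate behind "expect a (SCSI) disk to fail every 2 weeks".
DISKS = 4000            # estimated number of disks in 2007
MTBF_HOURS = 1.2e6      # assumed per-disk mean time between failures (vendor spec)

hours_between_failures = MTBF_HOURS / DISKS   # expectation across the whole fleet
print(f"one disk failure roughly every {hours_between_failures / 24:.1f} days")
# one disk failure roughly every 12.5 days, i.e. about every two weeks
```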

  6. GPFS features
  • Parallel file system
    • R/W to the same file from more than one node (see the sketch below)
  • Highly scalable
    • scales with the number of disks
    • scales with the number of nodes
    • distributed locks
  • UNIX/POSIX IO semantics
  • Large volume sizes
    • currently 18 TB (Linux ext2 max 1.9 TB)
  • VFS layer: can be exported via e.g. NFS
  • Extensive fault tolerance and recovery possibilities
    • survives failed nodes
    • mirroring
    • fsck
  • On-line file system expansion and disk replacement
  • Proprietary (IBM)
  Jos van Wezel / ACAT03
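As an illustration of the "R/W to the same file from more than one node" point, the sketch below has each cluster node write its own region of one shared file on the parallel file system; the mount point and the way a node derives its rank are assumptions made for this example, not details from the talk.

```python
import os
import socket

# Illustrative only: concurrent writers to one file on a parallel file system.
SHARED_FILE = "/gpfs/data/results.bin"   # hypothetical GPFS mount point
BLOCK = 1024 * 1024                      # each node owns a 1 MiB region

# Derive a node rank, e.g. from the numeric suffix of the hostname (assumption).
rank = int("".join(filter(str.isdigit, socket.gethostname())) or "0")

record = socket.gethostname().encode().ljust(BLOCK, b"\0")

# Every node opens the same file and writes at its own offset; the parallel
# file system's distributed locking keeps the concurrent access consistent.
fd = os.open(SHARED_FILE, os.O_RDWR | os.O_CREAT, 0o644)
try:
    os.pwrite(fd, record, rank * BLOCK)
finally:
    os.close(fd)
```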

  7. GPFS functional diagram (figure): each server runs the application, GPFS and the FC driver; the servers are linked by an IP network and reach the shared disk collection through the Fibre Channel switch fabric. Jos van Wezel / ACAT03

  8. GridKa scalable IO design (figure): n compute nodes reach the file servers over IP/TCP/NFS; the servers access the Fibre Channel RAID storage over the SAN (SCSI); disks and servers can be added for expansion. Jos van Wezel / ACAT03

  9. Environment and throughput tests
  • Setup
    • 10 file systems / mount points
    • each file system comprises 5 RAID5 (7+P) groups
    • kernel 2.4.20-18 on servers
    • kernel 2.4.18-27 on clients
    • NFS v3 on server and clients (over UDP)
  • Random IO
    • multiple threads on a GPFS node
  • Sequential IO
    • W/R on a GPFS node
    • W/R on NFS to GPFS
  Jos van Wezel / ACAT03

  10. Chart: throughput (MB/s) vs. number of threads on one GPFS node, for reading and writing. Jos van Wezel / ACAT03

  11. Chart: accumulated throughput (MB/s) as a function of the number of nodes/RAID arrays, for reading and writing. Jos van Wezel / ACAT03

  12. Chart: accumulated throughput (MB/s) as a function of the number of NFS clients, for reading and writing. Jos van Wezel / ACAT03

  13. NFS load balancing (components)
  • Use the automounter with a program map
  • Transparent for existing NIS maps
  • DNS is used to add or remove servers via multiple PTR records
  • Program map algorithm, for a given key (see the sketch below):
    • retrieve the existing NIS entry
    • find the DNS host(s)
    • select a host randomly if there is more than one
    • check availability with nfsping
    • return the NIS entry with the hostname replaced where appropriate
  Jos van Wezel / ACAT03
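A minimal sketch of such a program map in Python, assuming `ypmatch` for the NIS lookup, a round-robin DNS alias (the name `nfs.gridka.de` is hypothetical) whose addresses map back to the individual servers via their PTR records, and an `nfsping` command that exits 0 when a server's NFS service answers; the map name, mount options and paths are illustrative, not taken from the slides.

```python
#!/usr/bin/env python3
# Sketch of an automounter program map that load-balances NFS mounts.
# The automounter calls it with the key as its only argument; it prints the
# map entry, with the server name possibly replaced, on stdout.
import random
import socket
import subprocess
import sys


def nis_entry(key):
    """Retrieve the existing entry for this key from the NIS map (name assumed)."""
    out = subprocess.run(["ypmatch", key, "auto.data"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()            # e.g. "-rw,hard  fs1:/export/data1"


def candidate_hosts(alias="nfs.gridka.de"):
    """Resolve the alias, then map each address back to a server via its PTR record."""
    _, _, addrs = socket.gethostbyname_ex(alias)
    return [socket.gethostbyaddr(a)[0] for a in addrs]


def nfs_alive(host):
    """Availability check; relies on an nfsping-like helper being installed."""
    return subprocess.run(["nfsping", host],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def main():
    key = sys.argv[1]
    parts = nis_entry(key).rsplit(None, 1)      # "[options] server:/path"
    options, location = parts if len(parts) == 2 else ["", parts[0]]
    path = location.split(":", 1)[1]

    hosts = [h for h in candidate_hosts() if nfs_alive(h)]
    if hosts:
        location = f"{random.choice(hosts)}:{path}"

    print(f"{options} {location}".strip())


if __name__ == "__main__":
    main()
```

Hooked into the automounter by listing the executable script as the map for the data mount point; because the DNS alias, not the NIS map itself, decides which servers are candidates, servers can be added or removed without touching the existing NIS maps.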

  14. NFS load balancing (result). Chart: number of mounts per server over recent weeks. Jos van Wezel / ACAT03

  15. Conclusions
  • SAN and Linux go together very well
  • NFS on recent Linux kernels is a huge improvement
  • The GPFS/NFS combination is a viable cluster storage solution
  • The relationship between local server throughput and NFS throughput is still unclear
  Work to do
  • Improve the ratio of local server throughput to NFS throughput
  • Improve write behavior
  • Connection to background storage (tape via dCache)
  Thank you: Manfred Alef, Ruediger Berlich, Michael Gehle, Marcus Hardt, Bruno Hoeft, Axel Jaeger, Melanie Knoch, Marcel Kunze, Holger Marten, Klaus-Peter Mickel, Doris Ressmann, Ulrich Schwickerath, Bernhard Verstege, Ingrid Schaeffner
