The NERSC Global File System NERSC June 12th, 2006
Overview • NGF: What/Why/How • NGF Today • Architecture • Who’s Using it • Problems/Solutions • NGF Tomorrow • Performance Improvements • Reliability Enhancements • New Filesystems (/home)
NERSC Global File System - what • What do we mean by a global file system? • Available via standard APIs for file system access on all NERSC systems. • POSIX • MPI-IO • We plan to extend that access to remote sites through future enhancements. • High Performance • NGF is intended to replace our current file systems and is expected to meet the same high performance standards
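To make the two access paths above concrete, here is a minimal C sketch, assuming a hypothetical file /project/myproj/data.bin (not an actual NGF path): the same shared file can be read either through ordinary POSIX calls or through MPI-IO, and either path works unchanged on any system that mounts NGF.

```c
/* Minimal sketch of the two standard access APIs named above.
 * The path /project/myproj/data.bin is a hypothetical example. */
#include <mpi.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* POSIX: an ordinary open/read works the same on every system that mounts NGF */
    int fd = open("/project/myproj/data.bin", O_RDONLY);
    if (fd >= 0) {
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof buf);
        if (rank == 0)
            printf("POSIX read returned %zd bytes\n", n);
        close(fd);
    }

    /* MPI-IO: all ranks open the same shared file; each reads its own block */
    MPI_File fh;
    if (MPI_File_open(MPI_COMM_WORLD, "/project/myproj/data.bin",
                      MPI_MODE_RDONLY, MPI_INFO_NULL, &fh) == MPI_SUCCESS) {
        char block[4096];
        MPI_Status status;
        MPI_File_read_at(fh, (MPI_Offset)rank * (MPI_Offset)sizeof block,
                         block, sizeof block, MPI_BYTE, &status);
        MPI_File_close(&fh);
    }

    MPI_Finalize();
    return 0;
}
```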
NERSC Global File System - why • Increase user productivity • Reduce users’ data management burden • Enable/simplify workflows involving multiple NERSC computational systems • Accelerate the adoption of new NERSC systems • Users have access to all of their data, source code, scripts, etc. the first time they log into the new machine • Enable more flexible/responsive management of storage • Increase capacity/bandwidth on demand
NERSC Global File System - how • Parallel • Network/SAN heterogeneous access model • Multi-platform (AIX/Linux for now)
NGF current architecture • NGF is a GPFS file system using GPFS multi-cluster capabilities • Mounted on all NERSC systems as /project • External to all NERSC computational clusters • Small Linux server cluster managed separately from the computational systems • 70 TB user-visible storage; 50+ million inodes • 3 GB/s aggregate bandwidth
/project • Limited initial deployment - no homes, no /scratch • Projects can include many users, potentially using multiple systems (mpp, vis, …), and seemed to be prime candidates to benefit from the NGF shared data access model • Backed up to HPSS bi-weekly • Will eventually receive nightly incremental backups • Default project quota: • 1 TB • 250,000 inodes
/project – 2 • Current usage • 19.5 TB used (28% of capacity) • 2.2 M inodes used (5% of capacity) • NGF /project is currently mounted on all major NERSC systems (1240+ clients): • Jacquard, LNXI Opteron system running SLES 9 • Da Vinci, SGI Altix running SLES 9 Service Pack 3 with direct storage access • PDSF, IA32 Linux cluster running Scientific Linux • Bassi, IBM Power5 running AIX 5.3 • Seaborg, IBM SP running AIX 5.2
/project – Problems & Solutions • /project has not been without its problems • Software bugs • 2/14/06 outage due to Seaborg gateway crash – problem reported to IBM; a new PTF with the fix is installed • GPFS on AIX 5.3 ftruncate() error on compiles – problem reported to IBM; an efix is now installed on Bassi • Firmware bugs • Fibre Channel switch bug – firmware upgraded • DDN firmware bug (triggered on rebuild) – firmware upgraded • Hardware failures • Dual disk failure in a RAID array – more exhaustive monitoring of disk health, including soft errors, is now in place
NGF – Solutions • General actions taken to improve reliability. • Pro-active monitoring – see the problems before they’re problems • Procedural development – decrease time to problem resolution/perform maintenance without outages • Operations staff activities – decrease time to problem resolution • PMRs filed and fixes applied – prevent problem recurrence • Replacing old servers – remove hardware with demonstrated low MTBF • NGF Availability since 12/1/05: ~99% (total down time: 2439 minutes)
Current Project Information • Projects using /project file system: (46 projects to date) • narccap: North American Regional Climate Change Assessment Program – Phil Duffy, LLNL • Currently using 4.1 TB • Global model with fine resolution in 3D and time; will be used to drive regional models • Currently using only Seaborg • mp107: CMB Data Analysis – Julian Borrill, LBNL • Currently using 2.9 TB • Concerns about quota management and performance • 16 different file groups
Current Project Information • Projects using /project file system (cont.): • incite6: Molecular Dynameomics – Valerie Daggett, UW • Currently using 2.1 TB • snaz: Supernova Science Center – Stan Woosley, UCSC • Currently using 1.6 TB
NGF Performance • Many users have reported good performance for their applications (little difference from /scratch) • Some applications show variability in read performance (MADCAP/MADbench) – we are actively investigating this.
Current Architecture Limitations • NGF performance is limited by the architecture of current NERSC systems • Most NGF I/O uses the GPFS TCP/IP storage access protocol • Only Da Vinci can access NGF storage directly via FC • Most NERSC systems have limited IP bandwidth outside of the cluster interconnect • 1 GigE link per I/O node on Jacquard; each compute node uses only one I/O node for NGF traffic, and 20 I/O nodes feed into one 10 Gb Ethernet link • Seaborg has 2 gateways with 4x GigE bonds; again, each compute node uses only one gateway • Bassi nodes each have 1 GigE interfaces, all feeding into a single 10 Gb Ethernet link
Performance Improvements • NGF client system performance upgrades • Increase client bandwidth to NGF via hardware and routing improvements • NGF storage fabric upgrades • Increase the bandwidth and port count of the NGF storage fabric to support future systems • Replace old NGF servers • New servers will be more reliable • 10 Gb Ethernet capable • New systems will be designed to support high performance access to NGF
NGF /home • We will deploy a shared /home file system in 2007 • Initially the home file system for only one system; it may be mounted on others • New systems thereafter will all have home directories on NGF /home • It will be a new file system with tuning parameters configured for small file accesses
/home layout – decision slide Two options • A user’s login directory is the same for all systems • /home/matt/ • A user’s login directory is a different subdirectory of the user’s directory for each system • /home/matt/seaborg • /home/matt/jacquard • /home/matt/common • /home/matt/seaborg/common -> ../common
One directory for all • Users see exactly the same thing in their home directory every time they log in, no matter what machine they’re on • Problems • Programs sometimes change the format of their configuration files (dotfiles) from one release to another without changing the file’s name • Setting $HOME affects all applications, not just the one that needs different config files • Programs have been known to use getpwnam() to determine the user’s home directory and look there for config files rather than in $HOME • Setting $HOME essentially emulates the effect of having separate home directories for each system
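A minimal C sketch of the distinction above (not NERSC-specific code, just an illustration): overriding the $HOME environment variable changes only what getenv()-based programs see, while a program that consults the passwd database via getpwnam() or getpwuid() still finds the directory recorded there.

```c
/* Sketch of the two ways a program may locate a user's home directory.
 * Overriding $HOME changes only the first; a getpwnam()/getpwuid() lookup
 * still returns the directory recorded in the passwd database. */
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* What most programs use: the $HOME environment variable */
    const char *env_home = getenv("HOME");

    /* What some programs use instead: the passwd entry for this user */
    const char *user = getlogin();
    struct passwd *pw = user ? getpwnam(user) : getpwuid(getuid());
    const char *pw_home = pw ? pw->pw_dir : NULL;

    printf("$HOME           : %s\n", env_home ? env_home : "(unset)");
    printf("passwd home dir : %s\n", pw_home ? pw_home : "(unknown)");
    return 0;
}
```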
One directory per system • By default, users start off in a different directory on each system • Dotfiles are different on each system unless the user uses symbolic links to make them the same • All of a user’s files are accessible from all systems, but a user may need to “cd ../seaborg” to get at files created on Seaborg when logged into a different system
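For illustration, a hypothetical sketch of the symbolic-link approach mentioned above, using the /home/matt layout from the earlier slide: a per-system home directory links its .bashrc to one shared copy kept under common, so the dotfile stays identical across systems.

```c
/* Hypothetical sketch: run from a per-system home directory such as
 * /home/matt/seaborg, link the local .bashrc to the shared copy in
 * ../common so every system sees the same dotfile. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    if (symlink("../common/.bashrc", ".bashrc") != 0 && errno != EEXIST)
        fprintf(stderr, "symlink failed: %s\n", strerror(errno));
    return 0;
}
```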
NGF /home conclusion • We currently believe that the multiple-directories option will result in fewer problems for users, but we are actively evaluating both options • We would welcome user input on the matter
NGF /scratch • We plan to deploy a shared /scratch for NERSC-5 sometime in 2008