
Effective Use of NERSC File Systems


Presentation Transcript


  1. Effective Use of NERSC File Systems Thomas M. DeBoni NERSC/USG

  2. Effective Use of NERSC File Systems Contents Home Directories Scratch Space Mass Storage Networked File Systems Resource Conservation Examples Details See also http://home.nersc.gov/training/tutorials/file.management.html

  3. Home Directories • Your private portion of the file systems on a computer • Default “current working directory” when you log in • Known as the system environment variable $HOME • Usage limited in bytes • Max size is 2 GB on T3E, 5 GB on J-90s • Warnings issued at 75-90% usage • There’s not enough space for everybody to use this much at the same time, so “migration” sometimes happens • Usage limited in “inodes” • An “inode” is a file or directory • Max number is 6000 on T3E, unsettled on J-90s • Warnings issued at 75-90% usage

  4. Home Directories, cont. • $HOME • is routinely backed up • is shared among all J-90’s • is NOT the fastest file system available • should be used for development, debugging, pre- and post-processing, and other administrative tasks • should NOT be routinely used by large jobs requiring high performance • Startup files (.cshrc, .login, etc.) may change your working directory on login • Remove all references to $WRK from these files

  5. Home Directories, cont. • Files can be “migrated” to backing store • Largest and oldest files first • De-migrate with “dmget” command before using, or • Automatically de-migrated when referenced, but with unknown delay • Example listing:

    killeen 257: ls -al
    total 64
    drwx------   2 u10101  zzz    4096 Sep 21 11:11 .
    drwxr-xr-x   5 u10101  zzz    4096 Sep 21 11:11 ..
    mrw-------   1 u10101  zzz    2414 Sep 21 11:11 decomp.job.log
    mrw-------   1 u10101  zzz    2712 Sep 21 11:11 decomp.job.out
    -rw-------   1 u10101  zzz    2381 Sep 21 11:11 decomp.job2.log
    -rw-------   1 u10101  zzz   11490 Sep 21 11:11 decomp.job2.out

  An “m” in the first column means the file has been migrated.
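The pre-fetch idea above can be sketched as a small script: recall migrated files explicitly before a job reads them, rather than stalling on the implicit recall. This is a sketch, not NERSC-supplied code; the file names are examples, and the guard lets it run harmlessly on systems that have no dmget command.

```shell
#!/bin/sh
# Recall ("de-migrate") data files before a job reads them, so the first
# read does not stall on an implicit recall from backing store.
# dmget is the Cray recall command described in the slide; the guard
# lets this sketch run on systems without it.  File names are examples.
count=0
for f in decomp.job.log decomp.job.out; do
  if command -v dmget >/dev/null 2>&1; then
    dmget "$f"            # blocks until the file is back on disk
  else
    echo "dmget not available; would recall $f"
  fi
  count=$((count + 1))
done
echo "prefetched $count files"
```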

  6. Home Directories, cont. • A word about “quotas” • Use the “quota” command to view them:

    mcurie 154: quota
    File system: /u5
    User: deboni, Id: 9950
                 Aggregate blocks (512 bytes)   Inodes
    User Quota:  3906240* ( 19.2%)              3500* ( 17.2%)
    Warning:     3515616* ( 21.4%)              2975* ( 20.2%)
    Usage:        751416                         602

  “User Quota” is the maximum usage allowed, “Warning” is the level at which a warning will be issued, and “Usage” is your current usage; the percentages show current usage as a fraction of the quota and warning levels.

  7. Scratch Space • Also known as temporary storage or working storage • A pool of fast RAID drives • The fastest file system available • Unique to each batch system • Not backed up • Usage limits are larger than for $HOME • 75 GB and 5000-6000 inodes on T3E, unsettled on J-90s • Should be used for large files and high performance jobs • This is transient space, and persistence will vary with usage and demand

  8. Scratch Space, Cont. • System environment variable “$TMPDIR” • Created for each session or batch job • Randomly named, so always use $TMPDIR to refer to it • Deleted at the end of session or job • Use “$TMPDIR” if you want the OS to manage your scratch space usage for you. • E.g., on the J-90s, you can’t log on to batch machines, and each has its own scratch space, so you can’t get at it directly, as you can on the T3E.
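A batch job that lets the OS manage its scratch space might look like the sketch below. It is written in portable sh rather than the csh used elsewhere in these slides, and since the system-created $TMPDIR exists only inside a real session or batch job, ${TMPDIR:-/tmp} stands in for it; the input and output file names are invented.

```shell
#!/bin/sh
# Do the job's heavy I/O inside the system-managed scratch area and copy
# results back before the area is deleted at end of job.
# ${TMPDIR:-/tmp} is a portable stand-in for the batch system's $TMPDIR.
submit_dir=$(pwd)                  # directory the job was submitted from
work="${TMPDIR:-/tmp}/job.$$"      # private working area in scratch
mkdir -p "$work"
cd "$work"
echo "sample input" > foo.input    # stands in for staging a real input file
tr '[:lower:]' '[:upper:]' < foo.input > foo.output   # stands in for the real program
cp foo.output "$submit_dir"/       # save results; scratch is deleted later
cd "$submit_dir"
rm -rf "$work"
```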

  9. Scratch Space, cont. • /tmp or /usr/tmp • Create directories there for yourself • Watch out for name collisions with other users’ directories • Delete files and directories as you finish with them • This space will be scavenged depending on demand: largest and oldest files are usually deleted first • “It should be safe for 7 to 14 days…” • You must manage this scratch space for yourself
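The self-management rules above (unique names, prompt deletion) can be sketched like this; the directory-naming scheme is one reasonable choice, not a NERSC convention:

```shell
#!/bin/sh
# Self-managed scratch under /tmp: a per-user, per-process name avoids
# collisions with other users' directories, and the trap removes the
# directory on exit so the space is freed as soon as we are done.
scratch="/tmp/${USER:-$(id -un)}.scratch.$$"
mkdir -p "$scratch"
trap 'rm -rf "$scratch"' EXIT
date > "$scratch/run.log"          # stands in for real intermediate files
echo "working in $scratch"
```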

  10. Scratch Space, cont. • Pre-staging files to scratch space is a good idea, but... • You don’t know when your batch job will run, so it may not work when batch queues are heavily loaded • Staging files in a batch script is a good idea, but... • It idles your processor ensemble and uses up serial time • So, do it both ways:

    #!/bin/csh -f
    # Change to scratch directory
    cd /tmp/mydir
    # Check for presence of pre-staged files
    if (-e foo.input) then
      echo "input file prestaged"
    else
      echo "fetching input file"
      hsi "get foo.input"
    endif

  11. Scratch Space, cont. • What about intermediate I/O? Example to follow… • What about I/O from parallel programs? • This is a deep topic • Beware of interleaving of multiple outputs to a single file • J-90 codes typically use a small number of files at a time • T3E codes may use hundreds of files at a time • Special mechanisms exist to manage parallel files and I/O • General rule: the more you do, the faster it should be • See these sources for further info: • http://home.nersc.gov/software/prgenv/opt/binary.html • http://home.nersc.gov/training/tutorials/T3E/IO/ • http://www.cray.com/swpubs/ (see especially “CRAY T3E Fortran Optimization Guide, SG-2518 3.0”)

  12. Mass Storage • NERSC provides the High Performance Storage System (HPSS) • A modern, flexible, hierarchical system • Optimized for large files and fast transfers • Built on multiple disk farms and tape libraries • Used for system backups, file migration, and user archives • Has multiple user interface utilities • HSI - powerful, flexible, convenient utility, from SDSC and NERSC • pftp - parallel ftp, locally customized, fastest for large files • ftp - traditional version, available everywhere • The proper place to save large, important, long-lived files (e.g. raw output and restart data) • Requires a separate storage account (“DCE account”), but can automatically authenticate after initial setup

  13. Networked File Systems • “Networked” or “distributed” file systems are intended to decouple a file’s physical location from its logical location • Can be very convenient, but also dangerous • There are three of interest • NFS - Developed at Sun; has become a standard in workstation environments • Used as little as possible at NERSC, due to security and performance concerns • AFS - More modern, global in scope, with pretty good security • Used at NERSC via the “gateway system” dano.nersc.gov • Use AFS with care - it can ruin performance • DFS - A “coming standard” that NERSC is evolving toward; also has good security and will be global in scope

  14. Resource Conservation • Critical resources are expensive and rare • They are shared among (competed for by) all users • Four critical resources related to file systems use: • Storage space - the actual files and bytes of data • File system entries - “inodes”; one per file or directory • Bandwidth - bits per second, in transfers between devices • Time - servers, I/O devices, and CPU cycles • NERSC meters (charges for) all these types • Resource conservation must be engineered in (don’t depend on luck)

  15. Bandwidth Conservation • Design parallel I/O carefully • Human readable I/O probably should be done by a single (master) process(or) to/from a single file • Binary I/O may be done by one process(or) or by many, as required • Binary data may occupy many files which match problem decomposition for parallel execution • Limits exist on the number of files that can be open at any one time • Flushing larger buffers is usually a better idea than flushing smaller ones more often. • For further info, see • Cray publication “Application Programmer's I/O Guide” • NERSC web doc http://home.nersc.gov/training/tutorials/T3E/IO/ • Man page for the “assign” command

  16. Bandwidth Conservation, cont. • Transfer files carefully • Session setup is not free - move files to/from mass storage in as few sessions or commands as possible; don’t run pftp in a loop • Use multiple-file transfer commands, such as “mget”, when possible • Use the appropriate utility for the job • Meta-data operations do not involve actual file access • Renaming files or directories or moving files around within HPSS • Changing file or directory permissions in HPSS • Use hsi for these sorts of operations, for efficiency • For further info see • NERSC web doc http://home.nersc.gov/hardware/storage/hpss.html • NERSC web doc http://home.nersc.gov/hardware/storage/hsi.html
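A metadata-only pass of the kind described above might look like the sketch below, using the hsi mv and chmod commands listed later in these slides. The file names are invented, and the guard lets the sketch run on hosts where hsi is unavailable.

```shell
#!/bin/sh
# Do pure metadata work (rename, permission change) with hsi instead of
# re-transferring the files: no file data moves at all.
# File names are invented; the guard covers hosts without hsi.
if command -v hsi >/dev/null 2>&1; then
  hsi "mv bigjob.output bigjob.output.old; chmod 640 bigjob.output.old"
  status=done
else
  echo "hsi not available; would rename and chmod within HPSS"
  status=skipped
fi
echo "metadata pass: $status"
```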

  17. Bandwidth and Time Conservation • Use the fastest utility available • Use pftp when moving within the NERSC domain • Use multiple-file transfer commands, such as “mget”, when possible • Use ftp when moving files into or out of the NERSC domain • Use the fastest devices and networks available • This is a deep area, and oversimplified here... • Don’t make a fast machine wait on a slow one • Pre-stage and de-migrate files needed by batch jobs into fast storage space • Sometimes a multiple-step process is better • First, move files from outside NERSC onto a NERSC computer (a workstation) • Then, move files from the NERSC computer to the destination device • Avoid networked file systems • For further info see • NERSC web doc http://home.nersc.gov/hardware/storage/hpss.html • Man pages for ftp, pftp, and hsi

  18. Bandwidth and Storage Conservation • Shrink files, if appropriate • If the file contains redundant or unimportant data • Such as white space in formatted output • Use Unix commands “compress” and “gzip” • Combine files into archives • If the files are small, transferring them individually may involve more setup time than transfer time • Use Unix commands “tar”, “ar”, and “cpio” • For more info, see • Man pages for all the above commands
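The bundle-then-compress idea can be sketched as a runnable script; gzip stands in for the older "compress" (absent on many modern systems), and the file names are invented:

```shell
#!/bin/sh
# Bundle many small files into one archive, then compress it: one
# transfer session instead of many, and fewer bytes on the wire.
# gzip stands in for "compress"; file names are invented.
mkdir -p bundle_demo
cd bundle_demo
for i in 1 2 3; do echo "log entry $i" > "log.$i"; done
tar cf batch.tar log.1 log.2 log.3
gzip -f batch.tar                  # produces batch.tar.gz
rm -f log.1 log.2 log.3            # the originals now live in the archive
ls -l batch.tar.gz
```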

  19. Example 1 - batch pftp multiple file access with a here-doc

    #!/bin/csh
    ...
    # First, copy the source from the submitting directory and compile it.
    pftp -i -v archive <<+
    cd my_HPSS_directory
    mget data* myprog
    quit
    +
    ja
    ./myprog <data >outfile
    ja -cst
    # Save the output file in HPSS.
    pftp -i -v archive <<+
    cd my_HPSS_directory
    mput outfile* restart*
    quit
    +
    exit

  The “<<+ ... +” construct is the here-document.

  20. Example 2 - batch hsi multiple file access

    #!/bin/csh
    ...
    # First, copy the source from the submitting directory and compile it.
    hsi archive "cd my_HPSS_directory; mget data* myprog"
    ja
    ./myprog <data >outfile
    ja -cst
    # Save the output file in HPSS.
    hsi archive "cd my_HPSS_directory; mput outfile* restart*"
    exit

  21. Example 3 - Minimizing parallel cpu idling during pftp I/O

    #!/bin/csh -f
    # preliminary job steps, including fetching executables and input files
    . . .
    # parallel code execution with mpprun
    set i = 1
    mpprun -n 128 a.out < bigjob.input > bigjob.output
    # Intermediate file movement, to save output file to mass storage:
    mv bigjob.output bigjob.output$i
    mv bigjob.restart bigjob.restart$i
    # Generate a separate serial job to do the actual I/O
    echo "pftp -i -v archive <<EOF\
    mkdir bigjobs/job.06.15.99\
    cd bigjobs/job.06.15.99\
    mput bigjob.output$i bigjob.restart$i\
    ls\
    quit\
    EOF" | qsub -q serial
    . . .
    # further parallel code execution, perhaps through shell script looping
    @ i = $i + 1
    . . .

  22. Example 4 - Minimizing parallel cpu idling during HSI I/O

    #!/bin/csh -f
    # preliminary job steps, including fetching executables and input files
    . . .
    # parallel code execution with mpprun
    set i = 1
    mpprun -n 128 a.out < bigjob.input > bigjob.output
    # Intermediate file movement, to save output file to mass storage:
    mv bigjob.output bigjob.output$i
    mv bigjob.restart bigjob.restart$i
    # Generate a separate serial job to do the actual I/O
    echo "hsi archive 'mkdir bigjobs/job.06.15.99; cd bigjobs/job.06.15.99; mput bigjob.output$i bigjob.restart$i; ls'" | qsub -q serial
    . . .
    # further parallel code execution, perhaps through shell script looping
    @ i = $i + 1
    . . .

  23. Example 5 - Minimizing usage with tar and compress

    mcurie 181: ls -al STD* bigjob*
    -rw-r--r--   1 deboni  mpccc      0 Dec 23 12:26 STDIN.e48938
    -rw-r--r--   1 deboni  mpccc      0 Dec 23 12:27 STDIN.e48939
    -rw-r--r--   1 deboni  mpccc   1126 Dec 23 12:26 STDIN.l48938
    -rw-r--r--   1 deboni  mpccc   1126 Dec 23 12:27 STDIN.l48939
    -rw-r--r--   1 deboni  mpccc   7181 Dec 23 12:26 STDIN.o48938
    -rw-r--r--   1 deboni  mpccc   7181 Dec 23 12:27 STDIN.o48939
    -rw-------   1 deboni  mpccc    486 Feb  4 11:24 bigjob.output
    -rw-------   1 deboni  mpccc    486 Feb  4 12:10 bigjob.output1
    -rw-------   1 deboni  mpccc    486 Feb  4 12:10 bigjob.output2
    -rw-------   1 deboni  mpccc    972 Feb  4 11:24 bigjob.restart
    -rw-------   1 deboni  mpccc    972 Feb  4 12:10 bigjob.restart1
    -rw-------   1 deboni  mpccc    972 Feb  4 12:10 bigjob.restart2
    #-------------------- total space = 20988 bytes and 12 inodes
    mcurie 182: tar cf batch.tar STD* bigjob*
    mcurie 183: ls -al batch.tar
    -rw-------   1 deboni  mpccc  65536 Feb  5 09:15 batch.tar
    mcurie 184: compress batch.tar
    mcurie 185: ls -al batch.tar*
    -rw-------   1 deboni  mpccc   9313 Feb  5 09:15 batch.tar.Z
    mcurie 186: rm STD* bigjob*
    #-------------------- total space = 9313 bytes and 1 inode

  24. Example 6 - Minimizing usage with cpio

    cd $HOME
    /bin/find . -type f -size -15000c -atime +90 ! \
        \( -type m -o -type M \) \
        -print > hitlist
    vi hitlist
    cat hitlist | cpio -co > myfiles.cpio
    cat hitlist | xargs rm -f

  Here's what the above commands (NOT a shell script!) do: 1) first, cd to the home directory and generate a list of eligible files; 2) the find command finds regular files smaller than 15000 characters that have not been accessed in 90 days and are not migrated; 3) use "vi" to examine the list and delete items from it that you do not want removed; 4) the fourth line creates the cpio file archive; 5) the fifth line removes all the files now stored in the archive.
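The slide shows only the archiving half; restoring works with cpio's copy-in mode. The round trip below is a sketch with invented names (GNU cpio assumed; the guard covers hosts without cpio):

```shell
#!/bin/sh
# Round trip for the cpio pattern above: feed a file list to "cpio -oc"
# to build the archive, then restore with "cpio -icd" (-d recreates
# directories as needed).  Names are invented; guard covers missing cpio.
mkdir -p src restore
echo "payload" > src/little.file
if command -v cpio >/dev/null 2>&1; then
  (cd src && echo little.file | cpio -oc > ../myfiles.cpio) 2>/dev/null
  (cd restore && cpio -icd < ../myfiles.cpio) 2>/dev/null
  restored=$(cat restore/little.file)
else
  restored="payload"               # pretend, so the sketch completes
  echo "cpio not available"
fi
echo "restored: $restored"
```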

  25. Details - some useful ftp and pftp commands

    FTP Command        Meaning or action                        PFTP Variant
    get <rf> [<lf>]    retrieve a file                          pget
    put <lf> [<rf>]    store a file                             pput
    mget <f> [<f>…]    retrieve multiple files                  mpget
    mput <f> [<f>…]    store multiple files                     mpput
    del <f>            delete a file
    mdel <f> [<f>…]    delete multiple files
    mkdir <d>          create a remote directory
    rmdir <d>          delete a remote directory
    cd <d>             change to remote directory
    lcd <d>            change to local directory
    ls, dir            list files in remote directory
    ldir               list files in local directory
    !<cmd>             perform <cmd> locally, outside ftp/pftp

    <f> = file name, <lf> = local file name, <rf> = remote file name

  Caveats: Be aware of where your actions will take place • Watch out for name collisions

  26. Details - HSI commands

    HPSS File and Directory Commands
    get, mget, recv - Copy file(s) from HPSS to a local directory
    cget - Copy file from HPSS to a local directory if not already there
    put, mput, replace, save, store, send - Copy local file(s) to HPSS
    cput - Copy local file to HPSS if it doesn’t already exist there
    cp, copy - Copy file within HPSS
    mv, move, rename - Rename/relocate an HPSS file
    delete, mdelete, erase, rm - Remove a file from HPSS
    ls, list - List directory

  27. Details - HSI commands, cont.

    HPSS File and Directory Commands, cont.
    find - Traverse a directory tree looking for a file
    mkdir, md, add - Create an HPSS directory
    rmdir, rd, remove - Delete an HPSS directory
    pwd - Print current directory
    cd, cdls - Change current directory

    Local File and Directory Commands
    lcd, lcdls - Change local directory
    lls - List local directory
    lpwd - Print current local directory
    ! - Issue shell command

  28. Details - HSI commands, cont.

    File Administrative Information
    chmod - Change permissions of file or directory
    umask - Set file creation permission mask

    Miscellaneous HSI Commands
    help - Display help file
    quit, exit, end - Terminate HSI
    in - Read commands from a local file
    out - Write HSI output to a local file
    log - Write commands and responses to a log file
    prompt - Toggle prompting for mget, mput, mdelete

  29. Details - HSI commands, cont. HSI can accept input several different ways: • From a command session, consisting of multiple lines and ending with an explicit termination command • From a single-line command, with semicolons (“;”) separating commands:

    hsi "mkdir foo; cd foo; put data_file"

  • From a command file:

    hsi "in command_file"

  • HSI can also read from standard input and write to standard output:

    tar cvf - . | hsi put - : datadir.tar
    hsi get - : datadir.tar | tar xvf -

  Wildcards are supported, but quoting must be used in one-line commands to prevent shell interpretation:

    hsi "cd foo; mget data*"

  30. Details - HSI commands, cont. WARNING: For 'get' and 'put' operations, HSI uses a different syntax than ftp; a colon (“:”) separates local and remote file names:

    put local_file : hpss_file
    get local_file : hpss_file

  Recursive operations are allowed for the following commands: cget, chgrp, chmod, chown, cput, delete, get, ls, mdelete, mget, mput, put, rm. Special commands exist for setting up variables whose values are directories, commands, and command-sets. The complete HSI manual is online at http://home.nersc.gov/hardware/storage/hsi.html

  31. Details - Tasks that HSI Simplifies • Accessing segmented CFS files: • CFS handled files larger than 400 MB by splitting them into smaller subfiles and storing the subfiles. HSI is the only utility that can read and rejoin segmented CFS files to reproduce their original state. The procedure for handling such files is quite simple: just read the first of the segmented subfiles from the archive storage system. • Renaming/moving or copying an entire subdirectory: • <mv/cp> path1 path2 renames/copies path1 to path2 • Changing the permissions of several files at once: • chmod perms files changes the permissions of all files to perms; the file specifications may include wildcards; the permissions may be given as octal numbers or via symbolic designators.

  32. Details - Getting Access to AFS Directories Don’t do this in batch jobs!

    killeen 210: telnet dano.nersc.gov
    Trying 128.55.200.40...
    Connected to dano.nersc.gov.
    Escape character is '^]'.
    Hello killeen.nersc.gov.
    =========================================================================
    > WARNING: Unauthorized access to this computer system is               <
    > prohibited, and is subject to criminal and civil penalties.           <
    > Use constitutes consent to security testing and monitoring.           <
    =========================================================================
    UNIX(r) System V Release 4.0 (dano)
    login: u10101
    Password:
    AFS (R) 3.4 Login
    Last login: Mon Sep 21 10:00:14 from killeen.nersc.gov

  33. Details - Getting Access to AFS Directories, cont.

    *********************************************************
    *            AFS gateway user interface                 *
    *                                                       *
    *     used to enable AFS access on J90's and T3E        *
    *   (select enable attached hosts (1) before exiting)   *
    *********************************************************
    *                                                       *
    *   1) enable attached hosts (knfs)                     *
    *   2) disable attached hosts (unlog)                   *
    *   3) list tokens (tokens)                             *
    *   4) authenticate to another cell (klog)              *
    *                                                       *
    *   5) help                                             *
    *   0) exit (logoff)                                    *
    *********************************************************
    enter command(0-5): 3
    Tokens held by the Cache Manager:
    User's (AFS ID 9950) tokens for afs@nersc.gov [Expires Oct 17 15:55]
    --End of list--

  34. Details - Getting Access to AFS Directories, cont.

    enter command(0-5): 0
    Connection closed by foreign host.
    killeen 213: pwd
    /U3/u10101
    killeen 214: cd /afs/nersc.gov
    killeen 215: pwd
    /afs/nersc.gov

  Option 4 is used to attach other cells, regardless of location, but you must have a login and password to use in the “klog” process.

  35. Details - Dealing With Your DCE Account • DCE is a modern authentication methodology that will likely evolve into general use at NERSC • Right now, it merely controls access to HPSS • DCE accounts and login/password info must be obtained from NERSC Support staff • An initial login is necessary to change the initial password and to set up future automatic authentication • It has occasionally been necessary for a few users to re-initialize their accounts • Both procedures are easy • DCE is currently most reliable on killeen.nersc.gov

  36. Details - Dealing With Your DCE Account, cont. Initial Setup: Once you have your initial DCE login and password, change it with the following procedure on any NERSC mainframe:

    % dce_login
    Enter Principal Name: <HPSS_user_name>
    Enter Password: <current_or_temporary_HPSS/DCE_password>
    % chpass -p
    Changing registry password for HPSS_user_name
    New password: <new_HPSS/DCE_passwd>
    Re-enter new password: <new_HPSS/DCE_passwd>
    Enter your previous password for verification: <current_or_temporary_HPSS/DCE_password>
    % kdestroy
    % exit
    %

  You will need to log in to HPSS only on your next use, and thereafter you will be automatically authenticated.

  37. Details - Dealing With Your DCE Account, cont. If you should get the following message from HPSS...

    mcurie 224: hsi hpss
    credential user mismatch
    use -l option to generate a new cred file
    DCE Principal:

  ...it means automatic authentication has failed. You must authenticate manually until you re-initialize authentication:

    mcurie 232: hsi -l hpss
    DCE Principal: u10101
    Password:
    *-----------------------------------------------------------*
    *            NERSC HPSS USER SYSTEM(hpss)                   *
    *-----------------------------------------------------------*
    V1.5 Username: u10101 UID: 0123
    ? quit

  Subsequent usage should not require a full login. In rare and unusual situations, do “rm .hsipw” and then repeat the above.
