Atlas Status Update
Chris Fuson
Atlas Update - Timeline
• March 06, 2014
  • Installed patches that targeted memory contention on the metadata server to address server side performance problems
• February 26, 2014
  • Installed patch to reduce impact of close operations to address server side metadata performance problems
• February 10, 2014
  • Titan’s Lustre client rolled back to 1.8.6 to address client side performance problems
• January 28, 2014
  • Titan’s Lustre client upgraded to 2.4. Un-mounted Widow.
• January 10, 2014
  • As the user load from this transition increased, we began to see problems with both the Lustre server and client (compute node) performance
• January 07, 2014
  • Widow[1-3] became read-only
• December 05, 2013
  • Atlas was mounted on all OLCF systems, announced, and opened for use
Atlas Update - Current
• Following the March 06, 2014 change to reduce memory contention on the metadata server, we continue to see qualified improvements in the interaction with Atlas.
• Improvements have been substantial for several applications that were negatively affected before.
• We encourage users to continue testing their application performance in light of these changes and to report their results.
• We will continue to pursue the remaining issues, and will intentionally address them outside of the production environment so as to minimize further interruption to the Atlas file systems.
• Your feedback is incredibly valuable. Please continue to report problems related to the file system, including any specific timings for I/O operations, to help@olcf.ornl.gov.
Atlas Update – Stripe Count Warning
• Warning: Stripe Counts Greater than 160 Not Currently Supported
• Warning: “-1” should NOT be used while setting up striping patterns
• The 1.8 Lustre clients running on Titan do not support stripe counts greater than 160. Interaction from Titan (including ‘lfs getstripe’) with files that have a stripe count greater than 160 is problematic.
• If ‘lfs setstripe’ was used to set the stripe of a directory or file and the stripe count was set to a value greater than 160 or ‘-1’, you should reduce the stripe value.
    titan-ext3 1004> lfs setstripe -c -1 test.file
    titan-ext3 1005> lfs getstripe test.file | grep stripe_count
    lmm_stripe_count: 1008
    *** glibc detected *** lfs: munmap_chunk(): invalid pointer: 0x000000000067fed0 ***
• Please note the stripe count is only an issue on Titan; the count is not an issue on Eos, Rhea, or the Data Transfer Nodes due to the more recent Lustre client version in use on those systems.
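The check described on this slide can be sketched as a small shell helper. This is only an illustration, not an OLCF-provided tool: the function name `check_stripe_count` and the variable `MAX_STRIPE` are hypothetical, and the helper assumes the per-file output format shown in the session above (a line beginning with `lmm_stripe_count:`). It reads `lfs getstripe` output on standard input, so it does not call `lfs` itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre or OLCF tooling).
# Reads `lfs getstripe` output on stdin and reports whether the stripe
# count exceeds the limit quoted on the slide for Titan's 1.8 clients.
MAX_STRIPE=160

check_stripe_count() {
    # Take the number that follows the first "lmm_stripe_count:" label.
    count=$(awk '/lmm_stripe_count:/ { print $2; exit }')
    if [ -z "$count" ]; then
        echo "no stripe count found"
        return 2
    fi
    if [ "$count" -gt "$MAX_STRIPE" ]; then
        echo "stripe count $count exceeds $MAX_STRIPE"
        return 1
    fi
    echo "stripe count $count is within limit"
    return 0
}
```

A possible use is `lfs getstripe test.file | check_stripe_count`. Because the slide notes that `lfs getstripe` from Titan is itself problematic on over-striped files, it is probably safer to run this from Eos, Rhea, or a Data Transfer Node.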
Atlas Update – Reduce Stripe Count
• Create new directory with reduced striping
• Copy data into new directory
  • cp for small data amounts
  • dcp from the Data Transfer Nodes for larger amounts of data
    dtn04 115> mkdir NewDir
    dtn04 116> lfs setstripe -c 128 NewDir
    dtn04 117> cp test.file NewDir/.
    dtn04 118> lfs getstripe NewDir/test.file | grep stripe_count
    lmm_stripe_count: 128
    dtn04 119>
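The steps on this slide could be wrapped in a function along the following lines. This is a minimal sketch under stated assumptions, not an official procedure: the function name `restripe_copy` and the overridable `LFS` variable are hypothetical, the default of 128 simply mirrors the example session, and only the `cp` path for small amounts of data is covered (the slide suggests `dcp` from the Data Transfer Nodes for larger amounts).

```shell
#!/bin/sh
# Hypothetical wrapper around the slide's steps (small files only; uses cp).
# LFS may be overridden, e.g. with a stub, where no Lustre client exists.
LFS=${LFS:-lfs}

restripe_copy() {
    src=$1
    newdir=$2
    count=${3:-128}   # assumed default, taken from the example session

    # Reject anything that is not a plain positive integer; this also
    # rejects "-1", which the slides say should not be used.
    case $count in
        ''|*[!0-9]*) echo "stripe count must be a number" >&2; return 1 ;;
    esac
    # Sketch-level choice: 0 (file system default) is rejected as well.
    if [ "$count" -lt 1 ] || [ "$count" -gt 160 ]; then
        echo "stripe count must be between 1 and 160" >&2
        return 1
    fi

    mkdir -p "$newdir" || return 1
    # New files created in the directory inherit its stripe count.
    "$LFS" setstripe -c "$count" "$newdir" || return 1
    cp "$src" "$newdir"/ || return 1
    # Show the stripe count of the new copy for verification.
    "$LFS" getstripe "$newdir/$(basename "$src")" | grep stripe_count
}
```

For example, `restripe_copy test.file NewDir 128` would follow the same sequence as the session above. The original file is left in place, so removing it after verifying the copy remains a manual step.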
Questions?
• More information:
  • www.olcf.ornl.gov/kb_articles/atlas-update/
  • www.olcf.ornl.gov/kb_articles/lustre-basics/
• Email:
  • help@olcf.ornl.gov