MINERVA USER GROUP MEETING, 3 July 2012
MUG - Mid April - June Period
Minerva Operational Statistics
Minerva Usage By User
Remaining Users CPU Hours
Minerva Usage By Group
Minerva Utilization, Mid-April - June
Minerva Utilization, May - June
Minerva Scratch Usage: /scratch, /projects
Other Plans/Projects
• Archival Storage
  • Ordered: Tape Library with 4 Tape transports
  • 350 TB tape capacity
  • Anticipated 1 Sep 2012 start of service
• GPGPU
  • Chassis with 2 Fermi-based Tesla cards ordered
  • Target availability date is 1 Aug 2012
• Checkpoint/Restart (BLCR)
  • Partially installed; needs reboot of systems and testing
• Monthly Training Meetings
  • Third Tuesday of the month
  • Alternate between basic and advanced
Hiccups
Scheduler Failure:
• Problem: In June the job count tripled over the previous count, and the scheduler database table overflowed.
• Resolution: We put limits on the number of jobs per user in Torque and Moab.
• Long Term: Upgrade to a newer version of Torque and Moab. Move to a SQL database.
InfiniBand / MPI Issues:
• Problem: The Mellanox driver buffer was overflowing because of the 64-core systems.
• Resolution: We built a custom version of the Mellanox driver.
• Long Term: Working with Mellanox to add the changes to the mainline code.
AMD 64-core understanding + performance:
• Problem: The number of FPUs in a system was misunderstood: there are 32, not 64. Also, the ACML Library is not tuned for the FFTW Library.
• Resolution: Changed scheduling to allow blocks of 32 and job-exclusive nodes.
• Long Term: AMD is creating a new ACML library with tuned FFT sizes.
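To make the "blocks of 32" arithmetic concrete, here is a minimal sketch. It assumes that a block means a core request rounded up to a multiple of 32, and that a floating-point-heavy job wants one process per FPU. The function names are hypothetical and this is not the actual Torque or Moab configuration used on Minerva; only the figures 64 and 32 come from the slide.

```python
import math

CORES_PER_NODE = 64  # from the slide: 64-core AMD systems
FPUS_PER_NODE = 32   # from the slide: 32 FPUs per system, not 64
BLOCK = 32           # scheduling granularity named on the slide


def round_to_blocks(requested_cores: int, block: int = BLOCK) -> int:
    """Round a core request up to a whole number of blocks (assumed policy)."""
    if requested_cores <= 0:
        raise ValueError("requested_cores must be positive")
    return math.ceil(requested_cores / block) * block


def fpu_exclusive_nodes(processes: int) -> int:
    """Nodes needed if every process should get an FPU to itself."""
    if processes <= 0:
        raise ValueError("processes must be positive")
    return math.ceil(processes / FPUS_PER_NODE)


if __name__ == "__main__":
    for cores in (8, 32, 40, 100):
        print(cores, "cores ->", round_to_blocks(cores), "allocated")
    print("100 processes ->", fpu_exclusive_nodes(100), "nodes")
```

Under these assumptions a 40-core request is rounded up to 64 cores, and 100 FPU-bound processes spread over 4 nodes rather than the 2 that a count of 64 cores per node would suggest.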
Open Forum: Requested/Suggested Topics
• Bioconductor R site-library
  • Should we put all Bioconductor R packages in one library? (module load bioconductor)
• Epilogue report
  • Report job resource usage to syserr?
• PM Schedule
  • Can we reduce PMs to monthly?
• Fairshare
  • Comments? Feedback?
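As a starting point for the epilogue report discussion, the sketch below shows one way such a report could be produced. It assumes the Torque epilogue argument order (job id first, user name second, the comma-separated list of resources used seventh), which should be checked against the installed Torque version. The function names are hypothetical, and whether the error stream of an epilogue ends up in the job's error file depends on the site's setup.

```python
import sys


def parse_resources(used: str) -> dict:
    """Parse a string like 'cput=00:00:10,mem=1234kb' into a dict."""
    result = {}
    for item in used.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that have no '='
            result[key.strip()] = value.strip()
    return result


def format_report(job_id: str, user: str, used: str) -> str:
    """Build a short, human-readable usage report."""
    lines = [f"Job {job_id} ({user}) resource usage:"]
    for key, value in sorted(parse_resources(used).items()):
        lines.append(f"  {key:<10} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed argument order: argv[1] = job id, argv[2] = user name,
    # argv[7] = resources used by the job.
    if len(sys.argv) >= 8:
        report = format_report(sys.argv[1], sys.argv[2], sys.argv[7])
        print(report, file=sys.stderr)
```

Keeping the parsing and formatting in separate functions makes it easy to change the report layout, or to send it somewhere other than the error stream, without touching the argument handling.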