RDAV Update • Phil Andrews • Science Advisory Board Meeting, 20-21 January 2011
Executive summary
• Nautilus SGI UltraViolet passed all acceptance criteria and was accepted by NICS/RDAV in September 2010.
• RDAV resources have gone through TRAC allocations twice, and we currently have active users.
• Allocations are lower than expected.
• Use is lower than expected.
• Early users are transitioning to TRAC / Director’s Discretionary users.
Hardware status
• Nautilus: The full SGI UltraViolet has been delivered and accepted:
  • The full UltraViolet machine was delivered and integrated into the NICS infrastructure.
  • There were issues with stability and PCIe performance during the acceptance tests. Those issues have been resolved and the machine has been accepted.
• Graphics cards: There are issues that prevent delivery of the GPUs on Nautilus:
  • NVIDIA has decided not to scale its driver to support more than 8 GPUs per single system image. Thus, we cannot deliver the full 16 GPUs on Nautilus.
  • Even worse, there are communication problems on the UltraViolet that cause system stability problems when the GPUs are exercised. Until this is resolved, we have disabled access to the GPUs.
• Parallel filesystem: The 960 TB GPFS parallel filesystem has been deployed on Nautilus.
  • We continue to explore issues with bandwidth, as we are seeing only ~1 GB/s. It appears to be a design issue with GPFS. We are talking with IBM about these issues.
  • We are working to enable cross-mounting of the parallel filesystem on Kraken for HPC users.
• Portal: Our portal system is operational and we are working to deploy new capabilities on it.
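To put the ~1 GB/s filesystem bandwidth figure in context, the following is a rough back-of-the-envelope sketch, not a measurement. It estimates how long a single sequential pass over the full 960 TB would take at that rate. The function name is hypothetical, and the calculation assumes decimal units (1 TB = 1000 GB) and a perfectly sustained transfer rate, neither of which is stated in the slides.

```python
def full_scan_days(capacity_tb: float, bandwidth_gb_per_s: float) -> float:
    """Days needed to stream capacity_tb once at a sustained bandwidth.

    Assumes decimal units (1 TB = 1000 GB) and no overhead or contention.
    """
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    seconds = (capacity_tb * 1000.0) / bandwidth_gb_per_s
    return seconds / 86400.0  # 86400 seconds per day


CAPACITY_TB = 960.0       # filesystem size quoted in the slides
OBSERVED_GB_PER_S = 1.0   # approximate bandwidth quoted in the slides

days = full_scan_days(CAPACITY_TB, OBSERVED_GB_PER_S)
print(f"{days:.1f} days")  # prints "11.1 days"
```

Under these assumptions, one pass over the whole filesystem would take roughly 11 days, which illustrates why the observed bandwidth is a concern for data analysis workloads.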
Software and environment status
• Software systems:
  • VisIt has been ported and runs well. It was a major component of our acceptance tests.
  • ParaView porting has begun.
  • Remote visualization systems are deployed and secure: NX, VNC.
  • Workflow systems work well.
  • R runs acceptably, and several packages for exploiting parallelism have been deployed to users.
• User environment:
  • We continue to explore issues related to job placement, and have deployed a NUMA-aware Torque for scheduling.
  • We are moving batch scheduling and login processes to a separate system to reduce user contention.
Education, Outreach, and Training activities
• Presented a tutorial on Nautilus usage for visualization, data analysis, and workflow management at the TeraGrid'10 conference in Pittsburgh.
• With LLNL and LBNL, taught a full-day class on VisIt at the Supercomputing 2010 conference in New Orleans.