Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill-Zrahia, Technion Computer Center, October 2008
Resources needed for applications arising from nanotechnology • Large memory – Tbytes • High floating point computing speed – Tflops • High data throughput – state of the art …
SMP architecture [diagram: several processors (P) attached to a single shared memory]
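To make the shared-memory model concrete, here is a minimal OpenMP sketch (illustrative only, not code from the Nanco project): all threads operate on the same array held in the one shared memory, and the runtime splits the loop among them.

```c
/* Minimal shared-memory (OpenMP) sketch -- illustrative only, not Nanco code.
   All threads work on the same array held in the single shared memory. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;
    int i;

    /* The array is shared by all threads; the reduction combines partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        a[i] = 0.5 * i;
        sum += a[i];
    }

    printf("sum = %.1f, using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```

Such a program compiles with any OpenMP-capable compiler (for example, gcc -fopenmp) and runs entirely within one SMP node.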
Cluster architecture [diagram: nodes, each with its own processor and memory, connected by an interconnection network]
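By contrast, on a cluster each process owns its own memory and data moves only through explicit messages over the interconnect. A minimal MPI sketch, assuming a generic MPI installation rather than Nanco's specific software stack:

```c
/* Minimal distributed-memory (MPI) sketch -- illustrative only, not Nanco code.
   Each process has private memory; data moves only via explicit messages. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            token = 42;   /* exists only in rank 0's memory until sent */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d over the interconnect\n", token);
        }
    }

    MPI_Finalize();
    return 0;
}
```

It would typically be compiled with mpicc and launched with mpirun -np 2 (or more), with the ranks placed on different nodes so the message actually crosses the interconnect.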
Why not a cluster • A single SMP system is easier to purchase and maintain • Ease of programming on SMP systems
Why a cluster • Scalability • Total available physical RAM • Reduced cost • But …
Exploiting the cluster's parallel capabilities requires studying the application or applications that will run on it
Other requirements • Space, power, cooling constraints, strength of floors • Software configuration: • Operating system • Compilers & application development tools • Load balancing and job scheduling • System management tools
Configuration [diagram: 64 nodes (node1 … node64), each with processors (P) and memory (M), connected by an InfiniBand switch]
Before finalizing our choice … One should check, on a similar system: • Single-processor peak performance • InfiniBand interconnect performance (see the sketch below) • SMP behaviour • Behaviour of non-commercial parallel applications
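As an illustration of the interconnect check, a simple MPI ping-pong between two ranks placed on different nodes gives a rough latency and bandwidth figure. This is a hypothetical sketch, not the actual acceptance-test code:

```c
/* Hypothetical MPI ping-pong sketch for a rough interconnect latency/bandwidth
   estimate -- not the actual acceptance-test code. Run with 2 ranks on 2 nodes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    const int nbytes = 1 << 20;               /* 1 MiB message */
    char *buf;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(nbytes);

    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double rt = (t1 - t0) / reps;                   /* seconds per round trip */
        printf("round trip: %g s, bandwidth: %g MB/s\n",
               rt, 2.0 * nbytes / rt / 1e6);            /* both directions counted */
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```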
Parallel application issues • Execution time • Parallel speedup: Sp = T1 / Tp • Scalability
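For example (illustrative numbers only): if a job takes T1 = 120 s on one processor and Tp = 20 s on p = 8 processors, then Sp = 6 and the parallel efficiency Sp / p is 75%.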
Benchmark design • Must give a good estimate of the performance of your application • Acceptance test: should match all its components
What did work • Running MPI code interactively • Running a serial job through the queue • Compiling C code with MPI
What did not work • Compiling F90 or C++ code with MPI • Running MPI code through the queue • Queues do not do accounting per CPU
Parallel performance results • Theoretical peak: 2.1 Tflops • Nanco performance on HPL: 0.58 Tflops
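For reference, the measured HPL result corresponds to roughly 0.58 / 2.1 ≈ 28% of the theoretical peak.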
Conclusions from acceptance tests • The new gcc (gcc 4) is faster than PathScale for some applications • MPI collective communication functions are implemented differently in different MPI versions • Disk access times are crucial – use attached storage when possible
Scheduling decisions • Assessing priorities between user groups • Assessing the parallel efficiency of different job types (MPI, serial, OpenMP) and of commercial software, and designing special queues for them • Avoiding starvation by giving weight to the urgency parameter
Observations during production mode • Assessing users' understanding of the machine – supporting them in writing scripts and in efficient parallelization • Lack of visualization tools – writing a script to show current usage of the cluster
Conclusion • Correct benchmark design is crucial for testing the capabilities of the proposed architecture • Acceptance tests make it possible to negotiate with vendors and give insight into future choices • Only after several weeks of running the cluster at full capacity can we make informed decisions on its management