Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill-Zrahia, Technion Computer Center, October 2008
Resources needed for applications arising from nanotechnology • Large memory – Tbytes • High floating point computing speed – Tflops • High data throughput – state of the art …
SMP architecture [diagram: several processors (P) attached to a single shared memory]
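To make the shared-memory model concrete, here is a minimal OpenMP sketch (illustrative only, not code from the Nanco project): all threads operate on the same array held in the one shared memory, and the runtime splits the loop among them.

```c
/* Minimal shared-memory (OpenMP) sketch -- illustrative only, not Nanco code.
   All threads work on the same array held in the single shared memory. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;
    int i;

    /* The array is shared by all threads; the reduction combines partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        a[i] = 0.5 * i;
        sum += a[i];
    }

    printf("sum = %.1f, using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```

Such a program compiles with any OpenMP-capable compiler (for example, gcc -fopenmp) and runs entirely within one SMP node.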
Cluster architecture [diagram: nodes, each with its own processor and memory, connected by an interconnection network]
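By contrast, on a cluster each process owns its own memory and data moves only through explicit messages over the interconnect. A minimal MPI sketch, assuming a generic MPI installation rather than Nanco's specific software stack:

```c
/* Minimal distributed-memory (MPI) sketch -- illustrative only, not Nanco code.
   Each process has private memory; data moves only via explicit messages. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            token = 42;   /* exists only in rank 0's memory until sent */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d over the interconnect\n", token);
        }
    }

    MPI_Finalize();
    return 0;
}
```

It would typically be compiled with mpicc and launched with mpirun -np 2 (or more), with the ranks placed on different nodes so the message actually crosses the interconnect.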
Why not a cluster • A single SMP system is easier to purchase and maintain • Ease of programming on SMP systems
Why a cluster • Scalability • Total available physical RAM • Reduced cost • But …
Exploiting the cluster's parallel capabilities requires studying the application or applications that will run on it
Other requirements • Space, power, cooling constraints, strength of floors • Software configuration: • Operating system • Compilers & application development tools • Load balancing and job scheduling • System management tools
Configuration [diagram: 64 nodes (node1 … node64), each with processors (P) and memory (M), connected by an InfiniBand switch]
Before finalizing our choice … One should check, on a similar system: • Single-processor peak performance • InfiniBand interconnect performance (see the sketch below) • SMP behaviour • Behaviour of non-commercial parallel applications
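As an illustration of the interconnect check, a simple MPI ping-pong between two ranks placed on different nodes gives a rough latency and bandwidth figure. This is a hypothetical sketch, not the actual acceptance-test code:

```c
/* Hypothetical MPI ping-pong sketch for a rough interconnect latency/bandwidth
   estimate -- not the actual acceptance-test code. Run with 2 ranks on 2 nodes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    const int nbytes = 1 << 20;               /* 1 MiB message */
    char *buf;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(nbytes);

    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double rt = (t1 - t0) / reps;                   /* seconds per round trip */
        printf("round trip: %g s, bandwidth: %g MB/s\n",
               rt, 2.0 * nbytes / rt / 1e6);            /* both directions counted */
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```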
Parallel application issues • Execution time • Parallel speedup: Sp = T1 / Tp • Scalability
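For example (illustrative numbers only): if a job takes T1 = 120 s on one processor and Tp = 20 s on p = 8 processors, then Sp = 6 and the parallel efficiency Sp / p is 75%.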
Benchmark design • Must give a good estimate of the performance of your application • Acceptance test: should match all its components
What did work • Running MPI code interactively • Running a serial job through the queue • Compiling C code with MPI
What did not work • Compiling F90 or C++ code with MPI • Running MPI code through the queue • Queues do not do accounting per CPU
Parallel performance results • Theoretical peak: 2.1 Tflops • Nanco performance on HPL: 0.58 Tflops
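For reference, the measured HPL result corresponds to roughly 0.58 / 2.1 ≈ 28% of the theoretical peak.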
Conclusions from acceptance tests • The new gcc (gcc 4) is faster than PathScale for some applications • MPI collective communication functions are implemented differently in different MPI versions • Disk access times are crucial – use attached storage when possible
Scheduling decisions • Assessing priorities between user groups • Assessing the parallel efficiency of different job types (MPI, serial, OpenMP) and of commercial software, and designing special queues for them • Avoiding starvation by giving weight to the urgency parameter
Observations during production mode • Assessing users' understanding of the machine – supporting them in writing scripts and in efficient parallelization • Lack of visualization tools – writing a script to show current usage of the cluster
Conclusion • Correct benchmark design is crucial for testing the capabilities of the proposed architecture • Acceptance tests make it possible to negotiate with vendors and give insight into future choices • Only after several weeks of running the cluster at full capacity can we make informed decisions on its management