High Performance Compute Cluster Abdullah Al Owahid Graduate Student, ECE Auburn University A. Al Owahid: ELEC 5200-001/6200-001
Topic Coverage • Cluster computer • Cluster categories • Auburn’s vSMP HPCC • Software installed • Accessing HPCC • How to run simulations in HPCC • Demo • Performance • Points of Contact A. Al Owahid: ELEC 5200-001/6200-001
Cluster Computer • Multi-processor, distributed network • A computer cluster is a group of linked computers • The computers work together closely, in many respects forming a single machine • Nodes are connected to each other through fast local area networks A. Al Owahid: ELEC 5200-001/6200-001
Cluster Computer - Categories • High-availability (HA) clusters -- Operate by keeping redundant nodes that take over when a node fails • Load-balancing clusters -- Multiple computers are linked together to share the computational workload • Compute clusters (HPCC) -- Built for computational purposes -- The cluster shares a dedicated network -- A compute job uses one or a few nodes and needs little or no inter-node communication (grid computing) -- Uses MPI or PVM (Parallel Virtual Machine), as sketched below A. Al Owahid: ELEC 5200-001/6200-001
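As a hedged illustration of the MPI model just mentioned (not taken from these slides), compiling and launching a small MPI program with MPICH2 typically looks like the following; the program name and process count are placeholders, and exact options can vary with the MPI installation.

  # Compile an MPI program with the compiler wrapper shipped with MPICH2/OpenMPI
  mpicc -o hello_mpi hello_mpi.c
  # Start 8 processes; each rank runs its own copy and exchanges data via MPI messages
  mpiexec -n 8 ./hello_mpi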
HPCC A. Al Owahid: ELEC 5200-001/6200-001
Auburn’s vSMP HPCC: Samuel Ginn College of Engineering Computational Cluster • Dell M1000E Blade Chassis server platform • 4 M1000E Blade Chassis fat nodes • 16 M610 half-height dual-socket Intel blades per chassis • 2 CPUs per blade: quad-core Nehalem 2.80 GHz processors • 24 GB RAM and two 160 GB SATA drives per blade • Single operating system image (CentOS) A. Al Owahid: ELEC 5200-001/6200-001
Auburn’s vSMP HPCC (contd.) • Each M610 blade server is connected internally to the chassis via a Mellanox Quad Data Rate (QDR) InfiniBand switch at 40 Gb/s for creation of the ScaleMP vSMP • Each M1000E fat node is interconnected via 10 GbE Ethernet using M6220 blade switch stacking modules for parallel clustering with OpenMPI/MPICH2 • Each M1000E fat node also has independent 10 GbE Ethernet connectivity to the Brocade TurboIron 24X core LAN switch • Each node provides 128 cores @ 2.80 GHz (Nehalem) • Total of 512 cores @ 2.80 GHz, 1.536 TB of shared RAM, and 20.48 TB of raw internal storage A. Al Owahid: ELEC 5200-001/6200-001
vSMP (ScaleMP) • ScaleMP is a leading vendor of virtualization for high-end computing • Its Versatile SMP (vSMP) architecture aggregates multiple x86 systems into a single virtual x86 system, delivering an industry-standard, high-end symmetric multiprocessing (SMP) computer • vSMP Foundation aggregates up to 16 x86 systems to create a single system with 4 to 32 processors (128 cores) and up to 4 TB of shared memory A. Al Owahid: ELEC 5200-001/6200-001
vSMP HPCC Configuration Diagram A. Al Owahid: ELEC 5200-001/6200-001
Network Architecture A. Al Owahid: ELEC 5200-001/6200-001
Software installed • MATLAB (/export/apps/MATLAB) -- parallel distributed computing toolbox with 128 workers • Fluent (/export/apps/Fluent.Inc) -- 512 parallel licenses • LS-DYNA (/export/apps/ls-dyna) -- 128 parallel licenses • STAR-CCM+ (/export/apps/starccm) -- 128 parallel licenses • MPICH2 (Argonne National Laboratory) -- /opt/mpich2-1.2.1p1, /opt/mpich2 (a PATH sketch follows below) A. Al Owahid: ELEC 5200-001/6200-001
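A minimal sketch of putting the MPICH2 install listed above on the shell PATH before building or launching jobs; whether the cluster's default login environment already does this is an assumption to verify.

  # Put the MPICH2 tools (mpicc, mpiexec, mpdboot) ahead of any other MPI on the PATH
  export PATH=/opt/mpich2/bin:$PATH
  # Confirm which launcher will actually be used
  which mpiexec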
Accessing HPCC http://www.eng.auburn.edu/ens/hpcc/Access_information.html A. Al Owahid: ELEC 5200-001/6200-001
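The page above holds the authoritative access details; purely as a hedged sketch, logging in from a campus machine would normally be an SSH session like the one below, where the head-node hostname is a placeholder to be replaced with the name given on that page.

  # <hpcc-headnode> is a placeholder; use the hostname from the access page
  ssh au_user_id@<hpcc-headnode>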
How to run simulations in HPCC • Save a .rhosts file in your home directory • Save a .mpd.conf file in your home directory • Your H:\ drive is already mapped • Add RSA keys by running ssh compute-i and then exit, for i = 1, 2, 3, 4 • mkdir folder_name • In your script file, add the line #PBS -d /home/au_user_id/folder_name (the path returned by "pwd") • Make the script executable: chmod 744 s_file.sh • Submit the script with qsub: qsub ./script_file.sh (a sample script is sketched below) A. Al Owahid: ELEC 5200-001/6200-001
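A minimal job-script sketch that follows the steps above; only the #PBS -d line comes from these slides, while the job name, resource request, and executable are placeholders using standard Torque/PBS conventions and should be adjusted to the cluster's queue policy.

  #!/bin/sh
  # Job name (placeholder)
  #PBS -N my_simulation
  # Working directory, as obtained with pwd (from the slide above)
  #PBS -d /home/au_user_id/folder_name
  # Assumed resource request: 1 node, 8 processors per node
  #PBS -l nodes=1:ppn=8
  # Merge stdout and stderr into a single output file
  #PBS -j oe

  # Launch the solver; the executable and process count are placeholders, and the
  # mpd setup implied by .mpd.conf on the slide above is assumed to be in place
  mpiexec -n 8 ./my_mpi_program

The script is then submitted with qsub ./script_file.sh, and showq (next slide) shows it in the queue.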
Basic commands (example usage follows below) • showq • runjob job_id • canceljob job_id • pbsnodes -a • pbsnodes compute-1 • ssh compute-1 • ps -ef | grep any_process_you_want_to_see • pkill process_name • kill -9 aberrant_process_id • exit A. Al Owahid: ELEC 5200-001/6200-001
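A hedged example of stringing these commands together to monitor one job; the job ID and process name are placeholders.

  qsub ./script_file.sh            # submit the job script from the previous slides
  showq                            # list queued and running jobs; note the job_id
  pbsnodes -a                      # check the state of all compute nodes
  ssh compute-1                    # log in to a node where the job is running
  ps -ef | grep my_mpi_program     # confirm the processes are alive (placeholder name)
  exit                             # leave the compute node
  canceljob 12345                  # cancel the job if something goes wrong (placeholder job_id)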
Demo Live demo (25 minutes) • Accessing the cluster • Setting up paths and home space • Modifying the script to suit the requirement • Submitting multiple jobs • Obtaining the data • Viewing the load • Tracing the processes A. Al Owahid: ELEC 5200-001/6200-001
Performance A. Al Owahid: ELEC 5200-001/6200-001
Performance (contd.) Minimum run-time curve A. Al Owahid: ELEC 5200-001/6200-001
Performance (contd.) Maximum speedup curve: S(N) = N / [βN(N-1) + 1] A. Al Owahid: ELEC 5200-001/6200-001
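A short derivation consistent with this curve, under the assumption (not stated on the slide) that β is the per-pair inter-processor communication overhead expressed as a fraction of the total serial work: each of the N processors executes 1/N of the work plus β(N-1) of communication, so

  S(N) = \frac{1}{\tfrac{1}{N} + \beta (N-1)} = \frac{N}{\beta N (N-1) + 1}

Setting dS/dN = 0 gives N = 1/\sqrt{\beta} as the processor count that maximizes speedup, with S_max = 1/[\sqrt{\beta}\,(2 - \sqrt{\beta})]; the maximum-speedup curve traces this value as β varies.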
Points of Contact • James ClarkInformation Technology Master Specialist Email: jclark@auburn.edu • Shannon PriceInformation Technology Master Specialist Email: pricesw@auburn.edu • Abdullah Al Owahid Email: azo0012@auburn.edu A. Al Owahid: ELEC 5200-001/6200-001
Thank You Question & Answer A. Al Owahid: ELEC 5200-001/6200-001
References • http://en.wikipedia.org/wiki/Computer_cluster • http://www.eng.auburn.edu/ens/hpcc/index.html A. Al Owahid: ELEC 5200-001/6200-001