Taking the Complexity out of Cluster Computing
Vendor Update, HPC User Forum
Arend Dittmer, Director Product Management HPC
April 17, 2009
Penguin Vision and Focus
• Founded 1998 – one of the HPC industry’s longest track records of success
• Donald Becker, CTO – inventor of the Beowulf architecture and primary contributor to the Linux kernel
• Over 2,500 customers in enterprise, academia and government
• Focus on integrated ‘turnkey’ HPC clusters
Penguin Solutions Delivered “Ready-to-Run”
• Software: Cluster Management • Applications and Workload Managers • Compilers and Tools
• Hardware: Servers • GPU Accelerators • Storage • Interconnects • Racks and PDUs
• Integration and Services: Rack Integration • Software Integration (Scyld ClusterWare, Schedulers, Development Tools, Applications) • Solution Testing (System-level burn-in, Full cluster testing) • 24x7 Support
Trends in Cluster Computing: Cluster Management Software
The HPC Cluster Management Challenge
• Linux clusters deliver unmatched price/performance
• Linux clusters dominate the HPC market (market share >75%); however, the compute power delivered by many systems introduces complexity: configuration consistency, distributed applications, workload management
• Scyld ClusterWare is designed to make cluster management easy
Scyld ClusterWare Design
• The master node is the single point of control
• Compute nodes are attached as ‘stateless’ memory and processor resources
• Scyld maintains consistency across the cluster
• Designed for ease of use and manageability: ‘Manage a cluster like a single system’
Web-Based Monitoring Framework
• One web-based interface to all HPC cluster components
• Integrates existing tools, e.g. IPMI, Ganglia, TORQUE
• Customizable, extensible framework based on XML, JavaScript and ExtJS
Trends in Cluster Computing: Hardware
Heterogeneous Computing: GPUs + CPUs (GPU: 240 cores vs. CPU: 4 cores)
• Massive processing power introduces an I/O challenge: getting data to and from the processing units can take as long as the processing itself
• Requires careful software design and a deep understanding of the algorithms and of the architecture of processors (cache effects, memory bandwidth), GPU accelerators, interconnects (Ethernet, InfiniBand, 10 Gigabit Ethernet) and storage (local disks, NFS, parallel file systems) – see the sketch below
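To make the data-movement point concrete, here is a minimal, hypothetical sketch (not from the presentation) that times a single host-to-device copy with the CUDA runtime API; the buffer size and the idea of weighing transfer time against kernel time are illustrative assumptions.

```c
/* Minimal sketch: time a host-to-device copy over PCIe with the CUDA
 * runtime API (plain C, link against libcudart). The 256 MB buffer is
 * an arbitrary example size. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void) {
    size_t bytes = 256UL * 1024 * 1024;          /* 256 MB test buffer */
    float *host = (float *)malloc(bytes);
    float *dev  = NULL;
    cudaEvent_t start, stop;
    float ms = 0.0f;

    cudaMalloc((void **)&dev, bytes);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);

    printf("H2D copy: %.1f ms (%.2f GB/s)\n",
           ms, (bytes / 1e9) / (ms / 1e3));
    /* If the kernel that consumes this data runs in comparable time,
     * the transfer dominates; overlapping copies with compute
     * (streams, pinned memory) then becomes essential. */

    cudaFree(dev);
    free(host);
    return 0;
}
```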
Application Case Study: ANSYS / Acceleware
• ANSYS Direct Sparse Solver (DSS), SMP / single-system mode, with the Acceleware plug-in for ANSYS
• Matrix decomposition offloaded to an NVIDIA Tesla C1060 GPU accelerator
• Benchmark: ANSYS standard benchmark BM-7 – 500K-1750K degrees of freedom (DoF), compared against Intel Xeon E5405 (dual-core) runs
• Overall speedup of up to 3.7x for single-precision runs, 2.7x for double precision
A Sample of Penguin’s Advanced Compute Offering
• NVIDIA Tesla S1070 GPU Accelerator: four processors with 240 cores each, native double-precision floating-point support, supports NVIDIA’s CUDA API
• Relion Intel 1702: 1U chassis housing two independent x86 nodes, two Xeon 5500 Series ‘Nehalem’ processors per node, up to 96 GB of RAM on each node
• Niveus HTX Personal Supercomputer: engineered to support Tesla coprocessors, 720 GPU cores
Thank You
April 17, 2009
Integrated Management Framework
• One web-based interface to all HPC cluster components
• Follows the Scyld ‘ease-of-use’ philosophy
• Integrates existing tools, e.g. IPMI, Ganglia, TORQUE
A Sample of Our 2,500+ Customers
National Labs • Aerospace/Defense • Universities/Institutions • Enterprise
Hardware Effects: Multicore and Multithreading
• Moore’s Law keeps doubling the number of transistors on an integrated circuit roughly every 18 months; however, clock speeds are no longer scaling
• Multicore and multithreaded programming is critical for continued software scalability
• Rather than reinvent the wheel, use existing frameworks, tools and libraries: OpenMP, MPI, Threading Building Blocks (TBB), ATLAS, FFTW, MKL, ACML, etc. – a minimal OpenMP sketch follows below
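As an illustration of the ‘use existing frameworks’ advice, here is a minimal, hypothetical OpenMP sketch (not from the presentation); the dot-product workload and vector size are arbitrary assumptions.

```c
/* Minimal sketch: parallel dot product with OpenMP instead of hand-rolled
 * threads. Compile with e.g. gcc -fopenmp; the vector size is illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    const long n = 50 * 1000 * 1000;
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double sum = 0.0;

    for (long i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* The reduction clause gives each core a private partial sum;
     * OpenMP combines the partials when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i] * b[i];

    printf("dot = %.1f (using up to %d threads)\n",
           sum, omp_get_max_threads());
    free(a);
    free(b);
    return 0;
}
```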