Presentation Transcript


  1. Cluster Computing Overview. Summer Institute for Advanced Computing, August 22, 2000. Doug Johnson, OSC

  2. Overview
  • What is Cluster Computing
  • Why Cluster Computing
  • How Clusters Fit with the OSC Mission
  • When Did It All Start
  • OSC 128-Processor SGI/Linux Cluster
  • Clusters for Production HPC Environments

  3. What is Cluster Computing?
  A Cluster is a collection of interconnected whole computers used as a single, unified computer.
  [Diagram: nodes joined by a network; each node has the common resources: CPU(s), memory, hard drive, network card]
  Cluster Computing is many things...
  • High performance computing: run programs with parallel algorithms (see the sketch after this slide)
  • High throughput computing: parametric studies (the same program run many times with different parameters)
  • High availability computing: fail-over redundancy
  Both scientific and commercial applications!
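To make the high performance computing case concrete, here is a minimal sketch of a program with a parallel algorithm, written in C against the standard MPI interface (the message-passing library family the OSC cluster provides via MPICH). It is an illustration only, not code from the OSC cluster: each process sums a disjoint slice of 1..1000000 and rank 0 combines the partial sums.

    /* Minimal parallel-sum sketch: illustration only.
     * Build with an MPI compiler wrapper, e.g.: mpicc -o psum psum.c */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        long i;
        double local = 0.0, total = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

        /* Each process sums every size-th integer, starting at rank+1. */
        for (i = rank + 1; i <= 1000000; i += size)
            local += (double)i;

        /* Combine the partial sums onto rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %.0f (computed on %d processes)\n", total, size);

        MPI_Finalize();
        return 0;
    }

The same source runs unchanged on one processor or across many cluster nodes; the launcher (e.g. mpirun -np 8) decides how many processes participate.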

  4. Brief History of Cluster Computing at OSC
  • OSC SGI/Linux 128-processor cluster: Pentium III Xeon 550 MHz processors, 66 Gbytes RAM, Myrinet and 100 Mbit Ethernet interconnect
  • OSC 10-processor IA32 Linux cluster: Pentium II 400 MHz processors, Myrinet interconnect, 4.5 Gbytes RAM
  • OSC installs "Trout," a dual-purpose workstation cluster: 14 SGI O2 workstations, R10000 processors at 150 MHz, ATM interconnect
  • OSC installs "Beaker," a dual-purpose workstation cluster: 12 DEC Alpha EV4 processors with full-duplex FDDI interconnect
  • Beowulf project at the Center of Excellence in Space Data and Information Sciences (CESDIS) installs the first Beowulf cluster: 16 Intel 486 DX4 processors at 100 MHz, 16 Mbytes RAM per processor, 10 Mbit Ethernet interconnect (3 per node)

  5. Why Parallel Computing
  OSC Mission Statement: "OSC provides a reliable high performance computing and communications infrastructure for a diverse, statewide/regional community including education, academic research, industry, and state government. ..."
  • Parallel computing is a strong presence at the national level and is the future of high performance computing (HPC)
  • Parallel computing platforms are a vital element in our infrastructure
  • Parallel systems have traditionally not been an accessible resource compared to single-processor systems:
    • Higher cost (due mostly to the high performance interconnect)
    • Less refined user interface
    • Non-traditional programming techniques, with little training available

  6. Why Cluster Computing
  OSC Mission Statement: "... In collaboration with this community, OSC evaluates, implements, and supports new and emerging information technologies. ..."
  OSC evaluates new and emerging information technologies:
  • Cluster computing is one of the hottest fields in high performance computing
  Potential benefits of clusters over traditional parallel systems:
  • High performance interconnect technology is approaching commodity availability
  • The performance of commodity systems is increasing at an aggressive rate, driven by the commercial market for home/office workstations

  7. Why Cluster Computing
  Potential benefits of clusters over traditional parallel systems (cont.):
  • Operating system gives users the same environment on their desk that they have on the parallel system
  Other differences:
  • System administration implications: no single system image, so OS and software upgrades must be applied to all nodes
  • Cluster design lends itself to more frequent hardware upgrades
  • Performance implications
  • Accounting/funding implications

  8. How Clusters Fit with the OSC Mission
  OSC Mission Statement: "... In collaboration with this community, OSC evaluates, implements, and supports new and emerging information technologies. ..."
  • OSC evaluates new and emerging information technologies
    • Multiple software packages have been evaluated to provide the most robust system
    • Four different network interconnects have been installed to evaluate performance
    • Three different processors and operating systems were investigated
  • OSC implements new and emerging information technologies
    • A cluster under OSC administration has been available to users since March 1999
    • OSC partnered with the Portland Group to bring the Cluster Development Kit to OSC users
  • OSC supports new and emerging information technologies
    • The OSC 128-processor cluster is in production status
    • Training classes on how to build and use a cluster
    • Staff are available to Ohio faculty to help answer questions and troubleshoot problems

  9. To Summarize
  • Develop cluster technology so that it can be rolled out to university research labs
    • Provide a hardware and software configuration that will allow labs to construct a working cluster with minimal effort
    • Experienced OSC staff can provide technical assistance
  • Evaluate software and hardware configurations to assist researchers in defining a system that will best suit their needs
    • Let the researchers focus on science
    • Based on user applications, provide performance analysis showing the optimal hardware and software configuration
  • OSC wants to encourage parallel programming
    • Parallel programming is the future of high performance computing
    • Clusters provide increased access to parallel systems

  10. When Did It All Start?
  December 1998: OSC management authorizes a dedicated 10-processor cluster for technology evaluation
  April 1999: Performance evaluation yields promising results and the machine is opened to users
  • 1 front-end node: 2 Intel Pentium II 400 MHz processors, 512 Mbytes RAM, 18 Gbytes disk
  • 4 compute nodes: 2 Intel Pentium II 400 MHz processors, 1 Gbyte RAM, 9 Gbytes disk each
  • Interconnects: 100 Mbit Ethernet, Dolphinics SCI, Myricom Myrinet
  • Linux OS, PBS batch system, PGI compiler suite

  11. OSC/SGI Cluster
  September 1999: Agreement signed between OSC and SGI
  October 1999: System powered on
  November 1999: Machine configured and running applications on the floor of Supercomputing 99
  December 1999: Machine installed at OSC
  February 2000: Machine opened to friendly users

  12. Hardware
  All nodes are SGI 1400L servers.
  • 1 front-end node configured with:
    • Two Gigabytes of RAM
    • Four 550 MHz Intel Pentium III Xeon processors, each with 512 kB of secondary cache
    • 48 Gigabytes of ultra-wide SCSI hard drive storage
    • Two 100Base-T Ethernet interfaces
    • One HIPPI interface
  • 32 compute nodes, each configured with:
    • Two Gigabytes of RAM
    • Four 550 MHz Intel Pentium III Xeon processors, each with 512 kB of secondary cache
    • 18 Gigabytes of ultra-wide SCSI hard drive storage
    • Two Myrinet interfaces
    • One 100Base-T Ethernet interface

  13. Software and Configuration
  • Hardware originally assembled in Mountain View, CA by SGI Professional Services
  • OS and software environment installed and configured by OSC staff:
    • Linux operating system
    • Portable Batch System (PBS)
    • Portland Group Compiler Suite
    • Myrinet MPICH-GM interface (a microbenchmark sketch for this kind of interconnect comparison follows)
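As an illustration of how interconnects such as Myrinet (via MPICH-GM) and 100Base-T Ethernet can be compared, here is a hedged sketch of a classic MPI ping-pong microbenchmark in C. It is not OSC's evaluation code; the message size and repetition count are arbitrary choices.

    /* Ping-pong latency sketch: run on exactly two processes,
     * e.g.: mpirun -np 2 ./pingpong */
    #include <stdio.h>
    #include <mpi.h>

    #define REPS 1000

    int main(int argc, char **argv)
    {
        int rank, i;
        char buf[1472];                 /* one Ethernet-frame-sized message */
        double t0, t1;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {            /* send, then wait for the echo */
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {     /* echo everything back */
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("avg round trip: %.1f usec\n", (t1 - t0) / REPS * 1e6);

        MPI_Finalize();
        return 0;
    }

Running the same binary over each interconnect (by linking against the matching MPI library) gives directly comparable latency numbers.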

  14. Clusters for Production HPC Environments
  There are two significant efforts in building clusters:
  • Building a cluster and making it operational
  • Making the cluster a production system:
    • Ability to host multiple users simultaneously
    • Ability to schedule system resources
    • Ability to function without constant intervention
  The OSC cluster has the following attributes that make it a true HPC production system:
  • Connection to a Mass Storage System (MSS)
  • Integration into the OSC account database system
  • Job accounting
  • Good utilization
  • High availability

  15. Mass Storage Support
  [Diagram: the cluster connects over HIPPI and a private 100 Mbit switched network to an Origin 2000 with 1 Terabyte of disk storage running the Data Migration Facility (DMF); an IBM 3494 library behind it provides 30 Terabytes of tape storage]

  16. User Accounts and Accounting
  • User accounts
    • The cluster is integrated into the Center's database system for automatic account generation and maintenance
  • Job accounting
    • Accounting has been configured into the environment to track the CPU usage of users
    • CPU usage is converted with a charging algorithm and deducted from a Principal Investigator's account (a sketch of such an algorithm appears after this slide)
    • Users can view their accounting history with a text command from the Linux command prompt
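The slides do not give the charging algorithm itself, so the following C sketch only shows the general shape of such a conversion. The charge rate and the names (RUS_PER_CPU_HOUR, pi_balance, charge_job) are invented for illustration and are not OSC's actual accounting code.

    #include <stdio.h>

    /* Assumed charge rate: resource units per CPU-hour (hypothetical). */
    #define RUS_PER_CPU_HOUR 1.0

    /* Convert a job's CPU usage to a charge and deduct it
     * from the Principal Investigator's balance. */
    double charge_job(double cpu_seconds, int ncpus, double *pi_balance)
    {
        double cpu_hours = cpu_seconds * ncpus / 3600.0;
        double charge = cpu_hours * RUS_PER_CPU_HOUR;
        *pi_balance -= charge;
        return charge;
    }

    int main(void)
    {
        double balance = 500.0;   /* PI's remaining allocation (hypothetical) */
        /* A job that ran 2 wall-clock hours on 4 CPUs. */
        double c = charge_job(7200.0, 4, &balance);
        printf("charged %.2f units, %.2f remaining\n", c, balance);
        return 0;
    }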

  17. Utilization and Availability
  • Utilization
    • System utilization is recorded and accessible via a web link
    • For parallel systems, utilization is expected to be around 50 to 70%
    • Current utilization is about 70% parallel and 30% serial jobs
  • Availability
    • Good availability has been achieved through significant uptime and minimal system problems
    • Downtime is scheduled every 4 weeks for software upgrades, hardware modifications, and general system maintenance

  18. TCP Stream Performance [performance graph]

  19. TCP Stream Performance [performance graph]

  20. UDP Stream Performance
  ./netperf -l 60 -H fe.ovl.osc.edu -i 10,2 -I 99,10 -t UDP_STREAM -- -m 1472 -s 32768 -S 32768
  UDP UNIDIRECTIONAL SEND TEST to fe.ovl.osc.edu : +/-5.0% @ 99% conf.
  Socket  Message  Elapsed      Messages
  Size    Size     Time         Okay      Errors   Throughput
  bytes   bytes    secs         #         #        10^6bits/sec
  131070  1472     59.99        3229909   0        634.03
  524288           59.99        2169706            425.91
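For readers unfamiliar with netperf, the following C sketch approximates what the send side of the UDP_STREAM test above does: blast fixed-size UDP datagrams at a host for a fixed time and report throughput. The message size (1472 bytes), socket buffer size (32768), duration (60 s), and host mirror the command line; the destination port (9999) is an assumption, and unlike netperf there is no cooperating receiver reporting how many messages actually arrived.

    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netdb.h>

    int main(void)
    {
        const int sock_buf = 32768;     /* matches -s 32768 */
        const int duration = 60;        /* matches -l 60    */
        char buf[1472];                 /* matches -m 1472  */
        long sent = 0;
        time_t end;
        struct addrinfo hints, *res;

        memset(buf, 0, sizeof buf);
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_DGRAM;
        /* Port 9999 is a placeholder; netperf negotiates its own data port. */
        if (getaddrinfo("fe.ovl.osc.edu", "9999", &hints, &res) != 0)
            return 1;

        int s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        setsockopt(s, SOL_SOCKET, SO_SNDBUF, &sock_buf, sizeof sock_buf);

        /* Send as fast as possible for the test duration. */
        end = time(NULL) + duration;
        while (time(NULL) < end)
            if (sendto(s, buf, sizeof buf, 0, res->ai_addr, res->ai_addrlen)
                    == (int)sizeof buf)
                sent++;

        printf("%ld messages, %.2f * 10^6 bits/sec\n", sent,
               sent * (double)sizeof buf * 8.0 / duration / 1e6);

        freeaddrinfo(res);
        close(s);
        return 0;
    }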
