COCOA MAY 31, 2001 김경임, 박성호
Contents • Background • COCOA Overview • System Architecture • Key Technologies • Application Area • Evaluation • Conclusion • References
Background • A thesis in Aerospace Engineering, Pennsylvania State University, by Anirudh Modi, 1999 • “Unsteady separated flow simulations using a cluster of workstations” • Needed a suitable platform to run PUMA (a parallel flow solver) efficiently and accurately for: • Computing several steady solutions • A fully three-dimensional unsteady separated flow around a sphere • PUMA: Parallel Unstructured Maritime Aerodynamics • Financial support: the Rotorcraft Center of Excellence (RCOE) at Penn State
COCOA Overview • The COst effective COmputing Array (COCOA) • A Beowulf cluster with 50 processors • Aim: low-cost parallel computing • The whole system cost approximately $100,000 (1998 US dollars) • Performance • Benchmarks showed it to be almost twice as fast as the Penn State IBM SP supercomputer (older RS/6000-370 nodes) for these applications
System Architecture • Computing nodes (26 Dell WS-410 workstations), each with: • Dual 400 MHz Intel Pentium II processors with 512 KB L2 cache • 512 MB SDRAM • 4 GB UW-SCSI2 disk • 3Com 3c509B 100 Mbits/sec Fast Ethernet card • 32x SCSI CD-ROM drive • 1.44 MB floppy drive • Cables • In addition: • One Bay Networks 450T 24-port 100 Mbits/sec switch • Two 16-way monitor/keyboard/mouse switches • Four 500 kVa APC UPS units • For the server: one monitor, keyboard, and mouse, plus an extra 54 GB UW-SCSI2 hard disk
System Architecture cont. • Setting up H/W
System Architecture cont. • Operating system • Red Hat Linux 5.1 • Software • Base packages from Red Hat Linux 5.1, kernel 2.0.36 • Free GNU C/C++ compilers (gcc, pgcc) • Fortran 77/90 compilers and debugger from the Portland Group • Free MPI libraries for parallel programming in C/C++/Fortran 77/90 • ssh-1.2.26 for secure access • DQS v3.0, a queueing system • Scientific visualization software TECPLOT from Amtec Corp.
Key Technologies • Beowulf cluster • A system that usually consists of one server node and one or more client nodes connected via Ethernet or some other fast network • Developed for large-scale computing, such as aerodynamics, atmospheric science, and physics • First developed in 1994 at NASA • Low-cost supercomputing became possible thanks to: • High-performance, low-cost processors • Readily available high-speed network devices • Numerous Beowulf clusters have since been developed • Used in various computational science fields
Key Technologies cont. • DQS (Distributed Queuing System) • An experimental batch queuing system developed at the Supercomputer Computations Research Institute (SCRI), Florida State University • Provides a single, coherent point of resource allocation and job management for the cluster • MPI (Message Passing Interface) • A standard for message-passing parallel programming • SSH (Secure Shell) • A program for logging into a remote machine and executing commands on it • Provides secure, encrypted communication between untrusted hosts over an insecure network
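DQS's actual scheduling policies are not described on these slides. As a rough illustration of what "single, coherent allocation" of a shared processor pool means, here is a minimal sketch of a strict first-come-first-served batch queue. The names `Job`, `FifoBatchQueue`, `submit`, and `finish` are hypothetical and are not the DQS interface.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int  # number of processors the job requests


class FifoBatchQueue:
    """Toy strict first-come-first-served batch queue over a shared pool of
    processors. Hypothetical illustration only; not DQS's policy or API."""

    def __init__(self, total_procs: int) -> None:
        self.total = total_procs
        self.free = total_procs
        self.waiting: deque[Job] = deque()
        self.running: dict[str, int] = {}  # job name -> processors held

    def submit(self, job: Job) -> list[str]:
        """Queue a job; return the names of any jobs that start as a result."""
        if not 0 < job.procs <= self.total:
            raise ValueError(f"{job.name}: cannot request {job.procs} processors")
        self.waiting.append(job)
        return self._schedule()

    def finish(self, name: str) -> list[str]:
        """Release a finished job's processors; return newly started jobs."""
        self.free += self.running.pop(name)
        return self._schedule()

    def _schedule(self) -> list[str]:
        # Start jobs strictly in arrival order; stop at the first job that
        # does not fit, even if a later, smaller job would (no backfilling).
        started = []
        while self.waiting and self.waiting[0].procs <= self.free:
            job = self.waiting.popleft()
            self.free -= job.procs
            self.running[job.name] = job.procs
            started.append(job.name)
        return started
```

For example, on a 50-processor pool, submitting jobs that need 32, 24, and 8 processors starts only the first; the 8-processor job waits behind the 24-processor one, and both start when the first job finishes. Strict FIFO is the simplest fair policy; real batch systems often add backfilling so small jobs can use idle processors.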
Application Area • Analysis of maritime aerodynamics • Analysis of flows over complex configurations (e.g., ships and helicopter fuselages) • Using PUMA • Motivating problem: a helicopter can safely land on a frigate in the North Sea only 10 percent of the time in winter
PUMA (Parallel Unstructured Maritime Aerodynamics) • A program for analyzing internal and external non-reacting compressible flows over arbitrarily complex 3D geometries • Written entirely in ANSI C, using the MPI library for message passing; hence highly portable while still giving good performance
PUMA (Parallel Unstructured Maritime Aerodynamics) cont. • Uses domain decomposition • Domain decomposition • Distributes the data across processes, each performing approximately the same operations on its own part of the data • Problem-level (coarse-grained) parallelism rather than loop-level; not SIMD • Aims to minimize communication cost • Functional decomposition • Divides a problem into several distinct tasks that may be executed in parallel • Parallelization in PUMA • Each compute node reads its own portion of the grid file at startup • Each compute node computes the flow solution over its portion of the grid, in parallel with the others
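The slides do not say how PUMA splits its unstructured grid. As a minimal sketch of the data-distribution idea only, the function below gives each process a contiguous block of cell indices, with block sizes differing by at most one for load balance. `block_range` is a hypothetical helper; unstructured solvers often partition by grid connectivity instead, to reduce the number of faces shared between processes.

```python
def block_range(n_cells: int, n_procs: int, rank: int) -> range:
    """Return the contiguous range of cell indices owned by process `rank`.

    The first (n_cells % n_procs) ranks receive one extra cell, so every
    process gets either floor(n/p) or ceil(n/p) cells.
    """
    if not 0 <= rank < n_procs:
        raise ValueError("rank out of range")
    base, extra = divmod(n_cells, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)
```

Under this scheme, process `rank` would read only the cells in `block_range(n_cells, n_procs, rank)` from the grid file at startup; for 10 cells on 4 processes the blocks are cells 0-2, 3-5, 6-7, and 8-9.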
PUMA (Parallel Unstructured Maritime Aerodynamics) cont. • Modifications to PUMA • Read several hundred lines at a time and broadcast the combined data to every processor using a reasonably sized buffer • Combine several small MPI messages into one before starting communication • Figure: Throughput (Mbits/sec) vs. packet size on COCOA for the MPI_Send/Recv test
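The modified code itself is not shown. As a minimal sketch of the message-combining idea, the class below packs small messages into one buffer of bounded size and hands the whole buffer to a `send` callable, which stands in for a real MPI send. `MessageCombiner`, `split_combined`, the 4-byte length prefix, and the 64 KB default capacity are all hypothetical choices, not taken from PUMA.

```python
import struct
from typing import Callable, List


class MessageCombiner:
    """Accumulate small messages and send them as one combined buffer.

    `send` is any callable accepting bytes (a stand-in for a transport call).
    Each message is framed with a 4-byte big-endian length prefix so the
    receiver can split the combined buffer again.
    """

    def __init__(self, send: Callable[[bytes], None], capacity: int = 64 * 1024) -> None:
        self.send = send
        self.capacity = capacity
        self.buf = bytearray()

    def add(self, msg: bytes) -> None:
        frame = struct.pack("!I", len(msg)) + msg
        if len(frame) > self.capacity:
            raise ValueError("message does not fit in one buffer")
        # Flush first if appending this frame would overflow the buffer.
        if len(self.buf) + len(frame) > self.capacity:
            self.flush()
        self.buf += frame

    def flush(self) -> None:
        # One send for everything accumulated so far.
        if self.buf:
            self.send(bytes(self.buf))
            self.buf.clear()


def split_combined(data: bytes) -> List[bytes]:
    """Receiver side: recover the individual messages from one buffer."""
    out, pos = [], 0
    while pos < len(data):
        (n,) = struct.unpack_from("!I", data, pos)
        pos += 4
        out.append(data[pos:pos + n])
        pos += n
    return out
```

For example, with a 32-byte capacity, ten 4-byte messages (8 bytes each including the prefix) go out as three sends instead of ten, so the per-message start-up cost is paid three times rather than ten.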
PUMA (Parallel Unstructured Maritime Aerodynamics) cont. • Figure: Improvement in PUMA performance after combining several small MPI messages into one
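A common first-order model (assumed here; the slides do not state one) accounts for both the packet-size dependence of MPI_Send/Recv throughput and the gain from combining messages. Sending an $s$-byte message costs a fixed latency $\alpha$ plus $s/\beta$ at asymptotic bandwidth $\beta$, so effective throughput reaches half of $\beta$ only at $s = \alpha\beta$, and packing $k$ messages of $s$ bytes each into one saves $k-1$ latencies:

```latex
T(s) = \alpha + \frac{s}{\beta}, \qquad
B_{\text{eff}}(s) = \frac{s}{T(s)} = \frac{s}{\alpha + s/\beta}, \qquad
B_{\text{eff}}(\alpha\beta) = \frac{\alpha\beta}{2\alpha} = \frac{\beta}{2}

T_{\text{separate}} = k\left(\alpha + \frac{s}{\beta}\right), \qquad
T_{\text{combined}} = \alpha + \frac{k s}{\beta}, \qquad
T_{\text{separate}} - T_{\text{combined}} = (k-1)\,\alpha
```

When $\alpha$ is large relative to $s/\beta$, as the conclusion notes for COCOA's network, the saved $(k-1)\alpha$ term dominates, which is why combining small messages pays off.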
Evaluation • Figure: Total Mflops vs. number of processors on COCOA for the PUMA test case • Figure: Speed-up vs. number of processors on COCOA for the PUMA test case
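Speed-up plots like the one above are normally read against the standard definitions $S(p) = T(1)/T(p)$ and parallel efficiency $E(p) = S(p)/p$. The sketch below computes both from measured wall-clock times; `speedup_table` is a hypothetical helper, and the timings in the usage note are made-up illustrative numbers, not COCOA measurements.

```python
from typing import Dict, Tuple


def speedup_table(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map processor count p -> (S(p), E(p)) from measured wall-clock times.

    S(p) = T(1) / T(p) is the speed-up relative to one processor, and
    E(p) = S(p) / p is the parallel efficiency (1.0 means ideal scaling).
    """
    if 1 not in times:
        raise ValueError("a single-processor baseline time T(1) is required")
    t1 = times[1]
    table = {}
    for p, tp in sorted(times.items()):
        s = t1 / tp
        table[p] = (s, s / p)
    return table
```

With hypothetical times of 100 s, 50 s, and 31.25 s on 1, 2, and 4 processors, this gives speed-ups of 1, 2, and 3.2 and efficiencies of 1.0, 1.0, and 0.8; a curve that falls away from $S(p) = p$ as $p$ grows indicates communication overhead.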
Evaluation cont. • Figure: NAS Parallel Benchmark on COCOA: comparison with other machines for the Class “C” LU test
Conclusion • Beowulf-class supercomputer (PCs, Linux, MPI, DQS, SSH) • Cost-effective supercomputer for numerical simulations • Almost twice as fast as the Penn State IBM SP supercomputer for our production codes, including PUMA, given the same number of processors, while built at a fraction of the cost (about $100,000 in 1998 US dollars) • Because of its high communication latency, suitable only for numerical simulations (weather, fluid dynamics, ...) that do not have high communication-to-computation ratios • Good scalability for most of the MPI applications used • The objective of building a cost-effective supercomputer for the numerical simulations carried out at Penn State has been fulfilled
References • COCOA: http://cocoa.ihpca.psu.edu • NAS Parallel Benchmarks: http://science.nas.nasa.gov/Software/NPB • Beowulf: http://www.beowulf.org • Red Hat: http://www.redhat.com • MPI: http://www.mcs.anl.gov/mpi • DQS: http://www.scri.fsu.edu/~pasko/dqs.html • Many more references…