Linux clustering
Clustering fundamentals
High Performance Computing (HPC) has become easier, largely because of the adoption of open source software and the introduction and refinement of clustering technology. • Linux clustering is popular in many industries these days. With the advent of clustering technology and the growing acceptance of open source software, supercomputers can now be created for a fraction of the cost of traditional high-performance machines. • We will talk about the different types of clusters, the uses of clusters, some fundamentals of HPC, the role of Linux, and the reasons for the growth of clustering technology.
HPC hardware falls into three categories: • Symmetric multiprocessors (SMP) • Vector processors • Clusters
Clusters are the predominant type of HPC hardware these days; a cluster is a set of massively parallel processors (MPPs). A processor in a cluster is commonly referred to as a node; it has its own CPU, memory, operating system, and I/O subsystem and is capable of communicating with other nodes. These days it is common to use a commodity workstation running Linux and other open source software as a node in a cluster. SMP is a type of HPC architecture in which multiple processors share the same memory. SMPs are generally more expensive and less scalable than MPPs. In vector processors, the CPU is optimized to perform well on arrays or vectors, but clusters have become far more popular in recent years.
Clustering The term "cluster" can take different meanings in different contexts. Here we focus on three types of clusters: • Fail-over clusters • Load-balancing clusters • High-performance clusters
Fail-over clusters • The simplest fail-over cluster has two nodes: one stays active and the other stays on stand-by but constantly monitors the active one. In case the active node goes down, the stand-by node takes over, allowing a mission-critical system to continue functioning.
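As an illustration only (not part of the original slide), the stand-by node's monitoring can be as simple as periodically trying to reach a service on the active node; the IP address and port below are hypothetical, and real fail-over stacks such as Heartbeat/Pacemaker do considerably more. A minimal sketch in C:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to open a TCP connection to the active node's service port. */
static int active_node_alive(const char *ip, int port) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return 0;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    int ok = connect(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0;
    close(sock);
    return ok;
}

int main(void) {
    /* hypothetical address of the active node's monitored service */
    const char *active_ip = "192.168.1.10";
    while (1) {
        if (!active_node_alive(active_ip, 80)) {
            printf("active node is down, taking over\n");
            /* start services, claim the virtual IP, and so on */
            break;
        }
        sleep(5); /* check again in a few seconds */
    }
    return 0;
}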
Load-balancing clusters • Load-balancing clusters are commonly used for busy Web sites where several nodes host the same site, and each new request for a Web page is dynamically routed to a node with a lower load.
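A toy sketch (not from the slide) of that routing decision: given a load figure for each node, send the next request to the least-loaded one. Real load balancers such as LVS or HAProxy also track connection counts, health checks, and per-node weights; the load values below are made up.

#include <stddef.h>
#include <stdio.h>

/* Pick the index of the node with the lowest reported load.
   In practice the load figures would come from the nodes themselves;
   here they are just an array supplied by the caller. */
static size_t least_loaded_node(const double *load, size_t n_nodes) {
    size_t best = 0;
    for (size_t i = 1; i < n_nodes; i++)
        if (load[i] < load[best])
            best = i;
    return best;
}

int main(void) {
    double load[] = {0.72, 0.15, 0.43};   /* made-up load values for 3 nodes */
    printf("route next request to node %zu\n", least_loaded_node(load, 3));
    return 0;
}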
High-performance clusters • These clusters are used to run parallel programs for time-intensive computations and are of special interest to the scientific community. They commonly run simulations and other CPU-intensive programs that would take an inordinate amount of time to run on regular hardware.
A basic cluster Grid computing is a broad term that typically refers to a service-oriented architecture (SOA) with collaboration among loosely coupled systems. Cluster-based HPC is a special case of grid computing in which the nodes are tightly coupled. A successful, well-known project in grid computing is SETI@home, the Search for Extraterrestrial Intelligence program, which used the idle CPU cycles of a million home PCs via screen savers to analyze radio telescope data. A similar successful project is Folding@home, which performs protein-folding calculations.
Common uses of high-performance clusters • Almost every industry needs fast processing power. • With the increasing availability of cheaper and faster computers, more companies are interested in reaping the technological benefits. • There is no upper boundary to the need for computer processing power; even with the rapid increase in power, the demand is considerably more than what's available.
Common uses of high-performance clusters The most common application areas are: • Life sciences research • Oil and gas exploration • Graphics rendering
How Linux and clusters have changed HPC • Before cluster-based computing, the typical supercomputer was a vector processor that could cost over a million dollars because of its specialized hardware and software. • With Linux and other freely available open source software components for clustering, and with improvements in commodity hardware, the situation now is quite different. You can build powerful clusters on a very small budget and keep adding extra nodes based on need.
How Linux and clusters have changed HPC • The GNU/Linux operating system (Linux) has spurred the adoption of clusters on a large scale. Linux runs on a wide variety of hardware, and high-quality compilers and other software like parallel filesystems and MPI implementations are freely available for Linux. Also, with Linux, users have the ability to customize the kernel for their workload. Linux is a recognized favorite platform for building HPC clusters.
Some features of clusters are as follows: • Clusters are built using commodity hardware and cost a fraction of what vector processors do. In many cases, the price is lower by more than an order of magnitude. • Clusters use a message-passing paradigm for communication, and programs have to be explicitly coded to make use of the distributed hardware (see the sketch after this list). • With clusters, you can add more nodes based on need. • Open source software components and Linux lead to lower software costs. • Clusters have a much lower maintenance cost (they take up less space, use less power, and need less cooling).
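As a hedged illustration of the message-passing style mentioned above, here is a minimal MPI example in C. It assumes an MPI implementation (for example MPICH or Open MPI) is installed; it would typically be compiled with mpicc and launched with mpirun -np 2. Nothing is shared implicitly between the two processes; node 0 has to send the value explicitly, which is exactly the explicit coding the bullet refers to.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);               /* start the MPI runtime */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id (node) */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    if (rank == 0) {
        int value = 42;
        /* node 0 explicitly sends a message to node 1 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("node 0 of %d sent %d\n", size, value);
    } else if (rank == 1) {
        int value;
        /* node 1 explicitly receives the message */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}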
Parallel programming and Amdahl's Law • Software and hardware go hand in hand when it comes to achieving high performance on a cluster. Programs must be written to explicitly take advantage of the underlying hardware, and existing non-parallel programs must be re-written if they are to perform well on a cluster.
Parallel programming and Amdahl's Law • A parallel program does many things at once. Just how many depends on the problem at hand. Suppose 1/N of the total time taken by a program is spent in a part that cannot be parallelized, and the rest (1 - 1/N) is spent in the parallelizable part; then, no matter how many processors are used, the overall speedup can never exceed N. • The real hard work in writing a parallel program is therefore to make N as large as possible. There is an interesting twist to it: you normally attempt bigger problems on more powerful computers, and the proportion of time spent in the sequential parts of the code usually decreases with increasing problem size (as you tend to modify the program and increase the parallelizable portion to make good use of the available resources), so the value of N automatically becomes larger.
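The slide does not spell out the formula itself; for reference, the usual textbook statement of Amdahl's Law in the notation above is

\[
S(p) \;=\; \frac{1}{\dfrac{1}{N} + \dfrac{1 - \dfrac{1}{N}}{p}},
\qquad
\lim_{p \to \infty} S(p) = N,
\]

where p is the number of processors and S(p) is the speedup over a single processor. The serial fraction 1/N caps the achievable speedup at N, no matter how many processors are added.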
Approaches to parallel programming
Distributed-memory approach It is useful to think of a master-slave model here: The master node divides the work between several slave nodes. Slave nodes work on their respective tasks. Slave nodes intercommunicate among themselves if they have to. Slave nodes return their results to the master. The master node assembles the results, further distributes work, and so on.
Shared-memory approach In the shared-memory approach, memory is common to all processors (as in SMP). This approach does not suffer from the communication problems of the distributed-memory approach. Also, programming for such systems is easier, since all the data is available to all processors and it is not much different from sequential programming. The big issue with these systems is scalability: it is not easy to add extra processors.
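A minimal sketch of the distributed-memory, master-slave pattern using MPI in C (the "tasks" here are just small integers to be squared, chosen only for illustration): the master (rank 0) scatters one task per node, every node computes its piece, and the master gathers and assembles the results.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *tasks = NULL;
    if (rank == 0) {                       /* master prepares one task per node */
        tasks = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++)
            tasks[i] = i + 1;
    }

    int my_task;
    /* the master divides the work between the nodes */
    MPI_Scatter(tasks, 1, MPI_INT, &my_task, 1, MPI_INT, 0, MPI_COMM_WORLD);

    int my_result = my_task * my_task;     /* each node works on its own task */

    int *results = NULL;
    if (rank == 0)
        results = malloc(size * sizeof(int));
    /* nodes return their results to the master, which assembles them */
    MPI_Gather(&my_result, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("task %d -> %d\n", i + 1, results[i]);
        free(tasks);
        free(results);
    }

    MPI_Finalize();
    return 0;
}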
When a compute node in a cluster wants to access a file in a parallel filesystem such as PVFS (Parallel Virtual File System), it goes through the following steps: • It requests a file as usual, and the request goes to the underlying PVFS filesystem. • PVFS sends a request to the meta-data server (steps 1 and 2 in Figure 4), which informs the requesting node of the location of the file among the various I/O nodes. • Using this information, the compute node communicates directly with all the relevant I/O nodes to get all the pieces of the file (step 3).
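From the application's point of view, all of this happens beneath an ordinary file operation: assuming PVFS is mounted like a normal filesystem (the mount point and file name below are hypothetical), the compute node simply opens and reads the file, and the PVFS layer performs the metadata lookup and the direct transfers from the I/O nodes. A minimal POSIX sketch in C:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Step 1: an ordinary open(); the request goes to the underlying
       PVFS filesystem mounted (hypothetically) at /mnt/pvfs. */
    int fd = open("/mnt/pvfs/data/input.dat", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Steps 2-3: PVFS contacts the meta-data server to locate the file's
       pieces and then fetches them directly from the I/O nodes; the
       application only sees a normal read(). */
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* process n bytes of the file here */
    }

    close(fd);
    return 0;
}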
Thank you Gökhan DEMİREĞEN