Grid Computing

Grid Computing With MPI Over Multiple Clusters
Presented by: Vasil Lalov, James Murithi
Project Supervisor: Dr. Hassan Rajaei
Dept. of Computer Science, Bowling Green State University, Bowling Green, OH

Presentation Overview
• Introduction
• Clustering Concepts
• Grid Computing Concepts
• Our Contribution
• Demonstration
• Q/A Time

Parallel Programming Concepts
Example of a typical computer program:
[Diagram: an application in which a single process turns data into results]

Parallel Programming Concepts
An example of a primitive parallel program:
[Diagram: an application in which a master process splits the data across two processors and then combines their output into results]
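The master/worker flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split`, `process`, `run_master`) and the sum-of-squares workload are hypothetical, and threads stand in for the processors. On a real cluster the same pattern would be written with MPI ranks instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Master step 1: cut the input into `parts` nearly equal chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process(chunk):
    """Worker step: each processor handles only its own chunk.

    The workload (sum of squares) is a placeholder, not from the slides.
    """
    return sum(x * x for x in chunk)


def run_master(data, workers=2):
    """Master: hand out chunks, let the workers compute, combine results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    print(run_master(list(range(10)), workers=2))
```

The master never does the heavy work itself: it only splits the data and merges the partial results, which is why adding processors can shorten the run.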

Clustering Concepts
Example of a small cluster:
[Diagram: a head node connected through a network switch to the compute nodes]

Grid Computing Concepts
Definition of a Grid:
• 1998: “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities” - Carl Kesselman and Ian Foster
• Grid computing is an emerging computing model that provides the ability to perform higher throughput computing by taking advantage of many networked computers to model a virtual computer architecture that is able to distribute process execution across a parallel infrastructure - from Wikipedia

Categories of Grids
• Computational grids (CPU scavenging) – monitor the network for idle resources and use them for high-performance computing
• Data grids – deal with the controlled sharing and management of large amounts of distributed data
• Equipment grids – are built around a primary piece of equipment, such as a telescope, from which the grid collects data and analyzes it

Grid Computing Concepts
Key Elements of a Computational Grid:
• Coordination of resources that are subject to decentralized control
  • Resources from different domains (VO, company, department)
  • Users from different domains
  • Resources are often geographically separated
• Use of standard, open, general-purpose protocols and interfaces
  • Authentication/authorization
  • Resource discovery/access
• Delivers non-trivial quality of service
  • Utility of combined system >> sum of parts

Grid Computing Concepts
Grid Types:
• Global Grid
  • Includes resources located in multiple countries around the world
  • Used for solving problems of global importance
  • Rarely used for time-sensitive applications
• National Grid – e.g. TeraGrid
  • Includes resources located within the boundaries of a single country
  • Often used for governmental purposes
• Mini Grid
  • Includes resources owned and managed by a single organization (company, university, etc.)
  • Primarily used for research and education purposes
  • True commercial use is still in its infancy

Grid Computing Concepts
Grid Organization of Resources:
[Diagram: a virtual organization that groups Cluster 1, Cluster 2 and a data warehouse]

Grid Resource Managers
Definition and Examples:
• Definition - A software package that is responsible for:
  • Detecting and managing available resources on the grid
  • Collecting, distributing and managing jobs that use the grid resources
  • Providing a simple user interface for submitting jobs to the grid
  • Enforcing security policies for protecting resources, data and users on the grid
• Popular Grid Resource Managers:
  • Globus Toolkit
  • Condor
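The first two responsibilities, detecting available resources and distributing jobs onto them, can be illustrated with a toy matcher. This is a deliberately simplified sketch, not how Globus or Condor is implemented: the `Machine` and `Job` classes, their fields, and the first-fit policy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    free_cpus: int
    idle: bool  # hypothetical flag: True when the owner is not using it


@dataclass
class Job:
    name: str
    cpus_needed: int


def match_jobs(jobs, machines):
    """Assign each job to the first idle machine with enough free CPUs.

    Returns (assignments, unmatched). Unmatched jobs would stay queued
    until a suitable resource appears.
    """
    # Resource detection: only idle machines are offered to the grid.
    free = {m.name: m.free_cpus for m in machines if m.idle}
    assignments, unmatched = {}, []
    for job in jobs:
        for name, cpus in free.items():
            if cpus >= job.cpus_needed:
                assignments[job.name] = name
                free[name] = cpus - job.cpus_needed  # reserve the CPUs
                break
        else:
            unmatched.append(job.name)
    return assignments, unmatched


if __name__ == "__main__":
    machines = [Machine("node1", 2, True), Machine("node2", 4, False)]
    jobs = [Job("a", 2), Job("b", 1)]
    print(match_jobs(jobs, machines))
```

A production resource manager adds what this sketch leaves out: security policy enforcement, a user-facing submission interface, and rescheduling when a machine stops being idle.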

MPI Compilers for Grid Computing

Grid Resource Managers
Problems with Globus Toolkit:
• Complex installation and configuration
• To run parallel jobs, MPICH-G2 is required
  • Very difficult installation
  • Requires 2 IP addresses per compute node
  • Requires recompiling existing MPI-based software
  • Current source code is broken
• Runs on Java (slow, problematic)

Grid Resource Managers
Condor Grid Manager:
• Requires only MPI 1.2.x:
  • No need for a second NIC and external IP addresses
  • No need for recompiling existing MPI-based software
• Extremely versatile and scalable:
  • Can manage very small and very large grids
  • Manages multiple types of resources
  • Automatically finds, configures and uses resources
  • Works with many types of job schedulers (PBS, SGE, etc.)
• Easy to use once installed and configured
• Standalone application (faster)
• Huge community support
• Current version is 6.8.6

Grid Resource Managers
Condor Grid Manager Details:
• Condor universes (a universe is a run-time environment):
  • Standard – lets a job running under Condor handle system calls by forwarding them to the machine where the job was submitted
  • Vanilla – runs jobs that cannot be relinked with the Condor libraries; such jobs cannot be relocated once started; intended for batch-ready jobs
  • MPI – obsolete universe, superseded by Parallel
  • Parallel – parallel jobs, including MPI
• What is Condor good/used for?
  • “Hunting” for available resources
  • Maximizing the grid throughput
  • Background jobs (BOINC)
  • Interfacing with other job managers (Globus, SGE)
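To make the Parallel universe concrete, the sketch below builds a minimal Condor submit description as text. The submit commands it emits (`universe`, `executable`, `arguments`, `machine_count`, `output`, `error`, `log`, `queue`) and the `$(NODE)` macro are standard Condor syntax. The Python helper itself, the file names, and the wrapper script name `mpi_wrapper.sh` are hypothetical placeholders, not the scripts used in the demonstration.

```python
def parallel_submit_description(executable, machine_count, arguments=""):
    """Return the text of a Condor submit file for a parallel-universe job.

    `executable` is typically a wrapper script that starts the MPI run;
    the name passed in by the caller is an assumption of this sketch.
    """
    if machine_count < 1:
        raise ValueError("machine_count must be at least 1")
    lines = [
        "universe = parallel",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        f"machine_count = {machine_count}",
        # $(NODE) expands to the node number, giving one file per node.
        "output = out.$(NODE)",
        "error = err.$(NODE)",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names: a wrapper script plus the MPI binary it launches.
    print(parallel_submit_description("mpi_wrapper.sh", 4, "my_mpi_program"))
```

Saved to a file such as `job.sub`, a description like this would be handed to `condor_submit job.sub`; `condor_q` then shows the job in the queue and `condor_status` lists the machines Condor currently knows about.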

Demonstration
• Grid Monitoring of Resources
• Condor Job Submission Scripts
• Condor Job Submission Process

Future Work
• Improve the current Condor configuration on the Protos cluster
• Research the interoperability of Globus and Condor
• Install and configure Condor on the BWP4 cluster
• Test the mini-grid
• Scale up the current platform

In Conclusion
• Grid computing is considerably more complex than cluster computing
• Grids are usually designed for a wide range of applications
• Execution of MPI jobs in a grid environment requires additional setup
• Overall, grids are more reliable than clusters but not as consistent

Q/A Time
Thanks. Questions?
