Grid Computing: Grid Computing With MPI Over Multiple Clusters. Presented by: Vasil Lalov, James Murithi. Project Supervisor: Dr. Hassan Rajaei, Dept. of Computer Science, Bowling Green State University, Bowling Green, OH
Presentation Overview • Introduction • Clustering Concepts • Grid Computing Concepts • Our Contribution • Demonstration • Q/A Time
Parallel Programming Concepts Example of a typical computer program: [Diagram: an application runs as a single process that takes data and produces results]
Parallel Programming Concepts Example of a primitive parallel program: [Diagram: within the application, a master process splits the data among several processors, then the master process collects their output into the results]
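The master/processor pattern in the diagram can be sketched in a few lines of Python. This is only an illustration, not the presenters' code: the names `split`, `work`, and `master` are hypothetical, the sum-of-squares task is a made-up workload, and a thread pool stands in for the separate processors (a real cluster job would typically use MPI ranks instead).

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Master step 1: cut the data into `parts` nearly equal chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def work(chunk):
    """Worker step: each 'processor' handles only its own chunk."""
    return sum(x * x for x in chunk)


def master(data, workers=2):
    """Master step 2: hand out chunks, then combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, chunks))
    return sum(partials)


total = master(list(range(10)), workers=2)
```

The result is the same for any worker count; only the way the data is divided changes.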
Clustering Concepts Example of a small cluster: [Diagram: a head node connected through a network switch to the compute nodes]
Grid Computing Concepts Definition of a Grid: • 1998: “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities” - Carl Kesselman and Ian Foster • Grid computing is an emerging computing model that provides the ability to perform higher throughput computing by taking advantage of many networked computers to model a virtual computer architecture that is able to distribute process execution across a parallel infrastructure - from Wikipedia
Categories of Grids • Computational grids (CPU scavenging) – monitor the network for idle resources and use them for high performance computing • Data grids – handle the controlled sharing and management of large amounts of distributed data • Equipment grids – are built around a primary piece of equipment, such as a telescope, whose data the grid collects and analyzes
Grid Computing Concepts Key Elements of a Computational Grid: • Coordination of resources that are subject to decentralized control • Resources from different domains (VO, company, department) • Users from different domains • Resources are often geographically separated • Use of standard, open general-purpose protocols and interfaces • Authentication/authorization • Resource discovery/access • Delivers non-trivial quality of service • Utility of combined system >> sum of parts
Grid Computing Concepts Grid Types: • Global Grid • Includes resources located in multiple countries around the world • Used for solving problems of global importance • Rarely used for time-sensitive applications • National Grid – e.g. TeraGrid • Includes resources located within the boundaries of a single country • Often used for governmental purposes • Mini Grid • Includes resources owned and managed by a single organization (company, university, etc.) • Primarily used for research and education purposes • True commercial use is still in its infancy
Grid Computing Concepts Grid Organization of Resources: [Diagram: a Virtual Organization spanning Cluster 1, Cluster 2, and a Data Warehouse]
Grid Resource Managers Definition and Examples: • Definition - A software package that is responsible for: • Detecting and managing available resources on the grid • Collecting, distributing and managing jobs that use the grid resources • Providing a simple user interface for submitting jobs to the grid • Enforcing security policies for protecting resources, data and users on the grid • Popular Grid Resource Managers: • Globus Toolkit • Condor
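The first two responsibilities in the definition (finding available resources and distributing jobs to them) can be illustrated with a toy matchmaker. This is a minimal sketch under stated assumptions: `Node`, `Job`, and `match_jobs` are hypothetical names, and the first-fit rule is chosen only for simplicity. It does not reflect how Globus or Condor schedule work internally.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus: int
    idle: bool  # assumed to come from some monitoring step


@dataclass
class Job:
    name: str
    cpus_needed: int


def match_jobs(nodes, jobs):
    """Assign each queued job to the first idle node with enough CPUs.

    Returns a dict of job name -> node name. Jobs that fit nowhere are
    left out, which corresponds to staying in the queue.
    """
    free = [n for n in nodes if n.idle]
    placement = {}
    for job in jobs:
        for node in free:
            if node.cpus >= job.cpus_needed:
                placement[job.name] = node.name
                free.remove(node)  # one job per node in this toy model
                break
    return placement


nodes = [Node("a", 2, True), Node("b", 8, False), Node("c", 4, True)]
jobs = [Job("j1", 4), Job("j2", 2), Job("j3", 1)]
placement = match_jobs(nodes, jobs)
```

Here node `b` is busy and is never considered, so `j3` stays unplaced once `a` and `c` are taken.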
Grid Resource Managers Problems with Globus Toolkit: • Complex installation and configuration • To run parallel jobs, MPICH-G2 is required • Very difficult installation • Requires 2 IP addresses per compute node • Requires recompiling existing MPI-based software • Current source code is broken • Runs on Java (slow, problematic)
Grid Resource Managers Condor Grid Manager: • Requires only MPI 1.2.x: • No need for a second NIC and external IP addresses • No need for recompiling existing MPI-based software • Extremely versatile and scalable: • Can manage very small and very large grids • Manages multiple types of resources • Automatically finds, configures and uses resources • Works with many types of job schedulers (PBS, SGE, etc.) • Easy to use once installed and configured • Standalone application (faster) • Huge community support • Current version is 6.8.6
Grid Resource Managers Condor Grid Manager Details: • Condor Universes – a universe is a run-time environment for a job • Standard – lets a job running under Condor handle system calls by forwarding them to the machine where the job was submitted • Vanilla – runs batch-ready jobs that cannot be relinked; these jobs cannot be relocated once started • MPI – obsolete universe • Parallel – parallel jobs, including MPI • What is Condor good/used for? • “Hunting” for available resources • Maximizing the grid throughput • Background jobs (BOINC) • Interfacing with other job managers (Globus, SGE)
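A job selects its universe in the Condor submit description file. The sketch below builds the text of a minimal parallel-universe submit file in Python so it can be inspected or written to disk. The helper `parallel_submit` and the file names `mpi_wrapper.sh`, `my_mpi_program`, `out.*`, `err.*`, and `job.log` are hypothetical placeholders, not the scripts shown in the presentation. The submit commands themselves (`universe`, `executable`, `arguments`, `machine_count`, `output`, `error`, `log`, `should_transfer_files`, `when_to_transfer_output`, `queue`) are standard Condor ones. In the parallel universe the executable is usually a wrapper script that starts the MPI program on the allocated machines, and that is assumed here.

```python
def parallel_submit(executable, machine_count, arguments=""):
    """Return the text of a minimal parallel-universe submit description.

    `executable` is assumed to be a wrapper script that launches the MPI
    program named in `arguments` (both are placeholders in this sketch).
    """
    lines = [
        "universe = parallel",
        f"executable = {executable}",
        f"arguments = {arguments}",
        f"machine_count = {machine_count}",
        # $(NODE) expands to the node number, giving one file per node.
        "output = out.$(NODE)",
        "error = err.$(NODE)",
        "log = job.log",
        "should_transfer_files = yes",
        "when_to_transfer_output = on_exit",
        "queue",
    ]
    return "\n".join(lines) + "\n"


submit_text = parallel_submit("mpi_wrapper.sh", 4, "my_mpi_program")
```

The resulting text would be saved to a file and handed to `condor_submit`; changing the first line to `universe = vanilla` (and dropping `machine_count`) would describe an ordinary batch job instead.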
Demonstration • Grid Monitoring of Resources • Condor Job Submission Scripts • Condor Job Submission Process
Future Work • Improve on the current Condor Configuration on Protos Cluster • Research on interoperability of Globus and Condor • Install and configure Condor on BWP4 Cluster • Test the mini-grid • Scale up the current platform
In Conclusion • Grid computing is considerably more complex than cluster computing • Grids are usually designed for a wide range of applications • Executing MPI jobs in a grid environment requires additional setup • Overall, grids are more reliable than clusters but not as consistent
Q/A Time Thanks Questions?