
What is a Computer Grid?



  1. What is a Computer Grid? • A Computer Grid is a grouping of computer resources (CPU, Disk, Memory, Peripherals, etc.) for use as a single, albeit large and powerful, virtual computer. • “Distributed computing across virtualized resources” [1]. • “Coordinates resources that are not subject to centralized control… using standard, open, general-purpose interfaces and protocols…to deliver non-trivial quality of service” [3].

  2. What is a Computer Grid? • The basic premise is to leverage as many of the grid’s unused CPU cycles (and other computer resources) as possible in order to execute a computer program more quickly. • Years and months → days and hours! • Application requirements for grid usage • Application must be capable of executing remotely on the distributed grid architecture. • Application must be able to be subdivided into smaller “jobs” to take advantage of parallel processing (a sketch of this decomposition follows this slide). • Required data must be available and without undue latency. May require replication of data sets across the grid.
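The sub-job decomposition requirement above is what makes an application “grid ready”. The following is a minimal, hypothetical Python sketch (not tied to any particular grid middleware): one large task is split into independent sub-jobs and run in parallel. All function and data names are invented for illustration.

```python
# Hypothetical sketch: split one large task into independent sub-jobs and
# run them in parallel. On a real grid the scheduler would ship the sub-jobs
# to remote nodes instead of local processes.
from multiprocessing import Pool

def process_chunk(chunk):
    # Each sub-job works on its own independent slice of the input data.
    return sum(x * x for x in chunk)

def split_into_subjobs(data, n_jobs):
    # Divide the input into roughly equal, independent pieces.
    size = max(1, len(data) // n_jobs)
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(1_000_000))
    subjobs = split_into_subjobs(data, n_jobs=8)
    with Pool(processes=8) as pool:
        partial_results = pool.map(process_chunk, subjobs)  # sub-jobs run in parallel
    print(sum(partial_results))                             # combine the partial results
```

The decomposition step is the same whether the sub-jobs run locally or are dispatched across a grid; only the transport differs.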

  3. Why Grid? • The term “Grid” refers to the electric power “grid”. • Virtually unlimited power. • Customer is abstracted from the type and location of the power source. • Nuclear, Coal, Solar, Wind… • Local power plant or remote generation facility. • Millstone 2, 3, or Niagara Falls. • Customer doesn’t know and doesn’t care! • Pays a per-hour usage rate. • $0.16/Kilowatt Hour – $1.00/CPU Hour. • Plug ‘n Play: grid computing is not quite there yet.

  4. Grid Topologies • Grids come in all sizes but are typically segregated into 4 basic configurations or sizes that span physical location as well as geopolitical boundaries. • Cluster – Similar H/W and S/W on a local LAN. • Intragrid – Dissimilar H/W and S/W on a local LAN (departments in the same company share). • Extragrid – Two or more Intragrids spanning LANs, typically within the same geopolitical environment (corporate wide). • Intergrid – A worldwide combination of multiple Extragrids. • May include dedicated H/W and standalone mainframe and supercomputer systems. • Spans corporations as well as countries. • Internet or private network backbone.

  5. Computer Grid Topologies • [Diagram: an Intergrid composed of Clusters, Intragrids, and Extragrids.]

  6. Computer Grid Types • Computer Grids are divided into 3 types: • Computational (CPU) Grid • Most common (and mature) of all grids. • Logical extension of Distributed Computing. • CPU cycles. • Data Grid • Focuses on data storage capacity as the main shared resource. • Manages massive data sets ranging in size from megabytes (10^6 bytes) to petabytes (10^15 bytes). • Network Grid • Focuses on the communication aspects of the available resources. • Provides fault-tolerant, high-performance communication services. • Each grid type requires some aspect of the others to be truly functional.

  7. Grid Benefits and Issues • Exploit underutilized resources • Business desktop PCs are utilized at roughly 5% of capacity [2]. • 2–3 GHz dual/quad CPUs, 1+ GB of memory, 0.5–1 TB of disk, Gigabit Ethernet. • Servers are also underutilized, with even more performance and resources. • Performance and capacity are continually growing. • This unutilized computing power can be exploited by a computer grid architecture. • More efficient use of underutilized H/W. • Create a supercomputer for the cost of software! • May require an application rewrite to produce a “grid ready” app. • The remote grid “node” (computer) must meet any special H/W, S/W, or resource requirements of the executing app.

  8. Benefits and Issues – Parallel CPU Capacity • A Computer Grid offers the potential for massive parallel processing. • To truly exploit a grid, the application must be subdivided into multiple sub-jobs for parallel processing. • Not practical or workable for many applications. • Currently no practical tool exists that can transform an arbitrary app into sub-jobs that take advantage of parallel processing. • Applications that can be subdivided will experience huge performance gains!

  9. Benefits and Issues – Virtual Resources • Virtualization of Resources • Fundamental point of the Grid. • Physical characteristics are abstracted. • Underlying H/W and S/W are transparent to the Grid user. • User “sees” one large and powerful computer system. • User can focus on the task, not the computer system.

  10. Benefits and Issues – Access to Additional Resources • Each computer (node) of the Grid adds its resources to the entire grid. • CPU, Memory, Disk, and N/W. • Software licenses. • Specialized peripherals. • Remote-controlled electron microscope. • Sensors. • May require a reservation system to guarantee availability.

  11. Benefits and Issues – Resource Balancing and Reliability • Grid system maintains metadata about resources • Availability of a node • Available resources on a particular node • Average throughput/performance • Failure detection/long-executing jobs • The system will direct a sub-job to an available node that can support the required performance. • If a node is busy, redirect to a different node. • The system can detect failed nodes and sub-jobs • Restart a job on the same or a different node. • Resubmit the job to a different node (a sketch of this behaviour follows this slide).
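Below is a small, hypothetical sketch of the resource-balancing and failure-recovery behaviour described above. The node table, memory requirement, and simulated failure are all invented; they stand in for the metadata and execution machinery a real grid middleware would provide.

```python
# Conceptual sketch: pick an available node that meets the job's needs,
# detect a failure, and resubmit the sub-job to a different node.
import random

nodes = [
    {"name": "node-a", "busy": False, "free_mem_gb": 8,  "healthy": True},
    {"name": "node-b", "busy": True,  "free_mem_gb": 16, "healthy": True},
    {"name": "node-c", "busy": False, "free_mem_gb": 4,  "healthy": True},
]

def pick_node(required_mem_gb):
    # Direct the sub-job to an available node that can support it.
    candidates = [n for n in nodes
                  if n["healthy"] and not n["busy"] and n["free_mem_gb"] >= required_mem_gb]
    return candidates[0] if candidates else None

def run_with_retry(job, required_mem_gb, max_attempts=3):
    for attempt in range(max_attempts):
        node = pick_node(required_mem_gb)
        if node is None:
            continue                        # no suitable node right now, try again
        succeeded = random.random() > 0.3   # stand-in for real execution / failure detection
        if succeeded:
            return f"{job} finished on {node['name']}"
        node["healthy"] = False             # mark the failed node; resubmit elsewhere
    return f"{job} failed after {max_attempts} attempts"

print(run_with_retry("sub-job-17", required_mem_gb=6))
```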

  12. Benefits and Issues – Management and Virtual Organizations • Grid system can manage priorities among different projects and jobs • Requires cooperation among grid users. • Virtual Organizations (VO) • Political entity. • Formed by users, groups, teams, companies, countries… • Form a collaboration among users to achieve a common goal. • Defines/provides protocols and mechanisms for access to resources. • Can be stand-alone or a hierarchy of regional, national, or international VOs.

  13. Benefits and Issues - Security • An important issue made even more important in a Grid architecture. • Application and data are now exposed to multiple computers (nodes), any of which may be directly communicating with or executing your application. • Addressed with Authentication, Authorization, and Encryption. • Each node must be authenticated by the “grid” it belongs to. • Once authenticated, authorization can be given to specific nodes to allow them to perform certain tasks. • Encryption is required to guard against interception of communications. • Use technologies such as key encryption, a Certificate Authority (CA), digital certificates, and SSL (a TLS sketch follows this slide).
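As an illustration only, the sketch below uses Python’s standard ssl module to show the same ideas in miniature: a node verifies the remote side against a CA the grid trusts (authentication), presents its own certificate (so it can be authorized), and communicates over an encrypted channel. The file paths and host name are hypothetical placeholders; this is not Globus GSI code.

```python
# Minimal TLS sketch of certificate-based authentication plus encryption.
# The certificate/key file names are placeholders, not real grid configuration.
import socket
import ssl

def connect_to_grid_node(host, port,
                         ca_file="grid-ca.pem",         # CA that signed the nodes' certificates
                         cert_file="my-node-cert.pem",  # this node's certificate
                         key_file="my-node-key.pem"):   # this node's private key
    # Verify the remote node against the grid CA (authentication)...
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # ...and present our own credentials so the remote side can authorize us.
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    raw = socket.create_connection((host, port))
    return ctx.wrap_socket(raw, server_hostname=host)   # encrypted channel

# Usage (assuming the certificate files and a listening node actually exist):
#   conn = connect_to_grid_node("node-a.example.org", 8443)
```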

  14. Software Components • A Grid system requires a layer of software to manage the grid (middleware). • Management tasks • Scheduling of jobs. • Resource availability monitoring. • Node capacity and utilization information gathering. • Job status for recovery. • Local node S/W • Needed to allow the node to accept a sub-job for execution. • Allows it to register its resources with the grid. • Monitors job progress and sends status to the grid.

  15. Software Components – Job Scheduler • Major component of Grid “Middleware” • Can vary in complexity • Blindly submit jobs round-robin. • Job queuing system with several priority queues. • Advanced features include: • Maintain metadata for each node • Performance • Resources • Availability (idle/busy) • Status (on-line/off-line) • Automatically find the most appropriate node for the next job in the queue (see the sketch after this slide). • Job monitoring for recovery.
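Here is a toy sketch of the scheduling behaviour listed above: a priority queue of jobs plus per-node metadata used to pick the most appropriate node for the next job. All names, fields, and thresholds are invented for illustration and are far simpler than real grid middleware.

```python
# Hypothetical priority-queue scheduler matching jobs against node metadata.
import heapq
import itertools

_counter = itertools.count()   # tie-breaker so equal priorities stay first-in, first-out
queue = []                     # entries are (priority, seq, job); lower number = higher priority

node_metadata = {
    "node-a": {"idle": True,  "cpus": 8,  "online": True},
    "node-b": {"idle": False, "cpus": 16, "online": True},
    "node-c": {"idle": True,  "cpus": 4,  "online": False},
}

def submit(job_name, priority, min_cpus):
    heapq.heappush(queue, (priority, next(_counter), {"name": job_name, "min_cpus": min_cpus}))

def dispatch_next():
    if not queue:
        return None
    priority, _, job = heapq.heappop(queue)
    # Find the most appropriate node for this job from the stored metadata.
    for name, meta in node_metadata.items():
        if meta["online"] and meta["idle"] and meta["cpus"] >= job["min_cpus"]:
            return f"dispatch {job['name']} (priority {priority}) to {name}"
    return f"no suitable node for {job['name']}; requeue"

submit("render-frames", priority=2, min_cpus=4)
submit("urgent-analysis", priority=0, min_cpus=8)
print(dispatch_next())   # the higher-priority job is matched first
```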

  16. Software Components – Node Software • Grid systems can have 2 different node types • Resource only – No job submission. • Participating node – Can submit a job as well. • Every node of a grid system requires interface S/W regardless of type • All nodes require… • Monitoring software that notifies the grid middleware/scheduler about… • Node availability • Current load • Available resources • Status of grid management software (a monitoring sketch follows this slide).
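A hypothetical sketch of the monitoring side of this node software: gather the availability, load, and resource figures a node would periodically report to the grid middleware. Here the heartbeat is only printed; a real node would send it to the scheduler.

```python
# Hypothetical node heartbeat using only the standard library.
import json
import os
import shutil
import time

def collect_heartbeat(node_name):
    disk = shutil.disk_usage("/")
    return {
        "node": node_name,
        "timestamp": time.time(),
        "cpus": os.cpu_count(),
        # 1-minute load average where the platform provides it (Unix-like systems).
        "load_avg_1m": os.getloadavg()[0] if hasattr(os, "getloadavg") else None,
        "disk_free_gb": round(disk.free / 1e9, 1),
        "status": "online",
    }

# One heartbeat; real node software would loop and push this to the middleware.
print(json.dumps(collect_heartbeat("node-a"), indent=2))
```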

  17. Software Components – Node Software • All nodes require (continued)… • Software that allows the node to accept and execute a job • Node S/W must accept the executable file or select the appropriate file from a local copy. • Locate any required dataset, whether local or remotely located. • Communicate job status during execution. • Return results once completed. • Allow for communication between sub-jobs, whether local to that node or not. • Dynamically adjust priorities of a job to meet a “level of service” requirement of others. • A participating grid node has additional requirements • Allows jobs to be submitted to the grid scheduler • May have its own scheduler or an interface to the grid’s common scheduler.

  18. Globus Toolkit • Framework for creating a Grid solution. • Created by the Globus Alliance (http://www.ggf.org). • 80% of all computer grid systems are implemented using a version of the Globus Toolkit [2]. • Current release is Version 4.0.6 – GT4. • Version 3 introduced SOA to the framework. • Version 4 expanded SOA and leverages Web Services (WS) as the underlying technology.

  19. Globus Toolkit • Global Grid Forum (GGF) • Created the Open Grid Services Architecture (OGSA) • Utilizes SOA for Grid implementation. • Two OGSA-compliant Grid Service implementations based on the Web Services (WS) architecture • Open Grid Services Infrastructure (OGSI) • Web Services Resource Framework (WSRF) • WSRF is the later of the two and the most true to the WS architecture • Utilizes standard XML schemas • Provides the distinction between the service and the state of the service, which is required for Grid Services. • Defines a WS-Resource, which holds “state” data used by WSs. • Maintained in an XML document. • Defines a life cycle. • Known to and accessed by one or more WSs (a conceptual sketch follows this slide).
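The sketch below is only a conceptual illustration of the WSRF separation between a stateless service and its state: each WS-Resource keeps its properties in a small XML document with its own key and lifetime. The element names are invented; real WSRF resource property documents follow published schemas.

```python
# Conceptual WS-Resource sketch: state lives in an XML properties document,
# addressed by a key and carrying its own lifetime, separate from the service.
import uuid
import xml.etree.ElementTree as ET

class WSResource:
    def __init__(self, lifetime_s):
        self.key = str(uuid.uuid4())                       # handle the service uses to find this state
        self.properties = ET.Element("ResourceProperties")
        ET.SubElement(self.properties, "Lifetime").text = str(lifetime_s)

    def set_property(self, name, value):
        ET.SubElement(self.properties, name).text = str(value)

    def to_xml(self):
        return ET.tostring(self.properties, encoding="unicode")

# The (stateless) service operates on whichever resource the client names by key.
job_state = WSResource(lifetime_s=3600)
job_state.set_property("JobStatus", "Running")
print(job_state.key)
print(job_state.to_xml())
```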

  20. Globus Toolkit - Components • Components of GT4 are segregated into 5 categories • Common Runtime • Security • Data Management • Monitor and Discovery • Execution • Not all component pieces are implemented as SOA. • Not all component pieces are fully operational code. • Some are partially functional: a starting point for a full implementation.

  21. Globus Toolkit - Components

  22. Globus Toolkit - Components • Common Runtime Components • “Building blocks” for most toolkit components • Web Services implemented in 3 languages: • Java • C • Python (PyGridware) • All 3 consist of APIs and tools that implement the WSRF and WS-Notification standards. • Act as base components for various default services. • Java WS Core provides the development base library and tools for custom WSRF services.

  23. Globus Toolkit - Components • eXtensible IO (XIO) • Extensible I/O library written in C • Provides a single API • Supports multiple protocols • Protocol implementations are encapsulated as drivers • Framework for error handling • Asynchronous message delivery • Timeouts • Driver approach • Supports the concept of driver stacks • Maximizes code reuse • Drivers are written as atomic units and “stacked” on top of one another (a sketch of the stacking idea follows this slide).
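XIO itself is a C library; the Python sketch below only illustrates the driver-stack idea: small, atomic drivers are stacked so that each one transforms the data and delegates to the driver below it, which is what maximizes reuse. The class names are invented.

```python
# Illustration of stackable I/O drivers (not the XIO API).
import zlib

class TcpLikeTransport:                  # bottom of the stack: a stand-in for a real transport
    def __init__(self):
        self.sent = []
    def write(self, data: bytes):
        self.sent.append(data)

class CompressionDriver:                 # atomic unit: compress, then delegate downward
    def __init__(self, below):
        self.below = below
    def write(self, data: bytes):
        self.below.write(zlib.compress(data))

class LoggingDriver:                     # atomic unit: log, then delegate downward
    def __init__(self, below):
        self.below = below
    def write(self, data: bytes):
        print(f"writing {len(data)} bytes")
        self.below.write(data)

# Stack the drivers: logging on top of compression on top of the transport.
stack = LoggingDriver(CompressionDriver(TcpLikeTransport()))
stack.write(b"grid payload " * 100)
```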

  24. Globus Toolkit - Components • Security Components • Implemented using the Grid Security Infrastructure (GSI) • Utilizes public key cryptography as its basis • Three primary functions of GSI • Provide secure authentication and confidentiality between elements of the grid. • Provide support for security across organizational boundaries, i.e., no centrally managed security system. • Support “single sign-on” for grid users.

  25. Globus Toolkit - Components • Security Components (continued) - Authentication and Authorization • Enabled with message-level and transport-level security for WS SOAP communication. • Also provides an authorization framework for container-level authorization. • Community Authorization Service (CAS) • Provides access control for VOs. • Grants fine-grained permissions on subsets of resources to VO members. • Extensible to multiple services • Currently supported by the GridFTP service (an access-control sketch follows this slide).
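A tiny, hypothetical sketch of the CAS idea: a community policy grants fine-grained permissions on subsets of resources to individual VO members, and a service consults that policy before acting. The VO, users, URLs, and permissions are all made up.

```python
# Hypothetical VO access-control check (not the CAS API).
vo_policy = {
    "neuro-imaging-vo": {
        "alice": {"gsiftp://data.example.org/brain/": {"read"}},
        "bob":   {"gsiftp://data.example.org/brain/": {"read", "write"}},
    },
}

def is_authorized(vo, user, resource, action):
    # Check whether this VO member holds the requested permission on the resource subset.
    grants = vo_policy.get(vo, {}).get(user, {})
    return any(resource.startswith(prefix) and action in actions
               for prefix, actions in grants.items())

print(is_authorized("neuro-imaging-vo", "alice",
                    "gsiftp://data.example.org/brain/scan-7.dat", "write"))   # False
```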

  26. Globus Toolkit - Components • Security Components (continued) - Delegation Services • Allows a single delegated credential to be used by many services. • Also supports a credential renewal interface. • Capable of extending a credential’s expiration date. • SimpleCA • Simplified Certificate Authority • Uses pre-WS Authentication, Authorization and OpenSSL. • Fully functional Public Key Infrastructure (PKI) • Suggested for testing only – a commercial CA solution should be used in production.

  27. Globus Toolkit - Components • Security Components (continued) – MyProxy • Online credential repository. • Stores X.509 proxy credentials protected by a pass phrase • Eliminates the need for manually copying private keys and certificate files between nodes • Used for authentication to grid portals and credential renewal with job managers • GSI-OpenSSH • Modified version of OpenSSH with added support for GSI authentication • Permits file transfer between systems without prompting for a user ID and password.

  28. Globus Toolkit - Components • Data Management Components • Set of tools concerned with the location, transfer, and management of distributed data. • Two basic categories: • Data Movement • Data Replication • Data Movement – GridFTP • Provides secure, reliable data transfer between nodes • Based on the FTP standard with additional Grid features • Adds third-party transfer. • Data Movement – Reliable File Transfer (RFT) • Provides a WS interface for the transfer and deletion of files. • Receives requests via SOAP over HTTP and uses GridFTP to perform the actual work. • Utilizes a database to store the list of files and their “state” for recovery if interrupted (a recovery sketch follows this slide).
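The sketch below illustrates only the recovery idea behind RFT, not its actual implementation: each requested file and its transfer state is recorded in a small database, so an interrupted batch can be resumed where it left off rather than restarted. Table and path names are hypothetical.

```python
# Recovery-oriented transfer state kept in a database (illustrative only).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transfers (path TEXT PRIMARY KEY, state TEXT)")

def request_transfers(paths):
    db.executemany("INSERT INTO transfers VALUES (?, 'PENDING')", [(p,) for p in paths])

def mark_done(path):
    db.execute("UPDATE transfers SET state = 'DONE' WHERE path = ?", (path,))

def resume():
    # After an interruption, only files still PENDING would be handed to the
    # underlying mover (GridFTP, in GT4's case).
    return [row[0] for row in db.execute("SELECT path FROM transfers WHERE state = 'PENDING'")]

request_transfers(["/data/set1.dat", "/data/set2.dat"])
mark_done("/data/set1.dat")
print(resume())   # ['/data/set2.dat']
```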

  29. Globus Toolkit - Components • Data Management Components • Data Replication - Replica Location Service (RLS) • Maintains access information about the location of replicated data. • Can map multiple physical replicas to one single logical file, enabling data redundancy on a grid (a mapping sketch follows this slide). • Data Replication – OGSA-DAI • Open Grid Services Architecture – Data Access & Integration. • General grid interface for accessing data resources via WS • Databases and XML repositories • Supports query languages • SQL, XPath, and XQuery
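A minimal sketch of the replica-location mapping (hypothetical data and URLs, not the RLS API): one logical file name resolves to several physical replica locations, and a caller can pick whichever copy is closest or fastest.

```python
# Logical-to-physical replica mapping (illustrative data only).
replica_catalog = {
    "lfn://brain-scan-0042": [
        "gsiftp://node-a.example.org/data/brain-scan-0042",
        "gsiftp://node-d.example.org/mirror/brain-scan-0042",
    ],
}

def locate(logical_name):
    # Return every known physical copy so the caller can choose among them.
    return replica_catalog.get(logical_name, [])

print(locate("lfn://brain-scan-0042"))
```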

  30. Globus Toolkit - Components • Data Management Components • Data Replication Service (DRS) • WSRF-compliant WS • Exposes a WS-Resource (the Replicator resource) • Allows users to query the resource properties to monitor the “state” of the resource. • Supports locating file sets and creating local replicas • GridFTP for file transfer • New replicas are registered in the Replica Location Service (RLS)

  31. Globus Toolkit - Components • Monitor and Discovery System (MDS) • Suite of WSs concerned with the collection, distribution, indexing, archival, and processing of grid resource availability and “state”. • MDS4 • WSRF- and WS-Notification-compliant version in GT4 • Aggregator Framework • Framework for building services that collect and aggregate data (Aggregator Services) • Collects data from 3 source types (Information Providers) • Query, Subscription, and Execution sources • Source data for Query and Subscription is a WSRF-compliant service. • Source data for Execution is an executable program

  32. Globus Toolkit - Components • Monitor and Discovery System (MDS) • Aggregator Framework - Index Service (IS) • Central component of the MDS services of GT4. • Default instance exposed as a WSRF service. • Collects resource information from multiple sources • Publishes it in a repository for discovery • Repository queried using XPath (see the sketch after this slide). • A VO can configure a local index service to track relevant sources in its domain. • Key features • Configurable in a hierarchy – but no single global index exists with all information regarding all resources. • Information published is recent but not necessarily the latest. • Existence does not guarantee availability. • Requires periodic refreshing.
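The sketch below illustrates the Index Service idea with Python’s built-in ElementTree: resource information is published into an XML repository and discovered with an XPath-style query. The schema and node names are invented; GT4 actually publishes WSRF resource property documents.

```python
# XML repository of resource information, discovered via an XPath-style query.
import xml.etree.ElementTree as ET

repository = ET.fromstring("""
<Index>
  <Resource name="node-a"><FreeCPUs>8</FreeCPUs><Status>online</Status></Resource>
  <Resource name="node-b"><FreeCPUs>0</FreeCPUs><Status>online</Status></Resource>
  <Resource name="node-c"><FreeCPUs>4</FreeCPUs><Status>offline</Status></Resource>
</Index>
""")

# Query: all resources currently published as online (availability still not guaranteed).
for res in repository.findall(".//Resource[Status='online']"):
    print(res.get("name"), "free CPUs:", res.find("FreeCPUs").text)
```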

  33. Globus Toolkit - Components • Monitor and Discovery System (MDS) • Aggregator Framework - Trigger Services • Collects and compares resource information against a set of conditions • Conditions are defined in a configuration file. • Conditions are specified as an XPath expression. • WebMDS • Web-based interface to WSRF resource properties. • Used as a user-friendly interface to the index service. • Uses standard resource property requests. • Displays results in several formats.

  34. Globus Toolkit - Components • Execution Management • Concerned with all aspects of remote computation • Initiation, Monitoring, Management, and Scheduling • Utilizes Grid Resource Allocation and Management (GRAM) • Typically deployed with the Delegation and RFT services. • APIs implemented in C, Java, and Python • Execution Management - WS GRAM • Grid service for remote execution and management of jobs. • SOAP messaging for communication between clients (nodes). • WS GRAM submits the job to a local scheduler for execution • Collaborates with the RFT service for staging any required files.

  35. Globus Toolkit - Components • Community Scheduler Framework 4 (CSF4) • WSRF-compliant tool for grids that have multiple job schedulers. • Provides an intelligent, policy-based meta-scheduling facility. • Enables a single interface for different resource managers. • Globus Teleoperations Control Protocol (GTCP) • Service interface for telecontrol. • WSRF version of the NEESgrid Teleoperations Control Protocol (NTCP) • Controls heterogeneous instrumentation. • High-resolution cameras, electron microscopes, etc. • Dynamic Accounts • Allows a Grid client to dynamically create, manage, and delete user accounts on remote UNIX sites.

  36. BMI Examples of Grid Usage • TeraGrid • Largest non-military grid implementation in the USA. • Network of supercomputers • 250 teraflops (trillion floating-point operations/second) • 30 petabytes of secondary storage (disk) • 40 Gbps network backbone • High-resolution visualization environment • Toolkit for grid computing • National Science Foundation (NSF) Terascale initiative • Create an infrastructure of unbounded capacity and scope, connecting universities and organizations with the fastest cross-country backbone in existence.

  37. BMI Examples of Grid Usage • TeraGrid • Currently composed of 11 supercomputer sites across the USA. • Each site contributes resources and expertise to create the largest computer grid in the USA. • Primary usage is to support scientific research • Medical field usage: • Brain imaging. • Drug interaction with cancer cells.

  38. BMI Examples of Grid Usage • TeraGrid – 11 Sites • Indiana University (IU) • “Big Red” – a distributed shared-memory cluster consisting of 768 IBM JS21 blades, each with two dual-core PowerPC 970 MP processors, 8 GB of memory, and a PCI-X Myrinet 2000 adapter for high-bandwidth, low-latency Message Passing Interface (MPI) applications. • Joint Institute for Computational Sciences (JICS) • University of Tennessee and ORNL • Future expansions are being planned that would add a 40-teraflop Cray XT3 system to the TeraGrid. • Additional plans to expand to a 170-teraflop Cray XT4 system, which in turn will be upgraded to a 10,000+ compute-socket Cray system of approximately 1 petaflop.

  39. BMI Examples of Grid Usage • TeraGrid – 11 Sites • Louisiana Optical Network Initiative (LONI) • “Queen Bee”, the core cluster of LONI, is a 50.7-teraflop peak-performance, 668-node Dell PowerEdge 1950 cluster running the Red Hat Enterprise Linux 4 operating system. Each node contains two quad-core Intel Xeon 2.33 GHz 64-bit processors and 8 GB of memory. • The cluster is interconnected with 10 GB/sec InfiniBand and has 192 TB of storage in a Lustre file system. • Half of Queen Bee's computational cycles have been contributed to the TeraGrid community.

  40. BMI Examples of Grid Usage • TeraGrid – 11 Sites • Oak Ridge National Laboratory (ORNL) • More of a user than a provider. • Their users of neutron science facilities (the High Flux Isotope Reactor and the Spallation Neutron Source) will be able to access TeraGrid resources and services for their data storage, analysis, and simulation. • National Center for Supercomputing Applications (NCSA) • University of Illinois Urbana-Champaign • Provides 10 teraflops of capability computing through its IBM Linux cluster, which consists of 1,776 Itanium2 processors. • The NCSA also includes 600 terabytes of secondary storage and 2 petabytes of archival storage capacity.

  41. BMI Examples of Grid Usage • TeraGrid – 11 Sites • Pittsburgh Supercomputing Center (PSC) • Provides computational power via its 3,000-processor HP AlphaServer system, TCS-1, which offers 6 teraflops of capability coupled uniquely to a 21-node visualization system. It also provides a 128-processor, 512-gigabyte shared-memory HP Marvel system, a 150-terabyte disk cache, and a mass storage system with a capacity of 2.4 petabytes. • Purdue University • Provides 6 teraflops of computing capability • 400 terabytes of data storage capacity • Visualization resources, access to life science data sets, and a connection to the Purdue Terrestrial Observatory.

  42. BMI Examples of Grid Usage • TeraGrid – 11 Sites • San Diego Supercomputer Center (SDSC) • Leads the TeraGrid data and knowledge management effort. • Provides a data-intensive IBM Linux cluster based on Itanium processors that reaches over 4 teraflops, with 540 terabytes of network disk storage. • In addition, a portion of SDSC’s IBM 10-teraflop supercomputer is assigned to the TeraGrid. • An IBM HPSS archive currently stores a petabyte of data.

  43. BMI Examples of Grid Usage • TeraGrid – 11 Sites • Texas Advanced Computing Center (TACC) • Provides a 1024-processor Cray/Dell Xeon-based Linux cluster • A 128-processor Sun E25K Terascale visualization machine with 512 gigabytes of shared memory • Total of 6.75 teraflops of computing/visualization capacity. • Provides a 50 terabyte Sun storage area network. • Only half of the cycles produced by these resources are available to TeraGrid users.

  44. BMI Examples of Grid Usage • TeraGrid – 11 Sites • University of Chicago/Argonne National Laboratory (UC/ANL) • Provides users with high-resolution rendering and remote visualization capabilities via a 1-teraflop IBM Linux cluster with parallel visualization hardware. • National Center for Atmospheric Research (NCAR) • Located in Boulder, CO. • “Frost” - BlueGene/L computing system. The 2048-processor system brings 250 teraflops of computing capability and more than 30 petabytes of online and archival data storage to the TeraGrid.

  45. BMI Examples of Grid Usage • TeraGrid Applications • The Center for Imaging Science (CIS) at Johns Hopkins University has deployed shape-based morphometric tools on the TeraGrid to support the Biomedical Informatics Research Network, a National Institutes of Health initiative involving 15 universities and 22 research groups whose work centers on brain imaging of human neurological disorders and associated animal models. • The University of Illinois, Urbana-Champaign has a project that uses massive parallelism on the TeraGrid for major advances in the understanding of membrane proteins. • Another project is also harnessing the TeraGrid to attack problems in the mechanisms of bioenergetic proteins, the recognition and regulation of DNA by proteins, the molecular basis of lipid metabolism, and the mechanical properties of cells.

  46. BMI Examples of Grid Usage • GridMol - Molecular modeling on a Computer Grid • Molecular visualization and modeling tool. • Study of the geometry and properties of molecules. • GridMol features include… • Modifying bond lengths and angles. • Changing dihedral angles (the angle between two planes determined by three connected atoms). • Adding or deleting atoms. • Adding radicals. • Globus Toolkit based • Scheduling tool is non-GT middleware. • Coded in Java, Java 3D, C/C++, and OpenGL. • Standalone application or applet for a browser. • Runs on the China National Grid (CNGrid) • Composed of 8 supercomputer sites across China

  47. BMI Examples of Grid Usage • GridMol - Molecular modeling on a Computer Grid • GridMol System Overview • [Diagram: GridMol performs modeling, job submission, and visualization through the CNGrid middleware and Globus Toolkit, which dispatch jobs to high-performance computer sites HPC1–HPC4.]

  48. BMI Examples of Grid Usage • GridMol - Molecular modeling on a Computer Grid • Overview figure points – • A job is submitted to the CNGrid middleware, which executes the application on available High Performance Computer systems (HPCs) based on performance requirements. • GridMol maintains a history of job descriptions to remember jobs for future operations. • After the job is submitted, users can query the status of the job to determine whether it has been successfully submitted or has failed (a small sketch of submission and status queries follows this slide). • After a job has finished, GridMol can be used to analyze the results using several different visualization tools. • GridMol completely abstracts the underlying grid infrastructure from the user. Users do not need to know how to submit a job to the grid or on which HPC(s) it will run, which allows the researcher to focus on the molecular modeling problem rather than on issues related to using the computer grid system.
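Below is a small sketch of the job-history and status-query behaviour described above. The job description fields and statuses are invented for illustration; GridMol’s real interface is not shown here.

```python
# Hypothetical job history with a simple status query.
import time

job_history = []   # record of submitted job descriptions, kept for future operations

def submit_job(description):
    job = {"id": len(job_history) + 1,
           "description": description,
           "submitted_at": time.time(),
           "status": "SUBMITTED"}
    job_history.append(job)
    return job["id"]

def query_status(job_id):
    # A real client would ask the grid middleware; here we just look it up locally.
    for job in job_history:
        if job["id"] == job_id:
            return job["status"]
    return "UNKNOWN"

jid = submit_job({"application": "molecular-dynamics", "input": "caffeine.pdb"})
print(query_status(jid))
```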

  49. BMI Examples of Grid Usage • GridMol - Molecular modeling on a Computer Grid • GridMol supports six different molecule display models. • Different display models highlight different aspects of a molecule, each having its own advantages and disadvantages. • Line Model – Bonds are shown as lines while atoms are not displayed.

  50. BMI Examples of Grid Usage • GridMol - Molecular modeling on a Computer Grid • Different display models • Ball and Stick Model – All atoms are shown as spheres of different sizes and all bonds are shown as cylinders of different lengths.
