Grid Computing. Hakan ÜNLÜ CMPE 511 Presentation Fall 2004. Overview. General Introduction to Grid Computing Introduction: Why Grids? Applications for Grids Basic Grid Architecture Grid Platforms & Standards Issues in Grid Computing Hardware: Blade Computers
Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004
Overview • General Introduction to Grid Computing • Introduction: Why Grids? • Applications for Grids • Basic Grid Architecture • Grid Platforms & Standards • Issues in Grid Computing • Hardware: Blade Computers • System Management: Globus Toolkit • Software: Scheduling
What is Grid Computing? • Computational and Networking Infrastructure that is designed to provide pervasive, uniform and reliable access to data, computational and human resources distributed over wide area environments
Grids Are By Definition Heterogeneous • It’s about legacy resources, infrastructure, applications, policies, and procedures • The grid and its administrators must integrate in stealth mode…with • Firewalls • Filesystems • Queuing systems • Grumpy systems administrators • Tried and true applications
Challenges in Grid Computing • Reliable performance • Trust relationships between multiple security domains • Deployment and maintenance of grid middleware across hundreds or thousands of nodes • Access to data across WANs • Access to state information of remote processes • Workflow / dependency management • Distributed software and license management • Accounting and billing
Applications for a Grid • Generally, apps that work well on clusters can work well on grids • Non-interactive / batch jobs • Parallel computations with minimal interprocess communication and workflow dependencies • Reasonable data transfer requirements • Sensible economics • Productivity Gains > Cost of Building Grid + Opportunity Costs of Resources
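The "sensible economics" test on this slide is a single inequality, which a minimal sketch can make concrete. The function name and all monetary figures below are hypothetical and chosen only for illustration.

```python
def grid_is_worthwhile(productivity_gains, build_cost, opportunity_cost):
    """Apply the slide's rule: gains must exceed build cost plus opportunity cost."""
    return productivity_gains > build_cost + opportunity_cost


# Hypothetical yearly figures in the same currency unit.
worth_it = grid_is_worthwhile(500_000, 300_000, 150_000)
not_worth_it = grid_is_worthwhile(400_000, 300_000, 150_000)
print(worth_it, not_worth_it)
```

The opportunity cost term is the one most easily forgotten: it covers what the same resources could have produced if they had not been committed to the grid.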
Non-Interactive / Batch Jobs • Difficult to get a real-time UI for jobs running on the grid • A possible interactive application: spreadsheet computation • Want to take advantage of off-peak free cycles • Jobs run for several days, weeks or months • The user might prefer to be sleeping while the job runs! • Running processes might need to be interrupted or re-prioritized based on the current load on a grid compute engine • Idle thread / “screensaver” computing
Parallel Computations • Application needs to be able to run as multiple, mostly independent pieces • Can’t depend on the network’s Quality of Service • Can’t rely upon the order of execution and completion • Apps that need these things are better suited for tightly coupled compute platforms (e.g. SMP systems) • Grid can still be useful as a meta-scheduler and data source for such apps • e.g. the user submits the job to the grid queue and asks for the best available SMP resource
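A minimal sketch of a grid-friendly decomposition, assuming local threads stand in for grid nodes: each piece is independent, and the combining step does not care in which order pieces finish. The names `process_chunk` and `run_independent` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_chunk(chunk):
    """Independent unit of work: needs no data from any other chunk."""
    return sum(x * x for x in chunk)


def run_independent(chunks, workers=4):
    """Run every chunk separately and combine results in completion order."""
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_chunk, c) for c in chunks]
        for done in as_completed(futures):
            # Addition is commutative, so completion order does not matter.
            total += done.result()
    return total


chunks = [range(0, 100), range(100, 200), range(200, 300)]
result = run_independent(chunks)
print(result)
```

A computation whose pieces had to exchange data at every step would not fit this pattern, which is why the slide points such applications to tightly coupled platforms.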
Some Costs and Benefits • Costs: • Grid Middleware • Architects and Developers • User Training • Infrastructure Hardware • Opportunity Costs • Would a big SMP box return better results for your problem? • Benefits: • Better Utilization of Existing Capital Resources • More Efficient Users • Ability to complete more work in the same amount of time • Performance near or sometimes as good as the big SMP box
Basic Grid Architecture • Clusters and how grids are different than clusters • Departmental Grid Model • Enterprise Grid Model • Global Grid Model
What Makes a Cluster a Cluster? • Uses a Distributed Resource Manager (DRM) to manage job scheduling • Tightly coupled - High speed, low latency interconnect network • Fairly homogeneous - Configuration management is important! • Single administrative domain
The Cluster Model (diagram) • Master node: User Interface/API, Cluster DRM, Configuration Management, Shared Storage • Four cluster nodes joined by a high speed interconnect, each with Cluster DRM, Operating System, Storage and Compute • Every node carries the service labels 3A, RD, PM, MP, DM
How is an Enterprise Grid Different from a Cluster? • Heterogeneous - Clusters, SMP, even workstations of dissimilar configurations, but all are tied together through a grid middleware layer • Lightly coupled - Connected via 100 or 1000Mbps Ethernet • Introduces a resource registry and grid security service • But usually only a single registry and security service for the grid • Not necessarily a single administrative domain
The Enterprise Grid Model (diagram) • User Interface/API with a Grid Interface, Resource Registry and Security Infrastructure • Clusters (Cluster DRM and Cluster Interface nodes) and SMP systems, each behind its own Grid Interface, with Operating System, Storage and Compute • All connected over the enterprise LAN or WAN • Nodes carry the service labels 3A/AA, RD, PM, MP, DM
How is a Global Grid Different from an Enterprise Grid? • "Grid of Grids" - Collection of enterprise grids • Loosely coupled between sites - Not much control over Quality of Service • Mutually distrustful administrative domains • Multiple grid resource registries and grid security services
The Global Grid Model (diagram) • Sites A, B and C, each a LAN-connected grid of clusters and SMP systems • Each site has its own UI/API, RR and SI • Sites are linked to one another over a WAN
Grid Platforms & Standards • The Global Grid Forum • http://www.gridforum.org/ • Globus Toolkit • DCML (Data Center Markup Language)
Globus Toolkit V2 “Pillars” • Resource Management (GRAM) • Information Services (MDS) • Data Management (GASS) • Grid Security Infrastructure (GSI)
Globus Toolkit V2 Stack (top to bottom) • GRAM, MDS, GASS/GridFTP • HTTP, LDAP, FTP • GSI • TLS/SSL • TCP/IP
Globus Toolkit V2 Key Components: GRAM, MDS and GASS • Grid Resource Allocation Manager (GRAM) • Server-side: “gatekeeper” process that controls execution of job managers • Client-side: “globusrun” UI to launch jobs • Monitoring and Directory Service (MDS) • GRIS: Grid Resource Information Service collects local info • GIIS: Grid Index Information Service collects GRIS info • Global Access to Secondary Storage (GASS) • GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command • Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt
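As a rough illustration of how a GASS file URI breaks down, the sketch below splits the slide's example gsiftp URI into scheme, host and path using Python's standard `urllib.parse`. The helper names and the local destination `file:///tmp/ecoli.nt` are assumptions made for the example. The argument list only mirrors the basic source-then-destination form of `globus-url-copy`; it is built but never executed here.

```python
from urllib.parse import urlparse


def split_grid_uri(uri):
    """Split a gsiftp-style URI into (scheme, host, path)."""
    parts = urlparse(uri)
    return parts.scheme, parts.hostname, parts.path


def copy_command(source_uri, dest_uri):
    """Build (but do not run) a source-to-destination copy argument list."""
    return ["globus-url-copy", source_uri, dest_uri]


source = "gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt"
scheme, host, path = split_grid_uri(source)
print(scheme, host, path)

# The destination is a hypothetical local path chosen for illustration.
print(copy_command(source, "file:///tmp/ecoli.nt"))
```

The host part identifies the grid node running the “in.ftpd” daemon, and the path is the file location on that node.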
Globus Toolkit V2 Additional Components • Grid Packaging Tools (GPT) • Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components • MPICH-G2 • A Globus V2 enabled version of MPI (Message Passing Interface) • Based on MPICH • Utilizes GSI, MDS and GRAM
Globus Toolkit V2 Network Services (diagram) • Four grid nodes, each running gatekeeper, GRIS and in.ftpd • Client Node running the GRAM Client • GIIS Server • Certificate Authority • All connected over the network
GRAM, MDS and GASS Interactions (diagram) • Client (user / proxy) • GRAM: gatekeeper and job manager, reached via RSL/DUROC/HTTP 1.1, for job allocation and job management • MDS: GIIS and GRIS, reached via LDAP, for resource discovery • GASS: GridFTP (in.ftpd), reached via gsiftp, for data transfer and data control • Each service acts on a process and a resource
Globus Toolkit V2 Strengths and Weaknesses • Strengths: • Mindshare and collaboration in both industry & academia • Open source • Standards-based underpinnings (e.g. SSL, LDAP) • Flexibility and CoG APIs • Driving OGSA with heavy resource commitment from IBM • Weaknesses: • Significant effort required to get applications working on a grid • Not production quality at this time • No “metascheduler”: users have to explicitly tell their jobs where to run
Issues in Grid Computing • Hardware: Blades
Hardware Trends • HW trends that enable grids and distributed processing • There is a lot of idle computing power • Computers are now better connected • There are many different brands and configurations in any environment • In turn, distributed computing gives rise to new HW architectures • Blade Computers
What is a blade? • Inclusive chassis-based modular computing system that includes processors, memory, network interface cards and local storage on a single board. • (Images: a blade; a blade farm; a blade chassis and blades)
Advantages & Disadvantages • Advantages: • Low Cost (power, heat, data center space) • Physical Server Consolidation (Save space, eliminate cables) • High Availability • Integrated Systems Management • Disadvantages: • Not suitable in small numbers • Need for standardization (for network connection and management)
Blades & Grid • Each blade is a server that can run jobs. • Blades can be used to form clusters or grids. • With efficient management different configurations of blades can be used in a single grid computer. • Easy to expand • Protects investment
Issues in Grid Computing • System Management: Globus Toolkit
Issues in Grid Computing • Software: Scheduling
Superscheduling • Superscheduling means scheduling resources in multiple administrative domains. • Various models • Submitting a job to a specific single machine • Submitting a job to single machines at multiple sites (with cancellation option) • Scheduling a single job to use multiple resources • Most common superscheduler: the users themselves
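The second model, submitting to machines at several sites with a cancellation option, can be sketched as a small deterministic simulation. It assumes each site reports an estimated queue wait, keeps the copy that would start first and cancels the others. The function name, site names and wait times are all hypothetical.

```python
def submit_everywhere(job, queue_wait_by_site):
    """Queue one job at every site, keep the earliest starter, cancel the rest."""
    # The site with the shortest estimated wait is the first to start the job.
    winner = min(queue_wait_by_site, key=queue_wait_by_site.get)
    states = {
        site: "running" if site == winner else "cancelled"
        for site in queue_wait_by_site
    }
    return {"job": job, "runs_at": winner, "states": states}


# Hypothetical estimated queue waits in minutes.
outcome = submit_everywhere("blast-run", {"siteA": 120, "siteB": 15, "siteC": 60})
print(outcome["runs_at"], outcome["states"])
```

The trade-off in this model is that every site spends scheduling effort on a job that most of them will never run.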
Phases Of Superscheduling • Resource Discovery: Authorisation Filtering; Application Requirement Definition; Minimal Requirement Filtering • System Selection: Gathering Information (Query); Select Systems to run on • Run the Job: Make an Advance Reservation (Optional); Submit Job to Resources; Preparation Tasks; Monitor Progress; Job Completion; Completion Tasks • Source: Global Grid Forum, Scheduling Working Group, 10 Actions When Scheduling, Schopf, 2001
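The three phases can be read as a filter, a choice and a submission. The sketch below is one simplified interpretation, not the Global Grid Forum's specification. The `Resource` fields, the "lowest load wins" selection rule and all sample values are assumptions, and the run phase is reduced to returning a status record.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    authorized: bool  # may this user run here at all?
    cpus: int
    load: float       # fraction of capacity in use, as a query might report


def discover(resources, min_cpus):
    """Phase 1: authorisation filtering, then minimal requirement filtering."""
    return [r for r in resources if r.authorized and r.cpus >= min_cpus]


def select(candidates):
    """Phase 2: pick one system from the gathered information (lowest load)."""
    return min(candidates, key=lambda r: r.load) if candidates else None


def run(job, resource):
    """Phase 3: submit, monitor and complete, reduced to a status record."""
    return {"job": job, "resource": resource.name, "status": "completed"}


def superschedule(job, resources, min_cpus):
    """Chain the three phases; return None when no resource qualifies."""
    chosen = select(discover(resources, min_cpus))
    return run(job, chosen) if chosen else None


pool = [
    Resource("smp1", authorized=True, cpus=64, load=0.9),
    Resource("cluster1", authorized=True, cpus=32, load=0.2),
    Resource("cluster2", authorized=False, cpus=128, load=0.0),
]
print(superschedule("render", pool, min_cpus=16))
```

Filtering before querying keeps the expensive information gathering step limited to resources the job could actually use.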
Scheduling Framework (Ranganathan & Foster 2003) • External Scheduler • Local Scheduler • Dataset Scheduler
Scheduling And Replication Algorithms • External Scheduler • JobRandom • JobLeastLoaded • JobDataPresent • JobLocal • Dataset Scheduler • DataDoNothing: No active replication. Everything is on demand • DataRandom: Popular datasets are replicated to random sites • DataLeastLoaded: Popular datasets are sent to the least loaded sites.
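The slide lists the external scheduler policies by name only, so the sketch below is one plausible reading of those names rather than the published algorithms. The site records, the use of queue length as "load", and the fallback in `job_data_present` when no site holds the dataset are all assumptions.

```python
import random


def job_random(sites, job, rng=random):
    """JobRandom: send the job to a randomly chosen site."""
    return rng.choice(sorted(sites))


def job_least_loaded(sites, job):
    """JobLeastLoaded: send the job to the site with the fewest queued jobs."""
    return min(sorted(sites), key=lambda s: sites[s]["queued"])


def job_data_present(sites, job):
    """JobDataPresent: prefer a site that already holds the job's dataset."""
    holders = [s for s in sorted(sites) if job["dataset"] in sites[s]["datasets"]]
    if not holders:
        # Assumed fallback: no replica anywhere, so use the least loaded site.
        return job_least_loaded(sites, job)
    return min(holders, key=lambda s: sites[s]["queued"])


def job_local(sites, job):
    """JobLocal: always run at the site where the job was submitted."""
    return job["origin"]


# Hypothetical grid state.
sites = {
    "A": {"queued": 8, "datasets": {"genome"}},
    "B": {"queued": 1, "datasets": set()},
    "C": {"queued": 3, "datasets": {"genome", "climate"}},
}
job = {"origin": "A", "dataset": "genome"}
print(job_least_loaded(sites, job), job_data_present(sites, job), job_local(sites, job))
```

With this sample state, the least loaded site does not hold the data, which shows why the choice of policy matters: running there would require a transfer of the dataset first.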
Simulation Results • (Charts: average response times; average data transferred)
Grid Computing Thank You and Questions?