How the Linux and Grid Communities can Build the Next-Generation Internet Platform. Ian Foster Argonne National Lab University of Chicago Globus Project. The (Power) Grid: On-Demand Access to Electricity. Quality, economies of scale. Time. By Analogy, A Computing Grid.
How the Linux and Grid Communities can Build the Next-Generation Internet Platform. Ian Foster, Argonne National Lab, University of Chicago, Globus Project
The (Power) Grid: On-Demand Access to Electricity [Figure: quality and economies of scale plotted against time]
By Analogy, A Computing Grid • Decouple production and consumption • Enable on-demand access • Achieve economies of scale • Enhance consumer flexibility • Enable new devices • On a variety of scales • Department • Campus • Enterprise • Internet
Requirements • Dynamically link resources/services • From collaborators, customers, eUtilities, … (members of evolving “virtual organization”) • Into a “virtual computing system” • Dynamic, multi-faceted system spanning institutions and industries • Configured to meet instantaneous needs, for: • Multi-faceted QoX for demanding workloads • Security, performance, reliability, …
For Example: Real-Time Online Processing • Applications: Delivery • Application Services: Distribution • Servers: Execution • Application Virtualization • Automatically connect applications to services • Dynamic & intelligent provisioning • Infrastructure Virtualization • Dynamic & intelligent provisioning • Automatic failover
Examples of Linux-Based Grids: High Energy Physics • Production Run on the Integration Testbed • Simulate 1.5 million full CMS events for physics studies: ~500 sec per event on 850 MHz processor • 2 months continuous running across 5 testbed sites • Managed by a single person at the US-CMS Tier 1
Examples of Linux-Based Grids: Earthquake Engineering • U. Nevada Reno • www.neesgrid.org
Grid Technologies & Community • Grid technologies developed since mid-90s • Product of work on resource sharing for scientific collaboration; commercial adoption • Open source Globus Toolkit has emerged as a de facto standard • International community of contributors • Thousands of deployments worldwide • Commercial support providers • Global Grid Forum serves as a community and standards body • Home to recent OGSA work
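The figures on this slide imply a sizeable sustained processor count. The sketch below is a back-of-envelope check only; it assumes that "2 months" means 60 days and that every event costs the quoted ~500 seconds, neither of which the slide states exactly.

```python
# Back-of-envelope check of the CMS production run figures quoted on the slide.
EVENTS = 1_500_000      # full CMS events simulated (from the slide)
SEC_PER_EVENT = 500     # approx. seconds per event on an 850 MHz CPU (from the slide)
RUN_DAYS = 60           # assumption: "2 months" taken as 60 days


def cpus_needed(events, sec_per_event, run_days):
    """Average number of CPUs that must stay busy for the whole run."""
    cpu_seconds = events * sec_per_event
    wall_seconds = run_days * 24 * 3600
    return cpu_seconds / wall_seconds


total_cpu_years = EVENTS * SEC_PER_EVENT / (365 * 24 * 3600)
average_cpus = cpus_needed(EVENTS, SEC_PER_EVENT, RUN_DAYS)

print(f"total CPU time: {total_cpu_years:.1f} CPU-years")
print(f"average busy CPUs over the run: {average_cpus:.0f}")
```

Under these assumptions the run needs roughly 145 processors kept busy continuously, which is the scale a single operator was managing across the five sites.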
The Emergence of Open Grid Standards [Figure: functionality and standardization increasing over time, 1990 to 2010] • Custom solutions • Globus Toolkit: de facto standard, single implementation (built on Internet standards) • Open Grid Services Architecture: real standards, multiple implementations (built on Web services, etc.) • Managed shared virtual systems: computer science research
Open Grid Services Infrastructure (OGSI) [Figure: a service requestor (e.g. user application) uses service discovery against a service registry, asks a service factory to create a service (with resource allocation), and receives a Grid Service Handle; it then interacts with service instances through service invocation, service data, keep-alives, and notifications; services register with the registry] • Authentication & authorization are applied to all requests • Interactions standardized using WSDL and SOAP
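The factory, registry, handle, and keep-alive roles named on this slide can be illustrated with a toy in-memory model. This is a minimal sketch and not the Globus Toolkit or OGSI API: every class, method, and handle format below is hypothetical, time is passed in explicitly as `now`, and the real interactions are carried over WSDL and SOAP rather than local method calls.

```python
import itertools


class ServiceInstance:
    """Transient service instance with service data and a soft-state lifetime."""

    def __init__(self, handle, lifetime_s, now):
        self.handle = handle
        self.service_data = {"status": "active"}
        self.expires_at = now + lifetime_s

    def keep_alive(self, lifetime_s, now):
        # Extend the lifetime; an instance that gets no keep-alives expires.
        self.expires_at = now + lifetime_s

    def is_alive(self, now):
        return now < self.expires_at


class Registry:
    """Maps handles to registered instances and supports discovery."""

    def __init__(self):
        self._instances = {}

    def register(self, instance):
        self._instances[instance.handle] = instance

    def discover(self, now):
        # Only instances whose lifetime has not lapsed are discoverable.
        return [h for h, inst in self._instances.items() if inst.is_alive(now)]


class Factory:
    """Creates instances after an authorization check and registers them."""

    def __init__(self, registry, authorized_users):
        self._registry = registry
        self._authorized = set(authorized_users)
        self._ids = itertools.count(1)

    def create_service(self, user, lifetime_s, now):
        if user not in self._authorized:
            raise PermissionError(f"{user} is not authorized")
        instance = ServiceInstance(f"handle-{next(self._ids)}", lifetime_s, now)
        self._registry.register(instance)
        return instance


registry = Registry()
factory = Factory(registry, authorized_users=["alice"])
svc = factory.create_service("alice", lifetime_s=30, now=0)
print(registry.discover(now=10))   # instance is still within its lifetime
svc.keep_alive(lifetime_s=30, now=25)
print(registry.discover(now=50))   # the keep-alive extended it to t=55
print(registry.discover(now=60))   # lifetime lapsed, no longer listed
```

The soft-state lifetime is the design point worth noticing: a client that disappears simply stops sending keep-alives, and its instance drops out of discovery without any explicit teardown.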
Open Grid Services Architecture [Figure: distributed virtual integration architecture, shown as layers from top to bottom] • Users in Problem Domain X • Applications in Problem Domain X • Application & Integration Technology for Problem Domain X • Generic Virtual Service Access and Integration Layer: Structured Data Integration, Structured Data Access, Data Transport, Transformation, Job Submission, Brokering, Workflow, Registry, Banking, Authorisation, Resource Usage • OGSA • OGSI: Interface to Grid Infrastructure • Web Services: Basic Functionality • Compute, Data & Storage Resources • Structured Data: Relational, XML, Semi-structured
But It’s Not Turtles All the Way Down • Our ability to deliver virtualized services efficiently and with desired QoX ultimately depends on the underlying platform! • At multiple levels, including but not limited to • Dynamic provisioning & resource management • Reliability, availability, manageability • Performance and parallelism • New demands on the OS in each area
(1) Dynamic Provisioning • Static provisioning dedicates resources • Typical of “co-lo” hosting • Reprovision manually as needed • But load is dynamic • Must overprovision for surges • High variable cost of capacity • Need dynamic provisioning to achieve true economies of scale • Load multiplexing • Tradeoff cost vs. quality • Service level agreements • Dynamic resource recruitment
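The cost argument on this slide can be made concrete with a small calculation. The sketch below is illustrative only: the hourly load traces and the per-server capacity are made-up numbers, chosen to show a 3x daily swing, and the helper names are hypothetical.

```python
import math


def servers_for(load, per_server_capacity):
    """Servers needed to carry a given load (at least one stays on)."""
    return max(1, math.ceil(load / per_server_capacity))


def static_server_hours(trace, per_server_capacity):
    # Static provisioning sizes for the peak and holds it for every hour.
    return servers_for(max(trace), per_server_capacity) * len(trace)


def dynamic_server_hours(trace, per_server_capacity):
    # Dynamic provisioning resizes every hour to follow the load.
    return sum(servers_for(load, per_server_capacity) for load in trace)


def multiplexed_peak(trace_a, trace_b):
    # Peak of the combined load when two sites share one pool.
    return max(a + b for a, b in zip(trace_a, trace_b))


CAPACITY = 50                                   # hypothetical requests/s per server
site_a = [100] * 8 + [300] * 8 + [200] * 8      # hypothetical 24-hour trace
site_b = [300] * 8 + [100] * 8 + [200] * 8      # same shape, peak at another time

print(static_server_hours(site_a, CAPACITY))    # server-hours when sized for peak
print(dynamic_server_hours(site_a, CAPACITY))   # server-hours when tracking load
print(max(site_a) + max(site_b))                # capacity if each site is sized alone
print(multiplexed_peak(site_a, site_b))         # capacity if the sites share a pool
```

With these made-up traces, tracking the load uses two thirds of the server-hours of peak sizing, and sharing one pool between the two sites cuts the required peak capacity from 600 to 400 requests/s.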
Load Is Dynamic • ibm.com external site • February 2001 • Daily fluctuations (3x) • Workday cycle • Weekends off • World Cup soccer site • May-June 1998 • Seasonal fluctuations • Event surges (11x) • ita.ee.lbl.gov [Figures: load by day of the week, Monday to Sunday; load over weeks 6 to 8]
For Example: Energy-Conscious Provisioning • Light load: concentrate traffic on a minimal set of servers • Step down surplus servers to low-power state • APM and ACPI • Activate surplus servers on demand • Wake-On-LAN • Browndown: provision for a specified energy target • Even smarter: also manage air conditioning [Figure: power draw in watts by server state: boot 136 W, CPU max 120 W, CPU idle 93 W, disk spin 6-10 W, off/hibernate 2-3 W. Idling consumes 60% to 70% of peak power demand.]
Power Management via MUSE: IBM Trace Run (Before) [Figure: power draw (watts), latency (ms*50), and throughput (requests/s) over time; 1 ms marked] MUSE: Jeff Chase et al., Duke University (SOSP 2003)
Power Management via MUSE: IBM Trace Run (After) [Figure: the same trace run with power management; 1 ms marked] MUSE: Jeff Chase et al., Duke University (SOSP 2003)
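The saving from concentrating light load on few servers follows from the wattages quoted on this slide. The sketch below is a rough estimate and not the MUSE algorithm: it assumes power rises linearly from idle to maximum with utilization, takes 3 W for a powered-down server, and uses a hypothetical 80% utilization target for active servers.

```python
import math

P_MAX = 120.0    # watts at full CPU load (from the slide)
P_IDLE = 93.0    # watts when idle (from the slide)
P_OFF = 3.0      # watts when off or hibernating (upper end of the slide's 2-3 W)


def server_power(utilization):
    """Assumed linear power model between idle and maximum draw."""
    return P_IDLE + (P_MAX - P_IDLE) * utilization


def spread_power(n_servers, total_load):
    """All servers stay on and share the load evenly.

    total_load is measured in units of one server's full capacity.
    """
    return n_servers * server_power(total_load / n_servers)


def concentrated_power(n_servers, total_load, target_utilization=0.8):
    """Run the fewest servers that keep utilization near the target; power down the rest."""
    active = min(n_servers, max(1, math.ceil(total_load / target_utilization)))
    return active * server_power(total_load / active) + (n_servers - active) * P_OFF


# Ten servers carrying a light load equal to two servers' worth of work.
print(f"spread:       {spread_power(10, 2.0):.0f} W")
print(f"concentrated: {concentrated_power(10, 2.0):.0f} W")
```

Because an idle server still draws most of its peak power, the spread configuration costs close to the full-load figure, while the concentrated one pays mainly for the three servers that stay on.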
Dynamic Provisioning: OS Issues • Hot plug memory, CPU, and I/O • For partitioning, core virtualization capabilities • Security • Containment & data integrity in a virtualized environment: user-mode Linux++? • Scheduler improvements for resource and workload management • Allocate for required resource consumption • Dynamic, sub-processor logical partitioning • Improved instrumentation & accounting • Determine actual resource consumption
(2) Reliability, Availability, Manageability • Error log and diagnostics frameworks • Foundation for automated error analysis and recovery of distributed & remote systems • Enable problem determination, automated reconfiguration, localization of failure • Configuration management • Determine hardware configuration/inventory • Apply/remove service/support patches • Isolate failing components quickly
(3) Performance and Parallelism: E.g., Data Integration • Assume • Remote data at 1 GB/s • 10 local bytes per remote byte • 100 operations per byte • >1 GByte/s achievable today (FAST, 7 streams, LA to Geneva) [Figure: remote data arrives over a wide area link (end-to-end switched lambda?) at 1 GB/s; the local network supplies parallel I/O at 10 GB/s; parallel computation runs at 1000 Gop/s]
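The I/O and compute rates on this slide follow from its three assumptions. The sketch below reproduces them under one interpretation that the slide does not spell out: the 100 operations per byte are counted against the local bytes. The function name is hypothetical.

```python
def integration_rates(remote_bytes_per_s, local_per_remote, ops_per_byte):
    """Return (local I/O in bytes/s, compute in operations/s) for a data integration job.

    Assumes each remote byte is combined with `local_per_remote` local bytes
    and that `ops_per_byte` applies to the local bytes read.
    """
    local_io = remote_bytes_per_s * local_per_remote
    compute = local_io * ops_per_byte
    return local_io, compute


GB = 1e9
local_io, compute = integration_rates(
    remote_bytes_per_s=1 * GB,   # remote data at 1 GB/s (from the slide)
    local_per_remote=10,         # 10 local bytes per remote byte (from the slide)
    ops_per_byte=100,            # 100 operations per byte (from the slide)
)

print(f"parallel I/O:         {local_io / GB:.0f} GB/s")
print(f"parallel computation: {compute / 1e9:.0f} Gop/s")
```

This gives the 10 GB/s of parallel I/O and 1000 Gop/s of parallel computation shown in the figure, so a 1 GB/s wide area link is enough to keep a large local cluster busy.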
Performance and Parallelism • Distributed/cluster/parallel file systems • Optimized TCP/IP stacks • Scheduling of computation & communication • Web100 configuration & instrumentation
Web100 Kernel Instrument Set • Definition • Set of instruments designed to collect as much of the information as possible to enable a user to isolate the performance problems of a TCP connection • How it is implemented • Each instrument is a variable in a "stats" structure that is linked through the kernel socket structure • Linux /proc interface is used to expose these instruments outside the kernel
For Example … • Recent transatlantic transfer showed frequent drops in data rate • But no loss or retransmit • Web100 identified problem as Linux send-stall congestion events
Grid/Linux Cooperation: We Have Testbeds, Users, Applications [Figure: map of UK sites (Edinburgh, Glasgow, DL, Newcastle, Belfast, Manchester, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Soton) marked as Tier0/1, Tier2, or Tier3 facilities and connected by 10 Gbps, 2.5 Gbps, 622 Mbps, and other links]
Evolution of the Server [Figure: increased flexibility (and complexity) over time] • Significant implications for the underlying operating system
Summary • The Grid community is creating middleware for distributed resource & service sharing • Open source software for resource & service virtualization, service management/integration • Motivated by wonderful applications • But we need help from the OS • Linux: the next-generation Internet platform? • Could be: but significant evolution is required to address provisioning/resource management; availability, manageability; performance and parallelism; and other issues • Grid community can provide testbeds, users, requirements, applications
For More Information • The Globus Project™ • www.globus.org • Global Grid Forum • www.ggf.org • Background information • www.mcs.anl.gov/~foster • GlobusWORLD 2004 • www.globusworld.org • Jan 20–23, San Francisco • 2nd Edition: November 2003