Distributed Computing Lecture 7: Grid Computing
Tuesday, February 24, 2009 “UNIX was never designed to keep people from doing stupid things, because that policy would also keep them from doing clever things.” • Doug Gwyn
Aggregation of Computing Power: Clustering • High Performance Computing (HPC) environments created using workstations interconnected via high speed networks. • Desirables in clustering • Scalable solutions • Readily available environment for research into parallel computing. • Unused computing cycles can be scavenged providing inexpensive additional computing capacity. • Commodity microprocessor based systems offer enormous cost benefits. • Robust/stable first generation software available.
Disadvantages of Clustering • Cluster is a dedicated facility built at a single location. • Financial, political and technical constraints place limits on the size of clusters. • Generally fall outside the financial limits of individual research groups.
The Grid Problem • Flexible, secure, coordinated resource sharing among dynamic collections of individuals and institutions • Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of… • central location, • central control, • omniscience, • existing trust relationships. • Difference: P2P vs. Grid
The Grid Problem (contd.) • Infrastructure? Framework? Platform? Architecture? • Infrastructure and failure • The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science and engineering. • This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. • A set of individuals and/or institutions defined by such sharing rules forms what we call a virtual organization (VO).
Elements of the Problem • Resource sharing • Computers, storage, sensors, networks, … • Sharing always conditional: issues of trust, policy, negotiation, payment, … • Coordinated problem solving • Beyond client-server: distributed data analysis, computation, collaboration, … • Dynamic, multi-institutional virtual orgs • Community overlays on classic org structures • Large or small, static or dynamic
Broader Context • “Grid Computing” has much in common with major industrial thrusts • Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing… • Sharing issues not adequately addressed by existing technologies • Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q” • High performance: unique demands of advanced & high-performance systems
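The "run program X at site Y subject to community policy P, providing access to data at Z according to policy Q" requirement can be sketched as two independent policy checks that must both pass. This is only an illustration: the `Request` and `Policy` classes, the `authorize` function, and the representation of a policy as a set of (user, target) pairs are all assumptions made here, not part of any Grid toolkit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    user: str
    program: str      # X
    site: str         # Y
    data_source: str  # Z


@dataclass
class Policy:
    """Hypothetical policy: the set of (user, target) pairs that are allowed."""
    allowed: set = field(default_factory=set)

    def permits(self, user: str, target: str) -> bool:
        return (user, target) in self.allowed


def authorize(req: Request, community_policy: Policy, data_policy: Policy) -> bool:
    """Allow the run only if P permits execution and Q permits data access."""
    may_run = community_policy.permits(req.user, f"{req.program}@{req.site}")
    may_read = data_policy.permits(req.user, req.data_source)
    return may_run and may_read
```

The point of the sketch is that P and Q belong to different parties (the community and the data owner), so neither check can stand in for the other.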
Common Characteristics of Grids • Grids are large • Both in terms of number of potentially available resources and their geographical dispersion. • Grids are distributed • Latencies involved in moving data between resources are substantial and may dominate applications. • Grids are dynamic • Available resources change on the same time scale as the life span of a typical application.
Common Characteristics of Grids (Contd.) • Grids are heterogeneous • Form and properties of sites differ in significant ways. • Grids cross the boundaries of human organizations • Policies for access to and use of resources differ at different sites.
Typical Issues in Grids • Resource discovery • Execution planning • Authentication and security • Heterogeneity of compute servers and data formats • Pricing
Grid Architecture • Application • Languages and Frameworks • Collective APIs and SDKs • Collective Services • Resource APIs and SDKs • Resource Services • Connectivity APIs and SDKs • Connectivity Services • Fabric
Grid Services Architecture • Applications: High Energy Physics Data Analysis, Online Instrumentation, Collab Engineering, Climate Studies • Application Toolkits: High Throughput, Remote Control, Collab Design, Remote Visualization, Message Passing, Data Intensive • Grid Services: Information Services, Security, Data Management, Portability, Resource Management, Fault Detection • Grid Fabric: Data Transport, Control Interfaces, Instrumentation, Schedulers, Operating Systems, QoS Services
Fabric • Provides resources to which shared access is mediated by grid protocols. • Fabric components implement the resource specific operations that occur on specific resources as a result of sharing operations at higher layers. [LSF] • At a minimum, resources should implement, • Enquiry mechanisms • Permit discovery of structure, state and capabilities of resources. • Resource management mechanisms • Provide control of delivered quality of service.
Connectivity Layer • Defines core communication and authentication protocols required for Grid-specific network transactions. • Communication requirements include transport, routing and naming. • Authentication protocols are built on communication protocols to provide cryptographically secure mechanisms for verifying the identity of users and resources.
Resource Layer • Builds on connectivity layer protocols to define protocols, APIs and SDKs for the secure negotiation, initiation, monitoring, control, accounting and payment of sharing operations on individual resources. • Resource layer protocols call Fabric layer functions to access and control resources. • Resource layer protocols are chosen to capture the fundamental mechanisms of sharing across many different resource types without constraining the type and performance of higher protocols.
Resource Layer (Contd.) • Primary classes of resource layer protocols • Information protocols • Used to obtain information about the structure and state of a resource, e.g., configuration, current load, usage policy, etc. • Management protocols • Used to negotiate access to a shared resource. • Parameters typically specified • Resource requirements. • Operations to be performed. • Requested protocol operations should be consistent with the policy under which the resource is shared. • Accounting and payment are typical issues.
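The two protocol classes above can be sketched as two methods on a single resource. This is a minimal sketch under stated assumptions: the `Resource` class, its fields, and the CPU-count model of "resource requirements" are hypothetical and chosen only to make the information/management split concrete.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    total_cpus: int
    busy_cpus: int
    allowed_operations: frozenset  # usage policy set by the provider

    def info(self) -> dict:
        """Information protocol: report configuration, current load, usage policy."""
        return {
            "name": self.name,
            "total_cpus": self.total_cpus,
            "free_cpus": self.total_cpus - self.busy_cpus,
            "allowed_operations": sorted(self.allowed_operations),
        }

    def request(self, cpus: int, operation: str) -> bool:
        """Management protocol: grant access only if policy and capacity allow."""
        if operation not in self.allowed_operations:
            return False  # inconsistent with the sharing policy
        if cpus > self.total_cpus - self.busy_cpus:
            return False  # resource requirements cannot be met
        self.busy_cpus += cpus
        return True
```

A request is refused either because the operation violates the provider's policy or because the stated requirements exceed what is currently free; accounting and payment are left out of the sketch.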
Collective Layer • Contains protocols and services (and APIs and SDKs) that are not associated with any one specific resource. • These protocols are global in nature and capture interactions across collections of resources. • They can implement a wide variety of sharing behaviours without placing new requirements on the resources being shared.
Collective Layer (Contd.) • Typical services • Directory services • Co-allocation, scheduling and brokering services • Monitoring and diagnostic services • Data replication services • Grid enabled programming systems • Workload management systems and collaboration frameworks • Software discovery services • Community authorization services • Community accounting and payment services • Collaboratory services
Collective Layer (Contd.) • Unlike Resource layer protocols, Collective layer protocols can vary from being very general to highly application specific. • Collective functions can be implemented as persistent services with associated protocols and SDKs designed to be linked with applications. • For large user communities, Collective layer protocols need to be standards based.
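Co-allocation, one of the typical Collective services, can be illustrated with a small greedy broker that spans several resources. This is a sketch under assumptions: the `directory` mapping (resource name to free CPUs) stands in for whatever a directory service would return, and the greedy largest-first rule is just one simple strategy, not the method of any particular Grid broker.

```python
def co_allocate(directory: dict, cpus_needed: int) -> dict:
    """Greedy co-allocation: draw CPUs from the resources with the most free capacity.

    directory maps a resource name to its number of free CPUs (hypothetical
    output of a directory service). Returns a plan {resource name: CPUs taken}.
    """
    plan = {}
    remaining = cpus_needed
    # Largest free capacity first; ties broken by name for determinism.
    for name, free in sorted(directory.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            plan[name] = take
            remaining -= take
    if remaining > 0:
        raise ValueError("not enough free CPUs across the directory")
    return plan
```

Note that the broker only reads information the resources already publish, which matches the claim that Collective services place no new requirements on the resources being shared.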
Application Layer • User applications. • Constructed in terms of and by calling upon services defined at any layer. • At each layer, well defined protocols provide access to some useful resources. • At each layer, APIs may also be defined whose implementations exchange protocol messages with appropriate services to perform desired actions. • Application layer, in practice, uses sophisticated frameworks and libraries defining protocols, services and APIs.
Application Layer (Contd.) • Application • Languages and Frameworks • Collective APIs and SDKs • Collective Services • Resource APIs and SDKs • Resource Services • Connectivity APIs and SDKs • Connectivity Services • Fabric
Example Grids • GridLab Testbed • Ten thousand machines in Europe for developers of Grid tools • SC2001 ARG Testbed & Global Grid Testbed Collaboration • Hastily assembled loose federation of world machines for SC2001 and SC2002 demonstrations • NCSA Virtual Machine Room and PACI Grid • Production resources • TeraGrid (www.teragrid.org) • USA distributed terascale facility at 4 sites for open scientific research • Information Power Grid (www.ipg.nasa.gov) • NASA's high performance computing grid
Introduction: Grid Computing • The Grid Computing Problem: Coordinated resource sharing and problem solving in dynamic, heterogeneous environments. • Characteristics of current Grid systems • Large-scale • Heterogeneous • Dynamic resource sharing relationship • Pros and Cons • Pros: large-scale, heterogeneity, flexibility • Cons: static availability of resources, infrequent change
Introduction: Mobile Computing • What is mobile computing about? Building a distributed system for a network in which mobile devices and static hosts are connected via wireless links. • Characteristics of mobile computing • Versatile communication (no wire constraints) • Ubiquitous computation • Flexible usability • Pros and Cons • Pros: ubiquity, availability, productivity • Cons: constraints of wireless network • Unpredictable network quality • Lowered trust and robustness • Limited local resources and battery lifetime for mobile devices
Mobile Grid: Grid in mobile environment • Mobile Grid: Sharing both advantages • Powerful computation capability of Grid system • Ubiquitous and flexible availability of mobile system • A scenario: [Figure: forest fire scenario in which firemen connect over wireless links to a computation center, geographic databases, history databases, fire simulation and weather forecast] • Other scenarios: scientific application, commercial business
Exploring Mobile Grid (Outline) • Overview of Grid: Grid architecture • Performance: scheduling scheme, scheduling algorithm • Energy awareness: dynamic power management, computation offloading • Adaptation: disconnected operation, adaptive application • Security: mobile authentication • Address mobility and location independent naming: mobile IP, ad hoc protocols • Distributed, reliable and scalable storage: peer-to-peer resource routing
Scheduling: Application Level Scheduling • Goal of scheduling: maximize application performance. • Application Level Scheduling (AppLeS) • An application-specific approach to building schedulers for parallel applications on heterogeneous systems. • Comprehensive system and application information • Static information • User-specified application parameters • Application performance model • Dynamic information: Network Weather Service • Performance prediction: Network Weather Service • Experience the system from the application's point of view • Run-time scheduling: Information is applied to the application model to estimate application performance and choose an optimal resource allocation from a set of viable configurations. • Goodness: accurate • F. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao, "Application-Level Scheduling on Distributed Heterogeneous Networks." In Proceedings of Supercomputing 96, Pittsburgh, PA, Nov. 1996.
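The run-time scheduling step (apply current forecasts to a performance model, then pick the best of the viable configurations) can be sketched as follows. This is not the AppLeS implementation: the performance model, the `forecast` dictionary standing in for Network Weather Service predictions, and the parameters `work` and `mb_per_unit` are all assumptions made for illustration.

```python
def predict_runtime(config, forecast, work=1000.0, mb_per_unit=0.1):
    """Toy performance model (an assumption, not the AppLeS model).

    Work is split evenly over the hosts in config; each host needs
    compute time plus data-transfer time, and the slowest host
    determines the finish time.
    """
    share = work / len(config)
    return max(
        share / forecast[h]["rate"] + share * mb_per_unit / forecast[h]["bandwidth"]
        for h in config
    )


def choose_schedule(configs, forecast):
    """Pick the viable configuration with the lowest predicted runtime."""
    return min(configs, key=lambda c: predict_runtime(c, forecast))


# Hypothetical forecasts: work units per second and MB per second for each host.
forecast = {
    "a": {"rate": 100.0, "bandwidth": 10.0},
    "b": {"rate": 10.0, "bandwidth": 10.0},
    "c": {"rate": 100.0, "bandwidth": 1.0},
}
best = choose_schedule([("a",), ("a", "b"), ("a", "c")], forecast)
```

With these made-up numbers, adding a slow CPU ("b") or a slow link ("c") makes the parallel configurations worse than running on "a" alone, which is the kind of application-specific judgment a generic scheduler would miss.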
Energy saving • “Energy crisis” of mobile devices • Performance also concerns energy • Energy consumption estimation • Simulation: SimplePower, Wattch • Empirical methods • Ways to save energy • Dynamic power management (DPM) policies: tradeoff between energy and performance • Spin down disks • Turn off screen • Network interface hibernation • Processor voltage scaling • Comprehensive stochastic model • Computation offloading
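The energy/performance tradeoff behind a DPM policy such as spinning down a disk can be made concrete with a break-even calculation. This is a simplified sketch: it assumes a device with just two power levels and a fixed energy cost for the sleep/wake transition, and it ignores the time the transition takes. The function names and parameters are illustrative, not taken from the slides.

```python
def break_even_seconds(p_idle_w: float, p_sleep_w: float, e_transition_j: float) -> float:
    """Idle time beyond which sleeping uses less energy than staying idle.

    Staying idle for t seconds costs p_idle_w * t joules; sleeping costs
    e_transition_j + p_sleep_w * t joules. The two are equal at the value returned.
    """
    if p_idle_w <= p_sleep_w:
        raise ValueError("sleep state must draw less power than idle state")
    return e_transition_j / (p_idle_w - p_sleep_w)


def should_sleep(idle_s: float, p_idle_w: float, p_sleep_w: float, e_transition_j: float) -> bool:
    """True if an idle period of idle_s seconds is long enough to pay for the transition."""
    return idle_s > break_even_seconds(p_idle_w, p_sleep_w, e_transition_j)
```

Since the length of an idle period is not known in advance, a real policy has to predict it, which is where timeouts or stochastic models come in.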
Computation offloading • Scheduling in terms of energy: • Offloading can reduce computation, but communication also consumes energy • Optimize energy consumption by offloading part of computation • Model a program • Task definition: each call site (statically); each invocation (dynamically) • Cost graph • Relationship between tasks and data • Node weight indicating power consumption of computation and communication • Edge weight indicating mean number of times for tasks accessing data • Aggregate the consumption from the cost graph and optimize • Zhiyuan Li, Cheng Wang, Rong Xu, "Computation offloading to save energy on handheld devices: a partition scheme." In Proceedings of the international conference on compilers, architecture, and synthesis for embedded systems, Atlanta, Georgia, USA, 2001.
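The tradeoff between saved computation and added communication can be sketched as a partition search over a small task graph. This is a deliberately simplified stand-in for the cost graph in the cited paper: the energy figures, the `pinned_local` set (tasks that must stay on the handheld), and the brute-force search are all assumptions made for illustration.

```python
from itertools import product


def best_partition(compute_energy, transfer_energy, pinned_local=frozenset()):
    """Brute-force search for the task placement with the lowest handheld energy.

    compute_energy:  task -> joules spent if the task runs on the handheld
    transfer_energy: (task_a, task_b) -> joules spent shipping shared data
                     when the two tasks run on different sides
    pinned_local:    tasks that may not be offloaded (e.g. user interface)
    Returns (energy, set of offloaded tasks).
    """
    tasks = sorted(compute_energy)
    best = None
    for placement in product((False, True), repeat=len(tasks)):
        offloaded = {t for t, off in zip(tasks, placement) if off}
        if offloaded & set(pinned_local):
            continue  # placement violates the pinning constraint
        # Computation energy for tasks kept on the handheld.
        energy = sum(e for t, e in compute_energy.items() if t not in offloaded)
        # Communication energy for every data edge that crosses the partition.
        energy += sum(
            e for (a, b), e in transfer_energy.items()
            if (a in offloaded) != (b in offloaded)
        )
        if best is None or energy < best[0]:
            best = (energy, offloaded)
    return best
```

For example, with hypothetical costs `{"ui": 1.0, "decode": 5.0, "filter": 2.0}`, transfer costs `{("ui", "decode"): 1.5, ("decode", "filter"): 4.0, ("ui", "filter"): 0.5}` and `"ui"` pinned, offloading only `decode` is worse than offloading nothing, because the expensive `decode`/`filter` data edge then crosses the partition; offloading both is best. Exhaustive search is exponential in the number of tasks, so it only suits toy graphs like this one.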
Disconnected operation • Another factor affecting performance: unpredictable network link quality • Solution: adaptation [application level adaptation] • Disconnected operation in Coda file system • Definition: a mode of operation that enables a client to continue accessing critical data during temporary failures of a shared data repository. • Solution: proxy + cache • Venus: client-side proxy • Three working states • Hoarding • Emulation • Reintegration • [Figure: state diagram. Hoarding → Emulation on disconnection; Emulation → Reintegration on physical reconnection; Reintegration → Hoarding on logical reconnection] • James J. Kistler, M. Satyanarayanan, "Disconnected Operation in the Coda File System." ACM Transactions on Computer Systems, Feb. 1992, Vol. 10, No. 1, pp. 3-25.
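The three working states and the transitions named on this slide (disconnection, physical reconnection, logical reconnection) form a small state machine, sketched below. The `State` enum, the event strings, and the choice to ignore events that do not apply in the current state are assumptions of this sketch, not details of the Venus implementation.

```python
from enum import Enum


class State(Enum):
    HOARDING = "hoarding"            # connected: cache data likely to be needed
    EMULATION = "emulation"          # disconnected: serve requests from the cache
    REINTEGRATION = "reintegration"  # reconnected: replay local updates to servers


# Transitions as labelled on the slide's state diagram.
TRANSITIONS = {
    (State.HOARDING, "disconnection"): State.EMULATION,
    (State.EMULATION, "physical_reconnection"): State.REINTEGRATION,
    (State.REINTEGRATION, "logical_reconnection"): State.HOARDING,
}


def step(state: State, event: str) -> State:
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Starting from `State.HOARDING`, the events `disconnection`, `physical_reconnection` and `logical_reconnection` applied in that order walk the full cycle back to `State.HOARDING`.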
Mobile security • Difficulties of security in wireless mobile environment • Inherent vulnerability of wireless media • Performance impact! • Charon: indirect authentication using Kerberos • Extend Kerberos by inserting a remote proxy (again!!) between client and other servers • Secure channel is built by first granting the proxy service to client • Proxy interacts with other servers on client’s behalf • Client can be very small: only needs DES encryption/decryption • No compromise of security: • The communication between client and proxy is encrypted • Proxy believes the identity of the user • Proxy does not possess client’s session key and private key • Armando Fox, Steven D. Gribble, "Security on the move: indirect authentication using Kerberos." In Proceedings of the second annual international conference on Mobile computing and networking (MobiCom'96), Rye, New York, United States, 1996.
Conclusions • Incorporating mobility into Grid architecture is necessary and beneficial. • Problems arise since meaning of performance is extended • Computational performance: scheduling • Energy: power management and offloading • Unstable network: adaptation • Security • Addressing and naming • Scalability & Reliability • A lot can be borrowed from other research areas, but they should be put into a real Mobile Grid framework for inspection. • Future focus: comprehensive scheduling
Recommended Reading • Anatomy of the Grid • Physiology of the Grid