CSS434 Grid Computing
Textbook: no corresponding chapters
Professor: Munehiro Fukuda
A portion of these slides was compiled from The Grid: Blueprint for a New Computing Infrastructure.

High-Speed Information Highway: Network Infrastructure
• Users first log in to their own organization's systems, locally or remotely.
• If they are affiliated with other organizations:
  • They can log in from their main system to those other systems, which gives them the opportunity to use those resources in parallel.
• Problems:
  • Users must orchestrate job execution among the resources they use.
  • Should those resources be limited to such a small number of researchers?

Purposes of a Computational Grid
• Use computing resources connected to the high-speed information highway the way we use the electric power grid.
  • Utilization is only 30% in academic and commercial environments.
  • Many applications have only episodic requirements, so why not share computing resources?
  • Computational results and data should also be made available to all users.
• Users:
  • Computational scientists and engineers
  • Experimental scientists
  • Associations and corporations
  • Training and education
  • Consumers (e-commerce)

Grid Applications

Grid Services Architecture (from a www.globus.org slide)
• Applications: high-energy physics data analysis, collaborative engineering, on-line instrumentation, regional climate studies, parameter studies
• Application Toolkit Layer: distributed computing, collaborative design, remote control, data-intensive, remote visualization, ...
• Grid Services Layer: information, resource management, security, data access, fault detection, ...
• Grid Fabric Layer: transport, multicast, instrumentation, control interfaces, QoS mechanisms, ...

Programming Model: Uniform Access
• Paradigm
  • Bag of tasks or master-worker (Condor-MW)
  • Client-server (NetSolve)
  • Object-oriented (Legion)
  • Synchronous applications (not suited for massively parallel computation)
• Language support
  • MPI-G: message passing (Globus)
  • OpenMP: shared memory
  • Math library: remote procedure calls (NetSolve)

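
To make the bag-of-tasks (master-worker) paradigm concrete, here is a minimal single-machine sketch. It is not Condor-MW's API: `run_bag_of_tasks` and `square` are hypothetical names, and a local thread pool stands in for the remote workers a grid would provide.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bag_of_tasks(tasks, worker_fn, num_workers=4):
    """Master: hand independent tasks to a pool of workers and gather results.

    Hypothetical helper for illustration; threads stand in for grid workers.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in task order, even though the workers
        # may finish in any order.
        return list(pool.map(worker_fn, tasks))


def square(x):
    """Worker: one independent unit of work with no inter-task communication."""
    return x * x


if __name__ == "__main__":
    print(run_bag_of_tasks(range(5), square))
```

The tasks never talk to each other, which is why this paradigm tolerates slow or distant workers, while tightly synchronized applications do not.
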
Resource Management: Discovery, Allocation, and Scheduling
• Centralized resource manager
  • +: easy to manage
  • –: a bottleneck
• Decentralized resource manager
  • A collection of centralized managers (Condor's gateway flocking)
  • A combination of meta and local schedulers

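
The two designs can be sketched as follows. This is an illustration only, not how Condor or any other system implements them: `CentralManager`, `MetaScheduler`, and the "most idle CPUs first" rule are assumptions chosen for brevity.

```python
class CentralManager:
    """Centralized resource manager for one site (hypothetical sketch)."""

    def __init__(self):
        self.free_cpus = {}  # host name -> number of idle CPUs

    def advertise(self, host, cpus):
        """Discovery: a resource reports its idle CPUs to the manager."""
        self.free_cpus[host] = cpus

    def allocate(self, cpus_needed):
        """Allocation: pick the host with the most idle CPUs, or None."""
        candidates = [h for h, c in self.free_cpus.items() if c >= cpus_needed]
        if not candidates:
            return None
        host = max(candidates, key=lambda h: self.free_cpus[h])
        self.free_cpus[host] -= cpus_needed
        return host


class MetaScheduler:
    """Meta scheduler that forwards a request to per-site local managers."""

    def __init__(self, sites):
        self.sites = sites  # site name -> CentralManager

    def allocate(self, cpus_needed):
        # Try each site in turn; the first site that can serve the request wins.
        for site, manager in self.sites.items():
            host = manager.allocate(cpus_needed)
            if host is not None:
                return site, host
        return None
```

Every request to one site passes through its single `CentralManager`, which is the bottleneck noted above; the `MetaScheduler` spreads requests over several such managers.
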
Fault Tolerance
• Checkpointing
  • At the master (Condor)
  • At each node, but collected at the master (Catalina)
  • Using a whiteboard (Optimal Grid)
• Re-execution of failed worker jobs from the beginning (Bayanihan, Optimal Grid)
• Error codes (NetSolve)
  • The user is responsible for handling errors.

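
A rough sketch of two of these ideas combined: the master keeps a checkpoint of finished tasks and re-executes a failed worker job from the beginning. All names here (`run_with_reexecution`, `make_flaky_worker`, `max_attempts`) are hypothetical, and a `RuntimeError` stands in for a worker crash; none of this is taken from Condor, Bayanihan, or Optimal Grid.

```python
def run_with_reexecution(tasks, worker_fn, checkpoint=None, max_attempts=3):
    """Master loop with a checkpoint of completed tasks and retry on failure.

    checkpoint: dict of task -> result saved by an earlier run (may be None).
    """
    results = dict(checkpoint or {})
    for task in tasks:
        if task in results:
            continue  # already completed before the restart; skip it
        for attempt in range(1, max_attempts + 1):
            try:
                # Re-execution starts the job over from the beginning.
                results[task] = worker_fn(task)
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up and report the failure to the user
    return results


def make_flaky_worker():
    """Build a worker that crashes the first time it sees each task."""
    seen = set()

    def worker(task):
        if task not in seen:
            seen.add(task)
            raise RuntimeError(f"worker crashed on task {task}")
        return task * 10

    return worker
```

With `max_attempts=1` the function behaves like the error-code approach: the failure surfaces immediately and the caller has to handle it.
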
Security
• Resources covered with security layers
  • Legion (Message/MayI layers)
  • Entropia (intercepts all system calls)
• Use of commodity tools
  • SSL
  • Public keys
  • Security certificates
  • Java sandbox
  • Kerberos

NetSolve (http://icl.cs.utk.edu/netsolve/)
• RPC-based approach
• Clients
  • Include a set of APIs that are called as (asynchronous) RPCs
• Agents
  • Match clients' requests for services with servers
• Servers
  • Encapsulate remotely accessed numerical libraries
[Figure: a client, an agent, and a network of servers (scalar server, MPP servers); arrows labeled request, choice, and reply]

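
The client/agent/server roles can be sketched in a single process as below. This does not use NetSolve's real API: `Server`, `Agent`, `client_call`, and the "least-loaded server" rule are assumptions for illustration, and plain function calls stand in for RPCs.

```python
class Server:
    """Server wrapping a numerical library (hypothetical sketch)."""

    def __init__(self, name, load, library):
        self.name = name
        self.load = load        # assumed load metric; lower means less busy
        self.library = library  # service name -> callable

    def call(self, service, *args):
        return self.library[service](*args)


class Agent:
    """Agent that matches a client's service request with a server."""

    def __init__(self):
        self.servers = []

    def register(self, server):
        self.servers.append(server)

    def choose(self, service):
        candidates = [s for s in self.servers if service in s.library]
        if not candidates:
            raise LookupError(f"no server offers {service!r}")
        # Assumed policy: pick the least-loaded server offering the service.
        return min(candidates, key=lambda s: s.load)


def client_call(agent, service, *args):
    """Client side: ask the agent for a server, then invoke the service."""
    server = agent.choose(service)
    return server.call(service, *args)


def dot(xs, ys):
    """Example library routine: dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(xs, ys))
```

For example, after registering a busy scalar server and an idle MPP server that both offer `"dot"`, `client_call(agent, "dot", [1, 2, 3], [4, 5, 6])` is routed to the idle one.
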
Legion (http://legion.virginia.edu/)
• Legion classes
  • Act as managers and make policy
• Core objects
  • Provide the mechanisms that classes use to implement policies: hosts (processors), vaults (memory), contexts, binding agents, etc.
• Per-program scheduling
  • Participating sites can enforce their local policies.
  • Users can choose a scheduling policy.
[Figure: a program's request, the scheduler, the enactor, the resource database (collection), and the resources (classes, hosts, ttys); arrows labeled search and reserve. Names are converted to Legion object IDs by context objects, and to Legion object addresses by binding agents.]

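
The two-step name resolution described for Legion (context objects give an object ID, binding agents give an object address) can be pictured as two table lookups. The sketch below uses plain dictionaries; the function name, the example ID, and the example address are all made up for illustration and do not reflect Legion's actual identifier formats.

```python
def resolve(name, context, bindings):
    """Resolve a human-readable name to an object address in two steps.

    context:  name -> object ID         (role of a context object)
    bindings: object ID -> address      (role of a binding agent)
    """
    object_id = context[name]   # step 1: context lookup
    return bindings[object_id]  # step 2: binding lookup


if __name__ == "__main__":
    # Hypothetical values, not real Legion identifiers.
    context = {"/hosts/node1": "id-0001"}
    bindings = {"id-0001": ("node1.example.org", 5000)}
    print(resolve("/hosts/node1", context, bindings))
```

Keeping the ID separate from the address means an object can move to another host by updating only its binding, while names and IDs stay valid.
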
Condor (http://www.cs.wisc.edu/condor/)
[Figure legend]
• A: the user's local agent
• R: each computer resource
• M: the central manager
• I/O is forwarded to the user's home machine.

AgentTeamwork at UWB: Architecture

Paper Review by Students
• Systems
  • Globus
  • Legion
  • Condor
  • NetSolve
• Discussions
  • What programming or execution model is each system based on?
  • What resource allocation and scheduling algorithm does each system use?
  • Are they fault-tolerant?
  • Do they have any special security features of their own?