GridSolve is a grid-based software-hardware-data server designed to provide transparent access to resources with dynamic problem-solving capabilities, load balancing, fault tolerance, and security. It enables easy wrapping of legacy codes into services and offers intuitive APIs. Leveraging the GridRPC standard, GridSolve allows seamless interaction with various clients for efficient resource utilization. This system also includes an agent for resource discovery and scheduling, enhancing performance prediction and optimization.
GridSolve: A Network Enabled Solver
Asim YarKhan and Jack Dongarra
University of Tennessee

GridSolve
• Grid based software-hardware-data server
• Based on a Remote Procedure Call model but with …
  • resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronous calls, security, …
• Ease of use is paramount
  • It’s about providing transparent access to resources.
  • Make it easy to wrap legacy codes into services
• Evolution of the successful NetSolve project

GridSolve Architecture
• Example client call: [x,y,z,info] = gridsolve('dgesv', A, B)
• GridSolve clients: Matlab, C, Fortran [NetSolve clients: Java, Mathematica, Excel, IDL, Octave]
• Agent: resource discovery, scheduling, load balancing, fault tolerance
[Diagram: Client, Agent, and servers (single processor, batch queue, cluster). Arrow labels: request, server list, data, result.]

GridSolve Client
• Dynamic service bindings
  • Client does not need to have stubs for the services that it wishes to use
• Opaque networking interactions.
• API provides a variety of methods
  • Blocking, non-blocking, task farms, …
• Intuitive and easy to use.
• Matlab: Solve using dgesv
    [x,y,z,info]=gs_call('dgesv',m,1,a,m,b,m)
• C: Call dgesv using GridRPC
    grpc_initialize();
    grpc_function_handle_default(&handle, "dgesv");
    status = grpc_call(&handle, n, nrhs, a, lda, ipiv, b, ldb, &info);

GridRPC – Grid Remote Procedure Call
• GGF proposed standard
  • Global Grid Forum Research Group on Programming Models
  • Implementations: Ninf-G (AIST), GridSolve/NetSolve (UTK), DIET (INRIA, ENS)
• GridRPC API
  • grpc_initialize, grpc_finalize
  • Function handle create, initialize, destroy, get
  • grpc_call blocking, grpc_call_async non-blocking
  • grpc_probe, cancel, wait, wait_and/or/any
• GridSolve uses GridRPC as primary API
  • Older NetSolve API available as wrapper
  • Added calls based on GridRPC API to support fault tolerance, dynamic scheduling, …

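The difference between the blocking and non-blocking call styles listed above can be illustrated with a local analogy. This is a sketch that uses Python's standard concurrent.futures module, not GridRPC itself; solve_remote, call_blocking and call_farm are hypothetical names introduced only for illustration, and no network is involved.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def solve_remote(a, b):
    """Hypothetical stand-in for a remote service: solves a*x = b for scalars."""
    return b / a


def call_blocking(a, b):
    # Analogous to a blocking call: control returns only with the result.
    return solve_remote(a, b)


def call_farm(problems, max_workers=4):
    # Analogous to issuing one non-blocking call per problem (a task farm),
    # keeping a handle per request, then waiting on all of the handles.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sessions = [pool.submit(solve_remote, a, b) for a, b in problems]
        wait(sessions)  # block until every outstanding request has finished
        return [s.result() for s in sessions]
```

In this analogy the list of futures plays the role of the session handles that a client would later probe, cancel, or wait on.
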
GridSolve Agent
• Agent acts as name server and information service
  • Client users and administrators can query the hardware and software services available.
• Interactions mediated by agent
  • Scheduling, tracking, server fault tolerance, etc.
• Resource scheduler
  • Maintains both static and dynamic information regarding server components
  • Can use execution history to build performance models for services
  • Can simulate multi-service executions to predict best server

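As a rough illustration of how a scheduler might combine static and dynamic server information to rank servers, consider the sketch below. It is not GridSolve's actual scheduler: the Server fields, the load-scaling rule, and the cost formula (compute time plus transfer time) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    flops_per_second: float       # static information (hypothetical field)
    load: float                   # dynamic information: busy fraction, 0 <= load < 1
    bandwidth_bytes_per_s: float  # assumed client-to-server bandwidth


def predicted_seconds(server, flops, data_bytes):
    # Assumed model: only the idle share of the server is available for compute.
    compute = flops / (server.flops_per_second * (1.0 - server.load))
    transfer = data_bytes / server.bandwidth_bytes_per_s
    return compute + transfer


def rank_servers(servers, flops, data_bytes):
    # Best (lowest predicted time) first, like a server list returned to a client.
    return sorted(servers, key=lambda s: predicted_seconds(s, flops, data_bytes))
```

With this model a fast but heavily loaded server can rank below a slower idle one, which is the kind of trade-off that dynamic information makes visible.
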
Adding Services to GridSolve
[Diagram labels: Server, Service (×4), New Service, GSIDL Parser/Compiler, "New Service Added!", Fortran]
  ROUTINE dgesv(IN int N, IN int NRHS, INOUT double A[LDA][N], IN int LDA,
                OUT int IPIV[N], INOUT double B[LDB][NRHS], IN int LDB, OUT int INFO)
  "Solves a general system of linear equations AX = B"
  LIBS = "/usr/local/lib/liblapack.a /usr/local/lib/libf77blas.a /usr/local/lib/libatlas.a"
  LANGUAGE = "FORTRAN"
  LIBS = "$(LAPACK_LIBS) $(BLAS_LIBS)"
  COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS"
  MAJOR="COLUMN"

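The COMPLEXITY expression in the service description above can be evaluated directly. The sketch below transcribes it into Python and shows one plausible use, assuming an estimate is formed by dividing the operation count by a server's sustained rate; the function names and the flops_per_second parameter are hypothetical, and the slides do not say exactly how GridSolve consumes the expression.

```python
def dgesv_complexity(n, nrhs):
    # Direct transcription of COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS".
    return 2.0 * pow(n, 3.0) * float(nrhs)


def estimated_seconds(n, nrhs, flops_per_second):
    # Assumption: run time is roughly operation count / sustained rate.
    return dgesv_complexity(n, nrhs) / flops_per_second
```

For example, N = 1000 and NRHS = 1 give 2.0 × 10^9 operations, which at an assumed sustained rate of 10^9 operations per second corresponds to about two seconds.
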
GridSolve Backends
• Scripts encapsulate service management for PBS, MS Compute Cluster (job submit, probe, cancel)
• User may be unaware of parallel processing
[Diagram: GridSolve Client; GridSolve System (Agent, three Servers); backends: MS Compute Cluster, PBS [Condor, ScaLAPACK, LFC, etc.]]

Distributed Storage Infrastructure
• Client optionally pushes argument data to DSI
• DSI API currently instantiated using IBP (Internet Backplane Protocol)
[Diagram: client, DSI data caches, server cluster]

GridSolve: Benefits
• Domain scientists use SCEs (scientific computing environments)
  • GridSolve provides the ability for SCEs to easily access and use grid resources [Ease of use!]
• Libraries
  • GridSolve can provide easy access to high performance libraries, so that end users do not have to install them
• Scheduling
  • GridSolve can choose the software/hardware resource appropriate for the problem
• Resource Aggregation
  • GridSolve agent provides a single access point for multiple resources/clusters

GridSolve In-Progress
• Scheduling work
  • Work with Emmanuel Jeannot, INRIA
  • History based performance estimation
    • Adds more accurate server/service performance model based on prior history
  • Communication cost estimates
    • Client estimates communication costs for a subset of servers via a simple probe
  • Perturbation model for scheduling
    • The agent uses a model of the currently executing jobs on the servers to schedule jobs (includes estimated completion times)
• Client interfaces
  • IDL – Interactive Data Language

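History based performance estimation can be sketched as fitting a simple model to earlier runs. The example below assumes run time is proportional to the operation count and fits the proportionality constant by least squares through the origin; the model, the function names, and the shape of the history records are assumptions for illustration, not the method used in GridSolve.

```python
def fit_seconds_per_flop(history):
    """history: list of (flops, observed_seconds) pairs from earlier runs."""
    # Least squares through the origin: c = sum(f*t) / sum(f*f).
    num = sum(f * t for f, t in history)
    den = sum(f * f for f, _ in history)
    if den == 0:
        raise ValueError("history needs at least one run with nonzero flops")
    return num / den


def predict_seconds(history, flops):
    # Predicted run time for a new request on the same server and service.
    return fit_seconds_per_flop(history) * flops
```

Each server and service pair would keep its own history, so the fitted constant reflects how that service actually performed there rather than a nominal machine rating.
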
GridSolve Status
• Version 0.15 (Sept 2006)
  • http://icl.cs.utk.edu/netsolve
• Supported Platforms
  • Linux, Solaris, BSDs, MacOS X
  • Should work in most POSIX environments
  • Windows native client (MSVC)
• Windows Compute Cluster backend
• PBS backend

Contacts
Asim YarKhan and Jack Dongarra
University of Tennessee