General Purpose Grid Computing. LCA.
Specification
• The system will provide a multi-threaded, shared memory environment that is distributed across a loosely coupled network of peers (a grid).
• System features include:
  • A tracker application
  • A long running grid client
  • A thread library
  • Sample applications using the thread library and system infrastructure
Tracker
• A web service which maintains a regularly updated list of active peers on the grid.
• This list is made available to grid clients upon request.
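The tracker's bookkeeping can be sketched as a registry that forgets peers which stop announcing themselves. This is a minimal illustration only: Python is used for brevity (the system itself targets .NET), and the `Tracker` class, the `announce` and `active_peers` methods, and the 30-second expiry window are all assumptions, not part of the specification. The web service layer is omitted.

```python
import time


class Tracker:
    """Toy in-memory peer registry (hypothetical; the real tracker is a web service)."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # assumed expiry window, not from the slides
        self._clock = clock             # injectable so the expiry logic can be tested
        self._last_seen = {}            # peer address -> time of last announcement

    def announce(self, peer_address):
        """Called periodically by a grid client to stay on the active list."""
        self._last_seen[peer_address] = self._clock()

    def active_peers(self):
        """Return peers that announced within the TTL, dropping stale entries."""
        now = self._clock()
        self._last_seen = {
            peer: seen
            for peer, seen in self._last_seen.items()
            if now - seen <= self.ttl_seconds
        }
        return sorted(self._last_seen)
```

A grid client would call `announce` on a timer and an Executive Peer would call `active_peers` when it runs out of local and cached workers.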
Long Running Grid Client
• Provides execution service for applications written against the thread library (Executive Role).
• Executes jobs in a secure environment, allowing code from untrusted sources to run (Worker Role).
• A GUI provides status information.
Thread Library
• Provides the primary interface for developers to utilize the grid in a manner similar to native thread libraries.
• The functions provided by the library include:
  • Init()
  • CreateThread()
  • StartThread()
  • Lock()
  • Join()
Application
• Compiled against the API.
• Uses Remote Procedure Calls (RPC) to communicate with the Executive Peer.
• Calls Init() to establish communication with the Executive Peer.
• Calls CreateThread() in the API, which invokes the Executive Peer.
• Blocks until the thread is created at an available worker node.
• Calls Join(), which blocks until all outstanding grid threads complete, or throws an exception if communication with a thread is lost.
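The call sequence an application follows can be illustrated with a stand-in library that runs "grid threads" as ordinary local threads. This is a sketch under stated assumptions: Python is used for brevity rather than .NET, the `LocalGrid` class is hypothetical, and the method signatures are guesses that only mirror the function names listed on the Thread Library slide. No RPC or Executive Peer is involved.

```python
import threading


class LocalGrid:
    """Hypothetical stand-in for the thread library; runs jobs as local threads."""

    def __init__(self):
        self._threads = {}
        self._started = []
        self._next_id = 0
        self._lock = threading.Lock()

    def init(self):
        # The real Init() would open an RPC channel to the Executive Peer.
        self._threads.clear()
        self._started.clear()

    def create_thread(self, target, *args):
        # The real CreateThread() blocks until a worker node accepts the job.
        thread_id = self._next_id
        self._next_id += 1
        self._threads[thread_id] = threading.Thread(target=target, args=args)
        return thread_id

    def start_thread(self, thread_id):
        thread = self._threads[thread_id]
        thread.start()
        self._started.append(thread)

    def lock(self):
        # Returns a context manager guarding shared state.
        return self._lock

    def join(self):
        # Wait for every started thread to finish.
        for thread in self._started:
            thread.join()


def run_demo():
    """Sum 1..4 using one 'grid thread' per number."""
    grid = LocalGrid()
    grid.init()
    totals = {"sum": 0}

    def add(n):
        with grid.lock():
            totals["sum"] += n

    ids = [grid.create_thread(add, n) for n in range(1, 5)]
    for thread_id in ids:
        grid.start_thread(thread_id)
    grid.join()
    return totals["sum"]
```

Calling `run_demo()` walks through the same order of calls the slide describes: initialize, create, start, then join.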
Executive Peer
• When the Executive Peer receives a CreateThread() request, it searches for an available node in the following order:
  • The local Worker Peer.
  • Recently used remote Worker Peers whose connection details are cached.
  • Peers from a list of available peers requested from the tracker.
• Once a Worker Peer accepts a job, the assembly is transferred to the remote peer along with the entry point.
• The thread is then created and the threadId is returned.
• After receiving a start request with a threadId, it forwards the request to the appropriate Worker Peer.
• Once communication is established with a Worker Peer, a status request is sent periodically.
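The three-tier search order above can be sketched as a lazy candidate list, so that the tracker is contacted only when the local and cached workers have all declined. The function name `find_worker` and its parameters are illustrative assumptions (and Python stands in for .NET); `request_execution` represents whatever call asks a peer to accept a job and is assumed to return True or False.

```python
def find_worker(local_worker, cached_workers, fetch_tracker_peers, request_execution):
    """Return the first peer that accepts the job, or None if every peer declines.

    Order follows the slide: local worker, cached remote workers, tracker list.
    """
    tried = set()

    def candidates():
        yield local_worker
        yield from cached_workers
        # Reached only if every earlier candidate declined, so the tracker
        # is not queried unnecessarily.
        yield from fetch_tracker_peers()

    for peer in candidates():
        if peer in tried:
            continue  # a tracker entry may repeat a cached peer
        tried.add(peer)
        if request_execution(peer):
            return peer
    return None
```

Using a generator keeps the ordering explicit in one place and defers the tracker request until it is actually needed.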
Worker Peer
• When a requestExecution() is received, the peer reserves a spot in the execution queue if it is available.
• Waits for the Executive to send the assembly and the delegate to invoke.
• When an invokeThread() is received, the following events occur:
  • The assembly is loaded into a secure (sandboxed) .NET Application Domain.
  • .NET Reflection is used to locate the starting point of the thread.
  • The type which contains the thread start method is instantiated and a proxy object created.
  • The thread library opens a communication channel to the local peer.
  • The delegate is then invoked in the AppDomain.
• Periodically responds to status requests from the Executive Peer.
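The reserve-then-invoke sequence can be sketched as follows. This is a loose analogy only, written in Python rather than .NET: a dict of classes stands in for the transferred assembly, `getattr` stands in for .NET Reflection, and the sandboxed Application Domain, proxy object, and communication channel are not modeled. The `WorkerPeer` class, its `capacity` parameter, and the method signatures are assumptions.

```python
class WorkerPeer:
    """Hypothetical worker that reserves a slot, then invokes an entry point by name."""

    def __init__(self, capacity=1):
        self.capacity = capacity  # assumed size of the execution queue
        self.reserved = 0

    def request_execution(self):
        """Reserve a slot if one is free; the return value tells the Executive yes or no."""
        if self.reserved >= self.capacity:
            return False
        self.reserved += 1
        return True

    def invoke_thread(self, assembly, type_name, method_name, *args):
        """Locate the type and start method by name, instantiate, and invoke.

        `assembly` is a dict of name -> class, standing in for a loaded assembly.
        """
        if self.reserved == 0:
            raise RuntimeError("invoke_thread called without a reservation")
        try:
            cls = assembly[type_name]              # locate the containing type
            instance = cls()                       # instantiate it
            entry = getattr(instance, method_name) # locate the thread start method
            return entry(*args)
        finally:
            self.reserved -= 1                     # free the slot when the job ends
```

Releasing the slot in `finally` means a job that raises still frees its place in the queue.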
Thread Library
• Based on the NGrid interface.
• Every object on the grid inherits from GObject.
• The thread package provides basic threading functions.
[Figure: Activity Flow for CreateThread(). Lanes: Application, API, ExecutivePeer, WorkerPeer, separated by a process boundary and a network boundary. Messages shown: CreateThread(…), CreateThread(), requestExecution() (No / Yes), Return ThreadId, Return GThread.]
[Figure: Activity Flow for Join(). Lanes: Application, API, ExecutivePeer, WorkerPeer, separated by a process boundary and a network boundary. Messages shown: Join(), RegisterCallback, Polling, Block, Complete, Callback (done), return. Error case: Timeout, Notify(), NotifyRunning(), Exception, Kill().]
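The polling and timeout path of the Join() flow can be sketched as a loop that waits for every thread to report completion and raises if a thread stops answering. Everything here is an assumption for illustration, written in Python rather than .NET: the `join_all` function, the `ThreadLostError` exception, the status strings, and the timeout and interval values are not taken from the slides, and the callback registration shown in the diagram is not modeled.

```python
import time


class ThreadLostError(Exception):
    """Raised when a grid thread stops answering status polls (hypothetical)."""


def join_all(thread_ids, poll_status, timeout_seconds=5.0, interval_seconds=0.01,
             sleep=time.sleep, clock=time.monotonic):
    """Block until every thread reports "done".

    poll_status(thread_id) is assumed to return "running", "done",
    or None when no reply was received.
    """
    pending = set(thread_ids)
    last_reply = {tid: clock() for tid in pending}
    while pending:
        for tid in sorted(pending):  # sorted() copies, so discarding is safe
            status = poll_status(tid)
            now = clock()
            if status == "done":
                pending.discard(tid)
            elif status is None:
                # No reply: give up once the thread has been silent too long.
                if now - last_reply[tid] > timeout_seconds:
                    raise ThreadLostError(tid)
            else:
                last_reply[tid] = now
        if pending:
            sleep(interval_seconds)
```

The `sleep` and `clock` parameters exist only so the timing behavior can be exercised without real delays.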
Team Structure
• We have formed four teams based on the main components of the project.
Milestones
• Friday 4/21 - Teams begin coding assigned components.
• Monday 5/1 - Completed coding of beta components.
• Tuesday 5/9 - Beta release due.
• Tuesday 5/30 - Final release due.
• Tuesday 5/30 - 10:01pm - Celebrate.
Risks
• Running an application on the grid turns out to be slower than running it on a single machine.
• The scope of the project is too large to be completed in the given schedule.
• The scope of the project is too limited to be useful.
Testing
• System failure cases:
  • Worker Peer loses connection: the Executive Peer will recognize the event and handle it gracefully.
  • Executive Peer loses connection: the worker only needs to recognize the event and kill any processes it is running for that Executive.
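The second failure case suggests that a worker needs to know which jobs belong to which Executive so it can kill them as a group. A minimal sketch of that bookkeeping follows, in Python rather than .NET; the `JobTable` class and its methods are hypothetical, and how a lost connection is detected (for example, missed status requests) is left outside the sketch.

```python
class JobTable:
    """Hypothetical per-worker table mapping each Executive to its running jobs."""

    def __init__(self):
        self._jobs = {}  # executive id -> list of kill callbacks

    def register(self, executive_id, kill):
        """Record a running job and the callback that terminates it."""
        self._jobs.setdefault(executive_id, []).append(kill)

    def executive_lost(self, executive_id):
        """Kill every job owned by the lost Executive; return how many were killed."""
        kills = self._jobs.pop(executive_id, [])
        for kill in kills:
            kill()
        return len(kills)

    def active_executives(self):
        """Executives that still have jobs registered on this worker."""
        return sorted(self._jobs)
```

A test for this failure case could register a few jobs, simulate the lost connection by calling `executive_lost`, and check that only that Executive's jobs were terminated.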
Documentation
• System documentation will include detailed API specification as well as sample code and sample applications.
• The sample code and applications will be commented such that they can act as tutorials for a programmer coding against the API.