Installation stage (Front-end node and Back-end node)
Components: Authenticator (as root), Job Scheduler, Job front-end and Job back-end (one pair as User1, one pair as User2)
Steps for each user: 1. Request, 2. Spawn (twice), 3. Conversation between the Job front-end and the Job back-end
Frontend node and Backend node (Job Frontend Process and Job Backend Process running as A)
Daemons: Job Launcher Daemon, Job Scheduler Daemon
First time: 1. SSH, 2. Spawn, 3. Conversation
Second time: 1. MAC message, 2. Spawn, 3. Conversation
Messages between the Job frontend process and the Job Scheduler Daemon
Nn = Number of nodes, Np = Number of processes, Nt = Threads per process
Request Job (username, Nn, Np, Nt)
Cancel Job (errmsg): if insufficient resources
Assign Job number (errmsg): if sufficient resources
Assign required number of backend nodes. If a backend daemon is already running on the backend node, send “Assign Backend Daemon”; else send “Create Backend Daemon”.
Assign Backend Daemon
Create Backend Daemon
Renew lease (): periodically while job is in progress
Renew lease (): periodically while job is in progress
Job finished (): when job finishes
Cancel Job (errmsg): if user cancels job
Cancel Job (errmsg): if error in Job Scheduler, or admin cancels job
Backend failed (name): if frontend detects backend has failed
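The scheduler's reaction to a job request can be sketched roughly as follows. This is a minimal illustration, not the Parallel Java implementation: the class `SchedulerSketch`, its methods, and the reply payloads (a job number and a node name) are assumptions, and only the node count Nn is used to decide whether resources suffice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reply to "Request Job (username, Nn, Np, Nt)".
// All names and payloads here are illustrative assumptions.
class SchedulerSketch {
    // Idle backend node name -> true if a backend daemon is already running there.
    private final Map<String, Boolean> idleNodes = new LinkedHashMap<>();
    private int nextJobNumber = 1;

    void addIdleNode(String name, boolean daemonRunning) {
        idleNodes.put(name, daemonRunning);
    }

    // Returns the messages the scheduler would send for one request.
    // np and nt are accepted but not used by this simplified resource check.
    List<String> requestJob(String username, int nn, int np, int nt) {
        List<String> replies = new ArrayList<>();
        if (nn > idleNodes.size()) {
            replies.add("Cancel Job (insufficient resources)");
            return replies;
        }
        replies.add("Assign Job number (" + nextJobNumber + ")");
        nextJobNumber++;
        // Copy the key set so nodes can be removed from the map while iterating.
        List<String> chosen = new ArrayList<>(idleNodes.keySet()).subList(0, nn);
        for (String node : chosen) {
            boolean daemonRunning = idleNodes.remove(node);
            replies.add((daemonRunning ? "Assign" : "Create") + " Backend Daemon (" + node + ")");
        }
        return replies;
    }
}
```

A request for two nodes against two idle nodes yields one "Assign Job number" reply followed by one "Assign" or "Create" reply per node; a further request then gets "Cancel Job" because no idle node is left.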
SSH message flow between Client and Server over time, by protocol layer
SSH-transport: SSH_MSG_KEXINIT, SSH_MSG_NEWKEYS, SSH_MSG_SERVICE_REQUEST, SSH_MSG_SERVICE_ACCEPT
SSH-userauth: SSH_MSG_USERAUTH_REQUEST, SSH_MSG_USERAUTH_SUCCESS
SSH-connection: SSH_MSG_CHANNEL_OPEN, SSH_MSG_CHANNEL_OPEN_CONFIRMATION, ..., SSH_MSG_CHANNEL_CLOSE, SSH_MSG_CHANNEL_CLOSE
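User authentication exchange between Client and Server over time
SSH_MSG_USERAUTH_REQUEST (client to server)
SSH_MSG_USERAUTH_FAILURE (server to client)
SSH_MSG_USERAUTH_REQUEST (client to server)
SSH_MSG_USERAUTH_INFO_REQUEST (server to client)
SSH_MSG_USERAUTH_INFO_RESPONSE (client to server)
SSH_MSG_USERAUTH_SUCCESS (server to client)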
Transport layer exchange between Client and Server over time
Establish TCP connection
SSH-protoversion-softwareversion (sent by each side)
SSH_MSG_KEXINIT (sent by each side)
SSH_MSG_NEWKEYS (sent by each side)
SSH_MSG_SERVICE_REQUEST (client to server)
SSH_MSG_SERVICE_ACCEPT (server to client)
Connection layer exchange between Client and Server over time
Establish authenticated transport layer connection
SSH_MSG_CHANNEL_OPEN
SSH_MSG_CHANNEL_OPEN_CONFIRMATION
SSH_MSG_CHANNEL_DATA
SSH_MSG_CHANNEL_DATA
...
SSH_MSG_CHANNEL_CLOSE
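For reference, the messages named in these sequences can be tabulated with their numeric codes. The numbers below are the ones assigned in the SSH RFCs (4253 for transport, 4252 and 4256 for user authentication, 4254 for connection); the enum `SshMessage` and its `layer()` helper are illustrative names introduced only for this sketch.

```java
// Message numbers as assigned by the SSH RFCs; the enum itself is a
// hypothetical helper, not part of any SSH library.
enum SshMessage {
    SSH_MSG_SERVICE_REQUEST(5),
    SSH_MSG_SERVICE_ACCEPT(6),
    SSH_MSG_KEXINIT(20),
    SSH_MSG_NEWKEYS(21),
    SSH_MSG_USERAUTH_REQUEST(50),
    SSH_MSG_USERAUTH_FAILURE(51),
    SSH_MSG_USERAUTH_SUCCESS(52),
    SSH_MSG_USERAUTH_INFO_REQUEST(60),
    SSH_MSG_USERAUTH_INFO_RESPONSE(61),
    SSH_MSG_CHANNEL_OPEN(90),
    SSH_MSG_CHANNEL_OPEN_CONFIRMATION(91),
    SSH_MSG_CHANNEL_DATA(94),
    SSH_MSG_CHANNEL_CLOSE(97);

    final int number;

    SshMessage(int number) {
        this.number = number;
    }

    // Number ranges per RFC 4250: 1-49 transport, 50-79 user authentication,
    // 80-127 connection.
    String layer() {
        if (number < 50) return "SSH-transport";
        if (number < 80) return "SSH-userauth";
        return "SSH-connection";
    }
}
```

Grouping by number range mirrors the three layers of the protocol, so a packet trace can be labeled by layer from the message code alone.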
SSH protocol layering (top to bottom)
SSH Layers: Connection Layer Protocol, User Authentication Layer Protocol, Transport Layer Protocol
TCP
IP
Authentication between the Job frontend process and the Job Backend Daemon
Challenge (random number A): initiate authentication
Authenticate (random number B, MAC1): one side authentication
Authenticate (MAC3, Job execution parameter): the other side authentication
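One way such a three-message exchange could work is sketched below, assuming both parties share a secret key and use HMAC-SHA-256. The slide does not say which fields each MAC covers or which party sends which message, so the choices here (MAC1 over A, MAC3 over B plus the job execution parameter, neutral "initiator" and "responder" roles) are assumptions, as are all class and method names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of a mutual challenge-response handshake.
class MacHandshakeSketch {
    static byte[] hmac(byte[] key, byte[]... parts) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            for (byte[] part : parts) {
                mac.update(part);
            }
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] randomNumber() {
        byte[] n = new byte[16];
        new SecureRandom().nextBytes(n);
        return n;
    }

    // Simulates both parties in one method; returns true only if each side
    // accepts the other's MAC, which requires the two keys to be equal.
    static boolean handshake(byte[] initiatorKey, byte[] responderKey, String jobParams) {
        byte[] params = jobParams.getBytes(StandardCharsets.UTF_8);

        // Message 1: Challenge (random number A), sent by the initiator.
        byte[] a = randomNumber();

        // Message 2: Authenticate (random number B, MAC1), sent by the responder.
        byte[] b = randomNumber();
        byte[] mac1 = hmac(responderKey, a);

        // The initiator recomputes the MAC over A; a match authenticates the responder.
        if (!MessageDigest.isEqual(mac1, hmac(initiatorKey, a))) {
            return false;
        }

        // Message 3: Authenticate (MAC3, job execution parameter), sent by the initiator.
        byte[] mac3 = hmac(initiatorKey, b, params);

        // The responder recomputes the MAC over B and the parameter.
        return MessageDigest.isEqual(mac3, hmac(responderKey, b, params));
    }
}
```

Because each side proves knowledge of the key against a fresh random number chosen by the other side, a recorded exchange cannot simply be replayed later.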
Messages between the Job frontend process and the Job backend process
mcg = middleware channel group, wcg = world channel group, fcg = frontend channel group
Backend ready (rank, mcg, wcg, fcg)
Commence job (mcg[], wcg[], fcg[], properties, mainclass, args): sent to each backend when all backends are ready
Request resource (name): to load a class from the user's program
Report resource (name, bytecodes)
Write file (fd, buf, off, len): buffer goes to job frontend, which writes it to file or stdout/stderr
Renew lease (): periodically while job is in progress
Renew lease (): periodically while job is in progress
Backend finished (): when main program finishes
Job finished (): sent to each backend when all backend main programs have finished
Cancel Job (errmsg): if job aborts
Cancel Job (errmsg): if error in job backend
Messages between the Job backend Daemon and the Job Scheduler Daemon
Ready (username, ip, port): when a job backend daemon is created
Renew Lease (username, ip, port): periodically while job is in progress
Terminate (username, ip, port): when the job backend Daemon is about to terminate
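The bookkeeping behind these three messages could look roughly like the lease table below. It is a sketch under assumptions: the class `LeaseTableSketch`, the lease length, and the rule that a daemon counts as failed once its lease expires are not stated in the slides, and time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lease table kept by the scheduler side; names are illustrative.
class LeaseTableSketch {
    private final long leaseMillis;
    // Daemon identity -> time at which its lease expires.
    private final Map<String, Long> expiry = new HashMap<>();

    LeaseTableSketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    private static String key(String username, String ip, int port) {
        return username + "@" + ip + ":" + port;
    }

    // Handles both "Ready" and "Renew Lease": each (re)starts the lease.
    void renew(String username, String ip, int port, long nowMillis) {
        expiry.put(key(username, ip, port), nowMillis + leaseMillis);
    }

    // Handles "Terminate": the daemon is forgotten immediately.
    void terminate(String username, String ip, int port) {
        expiry.remove(key(username, ip, port));
    }

    // A daemon is considered alive while its lease has not expired.
    boolean isAlive(String username, String ip, int port, long nowMillis) {
        Long expiresAt = expiry.get(key(username, ip, port));
        return expiresAt != null && nowMillis < expiresAt;
    }
}
```

With a lease of 1000 ms, a daemon that reports at time 0 is alive at time 999 and expired at time 1000 unless it renews in between.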
Components: Job Frontend Process, Backend Process, Job Scheduler Daemon, Backend Daemon (interactions numbered 1 to 7)
RIT CS Paranoia 32-Processor Cluster Refresh Mon Feb 08 16:39:49 EST 2010 -- Parallel Java v20100121 Nodes Jobs
Welcome, xxh2229 RIT CS Paranoia 32-Processor Cluster Logout Refresh Mon Feb 08 16:39:49 EST 2010 -- Parallel Java v20100121 Nodes Jobs upload Required CPUs Run Args
Welcome, administrator RIT CS Paranoia 32-Processor Cluster Logout Refresh Mon Feb 08 16:39:49 EST 2010 -- Parallel Java v20100121 Nodes Jobs
Frontend node Job Frontend Process Running As A Client A Spawn (SSH) Submit a job Job Scheduler Daemon Web Browser Secure Connection Cancel a Job Cancel (SSH) Job Frontend Process Running As A
Backend nodes Frontend node Job Scheduler Daemon Web Browser Cluster status Spawn (SSH) stdin Job Backend Process Job Frontend Process Terminal Job result stdout stderr
Backend nodes Frontend node Parent Process MPI_Init MPI_Comm_spawn MPI_Finalize Child Process MPI_Init MPI_Finalize Spawn Message result
Backend nodes Frontend node Status Job Scheduler Daemon Job Backend Daemon Web Browser Cluster status MAC message SSH Spawn stdin Job Frontend Process Terminal Job Job Backend Process result stdout stderr
Client A Frontend node Submit a job Web Browser Secure Connection Web Server Cancel a Job SSH Agent
Frontend node Web Server SSH Protocol SSH Agent SSH Daemon
Web Server SessionManager SSL HttpRequest HttpServer SSL HttpResponse SSH Agent
User B’s Process Web Server Submit Jobs User A Cannot spawn User A’s Process Job Frontend Process
User B’s Process Web Server Submit Jobs SSH Agent User A User A’s credential Root’s Process User A’s Process Spawn Job Frontend Process SSH Daemon
Input: Message M, Authentication Key
Function: HMAC-SHA-256
Output: Message M, HMAC(M)
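Computing and checking such a tag with the standard Java crypto API might look like the sketch below. `javax.crypto.Mac` with the algorithm name `HmacSHA256` is the standard JDK API; the class `HmacTagSketch` and its method names are illustrative, and key distribution is outside the scope of the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: the sender attaches tag(key, M) to M and the receiver
// recomputes the tag to detect tampering.
class HmacTagSketch {
    static byte[] tag(byte[] key, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean verify(byte[] key, String message, byte[] receivedTag) {
        // MessageDigest.isEqual compares in constant time for equal-length inputs.
        return MessageDigest.isEqual(tag(key, message), receivedTag);
    }
}
```

The receiver accepts the message only if `verify` returns true; changing either the message or the key produces a different 32-byte tag.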
Receiver Proxy Message (A,m1) m1(){ … } Channel Call A.m1() Object A Object B
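The proxy pattern in this diagram can be approximated in plain Java with `java.lang.reflect.Proxy`. In the sketch below the "channel" is just a direct method call on an in-process receiver, and the names `ProxySketch`, `ServiceA`, `Receiver`, and `proxyFor` are assumptions made for illustration; a real system would serialize the message (A, m1) and send it over a network channel.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Object B calls A.m1() on a proxy, the proxy turns the
// call into a message (object name, method name), and the receiver invokes
// the real method on Object A.
class ProxySketch {
    interface ServiceA {
        String m1();
    }

    static class Receiver {
        private final Class<?> iface;
        private final Map<String, Object> objects = new HashMap<>();

        Receiver(Class<?> iface) {
            this.iface = iface;
        }

        void register(String name, Object target) {
            objects.put(name, iface.cast(target));
        }

        // Stands in for receiving "Message (A, m1)" from the channel.
        Object deliver(String objectName, String methodName) {
            try {
                Method method = iface.getMethod(methodName);
                return method.invoke(objects.get(objectName));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    static <T> T proxyFor(Class<T> iface, String objectName, Receiver channelEnd) {
        // Every call on the proxy is forwarded as (objectName, method name).
        InvocationHandler handler =
                (proxy, method, args) -> channelEnd.deliver(objectName, method.getName());
        Object stub = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(stub);
    }

    static String demo() {
        Receiver receiver = new Receiver(ServiceA.class);
        receiver.register("A", (ServiceA) () -> "m1 ran on Object A");
        ServiceA stub = proxyFor(ServiceA.class, "A", receiver);
        return stub.m1();
    }
}
```

The caller holds only the stub and never touches Object A directly, which is what lets the real object live in another process or on another node.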
Known Hosts JSch JSch SSH Daemon Username Password Session Channel Channel Command …