
Process Management

Process Management. Meeting at Argonne, February 24-25, 2003.

shadi

Presentation Transcript


  1. Process Management. Meeting at Argonne, February 24-25, 2003

  2. Schematic of Process Management Component in Context

     [Figure: schematic with two sides, the "Official" SSS side and the Prototype MPD-based implementation side. Labels on the SSS side: SSS Components (NSM, SD, Sched, EM, QM, PM), SSS XML, simple scripts using SSS XML, Brett's job submission language. Labels on the MPD side: PM, MPDs, application processes, mpdrun, XML file, mpiexec (MPI Standard args), interactive.]

  3. MPD Progress
     • MPD-2
        • In Python
        • Distributed as part of MPICH-2 (implements PMI)
        • Supports requirements of SSS
           • Separate executables for each process
           • Separate arguments for each process
           • Separate environment variables for each process
        • Supports MPI Standard mpiexec as job start command
        • Includes some of the SSS requirements

  4. New XML for Process Manager

     <create-process-group pgid='job23' submitter='lusk' totalprocs='10'
                           output='discard'>
       <process-spec range='1' exec='cpi_master' user='ell'
                     cwd='/home/ell/rundir' path='/home/ell/progs'
                     coprocess='tvdebuggersrv'>
         <arg idx='1' val='-loops' />
         <arg idx='2' val='1000' />
         <env name='TV_LICENSE' val='23416784' />
       </process-spec>
       <process-spec range='2-10' exec='cpi_slave' user='ell'
                     cwd='/home/ell/rundir' path='/home/ell/progs'
                     coprocess='tvdebuggersrv'>
         <env name='TV_LICENSE' val='23416784' />
       </process-spec>
       <host-spec idx='1' val='ccn%s:64-68' />
       <host-spec idx='2' val='ccn%s:70-74' />
     </create-process-group>
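A minimal sketch of how a client might expand such a request into a per-rank plan, using Python 3 and the standard xml.dom.minidom module. Two things here are assumptions, not part of the slides: that a host-spec value such as 'ccn%s:64-68' means hosts ccn64 through ccn68, and that rank i is placed on the i-th host in idx order. The helper names (expand_range, expand_hosts, plan) are hypothetical.

```python
from xml.dom.minidom import parseString

# Trimmed copy of the request above (user, cwd, path, env omitted for brevity).
REQUEST = """<create-process-group pgid='job23' submitter='lusk' totalprocs='10'>
  <process-spec range='1' exec='cpi_master'>
    <arg idx='1' val='-loops' />
    <arg idx='2' val='1000' />
  </process-spec>
  <process-spec range='2-10' exec='cpi_slave' />
  <host-spec idx='1' val='ccn%s:64-68' />
  <host-spec idx='2' val='ccn%s:70-74' />
</create-process-group>"""


def expand_range(text):
    """Turn '1' or '2-10' into an inclusive list of integers."""
    first, _, last = text.partition('-')
    return list(range(int(first), int(last or first) + 1))


def expand_hosts(val):
    """ASSUMPTION: 'ccn%s:64-68' names the hosts ccn64 .. ccn68."""
    pattern, _, span = val.partition(':')
    return [pattern % n for n in expand_range(span)]


def by_idx(nodes):
    """Sort elements by their integer idx attribute."""
    return sorted(nodes, key=lambda n: int(n.getAttribute('idx')))


def plan(xml_text):
    """Map each rank to (host, executable, argument list)."""
    root = parseString(xml_text).documentElement
    hosts = []
    for hs in by_idx(root.getElementsByTagName('host-spec')):
        hosts.extend(expand_hosts(hs.getAttribute('val')))
    placement = {}
    for ps in root.getElementsByTagName('process-spec'):
        args = [a.getAttribute('val')
                for a in by_idx(ps.getElementsByTagName('arg'))]
        for rank in expand_range(ps.getAttribute('range')):
            # ASSUMPTION: rank i runs on the i-th listed host.
            placement[rank] = (hosts[rank - 1], ps.getAttribute('exec'), args)
    return placement


if __name__ == '__main__':
    for rank, entry in sorted(plan(REQUEST).items()):
        print(rank, entry)
```

Under these assumptions the plan places cpi_master on ccn64 and the nine cpi_slave processes on ccn65-ccn68 and ccn70-ccn74, which agrees with the sample query response on slide 6.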

  5. Querying the PM

     The following example retrieves the pgids of the process groups that were submitted by lusk or desai. In lusk's case, it returns only the process groups that have processes running on two specific hosts. The restrictions apply to process groups; we always return all the processes in a process group.

     <get-process-groups>
       <process-group submitter='lusk' pgid='*' totalprocs='*'>
         <process-group-restriction pid='*' exec='*' host='ccn70' />
         <process-group-restriction pid='*' exec='*' host='ccn230' />
       </process-group>
       <process-group submitter='desai' pgid='*'>
       </process-group>
     </get-process-groups>
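A query of this shape could also be generated rather than typed by hand. The following is a sketch in Python 3 with xml.dom.minidom; the function name build_query and its argument layout are hypothetical, and only the element and attribute names come from the slide.

```python
from xml.dom.minidom import Document, parseString


def build_query(restrictions):
    """Build a get-process-groups message.

    restrictions maps a submitter name to a list of hosts; each host
    becomes one process-group-restriction element for that submitter.
    """
    doc = Document()
    root = doc.createElement('get-process-groups')
    for submitter, hosts in restrictions.items():
        group = doc.createElement('process-group')
        group.setAttribute('submitter', submitter)
        group.setAttribute('pgid', '*')
        for host in hosts:
            restriction = doc.createElement('process-group-restriction')
            restriction.setAttribute('pid', '*')
            restriction.setAttribute('exec', '*')
            restriction.setAttribute('host', host)
            group.appendChild(restriction)
        root.appendChild(group)
    return root.toxml()


if __name__ == '__main__':
    query = build_query({'lusk': ['ccn70', 'ccn230'], 'desai': []})
    # Round-trip through the parser to pretty-print the message.
    print(parseString(query).toprettyxml(indent='  '))
```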

  6. Response to a Query

     The message returned by such a query is a set of process groups, with details on their processes filled in as requested by the query.

     <process-groups>
       <process-group submitter='lusk' pgid='4521' totalprocs='10'>
         <process pid='3456' exec='cpi_master' host='ccn64' />
         <process pid='1324' exec='cpi_slave' host='ccn65' />
         <process pid='7654' exec='cpi_slave' host='ccn66' />
         <process pid='6758' exec='cpi_slave' host='ccn67' />
         <process pid='9601' exec='cpi_slave' host='ccn68' />
         <process pid='7865' exec='cpi_slave' host='ccn70' />
         <process pid='9876' exec='cpi_slave' host='ccn71' />
         <process pid='6524' exec='cpi_slave' host='ccn72' />
         <process pid='3452' exec='cpi_slave' host='ccn73' />
         <process pid='5634' exec='cpi_slave' host='ccn74' />
       </process-group>
       <process-group submitter='lusk' pgid='23' totalprocs='1'>
         <process pid='5554' exec='mpd' host='230' />
       </process-group>
       <process-group submitter='desai' pgid='244'>
       </process-group>
     </process-groups>
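On the receiving side, a client might turn such a response into a plain dictionary before acting on it. This is a sketch in Python 3 with xml.dom.minidom; the function name summarize is hypothetical, and the sample is a shortened copy of the response above.

```python
from xml.dom.minidom import parseString

# Shortened copy of the response shown above.
RESPONSE = """<process-groups>
  <process-group submitter='lusk' pgid='4521' totalprocs='10'>
    <process pid='3456' exec='cpi_master' host='ccn64' />
    <process pid='1324' exec='cpi_slave' host='ccn65' />
  </process-group>
  <process-group submitter='desai' pgid='244'>
  </process-group>
</process-groups>"""


def summarize(xml_text):
    """Map each pgid to (submitter, [(pid, exec, host), ...])."""
    root = parseString(xml_text).documentElement
    summary = {}
    for group in root.getElementsByTagName('process-group'):
        processes = [(p.getAttribute('pid'),
                      p.getAttribute('exec'),
                      p.getAttribute('host'))
                     for p in group.getElementsByTagName('process')]
        summary[group.getAttribute('pgid')] = (
            group.getAttribute('submitter'), processes)
    return summary


if __name__ == '__main__':
    for pgid, (submitter, processes) in summarize(RESPONSE).items():
        print(pgid, submitter, len(processes), 'process(es) listed')
```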

  7. Using the Wildcard Syntax for More

     The following command sends signal 3 to all the processes of all jobs submitted by lusk, and returns the details of which process groups they were.

     <signal-process-group signal='3'>
       <process-group submitter='lusk' pgid='*' />
     </signal-process-group>

     The following command kills all process groups with processes running on ccn56, and returns their submitters, so that they can be told the sad news.

     <kill-process-group>
       <process-group submitter='*'>
         <process host='ccn56' />
       </process-group>
     </kill-process-group>
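How the process manager resolves these patterns is not shown on the slides. The sketch below (Python 3) illustrates one plausible reading, under these assumptions: '*' or a missing attribute matches any value, any other value must match exactly, and a group is selected only if every listed process pattern is satisfied by some host in the group. The table contents and the names wildcard and select are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical in-memory table of running process groups.
TABLE = [
    {'pgid': '4521', 'submitter': 'lusk', 'hosts': ['ccn64', 'ccn56']},
    {'pgid': '23', 'submitter': 'lusk', 'hosts': ['ccn230']},
    {'pgid': '244', 'submitter': 'desai', 'hosts': ['ccn56']},
]

SIGNAL = """<signal-process-group signal='3'>
  <process-group submitter='lusk' pgid='*' />
</signal-process-group>"""

KILL = """<kill-process-group>
  <process-group submitter='*'>
    <process host='ccn56' />
  </process-group>
</kill-process-group>"""


def wildcard(pattern, value):
    """ASSUMPTION: '*' or an absent attribute matches anything."""
    return pattern in ('', '*') or pattern == value


def select(command_xml, table):
    """Return the pgids of the groups a command would apply to."""
    root = parseString(command_xml).documentElement
    chosen = []
    for pat in root.getElementsByTagName('process-group'):
        wanted_hosts = [p.getAttribute('host')
                        for p in pat.getElementsByTagName('process')]
        for group in table:
            if not wildcard(pat.getAttribute('submitter'), group['submitter']):
                continue
            if not wildcard(pat.getAttribute('pgid'), group['pgid']):
                continue
            # Every host pattern must be satisfied by some host in the group.
            if all(any(wildcard(h, gh) for gh in group['hosts'])
                   for h in wanted_hosts):
                if group['pgid'] not in chosen:
                    chosen.append(group['pgid'])
    return chosen


if __name__ == '__main__':
    print('signal ->', select(SIGNAL, TABLE))
    print('kill   ->', select(KILL, TABLE))
```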

  8. Starting to Have Fun
     • The combination of
        • Published interfaces
        • XML technology
        • XML libraries built into scripting languages
        • The SSS communication library
       enables simple programs for simple tasks.
     • Reminiscent of Unix pipe-based command-line programs
     • We use Python with its built-in SAX-based library
     • Easy to connect other tools (CIT daemons)
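As an illustration of the "simple programs for simple tasks" point, here is a sketch of a SAX-based filter in Python 3 that counts processes per executable in a process-groups message. It uses only the standard xml.sax module; the class and function names are hypothetical, and the sample message is a shortened version of the response format from slide 6.

```python
import xml.sax

SAMPLE = """<process-groups>
  <process-group submitter='lusk' pgid='4521' totalprocs='3'>
    <process pid='3456' exec='cpi_master' host='ccn64' />
    <process pid='1324' exec='cpi_slave' host='ccn65' />
    <process pid='7654' exec='cpi_slave' host='ccn66' />
  </process-group>
</process-groups>"""


class ExecCounter(xml.sax.ContentHandler):
    """Count process elements per executable name as they stream by."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        if name == 'process':
            exe = attrs.get('exec')
            self.counts[exe] = self.counts.get(exe, 0) + 1


def count_execs(xml_text):
    handler = ExecCounter()
    xml.sax.parseString(xml_text.encode('utf-8'), handler)
    return handler.counts


if __name__ == '__main__':
    for exe, n in sorted(count_execs(SAMPLE).items()):
        print(exe, n)
```

Because the handler sees elements one at a time, a filter like this could sit in a pipeline between other small tools in the Unix style the slide mentions.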

  9. Submitting a Job Directly to PM

     #! /usr/bin/env python
     from xml.dom.minidom import Document, parseString
     from ssslib import comm_lib

     executable = raw_input( 'executable? ' )
     numprocs   = raw_input( 'numprocs? ' )
     pgid       = raw_input( 'process group id? ' )

     # Build the create-process-group request from a single Document.
     doc = Document()
     msg = doc.createElement( 'create-process-group' )
     msg.setAttribute( 'totalprocs', numprocs )
     msg.setAttribute( 'pgid', pgid )
     ps = doc.createElement( 'process-spec' )
     ps.setAttribute( 'exec', executable )
     msg.appendChild( ps )
     print msg.toprettyxml()

     # Send the request to the process manager and print its reply.
     comm = comm_lib( debug=0 )
     process_manager = comm.ClientInit( 'process-manager' )
     comm.SendMessage( process_manager, msg.toxml() )
     ack = comm.RecvMessage( process_manager )
     comm.ClientClose( process_manager )
     ack_dom = parseString( ack )
     print ack_dom.toxml()

  10. Registering for Notification of PM Events

     #! /usr/bin/env python
     from sss import event_receiver
     from xml.dom.minidom import Document

     class printjobevent:
         def __init__(self):
             self.dispatch = { 'event' : self.HandleXMLEvent }

         def HandleXMLEvent( self, xe, ( peer, port ) ):
             print ' %s, jobid = %s, at %s' % ( xe.getAttribute( 'msg' ),
                                                xe.getAttribute( 'data' ),
                                                xe.getAttribute( 'time' ))
             return Document().createElement('event-ok')

     if __name__ == '__main__':
         job_monitor = printjobevent()
         loop = event_receiver( 'process-manager', '*', '*', 'many', job_monitor )
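The internals of event_receiver are not shown on the slide. The sketch below (Python 3) is a stand-in that illustrates the dispatch-table pattern the handler class relies on: look up the incoming element's tag name in handler.dispatch and call the matching method. The deliver function, the returned-None behaviour for unknown tags, and the sample event values are all assumptions; only the attribute names msg, data, and time and the event-ok reply come from the slide.

```python
from xml.dom.minidom import Document, parseString


class PrintJobEvent:
    """Handler object exposing a tag-name -> method dispatch table."""

    def __init__(self):
        self.dispatch = {'event': self.handle_event}
        self.seen = []

    def handle_event(self, xe, peer):
        record = (xe.getAttribute('msg'),
                  xe.getAttribute('data'),
                  xe.getAttribute('time'))
        self.seen.append(record)
        print(' %s, jobid = %s, at %s' % record)
        return Document().createElement('event-ok')


def deliver(handler, xml_text, peer=('localhost', 0)):
    """Hypothetical stand-in for one iteration of an event loop."""
    xe = parseString(xml_text).documentElement
    callback = handler.dispatch.get(xe.tagName)
    if callback is None:
        return None  # ASSUMPTION: unknown message types are ignored
    return callback(xe, peer).toxml()


if __name__ == '__main__':
    monitor = PrintJobEvent()
    # Hypothetical event; the attribute values are made up for illustration.
    reply = deliver(monitor, "<event msg='job-start' data='job23' time='12:00' />")
    print(reply)
```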
