Process Management & Monitoring WG
Quarterly Report
June 13, 2002
Components
• Process Management
  • Process Manager
  • Checkpoint Manager
• Monitoring
  • Job Monitor
  • System/Node Monitors
  • Meta Monitoring
• Data Migration
PMWG Quarterly Report
“Next Steps” From February 2002
• Continue to work with the RMWG
• Continue the interface work for:
  • Process Manager
  • Checkpoint Manager
• Begin the interface work for:
  • Job Manager
  • Monitors
• Prototyping and refinement
Group Progress
• Prototyping before refining interfaces
• Job Manager fell into RMWG scope
• Today’s demo set as milestone
• Node Monitor provides RM components with data needed for scheduling
• Process Manager executes jobs as requested by RM components
• Conference calls on alternate weeks
Component Progress
• Checkpoint Manager (LBNL)
• Process Manager (ANL)
• Monitoring (NCSA)
Checkpoint Manager
• Defined as a separate component
• Process Manager could register as CM
• Requirements document published
• Current status summary
• Early prototype checkpoint capability
• Design still evolving
• Working with the LAM/MPI team
Checkpoint Manager
• Serial (intranode) checkpoints
  • Checkpoint job(s) on a single node
• Parallel (internode) checkpoints
  • Checkpoint a multi-node job
• Scalable Systems Checkpoint Manager
  • XML interfaces
Checkpoint Manager
• Serial (intranode) checkpoints
  • System-level for best coverage
  • Handles serial or parallel jobs
  • Provides hooks for runtime libraries
  • Based on vmadump
  • Full requirements in a technical report
  • Early prototype exists
Checkpoint Manager
• Parallel (internode) checkpoints
  • Works with job control system
  • Cooperates with the runtime libraries
  • Working with LAM/MPI team to prototype
  • Aiming for SC02 demo of prototype
  • NPBs as optimistic goal
  • Runtime interfaces due May ’03
Checkpoint Manager
• Scalable Systems Checkpoint Manager
  • Will provide Scalable Systems interface to the parallel checkpoint capability
  • Interface only roughly defined
  • Interface refinement to follow
  • XML interfaces due May ’03
Process Manager Work at ANL
• Narayan Desai…
Monitoring Work at NCSA
• Mike Showerman…
Data Migration
• Still no work done here
Next Steps
• Prototyping will continue
• Interfaces will stabilize
• Checkpoint Manager
• Process Manager
• Monitors
• Monitoring data…