Process Management & Monitoring WG Quarterly Report October 10, 2002
Components • Process Management • Process Manager • Checkpoint Manager • Monitoring • Job Monitor • System/Node Monitors • Meta Monitoring • Data Migration PMWG Quarterly Report
“Next Steps” From June 2002 • Prototyping will continue • Interfaces will stabilize • Checkpoint Manager • Process Manager • Monitors • Monitoring data…
Group Progress • Prototyping and development continue • How to interface to something for which we can’t yet visualize the implementation? • Some interface progress • Validating schema for Process Manager • Early Node Monitor schema • Conference calls on alternate weeks
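As an illustration of what checking a Process Manager message against an agreed interface could involve, here is a minimal sketch. It is not the working group's schema: the message names (`start-job`, `kill-job`), the attribute names, and the `validate_message` helper are all hypothetical, and the check is a simple required-attribute test rather than full XML Schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical interface description: message name -> required attributes.
# These names are assumptions for illustration, not the real PM schema.
SCHEMA = {
    "start-job": {"user", "nprocs"},
    "kill-job": {"jobid"},
}


def validate_message(xml_text):
    """Return a list of problems; an empty list means the message passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag not in SCHEMA:
        return [f"unknown message type: {root.tag}"]
    # Report every required attribute the message does not carry.
    missing = SCHEMA[root.tag] - set(root.attrib)
    return [f"missing attribute: {name}" for name in sorted(missing)]
```

A component receiving a message could call `validate_message(text)` and reject the request whenever the returned list is non-empty.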
Component Progress • Checkpoint Manager (LBNL) • Process Manager (ANL) • Monitoring (NCSA)
Checkpoint Manager: Work at LBNL • Serial (intranode) checkpoints • Checkpoint job(s) on a single node • Parallel (internode) checkpoints • Checkpoint a multi-node job • Scalable Systems Checkpoint Manager • Scalable Systems XML interfaces
Checkpoint Manager: Work at LBNL • Serial (intranode) checkpoints • System-level for best coverage • Full requirements in a technical report • Prototype to demonstrate here • Based on pre-existing vmadump code • Extended for multi-threaded processes • Provides hooks for runtime libraries • Coverage is still limited
Checkpoint Manager: Work at LBNL • Parallel (internode) checkpoints • Works with the job control system • Cooperates with the runtime libraries • Working with LAM/MPI team to implement • We will have a joint demo at SC02 • NAS Parallel Benchmarks (NPBs) as a realistic goal • Runtime interfaces are due May ’03
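To make the cooperation between job control and the runtime libraries concrete, the following is a minimal single-process simulation of a coordinated multi-node checkpoint: every rank is quiesced, each rank's state is saved, and then all ranks resume. It is a sketch under stated assumptions, not the LBNL or LAM/MPI design; `NodeProcess`, `checkpoint_job`, and the quiesce/checkpoint/resume phases are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeProcess:
    """Hypothetical stand-in for one rank of a multi-node job."""
    rank: int
    state: dict = field(default_factory=dict)
    quiesced: bool = False

    def quiesce(self):
        # Assumption: a real runtime library would drain in-flight
        # messages here so no data is lost between ranks.
        self.quiesced = True

    def checkpoint(self):
        if not self.quiesced:
            raise RuntimeError(f"rank {self.rank} is not quiesced")
        # Copy the state so later changes do not alter the saved image.
        return dict(self.state)

    def resume(self):
        self.quiesced = False


def checkpoint_job(procs):
    """Quiesce every rank, save each rank's state, then resume all ranks."""
    for p in procs:
        p.quiesce()
    try:
        images = {p.rank: p.checkpoint() for p in procs}
    finally:
        # Resume even if a checkpoint failed, so the job is not left stalled.
        for p in procs:
            p.resume()
    return images
```

Calling `checkpoint_job` on a list of `NodeProcess` objects returns one saved image per rank, keyed by rank number. The point of quiescing all ranks before saving any of them is that the images describe one consistent moment in the job.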
Checkpoint Manager: Work at LBNL • Scalable Systems Checkpoint Manager • Will provide Scalable Systems interface to the parallel checkpoint capability • Interface only roughly defined • Interface refinement still to follow • XML interfaces are due May ’03
Process Manager: Work at ANL • Rusty Lusk…
Monitoring: Work at NCSA • Mike Showerman…
Data Migration • Still no work done here • Mostly dismissed at last meeting • Will disappear at next meeting
Next Steps • Prototyping will inevitably continue • Interfaces will continue to stabilize • Checkpoint Manager • Process Manager • Monitors • Monitoring data… • Now have a framework started