Overview of Recent MCMD Developments. Jarek Nieplocha, CCA Forum Meeting, San Francisco
MCMD Working Group • Recent activities focus on development of specifications for CCA-based processor groups (teams) • BOFs held during CCA meetings in April and July, 2007 • Mini-Workshop held January 24, 2007 • Use cases documented and analyzed • Wiki webpage and mailing list: https://www.cca-forum.org/wiki/tiki-index.php?page=MCMD-WG • Specifications document version 0.3 • Telecon held Sept 28, 2007 • Several other people sent good comments by email • Issues raised: threads, fault-tolerant environments, MPI-centric narrative and examples, ID representation • Plans • Complete work on the spec document by end of 2007 • Telecon, mailing list discussions and reviews • Prototype implementation and some application evaluation • NWChem, subsurface
Multilevel Parallelism • How can applications effectively exploit the massive amount of h/w parallelism available in petaflop-scale machines? • Massive numbers of CPUs in future systems require algorithm and software redesign to exploit all available parallelism • Multilevel parallelism • Divide work into parts that can be executed concurrently on groups of processors • Can exploit massive hardware parallelism • Increases granularity of computation, which improves overall scalability [Figure: Task 1 and Task 2]
Multiple Component Multiple Data • MCMD extends the SCMD (single component multiple data) model that was the main focus of CCA in SciDAC-1 • Prototype solution described at SC’05 for computational chemistry • Allows different groups of processors to execute different CCA components • Main motivation for MCMD is support for multiple levels of parallelism in applications [Figure: SCMD and MCMD, NWChem example]
MCMD Use Cases • COOP Parallelism • Hierarchical Parallelism in Computational Chemistry • Ab Initio Nuclear Structure Calculations • Coupled Climate Modeling • Molecular Dynamics, Multiphysics Simulations • Fusion use case described at Silver Springs Meeting
Target Execution Model and Global Ids • Global id specification • global id = <machine id> + <job id> + <task/process rank> + <thread id> [Figure: single/multiple mpiruns, MPI tasks/processes, threads]
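The global id composition on this slide can be illustrated with a small sketch. The class name `GlobalId`, its field names, and the colon-separated string form are assumptions made for illustration; the slides give only the four components, not a concrete representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlobalId:
    """Hypothetical global id: machine id + job id + task/process rank (+ thread id)."""
    machine_id: str
    job_id: int
    task_rank: int
    thread_id: Optional[int] = None  # None when threads are not identified

    def __str__(self) -> str:
        # The colon-separated form is an illustrative choice, not from the spec.
        parts = [self.machine_id, str(self.job_id), str(self.task_rank)]
        if self.thread_id is not None:
            parts.append(str(self.thread_id))
        return ":".join(parts)


# Rank 3 in two different jobs on two machines yields two distinct global ids.
a = GlobalId("machineA", 17, 3)
b = GlobalId("machineB", 4, 3)
```

Because the id is an immutable value object, it can serve as a dictionary key or set member, which is convenient when a team spans several jobs whose local ranks overlap.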
Group Management • Various execution models • E.g. coop parallelism vs. single mpirun • Programming Models • Should be MPI-Friendly but also open to other models • MPI, Threads, GAS models including GA, UPC, HPCS languages • Global process and team ids • Group translators
CCA Processor Teams • We propose to use a slightly different term, process(or) teams, rather than groups • Avoid confusion with existing terminology and interfaces in programming models • Some use cases call for something more general than MPI groups, e.g., COOP with multiple mpiruns • For example, a CCA team can encompass a collection of processes in two different MPI jobs. We cannot construct a single MPI group corresponding to that. • Operations on CCA teams might not have a direct mapping to group operations in programming models that support groups [Figure: MPI Job A and MPI Job B, each with MPI groups, spanned by a CCA Process Team]
CCA Team Service • How do we initialize the application? • COOP example makes it non-trivial • Provides the following • Create, destroy, compare, split teams • More capabilities can be added as required • Assigns global ids to tasks from one or more jobs running on one or more machines • Global id = <machine id> + <job id> + <task id> • Also, <thread id> if we were to support threads at component level in the future • Locality Information • Gets the job id, machine id, task id of the given task
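The operations listed on this slide (create, destroy, compare, split, locality) can be sketched as a toy, single-process model. Everything here is hypothetical: the class name `TeamService`, the method signatures, the integer team handles, and the tuple form of a global id are illustrative assumptions, not the interface in the specification document.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Tuple

# Assumed tuple form of a global id: (machine id, job id, task id).
GlobalId = Tuple[str, int, int]


class TeamService:
    """Toy in-memory model of a team service; not the CCA interface."""

    def __init__(self) -> None:
        self._teams: Dict[int, FrozenSet[GlobalId]] = {}
        self._next_handle = 0

    def create(self, members: Iterable[GlobalId]) -> int:
        """Register a team and return an opaque integer handle."""
        handle = self._next_handle
        self._next_handle += 1
        self._teams[handle] = frozenset(members)
        return handle

    def destroy(self, team: int) -> None:
        del self._teams[team]

    def compare(self, a: int, b: int) -> bool:
        """Two teams compare equal when they have the same members."""
        return self._teams[a] == self._teams[b]

    def split(self, team: int,
              color: Callable[[GlobalId], Hashable]) -> Dict[Hashable, int]:
        """Partition a team by a color function; one new team per color."""
        buckets: Dict[Hashable, List[GlobalId]] = {}
        for gid in sorted(self._teams[team]):
            buckets.setdefault(color(gid), []).append(gid)
        return {c: self.create(m) for c, m in buckets.items()}

    def members(self, team: int) -> FrozenSet[GlobalId]:
        return self._teams[team]

    @staticmethod
    def locality(gid: GlobalId) -> Dict[str, object]:
        """Return the machine id, job id, and task id of a task."""
        machine, job, task = gid
        return {"machine_id": machine, "job_id": job, "task_id": task}


# A team spanning two jobs on two machines, then split by job id.
svc = TeamService()
ids = [("m0", 1, r) for r in range(2)] + [("m1", 2, r) for r in range(2)]
world = svc.create(ids)
by_job = svc.split(world, lambda gid: gid[1])
```

The color-function style of `split` is borrowed loosely from the way communicator splitting works in MPI; a real team service would have to perform the split collectively across processes rather than in one address space.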
CCA Team Plugins • Provide mappings between CCA teams and task/image/thread groups for the programming models that components are written in [Figure: CCA Team Service on top of an interoperable GroupService layer of plugins: MPI Group Service, GA Group Service, PVM Group Service, XYZ Prog Model’s Group Service]
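One way to picture the plugin layer is as a set of translators, each mapping a CCA team to the native group of one programming model. The sketch below is an assumption-laden illustration: `JobRankTranslator`, `to_native`, `translate`, and the tuple form of a global id are hypothetical names, and each "native group" is modeled simply as a sorted list of ranks local to one job.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed tuple form of a global id: (machine id, job id, task id).
GlobalId = Tuple[str, int, int]


class JobRankTranslator:
    """Hypothetical plugin: maps a team to the local ranks it covers in one job."""

    def __init__(self, machine_id: str, job_id: int) -> None:
        self.machine_id = machine_id
        self.job_id = job_id

    def to_native(self, team: FrozenSet[GlobalId]) -> List[int]:
        # Keep only team members that belong to this plugin's job.
        # A real MPI plugin could pass such ranks to MPI_Group_incl.
        return sorted(
            task for machine, job, task in team
            if machine == self.machine_id and job == self.job_id
        )


def translate(team: FrozenSet[GlobalId],
              plugins: Dict[str, JobRankTranslator]) -> Dict[str, List[int]]:
    """Ask every registered plugin for its native view of the team."""
    return {name: plugin.to_native(team) for name, plugin in plugins.items()}


# One team spanning a PVM job and an MPI job maps to two separate native groups.
plugins = {"PVM": JobRankTranslator("m0", 1), "MPI": JobRankTranslator("m1", 2)}
team = frozenset({("m0", 1, 0), ("m0", 1, 2), ("m1", 2, 1)})
native_groups = translate(team, plugins)
```

The point of the sketch is that the team itself stays model-neutral, while each plugin projects it onto whatever group notion its programming model supports.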
Example Coupled System [Figure: PVM Job A and MPI/GA Job B running Ocean Model, Land Model, and I/O components; PVM, GA, and MPI ProcGroups contained in a Global CCA Team]
Specification Document • Version 0.3 on wiki (Word, PDF) • Please review and contribute • Looking at candidate applications and component s/w for initial evaluation • Numerical, I/O
Issues from the Telecon • Eliminate threads from the spec + • Add more emphasis on mixing multiple programming models + • How do we handle global ids? • Pros and cons of using integers • Conclusion is to use "global ids" as objects and introduce a new representation called "global ranks" • Need for dynamic team management
Dynamic Behavior • We want to support the dynamic nature of applications • Application composed of parallel jobs that are launched and complete at different stages of application execution • Fault tolerance in the style of FT-MPI • Adaptation to faults • Teams can shrink/expand. Callers cannot count on the persistence of values returned by team service calls.
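The last bullet implies that callers should treat results from the team service as snapshots and re-query after membership changes. A minimal sketch of that idea follows, using a generation counter to detect stale snapshots. The class `DynamicTeam`, its methods, and the counter are hypothetical illustrations, not part of the specification.

```python
from typing import FrozenSet, Hashable, Iterable, Tuple


class DynamicTeam:
    """Toy team whose membership can change; a generation counter marks each change."""

    def __init__(self, members: Iterable[Hashable]) -> None:
        self._members = set(members)
        self._generation = 0

    def snapshot(self) -> Tuple[int, FrozenSet[Hashable]]:
        """Return the current generation and an immutable copy of the members."""
        return self._generation, frozenset(self._members)

    def shrink(self, member: Hashable) -> None:
        """Remove a member, e.g. after a fault; bump the generation if it was present."""
        if member in self._members:
            self._members.remove(member)
            self._generation += 1

    def expand(self, member: Hashable) -> None:
        """Add a member, e.g. when a new job is launched; bump the generation if new."""
        if member not in self._members:
            self._members.add(member)
            self._generation += 1

    def is_current(self, generation: int) -> bool:
        """True if no membership change has happened since the given snapshot."""
        return generation == self._generation


team = DynamicTeam([("m0", 1, 0), ("m0", 1, 1), ("m1", 2, 0)])
gen, members = team.snapshot()
team.shrink(("m0", 1, 1))            # simulate losing one task
stale = not team.is_current(gen)     # the earlier snapshot is now out of date
gen2, members2 = team.snapshot()     # re-query instead of reusing cached values
```

A caller that caches team size or member ids would check `is_current` before relying on them, and take a fresh snapshot when the check fails.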