This overview covers the problem description, implementation, shared-memory and distributed-memory approaches, and performance analysis from Esteban Pauli's multi-grid presentation. The method tries to speed up a Jacobi-style solver by spreading boundary values faster: the problem is coarsened to a small grid, then successively solved and refined back up through the levels. Implementation details for shared-memory and distributed-memory paradigms are discussed, along with their respective advantages and challenges. Performance is analyzed on a 1024x1024 grid, comparing sequential and parallel execution. The conclusion highlights the suitability of multi-grid for shared-memory and MPI paradigms, and the limitations of other parallel paradigms.
Multi-Grid • Esteban Pauli • 4/25/06
Overview • Problem Description • Implementation • Shared Memory • Distributed Memory • Other • Performance • Conclusion
Problem Description • Same input and output as Jacobi • Try to speed up the algorithm by spreading boundary values faster • Coarsen to a small problem, successively solve and refine • Algorithm (see the sketch below): • for i in 1 .. levels - 1: coarsen level i to i + 1 • for i in levels .. 2, -1: solve level i; refine level i to i - 1 • solve level 1
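As a point of reference for the parallel versions that follow, here is a minimal sequential sketch of this scheme in C. It is not the author's code: the grid sizes, the 2x2-average coarsening, the injection refinement, and the hot-left-edge boundary condition are illustrative assumptions, and levels are 0-indexed here (level 0 is the slide's level 1).

```c
#include <stdio.h>
#include <stdlib.h>

#define LEVELS 3      /* number of grid levels (assumed)            */
#define FINE_N 16     /* finest grid is FINE_N x FINE_N (assumed)   */
#define ITERS  50     /* Jacobi sweeps per level (assumed)          */

static double *grid[LEVELS];
static int     n[LEVELS];                 /* grid dimension per level */

#define AT(l, i, j) (grid[l][(i) * n[l] + (j)])

/* Coarsen level l into level l + 1 by averaging 2x2 blocks. */
static void coarsen(int l)
{
    for (int i = 0; i < n[l + 1]; i++)
        for (int j = 0; j < n[l + 1]; j++)
            AT(l + 1, i, j) = 0.25 * (AT(l, 2 * i,     2 * j) +
                                      AT(l, 2 * i + 1, 2 * j) +
                                      AT(l, 2 * i,     2 * j + 1) +
                                      AT(l, 2 * i + 1, 2 * j + 1));
}

/* Refine level l onto level l - 1 by copying each coarse interior
 * value into its 2x2 block (simple injection). */
static void refine(int l)
{
    for (int i = 1; i < n[l] - 1; i++)
        for (int j = 1; j < n[l] - 1; j++)
            AT(l - 1, 2 * i, 2 * j)         = AT(l - 1, 2 * i + 1, 2 * j) =
            AT(l - 1, 2 * i, 2 * j + 1)     = AT(l - 1, 2 * i + 1, 2 * j + 1) =
                AT(l, i, j);
}

/* Plain Jacobi sweeps on the interior points of level l. */
static void solve(int l, int iters)
{
    double *next = malloc(sizeof(double) * (size_t)n[l] * n[l]);
    for (int it = 0; it < iters; it++) {
        for (int i = 1; i < n[l] - 1; i++)
            for (int j = 1; j < n[l] - 1; j++)
                next[i * n[l] + j] = 0.25 * (AT(l, i - 1, j) + AT(l, i + 1, j) +
                                             AT(l, i, j - 1) + AT(l, i, j + 1));
        for (int i = 1; i < n[l] - 1; i++)
            for (int j = 1; j < n[l] - 1; j++)
                AT(l, i, j) = next[i * n[l] + j];
    }
    free(next);
}

int main(void)
{
    for (int l = 0; l < LEVELS; l++) {
        n[l] = FINE_N >> l;                              /* 16, 8, 4 */
        grid[l] = calloc((size_t)n[l] * n[l], sizeof(double));
    }
    /* Hypothetical boundary condition: hot left edge on the finest grid. */
    for (int i = 0; i < n[0]; i++)
        AT(0, i, 0) = 100.0;

    /* Driver loop from the slide (0-indexed levels). */
    for (int l = 0; l < LEVELS - 1; l++)
        coarsen(l);
    for (int l = LEVELS - 1; l >= 1; l--) {
        solve(l, ITERS);
        refine(l);
    }
    solve(0, ITERS);

    printf("centre value on finest grid: %f\n", AT(0, n[0] / 2, n[0] / 2));
    for (int l = 0; l < LEVELS; l++)
        free(grid[l]);
    return 0;
}
```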
Problem Description (diagram): Coarsen → Coarsen → Solve → Refine → Solve → Refine → Solve
Implementation – Key Ideas • Assign a chunk to each processor • Coarsen, refine operations done locally • Solve steps done like Jacobi
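For concreteness, the row-wise chunk assignment might be computed as in the sketch below; the function and its arguments are hypothetical (the slides only ever use the resulting my_start_i/my_end_i bounds).

```c
/* Hypothetical row-block decomposition: processor `rank` of `nprocs`
 * owns rows my_start_i .. my_end_i (inclusive) of an n-row grid. */
void my_chunk(int rank, int nprocs, int n,
              int *my_start_i, int *my_end_i)
{
    int rows  = n / nprocs;          /* base rows per processor          */
    int extra = n % nprocs;          /* first `extra` PEs get one more   */
    *my_start_i = rank * rows + (rank < extra ? rank : extra);
    *my_end_i   = *my_start_i + rows - 1 + (rank < extra ? 1 : 0);
}
```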
Shared Memory Implementations • for i in 1 .. levels - 1: coarsen level i to i + 1 (in parallel) • barrier • for i in levels .. 2, -1: solve level i (in parallel); refine level i to i - 1 (in parallel); barrier • solve level 1 (in parallel)
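In OpenMP, one of the shared-memory paradigms discussed later, this structure might look like the sketch below. The coarsen_chunk/solve_chunk/refine_chunk workers are hypothetical placeholders for per-thread versions of the three operations; only the barrier placement is taken from the slide.

```c
#include <omp.h>

/* Hypothetical per-thread workers: each one operates only on the
 * calling thread's block of rows of the named level(s). */
void coarsen_chunk(int from_level, int to_level, int tid, int nthreads);
void solve_chunk(int level, int iterations, int tid, int nthreads);
void refine_chunk(int from_level, int to_level, int tid, int nthreads);

/* Shared-memory sketch of the slide's structure; the explicit barriers
 * correspond directly to the "barrier" lines above. */
void multigrid_shared(int levels, int iterations)
{
    #pragma omp parallel
    {
        int tid      = omp_get_thread_num();
        int nthreads = omp_get_num_threads();

        /* Coarsening touches only this thread's rows (no true sharing). */
        for (int i = 1; i <= levels - 1; i++)
            coarsen_chunk(i, i + 1, tid, nthreads);
        #pragma omp barrier

        for (int i = levels; i >= 2; i--) {
            /* True sharing: solve_chunk needs its own barriers
             * between Jacobi sweeps. */
            solve_chunk(i, iterations, tid, nthreads);
            refine_chunk(i, i - 1, tid, nthreads);   /* local only */
            #pragma omp barrier
        }
        solve_chunk(1, iterations, tid, nthreads);
    }
}
```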
Shared Memory Details • Solve is like shared-memory Jacobi: there is true sharing • /* my_* are all locals */ • for my_i = my_start_i .. my_end_i • for my_j = my_start_j .. my_end_j • current[my_i][my_j][level] = … • Coarsen and refine access only local data: only false sharing is possible • for my_i = my_start_i .. my_end_i • for my_j = my_start_j .. my_end_j • current[my_i][my_j][level] = …[level ± 1]
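The slide elides the right-hand sides; as one concrete possibility, the coarsening step could be a 2x2 average, as in this sketch. The fixed sizes, the [i][j][level] array layout taken from the slide's pseudocode, and the averaging stencil are all assumptions.

```c
#define N      1024   /* finest grid dimension (assumed)        */
#define LEVELS 3      /* number of multi-grid levels (assumed)  */

/* One thread's share of "coarsen level -> level + 1".  Only rows
 * my_start_i..my_end_i (coarse-grid coordinates) are written, and,
 * assuming a nested decomposition, only this chunk's rows of the
 * finer level are read, so at most false sharing can occur at chunk
 * boundaries.  The 2x2 average stands in for the elided "…". */
void coarsen_rows(double current[N][N][LEVELS], int level,
                  int my_start_i, int my_end_i,
                  int my_start_j, int my_end_j)
{
    for (int my_i = my_start_i; my_i <= my_end_i; my_i++)
        for (int my_j = my_start_j; my_j <= my_end_j; my_j++)
            current[my_i][my_j][level + 1] =
                0.25 * (current[2 * my_i][2 * my_j][level] +
                        current[2 * my_i + 1][2 * my_j][level] +
                        current[2 * my_i][2 * my_j + 1][level] +
                        current[2 * my_i + 1][2 * my_j + 1][level]);
}
```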
Shared Memory Paradigms • A barrier is all you really need, so this should be easy to program in any shared-memory paradigm (UPC, OpenMP, HPF, etc.) • Being able to control data distribution (CAF, GA) should help • If the problem is small enough, you only have to worry about initial misses • If it is larger, data will be pushed out of cache and have to be brought back over the network • Having to switch to a different syntax to access remote memory is a minus on the "elegance" side, but a plus in that it makes communication explicit
Distributed Memory (MPI) • Almost all work is local; communication is needed only when solving a given level • Algorithm at each PE (looks very sequential; see the sketch below): • for i in 1 .. levels - 1: coarsen level i to i + 1 (local) • for i in levels .. 2, -1: solve level i (see next slide), then refine level i to i - 1 (local) • solve level 1 (see next slide)
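A per-PE driver might look like the following sketch; coarsen_local(), refine_local(), and solve_level() are hypothetical helpers (only solve_level() communicates), and the row decomposition with up/down neighbours is an assumption.

```c
#include <mpi.h>

/* Hypothetical local helpers for the slide's coarsen and refine steps
 * (no communication needed). */
void coarsen_local(int level);
void refine_local(int level);
/* Solve with neighbour communication; one sweep of it is sketched
 * after the next slide. */
void solve_level(int level, int iterations, int up, int down, MPI_Comm comm);

/* Per-PE driver mirroring the slide: everything except solve is local. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs, levels = 3, iterations = 500;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Row decomposition: neighbours above and below (none at the ends). */
    int up   = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
    int down = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;

    for (int i = 1; i <= levels - 1; i++)
        coarsen_local(i);                                   /* local */
    for (int i = levels; i >= 2; i--) {
        solve_level(i, iterations, up, down, MPI_COMM_WORLD);
        refine_local(i);                                    /* local */
    }
    solve_level(1, iterations, up, down, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```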
MPI Solve function • "Dumb" version: send my edges, receive edges, compute • Smarter version: send my edges, compute the middle, receive edges, compute the boundaries (see the sketch below) • Any other optimization that can be done in Jacobi can also be done here
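One sweep of the "smarter" variant might look like the sketch below (a solve_level() such as the one assumed above would call it once per iteration). The 1-D row decomposition, the ghost-row layout, and all names are assumptions, not the author's code.

```c
#include <mpi.h>

/* One "smarter" Jacobi sweep of the solve step, assuming a 1-D row
 * decomposition: this rank owns `rows` interior rows of width `n`,
 * stored in cur[(0..rows+1) * n] with ghost rows 0 and rows + 1
 * (which hold the fixed boundary for the end ranks).  up/down are the
 * neighbouring ranks, or MPI_PROC_NULL at the ends. */
void solve_sweep(double *cur, double *next, int rows, int n,
                 int up, int down, MPI_Comm comm)
{
    MPI_Request req[4];

    /* Post receives for the neighbours' edge rows and send my own. */
    MPI_Irecv(&cur[0],              n, MPI_DOUBLE, up,   0, comm, &req[0]);
    MPI_Irecv(&cur[(rows + 1) * n], n, MPI_DOUBLE, down, 1, comm, &req[1]);
    MPI_Isend(&cur[1 * n],          n, MPI_DOUBLE, up,   1, comm, &req[2]);
    MPI_Isend(&cur[rows * n],       n, MPI_DOUBLE, down, 0, comm, &req[3]);

    /* Compute the middle rows, which need no remote data. */
    for (int i = 2; i <= rows - 1; i++)
        for (int j = 1; j < n - 1; j++)
            next[i * n + j] = 0.25 * (cur[(i - 1) * n + j] + cur[(i + 1) * n + j] +
                                      cur[i * n + j - 1]   + cur[i * n + j + 1]);

    /* Receive the edges, then compute my first and last rows. */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    int edge[2] = { 1, rows };
    for (int e = 0; e < (rows > 1 ? 2 : 1); e++) {
        int i = edge[e];
        for (int j = 1; j < n - 1; j++)
            next[i * n + j] = 0.25 * (cur[(i - 1) * n + j] + cur[(i + 1) * n + j] +
                                      cur[i * n + j - 1]   + cur[i * n + j + 1]);
    }
}
```

Posting the receives before computing the middle rows lets the MPI library overlap the halo exchange with the interior computation, which is exactly the optimization the slide describes.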
Distributed Memory (Charm++) • Again, done like Jacobi • Flow of control is hard to show here • Can send just one message to do all the coarsening (as in MPI) • Might get some benefit from overlapping computation and communication by waiting for smaller messages • No benefit from load balancing
Other paradigms • BSP model (local computation, global communication, barrier): a good fit • STAPL (parallel STL): not a good fit (could use parallel for_each, but the lack of a 2D data structure would make this awkward) • TreadMarks, CID, CASHMERe (distributed shared memory): fetching a whole page just to get the boundaries might be too expensive, probably not a good fit • Cilk (spawn tasks, e.g. for graph search): not a good fit
Performance • 1024x1024 grid coarsened down to a 256x256 grid, 500 iterations at each level • Sequential time: 42.83 seconds • Left table: 4 PEs • Right table: 16 PEs
Summary • Almost identical to Jacobi • Very predictable application • Easy load balancing • Good for shared memory and MPI • Charm++: virtualization helps; probably need more data points to see if it can beat MPI • DSM: false sharing might be too high a cost • Parallel paradigms aimed at irregular programs are not a good fit