
COMP60611 Fundamentals of Concurrency

This lab exercise explores the performance model of a finite difference method used in atmospheric simulation, considering factors such as grid size, computational time, communication costs, and efficiency.



Presentation Transcript


  1. COMP60611 Fundamentals of Concurrency • Lab Exercise 2 Notes • Notes on the finite difference performance model example – for the lab… • Graham Riley, John Gurd • Centre for Novel Computing, School of Computer Science, University of Manchester

  2. Example: Global Atmosphere model • Consider a three-dimensional model of the atmosphere. • The model computes values of key atmospheric variables such as temperature, wind speed, pressure and moisture content. • The physical processes involved in the atmosphere are described by a set of partial differential equations, in this case describing the basic fluid dynamical behaviour.

  3. Numerical model • The behaviour of the equations in a continuous space is approximated by their behaviour on a finite set of regularly spaced grid points in the space. • The equations are integrated in time, from an initial state, using a fixed, discrete timestep, typically 20 minutes. • The grid points are located on a rectangular latitude-longitude-height grid of size N_x by N_y by N_z. • There are usually around 30 levels in the atmosphere model (N_z = 30). • N_x (latitude points) is usually less than N_y (longitude points), with typical values for N_y in the range 100-500 (low to high resolution). • Models may cover a limited area of the globe (limited area model, LAM) or the entire globe (global circulation model, GCM). • 500 grid points on the equator correspond to a grid spacing of approximately 55 miles.

  4. Dynamics and physics • We assume the model uses a finite difference method to update grid values, with a five-point stencil in the horizontal (x- and y-directions) to compute atmospheric motion, and a three-point stencil in the vertical (z-direction). • The finite difference computations are concerned with the movement, or dynamics, of air in the atmosphere. • In addition to the dynamics, the atmosphere model includes algorithms to simulate various physics processes, such as radiation, convection and precipitation. • The data dependencies in physics calculations are normally (in most models) restricted to within vertical columns, by design of the modelling equations.
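
To make the horizontal five-point stencil concrete, here is a minimal Python sketch of one update step on a single horizontal level. The update rule (a simple explicit diffusion step with coefficient `alpha`), the function name `five_point_step` and the cyclic wrap-around at the edges are all assumptions chosen for illustration; the slides do not give the actual dynamics equations.

```python
def five_point_step(grid, alpha=0.1):
    """Apply one explicit five-point-stencil update to a 2D grid.

    Hypothetical update rule (simple diffusion). Cyclic boundaries are an
    assumption made here so that every point has four horizontal neighbours.
    """
    nx = len(grid)
    ny = len(grid[0])
    new = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            # The four horizontal neighbours plus the point itself
            # make up the five-point stencil.
            north = grid[(i - 1) % nx][j]
            south = grid[(i + 1) % nx][j]
            west = grid[i][(j - 1) % ny]
            east = grid[i][(j + 1) % ny]
            centre = grid[i][j]
            new[i][j] = centre + alpha * (north + south + west + east - 4.0 * centre)
    return new


# A 4 x 4 level with a single non-zero value.
grid = [[0.0] * 4 for _ in range(4)]
grid[1][1] = 1.0
stepped = five_point_step(grid)
```

Each new value depends only on the point and its four nearest horizontal neighbours, which is why a partitioned grid only needs to exchange one layer of boundary points with each neighbouring task.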

  5. The finite difference stencil at a point • [Figure: stencil at a grid point; axes x = latitude, y = longitude, z = height]

  6. Finite difference example – part 1 • Assume a grid of N × N × Z grid points. • Note that, in this case, the parameter N defines the problem size, but is not actually the problem size itself. • Consider first a 1D partition in the horizontal plane (in longitude) so that each task computes N × N/P × Z grid points per timestep • (we only consider the cost of one timestep since, in this problem, all timesteps are assumed equivalent). • Thus, the total computation time for one timestep (summed over all tasks) is: $T_{comp} = t_c N^2 Z$ • Where t_c is the (average) time of computation for one grid point • Assuming all processors are the same!
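
The 1D partition can be sketched in Python as follows. The helper names `partition_1d` and `points_per_task` are hypothetical, and the handling of the case where P does not divide N (giving the first few tasks one extra column) is an assumption added for illustration; the slides only consider the evenly divisible case.

```python
def partition_1d(n, p):
    """Split n longitude columns among p tasks as evenly as possible.

    Returns a list of half-open (start, stop) column ranges, one per task.
    """
    base, extra = divmod(n, p)
    ranges = []
    start = 0
    for rank in range(p):
        # Assumption: the first `extra` tasks take one additional column.
        width = base + (1 if rank < extra else 0)
        ranges.append((start, start + width))
        start += width
    return ranges


def points_per_task(n, z, p):
    """Grid points owned by each task: N * (columns owned) * Z."""
    return [n * (stop - start) * z for start, stop in partition_1d(n, p)]


print(points_per_task(8, 2, 4))   # P divides N: every task owns the same number
print(points_per_task(10, 1, 4))  # P does not divide N: the load is uneven
```

When P divides N, every task owns N × N/P × Z points. Otherwise the tasks with fewer columns finish early and wait, which is the idle time the model assumes away.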

  7. 1D partition in the horizontal latitude Proc 1 Proc 2 longitude

  8. Communication and idle costs • The stencil is a 5-point stencil, so each task will exchange a total of N × Z points with each of two neighbours • Note we assume cyclic boundary conditions • This gives a total communication cost (summed over all P tasks) of: $T_{comm} = 2P(t_s + t_w N Z)$ • t_s – comms startup cost, t_w – cost per ‘word’ to transmit a message • If we assume P divides N, there will be no idle time
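
A short sketch of the communication pattern and its cost, assuming each task sends one message of N × Z words to each of its two neighbours and that a message of L words costs t_s + t_w L. The function names are hypothetical, and the parameter values in the example are made up.

```python
def neighbours_1d(rank, p):
    """Left and right neighbours of a task under cyclic boundary conditions."""
    return ((rank - 1) % p, (rank + 1) % p)


def comm_time_per_task(n, z, ts, tw):
    """Time one task spends sending one N*Z slice to each of two neighbours."""
    words_per_message = n * z
    return 2 * (ts + tw * words_per_message)


def comm_time_total(n, z, p, ts, tw):
    """Communication time summed over all P tasks."""
    return p * comm_time_per_task(n, z, ts, tw)


# With cyclic boundaries the first and last tasks are neighbours of each other.
print(neighbours_1d(0, 4))
print(neighbours_1d(3, 4))
# Made-up costs: ts = 5.0 and tw = 0.5 time units, N = 10, Z = 3.
print(comm_time_per_task(10, 3, 5.0, 0.5))
```

Note that the per-task communication time does not depend on P: each task always exchanges two full N × Z slices, however many tasks there are.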

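  9. Total cost (i.e. the model) • The total cost is then given by (assuming no idling): $T_{1D} = (T_{comp} + T_{comm}) / P$, where $T_{comp}$ and $T_{comm}$ are the total computation and communication times summed over all tasks, i.e. $T_{1D} = t_c N^2 Z / P + 2 t_s + 2 t_w N Z$ • Now, what can we do with this model?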
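
One thing to do with the model is to evaluate it. The sketch below assumes the 1D cost takes the form t_c N² Z / P + 2 (t_s + t_w N Z), which follows from N × N/P × Z points per task and two N × Z exchanges per task. The function names and all numerical parameter values are hypothetical and chosen only to show the trend.

```python
def t_seq(n, z, tc):
    """Modelled time for one timestep on a single processor (no communication)."""
    return tc * n * n * z


def t_1d(n, z, p, tc, ts, tw):
    """Modelled time per timestep for the 1D partition on p >= 2 processors."""
    computation = tc * n * n * z / p
    communication = 2 * (ts + tw * n * z)
    return computation + communication


# Hypothetical problem size and machine parameters (times in seconds).
N, Z = 256, 30
TC, TS, TW = 1.0e-6, 1.0e-4, 1.0e-7

for p in (2, 4, 8, 16, 32, 64, 128, 256):
    print(f"P={p:4d}  T={t_1d(N, Z, p, TC, TS, TW):.6f} s")
```

The computation term shrinks as P grows while the communication term stays fixed, so the printed times level off instead of continuing to halve.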

  10. Performance metrics: Speed-up and Efficiency - reminder • Define relative speedup as the ratio of the execution time on one processor to that on P processors: $S_{rel} = T_1 / T_P$ • Define relative efficiency as: $E_{rel} = T_1 / (P \, T_P)$ • This is the fraction of time that processors spend doing useful work (i.e., the time spent doing useful work divided by total time on all processors) • It characterises the effectiveness of an algorithm on a system • For any problem size and any number of processors
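
These two definitions translate directly into code. The sketch below takes the one-processor time and the P-processor time as inputs; the function names and the timings in the example are made up for illustration.

```python
def relative_speedup(t1, tp):
    """Ratio of the execution time on one processor to that on P processors."""
    return t1 / tp


def relative_efficiency(t1, tp, p):
    """Relative speedup divided by P: the fraction of processor time spent on useful work."""
    return t1 / (p * tp)


# Made-up timings: 100 s on one processor, 25 s on five processors.
print(relative_speedup(100.0, 25.0))
print(relative_efficiency(100.0, 25.0, 5))
```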

  11. Observations on the model • Execution time decreases with increasing P • Good! • But bounded from below by the cost of exchanging (two) array slices • Implies a limit on the execution time regardless of P • Execution time increases with increasing N, Z, t_c, t_s and t_w

  12. Further observations • Once you have an explicit expression for relative efficiency, note that: • Relative efficiency decreases with increasing P, t_s and t_w • Relative efficiency increases with increasing N, Z and t_c • The implications will be explored in the lab. • Relative speedup is of limited use. • Alternatively, define speedup relative to the time of the best known sequential algorithm (executing on the same machine). • See the paper “Twelve ways to fool the masses when giving performance results on parallel computers” by Bailey, Supercomputing Review, Aug. 1991, on misuses of speedup.
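
One way to obtain such an expression for the 1D partition is sketched below. It assumes the one-processor time is pure computation, $T_1 = t_c N^2 Z$, and that the P-processor time is the computation share plus two N × Z exchanges per task; both are modelling assumptions drawn from the slides, not measured results.

```latex
\begin{align*}
T_1 &= t_c N^2 Z, \qquad
T_P = \frac{t_c N^2 Z}{P} + 2 t_s + 2 t_w N Z \\[4pt]
E_{rel} &= \frac{T_1}{P \, T_P}
        = \frac{t_c N^2 Z}{t_c N^2 Z + 2 P t_s + 2 P t_w N Z}
        = \frac{1}{1 + \dfrac{2 P t_s}{t_c N^2 Z} + \dfrac{2 P t_w}{t_c N}}
\end{align*}
```

In the last form, P, t_s and t_w appear only in the numerators of the overhead terms, while N, Z and t_c appear only in their denominators, which matches the trends listed above.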

  13. Absolute performance metrics • Relative speed-up can be misleading! (Why?) • Define absolute speed-up (efficiency) with reference to the sequential time, T_ref, of an implementation of the best known algorithm for the problem at hand: $S_{abs} = T_{ref} / T_P$ and $E_{abs} = T_{ref} / (P \, T_P)$, where $T_P$ is the execution time on P processors • Note: the best known algorithm may take an approach to solving the problem different to that of the parallel algorithm

  14. Finite differences example – part 2 • Next we consider a 2D partition of the horizontal domain (partitioning both latitude and longitude)… • P processors in total in a square decomposition • The number of grid points each task computes is now: ? (derive this…) • Each task will exchange ? (derive this…) grid points with each of ? neighbours at each timestep

  15. Full 2D model • The total cost for the 2D model is then: ?

  16. What does the 2D model tell us? • How does it compare with the 1D case? • In terms of performance and scalability • This will be the basis of the lab exercise
