Presentation Transcript


  1. Performance Evaluation of the Parallel Fast Multipole Algorithm Using the Optimal Effectiveness Metric Ioana Banicescu and Mark Bilderback Department of Computer Science and NSF/ERC for Computational Field Simulation Mississippi State University

  2. Overview • Scientific Applications • Performance Evaluation • Scalability Analysis • Optimal Effectiveness Metric • Parallel Fast Multipole Algorithm • Experimental Results • Conclusions and Future Work

  3. Scientific Applications • Large, computationally intensive, irregular • Parallel Implementation (various algorithms) • Performance degradation factors • Communication and load imbalance • architecture independent • architecture dependent

  4. Architecture Independent Factors • Problem characteristics • nonuniformity of input data • Algorithmic • serial section • communication patterns • local / non-local dependencies

  5. Architecture Dependent Factors • Architectural characteristics • Language, OS • Interconnection network • Characteristics of each component processor • speed, memory, etc.

  6. Performance Evaluation • Parallel Applications • Scalability • algorithm, architecture, mapping • Evaluation • Isolated to particular applications • Different types of performance metrics • Performance metric characteristics • Relevant, consistent, quantitative, predictive

  7. Performance Metrics • Commonly used (time, speedup, efficiency, cost) • Speedup [Amdahl ‘67] • Scaled Speedup [Gustafson ‘88] • Fixed time size-up [Sun and Gustafson ‘91] • Isoefficiency [Gupta & Kumar ‘93] • Optimal effectiveness [Luke, Banicescu, Li ‘98]
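For reference, the standard definitions these metrics build on (a recap of textbook material, not slide content): with T(1) the serial time and T(P) the time on P processors,

```latex
S(P) = \frac{T(1)}{T(P)} \ \text{(speedup)}, \qquad
E(P) = \frac{S(P)}{P} \ \text{(efficiency)}, \qquad
C(P) = P\,T(P) \ \text{(cost)}
```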

  8. Isoefficiency • Algorithms that can add processors at a faster rate (while maintaining efficiency) are able to achieve higher performance. • Does not identify the number of processors required before an algorithm becomes an effective option. • It discounts valuable parallel algorithms for which an isoefficiency function does not exist.
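As a one-formula reminder (the standard formulation after Gupta & Kumar, not reproduced from the slides): with total parallel overhead T_o(W, P) = P·T(P) − W for a problem of size W, holding efficiency fixed at E requires

```latex
W \;=\; \frac{E}{1-E}\, T_o(W, P)
```

The isoefficiency function is the rate at which W must grow with P to keep this equation satisfied; for some algorithms no such function exists, which motivates the last bullet above.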

  9. Performance - Cost Tradeoffs • High performance applications seek a performance-cost balance. • Scalability analysis - theoretical, experimental. • Optimal effectiveness [Luke, Banicescu, Li ‘98] • Similar to (E*S)max [Tang, Li ‘90] • Asymptotic relationship between isoefficiency and (E*S)max

  10. Optimal Effectiveness • Cost Effectiveness (following [Luke, Banicescu, Li ‘98], the reciprocal of the cost-time product): Γ(N, P) = 1 / (P · T(N, P)²) • Optimal Effectiveness: Γopt(N) = maxP Γ(N, P), attained at P = Popt(N)
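A minimal sketch of how Γ and Popt might be computed from measured runtimes (the function names and the sample timings are illustrative, not from the paper):

```python
def cost_effectiveness(p, t):
    """Gamma = 1 / (P * T^2): the reciprocal of cost (P*T) times time (T)."""
    return 1.0 / (p * t * t)

def optimal_effectiveness(timings):
    """Given measured {P: T(N, P)} for one problem size N, return (Popt, Gamma_opt)."""
    return max(((p, cost_effectiveness(p, t)) for p, t in timings.items()),
               key=lambda pair: pair[1])

# Hypothetical timings (seconds) for one problem size; real data would come
# from runs such as the KSR-1 / IBM-SP2 / SuperMSPARC experiments below.
timings = {4: 120.0, 8: 65.0, 16: 38.0, 32: 27.0, 64: 24.0}
p_opt, gamma_opt = optimal_effectiveness(timings)
print(f"Popt = {p_opt}, Gamma_opt = {gamma_opt:.3e}")  # Popt = 16 for this data
```

Note how the sample data illustrates the tradeoff: the runtime keeps shrinking up to 64 processors, but Γ peaks at 16, past which the added cost outweighs the speedup.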

  11. Optimal Effectiveness (contd.) • Compare the performance of different parallel algorithms. • Identify specific conditions of problem size and number of processors that characterize crossover points and intervals where one algorithm becomes more cost effective than another. • Prescribe the number of processors relevant to a particular problem size: Popt.

  12. The N-body Problem • Problem: Simulate the evolution of N particles over time (given initial positions and velocities) • Compute new positions and velocities of the N particles after one time step • Applications: astrophysics, molecular dynamics • Naive algorithm: O(N²) [Figure: resulting force on one particle]
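For concreteness, a minimal sketch of the naive O(N²) step (gravitational interaction; the constants, step size, and softening are illustrative choices, not from the slides):

```python
import numpy as np

def nbody_step(pos, vel, mass, dt=1e-3, G=1.0, eps=1e-3):
    """One naive O(N^2) time step: accumulate all pairwise gravitational
    accelerations, then advance velocities and positions (forward Euler)."""
    n = len(mass)
    acc = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i != j:
                d = pos[j] - pos[i]
                r2 = d @ d + eps * eps  # softening avoids the r -> 0 singularity
                acc[i] += G * mass[j] * d / r2 ** 1.5
    vel += acc * dt
    pos += vel * dt
    return pos, vel
```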

  13. Approximation Algorithms • O(N) [Appel ‘85] • O(N log N) [Barnes-Hut ‘86] • O(N) Fast Multipole Algorithm (FMA) [Greengard ‘87a] • Particle interaction approximation within a specified accuracy (Zhao, Board, Pringle, ...) • O(N) Adaptive Fast Multipole Algorithm (AFMA) [Greengard ‘87b] • Singh et al., Nyland et al., etc.

  14. The Greengard Algorithm • Two traversals: • upward • downward • 2D: Quad-tree • 3D: Oct-tree

  15. Traversing the Tree Upwards • Computing combined field effects of particles in regions • Multipole expansion [Figure: a well-separated group of particles replaced by an equivalent particle, as seen from an evaluation point]
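A deliberately simplified sketch of the upward traversal on a quad-tree: each cell aggregates only a monopole (total mass and center of mass), standing in for the truncated multipole expansion the real algorithm carries; all class and function names are illustrative:

```python
class QuadNode:
    def __init__(self, children=None, particles=None):
        self.children = children or []    # internal node: up to 4 child cells
        self.particles = particles or []  # leaf node: list of (mass, (x, y))
        self.mass = 0.0                   # cell summary: total mass ...
        self.com = (0.0, 0.0)             # ... and center of mass

def upward_pass(node):
    """Bottom-up traversal: combine child summaries into the parent's summary.
    The real FMA propagates truncated multipole expansions here; the monopole
    (mass + center of mass) is the zeroth-order stand-in."""
    if node.children:
        for c in node.children:
            upward_pass(c)
        pairs = [(c.mass, c.com) for c in node.children]
    else:
        pairs = node.particles
    node.mass = sum(m for m, _ in pairs)
    if node.mass > 0:
        node.com = tuple(sum(m * p[k] for m, p in pairs) / node.mass
                         for k in range(2))
```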

  16. Traversing the Tree Downwards [Figure: translating expansions from higher levels to lower levels of the tree]

  17. Implementation • 3D-PFMA, LB[Duke], Fractiling • KSR-1, IBM-SP2, SuperMSPARC • Pthreads, MPI • Uniform, Nonuniform (Gaussian, Corner) • 4 - 64 processors, 1k - 100k particles

  18. 3-d Cost: nonuniform (corner) (KSR-1) • Densely packed (50K5) • Lightly packed (50K6) • LB better on 4-16 processors [Plot: cost in seconds vs. number of processors]

  19. 3-d Cost (IBM-SP2)

  20. 3-d Cost (SuperMSPARC)

  21.-27. Optimal Effectiveness (KSR-1) [seven plot slides]

  28.-31. Optimal Effectiveness (IBM-SP2) [four plot slides]

  32.-41. Optimal Effectiveness (SuperMSPARC) [ten plot slides]

  42. Cost vs. Cost Effectiveness • 10k nonuniform corner • Fractiling cost < LB cost < PFMA cost (regardless of the number of processors). • The IDEAL number of processors to use for a cost-effective execution is unknown. • Allocate only Popt processors and leave the rest for other simultaneously executing applications.
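Continuing the illustrative sketch from slide 10, a runtime system could cap each job's allocation at its Popt and release the surplus to other applications (the numbers are hypothetical):

```python
available = 64                             # processors in the machine
p_opt, _ = optimal_effectiveness(timings)  # from the slide-10 sketch
grant = min(p_opt, available)
print(f"allocate {grant} processors; {available - grant} stay free for other jobs")
```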

  43. Cost

  44. Optimal Effectiveness

  45. Conclusions • Cost effectiveness analysis - a novel approach. • Qualitative and quantitative characteristics. • Optimal effectiveness derived from cost effectiveness curves. • Measurement of Γopt gives the exact number of processors relevant to a particular problem size.

  46. Conclusions (contd.) • Cost effectiveness / Optimal effectiveness: • Quantifies specific conditions that make a particular algorithm optimal. • Capability to compare any set of algorithms regardless of the existence of an isoefficiency function. • Γopt shows the point at which using one of the algorithms is more advantageous than using another.

  47. Conclusions (contd.) • Cost effectiveness / Optimal effectiveness: • Allows intelligent allocation of available processors to other applications. • Improved throughput for the entire system. • Captures the impact and tradeoffs of the conditions that dictate performance.
