
More Charm++/TAU examples


eman

Presentation Transcript


  1. More Charm++/TAU examples
     • Applications:
       • NAMD
       • Parallel Framework for Unstructured Meshing (ParFUM)
     • Features:
       • Profile snapshots: capture the runtime behavior of the application by dividing it into user-specified intervals.
       • CUDA profiling: tracks time spent in CUDA kernel routines and shows scaling behavior for an experiment that varies the number of devices used.
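To make the snapshot idea concrete, here is a minimal sketch of how per-interval times could be derived from a sequence of profile snapshots. It assumes each snapshot stores cumulative time per routine, which the slides do not state; the function name `interval_profiles` and the sample numbers are hypothetical and are not part of the TAU API.

```python
from typing import Dict, List

# Assumption: a snapshot maps a routine name to its cumulative time in seconds.
Snapshot = Dict[str, float]


def interval_profiles(snapshots: List[Snapshot]) -> List[Snapshot]:
    """Convert cumulative snapshots into the time spent within each interval."""
    intervals = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        # A routine missing from the earlier snapshot had accumulated no time yet.
        delta = {name: curr[name] - prev.get(name, 0.0) for name in curr}
        intervals.append(delta)
    return intervals


# Made-up numbers for illustration only; they are not taken from the NAMD runs.
snapshots = [
    {"Main": 0.0, "Idle": 0.0},
    {"Main": 4.0, "Idle": 1.0},
    {"Main": 6.0, "Idle": 5.0},
]
per_interval = interval_profiles(snapshots)
```

Differencing consecutive snapshots is what lets a phase such as load balancing show up as a change in the per-interval numbers rather than being averaged over the whole run.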

  2. NAMD snapshot profile of over 800 seconds on 2048 processors. [Figure labels: Load Balancing Phases; Mean Exclusive Time; Standard Deviation. Routines shown: enqueueSelfA, enqueueSelfB, enqueueWorkA, enqueueWorkB, Main, Idle.]

  3. NAMD CUDA events: GPU efficiency gained by doubling the number of GPUs from 16 to 32. These events are broken down by routine and by device number. [Figure labels: ~50% efficiency; ~100% efficiency; Device #0.]

  4. NAMD CUDA scaling: scaling efficiency by event and device number. Non-Bonded Calculations scale well. Sum Forces Calculations scale less well, but their overall time is only a few microseconds. [Figure: Scaling Efficiency versus Number of Devices for Non-Bonded Calculations and Sum Forces Calculations.]
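The slides do not define "scaling efficiency". A common reading, assumed here, is strong-scaling efficiency: the measured speedup divided by the ideal speedup for the added devices. The sketch below uses hypothetical timings, not values from the NAMD experiment.

```python
def scaling_efficiency(base_devices: int, base_time: float,
                       devices: int, time: float) -> float:
    """Strong-scaling efficiency relative to a baseline run (1.0 means ideal)."""
    speedup = base_time / time        # how much faster the larger run is
    ideal = devices / base_devices    # speedup expected from perfect scaling
    return speedup / ideal


# Hypothetical timings in seconds, chosen only to illustrate the two extremes.
eff_no_gain = scaling_efficiency(16, 10.0, 32, 10.0)  # time unchanged: 0.5
eff_ideal = scaling_efficiency(16, 10.0, 32, 5.0)     # time halved: 1.0
```

Under this definition, an event whose time does not change when going from 16 to 32 devices sits at 50% efficiency, and one whose time halves sits at 100%.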

  5. ParFUM CUDA speedup: single CPU or GPU performance on a 128x8x8 mesh. When run with GPU acceleration enabled, ParFUM spent 9 seconds in the CUDA kernel routines.
