
An Apples-to-Apples GPGPU Benchmark (…or at least an attempt at one)

Peter S. Shenkin


Presentation Transcript


  1. An Apples-to-Apples GPGPU Benchmark (…or at least an attempt at one) • Peter S. Shenkin

  2. Attachment-Based Core Hopping • What it does • The architecture • The benchmark

  3. Attachment-Based Core Hopping • What it does • Find a replacement for the central portion of a molecule • … keeping the peripheral parts in place • … while making “chemical sense” • Why would you do such a thing? • Increase efficacy • Improve “ADMET” properties • (Absorption, Distribution, Metabolism, Excretion, Toxicity) • Find new IP • Designed as a fast interactive desktop application • The architecture • The benchmark

  4. Define Core in a “Template” Molecule • Two ways shown, to emphasize user choice • [Figure panels: 1kv1 core; “1kv1-smaller” core]

  5. Result: 1err: olap= 0.95, relgscore= -1.37 • Replaced C with N • Replaced S with C

  6. Result: 1erb: olap= 0.80, relgscore= -0.96 • Spiro core!

  7. Result: 1kv2: olap= 0.29, relgscore= -0.37 • Replaced O with N • Replaced N with C • Added an N • Huge shape difference!

  8. Attachment-Based Core Hopping • What it does • The architecture • Workflow engine independent of application code • (… and APU technology) • Multithreaded using Qthreads; C++ • Application stages are essentially plug-ins • The benchmark

  9. Architecture • [Diagram with labels: I, O, Queue, Stage 1 to Stage 5, Scheduler] • Legend: non-thread-safe thread, thread-safe thread, CUDA thread
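
The architecture slides describe a workflow engine that is independent of the application code, with stages acting as plug-ins and a distinction between thread-safe and non-thread-safe stages. The following is a minimal sketch of that idea, not the author's implementation (which the slides say is C++). The names `Stage`, `run_pipeline`, and `max_workers` are hypothetical. The sketch uses one queue per stage and a simple stage-by-stage shutdown in place of the scheduler shown in the diagram, and it assumes that a non-thread-safe stage should run on exactly one worker thread.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel telling a worker thread to exit


@dataclass
class Stage:
    """One plug-in stage (hypothetical structure, not taken from the slides)."""
    func: Callable[[Any], Any]
    thread_safe: bool = False  # assumption: non-thread-safe stages get one worker
    max_workers: int = 4       # assumption: worker count for thread-safe stages


def _worker(func, inbox, outbox):
    # Pull items until the sentinel arrives, pushing each result downstream.
    while True:
        item = inbox.get()
        if item is _DONE:
            return
        outbox.put(func(item))


def run_pipeline(stages: List[Stage], items: Iterable[Any]) -> List[Any]:
    # queues[i] feeds stage i; queues[-1] collects the final results.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    pools = []
    for i, stage in enumerate(stages):
        n_workers = stage.max_workers if stage.thread_safe else 1
        pool = [
            threading.Thread(target=_worker,
                             args=(stage.func, queues[i], queues[i + 1]))
            for _ in range(n_workers)
        ]
        for thread in pool:
            thread.start()
        pools.append(pool)

    for item in items:
        queues[0].put(item)

    # Shut down stage by stage, so that no stage stops before the
    # stage upstream of it has finished producing input.
    for i, pool in enumerate(pools):
        for _ in pool:
            queues[i].put(_DONE)
        for thread in pool:
            thread.join()

    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results


if __name__ == "__main__":
    demo = [Stage(lambda x: x + 1, thread_safe=True), Stage(lambda x: x * 2)]
    # Result order is not guaranteed when a stage has several workers.
    print(sorted(run_pipeline(demo, range(5))))
```

The engine never inspects what a stage does, so swapping in a different `func` (for example, one that hands work to a GPU) would not change the engine code. Error handling is omitted: an exception inside a stage ends that worker thread without reporting the failure.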

  10. Attachment-Based Core Hopping • What it does • The architecture • The benchmark • A truism that goes without saying • Results slowly unveiled • The dilemma & its resolution • Did we “do the right thing”?

  11. The Truism • There are lies…

  12. The Truism • There are lies… • … damn lies

  13. The Truism • There are lies… • … damn lies • … statistics

  14. The Truism • There are lies… • … damn lies • … statistics • … benchmarks

  15. The Truism • There are lies… • … damn lies • … statistics • … benchmarks • … salesmen’s claims

  16. The Truism • There are lies… • … damn lies • … statistics • … benchmarks • … salesmen’s claims • … and the last two all too often interact

  17. Results • Test system: • i7/930, 2.7 GHz processor • 4 physical cores, run hyperthreaded • 12 GB RAM • 8-lane PCIe motherboard • SSD drive

  18. Results

  19. Results

  20. Results

  21. Results • At constant CPU utilization: • With two GPGPUs: • Speedup = 1.07 / 0.3275 = 3.3 • With one GPGPU: • Speedup = 0.76 / 0.20 = 3.8
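
The two speedup figures on this slide can be checked with a few lines of arithmetic. The transcript does not say what units the paired numbers are in, so the sketch below treats each speedup as a plain ratio of the two values quoted on the slide. The function name `speedup` is a hypothetical helper.

```python
def speedup(numerator: float, denominator: float) -> float:
    """Ratio of the two measurements quoted on the slide (units unstated)."""
    return numerator / denominator


two_gpgpus = speedup(1.07, 0.3275)
one_gpgpu = speedup(0.76, 0.20)

print(f"two GPGPUs: {two_gpgpus:.1f}")  # prints "two GPGPUs: 3.3"
print(f"one GPGPU:  {one_gpgpu:.1f}")   # prints "one GPGPU:  3.8"
```

Before rounding, the two-GPGPU ratio is about 3.27, which the slide reports as 3.3.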

  22. Closing Remarks • If we did our comparisons with a different number of threads, speedups would be different • If we worked on a machine with more or fewer processors, speedups would be different • If we used a 4-lane PCIe motherboard, or a different CPU, or a slower hard drive, speedups would be different • If our software architecture were different, speedups would be different • Conclusion from above: The world is a complicated place • Do you agree that our approach is fair?
