
Bronis R. de Supinski and John May Center for Applied Scientific Computing March 18, 1999



Presentation Transcript


  1. Benchmarking pthreads • Bronis R. de Supinski and John May • Center for Applied Scientific Computing • March 18, 1999

  2. Measuring thread performance • Threading options • pthreads • OpenMP • Threaded libraries • How to choose • Ease of use • Performance • No threads benchmark suites!!

  3. Using SKaMPI • Advantages • Extensible interface • Data collection and test management facilities • Will support mixed models benchmarks • Drawbacks • Times single measurement action • Relies on MPI • Design flaws • Challenges • Termination • CPU bindings

  4. Thread creation • Measuring thread creation cost is difficult • Number of threads is limited • Overhead • Solution: recursive thread creation • Using default attributes: 118.7 ms • Eventually will vary: • Detached state • Contention scope • Stack size?
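The recursive approach on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' SKaMPI test: the helper names `now_sec`, `spawn_chain`, and `time_thread_creation` are hypothetical, and the sketch assumes each thread creates one child with default attributes and then joins it, so the reported figure includes the join cost and `depth` must stay below the system's thread limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in seconds. */
static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Each thread creates one child with default attributes, then joins it.
 * The argument carries the number of threads still to be created. */
static void *spawn_chain(void *arg) {
    intptr_t remaining = (intptr_t)arg;
    if (remaining > 0) {
        pthread_t child;
        int rc = pthread_create(&child, NULL, spawn_chain,
                                (void *)(remaining - 1));
        assert(rc == 0);
        (void)rc;
        pthread_join(child, NULL);
    }
    return NULL;
}

/* Average seconds per create+join pair over a chain of `depth` threads. */
double time_thread_creation(int depth) {
    double start = now_sec();
    spawn_chain((void *)(intptr_t)depth);
    return (now_sec() - start) / depth;
}
```

Calling `time_thread_creation(64)` and repeating the call several times gives a spread of per-thread costs. The attributes the slide lists as future variations (detached state, contention scope, stack size) would be set through a `pthread_attr_t` passed in place of `NULL`.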

  5. Context switch • Bind threads to same CPU • Repeatedly call sched_yield • Running thread alternates quickly • Measured time: 4.4 ms
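A rough sketch of the yield loop described on this slide is shown below. It deliberately omits the CPU binding step, because binding calls are platform specific; without binding, the two threads may run on different CPUs and the result then reflects only the cost of `sched_yield` itself. The names `yield_loop` and `time_yield_pair` are hypothetical, and the timing also includes thread creation and join overhead, which becomes negligible for large `iters`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in seconds. */
static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Give up the CPU `iters` times so the running thread alternates. */
static void *yield_loop(void *arg) {
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++)
        sched_yield();
    return NULL;
}

/* Average seconds per sched_yield call across two competing threads.
 * NOTE: threads are NOT bound to one CPU here; add the platform's
 * binding call inside yield_loop to reproduce the slide's setup. */
double time_yield_pair(int iters) {
    pthread_t a, b;
    double start = now_sec();
    pthread_create(&a, NULL, yield_loop, &iters);
    pthread_create(&b, NULL, yield_loop, &iters);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return (now_sec() - start) / (2.0 * iters);
}
```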

  6. Thread communication • LogP model • Latency (wire time) • Overheads • Gap • Measure • Ping-pong time: 2*(oS + L + oR) • Repeatedly “sending”: oS • Repeatedly “receiving”: oR • Calculate L • Applicability of gap? [Figure: LogP timeline labeled g, oS, L, oR]
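The "Calculate L" step follows from rearranging the ping-pong relation on this slide. Writing T_pp for the measured ping-pong time (a symbol introduced here for illustration), with oS and oR taken from the repeated "sending" and "receiving" measurements:

```latex
\[
T_{pp} = 2\,(o_S + L + o_R)
\quad\Longrightarrow\quad
L = \frac{T_{pp}}{2} - o_S - o_R
\]
```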

  7. Conditions • Operations • pthread_cond_signal • pthread_cond_wait • Associated mutex • Ping-pong measurements • Unbound: 48.9 ms (Sun: 25.0 ms) • Same CPU: 29.2 ms (Sun: 25.7 ms) • Different CPUs: 74.3 ms (Sun: 25.2 ms) • Overheads • Signal: 0.606 ms • Wait: Different CPUs: 45.7 ms
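A condition-variable ping-pong of the kind measured on this slide might look like the sketch below. It is an assumption about the general shape of such a test, not the authors' code: the `pingpong_t` struct, the `turn` flag, and the function names are hypothetical, and the threads are left unbound.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical shared state for the two ping-pong partners. */
typedef struct {
    pthread_mutex_t mutex;  /* mutex associated with the condition */
    pthread_cond_t cond;
    int turn;               /* which side may proceed: 0 or 1 */
    int rounds;
} pingpong_t;

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Partner thread: wait for its turn, then hand the turn back. */
static void *pong_thread(void *arg) {
    pingpong_t *pp = arg;
    pthread_mutex_lock(&pp->mutex);
    for (int i = 0; i < pp->rounds; i++) {
        while (pp->turn != 1)
            pthread_cond_wait(&pp->cond, &pp->mutex);
        pp->turn = 0;
        pthread_cond_signal(&pp->cond);
    }
    pthread_mutex_unlock(&pp->mutex);
    return NULL;
}

/* Average seconds per round trip (signal, wait, signal, wait). */
double time_cond_pingpong(int rounds) {
    pingpong_t pp;
    pthread_t peer;
    pthread_mutex_init(&pp.mutex, NULL);
    pthread_cond_init(&pp.cond, NULL);
    pp.turn = 0;
    pp.rounds = rounds;
    pthread_create(&peer, NULL, pong_thread, &pp);

    double start = now_sec();
    pthread_mutex_lock(&pp.mutex);
    for (int i = 0; i < rounds; i++) {
        pp.turn = 1;                      /* "ping" */
        pthread_cond_signal(&pp.cond);
        while (pp.turn != 0)              /* wait for "pong" */
            pthread_cond_wait(&pp.cond, &pp.mutex);
    }
    pthread_mutex_unlock(&pp.mutex);
    double elapsed = now_sec() - start;

    pthread_join(peer, NULL);
    pthread_cond_destroy(&pp.cond);
    pthread_mutex_destroy(&pp.mutex);
    return elapsed / rounds;
}
```

The predicate loop around `pthread_cond_wait` guards against spurious wakeups and against a signal sent before the partner has started waiting.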

  8. Mutexes • Ping-pong obstacles • Non-determinism • Idempotency required • Measurements • Unbound: 3.7 ms • Same CPU: 37.8 ms • Different CPUs: 3.7 ms • No contention: 0.638 ms [Figure: lock/unlock sequence for Thread 0 and Thread 1 over mutexes 0 to 3]
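The "No contention" case is the simplest of these measurements to illustrate. The sketch below times repeated lock/unlock pairs on a mutex that only one thread touches; the function name `time_uncontended_mutex` is hypothetical, and the multi-mutex ping-pong from the slide's figure is not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average seconds per lock/unlock pair with no other thread competing. */
double time_uncontended_mutex(int iters) {
    pthread_mutex_t m;
    int rc = pthread_mutex_init(&m, NULL);
    assert(rc == 0);
    (void)rc;

    double start = now_sec();
    for (int i = 0; i < iters; i++) {
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
    }
    double elapsed = now_sec() - start;

    pthread_mutex_destroy(&m);
    return elapsed / iters;
}
```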

  9. Summary • SKaMPI modifications • Fills threads microbenchmark void • Can extend to mixed models • Basic aspects of pthreads performance • Creation • Context switch • Timeslice • Communication • IBM’s implementations • Mutexes are good • Conditions could be improved • Plan to add other tests, variations

  10. UCRL-MI-133178 Work performed under the auspices of the U. S. Department of Energy by Lawrence Livermore National Laboratory under Contract W-7405-Eng-48
