
Update on G5 prototype

Update on G5 prototype. Andrei Gheata, Computing Upgrade Weekly Meeting, 26 June 2012. Simple observation: HEP transport is mostly local! 50 per cent of the time is spent in 50 of 7100 volumes (ATLAS volumes sorted by transport time). The same behavior is observed for most HEP geometries.


Presentation Transcript


  1. Update on G5 prototype. Andrei Gheata, Computing Upgrade Weekly Meeting, 26 June 2012

  2. Simple observation: HEP transport is mostly local! 50 per cent of the transport time is spent in 50 of the 7100 volumes (ATLAS volumes sorted by transport time). The same behavior is observed for most HEP geometries.
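The locality claim on slide 2 is a statement about a cumulative distribution: sort the volumes by transport time and count how many are needed to reach half of the total. For the quoted numbers, 50 of 7100 volumes is about 0.7% of the geometry. The sketch below shows that computation on a made-up time profile; the function name and the input data are illustrative assumptions, not measurements from the talk.

```python
def volumes_for_time_fraction(times, fraction=0.5):
    """Count how many of the most expensive volumes account for `fraction` of the total time."""
    ordered = sorted(times, reverse=True)   # most expensive volumes first
    target = fraction * sum(ordered)
    running = 0.0
    for count, t in enumerate(ordered, start=1):
        running += t
        if running >= target:
            return count
    return len(ordered)

# Hypothetical profile (NOT ATLAS data): 5 hot volumes at 20 time units each,
# 95 cold volumes at 1 time unit each.
times = [20.0] * 5 + [1.0] * 95
hot = volumes_for_time_fraction(times, 0.5)
share = hot / len(times)                    # fraction of the geometry that is "hot"
```

With a real per-volume timing profile the same function would give the "N of M volumes" figure directly.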

  3. A playground for new ideas
     • Simple simulation prototype to help explore parallelism and efficiency issues
     • Basic idea: minimal physics to start with, realistic geometry. Can we implement a parallel transport model on threads exploiting data locality and vectorisation?
     • Clean re-design of data structures and steering to easily exploit parallel architectures
     • Can we make it fully non-blocking, from generation to digitization and I/O?
     • Events and primary tracks are independent
     • Work chunk: basket containing a vector of tracks
     • Mixing tracks from different events to avoid tails and have reasonably sized vectors
     • Study how scattering/gathering of vectors impacts the simulation data flow
     • Toy physics at first, more realistic EM & hadronic processes to continue with
     • The application should eventually be tuned based on realistic numbers
     • New transport model is more "detector element"-oriented, profiting from the cached data structures (geometry- and cross-section-wise)
     • Where to go from there:
     • Re-design the particle stack and the I/O
     • Re-design transport models from a "plug-in" perspective, e.g. the ability to use fast simulation on a per-track basis
     • Understand what can be gained and how, what the impact on the existing code is, and what changes and effort are needed to migrate to a new system…
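The "work chunk" idea on slide 3 can be illustrated with a toy data structure. The sketch below is only an assumption about how such a basket might look: `Track`, `Basket`, and `fill_baskets` are hypothetical names, and a track carries nothing but its event and volume indices. It shows why mixing events helps: tracks from different events that sit in the same volume end up in one larger vector instead of several small ones.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    event: int    # index of the event the track belongs to
    volume: int   # index of the geometry volume currently containing the track

@dataclass
class Basket:
    volume: int
    tracks: list = field(default_factory=list)

def fill_baskets(tracks, mix_events):
    """Group tracks into baskets: one per volume (mixed) or one per (event, volume) pair."""
    baskets = {}
    for t in tracks:
        key = t.volume if mix_events else (t.event, t.volume)
        baskets.setdefault(key, Basket(t.volume)).tracks.append(t)
    return list(baskets.values())

# Hypothetical input: 3 events, each leaving 2 tracks in volume 7.
tracks = [Track(event=e, volume=7) for e in range(3) for _ in range(2)]
per_event = fill_baskets(tracks, mix_events=False)  # three baskets of two tracks each
mixed = fill_baskets(tracks, mix_events=True)       # one basket of six tracks
```

A vectorised implementation would more likely store each track property as a contiguous array; the list of objects here is kept only for readability.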

  4. Volume-oriented transport model
     • We implemented a model where all particles traversing a given geometry volume are transported together as a vector until the volume is empty
     • Same volume -> local (vs. global) geometry navigation, same material and same cross sections
     • Load balancing: distribute all particles from a volume type into smaller work units called baskets, and give one basket at a time to a transport thread
     • Particles exiting a volume are distributed to the baskets of the neighbor volumes until they exit the setup or disappear
     • Like a champagne cascade, but lower glasses can also fill top ones…
     • No direct communication between threads, to avoid synchronization issues
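The load-balancing step on slide 4 amounts to grouping tracks by volume and cutting each group into fixed-size work units. A minimal sketch follows, assuming tracks are plain `(track_id, volume_id)` pairs and that `basket_size` is a tunable parameter; both are illustrative choices, not details given in the talk.

```python
from collections import defaultdict

def make_baskets(tracks, basket_size):
    """Scatter (track_id, volume_id) pairs into per-volume baskets of at most basket_size tracks."""
    per_volume = defaultdict(list)
    for track_id, volume_id in tracks:
        per_volume[volume_id].append(track_id)
    baskets = []
    for volume_id in sorted(per_volume):
        ids = per_volume[volume_id]
        # Cut the per-volume vector into smaller work units for the transport threads.
        for start in range(0, len(ids), basket_size):
            baskets.append((volume_id, ids[start:start + basket_size]))
    return baskets

# Hypothetical input: five tracks in volume 1 and one track in volume 2.
tracks = [(0, 1), (1, 1), (2, 2), (3, 1), (4, 1), (5, 1)]
baskets = make_baskets(tracks, basket_size=2)
```

Each resulting basket shares one volume, so a thread that picks it up can reuse the same local navigation, material, and cross-section data for every track in it.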

  5. The beginning
     • Realistic geometry + event generator
     • Inject events into the volume containing the interaction point (IP)
     • More events are better, to cut event tails and fill the pipeline better!

  6. A first approach
     • Scatter all injected tracks to baskets. Only baskets above some threshold are transported.
     • Transport threads pick up baskets from the work queue
     • Each thread transports its basket of tracks to the boundaries of the current volume, moves the crossing tracks to a buffer, then picks up the next basket from the queue
     • Physics processes and geometry transport are called with vectors of particles, Particles(i0,…,in)
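The worker loop described on slide 6 can be sketched with standard thread-safe queues. Everything specific below is an assumption made for illustration: the toy `transport` rule (even-numbered tracks cross into the next volume, odd ones stop), the `None` sentinel used to stop workers, and the use of a second queue as the crossing-track buffer.

```python
import queue
import threading

def transport(basket):
    """Toy transport step (hypothetical rule): even track ids cross to volume + 1, odd ones stop."""
    volume, track_ids = basket
    return [(tid, volume + 1) for tid in track_ids if tid % 2 == 0]

def worker(work_queue, crossing_buffer):
    """Pick up baskets until a None sentinel arrives; push crossing tracks to the buffer."""
    while True:
        basket = work_queue.get()
        if basket is None:
            break
        for crossing in transport(basket):
            crossing_buffer.put(crossing)

def run_generation(baskets, n_threads=2):
    """Transport all baskets with n_threads workers and return the sorted crossing tracks."""
    work_queue = queue.Queue()
    crossing_buffer = queue.Queue()
    for basket in baskets:
        work_queue.put(basket)
    for _ in range(n_threads):
        work_queue.put(None)            # one stop sentinel per worker
    threads = [threading.Thread(target=worker, args=(work_queue, crossing_buffer))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    crossed = []
    while not crossing_buffer.empty():  # safe here: all workers have finished
        crossed.append(crossing_buffer.get())
    return sorted(crossed)

crossed = run_generation([(0, [0, 1, 2]), (1, [3, 4])])
```

The workers never talk to each other directly; they only share the two queues, which matches the "no direct communication between threads" constraint stated on slide 4.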

  7. First version required synchronization…
     • Generation = pop work chunks (POP_CHUNK) until the work queue is empty (QUEUE_EMPTY)
     • Synchronization point: flush (FLUSH) the transported particle buffer (ParticleBuffer) and sort baskets according to content
     • Recompute work chunks and start transporting the next generation of baskets
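The generation loop on slide 7 can be written down as a small single-threaded sketch, which makes the synchronization point explicit. The geometry (a linear chain of `N_VOLUMES` volumes) and the rule that every track crosses into the next volume are assumptions chosen so that the loop terminates; they are not the prototype's physics.

```python
from collections import defaultdict, deque

N_VOLUMES = 3  # hypothetical linear chain of volumes: 0 -> 1 -> 2 -> outside

def step(volume, track_ids):
    """Toy transport: every track crosses into volume + 1; tracks leaving the last volume exit."""
    nxt = volume + 1
    return [(tid, nxt) for tid in track_ids if nxt < N_VOLUMES]

def simulate(initial_tracks):
    """Run generations until no tracks are left; return the number of generations."""
    particle_buffer = list(initial_tracks)   # (track_id, volume_id) pairs
    generations = 0
    while particle_buffer:
        # Synchronization point: flush the buffer and sort tracks into per-volume baskets.
        per_volume = defaultdict(list)
        for tid, vol in particle_buffer:
            per_volume[vol].append(tid)
        particle_buffer = []
        work_queue = deque(sorted(per_volume.items()))
        # One generation: pop work chunks until the queue is empty.
        while work_queue:
            vol, ids = work_queue.popleft()
            particle_buffer.extend(step(vol, ids))
        generations += 1
    return generations

n_generations = simulate([(0, 0), (1, 0), (2, 1)])
```

In a threaded version every worker would have to wait at the flush step before the next generation starts, which is the synchronization cost named in the slide title.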

  8. Processing phases
     • Initial events injection
     • Sparse regime: more and more frequent garbage collections, fewer tracks per basket
     • Depletion regime: continuous garbage collection, new events needed, force flushing some events
     • Optimal regime: constant basket content
     • Plot labels: ideal, garbage collection threshold
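The regimes on slide 8 suggest a dispatch rule built around a basket-size threshold. The sketch below assumes that "garbage collection" means releasing partially filled baskets when no basket reaches the threshold; that reading, the function name, and the data layout are assumptions and are not confirmed by the slides.

```python
def select_transportable(pending, threshold):
    """Pick baskets to transport from `pending` (volume id -> list of track ids).

    Returns (baskets, garbage_collected). Baskets holding at least `threshold`
    tracks are released as they are. If none qualifies, all non-empty baskets
    are flushed instead, which is treated here as a garbage collection.
    """
    ready = {vol: ids for vol, ids in pending.items() if len(ids) >= threshold}
    if ready:
        return ready, False
    flushed = {vol: ids for vol, ids in pending.items() if ids}
    return flushed, True

# Hypothetical optimal regime: one basket is full enough to be transported.
optimal, gc_optimal = select_transportable({1: [0, 1, 2, 3], 2: [4]}, threshold=4)
# Hypothetical sparse regime: nothing reaches the threshold, so everything is flushed.
sparse, gc_sparse = select_transportable({1: [0], 2: [4, 5], 3: []}, threshold=4)
```

Under this reading, the sparse and depletion regimes correspond to the second branch being taken more and more often, until new events are injected to refill the baskets.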

  9. Prototype implementation (diagram; labels only)
     • Threads: main scheduler, worker threads, dispatch & garbage collect thread, digitize & I/O thread
     • Calls: Generate(Nevents), Stepping(tid, &tracks), Digitize(iev)
     • Basket flow: generate transportable baskets, pick up baskets from the deque, inject/replace baskets, inject priority baskets, recycle basket, recycled baskets, flush transport
     • Track flow: crossing tracks (itrack, ivolume), loop over tracks and push to baskets indexed by ivolume (0…n), push/replace collection, full track collections deque, recycled track collections
     • Output: hits, disk

  10. Evolution of populations (plot; labels: flush events, event ranges 0-4, 5-9, 95-99)

  11. Preliminary benchmarks
     • Benchmarking 10+1 threads on a 12-core Xeon in HT mode
     • Excellent CPU usage
     • Locks and waits: some overhead due to transitions coming from exchanging baskets via concurrent queues
     • Event re-injection will improve the speed-up
