1 / 13

Report on Vector Prototype

Report on Vector Prototype. J.Apostolakis , R.Brun , F.Carminati , A. Gheata 10 September 2012. Simple observation: HEP transport is mostly local !. Locality not exploited by the classical transportation approach Existing code very inefficient (0.6-0.8 IPC)

kira
Download Presentation

Report on Vector Prototype

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Report on Vector Prototype J.Apostolakis, R.Brun, F.Carminati, A. Gheata 10 September 2012

  2. Simple observation: HEP transport is mostly local ! • Locality not exploited by the classical transportation approach • Existing code very inefficient (0.6-0.8 IPC) • Cache misses due to fragmented code 50 per cent of the time spent in 50/7100 volumes ATLAS volumes sorted by transport time. The same behavior is observed for most HEP geometries.

  3. A playground for new ideas • Simulation prototype in the attempt to explore parallelism and efficiency issues • Basic idea: simple physics, realistic geometry: can we implement a parallel transport model on threads exploiting data locality and vectorisation? • Clean re-design of data structures and steering to easily exploit parallel architectures and allow for sharing all the main data structures among threads • Can we make it fully non-blocking from generation to digitization and I/O ? • Events and primary tracks are independent • Transport togethera vector of tracks coming from many events • Study how does scattering/gathering of vectors of tracks and hits impact on the simulation data flow • Start with toy physics to develop the appropriate data structures and steering code • Keep in mind that the application should be eventually tuned based on realistic numbers

  4. Volume-oriented transport model • We implemented a model where all particles traversing a given geometry volume are transported together as a vector until the volume gets empty • Same volume -> local (vs. global) geometry navigation, same material and same cross sections • Load balancing: distribute all particles from a volume type into smaller work units called baskets, give a basket to a transport thread at a time • Steering methods working with vectors, allowing for future auto-vectorisation • Particles exiting a volume are distributed to baskets of the neighbor volumes until exiting the setup or disappearing • Like a champagne cascade, but lower glasses can also fill top ones… • No direct communication between threads to avoid synchronization issues

  5. Event injection Inject event in the volume containing the IP More events better to cut event tails and fill better the pipeline ! Realistic geometry + event generator

  6. Transport model Tracks are transported to boundaries, then dumped in a track collection per thread Physics processes and geometry transport work with vectors Transport threads pick baskets “first in first out” Baskets are filled up to a threshold, then injected in a work queue Track collections are pushed to a queue and picked by the scheduler Inject events into a track container taken by a scheduler thread The scheduler holds a track “basket” for each logical volume. Tracks go to the right basket. 0 0 Track baskets (tracks in a single volume type) 1 1 Scheduler Physics processes Track containers (any volume) n n nv nv Geometry transport Event factory

  7. Preliminary benchmarks Benchmarking 10+1 threads on a 12 core Xeon HT mode Excellent CPU usage Locks and waits: some overhead due to transitions coming from exchanging baskets via concurrent queues Event re-injection will improve the speed-up

  8. Evolution of populations Flush events "Priority" regime Transporting initial buffer of events 5-9 95-99 0-4 Excellent concurrency allover Vectors "suffer" in this regime

  9. Prototype implementation Digitize & I/O thread Generate(Nevents) Worker threads generate transportable baskets pick-up baskets deque recycled baskets flush transport Dispatch & garbage collect thread Hits 0 0 Inject/replace baskets loop tracks and push to baskets ivolume 1 1 recycle basket 2 2 3 3 Inject priority baskets Priority baskets 4 4 Stepping(tid, &tracks) Digitize(iev) 5 5 Hits 6 6 7 7 8 8 recycled track collections n n Main scheduler Disk Crossing tracks (itrack, ivolume) Push/replace collection full track collections deque

  10. Buffered events & re-injection Reusage of event slots for re-injected events keeps memory under control Depletion regime with sparse tracks just few % of the job Concurrency still excellent Vectors preserved much better

  11. Next steps • Include hits, digitization and I/O in the prototype • Factories allowing contiguous pre-allocation and re-usage of user structures • MyHit *hit = HitFactory(MyHit::Class())->NextHit(); • hit->SetP(particle->P()); • Introduce realistic EM physics • Tuning model parameters, better estimate memory requirements • Re-design transport models from a “plug-in” perspective • E.g. ability to use fast simulation on per-track basis • Look further to auto-vectorization options • New compiler features, Intel Cilk array notation, ... • Check impact of vector-friendly data restructuring • Vector of objects -> object with vectors • Push vectors lower level • Geometry and physics models as main candidates • GPU is a great challenge for simulation code • Localize hot-spots with high CPU usage and low data transfer requirements • Test performance on data flow machines

  12. Outlook • Existing simulation approaches perform badly on the new computing infrastructures • Projecting this in future looks much worse... • The technology trends motivate serious R&D in the field • We have started looking at the problem from different angles • Data locality and contiguity, fine-grain parallelism, data flow, vectorization • Preliminary results confirming that the direction is good • We need to understand what can be gained and how, what is the impact on the existing code, what are the challenges and effort to migrate to a new system…

  13. Thank you!

More Related