Parallel Kalman Filter Track Fit Based on Vector Classes

I. Kisel, M. Kretz, I. Kulakov (for the CBM Collaboration)
GSI Helmholtzzentrum für Schwerionenforschung GmbH, Darmstadt, Germany; E-mail: I.Kulakov@gsi.de
Deutsche Physikalische Gesellschaft e.V., Bonn 10

Tracking Challenge

[Figure: Simulated central Au-Au collision at 25 AGeV]

• Fixed-target heavy-ion experiment
• 10^7 events/s
• 1000 charged particles/collision
• Non-homogeneous magnetic field
• Track reconstruction and displaced vertex search required in the first trigger level

A very fast track fitting algorithm is essential for the feasibility of the open charm event selection:
• Up to 10^9 tracks/s in the Silicon Tracker
• Develop an algorithm that exploits the full potential of modern processors

Kalman Filter Based Track Fit

The Kalman filter method is intended for estimation of the state vector r according to the measurements m_k.

CBM track model:
• x, y: coordinates
• tx: slope dx/dz
• ty: slope dy/dz
• q: charge
• p: momentum

Stages of track fit:
• Initial segment
• Extrapolation to the next hit
• Filtration of the next hit

[Figure: Block diagram of the Kalman filter method.]

- S. Gorbunov, U. Kebschull, I. Kisel, V. Lindenstruth, W.F.J. Müller, Fast SIMDized Kalman filter based track fit, Comp. Phys. Comm. 178 (2008) 374-383.

Parallelization

3 dimensions of parallelism (figure axes: Cores, HW Threads, SIMD width):
• SIMD parallelization (data level parallelism)
• Parallelization between cores (task level parallelism)

Computer:
• 2 CPUs X5550
• 4 cores per CPU
• Hyper-Threading
• 2.7 GHz
• 8 MB L3 cache
• 48 GB RAM

Vector Classes Library for SIMD Instructions

• SSE, …, SSE4_1, LRBni
• Load and store
• Gather and scatter
• sfloat_v for float and short
• Arithmetic and comparison
• Math functions
• Masks
• Masked operations
• Horizontal operations
• Constants
• Type-safety

[Figure panels: Vc arithmetic; Vc math operations]

- Vector Classes, (http://gitorious.org/vc).
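The fit stages named above (initial segment, extrapolation to the next hit, filtration of the next hit) can be illustrated with a deliberately simplified sketch. It is not the CBM implementation: it assumes a straight-line track in a single projection with state (x, tx), no magnetic field, no material effects, and scalar arithmetic instead of SIMD vectors. The function names `extrapolate`, `filter_hit` and `fit_track`, the loose starting covariance, and the hit resolution `sigma` are all hypothetical choices made for this illustration.

```python
def extrapolate(state, cov, dz):
    """Propagate the state (x, tx) and its covariance along z by dz.

    Assumption: straight-line propagation (no magnetic field).
    The covariance is stored as the tuple (c00, c01, c11).
    """
    x, tx = state
    c00, c01, c11 = cov
    new_state = (x + tx * dz, tx)
    # C' = F C F^T with F = [[1, dz], [0, 1]]
    new_cov = (c00 + 2.0 * dz * c01 + dz * dz * c11, c01 + dz * c11, c11)
    return new_state, new_cov


def filter_hit(state, cov, hit, sigma):
    """Update the state with one position measurement of resolution sigma."""
    x, tx = state
    c00, c01, c11 = cov
    residual = hit - x              # measurement minus prediction
    s = c00 + sigma * sigma         # variance of the residual
    k0, k1 = c00 / s, c01 / s       # Kalman gain for x and tx
    new_state = (x + k0 * residual, tx + k1 * residual)
    # C' = (I - K H) C with H = [1, 0]
    new_cov = (c00 - k0 * c00, c01 - k0 * c01, c11 - k1 * c01)
    return new_state, new_cov


def fit_track(zs, hits, sigma):
    """Fit hits measured at positions zs, returning (state, cov) at the last hit."""
    # Hypothetical initialization: zero state with a large covariance,
    # so that the first hits dominate the estimate.
    state = (0.0, 0.0)
    cov = (1.0e4, 0.0, 1.0e4)
    z = zs[0]
    for z_hit, hit in zip(zs, hits):
        state, cov = extrapolate(state, cov, z_hit - z)
        state, cov = filter_hit(state, cov, hit, sigma)
        z = z_hit
    return state, cov


if __name__ == "__main__":
    zs = [0.0, 1.0, 2.0, 3.0, 4.0]
    hits = [1.0 + 0.5 * z for z in zs]   # noise-free hits on x = 1 + 0.5 z
    (x, tx), cov = fit_track(zs, hits, sigma=0.01)
    print(x, tx, cov)
```

With noise-free hits on a straight line, the fitted position and slope at the last hit should come out very close to the generating values. In a SIMDized fit, each scalar here would be replaced by a vector holding the same quantity for several tracks, so the same arithmetic fits a whole group of tracks at once.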
Quality and Timing

Fit quality is evaluated with residuals (the errors of the fitted parameters) and pulls (the normalized residuals).

Many-Core Scalability

Intel Threading Building Blocks (TBB) software has been used for parallelization between cores on an 8-core computer.

[Figure: SIMD KF track fit scalability on an 8-core computer; diagram labels: core 1, core 2, core 3]

• A speedup factor of 8.5 has been obtained
• Strong many-core scalability for large groups of tracks

- Intel Threading Building Blocks, (http://www.threadingbuildingblocks.org).
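The residual and pull formulas did not survive in this transcript, so the following sketch assumes the common convention: a residual is the reconstructed value minus the true (simulated) value, and a pull is that residual divided by the error estimated by the fit. The function names and the sample numbers are hypothetical and serve only to show the arithmetic.

```python
import math


def residuals_and_pulls(reco, true, sigma):
    """Return (residuals, pulls) for matching lists of values.

    Assumed convention: residual = reco - true, pull = residual / sigma,
    where sigma is the error estimated by the fit for each value.
    """
    residuals = [r - t for r, t in zip(reco, true)]
    pulls = [res / s for res, s in zip(residuals, sigma)]
    return residuals, pulls


def mean_and_width(values):
    """Return the mean and the RMS width of a list of values."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from the poster.
    reco = [1.1, 1.8, 3.0]
    true = [1.0, 2.0, 3.0]
    sigma = [0.1, 0.2, 0.1]
    residuals, pulls = residuals_and_pulls(reco, true, sigma)
    print(mean_and_width(pulls))
```

For a fit with correctly estimated errors, the pull distribution over many tracks is expected to be centered at zero with a width close to one, which is why pulls are a standard check of the covariance matrix.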