110 likes | 204 Views
Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment. Ivan Kisel GSI, Germany (for the CBM Collaboration). CHEP-2010 Taipei, October 18-22, 2010. Tracking Challenge in CBM (FAIR/GSI, Germany). Fixed-target heavy-ion experiment 10 7 Au+Au collisions/s
E N D
Many-Core Scalabilityof the Online Event Reconstructionin the CBM Experiment Ivan Kisel GSI, Germany (for the CBM Collaboration) CHEP-2010 Taipei, October 18-22, 2010
Tracking Challenge in CBM (FAIR/GSI, Germany) • Fixed-target heavy-ion experiment • 107Au+Au collisions/s • 1000 charged particles/collision • Non-homogeneous magnetic field • Double-sided strip detectors • (85% combinatorial space points) • Track reconstruction in STS/MVD and displaced vertex search • are required in the first trigger level. • Reconstruction packages: • track finding Cellular Automaton (CA) • track fitting Kalman Filter (KF) • vertexing KF Particle Ivan Kisel, GSI
Many-Core CPU/GPU Architectures NVIDIA GPU Intel/AMD CPU 512 cores 2x4 cores Intel MICA IBMCell 32 cores 1+8 cores Future systems are heterogeneous Ivan Kisel, GSI
Kalman Filter (KF) based Track Fit Track fit: Estimation of the track parameters at one or more hits along the track – Kalman-Filter (KF) 3 Correction Detectorlayers Hits 1 Initialising p (r, C) r – Track parameters C – Precision Precision State vector Position, direction and momentum 2 Prediction r = { x, y, z, px, py, pz } KF Block-diagram 1 Kalman Filter: Start with an arbitrary initialization. Add one hit after another. Improve the state vector. Get the optimal parameters after the last hit. 3 2 Nowadays the Kalman Filter is used in almost all HEP experiments KF as a recursive least squares method Ivan Kisel, GSI
Performance of the KF Track Fit on CPU/GPU Systems 2xCell SPE (16 ) Woodcrest ( 2 ) Clovertown ( 4 ) Dunnington ( 6 ) Scalabilty 10.00 Task Level Parallelism (100x) Data Stream Parallelism (10x) Time/Track, ms 1.00 0.10 Cores and Threads SIMD 0.01 scalar single -> double 8 2 4 32 16 Threads Scalability on different CPU architectures – speed-up 100 GPU CPU Threads Cores Real-time performance on NVIDIA GPU graphic cards Real-time performance on different Intel CPU platforms SIMD The Kalman Filter algorithm performs at ns level Ivan Kisel, GSI
Scalability of the KF Track Fit on 2xNehalem = 8 Cores Each physical core of Nehalem has 2 HW threads (read/exe), which are considered by the operating system as logical cores. Aiming to develop a scheduler, we investigate scalability and CPU load. Measure tracks throughput rather than time per track. Given n threads each filled with 10m tracks, run them on specific n logical cores with 1 thread per 1 logical core. For small groups of tracks the overhead becomes significant, while large groups of tracks use CPU more efficient. Core 1 Core 2 Ivan Kisel, GSI
Cellular Automaton (CA) as Track Finder 1. Segments 2. Counters 1 2 4 3 4. Tracks 3. Track Candidates Track finding: Wich hits in detector belong to the same track? – Cellular Automaton (CA) 0. Hits Detector layers 0. Hits (CBM) Hits Cellular Automaton: Build short track segments. Connect according to the track model, estimate a possible position on a track. Tree structures appear, collect segments into track candidates. Select the best track candidates. 1000 Hits 4. Tracks (CBM) • Cellular Automaton: • local w.r.t. data • intrinsically parallel • extremely simple • very fast • Perfect for many-core CPU/GPU ! 1000 Tracks Ivan Kisel, GSI
CBM Cellular Automaton Track Finder Front view 770 Tracks Top view Intel X5550, 2x4 cores at 2.67 GHz Scalability Efficiency Highly efficient reconstruction of 150 central collisions per second Ivan Kisel, GSI
CA Track Finder: First 4D Version The beam will have no bunch structure, therefore 4D measurements (x, y, z, t). Reconstruction of time slices rather then events will be needed. • 100 mbias events have been divided into groups, each of them has been processed as a single event. • Each hit had an integer index (equal to the event number, later will represent the time measurement t ). • Each tracklet was constructed from hits with the same t. • The same efficiency. • Slight increase of the processing time with larger time slices. Ready for more realistic 4D simulations Ivan Kisel, GSI
Consolidate Efforts: International Tracking Workshop Follow-up Workshop: 10-11 March, 2011,CERN SverreJarp 45 participantsfrom Austria, China, Germany, India, Italy, Norway, Russia, Switzerland, UK and USA Ivan Kisel, GSI
Summary • CBM event reconstruction is at the ms level • Strong many-core scalability • Data parallelism vs. task parallelism • Use SIMD unit, multi-threading, many-cores • Parallelization is now becoming a standard in the CBM experiment Ivan Kisel, GSI