CA+KF Track Reconstruction in the STS
(Cellular Automaton track finder + Kalman Filter track fit)
I. Kisel, GSI / KIP
CBM Collaboration Meeting, GSI, February 28, 2008
Track Finder: What Is the Next Step?
• High track density
• Non-homogeneous magnetic field
• Fake space points dominate
• Single-sided strip detectors
• Detector inefficiency
• Imperfectly aligned system
• On-line event selection
• Large PC farm
• Optimize the STS geometry (strips, sector navigation)
• Mathematical and computational optimization
• SIMDization of the algorithm (from scalars to vectors)
• MIMDization (multi-threads, multi-cores)
Ivan Kisel, GSI
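SIMDization ("from scalars to vectors") means writing the arithmetic once and running it on several tracks per instruction. The sketch below illustrates this under one assumption. `fvec4` is a hypothetical 4-wide type: a plain array of four floats standing in for one SSE register, not the CBM vector class. The templated `extrapolate` handles one track when instantiated with `float` and four tracks at once when instantiated with `fvec4`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-wide vector type standing in for one SIMD register
// (an SSE register holds 4 floats). Element-wise operators let the
// same physics code act on 4 tracks at once.
struct fvec4 {
    std::array<float, 4> v{};
    fvec4() = default;
    fvec4(float s) { v.fill(s); }  // broadcast a scalar to all lanes
    fvec4(float a, float b, float c, float d) : v{{a, b, c, d}} {}
    float operator[](std::size_t i) const { assert(i < 4); return v[i]; }
};

inline fvec4 operator+(const fvec4& a, const fvec4& b) {
    fvec4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

inline fvec4 operator*(const fvec4& a, const fvec4& b) {
    fvec4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Straight-line extrapolation x(z + dz) = x + tx * dz, written once.
// T = float processes one track; T = fvec4 processes four tracks.
template <typename T>
T extrapolate(const T& x, const T& tx, const T& dz) {
    return x + tx * dz;
}
```

In production code the per-lane loops would become SIMD intrinsics or a compiler vector type. The physics function itself does not change.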
Data Acquisition System
[Figure: DAQ architecture. Readout units (RU) of the MAPS, STS, RICH, TRD and ECAL detectors deliver 50 kB/event at 10^7 events/s. An N x M event-builder network, steered by a scheduler and the farm control system, groups 100 events into 5 MB time slices (10^5 slices/s). These slices are distributed to the sub-farms of a PC farm with 10? PCs.]
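The diagram's rates are 50 kB per event, 10^7 events/s and 100 events per slice. From these, 5 MB/slice and 10^5 slices/s follow directly, and the event-builder network must carry about 500 GB/s. A compile-time check of that arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Figures read off the DAQ diagram.
constexpr std::uint64_t kEventSizeBytes = 50'000;      // 50 kB per event
constexpr std::uint64_t kEventRate      = 10'000'000;  // 10^7 events/s
constexpr std::uint64_t kEventsPerSlice = 100;         // events per time slice

// One time slice handed to a sub-farm: 100 x 50 kB = 5 MB.
constexpr std::uint64_t sliceSizeBytes() { return kEventSizeBytes * kEventsPerSlice; }

// Slices the scheduler must dispatch: 10^7 / 100 = 10^5 per second.
constexpr std::uint64_t sliceRate() { return kEventRate / kEventsPerSlice; }

// Aggregate throughput of the event-builder network in bytes/s.
constexpr std::uint64_t throughputBytesPerSecond() { return sliceSizeBytes() * sliceRate(); }

static_assert(sliceSizeBytes() == 5'000'000, "5 MB per slice");
static_assert(sliceRate() == 100'000, "10^5 slices per second");
static_assert(throughputBytesPerSecond() == 500'000'000'000ULL, "500 GB/s");
```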
Cell Blade – a Sub-Farm with (2+16) Cores
[Figure: a sub-farm built from PCs and FPGAs, organized into Tracking and Vertexing Units, a Sub-Farm Management Unit and a Sub-Farm Decision/Selection Unit.]
Welcome to the Era of Multicore
[Figure: multicore families. Gaming: STI Cell; GP GPU: Nvidia Tesla; GP CPU: Intel Larrabee; CPU/GPU: AMD Fusion.]
• High performance computing (HPC)
• Clock rates have reached their ceiling
• Performance/power optimization
• Heterogeneous systems of many (>8) cores
• Similar programming languages (Ct and CUDA), but common standards are unlikely
• We need a uniform approach to all CPU/GPU families
• How can we take advantage of the additional cores?
NVIDIA GeForce 9600 GT GPU: 64 Cores
• 64 processors
• 1.625 GHz frequency
• Double precision (?)
• 170 EUR price
Intel Polaris: 80 Cores
3.16 GHz, 0.95 V, 62 W -> 1.01 Teraflops
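The 1.01 Teraflops figure is consistent with 80 cores at 3.16 GHz if each core retires 4 single-precision floating-point operations per cycle. That would correspond to two multiply-accumulate units at 2 flops each. This per-core figure is an assumption here and is not stated on the slide. At 62 W, the same numbers give roughly 16 Gflops per watt.

```cpp
#include <cassert>

// Peak throughput = cores x clock (Hz) x floating-point ops per cycle.
// ASSUMPTION: 4 flops/cycle per core (two multiply-accumulate units x 2 flops).
constexpr double peakFlops(int cores, double clockHz, int flopsPerCycle) {
    return cores * clockHz * flopsPerCycle;
}

constexpr double kPolarisPeak = peakFlops(80, 3.16e9, 4);  // about 1.01e12 flops
constexpr double kPolarisWatts = 62.0;

// Energy efficiency in Gflops per watt.
constexpr double gflopsPerWatt(double flops, double watts) {
    return flops / watts / 1e9;
}
```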
Cell Processor: 1+8 Cores (1 PPE + 8 SPEs)
Computer Physics Communications 178 (2008) 374-383
Speed-up of the Kalman Filter Track Fit
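One way to obtain such a speed-up is to fit many tracks with identical arithmetic, laid out as a structure of arrays. This follows the SIMDization item of the talk: the compiler can then process several tracks per vector instruction. The sketch below is simplified in several ways:
• Tracks are straight lines measured in x only.
• There is no magnetic field, no material and no process noise.
• The initial slope variance of 1 is an assumption.
• The names `TracksSoA` and `fitTracks` are hypothetical.
The real CBM fit handles the non-homogeneous field and is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structure-of-arrays track parameters: element i belongs to track i.
// Each quantity is contiguous, so loops over tracks can be vectorized.
struct TracksSoA {
    std::vector<float> x, tx;          // position and slope at current z
    std::vector<float> c00, c10, c11;  // covariance of (x, tx)
    explicit TracksSoA(std::size_t n) : x(n), tx(n), c00(n), c10(n), c11(n) {}
    std::size_t size() const { return x.size(); }
};

// Simplified Kalman filter fit of straight-line tracks. meas[s][i] is
// the x measurement of track i at station s (located at z[s]); sigma is
// the measurement resolution.
void fitTracks(TracksSoA& t, const std::vector<float>& z,
               const std::vector<std::vector<float>>& meas, float sigma) {
    assert(!z.empty() && z.size() == meas.size());
    const std::size_t n = t.size();
    const float v = sigma * sigma;

    // Initialize from the first station: x from the hit, slope unknown.
    for (std::size_t i = 0; i < n; ++i) {
        t.x[i] = meas[0][i];
        t.tx[i] = 0.f;
        t.c00[i] = v;
        t.c10[i] = 0.f;
        t.c11[i] = 1.f;  // assumed initial slope variance
    }

    for (std::size_t s = 1; s < z.size(); ++s) {
        const float dz = z[s] - z[s - 1];
        // Inner loop over tracks: the same operations for every track.
        for (std::size_t i = 0; i < n; ++i) {
            // Propagate state and covariance with F = [[1, dz], [0, 1]].
            t.x[i] += t.tx[i] * dz;
            t.c00[i] += dz * (2.f * t.c10[i] + dz * t.c11[i]);
            t.c10[i] += dz * t.c11[i];

            // Update with the x measurement (H = [1, 0]).
            const float r = meas[s][i] - t.x[i];
            const float sInv = 1.f / (t.c00[i] + v);
            const float k0 = t.c00[i] * sInv;
            const float k1 = t.c10[i] * sInv;
            t.x[i] += k0 * r;
            t.tx[i] += k1 * r;
            t.c11[i] -= k1 * t.c10[i];  // uses the old c10
            t.c10[i] -= k1 * t.c00[i];  // uses the old c00
            t.c00[i] -= k0 * t.c00[i];
        }
    }
}
```

Every track goes through the same sequence of operations with no data-dependent branches, so the inner loop maps cleanly onto SIMD lanes. An array-of-structures layout would instead interleave the quantities of different tracks in memory.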
Structure and Data: a Bottleneck
• A standalone L1Algo module
• About 300 kB per central event
Module layout: cbmroot/L1 with L1Tracks, L1Event (L1Strips, L1Hits), L1Geometry, L1Algo

```cpp
// Input: strips
float         vStripValues[NStrips];  // strip coordinates (32b)
unsigned char vStripFlags [NStrips];  // strip iStation (6b) + used (1b) + used_by_dublets (1b)

// Input: hits
struct L1StsHit {
  unsigned short int f, b;  // front (16b) and back (16b) strip indices
};
L1StsHit vHits[NHits];

// Output
unsigned short int vRecoHits  [NRecoHits];    // hit index (16b)
unsigned char      vRecoTracks[NRecoTracks];  // N hits on track (8b)

// Internal
class L1Triplet {
  unsigned short int w0;  // left hit (16b)
  unsigned short int w1;  // first neighbour (16b) or middle hit (16b)
  unsigned short int w2;  // N neighbours (16b) or right hit (16b)
  unsigned char b0;       // chi2 (5b) + level (3b)
  unsigned char b1;       // qp (8b)
  unsigned char b2;       // qp error (8b)
};
```
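Several of these fields are bit-packed into a single byte. `vStripFlags` holds a station index plus two flags, and `L1Triplet::b0` holds a chi2 code plus a level. The sketch below shows one way to do such packing. The bit order (low bits first) and the helper names are hypothetical and are not taken from L1Algo.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vStripFlags layout: bits 0-5 station index (0..63),
// bit 6 "used", bit 7 "used_by_dublets". The real bit order may differ.
constexpr std::uint8_t kStationMask = 0x3F;
constexpr std::uint8_t kUsedBit = 0x40;
constexpr std::uint8_t kUsedByDubletsBit = 0x80;

inline std::uint8_t packStripFlags(unsigned iStation, bool used, bool usedByDublets) {
    assert(iStation <= kStationMask);  // must fit in 6 bits
    return static_cast<std::uint8_t>(iStation | (used ? kUsedBit : 0u) |
                                     (usedByDublets ? kUsedByDubletsBit : 0u));
}

inline unsigned stationOf(std::uint8_t flags) { return flags & kStationMask; }
inline bool isUsed(std::uint8_t flags) { return (flags & kUsedBit) != 0; }
inline bool isUsedByDublets(std::uint8_t flags) { return (flags & kUsedByDubletsBit) != 0; }

// Hypothetical L1Triplet::b0 layout: bits 0-4 quantized chi2 (0..31),
// bits 5-7 CA level (0..7).
inline std::uint8_t packChi2Level(unsigned chi2Code, unsigned level) {
    assert(chi2Code < 32 && level < 8);
    return static_cast<std::uint8_t>(chi2Code | (level << 5));
}

inline unsigned chi2CodeOf(std::uint8_t b0) { return b0 & 0x1Fu; }
inline unsigned levelOf(std::uint8_t b0) { return b0 >> 5; }
```

Packing keeps the per-strip and per-triplet arrays small, which matters when the whole event (about 300 kB) should stay in cache.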
Parallelization of the CA Track Finder
1. Create tracklets
2. Collect tracks
GSI, KIP, CERN
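The two stages can be illustrated with a toy cellular automaton. This sketch assumes equally spaced stations, 1D hits and straight tracks, and the names `createTriplets` and `findTracks` are hypothetical.

Stage 1 builds triplets of collinear hits on three consecutive stations and links triplets that share two hits. It then assigns each triplet a level, counted here as the length of the longest chain it starts.

Stage 2 collects tracks greedily, longest chains first, and rejects candidates that reuse hits.

The triplet search for each station only reads the hit arrays, so stage 1 could be split across cores station by station.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy geometry (assumption): station s sits at z = s, hits are 1D x
// positions, tracks are straight. Three hits on consecutive, equally
// spaced stations are collinear when x0 - 2*x1 + x2 == 0.
using Hits = std::vector<std::vector<float>>;  // hits[station][i] = x

struct Triplet {
    std::size_t station;            // station of the left hit
    std::size_t h0, h1, h2;         // hit indices on stations s, s+1, s+2
    int level = 1;                  // longest chain starting here
    std::vector<std::size_t> next;  // right neighbours sharing two hits
};

struct Track {
    std::size_t firstStation;
    std::vector<std::size_t> hits;  // one hit index per consecutive station
};

// Stage 1: create tracklets, link neighbours, evolve levels.
std::vector<Triplet> createTriplets(const Hits& hits, float tol) {
    std::vector<Triplet> trip;
    for (std::size_t s = 0; s + 2 < hits.size(); ++s)
        for (std::size_t a = 0; a < hits[s].size(); ++a)
            for (std::size_t b = 0; b < hits[s + 1].size(); ++b)
                for (std::size_t c = 0; c < hits[s + 2].size(); ++c)
                    if (std::fabs(hits[s][a] - 2.f * hits[s + 1][b] + hits[s + 2][c]) < tol)
                        trip.push_back({s, a, b, c});
    for (std::size_t i = 0; i < trip.size(); ++i)
        for (std::size_t j = 0; j < trip.size(); ++j)
            if (trip[j].station == trip[i].station + 1 &&
                trip[j].h0 == trip[i].h1 && trip[j].h1 == trip[i].h2)
                trip[i].next.push_back(j);
    // Triplets were created station by station, so walking backwards
    // visits every right neighbour before the triplet pointing to it.
    for (std::size_t i = trip.size(); i-- > 0;)
        for (std::size_t j : trip[i].next)
            trip[i].level = std::max(trip[i].level, trip[j].level + 1);
    return trip;
}

// Stage 2: collect tracks, longest chains first, without sharing hits.
std::vector<Track> findTracks(const Hits& hits, float tol) {
    std::vector<Triplet> trip = createTriplets(hits, tol);
    std::vector<std::size_t> order(trip.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return trip[a].level > trip[b].level;
    });

    std::vector<std::vector<bool>> used(hits.size());
    for (std::size_t s = 0; s < hits.size(); ++s) used[s].assign(hits[s].size(), false);

    std::vector<Track> tracks;
    for (std::size_t start : order) {
        Track t{trip[start].station, {trip[start].h0, trip[start].h1, trip[start].h2}};
        std::size_t cur = start;
        while (trip[cur].level > 1) {  // step to a neighbour one level lower
            for (std::size_t j : trip[cur].next)
                if (trip[j].level == trip[cur].level - 1) { cur = j; break; }
            t.hits.push_back(trip[cur].h2);
        }
        bool allFree = true;
        for (std::size_t k = 0; k < t.hits.size(); ++k)
            allFree = allFree && !used[t.firstStation + k][t.hits[k]];
        if (!allFree) continue;
        for (std::size_t k = 0; k < t.hits.size(); ++k) used[t.firstStation + k][t.hits[k]] = true;
        tracks.push_back(t);
    }
    return tracks;
}
```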
Kalman Filter Track Fit on Multicore Systems: Multithreading
[Figure: real fit time per track (µs, logarithmic scale) versus number of tasks. Håvard Bjerke]
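The multithreaded fit can be sketched by splitting the tracks into contiguous chunks and running one `std::thread` per task. The per-track `fitTrack` below is a hypothetical stand-in for the real fit: a straight-line Kalman filter with no field, no material and an assumed initial slope variance of 1. Only the task split is the point here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct TrackData {
    std::vector<float> z, m;  // station positions and x measurements
    float x = 0.f, tx = 0.f;  // fitted x at the last station, fitted slope
};

// Hypothetical per-track fit: straight-line Kalman filter.
void fitTrack(TrackData& t, float sigma) {
    assert(!t.z.empty() && t.z.size() == t.m.size());
    const float v = sigma * sigma;
    float c00 = v, c10 = 0.f, c11 = 1.f;  // covariance of (x, tx)
    t.x = t.m[0];
    t.tx = 0.f;
    for (std::size_t s = 1; s < t.z.size(); ++s) {
        const float dz = t.z[s] - t.z[s - 1];
        t.x += t.tx * dz;                    // propagate state
        c00 += dz * (2.f * c10 + dz * c11);  // propagate covariance
        c10 += dz * c11;
        const float r = t.m[s] - t.x;        // residual
        const float k0 = c00 / (c00 + v), k1 = c10 / (c00 + v);
        t.x += k0 * r;
        t.tx += k1 * r;
        c11 -= k1 * c10;
        c10 -= k1 * c00;
        c00 -= k0 * c00;
    }
}

// MIMD: split the tracks into nTasks contiguous chunks, one thread each.
// Each thread writes only its own tracks, so no locking is needed.
void fitAll(std::vector<TrackData>& tracks, float sigma, unsigned nTasks) {
    assert(nTasks > 0);
    std::vector<std::thread> workers;
    const std::size_t n = tracks.size();
    const std::size_t chunk = (n + nTasks - 1) / nTasks;
    for (unsigned k = 0; k < nTasks; ++k) {
        const std::size_t begin = std::min(n, k * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&tracks, sigma, begin, end] {
            for (std::size_t i = begin; i < end; ++i) fitTrack(tracks[i], sigma);
        });
    }
    for (std::thread& w : workers) w.join();
}
```

Contiguous chunks keep each thread's data together in memory. The measured time per track then depends on how the tasks map onto the available cores.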
Summary and Plans
• SIMDized CA track finder works well
• Work on single-sided strip detectors started
• Multithreaded Kalman filter track fit
• Learn Ct (Intel) and CUDA (Nvidia) programming languages
• Investigate large multi-core systems (CPU and GPU)
• Parallelize the CA track finder
• Parallel hardware -> parallel languages -> parallel algorithms