Speed-up of the ring recognition algorithm
Semeon Lebedev, GSI, Darmstadt, Germany and LIT JINR, Dubna, Russia
Gennady Ososkov, LIT JINR, Dubna, Russia

Motivation
• Fast algorithm -> lower computing requirements
• Possibility to use it in on-line reconstruction
• Many-core CPUs -> algorithms can be parallelized
I. Kisel, March 2009, CBM Coll. Meeting

Ring recognition algorithm
Standalone ring finder. Two steps, with their share of the time consumption:
• Local search for ring candidates, based on local selection of hits and the Hough Transform: 99%
• Global search (filter): the algorithm compares all ring candidates and keeps only good rings, rejecting clone and fake rings: 1%

Ring recognition algorithm, local search
• Preliminary selection of hits
• Histogram of ring centers
• Hough Transform
• Remove hits of found ring (only best-matched hits)
• Ring quality calculation
• Ring array
• Ellipse fitter

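The Hough Transform and the histogram of ring centers listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the names Hit, CenterHistogram, CircleFromThreeHits, HoughTransform and FindPeak, the uniform 2D binning, and the radius cut are all assumptions made for the example. Every triplet of hits defines a circle, the circle's center votes into a 2D histogram, and the most populated bin is taken as a ring-center candidate.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Hit { float x, y; };

// Hypothetical 2D accumulator of ring-center votes (uniform square bins).
struct CenterHistogram {
    float xMin, yMin, binSize;
    int nx, ny;
    std::vector<unsigned> bins;  // row-major, nx * ny entries

    CenterHistogram(float xMin_, float yMin_, float binSize_, int nx_, int ny_)
        : xMin(xMin_), yMin(yMin_), binSize(binSize_), nx(nx_), ny(ny_),
          bins(static_cast<std::size_t>(nx_) * ny_, 0u) {}

    void Fill(float x, float y) {
        const float fx = (x - xMin) / binSize;
        const float fy = (y - yMin) / binSize;
        // Range check in float first, so out-of-range or NaN values are never cast.
        if (!(fx >= 0.0f && fx < nx && fy >= 0.0f && fy < ny)) return;
        ++bins[static_cast<std::size_t>(fy) * nx + static_cast<std::size_t>(fx)];
    }
};

// Circle through three hits; returns false for (nearly) collinear hits.
bool CircleFromThreeHits(const Hit& a, const Hit& b, const Hit& c,
                         float* xc, float* yc, float* r) {
    const float a1 = b.x - a.x, b1 = b.y - a.y;
    const float a2 = c.x - a.x, b2 = c.y - a.y;
    const float det = a1 * b2 - a2 * b1;
    if (std::fabs(det) < 1e-6f) return false;
    const float sa = a.x * a.x + a.y * a.y;
    const float c1 = 0.5f * (b.x * b.x + b.y * b.y - sa);
    const float c2 = 0.5f * (c.x * c.x + c.y * c.y - sa);
    *xc = (c1 * b2 - c2 * b1) / det;
    *yc = (a1 * c2 - a2 * c1) / det;
    *r = std::sqrt((a.x - *xc) * (a.x - *xc) + (a.y - *yc) * (a.y - *yc));
    return true;
}

// Triple loop over hit triplets; each accepted triplet adds one vote.
void HoughTransform(const std::vector<Hit>& hits, float rMin, float rMax,
                    CenterHistogram* hist) {
    const std::size_t n = hits.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            for (std::size_t k = j + 1; k < n; ++k) {
                float xc, yc, r;
                if (!CircleFromThreeHits(hits[i], hits[j], hits[k], &xc, &yc, &r)) continue;
                if (r < rMin || r > rMax) continue;  // assumed radius cut
                hist->Fill(xc, yc);
            }
}

// Returns the center of the most populated bin if it has at least minVotes.
bool FindPeak(const CenterHistogram& hist, unsigned minVotes, float* xc, float* yc) {
    std::size_t best = 0;
    for (std::size_t b = 1; b < hist.bins.size(); ++b)
        if (hist.bins[b] > hist.bins[best]) best = b;
    if (hist.bins.empty() || hist.bins[best] < minVotes) return false;
    *xc = hist.xMin + (static_cast<float>(best % hist.nx) + 0.5f) * hist.binSize;
    *yc = hist.yMin + (static_cast<float>(best / hist.nx) + 0.5f) * hist.binSize;
    return true;
}
```

A caller would fill a std::vector<Hit> with the locally selected hits, call HoughTransform, and pass the peak found by FindPeak on to the later steps (fitting, quality calculation). The triple loop is the part whose cost grows with the cube of the number of hits.
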
Time consumption
Stages: Define local area and hits, Hough Transform, Peak finder
Time shares shown on the slide: 1%, 30%, 69%

Time consumption: where?
• Peak finding in 2D and 1D arrays
• Hits search
• Array initialization
• Triple loop of ring parameters calculation

• Optimize the hits search and the array sizes and dimensions; remove dynamic memory allocation
• Optimize the calculations inside loops; decrease the combinatorics

Optimization of Hough Transform
• Divide the hits into several parts
• Make the Hough Transform of each part independently
• Sum up the histograms
Diagram: first part of hits -> Hough Transform, second part of hits -> Hough Transform, then sum up the histograms

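A rough sketch of why splitting helps, under stated assumptions: the slide does not say how the hits are divided, so the interleaved split below is a guess, and HoughInParts, TripletCount and the fill callback are hypothetical names. The callback stands for any per-part Hough Transform that adds votes to a histogram. Because the triple loop only combines hits within one part, 12 hits give 220 triplets when processed together but only 2 x 20 = 40 when processed as two parts of 6.

```cpp
#include <cstddef>
#include <vector>

struct Hit { float x, y; };
using Histogram = std::vector<unsigned>;

// Number of hit triplets a triple loop visits for n hits: n choose 3.
std::size_t TripletCount(std::size_t n) {
    return n < 3 ? 0 : n * (n - 1) * (n - 2) / 6;
}

// Splits the hits into nParts interleaved parts (an assumed splitting rule),
// runs the supplied transform on each part independently, and sums the
// per-part histograms bin by bin.
template <typename FillFn>
Histogram HoughInParts(const std::vector<Hit>& hits, std::size_t nParts,
                       std::size_t nBins, FillFn fill) {
    if (nParts == 0) nParts = 1;
    Histogram total(nBins, 0u);
    for (std::size_t p = 0; p < nParts; ++p) {
        std::vector<Hit> part;
        for (std::size_t i = p; i < hits.size(); i += nParts) part.push_back(hits[i]);

        Histogram partial(nBins, 0u);
        fill(part, partial);  // Hough Transform of this part only

        for (std::size_t b = 0; b < nBins; ++b) total[b] += partial[b];
    }
    return total;
}
```

The per-part histograms are independent of each other, so the same structure would also let each part be processed on a separate core before the final summation.
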
Optimization of Hough Transform: SIMD and SSE2
• The algorithm must work with the single-precision type (float)
• 128-bit register: four concurrent add operations

SSE 128-bit registers can represent:
• sixteen 8-bit signed or unsigned chars,
• eight 16-bit signed or unsigned shorts,
• four 32-bit integers, or
• four 32-bit floating point variables.

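As an illustration of "four concurrent add operations", the sketch below adds two float arrays four elements at a time with the SSE intrinsics _mm_loadu_ps, _mm_add_ps and _mm_storeu_ps. The function name AddArrays and the macro RINGFINDER_HAVE_SSE are made up for this example, and the scalar fallback is only there so that the code also builds on targets without SSE.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define RINGFINDER_HAVE_SSE 1  // hypothetical macro name
#else
#define RINGFINDER_HAVE_SSE 0
#endif

// out[i] = a[i] + b[i] for i in [0, n).
void AddArrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if RINGFINDER_HAVE_SSE
    // One 128-bit register holds four floats, so each _mm_add_ps
    // performs four additions at once.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Remaining elements (or all of them without SSE) are added one by one.
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

The unaligned load and store variants are used so that the arrays need no special alignment; aligned data would allow _mm_load_ps and _mm_store_ps instead.
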
Ring finder and SIMD
A SIMD version of CalculateRingParameters(x[3], y[3], &xc, &yc, &r) was implemented.
• Scalar: CalculateRingParameters(x[3], y[3], &xc, &yc, &r), where x, y, xc, yc, r are floats
• SIMD: CalculateRingParameters(xv[3], yv[3], &xcv, &ycv, &rv), where xv, yv, xcv, ycv, rv are F32vec4

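One way to get both call forms from a single source is a function template, sketched below. The body (a circle through three points) is an assumption about what CalculateRingParameters computes, since the slides give only its signature. Vec4f is a portable, hypothetical stand-in for F32vec4: it holds four floats and applies each operation to all four lanes, so one call processes four hit triplets. Sqrt is likewise a helper introduced only for this example.

```cpp
#include <cmath>

// Hypothetical stand-in for F32vec4: four floats processed lane by lane.
struct Vec4f {
    float v[4];
    Vec4f() : v{0.0f, 0.0f, 0.0f, 0.0f} {}
    explicit Vec4f(float s) : v{s, s, s, s} {}
    Vec4f(float a, float b, float c, float d) : v{a, b, c, d} {}
};

inline Vec4f operator+(const Vec4f& a, const Vec4f& b) {
    Vec4f r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i]; return r;
}
inline Vec4f operator-(const Vec4f& a, const Vec4f& b) {
    Vec4f r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] - b.v[i]; return r;
}
inline Vec4f operator*(const Vec4f& a, const Vec4f& b) {
    Vec4f r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i]; return r;
}
inline Vec4f operator/(const Vec4f& a, const Vec4f& b) {
    Vec4f r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] / b.v[i]; return r;
}

inline float Sqrt(float a) { return std::sqrt(a); }
inline Vec4f Sqrt(const Vec4f& a) {
    Vec4f r; for (int i = 0; i < 4; ++i) r.v[i] = std::sqrt(a.v[i]); return r;
}

// Center (xc, yc) and radius r of the circle through three points.
// T is float for the scalar version or a 4-wide vector type for the SIMD one.
// Collinear points give a zero determinant and hence inf/NaN results; the
// caller is assumed to reject such triplets.
template <typename T>
void CalculateRingParameters(const T x[3], const T y[3], T* xc, T* yc, T* r) {
    const T half(0.5f);
    const T a1 = x[1] - x[0], b1 = y[1] - y[0];
    const T a2 = x[2] - x[0], b2 = y[2] - y[0];
    const T s0 = x[0] * x[0] + y[0] * y[0];
    const T c1 = half * (x[1] * x[1] + y[1] * y[1] - s0);
    const T c2 = half * (x[2] * x[2] + y[2] * y[2] - s0);
    const T det = a1 * b2 - a2 * b1;
    *xc = (c1 * b2 - c2 * b1) / det;
    *yc = (a1 * c2 - a2 * c1) / det;
    const T dx = x[0] - *xc, dy = y[0] - *yc;
    *r = Sqrt(dx * dx + dy * dy);
}
```

Calling it with float arrays gives the scalar version; calling it with Vec4f arrays packs four triplets into one call. The function body has no branches, which is what lets the same code run on all four lanes at once.
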
Optimization and performance
Speed-up factor: 7.7
Processor: Intel Pentium Core2 6400, 2.13 GHz

Electron ring finding efficiency
Au-Au central collisions at 25 AGeV plus 5 e+ and 5 e-
Compact RICH geometry

Summary
The ring finder was significantly optimized in terms of calculation speed without losing efficiency.
Next steps:
• HT parameters optimization
• Parallelization on multi-core CPUs
• Continue investigation of the SIMD version