v5-01-Release & v5-02-Release

Peter Hristov, 23/01/2012

Changes: v5-01-Rev-21
• #90324: Exception in AliITStrackerMI::FollowProlongationTree. From rev. 53978
• #90549: Request to port r53948 to the release (MUON small leak fix)
• #90658: For v5-01: Option to isolate heavy flavor part of a Pythia event. From rev. 53959
• #84578: Request to extend AliGenBox for using Yrange. From rev. 53996
• Optional RB/PX 24 shielding and scoring. From rev. 53955, 53956

Changes: v5-01-Rev-21
• #90461: Request to port a new feature for ZDC to the release. From rev. 53705
• #90504: EVE muon_init.C update r53875
• #25142: Commit and porting to Release of the new ESD->AOD filter. From rev. 54021
• #90540: Port 53910, 53911 and 53912 to the Release (Full MC Header in the AOD)

GDB on Grid
• Some potential problems detected and fixed (ITS, TPC, HLT)
• Some jobs fail at the beginning (events 0-10), ~4%
• Not reproducible locally, even if we run many reconstruction jobs in parallel
• Always caused by std::bad_alloc in different places
• Other jobs are killed by the system (memory), ~20%
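A minimal sketch of how a std::bad_alloc thrown "in different places" could at least be tied to an event number in the job log. This is an illustration under assumptions, not the AliRoot reconstruction code: `runEventGuarded`, the `step` callable and the log format are hypothetical names introduced here.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical wrapper around one per-event reconstruction step.
// Returns true if the step completed, false if it ran out of memory.
// On failure, one line naming the event is appended to `log`.
template <typename Step>
bool runEventGuarded(int event, Step step, std::string& log) {
    assert(event >= 0);  // event numbers are assumed to start at 0
    try {
        step();          // may allocate, and therefore may throw
        return true;
    } catch (const std::bad_alloc& e) {
        // e.what() is implementation-defined; the event number is the
        // useful part when the failure is not reproducible locally.
        log += "event " + std::to_string(event) +
               ": std::bad_alloc (" + e.what() + ")\n";
        return false;
    }
}
```

Catching the exception this way only records where it surfaced; it does not replace the debugger session, which is needed to see the throwing frame.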

Requests/Additional fixes
• #90749 ESD Porting Request: GetTPCClusterInfo with additional switch
• #90743 Coverity fix in AliVCaloCells: missing assignment operator
• #90738 Request to port a fix to the release in AliZDCDigitizer
• #90625 Memory problem in AliTPCtrackerMI
• #90622 Logic flaw in AliTPCseed
• #90616 Worrying message from TPC reconstruction
• Changes in RAW (TClonesArray usage)

Requests: OCDB
• #90756 Request to port object in RAW OCDB (for realistic MUON simulations)
• #90736 Calibration of the TRD cosmics of May, June and August

Other reports
• #90615 Problems in the material budget, eta < 0.9 and 0.9 < eta < 1.4

v5-02-Release
• Coverity: 158 defects to be fixed
• AliRoot tests: mostly OK
• Root v5-32-00-patches: needs tests
• PWGs transition: to be completed this week
• One library per subdirectory: next week
• Savannah bug reports: ongoing cleanup
• Do we have any significant set of changes still missing in the trunk?

Old slides

Reconstruction of RAW (LHC11h)
• Back trace problem solved
• Clean-up of the PATH and LD_LIBRARY_PATH on the GRID
• Clean-up of the AliEn libraries
• Deterministic splitting of the failed jobs (in preparation)
• New tests in parallel with the Grid production

Changes: v5-01-Rev-20
• #90319: Segmentation violation in AliPHOSRawFitterv1::~AliPHOSRawFitterv1. From rev. 53869
• #90053: Request: Port bug fix TRD calibration code to release. From rev. 53734
• #90292: Add line ConvertZDC() in AliAnalysisTaskESDfilter::ConvertESDtoAOD(). From rev. 53895
• #90307: ZDC QA update. From rev. 52738, 53081, 53271
• #90309: ZDC request to port code to the release. From rev. 52616
• #90024: Port changes in PYTHIA6 for pyquen production (pyquen-1.5.F, CMakelib6.4.21.pkg updated). From rev. 53645
• #90359: Request: fix cached values in ESD. From rev. 53900
• #90013: Vertexing task crashing in trunk. From rev. 53793
• Additional protection. From rev. 53904

LHC11h Pass2 – reconstruction details
• Use v5-01-Rev-19 in the production
• Start in inverse time order (last runs first, “LIFO”): OK
• Use MB trigger for CPass0: OK
• Exercise the full production setup on runs from “grey area”: special “gdb” production, run 170593: OK
• Run with TPC pools: OK
• Work on a local raw file: OK
• Use OCDB snapshot: OK
• Keep only the rec. points for the current event: OK
• Switch off QA: OK
• Switch off MUON, if the memory consumption is still too high

Results
• CPass0: 185 jobs, 523,509 out of 539,890 raw files successfully reconstructed => 97% efficiency
• All runs with magnetic field configuration (+ +) ready (170593-169628)
• Details on losses follow
• Pass2 current status: 131 jobs, 225,568 out of 362,790 files successfully reconstructed => 62.2% efficiency
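The quoted efficiencies are simply the fraction of raw files reconstructed successfully. A short sketch of that arithmetic, using the file counts above; the function names `efficiencyPercent` and `roundToTenth` are illustrative and not part of any production tool.

```cpp
#include <cassert>
#include <cmath>

// Percentage of raw files that were reconstructed successfully.
double efficiencyPercent(long reconstructed, long total) {
    assert(total > 0 && reconstructed >= 0 && reconstructed <= total);
    return 100.0 * static_cast<double>(reconstructed) /
           static_cast<double>(total);
}

// Round to one decimal place, as used when quoting the numbers.
double roundToTenth(double x) {
    return std::round(x * 10.0) / 10.0;
}
```

With the counts from the slide, `efficiencyPercent(523509, 539890)` rounds to 97.0 for CPass0 and `efficiencyPercent(225568, 362790)` rounds to 62.2 for Pass2.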

Losses – Pass2
• G__exception – average 6.5%
  • Strong run dependency

Losses – Pass2 (2)
• Memory overrun – average 16.8%
  • Strong run dependency
  • Function of number of events/chunk and data taking configuration

Losses
• G__exception
  • Debugging is hard as there is no traceback
  • Seems to be random (from syswatch.log)
  • Irreproducible in local tests
  • No related issues shown by Valgrind
  • Appears in the first events of the chunks
  • Working with ROOT experts, at least to get the exception in the logs => special “gdb” run
• Memory overrun
  • Additional profiling ongoing
  • All external sources are ruled out – gain only possible through changes in reconstruction

Special “gdb” run
• “catch throw” mode
• Several problems discovered, to be submitted to Savannah. Most probably uninitialized memory is used as an index into an array
• TClonesArray new with placement, where the index comes from GetEntriesFast
• Corrupted (?) raw data
• Deletion of arrays
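The "new with placement" item refers to the usual ROOT idiom of constructing an object in the next free slot of a TClonesArray, with the slot index taken from GetEntriesFast. The sketch below shows the same pattern with a hypothetical stand-in (`HitPool`, `Hit`) so that it compiles without ROOT; it is not the TClonesArray implementation. The point it illustrates is that the index must be validated before the placement new, because a corrupted or uninitialized count would otherwise write outside the pre-allocated storage.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

struct Hit {
    int id;
    double charge;
};

// Hypothetical fixed-capacity pool: objects are constructed in place,
// in the slot whose index equals the current number of entries.
class HitPool {
public:
    static constexpr std::size_t kCapacity = 4;

    HitPool() : fCount(0) {}
    ~HitPool() { clear(); }
    HitPool(const HitPool&) = delete;
    HitPool& operator=(const HitPool&) = delete;

    std::size_t entries() const { return fCount; }

    // Construct a Hit in the next free slot (placement new).
    Hit* add(int id, double charge) {
        const std::size_t idx = fCount;  // index comes from the entry count
        if (idx >= kCapacity) {
            // Guard: never placement-new outside the reserved storage.
            throw std::out_of_range("HitPool: index outside storage");
        }
        void* slot = fStorage + idx * sizeof(Hit);
        fPtrs[idx] = new (slot) Hit{id, charge};
        ++fCount;
        assert(fCount <= kCapacity);
        return fPtrs[idx];
    }

    const Hit& at(std::size_t i) const {
        if (i >= fCount) throw std::out_of_range("HitPool: no such entry");
        return *fPtrs[i];
    }

    // Destroy the constructed objects; the storage itself is reused.
    void clear() {
        for (std::size_t i = 0; i < fCount; ++i) fPtrs[i]->~Hit();
        fCount = 0;
    }

private:
    alignas(Hit) unsigned char fStorage[kCapacity * sizeof(Hit)];
    Hit* fPtrs[kCapacity];
    std::size_t fCount;
};
```

Here the count is always initialized in the constructor and checked in `add`; the failure mode suspected on the slide corresponds to skipping one of those two steps.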

Plans
• Continue the investigation of G__exception on the GRID
• Understand the difference between CPass0 and Pass2 (MB trigger, V0s, cascades?)
• Try to reproduce completely the GRID execution flow on a local machine
• Resubmit the failed jobs in “split” mode

v5-02-Release
• Complete the transition of the analysis code to the new modules
• Move every library to a sub-directory and get rid of *.pkg (native CMake)
• Fix the Coverity defects and compilation warnings
• Solve as many Savannah issues as possible
• Create the branch at the end of January
• First stable tag in February
