Corrupted MC data chunks
Offline weekly, July 7, 2012
The issue
• As reported by PWG-LF, numerous sub-jobs of the LHC11b10a MC production have no global tracks (back-propagated ITS tracks)
• Consequence: a drop in matching efficiency and incorrect normalization factors
• In the above production, the effect is 3.5% (+1.2%)
• Full report in Savannah
• The effect is present only in MC
Forensics
• If a file (Trigger.root) is not created during the simulation phase, the string of detectors in the trigger cluster is left empty and all ITS layers are skipped (no ITS tracks)
• The error generates only a warning in the reconstruction:
  • W-AliReconstruction::GetEventInfo: No trigger can be loaded! The trigger information will not be used!
• The conditions for this always arise in the late part of the simulation, usually, but not always, during digitisation
Forensics (2)
• Two ‘events’ have been discovered so far
  • AliRoot aborts during a failed access to OCDB (biggest contributor)
  • Silent crash, no specific error
• The AliRoot abort generates an ‘Abort’ signal, which should have been printed in sim.log (redirect from the standard error stream)
  • However, in some of the cases it does not appear…
  • … and subsequently it is not caught by the job validation script
• The silent crash is not caught by any of the ‘per job’ validations
Forensics (3)
• The defective jobs are not caught by:
  • The validation script: it parses only *.log, not stderr/stdout
  • The per-job CheckESD macro: it succeeds also in the ‘corrupted’ case
  • The per-run QA: there is a ‘hint’, but it is diluted, as the error is at the ~4% level
• In addition, the mean vertex cut eliminates the events
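The first gap above (only *.log is parsed) can be illustrated with a minimal sketch of a validation pass that also reads stdout and stderr. This is not the actual job validation script: the marker strings, the file names `stdout`/`stderr`, and the function names are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical markers; the patterns used by the real validation script
# are not given in the slides.
ERROR_MARKERS = ("Abort", "Segmentation violation", "Bus error")
# Assumption: names of the extra output files to scan besides *.log.
EXTRA_FILES = ("stdout", "stderr")


def find_error_markers(job_dir, markers=ERROR_MARKERS, extra_files=EXTRA_FILES):
    """Return (file name, line number, marker) for every marker hit in a job directory."""
    job_dir = Path(job_dir)
    # Scan every *.log file plus the extra output files that exist.
    candidates = sorted(job_dir.glob("*.log"))
    candidates += [job_dir / name for name in extra_files if (job_dir / name).is_file()]
    hits = []
    for path in candidates:
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                for marker in markers:
                    if marker in line:
                        hits.append((path.name, number, marker))
    return hits


def job_is_valid(job_dir):
    """A job passes only if no marker is found in any scanned file."""
    return not find_error_markers(job_dir)
```

Note that a scan of this kind would still miss the silent crash, which leaves no specific error text in any file.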
Re-validation of the productions
• Fast and indirect method: size of the sim.log
[Figure: sim.log size; labels: "LHC11b10a", "Good production", "Bad chunks, 4.9%"]
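A minimal sketch of the size-based check, assuming that a defective chunk shows up as a sim.log whose size deviates strongly from the typical (median) size in the production. The 20% tolerance, the one-sub-directory-per-chunk layout, and the function names are hypothetical and not taken from the slides.

```python
from pathlib import Path
from statistics import median


def collect_log_sizes(production_dir, log_name="sim.log"):
    """Map each sub-job directory name to the size in bytes of its log file.

    Assumption: one sub-directory per chunk, each holding its own sim.log.
    """
    root = Path(production_dir)
    return {p.parent.name: p.stat().st_size for p in root.glob(f"*/{log_name}")}


def flag_by_log_size(sizes, rel_tolerance=0.2):
    """Return the chunk ids whose log size deviates from the median size
    by more than rel_tolerance (a hypothetical threshold)."""
    if not sizes:
        return []
    typical = median(sizes.values())
    return sorted(
        chunk for chunk, size in sizes.items()
        if abs(size - typical) > rel_tolerance * typical
    )
```

The method is indirect: a flagged chunk is only a candidate, and the tolerance would have to be tuned per production.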
Re-validation of the productions (2)
• Other cases and Pb+Pb
[Figure; labels: "LHC11b10c – not straightforward", "PbPb, OK period"]
‘Suspicious’ cycles
• Tested all cycles from 2010 (149 cycles), 2011 (104 cycles) and 2012 (62 cycles)
Past productions remedy
• From the above table, scan rec.log for ‘W-AliReconstruction::GetEventInfo: No trigger…’ to positively identify the affected chunks
  • Ongoing…
• Rename the ESDs and AODs in the catalogue to ‘something else’, which will not show up in the standard analysis searches
  • Mild danger for analyses that use ‘prepared’ collections: jobs will fail…
  • Merged AODs (deltas) will have to be re-merged
• For Pb+Pb, a cut on ‘zero ITS tracks’ will eliminate the bad chunks
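The rec.log scan can be sketched as follows. The search string is the beginning of the warning quoted earlier in these slides. The directory layout (one sub-directory per chunk, each with its own rec.log) and the function names are assumptions for illustration.

```python
from pathlib import Path

# Beginning of the reconstruction warning quoted in the slides.
WARNING = "W-AliReconstruction::GetEventInfo: No trigger"


def log_has_trigger_warning(log_path, warning=WARNING):
    """Return True if any line of the log contains the warning text."""
    with open(log_path, errors="replace") as handle:
        return any(warning in line for line in handle)


def affected_chunks(production_dir, log_name="rec.log"):
    """Return the sorted names of the sub-job directories whose rec.log
    contains the warning.

    Assumption: one sub-directory per chunk under production_dir.
    """
    root = Path(production_dir)
    return sorted(
        p.parent.name
        for p in root.glob(f"*/{log_name}")
        if log_has_trigger_warning(p)
    )
```

Unlike the sim.log size check, this identifies an affected chunk positively, at the cost of reading every rec.log.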
Code fixes
• Job validation: scan all files (implemented)
• Per-job ‘CheckESD’ macro: strengthen the script, require positive feedback to validate the job
• QA: to be discussed
• Reconstruction logic: abort if the Trigger.root file is not found
• Follow-up by Offline, discussion in the weekly meetings
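The proposed change to the reconstruction logic amounts to turning a warning into a hard failure. The sketch below illustrates that idea in Python as a pre-reconstruction guard. It is not AliRoot code, and the function name and the working-directory argument are hypothetical.

```python
import os


def require_trigger_file(work_dir, file_name="Trigger.root"):
    """Return the path to Trigger.root, or raise if the file is not found.

    Raising here stands in for the 'abort' proposed in the slides: the job
    fails loudly instead of continuing with an empty trigger cluster.
    """
    path = os.path.join(work_dir, file_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} not found; refusing to run reconstruction")
    return path
```

A job wrapper would call `require_trigger_file` before starting the reconstruction and let the exception end the job with a non-zero exit status, so the failure no longer depends on a message appearing in a log.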