Corrupted RAW data chunks
Preamble
• Reconstruction of run 117112 (LHC10b)
• Out of 47 chunks, 29 failed
• Of those 29, 4 failed with std::bad_alloc
• All 4 were traced to RAW data chunks located at CNAF
• The others failed for other reasons, which are also understood
Corrupted chunks
• All show the same pattern
  AliEn  3b9f00b6bdb7112a884bc4ba85c54481     2938324976
  CERN   3b9f00b6bdb7112a884bc4ba85c54481
  CNAF   e506af874dd2cd5f2625a1eed3b1b882 **  2938324976  (same size, different MD5)
  /alice/data/2010/LHC10b/000117112/raw/10000117112021.340.root
  /04/22830/F7E39A52-500B-11DF-8001-3923A10ABEEF
• Frequency
  • Most likely a single event (of 5500 in a chunk) is affected, in total 0.002% (2×10^-5) of the events
  • This estimate assumes that the same corruption frequency applies to all replicated data (rather improbable)
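One way to arrive at the quoted frequency is the back-of-the-envelope calculation below. It is a sketch under stated assumptions, not a measurement: exactly one bad event in each of the 4 corrupted chunks, and roughly 5500 events in every one of the 47 chunks of the run. The function name is illustrative only.

```python
# Rough estimate of the fraction of corrupted events in run 117112.
# Assumptions (not measured): one bad event per corrupted chunk,
# and about 5500 events in every chunk of the run.
def corrupted_event_fraction(bad_chunks, total_chunks, events_per_chunk,
                             bad_events_per_chunk=1):
    bad_events = bad_chunks * bad_events_per_chunk
    total_events = total_chunks * events_per_chunk
    return bad_events / total_events


fraction = corrupted_event_fraction(bad_chunks=4, total_chunks=47,
                                    events_per_chunk=5500)
print(f"{fraction:.1e} ({fraction * 100:.4f}%)")  # 1.5e-05 (0.0015%)
```

The result, about 1.5×10^-5, rounds to the 2×10^-5 (0.002%) quoted on the slide.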
Corrupted chunks (2)
• Origin of corruption is unclear
  • Possible causes: bad tape or an error during staging
  • We believe the replica was OK in September 2011 (LHC10b, pass3)
• Action items
  • The file information was sent to CNAF to investigate possible causes/origin
  • Based on the outcome, we may want to
    • Re-replicate all identified corrupted files during the reprocessing, if the tape is damaged
    • Include a protection against corrupted events in reconstruction
    • Check the MD5 sum before processing
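The last action item, checking the MD5 sum before processing, could look roughly like the sketch below. It assumes the replica is already staged to a local path and that the expected MD5 and size have been looked up in the catalogue beforehand. The helper names `md5_of_file` and `replica_is_intact` are hypothetical and are not part of AliEn or the reconstruction software.

```python
import hashlib
import os


def md5_of_file(path, block_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in blocks
    so that a multi-GB RAW chunk is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def replica_is_intact(path, catalogue_md5, catalogue_size=None):
    """Return True if the local replica matches the catalogue entry.

    The size comparison is a cheap first check, but as the corrupted
    chunks show (same size, different MD5) it is not sufficient alone.
    """
    if catalogue_size is not None and os.path.getsize(path) != catalogue_size:
        return False
    return md5_of_file(path) == catalogue_md5.lower()
```

A job wrapper would call `replica_is_intact(local_path, md5_from_catalogue, size_from_catalogue)` and fall back to another replica when it returns `False`. The check reads the whole file once, so on chunks of about 2.9 GB it adds noticeable I/O time per job.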