This version includes various porting modifications, bug fixes, and requests, aiming to reduce memory consumption, speed up event reconstruction, fix crashes, and address serious bugs affecting physics. The release also includes updates related to TPC components and gain maps. The focus is on performance and stability for physics data processing.
v5-01-Release P. Hristov 19/12/2011
Changes: v5-01-Rev-17 • #89676 Porting modifs in AliGenpythia for PYQUEN usage to release, 53388 • #89731 Port request: ZDC timing cut default values • #89772 Request to port PHOS trigger info reconstruction code to the Release. From rev. 52412,53508,53524,53553,53577 • #89357 commit in STEER and port to release T0 AOD. From rev. 53616 • #89816: Hardware cables swapping discovered for HMPID chamber 2. From rev. 53586 • #89819: Request to port bugfix in AliHLTTPCHWClusterMerger to v5-01-Release. From rev. 53494
Changes: v5-01-Rev-17 • #89840 Request to port to release TPC code - for pass2. From rev. 53471 • #89860: TRD request :: New addition to ESD track. From rev. 53584, 53643 • #89887: Detector status ULong_t in AliESDEvent. From rev. 53609,53610 • #89872: Request: Port bug fix TRD calibration code to release. From rev. 53593 • #89822: Provide default track cuts for AOD production (LHC11h). From rev. 53104,53576,53604,53608,53619 + 53654 • #89817: Commit calorimeter AOD. From rev. 53658
Changes in v5-01-Rev-17. #88914: Very high memory consumption in reco of 2011 Pb+Pb • Rev. 53510, 53511: bug fix in AliESDTagCreator + method to merge tags. • Rev. 53512 : possibility to delete the recpoints, digits after each event reco. • Rev. 53513 : possibility to safely stop reco if memory exceeds specified limits. • Rev. 53583 : Bug fix + keeping the clusters in the TClonesArrays
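The option to stop the reconstruction safely once memory exceeds specified limits can be pictured as a check made between events. The following is only a minimal sketch under stated assumptions: `MemLimits`, `MemUsage` and `ShouldStopReco` are hypothetical names introduced for illustration, not the AliRoot interface, and how the memory snapshot is obtained is left out.

```cpp
#include <cstddef>

// Hypothetical limits, in megabytes; a value of 0 means "no limit set".
struct MemLimits {
  std::size_t maxResidentMB;
  std::size_t maxVirtualMB;
};

// Hypothetical snapshot of the process memory, in megabytes.
struct MemUsage {
  std::size_t residentMB;
  std::size_t virtualMB;
};

// Returns true when either limit is set and exceeded, so that the event loop
// can stop after the current event instead of being killed by the batch system.
bool ShouldStopReco(const MemUsage& use, const MemLimits& lim) {
  const bool resExceeded =
      lim.maxResidentMB > 0 && use.residentMB > lim.maxResidentMB;
  const bool virtExceeded =
      lim.maxVirtualMB > 0 && use.virtualMB > lim.maxVirtualMB;
  return resExceeded || virtExceeded;
}
```

In such a scheme the event loop would call the check after each event and, when it returns true, close the output files normally before exiting.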
Changes: v5-01-Rev-18 • #89822: Provide default track cuts for AOD production (LHC11h). From rev. 52345,52488,52523,52536 • AliCentralitySelectionTask: Update for LHC11h pass2. From rev. 53675 • Fix for memory corruption in PWG3/vertexerHF
Changes: OCDB • #89724 upload in alien OCDB T0 time-amplitude calibration • #89777 Please put PHOS trigger object to OCDB
LHC11h Pass2 – processing strategy • CPass0 + calib train on all MB triggers (possibly chain of RAW) • > 90% job success • OCDB snapshot • Merging + OCDB update • If TPC OK • Pass 2 • OCDB snapshot #2 • QA merging • AOD merging
LHC11h Pass2 – reconstruction details • Start in inverse time order (last runs first, “LIFO”): OK • Check RAW data chain for CPass0 (with MB trigger): problem in the file merging, under investigation (high memory consumption: Rev-16 cannot merge output of Rev-18, exhausted disk space: big log files) • Exercise the full production setup on runs from ‘grey area’ • Need list of runs to process, QA filtered ‘bad runs’ • Run with TPC pools: OK • Work on a local raw file: OK • Use OCDB snapshot: OK • Keep only the rec. points for the current event: OK • Switch off QA: OK • Switch off MUON, if the memory consumption is still too high
Requested changes • Bug fixes • #87875: Big memory leak in AOD • #89822: Provide default track cuts for AOD production (LHC11h) • #90007: Request to commit/port fix in ANALYSIS/AliFileMerger.{h,cxx} • OCDB • #90005: Request to update TPC gain maps in Raw OCDB for pass 2 Pb-Pb production
Other reports (12/12/2011) • #89782 Set up test system for the event display • #89781 Per object quality instead of general quality flag • #89730 Flag "fIsMisaligned" in AliCluster set incorrectly during/after reconstruction for (at least) TRD
Changes: v5-01-Rev-16 • #89427 Porting modification for material budget issues to the Release. 53274,51506,53406,53427 • #89536 Request to port 53389: unnecessary cloning of AliESDtrack in AliCascadeVertexer • #89558 port to Release AliT0QADataMakerRec.cxx. From rev. 53403 • #89581 Request to port update of HLT TPC components to v5-01-Release. From rev. 52685,52776,52872,52874,52996,53052,53054 • #89582 Request to port merging of HLT TPC clusters at readout branch-borders to v5-01-Release. From rev. 53173,53179,53180,53226,53333,53342,53378,53380,53393,53394,53404 + 53181,53447,53448,53449,53450,53451 • #89586 Request to port new classes for automatic emulation of HLT TPC compression to v5-01-Release. From rev. 52995,53042,53110,53132
Changes: v5-01-Rev-16 • #89652: Request to update HLT/ChangeLog of v5-01-Release • #89601: Request to port AliPHOSRawDigiProducer to release. From rev. 53421 • Added histos for other triggers and flag for filling the histos. From rev. 53459 • Correct treatment of negative values. From rev. 53422 • #89681: Request to port commits 53249 and 53472 (ConfigVertexingHF_highmult.C) to the v5-01-Release
Message 05/12/2011 • Dear colleagues, • As discussed during the Physics board meeting last Thursday and today at the weekly offline meeting, we will freeze v5-01-Release. Today is the last set of "general purpose" changes that was discussed and accepted. The only changes we will consider in v5-01-Release from now on are related to: • reduced memory consumption; • improved time to reconstruct events; • fixes for crashes; • fixes for serious bugs in the reconstruction that affect physics (if any); • changes needed by the online processing; • updates of OCDB objects. • We will not accept changes in these particular parts of AliRoot that were heavily modified during the recent weeks: • QA; • calibration algorithms; • analysis. • The goal is to have a stable version of AliRoot for the Christmas production by 16/12/2011. • The new release v5-02-Release is expected at the end of January, so the new development will be taken from the SVN trunk.
Blockers for the Christmas production • Very high memory consumption: (3.5Gb RAM, 4.5Gb virtual): see next slide • Increased time to reconstruct one event (more central events this year) • solved by the changes in the cascade finder • Irreproducible crashes (G__exception): now reproduced, under investigation • “Cured” after resubmission => point to memory corruption problems; • significantly reduce the efficiency • seems to be related to the huge TPC.RecPoints.root file • #89651 Split HLT TPC clusters at readout branch border have impact to dca of high pt tracks: under investigation?
Very high memory consumption • Virtual memory improved by the “memory pools” of Ruben (TPC): in production • Still too high resident memory • Additional “pools”, to be committed to the trunk • tcmalloc doesn’t help on SLC5: LD_PRELOAD problem, to be repeated with linked libraries • xrootd studies: local files vs xrootd access
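The slides do not describe how the TPC "memory pools" are implemented. As a generic illustration of the idea only (one up-front allocation whose blocks are handed out and reused instead of being allocated and freed repeatedly), a fixed-size block pool might look like the sketch below. The class `BlockPool` and its methods are hypothetical and are not taken from AliRoot.

```cpp
#include <cstddef>
#include <vector>

// Generic fixed-size block pool (illustrative; not the TPC implementation).
class BlockPool {
 public:
  BlockPool(std::size_t blockSize, std::size_t nBlocks)
      : fStorage(blockSize * nBlocks) {
    fFree.reserve(nBlocks);
    for (std::size_t i = 0; i < nBlocks; ++i) {
      fFree.push_back(fStorage.data() + i * blockSize);
    }
  }

  // The free list points into fStorage, so copying is disabled.
  BlockPool(const BlockPool&) = delete;
  BlockPool& operator=(const BlockPool&) = delete;

  // Hands out a free block, or a null pointer when the pool is exhausted.
  void* Acquire() {
    if (fFree.empty()) return nullptr;
    char* block = fFree.back();
    fFree.pop_back();
    return block;
  }

  // Returns a block obtained from Acquire() to the pool for later reuse.
  void Release(void* block) { fFree.push_back(static_cast<char*>(block)); }

  // Number of blocks currently available.
  std::size_t Available() const { return fFree.size(); }

 private:
  std::vector<char> fStorage;  // single allocation made up front
  std::vector<char*> fFree;    // blocks currently available
};
```

A released block is the next one handed out, so the memory footprint stays at the size of the initial allocation regardless of how many acquire/release cycles occur.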
Very high memory consumption • Too big files with rec. points: option to keep only the current (last) event: needed if we test reconstruction of local raw file • Switch off QA and MUON reconstruction • Size of the libraries: loadlibs.C => libITSrec.so takes ~80Mb • no obvious reason • AliITSclustererV2.cxx: static Short_t pairs[1000][1000]; • Rec. points: split mode? • Option and macros and scripts to reconstruct one big chunk in several consecutive aliroot processes + merging of the ESDs, ESDfriends and tags => “last resort” • Test of event ordering with full pools and local raw+OCDB • More profiling…
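For scale, the footprint of the static array quoted from AliITSclustererV2.cxx can be worked out directly. The sketch below assumes that `Short_t` is a 16-bit integer (as ROOT defines it) and uses `std::int16_t` as a stand-in; `PairsArray` and `kPairsBytes` are names introduced here for illustration. On that assumption the array accounts for about 2 MB, so by itself it would not explain the ~80Mb quoted for libITSrec.so.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: ROOT's Short_t is a 16-bit signed integer; int16_t stands in for it.
typedef std::int16_t Short_t;

// Same shape as the array quoted from AliITSclustererV2.cxx.
typedef Short_t PairsArray[1000][1000];

// Size of the array in bytes, evaluated at compile time.
constexpr std::size_t kPairsBytes = sizeof(PairsArray);

// 1000 x 1000 elements x 2 bytes each = 2,000,000 bytes.
static_assert(kPairsBytes == 2000000, "unexpected size for pairs[1000][1000]");
```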
Changes: v5-01-Rev-15 • #24466: Prepare a Geant 4 production request. From rev. 53143 • #89265: VZERO equalization factor not transmitted correctly during filtering. From rev. 53303 • #88914: Very high memory consumption in reco of 2011 Pb+Pb. From rev. 53245 • #89237 Code to port in release. From rev. 51957 • #89259 ZDC code porting request. From rev. 53161 • #89270 Request for porting TRD/PWG1 code. From rev. 53069,53082,53137,53154,53155,53169,53171 • #89301 Request to port PWG3/muon filtering update (rev. 53190) • #89324 Request to port r53200 and regenerate ITSRecoParams: Switch to not create tracklets/tracks refs. in PbPb
Changes: v5-01-Rev-15 • #89333 EMCAL: Port track matching modification to improve reconstruction speed to release • #89334 AOD Calorimeters: store distance to matched tracks to clusters in AODs, port to trunk and release. From rev. 53354 • #89335 Request to port AliPHOSRawFitterv4 to release. From rev. 53114,53211,53214 • #89354 commit in STEER and port to Release AliESDTZERO with fixed const. From rev. 53341 • #89355 EMCal Port r53227 & r53230 to the release • #89357 commit in STEER and port to release T0 AOD. From rev. 53358 • #88861 EMCAL: Port Trigger QA analysis task to release
Changes: v5-01-Rev-15 • #88417: Request to commit/port fixes to ANALYSIS/AliFileMerger.{h,cxx}. From rev. 53324 • #88368 Centrality determination updates to be ported to the release. From rev. 52533,52962,53028,53066,53090,53120,53162,53239 • #88827: Request for porting updates to TOF QA task into release. From rev. 53166 • More AddTimeStamp's added for #88914 • #89298: Additional trigger class for semicentral / central -> soon replaces the old ones. From rev. 53071,53199,53241,53243,53367 • #89368: QA final merging crashes REV-14. From rev. 53356
Other reports (28/11/11) • #89170 How to propagate promptly changes in the physics selection/trigger configuration to the QA • #89189 SPD Dead to RAW OCDB • #89233 Centrality selection for all events in the QA train • #89260 Adding sum of 4 tower PMTs vs. common PMT equalization in reconstruction • #89266 Reconstruction timing • #89298 Additional trigger class for semicentral / central -> soon replaces the old ones
Other reports (21/11/11) • #88880 Error: Symbol G__exception in reconstruction • #89002 Update of QA macro • #89012 Implementation of cosmics tracker in standard reconstruction • #89021 AliEve - segmentation fault when executing macros: geom_emcal.C & emcal_all.C • #89071 cpass0 failed / ocdb update failed for 168103, 168076, 168068, 168177
Ongoing investigations • Problems with v5-01-Rev-13 on the GRID • High load on the OCDB servers, problems to access “TPC/Calib/Correction” • Not understood: overcoming the problem using a local copy of the OCDB file • #88914 Very high memory consumption in reco of 2011 Pb+Pb • Event ordering by size: implemented, tested only with local raw file • “Memory pools” implemented by Ruben: under tests in GSI • Additional syswatch points (Ruben) • Slow processing in the cascade finder • Memory trashing in FillESD and in the ESD friends • Working tcmalloc on SLC5 (used by LHCb and ATLAS via LD_PRELOAD)
Changes: OCDB • #89288 TRD: update of Chamber Status for LHC10d • #89302 EMCAL: Port updated bad maps for 2011 to alien
Changes: v5-01-Rev-14 • #87404: Implementing the CDB snapshot. From rev. 51894,51992,52568,52590,52654,52814 • #88417: Request to commit/port fixes to ANALYSIS/AliFileMerger.{h,cxx}. From rev. 53125 • #88861 EMCAL: Port Trigger QA analysis task to release • #88936 Request to port 52924 (MUON DQM) • #88966 pPb configuration. From rev. 52982 • #88980 Request to port trunk rev. 52952 to the Release (fixed warnings) • #88987 request of porting of revision 52961,52965 in the release • #89005 Porting request (update in PWG1/TRD). From rev. 51710,51733,51810,52217,52240,52954,52960,52966
Changes: v5-01-Rev-14 • #89027 Request to port a fix r/53002: Small memory leak in AliQADataMaker • #89031 PHOS trigger in PhysicsSelection. From rev. 53006,53009,53068 • #89058 Port request: change in the trigger names of CVLN. Rev. 53009 in ticket #89031 • #89068 Please port new QA wagon to the Revision. From rev. 52831,52845,52863 • #89113 Port HMPID files to the release. From rev. 52929,52942 • #89120 Request to port optimisation of memory allocation in AliHLTTPCDataCompressionDecoder to v5-01-Release. From rev. 52974,52984,52997,53053
Changes: v5-01-Rev-14 • #89123 port to Release AliT0Reconstructor.cxx with important fix in reconstruction of simulated data. From rev. 53062 • #73877: Interaction time in MC. From rev. 50709,51126 • #88605: Request to port additional VZERO QA analysis task for Pb-Pb run. From rev. 53072 • #89203: Request to port an updated version of the MeanVertexer to the Release branch. From rev. 53117 • Possibility to reconstruct events in decreasing size order
Changes: OCDB • #88991 Request to update TOF OCDB for 2010 pp 900 GeV runs
Problems with v5-01-Rev-13 on the GRID • Very high load on the OCDB servers not seen with v5-01-Rev-12 • Clean restart with Rev-13 did not help • Emergency measure: go back to v5-01-Rev-12 for the GRID production: works, but several important changes are missing • Ongoing investigations (Raffaele, Alina, Latchezar, me): • The tests during the preparation of v5-01-Rev-13 did not show any anomaly, including in the processing of RAW data from AliEn • The differences in the code show no influence on the OCDB access • The log files from Rev-13 show exactly the same OCDB access for each active detector • The log files on the build server are OK • The tar balls from the build server are OK (local test) • Possible old memory corruption that showed up now: run with Valgrind (slow) • Stand alone test with dedicated server planned for tomorrow
#88914 Very high memory consumption in reco of 2011 Pb+Pb • The problem occurred after we moved to high luminosity. The memory goes up to ~3.5 Gb resident, 4.5Gb virtual memory: the jobs are killed • Temporary solution to provide possibility for QA: reconstruct only the first 80 ev. • The memory (resident/virtual) jumps at some high-multiplicity events • Investigations (TPC: Jacek, Marian; HLT: Matthias; ITS: Annalisa; Offline: Ruben, me) • Profiling with Google performance tools • Profiling with massif • Main allocations in TPC, more details on the next slides • Suspected pile-up events are not the only reason • Possibility to reject event based on N_TPC/N_ITS clusters or T0 times: not obvious • Technical solutions • Reconstruction of events in decreasing size order (Andreas) • tcmalloc
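The "decreasing size order" option amounts to sorting the events of a chunk by raw size before looping over them. A minimal sketch follows, assuming the size of each event is already known; `EventInfo` and `OrderByDecreasingSize` are hypothetical names, and how the sizes are read from the raw file is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-event record: position in the raw file and raw size in bytes.
struct EventInfo {
  int index;
  std::size_t rawSize;
};

// Returns the event indices ordered from the largest to the smallest event.
// std::stable_sort keeps the original file order for events of equal size.
std::vector<int> OrderByDecreasingSize(std::vector<EventInfo> events) {
  std::stable_sort(events.begin(), events.end(),
                   [](const EventInfo& a, const EventInfo& b) {
                     return a.rawSize > b.rawSize;
                   });
  std::vector<int> order;
  order.reserve(events.size());
  for (const EventInfo& e : events) order.push_back(e.index);
  return order;
}
```

The reconstruction loop would then visit the events in the returned order rather than in file order.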
Stop saving the non-calibrated ESDs • Remove the non-merged AODs/QA output once the merging is done • Reduce the AliESDfriends to 1% (as in LHC10h) • Note: ESDs (and ancillaries) – 10% of RAW, AODs (and ancillaries) – 1% of RAW
Other production issues • Memory: at the limit, 3Gb RAM, 4Gb virtual. • G__exception that is cured after the resubmission of the failed job: memory corruption
#88626 DQM related problems • #72148 Recuperate thresholds for DQM into SHUTTLE • #84558 Memory leak in ACORDE DQM agent • #84566 Memory leak in HMPID DQM agent • #85143 Memory leak in PHOS DQM agent • #85149 Memory leak in TRI DQM agent • #85151 Memory leak in T0 DQM agent • #85152 Memory leak in TRD DQM agent • #85155 Memory leak in SSD DQM agent • #85175 Memory leak in EMCAL DQM agent
#88626 DQM related problems • #87363 DQM FXS implementation in the Shuttle • #87460 Memory leak in DAQ DQM agent • #88169 SPD Vertex DQM plots absent • #88173 DQM agent T00QAshifter crashes in technical runs • #88175 AMORE GUI unstable in v1.44 • #88210 Crash in the AmoreDA called from the TPC DAs • #88574 Problem with event display • #88576 Port the latest version of aliroot to DQM • #88619 amoreHLT crashes • #88622 amoreQA unstable
#88626 DQM related problems • #88655 Request to port rev 52689 (MTR DQM bug fix) • #88661 custom amore agent SSD01 crash • #88822 AMORE GUI crashes due to VertexXY object produced by SPD DA
Changes: v5-01-Rev-13 • #87623: Request to port a new V0 DA code to the release. From rev. 52803 • #88251 Introduction of the Pb-Pb trigger classes into the phys sel. From rev. 52754,52755 • #88605 Request to port additional VZERO QA analysis task for Pb-Pb run. From rev. 52744,52768, 52801, 52802 • #88698 makeOCDB.C : TRD update adjustment of the validate threshold for the chamber without data. From rev. 52722 • #88763 Porting request (momentum dependent cos(PA) cut for V0). From rev. 52750,52751,52759 • #88779 EMCAL: Fix mem. leak in QAChecker port to release. From rev. 52772,52777,52779 • #88793 Port AddTaskTPCCalib.C to v5-01-Release. From rev. 52780
Changes: v5-01-Rev-13 • #88798: Request to port fixes for: Beam type convention in the GRP. From rev. 52824,52836,52839,52842,52843,52862,52871 • #88807 Fix for AliTriggerPFProtection. From rev. 52778,52786,52797 • #88818 Port changes to AliRoot release - 5.01(AddTaskTPCCalib.C: low flux to high flux ). From rev. 52800 • #88820 new centrality OADB to port in release. From rev. 52781,52784,52804 • #88821 EMCAL: Port setting of clusterizer v2 in reconstruction. From rev. 50740,50741,50748,50749,50750,50753 • #88824 Request to port fix in QADataMaker: r52810 • #88827 Request for porting updates to TOF QA task into release. From rev. 52761,52811
Changes: v5-01-Rev-13 • #88829 Fix in counting processed events. From rev. 52819 • #88837 TPC port to Release request: AliTPCcalibTime.cxx. From rev. 52748,52793,52854 • #88847 Port revisions 52826 & 52827 to the release • #88849 Request to port VZERO event plane implementation. From rev. 51446,51730,52829,52917 • #88852 Request for porting rev. 52798,52821 to the release • #88862 Request: Port update of TRD ExB calibration code to release. From rev. 52822,52847,52857,52875 • #88864 Please port 52850 to release - On-line DQM scaling of histograms
Changes: v5-01-Rev-13 • #88865 Please port 52216,52849 to release - recognise p-A collisions • #88868 ZDC request to port code to the release. From rev. 52565,52687,52818,52852 • #88876 Request for TRD : reduce verbosity in AliTRDclusterizer. From rev. 51395 • #88926: EMCAL: Port QA reference file to release. From rev. 52912
Changes: v5-01-Rev-12 • #88679: Centrality task crashes in the release 5-01-Rev-11. From rev. 52716 • #88674: Porting request for 52710 (HLT event display) • #88686: Porting request for 52714 • #88681: Request to port fix for AliMagF (parser for p-A,A-p beam types): r52711. Created problems in the phys. selection • Technical fix from rev. 52709 (treatment of cosmic reco params)
Changes: v5-01-Rev-11 • #88178 Request to port bugfix in the AddTaskPHOSPbPb.C. From rev. 52362,52363 • #88197 Adding TPC cluster map for clusters used in fit. From rev. 52442,52443+52241,52262 • #88206 Request to port 52390 (MUON simulation w/ raw OCDB) • #88228 Request to port fix for proof reco. From rev. 52393 • #88242 Request to port bugfix in the AliAnalysisTaskPHOSPbPbQA.cxx. From rev. 52400,52427 • #88243 Vertex Diamond DA committing and porting request. From rev. 52357,52394,52401 • #88255 Request to port trunk rev. 52402 to the Release (use BPTX clock-shift in TOF calib)
Changes: v5-01-Rev-11 • #88261 Port r52410 to the release (disable EMCAL trigger emulator) • #88293 Request to port 52418 to release: protection against ill-formed QA cloning request • #88325 Request to port rawstream update to v5-01-Release. From rev. 52375 • #88329 port to Release AliT0QADataMakerRec.cxx and AliT0CalibTimeEq.cxx. From rev. 52212,52433,52434 • #88331 Request to port updates allowing the reconstruction of SPD+MUON. From rev. 52425 • #88333 EMCAL: Port L1 QA code for DQM to release. From rev. 52435,52437 • #88334 port to Release code for new T0 reconstruction scheme. From rev. 51643,52436,52479
Changes: v5-01-Rev-11 • #88344 Request to commit and port changes - TPCdedx info. 52445,52481 • #88350 Request to port 52451 to release: new macro to add in-reco analysis train • #88353 Request to port 52453 to release: fix in resetting cloned histos • #88354 Porting request for vertex QA. From rev. 52238,52450 • #88358 Request: Port update of TRD calibration code to release. From rev. 52361,52364,52366,52367,52389,52414,52432,52519 • #88368 Centrality determination updates to be ported to the release. From rev. 51433,52348,52391,52455 • #87900: Request to port changes in AliTOFQADataMakerRec code into release. From rev. 52454,52456
Changes: v5-01-Rev-11 • #88394: Port TPC changes to 5-01-Release - bug fix. From rev. 52500,52505 • #88406: Port TPC changes to 5-01 revisions - AliTPCcalibTimeGain.cxx. From rev. 52511 • #88417: Request to commit/port fixes to ANALYSIS/AliFileMerger. From rev. 52528. • #88455: Request to port commit rev=52408 to the release branch • #88462: TPC request: Update of the AliTPCPreprocesorOffline. From rev. 52517 • #88484 Port changes requested in task #23160. From rev. 52512 • #88477: port to Release AliT0CalibTimeEq, AliT0CalibSeasonTimeShift. From rev. 52543
Changes: v5-01-Rev-11 • #88382: request to port PIDqa related code to the release. From rev. 51070,51614,51739,52215+51790,52209,52213,52230,52317,52349,52369,52384,52407 • #88467: Request to update two VZERO OCDB objects for the forthcoming Pb-Pb run. From rev. 52535 • #88468: Request to commit and include in the new tag the changes in AMPT. From rev. 50767,51229,52570 • #88488: Request for TRD : code for PbPb 2011. From rev. 51823,51834,51836,51902,51949,51960,51969,51975,52163,52330,52340,52552,52553,52555+52554 • #88251: Introduction of the Pb-Pb trigger classes into the phys sel. From rev. 50745,50859,51220,51486,52234,52249,52250,52319,52347,52423,52516,52522
Changes: v5-01-Rev-11 • #88431: Memory corruption related to unpacking of HLT GlobalTriggerDecision during reconstruction. From rev. 52563 • #88468 Request to commit and include in the new tag the changes in AMPT. From rev. 52646 • #88537 Request to commit/port fix in EVE/alice-macros. From rev. 52648 • #88541 Request to port to release Muon QA analysis train from rev. 52059,52575,52576 • #88543 EMCAL: Port better description of resolution to release. Patch from rev. 52592 • #88549 Request to port bugfix in the AliAnalysisTaskPHOSPbPbQA.cxx to the Release. From rev. 52585
Changes: v5-01-Rev-11 • #88600 Request to port CPass0 code in v5-01-Release. From rev. 52467,52514,52515,52561,52564,52626,52634 • #88556 Request to port rev52603 into release (TOF matching code) • #88565 port to Release new T0Physda.cxx and AliT0Reconstructor.cxx. From rev. 52608,52610,52611,52633 • #82052: Request to port r50024 to v4-20-Release: Hidden OCDB call in Muon HLT code • #88169: SPD Vertex DQM plots absent. From rev. 52604,52605
Changes: v5-01-Rev-11 • #88573 Request to port to release TPC code - fix item # 88519. From rev. 52596 • #88583 SDD preprocessor updated on the trunk, commits 52613 and 52614 -> port to the release • #88587 porting request for cPass0 mean vertex code for PbPb processing. From rev. 52627 • #88591 Request to commit/port fix in AliRawReaderChain (setter instead of hardwired search path). From rev. 52667 • #88592 Request to port commit 52628 to the v5-01-Release • #88593 Request to port to release TPC code - fix task #23702. From rev. 52632