Commissioning
Post Mortem analysis of commissioning week 31/3/2008 – 4/4/2008
M.Frank CERN/LHCb
Monitoring Tasks
• Currently 9 tasks
  • Rich (U.Kerzel): 1 (RichDAQMon)
  • Calo (O.Dechamps): 3 (CaloDAQCalib, CaloDAQDisplay, CaloDAQMon)
  • L0Calo (O.Dechamps): 3 (GlobalDAQMon, L0DUDAQMon, L0CaloDAQMon)
  • MUON (G.Graziani): 1 (MuonDAQMon)
  • Online (B.Jost): 1 (RawSizeONLMon)
• To come:
  • L0Muon (J.Cogan): 1 (L0MuonDAQMon)
General Problems of Monitoring Tasks
• No common task execution environment
  • Setup done with “frozen” cmt setup-script
  • Nearly every task, or at least each subdetector, runs its own script
• Conditions database using SQLite requires “real” temporary disk
  • No NFS mounts of /tmp or /var/tmp
  • Requires RAM disk
• Tasks still need to be assigned to a specific subdetector
  • “ECAL” rather than “CALO”
  • Needs to be sorted out
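The temporary-disk requirement above could be checked before a task starts. The following is only a minimal sketch, assuming a Linux node whose mount table is readable in /proc/mounts format; the function names, the set of network filesystem types, and the sample mount table are all illustrative assumptions, not part of the existing setup.

```python
import posixpath

# Assumption: filesystem types treated as "not a real local disk".
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs", "afs"}


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Return the fs type of the deepest mount point containing path."""
    path = posixpath.normpath(path)
    best_point, best_type = "", None
    for point, fs_type in mounts:
        point = posixpath.normpath(point)
        inside = path == point or path.startswith(point.rstrip("/") + "/")
        # ">=" lets a later mount on the same point override an earlier one.
        if inside and len(point) >= len(best_point):
            best_point, best_type = point, fs_type
    return best_type


def is_local_tmp(path, mounts):
    """True if path lives on a known, non-network filesystem."""
    fs_type = fs_type_for(path, mounts)
    return fs_type is not None and fs_type not in NETWORK_FS


# Hypothetical mount table, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
server:/export/tmp /tmp nfs rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
"""

if __name__ == "__main__":
    table = parse_mounts(SAMPLE_MOUNTS)
    for candidate in ("/tmp", "/var/tmp", "/dev/shm"):
        print(candidate, fs_type_for(candidate, table), is_local_tmp(candidate, table))
```

On a real node the table would come from reading /proc/mounts instead of the sample string; a task wrapper could refuse to start, or fall back to a RAM disk path, when the check fails.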
Storage
• 1 hiccup on Monday before start
  • Maybe due to some debugging during the previous week
• Problem that sometimes store02::writerd needs restart
  • Under investigation
  • May take some time (occurrence ~1 / week)
• If disks disappear, the storage is unhappy
  • Requires complete restart (including by-hand action on store02)
• Behaved pretty well throughout the entire week
  • Including dynamic partitioning (RICH1, RICH2, HCAL, TRG, MUON)
RunInfo Datapoint
• After every upgrade of the RunInfo datapoint, specific configurations disappear
  • RunInfo DP has its own instance (and definition) in each partition (different PVSS system)
  • Monitoring tasks
  • Storage configuration
Booting of Nodes
• No verification procedure that a node has booted properly
  • Each reboot needs “by hand” re-configuration
• Boot startup tasks start properly only on controls PCs
  • PVSS projects are started
  • FSM does not always start properly; sub-trees sometimes stay dead
• On Farm/Monitoring/Storage nodes the boot startup does not always work
  • Requires Controls PC to be up and running (task manager)
  • Needs investigation
  • Tasks seem to be restarted regularly
• FMC task manager inconsistencies on Controls PC
  • During boot all tasks are started properly
  • Starting tasks later fails / makes tmSrv hang
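A boot verification step of the kind noted as missing above might, in its simplest form, compare the tasks a node is expected to run against those actually found running. This is a hedged sketch only: the node name, the task names other than tmSrv, and the way the running list is obtained are hypothetical and do not reflect the FMC task manager interface.

```python
def missing_tasks(expected, running):
    """Return expected task names that are absent from the running list."""
    running_set = set(running)
    return [name for name in expected if name not in running_set]


def boot_report(node, expected, running):
    """Summarise whether a node came up with all of its expected tasks."""
    missing = missing_tasks(expected, running)
    status = "OK" if not missing else "INCOMPLETE"
    return {"node": node, "status": status, "missing": missing}


if __name__ == "__main__":
    # Hypothetical inputs: in practice the running list would have to be
    # queried from the node itself after boot.
    expected = ["tmSrv", "exampleTaskA", "exampleTaskB"]
    running = ["tmSrv", "exampleTaskB"]
    print(boot_report("examplenode01", expected, running))
```

Collecting such reports for all nodes in one place would give a single view of which reboots still need by-hand re-configuration.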
General Observations for Farm/Monitoring/Storage Operation
• We are flying completely blindfolded
  • If things work, all is fine
  • If they don’t, it is difficult to find out why
• There are no tools which, with a few panels/windows, give coherent diagnostics / reports of what is going wrong
  • Yes, there is the logViewer: many messages, one per subfarm
  • mbm utilities to monitor subfarms, monitoring, storage, nodes
  • Still, for expert use only
• Nothing which allows in-depth and simple investigation if anything fails