AMOD Report

AMOD Report Simone Campana CERN IT-ES

Grid Services • A very good week for sites • No major issues for T1s and T2s • The only one to report is CASTOR@TW • Tail of problems after a hardware failure • DB index corrupted, needs a rebuild and a scheduled downtime • A typhoon in TW brought complications to the schedule

ATLAS Services: DDM SS • On Saturday many SS restarts due to the ARDACallback agent crashing • The problem was not related to Dashboard but to ActiveMQ • Both ActiveMQ and ARDA Dashboard callbacks are sent by the same agent • Martin B. spotted an issue in an ActiveMQ broker • ActiveMQ callbacks have been disabled on many SS machines • CERN IT has been contacted about the faulty broker

ATLAS Services: DDM SS (follow up) • The case needs to be added to the AMOD documentation (or the DDM documentation) • The AMOD needs to be able to see the ActiveMQ monitoring (now certificate protected) • The AMOD needs to be able to log in to Dashboard machines (was possible, not working now) • DDM SS needs to be protected against this behavior • Martin has a list of possible improvements
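One way to read "protected against this behavior" is that a failure in one callback destination should not take down the agent that also serves the other. The sketch below illustrates that idea only; it is not the actual ARDACallback agent code and not necessarily one of the improvements on Martin's list. The function `send_callbacks`, the sender labels, and the simulated broker failure are all hypothetical.

```python
import logging

logger = logging.getLogger("callback_agent")  # hypothetical logger name


def send_callbacks(message, senders):
    """Try every sender; a failure in one does not stop the others.

    `senders` maps a label to a callable that takes the message.
    Returns the labels of the senders that failed.
    """
    failed = []
    for label, send in senders.items():
        try:
            send(message)
        except Exception:
            # Log and carry on instead of letting the agent crash.
            logger.exception("callback to %s failed", label)
            failed.append(label)
    return failed


# Illustrative stand-ins for the two destinations named in the slides.
delivered = []


def faulty_broker(message):
    raise ConnectionError("broker unreachable")  # simulated failure


def dashboard(message):
    delivered.append(message)


failed = send_callbacks("msg", {"activemq": faulty_broker, "dashboard": dashboard})
```

In this toy run the simulated broker failure is recorded in `failed`, while the Dashboard callback is still delivered.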

ATLAS Services: DBs • On Sunday afternoon, Online to Offline replication of non-DCS data was “yellow” for 2 hours • This is not ADC responsibility: • The P1 shifter should report to the shift leader • The shift leader should contact the proper people • Something went wrong in this chain • It is explained in the AMOD twiki but the AMOD missed it • The problem vanished by itself

ATLAS Services: schedconfig • There was a “partial” update of schedconfig • Some queues with “copytool=lcgcp2”, “lfcregister=None” in IN2P3-CC • What happens: • The pilot uploads to the SE and does not register in LFC (feature of lcgcp2) • The panda server does not register in LFC (since lfcregister=None) • Both Panda and Pilot believe all is OK and the job finishes successfully • Now we have dark data and Prodsys thinking the task is complete …

ATLAS Services: schedconfig (follow up) • Ueda is registering missing files by hand • 50% of files produced by IN2P3-CC in one week … • We are lucky Ueda is Ueda … I would take 2 months of holiday. • Schedconfig should protect against this (I am not sure how, or whether and how AGIS can protect) since: • Human errors happen • The meaning and behavior of schedconfig fields are not well documented • We have many queues, many panda sites and many attributes for each of them • BTW, let’s please push for getting rid of those panda queues once and for all (see A. Di Girolamo’s thread)
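As an illustration of the kind of protection meant here, the inconsistent combination described above (the pilot does not register because of copytool=lcgcp2, and the server does not register because of lfcregister=None) could be flagged by a simple consistency check before an update is accepted. This is a minimal sketch, not schedconfig or AGIS code: the field names `copytool` and `lfcregister` come from the slides, while the dictionary layout, the queue names, the other field values, and the function `find_unregistered_output_queues` are hypothetical.

```python
# Assumption: copytools that upload to the SE without registering in LFC.
# Only lcgcp2 is named in the slides.
NON_REGISTERING_COPYTOOLS = {"lcgcp2"}


def find_unregistered_output_queues(queues):
    """Return names of queues where neither pilot nor server registers in LFC."""
    bad = []
    for name, cfg in queues.items():
        pilot_skips = cfg.get("copytool") in NON_REGISTERING_COPYTOOLS
        # Assumption: "None" (as shown in the slides), an empty string, or a
        # missing value all mean the server does not register the file.
        server_skips = cfg.get("lfcregister") in (None, "", "None")
        if pilot_skips and server_skips:
            bad.append(name)
    return sorted(bad)


# Illustrative queue definitions, not real schedconfig entries.
example_queues = {
    "QUEUE_A": {"copytool": "lcgcp2", "lfcregister": "None"},
    "QUEUE_B": {"copytool": "lcgcp2", "lfcregister": "server"},
    "QUEUE_C": {"copytool": "lcgcp", "lfcregister": "None"},
}

print(find_unregistered_output_queues(example_queues))  # prints ['QUEUE_A']
```

Only the queue where both sides skip registration is reported; a check like this would not replace documentation of the fields, but it would catch this particular human error before jobs start producing dark data.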

ATLAS Services: comp@P1 terminal • Firefox in the comp@P1 terminal crashed on Wednesday night • The shifter tried the restart procedure but did not succeed for 1h • Unable to connect to any page • Then he called the AMOD, who could not do much • But the system magically started to work again • The (unconfirmed) hypothesis is that the conTZole crashed Firefox • Happened in the past • But this time there was at least one other problem • Ueda suggests running conTZole and all the rest in separate windows

Conclusions • Very quiet shift • My last AMOD was 1 week before the Higgs seminar … • 2 night calls (both of them for a good reason)
