ATONR Database Tests – 17th Feb.
Schedule, Goals & Results
Florbela Viegas, CERN ADP
Tests and Goals

- Total ATONR unavailability:
  - DCS test: 2-hour blackout; observe disconnection and reconnection behaviour. Certify that previously observed problems are solved.
  - Muon tests: several connect and disconnect cycles. Correct and certify that problems observed in reconnection are solved.
- ATONR standby database failover:
  - Certify that all programs connect correctly and transparently to the standby.
  - Standby is write-enabled.
  - Certify that no gaps in information are observed:
    - Rainer Bartoldus will watch and record HLT.
    - DCS will watch and certify the PVSS archive (no gaps in archive).
- GPN disconnection test by TDAQ admins, to be made after the certifications above are done. Behaviour of all programs should be observed.
Proposed Schedule

- 08h00 – 10h15: ATONR is shut down. No DB service.
- 10h15: ATONR is back up (DCS 2-hour blackout test done).
- 10h15 – 12h00: Muon tests will be made. IT DBAs will bring ATONR up and down several times as needed.
- 12h00 – 12h30: Full ATONR shutdown.
- 12h30 – 14h00: Online standby will be available.
- 14h00 – 14h30: Online standby will be shut down. No DB service.
- 14h30: ATONR will be available again. End of DB tests.
- 15h00: GPN tests may start.

Assumptions:
- Shutdown of ATONR at 08h00 does not interfere with Muon tests.
- All state changes will be logged in the P1 ELOG.
Muon Observations (info by Stephanie Zimmerman)

1. On the DAQ side, with a run ongoing, we saw that stopless recovery actions for TGC RODs failed while the DB was disconnected. This is not understood, since according to our expert no parameters are loaded from the DB during this procedure. After the DB was available again, things worked again. No such behaviour was seen for the other muon subdetectors.

2. MDT JTAG initialization: failed while the DB was disconnected, as expected, since we retrieve parameters from there. We were however glad that the init procedure (a custom-built C++ DLL that accesses the DB and that we call from PVSS) did NOT hang, but handled the DB unavailability correctly.

3. PVSS CtrlRDBAccess extension (where the main emphasis was), which we use extensively for the MDT and TGC custom Oracle DB writes:
- MDT: processes reconnected to the DB once it was back up; writing of data resumed without manual intervention in all cases (alignment, B-field, temperature data, etc.). Full success.
- TGC: unfortunately the situation is not as good. Numerous PVSS processes were left with invalid DB handles, which then prevented them from reconnecting once the DB was available again. As a consequence, any writing of data (e.g. TGC HV conditions) to the Oracle TGC DB failed until the control scripts in PVSS were manually restarted. Moreover, this was not detected by the current TGC DCS alarms. Here there is clearly work to do.
- Command timeout: in the past we observed that, when we lost the connection to the DB, PVSS control scripts got into a blocking state for as long as the database was unavailable. The test of a new version of the CtrlRDBAccess extension, which we use for all PVSS writes to Oracle custom tables, was successful: we validated that a command timeout of 45 s on DB insertion works as expected. This feature has in the meantime been implemented in all MDT cases, including the alignment.
- DB disconnects leading to control-script crashes: we had observed this in the past; the issue has disappeared with the latest versions of CtrlRDBAccess we are using.

In total it was a very useful exercise.
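The difference between the MDT behaviour (drop the dead handle and reconnect) and the TGC failure mode (keep reusing an invalid handle) can be sketched in a few lines. This is a minimal, self-contained Python illustration, not the actual PVSS/CtrlRDBAccess API: `FakeDB`, `Writer`, and `StaleHandleError` are hypothetical stand-ins, and the 45 s figure mirrors the validated command timeout, although here the "timeout" fires immediately for brevity.

```python
class StaleHandleError(Exception):
    """Raised when a cached handle predates a DB restart."""


class FakeDB:
    """Toy stand-in for the Oracle server (hypothetical, for illustration)."""

    def __init__(self):
        self.up = True
        self._generation = 0  # bumped on every restart

    def restart(self):
        """Simulate the DB coming back up; old handles become invalid."""
        self._generation += 1
        self.up = True

    def connect(self):
        if not self.up:
            raise ConnectionError("DB unavailable")
        return self._generation  # the "handle" is just the generation number

    def insert(self, handle, row, timeout_s=45):
        # A real client would block for up to timeout_s; here we fail at once.
        if not self.up:
            raise TimeoutError(f"insert timed out after {timeout_s} s")
        if handle != self._generation:
            raise StaleHandleError("handle is no longer valid")
        return "ok"


class Writer:
    """Write path that reconnects on a stale handle (the MDT-style behaviour)."""

    def __init__(self, db):
        self.db = db
        self.handle = db.connect()

    def write(self, row):
        try:
            return self.db.insert(self.handle, row)
        except StaleHandleError:
            # Instead of keeping the dead handle (the TGC failure mode),
            # reconnect and retry the insert once.
            self.handle = self.db.connect()
            return self.db.insert(self.handle, row)
```

With this pattern a DB restart is transparent to the caller, while an outage surfaces as a bounded `TimeoutError` rather than an indefinitely blocking script.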