Autosave Additions/Upgrades and Experiences at SLAC Zheqiao Geng Controls Department SLAC National Accelerator Laboratory EPICS Collaboration Meeting Fall 2010 Oct. 12, 2010
Outline • Introduction to Autosave • Problems reported at SLAC during operation • Additions/upgrades of Autosave • Conclusion
Autosave • “Autosave automatically saves the values of EPICS process variables (PVs) to files on a server, and restores those values when the IOC is rebooted.” (from the autosave manual)
Save and Restore Methods Supported • Data save methods: • Periodic: save data periodically with a defined period • Triggered: save data when a CA event is received from a designated trigger PV • Monitored: periodically check the PVs in the save list for changes, and save the data if any of them has changed • Manual: save data via shell commands • Data restore methods: • Pass 0 restore: restore PV values during database initialization pass 0 • Pass 1 restore: restore PV values during database initialization pass 1 • Manual restore: restore data via shell commands
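To make the Monitored method concrete, here is a minimal C sketch, assuming each save-list entry keeps its latest value and its last-saved value as strings (the form in which values go into the .sav file). The names `pv_entry`, `PV_VALUE_LEN`, `monitored_scan_changed` and `mark_saved` are hypothetical illustrations, not taken from the autosave source, which may track changes differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PV_VALUE_LEN 40  /* hypothetical size for one value string */

typedef struct {
    const char *name;            /* PV name from the save (request) list */
    char current[PV_VALUE_LEN];  /* latest value, e.g. updated by a CA monitor */
    char saved[PV_VALUE_LEN];    /* value last written to the .sav file */
} pv_entry;

/* Called once per monitor period: returns 1 if any PV in the save list
 * differs from the value last written to the .sav file, 0 otherwise. */
static int monitored_scan_changed(const pv_entry *list, size_t n)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].current, list[i].saved) != 0) {
            return 1;
        }
    }
    return 0;
}

/* Called only after the .sav file has been written successfully, so a
 * failed write leaves the change pending for the next scan. */
static void mark_saved(pv_entry *list, size_t n)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        memcpy(list[i].saved, list[i].current, sizeof list[i].saved);
    }
}
```

Splitting "detect a change" from "mark as saved" keeps a failed write from silently discarding the change, which matters for the NFS failure cases reported below.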
Threads Involved in Autosave • Run-time data save thread • IOC shell • Callback threads
IOCs Using Autosave at SLAC • Soft IOCs running on Red Hat Linux • Hard IOCs running RTEMS on MVME6100 • Embedded IOCs running RTEMS on ColdFire uC5282
Reported Problems during Operation • P1: Failed to write the .sav file in some cases, e.g., after an NFS server reboot • P2: Stopped attempting to write the .sav file for unknown reasons • P3: The .sav file is updated, but the new PV data is not written to it • P4: Bad file descriptors for soft IOCs • P5: [RTEMS] Failed to flush the saved data to the NFS disk • P6: The status string sometimes does not correctly reflect the problems (example: after an NFS reboot, the status string still showed “Can’t open save file”) • P7: The buffer size for auto-generating the save list from the info field of the record is too small
Automatic NFS Remounting for RTEMS • For P1 (Failed to write the .sav file in some cases, e.g., after an NFS server reboot) • If there are too many file-saving failures (e.g., failures to open, read or write the file), the NFS share is remounted • The file status is also checked after writing to the disk • Operating-system-dependent code for NFS mounting is implemented for each OS (vxWorks, RTEMS and Linux)
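The "too many failures, then remount" policy can be sketched as a failure counter. This is a hypothetical C illustration: counting *consecutive* failures, the threshold, and the names `nfs_failure_counter` and `record_file_op` are assumptions, and the remount itself (OS-specific per the slide) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int failures;   /* consecutive failed file operations */
    int threshold;  /* failures tolerated before remounting (hypothetical) */
} nfs_failure_counter;

/* Record the outcome of one file operation on the .sav file (open, read,
 * write, or the status check after writing). Returns 1 when the caller
 * should remount the NFS share via its OS-specific routine, else 0. */
static int record_file_op(nfs_failure_counter *c, int op_ok)
{
    assert(c != NULL && c->threshold > 0);
    if (op_ok) {
        c->failures = 0;        /* any success clears the failure streak */
        return 0;
    }
    if (++c->failures < c->threshold) {
        return 0;
    }
    c->failures = 0;            /* start a fresh count after the remount */
    return 1;
}
```

Resetting the counter after a remount request gives the remounted share a full set of attempts before another remount is tried.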
Timeout Checking for Callback Functions • For P2 (Stopped attempting to write the .sav file for unknown reasons) • Reviewing the existing code, we found that the mechanism that activates data saving for PERIODIC and MONITORED saves is potentially risky: • Callback routines are used to activate the saving; the callback is requested again ONLY after a save has been activated, which introduces the intended delay • So if the callback fails to run even ONCE, saving is never activated, the callback is never requested again, and no further saves ever happen • Therefore a timeout check was added for the callback function: on timeout, data saving is forced, which restarts the save loop
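A minimal C sketch of such a timeout check, assuming the save task can poll a watchdog and that times are plain seconds (in an IOC they would come from a real clock). The struct, the grace `timeout`, and both function names are hypothetical; the slides do not give the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double period;           /* save period in seconds */
    double timeout;          /* extra grace time for the callback (hypothetical) */
    double last_activation;  /* time the save was last activated */
} save_watchdog;

/* Polled by the save task. Returns 1 if no save has been activated within
 * period + timeout, i.e. the callback apparently never ran, so the task
 * should force a save (which also re-requests the callback). */
static int watchdog_must_force(const save_watchdog *w, double now)
{
    assert(w != NULL);
    return (now - w->last_activation) > (w->period + w->timeout);
}

/* Record a save activation, whether triggered by the callback or forced. */
static void watchdog_note_activation(save_watchdog *w, double now)
{
    assert(w != NULL);
    w->last_activation = now;
}
```

Because a forced save also re-requests the callback, one lost callback now costs at most `period + timeout` seconds instead of stopping the save loop permanently.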
Retrying of CA Connection • In the existing code, PVs are connected only at program start. Functionality to retry unconnected PVs has been added. • Temporarily unreachable PVs are thus included in the save list without rebooting the IOC
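One way to structure the retry is a periodic pass that collects the still-unconnected entries so the caller can re-issue their connection requests. This C sketch is an assumption-based illustration: `pv_link` and `find_unconnected` are hypothetical names, the `connected` flag is assumed to be set by a CA connection callback, and the exact CA calls used in the modified autosave are not given in the slides.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;   /* PV name from the save list */
    int connected;      /* set by the CA connection callback (assumed) */
} pv_link;

/* Periodic retry pass: fill idx[] with the indices of PVs that are still
 * unconnected, so the caller can re-request their connections.
 * Returns how many indices were written (at most max_idx). */
static size_t find_unconnected(const pv_link *pvs, size_t n,
                               size_t *idx, size_t max_idx)
{
    size_t count = 0;
    assert((pvs != NULL || n == 0) && (idx != NULL || max_idx == 0));
    for (size_t i = 0; i < n && count < max_idx; i++) {
        if (!pvs[i].connected) {
            idx[count++] = i;
        }
    }
    return count;
}
```

Once a retried PV connects, its flag flips and it simply drops out of the next pass, so it joins the save list without an IOC reboot.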
Cleaning Up the Status String • For P6 (The status string sometimes does not correctly reflect the problems; example: after an NFS reboot, the status string still showed “Can’t open save file”) • In the existing code, the status string always shows the most serious failure among the different parts of the program • “Can’t open save file” is generated, at the highest severity level, when an error occurs during the reboot restore, so it masks the run-time status reports of all other parts • Solution: keep this status string only in a separate reboot_status
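The effect of separating `reboot_status` can be shown with a small C sketch, assuming ordered severity levels and one status slot per run-time save set. Apart from the name `reboot_status` (from the slide), every type, constant and function here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Severity levels, ordered so a larger value is more serious (hypothetical). */
typedef enum { STATUS_OK = 0, STATUS_WARNING = 1, STATUS_ERROR = 2 } save_status;

#define MAX_SAVE_SETS 8  /* hypothetical */

typedef struct {
    save_status reboot_status;          /* outcome of the reboot restore only */
    save_status sets[MAX_SAVE_SETS];    /* current outcome of each save set */
    size_t n_sets;
} autosave_status;

/* Run-time status: the worst status among the save sets. The reboot
 * restore result is deliberately excluded, so an old "Can't open save
 * file" error cannot mask the current state of run-time saving. */
static save_status runtime_status(const autosave_status *s)
{
    save_status worst = STATUS_OK;
    assert(s != NULL && s->n_sets <= MAX_SAVE_SETS);
    for (size_t i = 0; i < s->n_sets; i++) {
        if (s->sets[i] > worst) {
            worst = s->sets[i];
        }
    }
    return worst;
}
```

With this split, a reboot-time error stays visible in its own field while the run-time status can return to OK once saving recovers.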
Increased Buffer Size for makeAutosaveFileFromDbInfo() • For P7 (The buffer size for auto-generating the save list from the info field of the record is too small) • In some soft IOCs, tens of fields need to be saved for a single record, so the buffer for the record's info field must be large enough to hold all of these field names • The buffer size was increased from 100 to 2048 bytes
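To show why the buffer size matters, here is a hypothetical C sketch that expands a record's space-separated info-field value into save-list lines of the form `record.FIELD`. Only the 2048-byte size comes from the slide; `expand_info_fields`, `INFO_BUF_SIZE` and the output format are assumptions, and this is not the code of makeAutosaveFileFromDbInfo(). The sketch reports overflow instead of truncating.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INFO_BUF_SIZE 2048   /* size from the slide (previously 100) */

/* Expand the field names listed in one record's info field into lines
 * "record.FIELD\n" in out[]. Returns the number of lines written, or -1
 * if the info string or the output does not fit. */
static int expand_info_fields(const char *record, const char *info,
                              char *out, size_t out_size)
{
    char fields[INFO_BUF_SIZE];
    size_t used = 0;
    int lines = 0;

    assert(record && info && out && out_size > 0);
    if (strlen(info) >= sizeof fields) {
        return -1;                 /* would not fit: refuse rather than truncate */
    }
    strcpy(fields, info);
    out[0] = '\0';
    /* strtok is not thread-safe; acceptable for this single-threaded sketch */
    for (char *tok = strtok(fields, " \t"); tok; tok = strtok(NULL, " \t")) {
        int n = snprintf(out + used, out_size - used, "%s.%s\n", record, tok);
        if (n < 0 || (size_t)n >= out_size - used) {
            return -1;             /* output buffer too small */
        }
        used += (size_t)n;
        lines++;
    }
    return lines;
}
```

With a 100-byte buffer, a record listing tens of field names would overflow; checking lengths explicitly makes that case an error instead of a silently shortened save list.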
Other Problems • P3 (The .sav file is updated, but the new PV data is not written to it), P4 (Bad file descriptors) and P5 ([RTEMS] Failed to flush the saved data to the NFS disk) • Have not reappeared so far • Concrete examples and further investigation are still needed to solve them
Conclusion and Outlook • The modified version of Autosave has been running on several IOCs at LCLS for ~1 month, and the previously reported problems have not reappeared so far • The additions/upgrades work smoothly • More operational experience is needed to improve and finalize the design • Documents describing the requirements/architecture of Autosave have also been written (including reverse engineering from the source code) • We will submit both the modified source package and the documents to the original author (Tim Mooney) and the collaboration for review • We hope to hear about experiences from other labs, so that together we can make Autosave more robust across platforms