CA integration tests
• We need a way to run integration tests
• test IOCs -> CAJ -> pvmanager
• Including disconnects due to power cycle and network downtime
• Corner cases (e.g. different type at reconnect)
• Ability to check server state (e.g. number of monitors open)
• Ability to drop in and run the tests in a production environment (to check specific versions of EPICS and network configurations)
• Start a script on the server side, start a script on the client side, come back in 15 minutes
CA integration tests
• Server side:
  • Requirements: EPICS base (softIoc), procServ
  • Start server script
    • Starts 1st softIoc
    • Keeps listening on the “command” pv. Possible commands:
      • start IOCNAME NSEC – stops the current ioc, waits for NSEC, starts the ioc in the IOCNAME directory
      • netpause NSEC – brings down the network (ifconfig down) for NSEC
      • connections PVNAME – puts the number of current monitors (casr 2) on the PVNAME in the “output” pv
      • stop – stops the server side
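The command set above is small enough to sketch as a parser. The following is a minimal illustration only, not the actual server script: the class name `ServerCommand`, its fields, and the error handling are hypothetical, and NSEC is assumed to be an integer (the slide does not state its unit).

```java
/** Hypothetical parser for the strings written to the "command" pv. */
final class ServerCommand {
    enum Kind { START, NETPAUSE, CONNECTIONS, STOP }

    final Kind kind;
    final String name; // IOCNAME or PVNAME; null when the command takes none
    final int nsec;    // NSEC as written on the slide; -1 when the command takes none

    private ServerCommand(Kind kind, String name, int nsec) {
        this.kind = kind;
        this.name = name;
        this.nsec = nsec;
    }

    /** Splits the command on whitespace and checks the argument count for each verb. */
    static ServerCommand parse(String line) {
        String[] t = line.trim().split("\\s+");
        switch (t[0]) {
            case "start":
                requireArgs(t, 2, line);
                return new ServerCommand(Kind.START, t[1], Integer.parseInt(t[2]));
            case "netpause":
                requireArgs(t, 1, line);
                return new ServerCommand(Kind.NETPAUSE, null, Integer.parseInt(t[1]));
            case "connections":
                requireArgs(t, 1, line);
                return new ServerCommand(Kind.CONNECTIONS, t[1], -1);
            case "stop":
                requireArgs(t, 0, line);
                return new ServerCommand(Kind.STOP, null, -1);
            default:
                throw new IllegalArgumentException("Unknown command: " + line);
        }
    }

    private static void requireArgs(String[] tokens, int expected, String line) {
        if (tokens.length != expected + 1) {
            throw new IllegalArgumentException("Wrong number of arguments: " + line);
        }
    }
}
```

A server loop would call `ServerCommand.parse(...)` on each new value of the “command” pv and dispatch on `kind`; how the IOC is restarted or the network is brought down is outside this sketch.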
CA integration tests
• Client side:
  • Library in pvmanager to make integration tests reasonable to write
  • Two phases
    • Run a series of tasks while recording all events that come out of pvmanager
    • Verify the order and number of events coming from pvmanager
  • If verification fails, you get a table with all the events gathered
public final void run() throws Exception {
    init("typeChange1");
    addReader(PVManager.read(channel("double-to-i32")), TimeDuration.ofHertz(50));
    pause(1000);
    restart("typeChange2");
    pause(2000);
}

public final void verify(Log log) {
    // Check double
    log.matchConnections("double-to-i32", true, false, true);
    log.matchValues("double-to-i32", ALL_EXCEPT_TIME,
            newVDouble(0.0, newAlarm(AlarmSeverity.INVALID, "UDF_ALARM"), newTime(Timestamp.of(631152000, 0), null, false), displayNone()),
            newVDouble(0.0, newAlarm(AlarmSeverity.UNDEFINED, "Disconnected"), newTime(Timestamp.of(631152000, 0), null, false), displayNone()),
            newVInt(0, newAlarm(AlarmSeverity.INVALID, "UDF_ALARM"), newTime(Timestamp.of(631152000, 0), null, false), displayNone()));
}
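The record-then-verify idea behind this test can be illustrated without pvmanager. The sketch below is a simplified stand-in, not the library's `Log` class: `EventLog`, `Event`, `record`, and `dump` are hypothetical names, and only the connection check is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for the event log used by the tests. */
final class EventLog {
    /** One recorded event: pv name, connection state, and the value seen. */
    static final class Event {
        final String pv;
        final boolean connected;
        final Object value;

        Event(String pv, boolean connected, Object value) {
            this.pv = pv;
            this.connected = connected;
            this.value = value;
        }
    }

    private final List<Event> events = new ArrayList<>();

    /** Phase 1: called for every notification while the test tasks run. */
    synchronized void record(String pv, boolean connected, Object value) {
        events.add(new Event(pv, connected, value));
    }

    /** Phase 2: checks the sequence of connection state changes seen for one pv. */
    synchronized void matchConnections(String pv, Boolean... expected) {
        List<Boolean> actual = new ArrayList<>();
        for (Event e : events) {
            if (!e.pv.equals(pv)) {
                continue;
            }
            // Keep only transitions, so repeated values while connected count once
            if (actual.isEmpty() || actual.get(actual.size() - 1) != e.connected) {
                actual.add(e.connected);
            }
        }
        if (!actual.equals(Arrays.asList(expected))) {
            throw new AssertionError("Connections for " + pv + " were " + actual
                    + ", expected " + Arrays.asList(expected) + "\n" + dump());
        }
    }

    /** On failure, prints every gathered event as a small table. */
    private String dump() {
        StringBuilder sb = new StringBuilder("pv | connected | value\n");
        for (Event e : events) {
            sb.append(e.pv).append(" | ").append(e.connected).append(" | ").append(e.value).append('\n');
        }
        return sb.toString();
    }
}
```

Keeping recording and verification separate means the run phase never blocks on an assertion, and a failed check can still print the full event table.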
CA integration tests
• Covered
  • Simple reboot: connect pv, ioc down, ioc up, only 1 monitor open
  • Simple network outage: connect, network down, network up, only 1 monitor open
  • Multiple reboots: connect pv, ioc cycle 10 times
  • Type change: connect double pv, ioc cycle, pv becomes integer
  • Constant pv: connect to double/int/string/enum that do not change
  • Slow changing pv: connect to double updating at 1 Hz (same rate received)
  • Fast changing pv: connect to double updating at 100 Hz (reduced rate received)
  • Alarm changing pv: connect to double updating at 1 Hz for alarm only
  • Write pv: change value for double/int
• Not yet covered
  • Add all remaining types for disconnection test
  • Add all types for type change
  • Add all types for slow changing pvs
  • Add all types for fast changing pvs
  • Add all types for alarm changing pvs
  • Add all types for write pvs
  • Add metadata changes
  • Add access control changes
  • Add multiple readers on a single pv (only 1 monitor open)
  • Add nanosec out of range for time
  • Old RTYP handling
Review BOY connection layer
• Review connection layer in BOY to:
  • Solve concurrency issues
    • Likely cause of missed events
  • Investigate performance problems
    • Background load
    • Slow to open some screens (>5 sec)
  • Find better ways to integrate pvmanager
Review BOY connection layer
• Findings:
  • State of widgets accessed/changed from different threads without synchronization
  • Simple.pv pvmanager implementation
    • Uses 4 different synchronization methods, not well coordinated, some unneeded
      • synchronized, volatile, Atomic variables, thread-safe collections
    • Simple.pv interface forces calls to be split and then re-merged
      • E.g. connection/value are one callback in pvmanager, split into two, later recombined
    • Sets the pvmanager rate throttling at 50Hz and then does an additional throttling at 10Hz
  • Script interface: utility.pv implementation provides all values; pvmanager implementation does not
  • Different widgets with different needs go through the same code path
    • E.g. all widgets create a writer, even if they are monitors. Same code for both widgets that need queuing and widgets that need caching
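The double-throttling finding can be illustrated with a small simulation. This is a simplified sketch, not pvmanager's or BOY's throttling code: `ThrottleDemo`, `Throttle`, and `delivered` are hypothetical names, the throttle simply drops events that arrive too early, and the 100 Hz source rate is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class ThrottleDemo {
    /** Drops any event that arrives sooner than periodMs after the last one it let through. */
    static final class Throttle {
        private final long periodMs;
        private long lastPassedMs = -1; // -1 means nothing has passed yet

        Throttle(long periodMs) {
            this.periodMs = periodMs;
        }

        boolean accept(long timeMs) {
            if (lastPassedMs < 0 || timeMs - lastPassedMs >= periodMs) {
                lastPassedMs = timeMs;
                return true;
            }
            return false;
        }
    }

    /** Feeds one second of events, one every sourcePeriodMs, through the stages in order. */
    static int delivered(long sourcePeriodMs, long... stagePeriodsMs) {
        List<Throttle> stages = new ArrayList<>();
        for (long period : stagePeriodsMs) {
            stages.add(new Throttle(period));
        }
        int count = 0;
        for (long t = 0; t < 1000; t += sourcePeriodMs) {
            boolean passed = true;
            for (Throttle stage : stages) {
                if (!stage.accept(t)) {
                    passed = false;
                    break;
                }
            }
            if (passed) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 100 Hz source through a 50 Hz stage (20 ms) and then a 10 Hz stage (100 ms)
        System.out.println(delivered(10, 20, 100));
        // Same source through the 10 Hz stage alone
        System.out.println(delivered(10, 100));
    }
}
```

In this model both calls deliver the same number of events per second, so for these particular rates the 50 Hz stage adds processing without changing what the widget sees.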
Review BOY connection layer
• Changes on special branch:
  • Connecting BOY directly to pvmanager, skipping utility.pv
  • Making sure all events go on the UI thread
    • May solve missed events, but was never tested
  • Removed unnecessary context switches
  • Using pvmanager's own event throttling, removing EventBundlingThread
  • Added pause/resume when widgets are off screen
  • Script interface too problematic to touch
    • Hope was to re-implement rules on top of pvmanager
    • Can’t be done in general as rule user parameters are basically javascript pieces that are concatenated
    • No formal parsing or rule definition
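The pause/resume change can be pictured with the sketch below. It is an illustration under stated assumptions, not the code on the branch: `PausableUpdater` is a hypothetical class, and it assumes that a hidden widget only needs the latest value once it becomes visible again.

```java
import java.util.function.Consumer;

/** Hypothetical wrapper that holds back updates while a widget is off screen. */
final class PausableUpdater<T> {
    private final Consumer<T> widget;
    private boolean paused;
    private boolean hasPending;
    private T pending;

    PausableUpdater(Consumer<T> widget) {
        this.widget = widget;
    }

    /** Called for every new value; while paused only the latest value is kept. */
    synchronized void update(T value) {
        if (paused) {
            pending = value;
            hasPending = true;
        } else {
            widget.accept(value);
        }
    }

    /** On resume, delivers the latest value that arrived while paused, if any. */
    synchronized void setPaused(boolean newPaused) {
        paused = newPaused;
        if (!paused && hasPending) {
            widget.accept(pending);
            pending = null;
            hasPending = false;
        }
    }
}
```

In an SWT application the `Consumer` would typically hand the value to the UI thread (for example through `Display.asyncExec`) rather than touch the widget directly; that step is left out here to keep the sketch self-contained.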
Review BOY connection layer
• Background load
  • Sources of background load are different in different environments
  • On my development environment (Windows/Debian/Scientific Linux) the main source of load is SWT. Pause/Resume makes 64% load go to 4% when the window is hidden.
  • On one BNL production machine, the main source of load seems to be the synchronization used in the thread pool used by pvmanager during the active scanning. Pause/Resume has no significant benefit.
  • On another BNL production machine, the main load was SWT, but Pause/Resume had no effect.
  • Not OS dependent. Maybe hardware or hardware + OS combinations.
• Slow load
  • Traced back to use of rules. Each rule is a script. Each script starts a scripting environment. Each scripting environment seems to load a lot of classes (interaction between classloaders and OSGi?). Loading of a screen with a large set of rules is stuck loading/unloading classes for several seconds.
Review BOY connection layer
• Takeaway:
  • Work that needs to be done in BOY
    • Finish proper pvmanager integration
    • Properly divide widget state (should all be in the model) so that real-time only updates that
    • Don’t just have one connection logic for all widget types
    • Understand how to implement rules on top of pvmanager (re-implement or migrate?)
  • Whoever does this work will not be able to do the testing himself; needs prompt support and feedback
    • Performance profile is significantly different
    • Concurrency issues are difficult to replicate
Review BOY connection layer
• Takeaway:
  • For pvmanager
    • Wrote 100 times on the blackboard: “My development environment is not a good approximation of all production environments”
    • Will prepare a performance benchmarking suite to gather data so I can keep track
    • Passive scanning got to the “toppish” part of the list. Also considering implementing a different ExecutorService