Are You Sure What Failures Your Tests Produce?
Lee White
Results on Testing GUI Systems
• CIS (Complete Interaction Sequences) approach for testing GUI systems, applied to four large commercial GUI systems
• Testing GUI systems in different environments: operating system, CPU speed, memory
• Modified CIS approach applied to regression-test two versions of a large commercial GUI system
Three Objectives for this Talk
• The use of memory tools during GUI testing uncovered many more defects, revealing observability problems
• In GUI systems, defects manifested as different failures (or not at all) in different environments
• In GUI systems, many more behaviors reside in the code than the designer intended
Complete Interaction Sequence (CIS)
• Identify all responsibilities (GUI activities that produce an observable effect on the surrounding user environment)
• CIS: operations on a sequence of GUI objects that collectively implement a responsibility
• Example (assuming a file is open): File_Menu -> Print -> Print_Setup_Selection -> Confirm_Print
FSM (Finite State Model) for a CIS
• Design an FSM to model each CIS
• Creating the FSM model requires experience
• To test for all effects in a GUI, all paths within the CIS must be executed
• Loops may be repeated, but not consecutively
How to Test a CIS?
• Design tests: an FSM model based on the design of the CIS is used to generate tests
• Implementation tests: in the actual GUI, check all CIS object selections and follow all transitions to other GUI objects within the CIS; add these transitions to the FSM model, along with any new inputs or outputs to/from the CIS, and generate tests from the extended model
Figure 2. Design tests for a strongly connected component with states A, B, C, D:
(I1, B, C, D, A, B, C, O1)
(I2, A, B, C, D, A, B, C, O1)
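These design tests can be generated mechanically. Below is a minimal Python sketch (an illustration, not the authors' tool) that encodes the Figure 2 FSM as a transition map and enumerates paths from each input to an output under the rule that loops may repeat, but not consecutively. The cap of two uses per transition is an assumed bound, and the enumeration also emits shorter, loop-free paths that the slides' test selection omits.

```python
from collections import Counter

# Design-time FSM for the CIS of Figure 2 (states and edges read off the figure).
FSM = {
    "I1": ["B"], "I2": ["A"],
    "A": ["B"], "B": ["C"], "C": ["D", "O1"], "D": ["A"],
    "O1": [],
}

def repeats_loop(path, nxt):
    """True if stepping to `nxt` would traverse the same loop twice in a row."""
    if nxt not in path:
        return False
    i = len(path) - 1 - path[::-1].index(nxt)  # index of the last visit to `nxt`
    loop = path[i:] + [nxt]                    # the loop about to be closed
    k = len(loop) - 1                          # its length in edges
    return i >= k and path[i - k:i + 1] == loop

def paths(state, path, used):
    """Enumerate input-to-output paths; each edge used at most twice."""
    if not FSM[state]:                         # output reached: one test path
        yield tuple(path)
        return
    for nxt in FSM[state]:
        edge = (state, nxt)
        if used[edge] >= 2 or repeats_loop(path, nxt):
            continue                           # loops may repeat, but not consecutively
        used[edge] += 1
        yield from paths(nxt, path + [nxt], used)
        used[edge] -= 1

for start in ("I1", "I2"):
    for p in paths(start, [start], Counter()):
        print(",".join(p))
```

Running this yields both design tests of Figure 2, (I1,B,C,D,A,B,C,O1) and (I2,A,B,C,D,A,B,C,O1), alongside the loop-free variants.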
Figure 3. Implementation tests for a strongly connected component:
(I1, B, C, D, B, C, D, A, B, C, D, A*, B, C, O1)
(I1, B, C, D, B, C, D, A, B, C, D, A*, B, C, D, O2)
(I2, A, B, C, D, B, C, D, A, B, C, D, A*, B, C, O1)
(I2, A, B, C, D, B, C, D, A, B, C, D, A*, B, C, D, O2)
(I3, D, A, B, C, D, B, C, D, A*, B, C, O1)
(I3, D, A, B, C, D, B, C, D, A*, B, C, D, O2)
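Comparing the two figures shows which behavior exists only in the implementation. A small sketch of that transition diff (state names as above; the extra edges are read off the Figure 3 sequences) is the set of transitions that implementation tests must additionally cover:

```python
# Design transitions from Figure 2 versus transitions observed in the
# actual GUI (the Figure 3 sequences add D->B, D->O2, and a new input I3->D).
DESIGN = {("I1", "B"), ("I2", "A"), ("A", "B"), ("B", "C"),
          ("C", "D"), ("C", "O1"), ("D", "A")}
IMPLEMENTATION = DESIGN | {("D", "B"), ("D", "O2"), ("I3", "D")}

# Transitions the designer never specified -- the usual home of the
# faults discussed under Issue #3 at the end of this talk.
print(sorted(IMPLEMENTATION - DESIGN))
```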
Memory Tools
• Memory tools monitor memory, CPU, and register changes
• They detected failures that would otherwise have eluded detection, accounting for 34% of the faults found in these empirical studies
• Two such tools were used: Memory Doctor and Win Gauge from Hurricane Systems
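The slides do not show how these tools were driven, so the following is a rough stand-in for the idea, not the actual tools: sample a GUI process's memory around each test step so that failures with no visible symptom still get flagged. It assumes the cross-platform psutil package; the process name and the 10 MB growth threshold are illustrative.

```python
import psutil

def rss_of(name):
    """Resident set size (bytes) of the first process whose name matches."""
    for proc in psutil.process_iter(["name", "memory_info"]):
        if proc.info["name"] == name:
            return proc.info["memory_info"].rss
    raise LookupError(f"no process named {name!r}")

def run_step_with_memory_check(name, step, threshold=10 * 2**20):
    """Execute one GUI test step and flag silent memory growth."""
    before = rss_of(name)
    step()  # drive the GUI: a click, menu selection, etc.
    growth = rss_of(name) - before
    if growth > threshold:
        print(f"possible hidden failure: RSS of {name} grew by {growth} bytes")
```

A harness would wrap every CIS step this way; the 34% figure above indicates how many failures surfaced only through such monitoring.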
Failures of GUI Tests on Different Platforms
Lee White and Baowei Fei
EECS Department, Case Western Reserve University
Environment Effects Studied
• Environment effects: operating system, CPU speed, memory changes
• Same software tested: RealOne Player, with 950 implementation tests
• For the OS comparison, the same computer was used, running Windows 98 and Windows 2000
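To make "different environments" concrete, here is an illustrative sketch of the resulting test matrix. The two levels for CPU speed and memory are assumptions (the slides do not give the study's settings), and the runner is a placeholder:

```python
from itertools import product

# Cross the environment factors named on the slide; levels for CPU speed
# and memory are illustrative, not the study's actual configurations.
ENVIRONMENTS = product(
    ["Windows 98", "Windows 2000"],   # same machine, two OS installs
    ["slow CPU", "fast CPU"],
    ["less RAM", "more RAM"],
)

for os_name, cpu, mem in ENVIRONMENTS:
    # placeholder for rerunning the 950 RealOne Player implementation tests
    print(f"rerun suite under {os_name} / {cpu} / {mem}")
```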
Regression Testing GUI Systems
A Case Study to Show the Operation of the GUI Firewall for Regression Testing
GUI Features
• Feature: a set of closely related CISs with related responsibilities
• New features: features in a new version that were not in previous versions
• Totally modified features: features so drastically changed in a new version that the change cannot be modeled as an incremental change; the simple firewall cannot be used, and such features must be retested from scratch (a sketch of this classification follows)
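A hedged sketch of the classification logic described above, as it feeds regression planning; the feature names, status labels, and function interface are illustrative assumptions, since the slides do not specify one:

```python
def plan_regression(features):
    """features: dict mapping feature name -> one of
    'unchanged' | 'modified' | 'totally_modified' | 'new'."""
    plan = {}
    for name, status in features.items():
        if status == "unchanged":
            plan[name] = "no retest needed"
        elif status == "modified":
            plan[name] = "rerun firewall-selected CIS tests"
        elif status == "totally_modified":
            plan[name] = "retest from scratch (firewall not applicable)"
        else:  # new feature
            plan[name] = "design and run new tests"
    return plan

print(plan_regression({"Play": "unchanged", "Channels": "modified",
                       "Visualizations": "totally_modified",
                       "CD burning": "new"}))
```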
Software Under Test
• Two versions of RealPlayer (RP) and RealJukeBox (RJB): RP7/RJB1 and RP8/RJB2
• RP7/RJB1 cover 13 features; RP8/RJB2 cover 16 features

  Version  Objects  CISs  Design tests  Impl. tests
  RP7      208      67    67            137
  RJB1     117      30    31            79
  RP8      246      80    92            176
  RJB2     182      66    127           310
Figure 4. Distribution of faults obtained by testers T1 and T2. Of the 16 features in RP8/RJB2, 8 were carried over and retested through the firewall, 5 totally modified features were tested from scratch by T2, and 3 new features were tested by T1; the figure breaks down the fault counts found per category alongside the 53 faults in the original RP7/RJB1 system.
Failures Identified in Version 1 and Version 2
• We could match identical failures between Version 1 and Version 2
• This left 9 failures in Version 2 and 7 failures in Version 1 unmatched
• The challenge was to show which pairs of failures might be due to the same fault
Different Failures in Versions V1, V2 for the Same Fault
• V1: viewing a track in RJB freezes if an album cover is included
• V2: viewing a track in RJB loses the album cover
• Environment problem: graphical settings from V2 were needed to test V1
Different Failures (cont.)
• V1: Add/Remove channels in RP does not work when RJB is also running
• V2: Add/Remove channels loses previously added items
• Environment problem: V1 used the personal browser, but V2 uses a special RJB browser
Different Failures (cont.)
• V1: no failure present
• V2: in RP, pressing Forward before playing a stream file crashes the system
• Environment problem: the Forward button can only be pressed during play in V1, but in V2 it can be selected at any time; regression testing now finds this fault
Conclusions for Issue #1
• The use of memory tools exposed extensive observability problems in testing GUI systems
• In testing four commercial GUI systems, 34% of faults would have been missed without these tools
• In the regression testing studies, 85% and 90% would have been missed
• Implication: GUI testing can miss defects or surprises (or produce only minor failures)
Conclusions for Issue #2
• Defects manifested as different failures (or not at all) in different environments
• Discussed in the regression testing study
• Also observed in the testing case studies, as well as when testing in different hardware/software environments
Implication for Issue #2
• When testing, you may think you understand what failures certain tests and defects will produce for given software, but you do not know what failures (if any) the user will see in another environment.
Conclusions for Issue #3
• The differences between design and implementation tests are due to non-design transitions in the actual FSM for each GUI CIS
• Observed in both case studies
• Implication: faults are commonly associated with these unknown FSM transitions, not with the design itself
Question for the Audience
• Are these same three effects valid, to this extent, for software other than GUI systems?
• If so, why haven't we seen many reports and papers in the software literature documenting this?