Skoll: A System for Distributed Continuous Quality Assurance
Atif Memon & Adam Porter
University of Maryland
{atif,aporter}@cs.umd.edu
Quality Assurance for Large-Scale Systems
• Modern systems are increasingly complex
  • Run on numerous platform, compiler & library combinations
  • Have 10s, 100s, even 1000s of configuration options
  • Are evolved incrementally by geographically distributed teams
  • Run atop other frequently changing systems
  • Have multi-faceted quality objectives
• How do you QA systems like this?
Distributed Continuous Quality Assurance
• QA processes conducted around the world, around the clock, on powerful virtual computing grids
  • Grids can be made up of end-user machines, project-wide resources or dedicated computing clusters
• General approach
  • Divide QA processes into numerous tasks
  • Intelligently distribute tasks to clients, who then execute them
  • Merge and analyze incremental results to efficiently complete the desired QA process
• Expected benefits
  • Massive parallelization allows more, better & faster QA
  • Improved access to resources & environments not readily found in-house
  • Carefully coordinated QA efforts enable more sophisticated analyses
Group Collaborators
• Doug Schmidt & Andy Gokhale
• Alex Orso
• Myra Cohen
• Murali Haran, Alan Karr, Mike Last & Ashish Sanil
• Sandro Fouché, Alan Sussman, Cemal Yilmaz (now at IBM TJ Watson) & Il-Chul Yoon
Skoll DCQA Infrastructure & Approach
• 1. Model: capture the QA space (options, settings & constraints) on the server
• 2. Reduce Model: shrink the modeled space to a tractable test schedule
• 3. Distribution: send test requests to client resources, which execute them
• 4. Feedback: clients return test results to the server for merging & analysis
• 5. Steering: use incremental results to adapt the allocation of remaining tasks
See: A. Porter, C. Yilmaz, A. Memon, A. Nagarajan, D. C. Schmidt, and B. Natarajan, Skoll: A Process and Infrastructure for Distributed Continuous Quality Assurance. IEEE Transactions on Software Engineering, 33(8), Aug. 2007, pp. 510-525.
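To make the five activities concrete, here is a minimal server-side loop sketched in Python. It is an illustration only: enumerate_configs, run_on_client and the similarity-based steering heuristic are hypothetical stand-ins, not Skoll's actual API.

```python
import random
from itertools import product

def enumerate_configs(options):
    """All assignments of settings to options; options maps name -> settings."""
    names = sorted(options)
    return [dict(zip(names, vals)) for vals in product(*(options[n] for n in names))]

def overlap(a, b):
    """Number of options on which two configurations agree."""
    return sum(a[k] == b[k] for k in a)

def skoll_loop(options, constraint, run_on_client, budget=100):
    # 1. Model: the QA space is every constraint-satisfying configuration.
    space = [cfg for cfg in enumerate_configs(options) if constraint(cfg)]
    # 2. Reduce: sample the space down to what the budget allows.
    schedule = random.sample(space, min(budget, len(space)))
    results, failures = [], []
    while schedule:
        # 3. Distribution: hand the next test request to a client, which
        #    builds and tests the system in that configuration.
        cfg = schedule.pop()
        outcome = run_on_client(cfg)
        # 4. Feedback: merge the incremental result.
        results.append((cfg, outcome))
        # 5. Steering: one of many possible policies -- explore the
        #    configurations least similar to known failures first.
        if outcome == "FAIL":
            failures.append(cfg)
            schedule.sort(key=lambda c: max(overlap(c, f) for f in failures),
                          reverse=True)
    return results
```

For instance, skoll_loop({"AMI": [0, 1], "CORBA_MESSAGING": [0, 1]}, lambda c: True, my_runner) would walk a four-configuration space, steering away from failure-like configurations as results arrive.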
The ACE+TAO+CIAO (ATC) System
• ATC characteristics
  • 2M+ line open-source CORBA implementation
  • Maintained by 40+ geographically distributed developers
  • 20,000+ users worldwide
  • Product-line architecture with 500+ configuration options
  • Runs on dozens of OS and compiler combinations
  • Continuously evolving (200+ CVS commits per week)
• Quality concerns include correctness, QoS, footprint, compilation time & more
Define QA Space
Nearest Neighbor Search
Fault Characterization
• We used machine learning techniques (classification trees) to model the option & setting patterns that predict test failures
• [Figure: example classification tree branching on the CORBA_MESSAGING, AMI, AMI_POLLER & AMI_CALLBACK options, with leaves OK, ERR-1, ERR-2 & ERR-3]
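As a hedged illustration of this step (not the tooling the slides used), such a tree could be fit with scikit-learn. The option names come from the example figure; the records below are invented:

```python
# Sketch: fit a classification tree over configuration options to
# characterize which option patterns predict test failures.
# Illustrative data only, not real ATC results.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

records = pd.DataFrame(
    [   # CORBA_MESSAGING, AMI, AMI_POLLER, AMI_CALLBACK, outcome
        (1, 1, 1, 1, "OK"),
        (1, 1, 0, 1, "OK"),
        (1, 1, 0, 0, "ERR-2"),
        (1, 0, 0, 0, "ERR-3"),
        (0, 0, 0, 0, "ERR-1"),
        (0, 1, 0, 0, "ERR-1"),
    ],
    columns=["CORBA_MESSAGING", "AMI", "AMI_POLLER", "AMI_CALLBACK", "outcome"],
)
X, y = records.drop(columns="outcome"), records["outcome"]
tree = DecisionTreeClassifier().fit(X, y)
# Print the learned option/setting rules that predict each failure class.
print(export_text(tree, feature_names=list(X.columns)))
```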
Applications & Feasibility Studies
• Compatibility testing of component-based systems
• Configuration-level fault characterization
• Test case generation & input space exploration
Compatibility Testing of Component-Based Systems
• Goal
  • Given a component-based system, identify the components & specific versions that fail to build
• Solution approach
  • Sample the configuration space, efficiently test the sample & identify subspaces in which compilation & installation fail
  • Initial focus on building & installing components; later work will add functional and performance testing
See: I. Yoon, A. Sussman, A. Memon & A. Porter, Direct-Dependency-based Software Compatibility Testing. International Conference on Automated Software Engineering, Nov. 2007 (to appear).
The InterComm (IC) Framework
• Middleware for coupling large scientific simulations
• Built from up to 14 other components (e.g., PVM, MPI, GCC, OS)
• Each component can have several actively maintained versions
• There are complex constraints between components, e.g.,
  • Requires GCC version 2.96 or later
  • When configured with multiple GNU compilers, all must have the same version number
  • When configured with multiple components that use MPI, all must use the same implementation & version
• http://www.cs.umd.edu/projects/hpsl/chaos/ResearchAreas/ic
• Developers need help to
  • Identify working/broken configurations
  • Broaden the working set (to increase the potential user base)
  • Rationally manage support activities
Annotated Component Dependency Graph
• ACDG = (CDG, Ann)
  • CDG: a DAG capturing inter-component dependencies
  • Ann: component versions & constraints
• Constraints apply to each configuration, e.g.,
  • ver(gf) = x ⇒ ver(gcr) = x
  • ver(gf) = 4.1.1 ⇒ ver(gmp) ≥ 4.0
• Configurations can be generated from the ACDG
  • 3552 total configurations; building all of them takes up to ~10,700 CPU hours
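A minimal sketch of how such annotations might be checked, assuming the two example constraints above; the component names match the slide but the version sets are invented:

```python
# Sketch: filter version assignments against ACDG constraints.
from itertools import product

versions = {
    "gf":  ["4.0.3", "4.1.1"],   # GNU Fortran
    "gcr": ["4.0.3", "4.1.1"],   # GNU C compiler
    "gmp": ["3.1.1", "4.2.1"],
}

def satisfies(cfg):
    # ver(gf) = x  =>  ver(gcr) = x  (GNU compiler versions must match)
    if cfg["gf"] != cfg["gcr"]:
        return False
    # ver(gf) = 4.1.1  =>  ver(gmp) >= 4.0
    # (string comparison suffices for these single-digit major versions)
    if cfg["gf"] == "4.1.1" and cfg["gmp"] < "4.0":
        return False
    return True

names = list(versions)
valid = [dict(zip(names, vs)) for vs in product(*versions.values())
         if satisfies(dict(zip(names, vs)))]
print(len(valid), "valid configurations")   # -> 3 of the 8 candidates
```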
Improving Test Execution
• [Figure: prefix tree of component builds with nodes such as fc 4.0, gcr v4.0.3, gmp v4.2.1 & pf/6.2]
• Configurations often share common build subsequences; this build effort should be reusable across configurations
• Combine all configurations into a data structure called a prefix tree
• Execute the implied test plan across the grid by (1) assigning subpaths to clients and (2) building each subconfiguration in a VM & caching the VMs to enable reuse
• Example: with 8 machines, each able to cache up to 8 VMs, exhaustive testing takes up to 355 hours
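A toy sketch of the prefix-tree idea (illustrative component names, not InterComm's real build sequences): shared build prefixes merge into one subtree, so with perfect VM caching each tree node is built only once.

```python
# Sketch: fold build sequences into a prefix tree so shared prefixes
# are built once and reused across configurations.
def build_prefix_tree(configs):
    """configs: iterable of ordered component build sequences."""
    root = {}
    for cfg in configs:
        node = root
        for step in cfg:
            node = node.setdefault(step, {})   # shared prefixes merge here
    return root

def count_builds(tree):
    """Builds needed with perfect caching = number of tree nodes."""
    return sum(1 + count_builds(child) for child in tree.values())

configs = [
    ("gcc-4.1", "pvm-3.4", "ic-1.5"),
    ("gcc-4.1", "pvm-3.4", "ic-1.6"),   # reuses the gcc-4.1/pvm-3.4 prefix
    ("gcc-4.1", "mpich-1.2", "ic-1.6"),
]
tree = build_prefix_tree(configs)
print(count_builds(tree), "component builds instead of",
      sum(len(c) for c in configs))     # -> 6 instead of 9
```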
Direct-Dependency (DD) Coverage
• Hypothesis: a component's build process is most likely to be affected by the components on which it directly depends
  • A directly depends on B iff there is a path (in the CDG) from A to B containing no intermediate component nodes
• Sampling approach
  • Identify all DDs between every pair of components
  • Identify all valid instantiations of these DDs (version combinations that violate no constraints)
  • Select a (small) set of configurations that cover all valid instantiations of the DDs
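A short sketch of the direct-dependency definition on a toy graph (node names invented): traversal stops at the first component node on each path, so components reachable only through other components are not direct dependencies.

```python
# Sketch: compute direct dependencies in a CDG where component nodes may
# be separated by non-component nodes (e.g., interface nodes).
def direct_deps(graph, components, start):
    """graph: node -> successors. Returns components directly below start."""
    found, stack, seen = set(), list(graph.get(start, ())), set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in components:
            found.add(node)          # stop here: deeper comps are indirect
        else:
            stack.extend(graph.get(node, ()))
    return found

graph = {"IC": ["iface1"], "iface1": ["PVM", "MPI"],
         "PVM": ["GCC"], "MPI": ["GCC"]}
components = {"IC", "PVM", "MPI", "GCC"}
print(direct_deps(graph, components, "IC"))   # {'PVM', 'MPI'}, not GCC
```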
Executing the DD Coverage Test Suite
• The DD test suite is much smaller than the exhaustive one
  • 211 configurations with 649 components vs. 3552 configurations with 9919 components
• For IC, no loss of test effectiveness (the same build failures were exposed)
• Speedups achieved using 8 machines with an 8-VM cache
  • Actual case: 2.54 (18 vs. 43 hrs)
  • Best case: 14.69 (52 vs. 355 hrs)
Summary
• Infrastructure in place & working
  • Complete client/server implementation using VMware
  • Simulator for large-scale tests on limited resources
• Initial results promising, but lots of work remains
• Ongoing activities
  • Alternative algorithms & test execution policies
  • More theoretical study of sampling & test execution approaches
  • Apply to more software systems
Configuration-Level Fault Characterization
• Goal
  • Help developers localize configuration-related faults
• Current solution approach
  • Use covering arrays to sample the configuration space, testing for subspaces in which (1) compilation fails or (2) regression tests fail
  • Build models that characterize the configuration options and specific settings that define the failing subspace
See: C. Yilmaz, M. Cohen, and A. Porter, Covering Arrays for Efficient Fault Characterization in Complex Configuration Spaces. ISSTA '04; IEEE Transactions on Software Engineering, 32(1).
Covering Arrays
• Compute the test schedule from t-way covering arrays
  • a set of configurations in which all ordered t-tuples of option settings appear at least once
• 2-way covering array example: see the sketch below
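The slide's example figure did not survive extraction, so here is a stand-in illustration: for three binary options, four configurations suffice to cover every pair of settings in every pair of columns, and a small checker confirms it.

```python
# Sketch: a 2-way covering array for three binary options, plus a checker
# that verifies every t-tuple of settings appears in some row.
from itertools import combinations, product

ca = [  # CA(4; 2, 3, 2): 4 rows cover all 2-way setting combinations
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

def is_covering_array(rows, t, settings=(0, 1)):
    k = len(rows[0])
    for cols in combinations(range(k), t):
        needed = set(product(settings, repeat=t))
        seen = {tuple(r[c] for c in cols) for r in rows}
        if needed - seen:
            return False
    return True

print(is_covering_array(ca, 2))   # True: all 2-way combinations covered
```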
Limitations
• Must choose the covering array strength (t) before computing it
  • No way to know, a priori, what the right value is
  • Our experience suggests failure patterns can change over time
• Choose too high:
  • Run more tests than necessary
  • Testing might not finish before the next release
  • A non-uniform sample negatively affects classification performance
• Choose too low:
  • A non-uniform sample negatively affects classification techniques
  • Must repeat the process at a higher strength
Incremental Covering Arrays
• Start with traditional covering array(s) of low strength (usually 2)
• Execute the test schedule & classify observed failures
• If resources allow or classification performance requires
  • Increment the strength
  • Build a new covering array using previously run array(s) as seeds
See: S. Fouché, M. Cohen, and A. Porter. Towards Incremental Adaptive Covering Arrays. ESEC/FSE 2007 (to appear).
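A toy sketch of the seeding idea (an AETG-style greedy generator, not the paper's algorithm): rows from an already-executed lower-strength array count toward coverage, so moving to strength t+1 only adds the configurations still needed.

```python
# Sketch: greedy covering array construction with seed rows.
import random
from itertools import combinations, product

def tuples_of(cfg, t):
    """All t-way (columns, settings) tuples covered by one configuration."""
    return {(cols, tuple(cfg[c] for c in cols))
            for cols in combinations(range(len(cfg)), t)}

def greedy_ca(options, t, seed=(), tries=50):
    """Rows in `seed` count toward coverage, so a lower-strength array
    can be reused when incrementing to strength t."""
    remaining = {(cols, vals)
                 for cols in combinations(range(len(options)), t)
                 for vals in product(*(options[c] for c in cols))}
    rows = [tuple(r) for r in seed]
    for r in rows:
        remaining -= tuples_of(r, t)
    while remaining:
        # Embed one uncovered tuple so every new row makes progress,
        # then keep the random completion that covers the most tuples.
        cols, vals = next(iter(remaining))
        best, gain = None, -1
        for _ in range(tries):
            cand = [random.choice(o) for o in options]
            for c, v in zip(cols, vals):
                cand[c] = v
            cand = tuple(cand)
            g = len(tuples_of(cand, t) & remaining)
            if g > gain:
                best, gain = cand, g
        rows.append(best)
        remaining -= tuples_of(best, t)
    return rows

opts = [(0, 1)] * 5                    # five binary options
ca2 = greedy_ca(opts, 2)               # strength 2
ca3 = greedy_ca(opts, 3, seed=ca2)     # strength 3, seeded with ca2
print(len(ca2), len(ca3))
```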
Incremental Covering Arrays (cont.)
• Multiple CAs at each level of t
  • Use t₁ as a seed for the first (t+1)-way array, (t+1)₁
  • To create the i-th t-way array, tᵢ, create a seed of size |(t−1)₁| using non-seeded configurations from (t−1)ᵢ
  • If |seed| < |(t−1)₁|, complete the seed with configurations from (t+1)₁
MySQL Case Study
• Project background
  • Widely used, 2M+ line open-source database project
  • Continuously evolving & maintained by geographically distributed developers
  • Dozens of configuration options; runs on dozens of OS/compiler combinations
• Case study using release 5.0.24
  • Used 13 configuration options with 2-12 settings each (> 110K unique configurations)
  • 460 tests per configuration across a grid of 50 machines
  • Executed ~50M tests total using ~25 CPU-years
Results
• Built 3 traditional and 3 incremental covering arrays, for 2 ≤ t ≤ 4
  • Traditional sizes: 108, 324, 870
  • Incremental sizes: 113, 336 (223 new), 932 (596 new)
• The incremental approach exposed & classified the same failures as the traditional approach
• Costs depend on t & failure patterns
  • Failures at level t: Inc > Trad (4-9%)
  • Failures at level < t: Inc < Trad (65-87%)
  • Failures at level > t: Inc < Trad (28-38%)
Summary
• New application driving infrastructure improvements
• Initial results encouraging
  • Applied the process to a configuration space with over 110K configurations
  • Found many test failures corresponding to real bugs
  • The incremental approach is more flexible than the traditional one; it appears to offer substantial savings in the best case while incurring minimal cost in the worst case
• Ongoing extensions
  • MySQL continuous build process
  • Community involvement starting
  • Want to volunteer? Go to http://www.cs.umd.edu
GUI Test Cases: Executable by a "Robot"
• Manually developed test cases (JFCUnit, capture/replay) are tedious to create
• They cover only "common" sequences; other interactions grow exponentially with sequence length, so testing them by hand is a bad idea
• Model-based techniques: GUITAR (guitar.cs.umd.edu)
Modeling the Event-Interaction Space
• Event-flow graph (EFG)
  • Nodes: all GUI events (with starting events marked)
  • Edges: the "follows" relationship
• Obtained automatically via reverse engineering
• Test case generation: cover all edges
See: Atif M. Memon and Qing Xie, Studying the Fault-Detection Effectiveness of GUI Test Cases for Rapidly Evolving Software. IEEE Transactions on Software Engineering, 31(10), 2005, pp. 884-896.
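A minimal sketch of edge-coverage test generation over an EFG (the events and edges below are invented): for each "follows" edge, reach the source event from a starting event, then fire the target.

```python
# Sketch: generate one test case (event sequence) per EFG edge.
from collections import deque

def shortest_path(efg, starts, target):
    """BFS from any starting event to `target`; returns the event sequence."""
    queue = deque((s, [s]) for s in starts)
    seen = set(starts)
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for nxt in efg.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

def edge_covering_tests(efg, starts):
    """One test case per 'follows' edge: reach the source, fire the target."""
    tests = []
    for src, successors in efg.items():
        prefix = shortest_path(efg, starts, src)
        if prefix is None:
            continue                     # unreachable from any starting event
        for dst in successors:
            tests.append(prefix + [dst])
    return tests

efg = {"File": ["Open", "Save"], "Open": ["OK", "Cancel"], "Save": ["OK"]}
for t in edge_covering_tests(efg, starts=["File"]):
    print(t)
```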
Let's See How It Works!
• Point to the CVS head, push the button, read the error report
• What happens
  • Gets code from the CVS head
  • Builds it
  • Reverse engineers the event-flow graph
  • Generates test cases to cover all the edges (2-way covering)
  • Runs them
• Four applications from SourceForge.net
Digging Deeper!
• Intuition
  • Some events don't interact (e.g., Save, Find)
  • Others do (e.g., Copy, Paste)
• Key idea
  • Identify interacting events
  • Mark the corresponding EFG edges (annotated graph)
  • Generate 3-way, 4-way, … covering test cases for interacting events only
Identifying Interacting Events
• High-level overview of the approach
  • Observe how events execute on the GUI; events interact if they influence one another's execution
  • Execute event e2; then execute the event sequence <e1, e2>
  • Did e1 influence e2's execution? If YES, they must be tested further; annotate the <e1, e2> edge in the graph
• Use feedback
  • Generate a seed suite of 2-way covering test cases
  • Run the test cases, collecting GUI run-time states as feedback
  • Analyze the feedback to obtain interacting event sets
  • Generate new 3-way, 4-way, … covering test cases
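A hedged sketch of the influence check, assuming a hypothetical run(events) that executes an event sequence from the initial state and returns the resulting GUI state as a dict of (widget, property) pairs; this is a stand-in for a GUITAR-style runner, not its real API.

```python
def delta(before, after):
    """The (widget, property) entries an event sequence changed."""
    return {k: v for k, v in after.items() if before.get(k) != v}

def annotate_interactions(efg, run, initial_state):
    """Mark EFG edges <e1, e2> where running e1 first changes e2's effect."""
    annotated = set()
    for e1, successors in efg.items():
        after_e1 = run([e1])
        for e2 in successors:
            solo_effect = delta(initial_state, run([e2]))
            paired_effect = delta(after_e1, run([e1, e2]))
            if solo_effect != paired_effect:   # e1 influenced e2's execution
                annotated.add((e1, e2))
    return annotated
```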
Did We Do Better?
• Compare the feedback-based approach to 2-way coverage
Summary
• Manually developed test cases (JFCUnit, capture/replay) can be deployed and executed by a "robot"
  • But there are too many interactions to test: exponential growth
• The GUITAR approach
  • Develop a model of all possible interactions
  • Use abstraction techniques to "sample" the model
  • Develop adequacy criteria
  • Generate an initial test suite, then iterate an "execute tests, collect feedback, annotate model, generate tests" cycle
• Feasibility study & results
Future Work
• Need volunteers for the MySQL Build Farm Project
  • http://skoll.cs.umd.edu
• Looking for more example systems (help!)
• Continue improving the Skoll system
  • New problem classes
  • Performance and robustness optimization
• Improved use of test data
  • Test case ROI analysis
  • Configuration advice
• Cost-aware testing (e.g., minimize power, network, disk)
• Use source code analysis to further reduce state spaces
• Extend test generation technology beyond GUI applications
  • QA for distributed systems