Cross-Entropy Based Testing
Hana Chockler, Benny Godlin, Eitan Farchi, Sergey Novikov
IBM Research, Haifa, Israel
The problem: How to test for rare problems in large programs?
Testing involves running the program many times, hoping to find the problem.
• If a problem appears only in a small fraction of the runs, it is unlikely to be found during random executions: searching for a needle in a haystack.
The main idea: Use the cross-entropy method! The cross-entropy method is a widely used approach to estimating probabilities of rare events (Rubinstein).
The cross-entropy method: motivation
• The problem:
• There is a probability space S with probability distribution f and a performance function P defined on it.
• A rare event e is the event that P(s) > r for some s ∈ S and some threshold r, and this happens very rarely under f.
• How can we estimate the probability of e?
(Figure: the space S, with a point s marking an input in which the rare event e occurs.)
The naïve idea
Generate a big enough sample (a huge sample from the probability space) and compute the probability of the rare event from the inputs in the sample.
This won't work: for very rare events, even a very large sample does not reflect the probability correctly (see the sketch below).
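To see why the naïve approach fails, here is a minimal Monte Carlo sketch (ours, not from the slides): for an event of probability 10^-9, even a million samples almost surely contain no occurrence, so the naïve estimate is simply 0.

```java
import java.util.Random;

// Naive Monte Carlo estimate of a rare event (illustrative values):
// the sample almost surely contains no hit, so the estimate is 0.0.
public final class NaiveEstimate {
    public static void main(String[] args) {
        Random rnd = new Random();
        double pRare = 1e-9;            // true probability of the event
        int n = 1_000_000, hits = 0;
        for (int i = 0; i < n; i++) {
            if (rnd.nextDouble() < pRare) hits++;
        }
        System.out.println("estimate: " + (double) hits / n); // almost surely 0.0
    }
}
```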
The cross-entropy method
Wishful thinking: if we had a distribution that gives the good inputs probability 1, we would be all set. But we don't have such a distribution w.r.t. the performance function.
• So we try to approximate it in iterations, coming a little closer every time (a toy sketch follows this slide):
• In each iteration, we generate a sample of some (large) size.
• We update the parameters (the probability distribution) so that we get a better sample in the next iteration.
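A toy sketch of the iteration scheme (ours, not from the slides): the space is bit vectors, the performance function counts ones, and each iteration re-estimates the per-bit probabilities from the best part of the sample. All names and parameter values are illustrative.

```java
import java.util.Arrays;
import java.util.Random;

// Toy cross-entropy iteration. The space is bit vectors of length n;
// the performance function counts ones. Each iteration draws a sample,
// keeps the best part (the elite), and re-estimates the per-bit
// probabilities from the elite, so the next sample is better. Real
// applications smooth the update (the "smoothing parameter" mentioned
// later) so probabilities do not freeze at 0 or 1.
public final class CeToy {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 20, sampleSize = 200, elite = 20;
        double[] prob = new double[n];
        Arrays.fill(prob, 0.5);                       // initial: uniform

        for (int iter = 0; iter < 30; iter++) {
            boolean[][] sample = new boolean[sampleSize][n];
            int[] score = new int[sampleSize];
            Integer[] idx = new Integer[sampleSize];
            for (int s = 0; s < sampleSize; s++) {
                idx[s] = s;
                for (int i = 0; i < n; i++) {
                    sample[s][i] = rnd.nextDouble() < prob[i];
                    if (sample[s][i]) score[s]++;
                }
            }
            Arrays.sort(idx, (a, b) -> score[b] - score[a]); // best first
            for (int i = 0; i < n; i++) {             // re-estimate from elite
                int ones = 0;
                for (int s = 0; s < elite; s++) if (sample[idx[s]][i]) ones++;
                prob[i] = (double) ones / elite;
            }
        }
        System.out.println(Arrays.toString(prob));    // close to all 1.0
    }
}
```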
Formal definition of cross-entropy
• In information theory, the cross-entropy, closely related to the Kullback-Leibler "distance" (not really a distance, because it is not symmetric), between two probability distributions p and q measures the average number of bits needed to identify an event from a set of possibilities, if a coding scheme based on a given probability distribution q is used rather than the "true" distribution p.
• The cross-entropy for two distributions p and q over the same discrete probability space is defined as follows:
H(p,q) = -∑x p(x) log q(x)
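A minimal sketch of this formula in Java (ours; the slides give only the definition). Log base 2 gives the answer in bits, and terms with p(x) = 0 contribute nothing:

```java
// Cross-entropy H(p,q) = -sum_x p(x) log2 q(x) for two distributions
// over the same discrete space. Names and example values are ours.
public final class CrossEntropy {
    static double crossEntropy(double[] p, double[] q) {
        double h = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {                 // 0 * log q(x) is taken as 0
                h -= p[i] * (Math.log(q[i]) / Math.log(2.0));
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.9, 0.1};
        System.out.println(crossEntropy(p, p)); // 1.0 bit: the entropy of p
        System.out.println(crossEntropy(p, q)); // larger than H(p,p), and
        System.out.println(crossEntropy(q, p)); // differs from H(p,q): asymmetric
    }
}
```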
The cross-entropy method for optimization problems [Rubinstein]
• In optimization problems, we are looking for inputs that maximize the performance function.
• The main problem is that this maximum is unknown beforehand.
• The stopping point is when the sample has a small relative standard deviation (see the sketch below).
• The method was successfully applied to a variety of graph optimization problems:
• MAX-CUT
• Traveling salesman
• …
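A sketch of the stopping criterion (the method names are ours), assuming the sample's performance values are collected into an array; the 1-5% threshold appears later on the implementation slide:

```java
// Stop when the relative standard deviation of the sample's
// performance values is small; the slides use a 1-5% threshold.
final class StoppingCriterion {
    static double relativeStdDev(double[] scores) {
        double mean = 0.0;
        for (double s : scores) mean += s;
        mean /= scores.length;
        double var = 0.0;
        for (double s : scores) var += (s - mean) * (s - mean);
        var /= scores.length;
        return Math.sqrt(var) / Math.abs(mean);
    }

    static boolean shouldStop(double[] scores) {
        return relativeStdDev(scores) < 0.05;   // the 5% end of the range
    }
}
```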
Illustration
(Figure: the performance function over the space; the starting point is the uniform distribution, and each iteration produces an updated distribution that concentrates more probability mass near the maximum of the performance function.)
The setting in graphs
In graph problems, we have the following:
• The space is all paths in the graph G.
• A performance function f gives each path a value.
• We are looking for a path that maximizes f.
• In each iteration, we choose the best part Q of the sample.
• The probability update formula for an edge e = (v,w) is:
f'(e) = (#paths in Q that use e) / (#paths in Q that go via v)
A sketch of this update appears below.
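A minimal sketch of the update, assuming each sampled path is a list of edges and, since the graph is a DAG, visits each node at most once; the Edge type and method names are ours (records need Java 16+):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of f'(e) = (#paths in Q that use e) / (#paths in Q via v)
// for an edge e = (v, w), where Q is the elite part of the sample.
record Edge(String from, String to) {}

final class EdgeUpdate {
    static Map<Edge, Double> update(List<List<Edge>> q) {
        Map<Edge, Integer> edgeUses = new HashMap<>();
        Map<String, Integer> nodeVisits = new HashMap<>();
        for (List<Edge> path : q) {
            for (Edge e : path) {
                edgeUses.merge(e, 1, Integer::sum);       // paths using e
                nodeVisits.merge(e.from(), 1, Integer::sum); // paths via v
            }
        }
        Map<Edge, Double> updated = new HashMap<>();
        edgeUses.forEach((e, uses) ->
                updated.put(e, (double) uses / nodeVisits.get(e.from())));
        return updated;
    }
}
```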
Cross-entropy for testing
• A program is viewed as a graph.
• Each decision point is a node in the graph.
• Decision points can result from any non-deterministic or otherwise not predetermined decisions: concurrency, inputs, coin tossing.
• The performance function is defined according to the bug that we want to find.
• More on that later …
Our implementation
• We focus on concurrent programs.
• A program under test is represented as a graph, with nodes being the synchronization points (this works only if there is a correct locking policy).
• Edges are possible transitions between nodes.
• The graph is assumed to be a DAG: all loops are unwound.
• The graph is constructed on-the-fly during the executions.
• The initial probability distribution is uniform among edges.
• We collect a sample of several hundred executions; at each synchronization point, the next transition is chosen according to the current edge probabilities (a sketch of this choice follows this list).
• We adjust the probabilities of edges according to the formula.
• We repeat the process until the sample has a very small (1-5%) relative standard deviation.
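A sketch of the probability-weighted choice made at each synchronization point; the class name Decider is borrowed from the architecture slide, but the signature is our assumption:

```java
import java.util.Random;

// At a synchronization point, pick the next outgoing edge according
// to the current edge-probability table, not uniformly at random.
final class Decider {
    private final Random rnd = new Random();

    // probs[i] is the current probability of the i-th outgoing edge;
    // the entries are assumed to sum to 1.
    int choose(double[] probs) {
        double r = rnd.nextDouble(), acc = 0.0;
        for (int i = 0; i < probs.length; i++) {
            acc += probs[i];
            if (r < acc) return i;
        }
        return probs.length - 1;    // guard against rounding error
    }
}
```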
Dealing with loops
• Unwinding all loops creates a huge graph.
• Problems with huge graphs: they take more space to represent and more time to converge.
• We assume that most of the time, we are doing the same thing on subsequent iterations of the loop.
• We introduce a modulo parameter. For instance, modulo 2 creates two nodes for each location inside the loop, one for even and one for odd iterations:

for i = 1 to 100 do
    sync node;    // with modulo 2: one node for odd i, one for even i
end for

• The modulo parameter reduces the size of the graph dramatically, but also loses information.
• There is a balance between a too-small and a too-large modulo parameter; it is found empirically. (A sketch follows.)
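A minimal sketch of the modulo abstraction (names are ours): two iterations of a loop map to the same graph node iff their indices agree modulo m, so a loop of any length contributes at most m nodes per program location.

```java
// Fold loop iterations into at most m graph nodes per location.
final class NodeKey {
    static String key(String location, int iteration, int m) {
        return location + "#" + (iteration % m);
    }

    public static void main(String[] args) {
        int m = 2;
        // Iterations 1..4 of the same sync point fold into two nodes:
        for (int i = 1; i <= 4; i++) {
            System.out.println(key("sync@loop", i, m)); // odd/even alternate
        }
    }
}
```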
Bugs and performance functions
• The performance function is tailored to the bug we are looking for.
• Note that we can also test for patterns, not necessarily bugs.
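As a hedged illustration (our choice, not stated on the slides): for an overflow bug, a natural performance function is the maximum stack or buffer depth reached during a run, so executions that come closer to the overflow score higher.

```java
// Illustrative performance function for an overflow bug: the maximum
// stack/buffer depth observed during one execution.
final class MaxDepthPerformance {
    private int maxDepth = 0;

    void observe(int currentDepth) {        // called at each push/pop
        if (currentDepth > maxDepth) maxDepth = currentDepth;
    }

    int value() { return maxDepth; }        // the run's performance score
}
```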
Implementation: in Java, for Java
(Architecture diagram: the program under test passes through Instrumentation; ConCEnter consists of a Decider, an Evaluator, an Updater, and a Stopper, built around a probability distribution table that is persisted to disk.)
Experimental results
• We ran ConCEnter on several examples with buffer overflows and with deadlocks.
• The bugs were very rare and did not manifest themselves in random testing.
• ConCEnter found the bugs successfully.
• The method requires significant tuning: the modulo parameter, the smoothing parameter, a correct definition of the performance function, etc.
Example: A-B-push-pop. There are two types of threads, A and B; each thread runs:

myName = A; // or B
loop:
    if (top_of_stack = myName) pop;
    else push(myName);
end loop;

(Figure: threads of types A and B pushing to and popping from a shared stack.)
The probability of stack overflow is exponentially small. A runnable sketch follows.
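A scaled-down, runnable reconstruction of the example (ours; the slide version is driven by ConCEnter through its instrumented synchronization points). Overflow requires the scheduler to keep strictly alternating the two threads, so under random scheduling its probability is exponentially small in the bound; the bound and iteration count are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A-B-push-pop: each thread pops when its own name is on top of the
// shared stack and pushes its name otherwise. The stack exceeds the
// bound only under a near-perfectly alternating interleaving.
public final class AbPushPop {
    static final int BOUND = 30;
    static final Deque<Character> stack = new ArrayDeque<>();

    static void run(char myName) {
        for (int i = 0; i < 1000; i++) {
            synchronized (stack) {
                if (!stack.isEmpty() && stack.peek() == myName) {
                    stack.pop();
                } else {
                    stack.push(myName);
                    if (stack.size() > BOUND) {
                        throw new IllegalStateException("stack overflow bug!");
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> run('A'));
        Thread b = new Thread(() -> run('B'));
        a.start(); b.start();
        a.join();  b.join();
        System.out.println("no overflow found in this run");
    }
}
```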
Future work
• Automatic tuning.
• Making ConCEnter plug-and-play for some predefined bugs.
• Replay: can we use the distance from a predefined execution as a performance function? (This works already.)
• Second best: what if there are several areas in the graph where the maximum is reached?
• What are the restrictions on the performance function in order for this method to work properly? (It seems that the function should be smooth enough.)
Related work
• Testing:
• Random testing
• Stress testing
• Noise makers
• Coverage estimation
• Bug-specific heuristics
• Genetic algorithms
• …
Nothing is specifically targeted at rare bugs.
• Cross-entropy applications:
• Buffer allocation, neural computation, DNA sequence alignment, scheduling, graph problems, …
Cross-entropy is useful in many areas.