
Automatic Inference of Code Transforms

This presentation explores automated methods for inferring code transforms that improve patch generation systems and reduce the human effort needed to fix software defects. It describes Genesis, a system that abstracts away patch-specific details to create candidate patch search spaces efficiently.

Presentation Transcript


  1. Automatic Inference of Code Transforms for Patch Generation • Fan Long, Peter Amidon and Martin Rinard • ACM ESEC/FSE 2017 • Software Engineering Laboratory • Dept. of Computer Science • G201792004 • Youngjun Jeong

  2. Contents • 1 Introduction • 2 Transform Inference • 3 Inferred Transforms • 4 Inference System • 5 Implementation • 6 Experimental Results • 7 Conclusion

  3. 1 Introduction • Automatic patch generation [30, 33-35, 37, 38, 40, 48, 56, 61, 62] holds out the promise of significantly reducing the human effort required to diagnose, debug, and fix software defects • The standard generate and validate approach starts with a set of test cases, at least one of which exposes the defect • All previous generate and validate systems work with a set of manually crafted transforms [33-35, 37, 38, 48, 56, 61, 62] to patch bugs that fall within the scope of those transforms
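
The generate and validate loop described on this slide can be sketched in a few lines. This is only an illustrative sketch, not the Genesis implementation: programs are modeled as plain Python functions, and the names `passes_all` and `generate_and_validate` are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Assumption for illustration: a "program" is a function from int to int,
# and a test case is an (input, expected_output) pair.
Program = Callable[[int], int]
TestCase = Tuple[int, int]


def passes_all(program: Program, tests: Iterable[TestCase]) -> bool:
    """Return True if the program produces the expected output on every test."""
    for arg, expected in tests:
        try:
            if program(arg) != expected:
                return False
        except Exception:
            return False
    return True


def generate_and_validate(candidates: Iterable[Program],
                          tests: List[TestCase]) -> Optional[Program]:
    """Return the first candidate patch that passes the whole test suite."""
    for candidate in candidates:
        if passes_all(candidate, tests):
            return candidate
    return None


# Toy defect: an absolute-value function that is wrong for negative inputs.
def buggy(x: int) -> int:
    return x


tests = [(3, 3), (-2, 2)]  # the second test case exposes the defect
candidates = [lambda x: x + 1, lambda x: -x, lambda x: -x if x < 0 else x]
patch = generate_and_validate(candidates, tests)
```

In this toy setting the list of candidates plays the role of the search space; the quality of that space decides whether a correct patch is ever tried.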

  4. 1.1 Genesis • A novel system that infers code transforms for automatic patch generation systems [36] • Genesis generalizes subsets of patches to infer transforms that together generate a productive search space of candidate patches • To the best of their knowledge, Genesis is the first system to automatically infer patch generation transforms or candidate patch search spaces from successful patches

  5. 1.1 Genesis • Transforms: Each Genesis transform has two template ASTs, one for the original code and one for the replacement code • Generators: introduce new code and logic, and are essential to enabling Genesis to generate correct patches for previously unseen applications • Search Space Inference with an ILP (Integer Linear Program) • A key challenge in patch search space design is navigating an inherent tradeoff between coverage and tractability [39]

  6. 1.2 Experimental Results • Working with a training set that includes 483 NP (Null Pointer) patches, 199 OOB (Out of Bounds) patches, and 287 CC (Class Cast) patches drawn from 356 open source applications • Genesis infers a search space generated by 108 transforms • They compare Genesis with PAR [33, 44], a previous patch generation system for Java that works with manually defined patch templates

  7. 1.3 Contributions • Transforms with Template ASTs and Generators • These transforms enable Genesis to abstract away patch- and application-specific details to capture common patch patterns and strategies • Generators enable Genesis to synthesize the new code and logic • Patch Generalization • Genesis automatically derives a transform that captures the common patch generation pattern present in the patches

  8. 1.3 Contributions • Search Space Inference • Starts with a set of training patches • Navigates the tradeoff between coverage and tractability • Complete System and Experimental Results

  9. 2 Transform Inference (1) • Patch Sampling and Generalization • The Genesis inference algorithm works with sampled subsets of patches from the training set • It applies a generalization algorithm to each subset to infer a transform that it can apply to generate candidate patches

  10. 2 Transform Inference (2) - Example • Contents

  11. 2 Transform Inference (2) • Template Anatomy • Each transform has a template • The transform has an initial template AST • The transform also has a replacement template AST • In example, the template is • , , and are unmatched template variables

  12. 2 Transform Inference (3) • Generator Constraints • Control the components that the generator will enumerate • and specify sets of operators to enumerate • : must be expression • : can contain only method calls or variable references • : can contain at most 2 AST nodes

  13. 2 Transform Inference (4) • Candidate Transforms • Genesis repeatedly samples training patches to obtain the candidate transforms

  14. 2 Transform Inference (5) • Search Space Inference • Genesis must discard overly general transforms such as and include complementary and effectively targeted transforms such as , , and

  15. 2 Transform Inference (6) • Patch Generation • Genesis first uses a defect localization technique to produce a ranked list of potential statements to modify • Genesis applies all selected transforms, including , to the ‘if’ condition to generate candidate patches

  16. 3 Inferred Transforms (1) • Transforms That Target Boolean Expressions • Conjoin or disjoin a generated subexpression to a boolean condition in the original program • Conditional Execution • Conditionally execute existing matched code
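
As a rough illustration of conjoining or disjoining a generated subexpression to an existing condition, the sketch below works on Java conditions held as plain strings. The choice of null checks as the generated subexpressions, and the function names `generate_null_checks` and `conjoin_or_disjoin`, are assumptions made for this example; Genesis itself operates on ASTs with inferred generator constraints.

```python
from itertools import product
from typing import List


def generate_null_checks(variables: List[str]) -> List[str]:
    """Hypothetical generator: enumerate `v != null` and `v == null` per variable."""
    return [f"{v} {op} null" for v, op in product(variables, ("!=", "=="))]


def conjoin_or_disjoin(condition: str, subexpressions: List[str]) -> List[str]:
    """Build candidate conditions by joining each subexpression with && or ||."""
    candidates = []
    for sub in subexpressions:
        candidates.append(f"{sub} && ({condition})")
        candidates.append(f"{sub} || ({condition})")
    return candidates


candidates = conjoin_or_disjoin("list.size() > 0",
                                generate_null_checks(["list", "name"]))
```

Two variables, two comparison operators, and two connectives give eight candidate conditions, which shows how quickly the search space grows with the generator's freedom.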

  17. 3 Inferred Transforms (2) • Inserted If Then Else • Wrap existing code in an ‘if-then-else’ statement • Inserted If Then

  18. 3 Inferred Transforms (3) • Replace Code • Replace existing code with newly generated code • The transforms differ in the form of the code they replace and generate and the generator constraints

  19. 3 Inferred Transforms (4) • Try/Catch/Continue • Wraps existing code in a try construct with an empty catch block • For Loop Off By One • Corrects off by one errors in for loops, specifically by enumerating combinations of starting values and loop termination conditions
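
The off by one transform enumerates combinations of starting values and termination conditions. A minimal sketch of that enumeration over Java loop headers as strings follows; the specific ranges tried (start minus one to start plus one, `<` and `<=`) and the function names are assumptions for illustration only.

```python
from itertools import product
from typing import List


def header(var: str, start: int, op: str, bound: str) -> str:
    """Render a Java-style for-loop header."""
    return f"for (int {var} = {start}; {var} {op} {bound}; {var}++)"


def off_by_one_candidates(var: str, start: int, op: str, bound: str) -> List[str]:
    """Enumerate nearby start values and termination operators, skipping the original."""
    original = header(var, start, op, bound)
    candidates = []
    for s, o in product((start - 1, start, start + 1), ("<", "<=")):
        candidate = header(var, s, o, bound)
        if candidate != original:
            candidates.append(candidate)
    return candidates


# Original (defective) loop: for (int i = 0; i <= n; i++)
candidates = off_by_one_candidates("i", 0, "<=", "n")
```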

  20. 3 Inferred Transforms (5) • Change Declared Type • Changes the declared type of a variable declaration (including initializer) • Other Transforms • E.g. null check insertion

  21. 3 Inferred Transforms - Discussion • Genesis transforms are more numerous, more diverse, and target a wider range of defects more precisely and tractably • Some transforms target specific defect classes such as off by one defects in for loops • Other transforms apply general templates with the generator constraints controlling the enumeration to deliver a tractable search space

  22. 4 Inference System • Given a set of training pairs , each of which corresponds to a program before a change and a program after a change, Genesis infers a set of transforms that, working together, generate the search space of patches • They model the programming language that Genesis works with as a context free grammar (CFG) with ASTs as the parse trees for the CFG

  23. 4.1 Definition 4.1 - CFG • : CFG (Context Free Grammar) • : the set of non-terminals • : the set of terminals • : a set of production rules of the form • , • : the starting non-terminal of the grammar

  24. 4.1 Definition 4.2 - AST • : An Abstract Syntax Tree • : a finite set of nodes in the tree • : root node of the tree • : maps each node to the list of its children nodes • : attaches a non-terminal or terminal label to each node in the tree

  25. 4.1 Definition 4.3, 4.4, 4.5 – about AST • : the terminal string obtained via traversing • : valid AST of G if and only if • : AST forest • : the list of root nodes of trees in the forest • : AST slice • : an AST • : a list of AST sibling nodes in
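
The symbols on the definition slides did not survive extraction, so the sketch below uses its own field names (`nonterminals`, `terminals`, `productions`, `start`, `label`, `children`); these are assumptions, not the paper's notation. It models a grammar and an AST as simple data structures, computes the terminal string by traversing the leaves, and checks one plausible reading of validity: the root carries the start symbol and every internal node follows a production rule.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Grammar:
    nonterminals: Set[str]
    terminals: Set[str]
    productions: Set[Tuple[str, Tuple[str, ...]]]  # (head, body) pairs
    start: str


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def terminal_string(node: Node, grammar: Grammar) -> List[str]:
    """Collect terminal labels by a left-to-right traversal of the tree."""
    if node.label in grammar.terminals:
        return [node.label]
    result: List[str] = []
    for child in node.children:
        result.extend(terminal_string(child, grammar))
    return result


def follows_productions(node: Node, grammar: Grammar) -> bool:
    """Check that every internal node expands according to some production."""
    if node.label in grammar.terminals:
        return not node.children
    rule = (node.label, tuple(child.label for child in node.children))
    return rule in grammar.productions and all(
        follows_productions(child, grammar) for child in node.children)


def is_valid(root: Node, grammar: Grammar) -> bool:
    """Assumed validity check: start symbol at the root, productions respected."""
    return root.label == grammar.start and follows_productions(root, grammar)


grammar = Grammar(
    nonterminals={"Expr", "Var"},
    terminals={"x", "!=", "null"},
    productions={("Expr", ("Var", "!=", "null")), ("Var", ("x",))},
    start="Expr",
)
tree = Node("Expr", [Node("Var", [Node("x")]), Node("!="), Node("null")])
```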

  26. 4.1 Notation and Utility Functions • For a map , dom() denotes the domain of • nodes(, ): the set of nodes in a forest • : the list of the root nodes of the trees in the forest • inside(): the set of non-terminals of the ancestor nodes of • nonterm(): the set of non-terminals inside a forest • diff(, ): the number of different terminals in leaf nodes between two ASTs, AST slices, or AST forests

  27. 4.2.1 Template AST Forest • The key difference between template and concrete AST forests is that template AST forests contain template variables • : template AST forest • : a finite set of template variables • : : a map that assigns each template variable to a bit of zero or one and a set of non-terminals

  28. 4.2.1 Template AST Forest • For each template variable : the kind of AST subtrees or sub-forests which the variable can match against • : 0 or 1 (0: can match against only AST subtrees, 1: can match against both subtrees and sub-forest) • can match against as AST subtree or sub-forest only if its roots have non-terminal in

  29. 4.2.1 Template AST Forest • : matches the concrete • : a map that assigns each template variable in • : matches the concrete AST slice with the variable bindings specified in

  30. 4.2.1 Template AST Forest • The first rule corresponds to the simple case of a single terminal node • The second and the third rules correspond to the cases of a single non-terminal node or a list of nodes

  31. 4.2.1 Template AST Forest • The fourth and fifth rules correspond to the case of a single template variable node in the template AST forest • The fourth rule matches the template variable against a forest • The fifth rule matches the variable against a tree
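
A simplified sketch of template matching may help here. It covers only the case where a template variable matches a single subtree (the tree rule), not a whole sub-forest, and it represents trees as nested `(label, children)` tuples. The function name `match`, the `var_kinds` map from variable name to allowed root non-terminals, and the rule that a repeated variable must bind equal subtrees are all assumptions of this example.

```python
from typing import Dict, Optional, Set, Tuple

Tree = Tuple[str, tuple]  # (label, children); a leaf has an empty children tuple


def match(template: Tree, tree: Tree, var_kinds: Dict[str, Set[str]],
          bindings: Optional[Dict[str, Tree]] = None) -> Optional[Dict[str, Tree]]:
    """Return variable bindings if the template matches the tree, else None."""
    bindings = dict(bindings or {})
    label, children = template
    if label in var_kinds:  # template variable node
        if tree[0] not in var_kinds[label]:
            return None  # root non-terminal is not allowed for this variable
        if label in bindings and bindings[label] != tree:
            return None  # the same variable must bind the same subtree
        bindings[label] = tree
        return bindings
    # Ordinary node: labels and child counts must agree, then recurse.
    if label != tree[0] or len(children) != len(tree[1]):
        return None
    for t_child, c_child in zip(children, tree[1]):
        bindings = match(t_child, c_child, var_kinds, bindings)
        if bindings is None:
            return None
    return bindings


template = ("If", (("Cond", (("A", ()),)), ("Body", (("B", ()),))))
concrete = ("If", (("Cond", (("Expr", (("x > 0", ()),)),)),
                   ("Body", (("Stmt", (("return x;", ()),)),))))
result = match(template, concrete, {"A": {"Expr"}, "B": {"Stmt"}})
```

Restricting which root non-terminals a variable may match is what keeps a template from matching arbitrary code.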

  32. 4.2.2 Generators • Generators enumerate new code components • : Generator • : integer bound for the number of tree nodes • : the set of allowed non-terminals during generation

  33. 4.2.2 Generators • Generators exhibit two kinds of behaviors • : generates a sub-forest with fewer than nodes that contains only non-terminals inside the set • : copies an existing sub-forest from the original AST tree with non-terminal labels in and then replaces up to leaf nodes in the copied sub-forest • : generation operator • : the generator generates the AST forest

  34. 4.2.2 Generators • The first rule (): checks that the number of nodes in the result forest is within the bound and the set of non-terminals in the forest is a subset of • The second rule (): checks that the difference between the result forest and an existing forest in the original AST is within the bound and the root labels are in
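
The two generator checks can be sketched over nested `(label, children)` tuples. Everything named here (`count_nodes`, `nonterminals`, `diff`, `ok_from_scratch`, `ok_by_copy`) is hypothetical, the bound is treated as inclusive by assumption, and `diff` is a simplified leaf-by-leaf comparison rather than the paper's exact definition.

```python
from typing import Iterable, List, Set, Tuple

Tree = Tuple[str, tuple]  # (label, children); a leaf has an empty children tuple


def count_nodes(tree: Tree) -> int:
    return 1 + sum(count_nodes(child) for child in tree[1])


def nonterminals(tree: Tree) -> Set[str]:
    """Labels of internal nodes (leaves are treated as terminals)."""
    label, children = tree
    if not children:
        return set()
    result = {label}
    for child in children:
        result |= nonterminals(child)
    return result


def leaves(tree: Tree) -> List[str]:
    label, children = tree
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]


def diff(a: Tree, b: Tree) -> int:
    """Simplified diff: number of leaf positions whose terminals differ."""
    la, lb = leaves(a), leaves(b)
    return sum(x != y for x, y in zip(la, lb)) + abs(len(la) - len(lb))


def ok_from_scratch(tree: Tree, bound: int, allowed: Set[str]) -> bool:
    """First rule: size within the bound, only allowed non-terminals used."""
    return count_nodes(tree) <= bound and nonterminals(tree) <= allowed


def ok_by_copy(tree: Tree, originals: Iterable[Tree], bound: int,
               allowed: Set[str]) -> bool:
    """Second rule: close to some existing subtree whose root label is allowed."""
    return any(orig[0] in allowed and orig[0] == tree[0]
               and diff(tree, orig) <= bound for orig in originals)


original = ("Call", (("Var", (("a", ()),)), ("Name", (("size", ()),))))
generated = ("Call", (("Var", (("b", ()),)), ("Name", (("size", ()),))))
```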

  35. 4.2.3 Transforms • : a transform • : a set of non-terminals that denote the context where this transform can apply • : the template AST forest before/after applying the transform • : maps each template variable

  36. 4.2.3 Transforms • : applying to the AST slice generates the new AST • : applying to the AST slice generates the AST of the slice • In Figure 5, and determine the context where the transform can apply

  37. 4.3 Transform Generalization • There are many possible transforms • How to select useful transforms? • Evaluate coverage and tractability • Coverage: Does the transform generate the correct patch? • Tractability: How many candidate patches does the transform generate in total?
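
The two criteria can be measured on a small validation set, as in the sketch below. Transforms are modeled as hypothetical functions from a Java condition string to a list of candidate strings; `evaluate`, `narrow`, and `broad` are names invented for this example.

```python
from typing import Callable, List, Tuple

Transform = Callable[[str], List[str]]


def evaluate(transform: Transform,
             validation_pairs: List[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (covered cases, total candidates generated) for one transform."""
    covered = 0
    total = 0
    for before, after in validation_pairs:
        candidates = transform(before)
        total += len(candidates)
        if after in candidates:
            covered += 1
    return covered, total


def narrow(cond: str) -> List[str]:
    """Only ever guards on the variable x."""
    return [f"x != null && {cond}"]


def broad(cond: str) -> List[str]:
    """Guards on any of x, y, z with either comparison operator."""
    return [f"{v} {op} null && {cond}" for v in "xyz" for op in ("!=", "==")]


pairs = [("x.ok()", "x != null && x.ok()"),
         ("y.ok()", "y != null && y.ok()")]
narrow_score = evaluate(narrow, pairs)
broad_score = evaluate(broad, pairs)
```

The broader transform covers both cases but generates six times as many candidates, which is the tradeoff the slide describes.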

  38. 4.3 Transform Generalization • is the formula for a generator that generates from scratch • produces generator by computing • the bound of the minimum diff distance • the set of non-terminals of the root node labels

  39. 4.3 Transform Generalization • computes template AST before/after change • Computes B by invoking to obtain the generalized generators for AST sub-slices that match against each free template AST forest

  40. 4.4 Sampling Algorithm • can be invoked on any subset of to obtain a different set of transforms, so an exponential number of transforms could be obtained • The goal of sampling: to use the generalization function to systematically obtain a set of productive candidate transforms

  41. 4.4 Sampling Algorithm • : training set • : validation set • : work set that contains the candidate subset of • Computes a fitness score at each iteration and keeps the top subsets
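
One way to picture the work-set loop is the sketch below: start from small subsets of the training set, extend each with a randomly sampled extra patch, score every subset, and keep only the best few. This is a loose sketch under stated assumptions, not the algorithm from the paper: the initial subsets, the extension step, the `fitness` function (here a toy that ignores the validation set), and all names are hypothetical.

```python
import random
from typing import Callable, FrozenSet, List


def sample_subsets(training: List[str],
                   fitness: Callable[[FrozenSet[str]], float],
                   rounds: int, keep: int, seed: int = 0) -> List[FrozenSet[str]]:
    """Grow candidate subsets of the training set, keeping the fittest each round."""
    rng = random.Random(seed)
    work = [frozenset([patch]) for patch in training]
    for _ in range(rounds):
        extended = set(work)
        for subset in work:
            remaining = [p for p in training if p not in subset]
            if remaining:
                extended.add(subset | {rng.choice(remaining)})
        # Highest fitness first; ties broken by content for determinism.
        work = sorted(extended, key=lambda s: (-fitness(s), sorted(s)))[:keep]
    return work


# Toy fitness: reward larger subsets whose patches share a defect kind prefix.
def fitness(subset: FrozenSet[str]) -> float:
    kinds = {patch[:2] for patch in subset}
    return float(len(subset)) if len(kinds) == 1 else 0.0


training = ["np1", "np2", "np3", "oob1"]
work = sample_subsets(training, fitness, rounds=3, keep=3)
```

In the real system each subset would be generalized into a transform and scored against the validation set; the toy fitness only stands in for that step.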

  42. 4.5 Search Space Inference Algorithm • ILP (Integer Linear Program) Formulation: Given a set of candidate transforms , the goal is to select a subset from that successfully navigates the patch search space coverage vs. tractability tradeoff
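
The selection problem can be illustrated without an ILP solver by brute-forcing all subsets of a few candidate transforms. This swaps exhaustive search in for the ILP and uses a made-up objective (maximize covered validation cases under a budget on generated candidates); the transform names, numbers, and budget are invented for the example and are not results from the paper.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def select_transforms(candidates: Dict[str, Tuple[Set[int], int]],
                      budget: int) -> Tuple[str, ...]:
    """Pick the subset covering the most cases within the candidate-patch budget.

    candidates maps a transform name to (covered validation case ids,
    number of candidate patches it generates).
    """
    best: Tuple[str, ...] = ()
    best_key = (0, 0)
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(candidates[name][1] for name in subset)
            if cost > budget:
                continue  # search space too large to be tractable
            covered = set().union(*(candidates[name][0] for name in subset))
            key = (len(covered), -cost)  # more coverage first, then lower cost
            if key > best_key:
                best, best_key = subset, key
    return best


candidates = {
    "P_general": ({1, 2, 3, 4}, 10000),  # covers everything, but intractable
    "P_null": ({1, 2}, 40),
    "P_bounds": ({3}, 60),
    "P_cast": ({4}, 50),
}
selected = select_transforms(candidates, budget=200)
```

The overly general transform is rejected for cost, while the three targeted transforms together reach the same coverage at a fraction of the search space.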

  43. 5 Implementation • They use the Spoon library [46] to parse Java programs • Their current implementation supports any Java application that operates with the Maven project management system [5] and the JUnit [19] testing framework • Genesis applies its inferred transforms to each of the suspicious statements in the ranked defect localization list • For each transform, Genesis computes a cost score which is the average number of candidate patches the transform needs to generate to cover a validation case

  44. 6 Experimental Results

  45. 6 Experimental Results

  46. 6 Experimental Results

  47. 6 Experimental Results

  48. 7 Conclusion • Previous generate and validate patch generation systems work with a fixed set of transforms defined by their human developers • By automatically inferring transforms from successful human patches, Genesis makes it possible to leverage the combined expertise and patch generation strategies of developers worldwide to automatically patch bugs in new applications
