
CBCD : Cloned Buggy Code Detector

Michael D. Ernst, University of Washington, Seattle, WA, USA (mernst@uw.edu). Jingyue Li, DNV Research & Innovation, Høvik, Norway (Jingyue.Li@dnv.com).



  1. CBCD: Cloned Buggy Code Detector • Michael D. Ernst, University of Washington, Seattle, WA, USA, mernst@uw.edu • Jingyue Li, DNV Research & Innovation, Høvik, Norway, Jingyue.Li@dnv.com

  2. Background • Code copy-paste and software reuse make buggy code appear in multiple places within a system or across different systems.

  3. Two Contributions • First, an empirical study of cloned buggy code: to find identical buggy code, we examined the data in the SCM (Software Configuration Management) systems of 4 projects. • Second, a tool, CBCD, that searches for cloned buggy code; it uses isomorphism matching on the Program Dependence Graph (PDG) to find identical code.

  4. Empirical Study of Cloned Buggy Code • We manually investigated whether buggy lines of code are cloned in real systems. • We examined the SCM of the Linux kernel, Git, and PostgreSQL, and the bug reporting system of a commercial software product line.

  5. the Linux kernel (1) • We searched for keywords in commit messages and in the bug tracking system, which records discussions between developers during debugging. • For each match, we read: the description of the commit, the discussions between developers, and the “diff” between the original file and the changed file.

  6. the Linux kernel (2)

  7. Git and PostgreSQL

  8. Commercial Software Product Line

  9. CBCD: Cloned Buggy Code Detector • Adapts Program Dependence Graph (PDG)-based code clone detection methods • Pipe-and-filter architecture • Algorithm: consists of three steps

  10. Three steps of CBCD algorithm • Step 1: CodeSurfer generates the PDG of both the buggy code (the “Bug PDG”) and of the system to be searched for clones of the buggy code (the “System PDG”). • Step 2: CBCD prunes and splits the System PDG to reduce its complexity and make subgraph checking cheaper. • Step 3: CBCD determines whether the Bug PDG is a subgraph of the System PDG. It uses igraph’s implementation of subgraph isomorphism matching.
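
Step 3 can be illustrated with a small backtracking search. This is a minimal sketch, not CBCD's code and not igraph's algorithm: it assumes a PDG is given as a dict from node id to vertex kind plus a list of (source, target, edge type) triples, and the name `find_match` is hypothetical. It only requires that every Bug PDG edge has a counterpart in the System PDG.

```python
def find_match(bug_nodes, bug_edges, sys_nodes, sys_edges):
    """Return a mapping bug node -> system node, or None if no match exists.

    Assumed representation (illustrative only):
      nodes: dict of node id -> vertex kind
      edges: iterable of (source id, target id, edge type)
    """
    order = list(bug_nodes)
    sys_edge_set = set(sys_edges)

    def consistent(mapping, b, s):
        # Every bug edge whose endpoints are both mapped must exist,
        # with the same edge type, between the mapped system nodes.
        trial = {**mapping, b: s}
        for u, v, etype in bug_edges:
            if u in trial and v in trial:
                if (trial[u], trial[v], etype) not in sys_edge_set:
                    return False
        return True

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        b = order[len(mapping)]
        used = set(mapping.values())
        for s, kind in sys_nodes.items():
            # Vertex kinds must agree and the mapping must stay injective.
            if s in used or kind != bug_nodes[b]:
                continue
            if consistent(mapping, b, s):
                mapping[b] = s
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[b]  # backtrack
        return None

    return extend({})


bug_nodes = {"b1": "assign", "b2": "call"}
bug_edges = [("b1", "b2", "data")]
sys_nodes = {"s1": "assign", "s2": "if", "s3": "call", "s4": "assign"}
sys_edges = [("s1", "s2", "control"), ("s4", "s3", "data")]
print(find_match(bug_nodes, bug_edges, sys_nodes, sys_edges))
```

The search tries every candidate assignment in the worst case, which is why the pruning and splitting in Step 2 matter before this check runs.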

  11. Architecture of CBCD

  12. Step 2 of CBCD • Subgraph isomorphism identification is NP-complete. • Time complexity: O(N!·N), where N is the sum of the number of nodes and edges of both graphs to be compared. • Scalability problem: addressed by four optimizations
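
To make the scalability concern concrete, the factorial bound can be evaluated for a few sizes. This is a small arithmetic sketch; the function name `worst_case_steps` is hypothetical and the values are bound magnitudes, not measured running times.

```python
from math import factorial


def worst_case_steps(n):
    """Evaluate the N! * N bound for a combined graph size n."""
    return factorial(n) * n


for n in (5, 10, 15, 20):
    print(n, worst_case_steps(n))
```

Even at N = 10 the bound is already in the tens of millions, which motivates shrinking the graphs before matching.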

  13. Four Optimizations: Opt1 • Exclude Irrelevant Edges and Nodes from the System PDG: remove every edge that cannot match an edge in the Bug PDG, because such an edge is irrelevant for CBCD’s purposes.
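
A minimal sketch of this pruning follows. It assumes that an edge "can match" when its (source kind, edge type, target kind) triple occurs somewhere in the Bug PDG, and that a node is irrelevant when its vertex kind never occurs in the Bug PDG. Both criteria are assumptions made for illustration, as are the name `prune_system_pdg` and the dict/triple graph representation.

```python
def prune_system_pdg(bug_nodes, bug_edges, sys_nodes, sys_edges):
    """Drop System PDG nodes and edges that cannot match anything in the Bug PDG.

    nodes: dict of node id -> vertex kind
    edges: list of (source id, target id, edge type)
    """
    bug_kinds = set(bug_nodes.values())
    # Assumed matching criterion: same endpoint kinds and same edge type.
    bug_signatures = {
        (bug_nodes[u], etype, bug_nodes[v]) for u, v, etype in bug_edges
    }
    kept_nodes = {n: k for n, k in sys_nodes.items() if k in bug_kinds}
    kept_edges = [
        (u, v, etype)
        for u, v, etype in sys_edges
        if (sys_nodes[u], etype, sys_nodes[v]) in bug_signatures
    ]
    return kept_nodes, kept_edges
```

Every kept edge has endpoints whose kinds appear in the Bug PDG, so the pruned edge list never refers to a removed node.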

  14. Four Optimizations: Opt2 • Break the System PDG into Small Graphs • Opt2-step1: Count the number of nodes of each vertex kind in the Bug PDG and the System PDG. • Opt2-step2: Choose the vertex kind in the Bug PDG that has the minimum number of occurrences in the System PDG. If it occurs 0 times in the System PDG, there is no graph match. • Opt2-step3: Calculate the pseudo-radius of the Bug PDG: the greatest distance between a node of the chosen vertex kind and any other node. • Opt2-step4: For each node of the chosen vertex kind in the System PDG, find the neighbor graph of that node, with radius equal to the pseudo-radius.
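
The four Opt2 steps can be sketched as below. This is an illustration under stated assumptions, not CBCD's implementation: distance is measured by breadth-first search ignoring edge direction, the Bug PDG is assumed to be connected, ties between equally rare kinds are broken alphabetically, and each neighbor graph is returned as a set of node ids. The function names are hypothetical.

```python
from collections import Counter, deque


def undirected_adjacency(nodes, edges):
    """Build an adjacency map that ignores edge direction (an assumption)."""
    adj = {n: set() for n in nodes}
    for u, v, _etype in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def distances_from(start, adj):
    """Breadth-first distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def split_system_pdg(bug_nodes, bug_edges, sys_nodes, sys_edges):
    """Return one node set per neighbor graph, or [] if no match is possible."""
    # Step 1: count nodes of each vertex kind in the System PDG.
    sys_counts = Counter(sys_nodes.values())
    # Step 2: pick the Bug PDG kind that is rarest in the System PDG.
    rarest = min(sorted(set(bug_nodes.values())), key=lambda k: sys_counts[k])
    if sys_counts[rarest] == 0:
        return []
    # Step 3: pseudo-radius, the greatest distance from a node of that kind.
    bug_adj = undirected_adjacency(bug_nodes, bug_edges)
    radius = max(
        max(distances_from(b, bug_adj).values())
        for b, kind in bug_nodes.items()
        if kind == rarest
    )
    # Step 4: collect the neighborhood of each System PDG node of that kind.
    sys_adj = undirected_adjacency(sys_nodes, sys_edges)
    components = []
    for s, kind in sys_nodes.items():
        if kind == rarest:
            dist = distances_from(s, sys_adj)
            components.append({n for n, d in dist.items() if d <= radius})
    return components
```

Each returned node set is a much smaller candidate region, so the subgraph check runs on many small graphs instead of one large one.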

  15. Four Optimizations: Opt3 • Exclude Irrelevant PDGs: each node of the Bug PDG must correspond to some node of a System PDG component, so each System PDG component must have at least as many nodes of each vertex kind as the Bug PDG does.
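
The counting test described here amounts to a per-kind comparison of node counts. A minimal sketch, assuming each graph is a dict from node id to vertex kind and using the hypothetical name `may_contain_bug`:

```python
from collections import Counter


def may_contain_bug(bug_nodes, component_nodes):
    """True if the component has at least as many nodes of every kind as the Bug PDG."""
    bug_counts = Counter(bug_nodes.values())
    comp_counts = Counter(component_nodes.values())
    # Counter returns 0 for kinds the component lacks entirely.
    return all(comp_counts[kind] >= n for kind, n in bug_counts.items())
```

A component that fails this test can be discarded without running the isomorphism check; a component that passes still needs the full check.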

  16. Four Optimizations: Opt4 • Break Up Large Bug Code Segments: only triggered when the bug has more than 8 lines of contiguous code.

  17. Evaluation • We use 5 Git bugs, 14 PostgreSQL bugs, and 34 Linux bugs. • To compare CBCD with other types of code clone detectors, we also ran Simian v2.3.32 (text-based), CCFinder v10.2.7.3 (token-based), Deckard v1.2.1 (AST-based), and CloneDR v2.2.5 (AST-based) on these 53 bugs.

  18. Evaluation • False negative: a clone identified by the developer but not identified by the tool. • False positive: a clone reported by a tool that the developers did not report as buggy. • N1: no false positives, no false negatives. • N2: no false positives, some false negatives. • N3: some false positives, no false negatives. • N4: some false positives, some false negatives.
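
The four outcome categories follow directly from the two definitions. A minimal sketch, assuming clones are represented as sets of hashable identifiers (for example file and line pairs); the name `classify` and that representation are illustrative only.

```python
def classify(reported, expected):
    """Map a tool's result for one bug to N1..N4.

    reported: set of clones the tool reported
    expected: set of clones the developers identified
    """
    false_positives = reported - expected
    false_negatives = expected - reported
    if not false_positives and not false_negatives:
        return "N1"
    if not false_positives:
        return "N2"
    if not false_negatives:
        return "N3"
    return "N4"
```

For example, a tool that reports exactly the developer-identified clones for a bug lands in N1, while one that misses a clone and also reports an unrelated one lands in N4.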

  19. Results

  20. Q&A
