Scalable Clone Detection and Elimination for Erlang Programs Huiqing Li, Simon Thompson University of Kent Canterbury, UK
Overview • Erlang • Wrangler • Clone detection • Clone elimination • Case studies • Conclusions and future work
Erlang • Dynamically typed functional programming language. • Built-in support for concurrency, distribution and fault-tolerance. • Some eccentricities: multiple binding occurrences, bound variables in patterns, multiple usages of atoms, side-effects, ...
%% Factorial in Erlang.
-module(fac).
-export([fac/1]).

fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N-1).
Wrangler • Clone detection + removal • Improve module structure • Basic refactorings: structural, macro, process and test-framework related
Clone Detection • The Wrangler clone detector • Report clone classes whose members are identical or similar • No false positives • High recall rate • Scalable.
What is ‘identical’ code? • Example: X+4 and Y+5 both have the shape variable+number. • Identical if values of literals and variables ignored, but respecting binding structure.
What is ‘similar’ code? • Example: (X+3)+4 and 4+(5-(3*X)) generalise to X+Y. • The anti-unification gives the (most specific) common generalisation. • $\text{Similarity} = \min\left(\frac{\|X+Y\|}{\|(X+3)+4\|},\ \frac{\|X+Y\|}{\|4+(5-(3*X))\|}\right)$
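The similarity score above can be sketched on a toy term representation. This is a minimal illustration, not Wrangler's implementation: the module name `au_sketch`, the tuple encoding of expressions, the `{hole, K}` placeholder and the node-count size metric are all assumptions made for the example. Real anti-unification in Wrangler works on annotated ASTs and respects binding structure.

```erlang
-module(au_sketch).
-export([anti_unify/2, term_size/1, similarity/2]).

%% Hypothetical toy terms: {var, Name} | {int, N} | {op, Op, Left, Right}.
%% {hole, K} marks a position where the two inputs differ.

%% Most specific common generalisation of two toy terms.
anti_unify(A, B) ->
    {Term, _Seen} = au(A, B, #{}),
    Term.

%% Equal subterms are kept as they are.
au(Same, Same, Seen) ->
    {Same, Seen};
%% Same operator on both sides: generalise the operands pairwise.
au({op, Op, L1, R1}, {op, Op, L2, R2}, Seen0) ->
    {L, Seen1} = au(L1, L2, Seen0),
    {R, Seen2} = au(R1, R2, Seen1),
    {{op, Op, L, R}, Seen2};
%% Otherwise introduce a hole; the same pair of subterms reuses its hole.
au(A, B, Seen) ->
    case maps:find({A, B}, Seen) of
        {ok, K} ->
            {{hole, K}, Seen};
        error ->
            K = maps:size(Seen) + 1,
            {{hole, K}, Seen#{{A, B} => K}}
    end.

%% Size as number of nodes (an assumed metric for this sketch).
term_size({op, _Op, L, R}) -> 1 + term_size(L) + term_size(R);
term_size(_Leaf) -> 1.

%% Minimum ratio of generalisation size to each original's size.
similarity(A, B) ->
    G = term_size(anti_unify(A, B)),
    min(G / term_size(A), G / term_size(B)).
```

With the slide's example encoded in this form, (X+3)+4 and 4+(5-(3*X)) anti-unify to a sum of two holes (the X+Y of the slide). Under the assumed node-count metric the sizes are 3, 5 and 7, so the score is min(3/5, 3/7) = 3/7.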
Clone Detection • All clones in a project meeting the threshold parameters. • Thresholds: • minimum number of expressions, • minimum number of tokens, • minimum number of duplications, • maximum number of new parameters, and • minimum similarity score.
Implementation • Clone detection in an incremental way. • Initial clone detection. • Incremental clone detection. • AST-based two-phase clone detection.
The Initial Detection Algorithm
Pipeline: source Erlang programs → parse program, annotate and serialise AST → serialised AAST → generalise and hash expressions → hashed expression sequences → clone detection using generalised suffix tree → initial clone candidates → examination of clone candidates using anti-unification → final clones.
Parse program, annotate and serialise AST: • Bypasses the Erlang pre-processor; • Location information included in AST; • Static semantic information added to AST; • AAST traversed, and expression sequences collected.
Generalise and hash expressions: • Capture structural similarity between expressions while keeping a structural skeleton of the original; • Replace certain subtrees with a placeholder, but only if sensible to do so; • Each expression statement is hashed and mapped to an integer; therefore each expression sequence is mapped to a sequence of integers.
Examination of clone candidates using anti-unification: • Checks a candidate clone class by anti-unification, and returns zero, one or more clone classes; • Generation of anti_unifier function; • Generation of application instances.
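The generalise-and-hash step can be sketched as follows. This is a simplified illustration under stated assumptions: the module name `hash_sketch` and the toy tuple encoding of expressions are hypothetical, and the sketch replaces every variable and integer literal with a placeholder, whereas the slide says Wrangler does so only where sensible. The suffix-tree search over the resulting integer sequences is not shown.

```erlang
-module(hash_sketch).
-export([generalise/1, hash_seq/1]).

%% Hypothetical toy terms:
%%   {var, Name} | {int, N} | {atom, A} | {op, Op, L, R} | {call, F, Args}

%% Keep the structural skeleton, replace variables and integer literals
%% with placeholders. Atoms and function names are kept as they are.
generalise({var, _Name}) -> var;
generalise({int, _N}) -> int;
generalise({atom, A}) -> {atom, A};
generalise({op, Op, L, R}) -> {op, Op, generalise(L), generalise(R)};
generalise({call, F, Args}) -> {call, F, [generalise(A) || A <- Args]}.

%% Map a sequence of expression statements to a sequence of integers.
%% erlang:phash2/1 is deterministic, so equal skeletons get equal hashes.
hash_seq(Exprs) ->
    [erlang:phash2(generalise(E)) || E <- Exprs].
```

Sequences such as `X+4` and `Y+5` then map to the same integers, so repeated integer subsequences become candidate clones. Distinct skeletons can still collide on a hash, which is one reason candidates are examined again by anti-unification.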
The Initial Detection Algorithm • Designed with incremental clone detection in mind. • Use relative locations, every function starts from location {1, 1}; • Intermediate information cached: AAST, Static semantic information, hash information, clone table.
The Incremental Detection Algorithm • Follow the same steps as the initial detection algorithm, but reuse and incrementally update the information cached from the previous run of the clone detection. • Take a function, instead of a file, as a unit to track changes. • Track the change of clones, marking each clone class as new, unchanged, changed+, changed-, or changed+-.
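Tracking changes per function can be sketched as a comparison of cached and current per-function hashes. This is only an illustration of the idea: the module name `incr_sketch`, the `{Module, Function, Arity}` keys, the integer hashes and the result labels are assumptions, and the slides do not describe Wrangler's cache format.

```erlang
-module(incr_sketch).
-export([diff/2]).

%% Old and New are maps from {Module, Function, Arity} to a hash of the
%% function body (a hypothetical cache layout for this sketch).
%% Only functions reported as added or changed would need reprocessing.
diff(Old, New) ->
    Added     = [K || K <- maps:keys(New), not maps:is_key(K, Old)],
    Removed   = [K || K <- maps:keys(Old), not maps:is_key(K, New)],
    Changed   = [K || {K, H} <- maps:to_list(New),
                      maps:is_key(K, Old),
                      maps:get(K, Old) =/= H],
    Unchanged = [K || {K, H} <- maps:to_list(New),
                      maps:is_key(K, Old),
                      maps:get(K, Old) =:= H],
    #{added     => lists:sort(Added),
      removed   => lists:sort(Removed),
      changed   => lists:sort(Changed),
      unchanged => lists:sort(Unchanged)}.
```

Because every function starts from a relative location, a cached entry for an unchanged function stays valid even when code above it in the same file has moved.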
Clone Elimination • Fully automatic clone elimination not desirable in practice. • Choice of clones to remove. • The functionality of the clone needs to be examined. • The anti-unification function of a clone class, and its parameters, need to be renamed. • A host module for the anti-unification function needs to be selected.
Clone Elimination with Wrangler • Copy and paste the anti_unification function to an appropriate Erlang module. • Modify the anti_unification function if necessary. • Rename function name. • Rename variable names. • Re-order function parameters. • Apply ‘fold expressions against a function definition’ to the new function.
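The effect of these steps can be illustrated on a small hand-written example. Everything here is hypothetical (module `clone_sketch`, the function names, the tax calculation); it shows the kind of result the workflow aims for, not output produced by Wrangler. Both versions are kept in one module so they can be compared.

```erlang
-module(clone_sketch).
-export([books_before/1, games_before/1,
         books_after/1, games_after/1, add_tax/2]).

%% Before: two clones that differ only in one literal.
books_before(Price) ->
    Tax = Price * 5 div 100,
    Price + Tax.

games_before(Price) ->
    Tax = Price * 20 div 100,
    Price + Tax.

%% The generalisation of the clone class, after renaming the function
%% and its parameters. The differing literal became the parameter Rate.
add_tax(Price, Rate) ->
    Tax = Price * Rate div 100,
    Price + Tax.

%% After folding: each clone instance is replaced by a call to the
%% new function with the instance-specific argument.
books_after(Price) -> add_tax(Price, 5).

games_after(Price) -> add_tax(Price, 20).
```

The refactored functions should behave exactly like the originals, which is why the functionality of each clone is examined before it is removed.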
SIP case study Session Initiation Protocol SIP message processing allows rewriting rules to transform messages. SIP message manipulation (SMM) is tested by smm_SUITE.erl, 2658 LOC.
Conclusions • Efficient clone detection on medium-sized projects. • Possible to improve code using these techniques, but only with expert involvement. • A mechanism for clone detection to contribute to the daily reports from incremental nightly builds; case-study for this with LambdaStream.
Future Work • To extend the tool to detect expression sequences which are similar up to insertion, or deletion of some expressions. • To check client code against libraries.
http://www.cs.kent.ac.uk/projects/wrangler/ Thank you!