Code duplication detection using ASTs • Sponsored by: Terence Parr • Do Te Kien, graduate student, University of San Francisco • dtkien@usfca.edu
Introduction • Code duplication detection within a single program (not a cheater detector). • A couple of reasons it matters: • Errors and bugs get copied along with the code, so the same bug can sit in more than one place, which makes it hard to find and fix everywhere. • Maintaining duplicated chunks of code is tedious.
Examples of code duplication • Formatting characters (whitespace, line breaks) don't matter!
Examples of code duplication • Variable names don't matter!
Overview • The program reads a directory tree of Java files and prints chains of duplicate code chunks, along with the files and line numbers that contain them. • Measures of equality: • Exact match, comparing normalized strings based on the AST printout • "Fuzzy" match, replacing all variables with ID before comparing
Examples • Normalized string in exact match • Normalized string in fuzzy match
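The two kinds of normalized string can be illustrated with a small sketch. This is not the project's code: the project parses Java with ANTLR, while the sketch below substitutes Python's built-in `ast` module as a stand-in parser, and the names `normalize` and `_RenameIds` are hypothetical.

```python
import ast


class _RenameIds(ast.NodeTransformer):
    """Replace every variable name with the placeholder "ID" (fuzzy mode)."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="ID", ctx=node.ctx), node)


def normalize(stmt_source, fuzzy=False):
    """Return a normalized string for one statement, based on its AST printout.

    Assumption: Python's ast module stands in for the ANTLR-built Java parser.
    """
    tree = ast.parse(stmt_source)
    if fuzzy:
        tree = _RenameIds().visit(tree)
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(tree, annotate_fields=False)


# Exact match: formatting differences disappear, variable names still count.
print(normalize("x = a + 1") == normalize("x   =   a+1"))              # True
print(normalize("x = a + 1") == normalize("y = b + 1"))                # False
# Fuzzy match: variable names no longer count.
print(normalize("x = a + 1", True) == normalize("y = b + 1", True))    # True
```

Because comparison happens on the AST printout rather than the raw text, whitespace and line breaks never reach the comparison step at all.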
Demo • Exact match demo • Fuzzy match demo
Algorithm • Step 1: Use ANTLR to normalize and categorize the statements • Step 2: Compare the statements pairwise and record the results in the diagonal matrix • Step 3: Walk the matrix to detect and collect chains of duplicated code • Step 4: Write the results to a file
Diagonal Matrix • There are three duplicated code blocks: f1(st1->st3); f2(st1->st3); f3(st1->st3)
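Steps 2 and 3 can be sketched as follows. This is a minimal illustration under assumptions, not the project's implementation: statements are taken to be already-normalized strings, the function name `find_duplicate_chains` and the `min_len` threshold are hypothetical, and the sample input imitates the f1/f2/f3 example with made-up statement strings.

```python
def find_duplicate_chains(stmts, min_len=2):
    """Return (i, j, length) triples where stmts[i:i+length] == stmts[j:j+length], i < j."""
    n = len(stmts)
    # Step 2: pairwise comparison; only the upper triangle (i < j) is filled in.
    match = [[i < j and stmts[i] == stmts[j] for j in range(n)] for i in range(n)]

    chains = []
    # Step 3: walk each diagonal (fixed offset d = j - i).
    # A run of consecutive matches along a diagonal is a chain of duplicated statements.
    for d in range(1, n):
        run_start = None
        for i in range(n - d + 1):  # one extra iteration flushes a run at the edge
            hit = i < n - d and match[i][i + d]
            if hit and run_start is None:
                run_start = i
            elif not hit and run_start is not None:
                length = i - run_start
                if length >= min_len:
                    chains.append((run_start, run_start + d, length))
                run_start = None
    return chains


# Three functions with different headers but the same three statements.
stmts = ["f1", "st1", "st2", "st3",
         "f2", "st1", "st2", "st3",
         "f3", "st1", "st2", "st3"]
print(find_duplicate_chains(stmts))
# [(1, 5, 3), (5, 9, 3), (1, 9, 3)]
```

Each triple pairs two of the three blocks, so the three triples together link the bodies of f1, f2, and f3. The differing function headers are what break the diagonals into separate runs.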
Discussion • How does the ANTLR approach improve code duplication detection? • Speed • Not only syntactic but also semantic
Acknowledgments • I would like to express my gratitude to Professor Terence Parr, who inspired me to work on this project! • Thank you for listening.