1. OMSE 535 Project 1 Presentation: Clone Detection Tools
Kevin Bates
2. Clone Detection Overview
Clone = duplicate code
Clone detection = static code analysis with the goal of locating clones
Helps to ensure the product is maintainable
Increase reuse
Decrease complexity
Particularly useful with a large codebase
3. Features/Goals of Tools
Find clones
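Type 1: Exact copy (apart from whitespace and comments)
Type 2: Syntactically identical copy; only identifiers (variable, type, or function names) were changed
Type 3: A copy with further modifications (statements changed, added, or removed)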
Different clone detection methods
Textual comparison
Token comparison
Metric comparison
Program dependency graph comparison
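The difference between the Type 1 and Type 2 definitions, and between textual and token comparison, can be illustrated with a small Python sketch. It is not how any of the tools listed later work internally. It assumes C-like source (`//` and `/* */` comments), uses a simplified regular-expression tokenizer rather than a real lexer, and treats a small hard-coded keyword set (`KEYWORDS`, a hypothetical choice) as names that may not be renamed.

```python
import re

# Simplified tokenizer for C-like code (an assumption, not a real lexer):
# identifiers, numbers, double-quoted strings, or any other single character.
TOKEN_RE = re.compile(r'[A-Za-z_]\w*|\d+(?:\.\d+)?|"(?:\\.|[^"\\])*"|\S')
# Line and block comments; does not handle "//" inside string literals.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

# Hypothetical keyword set that must match exactly; any other identifier
# may be renamed in a Type 2 clone.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "const"}


def tokens(code):
    """Strip comments and whitespace, leaving a list of tokens."""
    return TOKEN_RE.findall(COMMENT_RE.sub(" ", code))


def abstract_identifiers(toks):
    """Replace every non-keyword identifier with the placeholder 'ID'."""
    return [
        "ID" if (t[0].isalpha() or t[0] == "_") and t not in KEYWORDS else t
        for t in toks
    ]


def clone_type(a, b):
    """Return 1, 2, or None for a pair of code fragments."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return 1  # identical apart from layout and comments
    if abstract_identifiers(ta) == abstract_identifiers(tb):
        return 2  # same structure, only identifiers differ
    return None  # possibly Type 3 or unrelated; needs a similarity measure
```

For example, `int sum(int a, int b) { return a + b; }` and `int total(int x, int y) { return x + y; }` classify as Type 2, while changing `+` to `-` in the second fragment makes `clone_type` return `None`.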
4. General Process for Using the Tools
Select/identify the codebase
Configure tool
Different options by tool
Very helpful in obtaining better results
Execute the tool
Interpret the results
Tune the tool and repeat
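The configure, execute, interpret, and tune loop can be sketched with a toy text-comparison detector in Python. `find_duplicate_blocks`, `sweep`, and the `min_lines` parameter are hypothetical; `min_lines` stands in for the kind of minimum-block-size setting these tools tend to offer, not for any specific tool's option. Raising it is one simple tuning step: short, trivial matches (such as a lone pair of braces) drop out of the report.

```python
from collections import defaultdict


def normalize_line(line):
    """Text comparison: collapse runs of whitespace so layout is ignored."""
    return " ".join(line.split())


def find_duplicate_blocks(files, min_lines):
    """Map each run of `min_lines` consecutive non-blank lines that occurs
    more than once to its (file name, starting line number) locations.

    `files` maps file names to source text. Overlapping windows are
    reported separately; a real tool would merge them into longer clones.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        numbered = [(n, normalize_line(line))
                    for n, line in enumerate(text.splitlines(), start=1)]
        numbered = [(n, line) for n, line in numbered if line]  # skip blanks
        for k in range(len(numbered) - min_lines + 1):
            window = tuple(line for _, line in numbered[k:k + min_lines])
            seen[window].append((name, numbered[k][0]))
    return {w: locs for w, locs in seen.items() if len(locs) > 1}


def sweep(files, settings):
    """Run the detector once per setting and report how many duplicated
    windows each run finds (the 'tune and repeat' step)."""
    return {m: len(find_duplicate_blocks(files, m)) for m in settings}
```

Calling `sweep(files, (2, 4, 6))` on a small set of files, then inspecting which blocks survive at each setting, is a miniature version of tuning the tool and repeating.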
5. Duplo Summary Results
Summary information from Duplo and Simian was very similar, though not always reliable.
Prior to tuning, Duplo reported 400% duplicate code.
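We did not check how Duplo computes its percentage, so the following Python sketch only illustrates one way a duplicate-code figure can exceed 100%: if each copy of a block is counted once for every other copy it matches, the count grows roughly with the square of the number of copies. `duplicate_percentage` and its `pairwise` switch are hypothetical.

```python
def duplicate_percentage(block_lines, copies, total_lines, pairwise):
    """Percentage of duplicated code for one block that appears `copies`
    times in a codebase of `total_lines` lines (hypothetical accounting)."""
    if pairwise:
        # Each copy is counted once for every other copy it matches.
        duplicated = block_lines * copies * (copies - 1)
    else:
        # Each duplicated line is counted exactly once.
        duplicated = block_lines * copies
    return 100.0 * duplicated / total_lines
```

With a 10-line block copied into 5 places in a 100-line codebase, the distinct count gives 50%, while the pairwise count gives 200%.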
6. Duplo Detailed Results
Many results come back for simple matches like the ones shown on this slide, which are common when dealing with a family of COM classes. This is not valuable feedback.
A token comparison tool might produce better results than a text comparison tool.
Simian produced similar results, but without the corresponding code snippets, which makes it more difficult to use. It does integrate with Eclipse (a Java IDE), which may provide better support.
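If a tool reports many matches that consist mostly of braces and one-line returns, a post-filter can hide them before anyone reviews the list. The Python sketch below is hypothetical: `is_substantial`, `filter_matches`, the rough tokenizer, and both thresholds are assumptions to be tuned per codebase, not settings of Duplo or Simian.

```python
import re

# Rough tokenizer: words (identifiers, keywords, numbers) or single symbols.
TOKEN_RE = re.compile(r"\w+|\S")


def is_substantial(block_lines, min_tokens_per_line=4, min_substantial_lines=3):
    """Keep a reported clone only if at least `min_substantial_lines` of its
    lines contain `min_tokens_per_line` or more tokens."""
    substantial = sum(
        1 for line in block_lines
        if len(TOKEN_RE.findall(line)) >= min_tokens_per_line
    )
    return substantial >= min_substantial_lines


def filter_matches(matches, **thresholds):
    """Drop matches (each a list of source lines) that are mostly trivial."""
    return [m for m in matches if is_substantial(m, **thresholds)]
```

Counting tokens rather than characters keeps the filter insensitive to formatting, in the same spirit as the suggestion that token comparison may work better than text comparison.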
7. Better Duplo Results?
Here is a text comparison that found a Type 1 clone in three different files.
This code could use some refactoring.
8. CCFinder Scatter Plot
Horrible user interface for a large number of files. We were working with more than a thousand files, all plotted against each other here.
Might be a practical UI for a small set of files.
The tool does let you narrow the set to a subset of the files, but deciding which files to keep is difficult.
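One way to choose a subset for the scatter plot, assuming the detector's clone pairs can be exported as `(file_a, file_b)` tuples, is to rank files by how many reported clone pairs they take part in and keep only the top few. `files_to_keep` is a hypothetical helper written for illustration, not a CCFinder feature.

```python
from collections import Counter


def files_to_keep(clone_pairs, n):
    """Return the `n` files involved in the most clone pairs.

    `clone_pairs` is an iterable of (file_a, file_b) tuples; a pair where
    both sides are the same file (an intra-file clone) counts once.
    """
    counts = Counter()
    for a, b in clone_pairs:
        counts[a] += 1
        if b != a:
            counts[b] += 1
    # most_common breaks ties by first appearance (insertion order).
    return [name for name, _ in counts.most_common(n)]
```

Re-running the visualization on only the top-ranked files gives a plot small enough to read, at the cost of hiding files that have only a few clones.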
9. CCFinder Source Compare
Good news: CCFinder found some Type 2 clones.
Bad news: interpreting the results is still difficult. In this case, it is doing a side-by-side comparison of a file with itself.
10. Benefits (Pros)
Facilitates reducing duplicate code
Different comparison techniques can be used to locate different types of clones
Most tools can compare large codebases
My tests used a codebase of over 1,000 files and 140 KLOC
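Taking 1,000 files and 140 KLOC as round figures, a quick calculation shows why scale affects both performance and presentation. The files are small on average, but the number of file pairs a tool (or a file-by-file scatter plot) has to cover grows quadratically with the number of files:

```latex
\frac{140{,}000 \text{ lines}}{1{,}000 \text{ files}} = 140 \text{ lines per file (average)}

\binom{1000}{2} = \frac{1000 \times 999}{2} = 499{,}500 \text{ distinct file pairs}

1000 \times 1000 = 1{,}000{,}000 \text{ cells in a full file-by-file plot}
```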
11. Drawbacks (Cons)
Tools are difficult to use
Require tuning to obtain usable results
Results are not often easy to interpret
Performance problems
Have not verified the execution-time impact
Studies have shown that tools do not do a good job of finding Type 3 clones
No one tool really does the job by itself
Somewhat limited language support
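Type 3 clones are hard to find because statements have been added, removed, or changed, so exact or identifier-abstracted matching fails. A common alternative is a similarity score with a threshold. The Python sketch below uses the standard library's `difflib.SequenceMatcher` on token sequences; the tokenizer and the 0.8 threshold are assumptions, and this is not necessarily the technique used by any of the tools discussed here. Comparing every pair of fragments this way is also quadratic in the number of fragments, which hints at why performance is a concern.

```python
import difflib
import re

# Rough tokenizer: identifiers, integers, or any other single character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_similarity(a, b):
    """Similarity in [0, 1] between the token sequences of two fragments."""
    ta, tb = TOKEN_RE.findall(a), TOKEN_RE.findall(b)
    return difflib.SequenceMatcher(None, ta, tb).ratio()


def is_type3_candidate(a, b, threshold=0.8):
    """Flag a pair as a possible gapped (Type 3) clone when its similarity
    reaches `threshold` (0.8 is an arbitrary, hypothetical cut-off)."""
    return token_similarity(a, b) >= threshold
```

An inserted logging statement, for instance, lowers the similarity only slightly, so the pair is still flagged, whereas unrelated fragments score far below the threshold.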
12. Availability and Cost
Some are experimental and appear not to be available for general use (e.g., Dup, Duplix)
Most others are free to download
Labs
Universities
Some have a cost for commercial use (e.g., Simian)
13. Summary of Product Evaluation
Still an emerging technology
Difficult to use, tune, interpret
Can be valuable if you work with it
14. Tool Links and References
Bellon et al., "Comparison and Evaluation of Clone Detection Tools," IEEE Transactions on Software Engineering, Vol. 33, No. 9, September 2007. http://portal.acm.org/citation.cfm?id=1314037.1314085&coll=&dl=GUIDE&CFID=15151515&CFTOKEN=6184618
http://www.cis.uab.edu/tairasr/clones/literature/
Some tools:
CCFinder (token comparison): http://www.ccfinder.net
Duplo (text comparison): http://sourceforge.net/projects/duplo/
Simian (text comparison): http://www.redhillconsulting.com.au/products/simian/index.html
Duploc (text comparison): http://www.iam.unibe.ch/~scg/Research/Duploc/index.html