
OMSE 535 Project 1 Presentation

Clone Detection Overview. Clone = duplicate code. Clone detection = static code analysis with the goal of locating clones. It helps to ensure the product is maintainable by increasing reuse and decreasing complexity, and is particularly useful with a large codebase.


Presentation Transcript


    1. OMSE 535 Project 1 Presentation: Clone Detection Tools. Kevin Bates

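    2. Clone Detection Overview. Clone = duplicate code. Clone detection = static code analysis with the goal of locating clones. Helps to ensure the product is maintainable: increase reuse, decrease complexity. Particularly useful with a large codebase.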

    3. Features/Goals of Tools. Find clones: Type 1, an exact copy; Type 2, a syntactically identical copy; Type 3, a copy with further modifications. Different clone detection methods: textual comparison, token comparison, metric comparison, program dependency graph comparison.
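The difference between textual and token comparison on slide 3 can be illustrated with a minimal Python sketch. It is not how Duplo, Simian, or CCFinder are implemented: the tokenizer, the tiny `KEYWORDS` set, and the names `normalize_text`, `normalize_tokens`, and `classify` are all assumptions made for illustration, and the Type 2 check does not verify that identifiers were renamed consistently.

```python
import re

# Hypothetical, deliberately tiny keyword set for a C-like language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}

# Identifiers, then numbers, then any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def normalize_text(code):
    """Textual comparison key: drop blank lines and collapse whitespace."""
    lines = (" ".join(line.split()) for line in code.splitlines())
    return "\n".join(line for line in lines if line)


def normalize_tokens(code):
    """Token comparison key: replace identifiers and numbers with placeholders."""
    out = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            out.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def classify(a, b):
    """Label a pair of fragments using only the two checks above."""
    if normalize_text(a) == normalize_text(b):
        return "type 1"
    if normalize_tokens(a) == normalize_tokens(b):
        return "type 2"
    return "no clone found by these two checks"


ORIGINAL = "int sum(int a, int b) {\n    return a + b;\n}"
EXACT_COPY = "int sum(int a, int b) {\n\treturn a + b;\n}\n"
RENAMED = "int add(int x, int y) {\n    return x + y;\n}"
MODIFIED = "int add(int x, int y) {\n    log(x);\n    return x + y;\n}"

print(classify(ORIGINAL, EXACT_COPY))  # type 1
print(classify(ORIGINAL, RENAMED))     # type 2
print(classify(ORIGINAL, MODIFIED))    # no clone found by these two checks
```

The last pair is a Type 3 clone (a copy with an added statement), and neither check catches it, which is why such clones need a fuzzier comparison.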

    4. General Process Using Tools. (1) Select/identify the codebase. (2) Configure the tool: options differ by tool, and setting them is very helpful in obtaining better results. (3) Execute the tool. (4) Interpret the results. (5) Tune the tool and repeat.
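The configure, execute, interpret, and tune loop on slide 4 can be sketched with a toy line-based detector in Python. This is an illustration only, not the algorithm of any tool in this presentation; the function `find_duplicate_blocks`, its `min_lines` parameter, and the two sample files are hypothetical. The tunable here is the minimum number of consecutive identical lines that counts as a clone.

```python
def _same(a, x, b, y):
    """True when both lines are non-blank and identical."""
    return a[x] != "" and a[x] == b[y]


def _compare(name_a, a, name_b, b, min_lines):
    found = []
    for x in range(len(a)):
        for y in range(len(b)):
            if not _same(a, x, b, y):
                continue
            if x > 0 and y > 0 and _same(a, x - 1, b, y - 1):
                continue  # inside a run that was already reported from its start
            length = 0
            while (x + length < len(a) and y + length < len(b)
                   and _same(a, x + length, b, y + length)):
                length += 1
            if length >= min_lines:
                # Report 1-based starting line numbers and the run length.
                found.append((name_a, x + 1, name_b, y + 1, length))
    return found


def find_duplicate_blocks(files, min_lines):
    """Find runs of at least min_lines identical lines between each pair of files.

    files maps a file name to its source text.
    """
    norm = {name: [" ".join(line.split()) for line in text.splitlines()]
            for name, text in files.items()}
    names = sorted(norm)
    results = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            results.extend(_compare(a, norm[a], b, norm[b], min_lines))
    return results


FILES = {
    "a.c": """#include <stdio.h>
#include <stdlib.h>

int total(int *v, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += v[i];
    return s;
}
""",
    "b.c": """#include <stdio.h>
#include <stdlib.h>

void hello(void) { puts("hi"); }

int total(int *v, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += v[i];
    return s;
}
""",
}

# First run: a low threshold also reports the shared #include lines.
print(find_duplicate_blocks(FILES, min_lines=2))
# After tuning: only the copied function remains.
print(find_duplicate_blocks(FILES, min_lines=4))
```

Raising `min_lines` from 2 to 4 drops the two shared include lines and keeps the five-line copied function, which is the kind of tuning step the process calls for.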

    5. Duplo Summary Results. Summary information from Duplo and Simian was very similar, though not always reliable. Prior to tuning, Duplo reported 400% duplicate code.

    6. Duplo Detailed Results. A lot of results come back for simple comparisons like the ones shown here, which are common when dealing with a family of COM classes. This is not valuable feedback. Using a token comparison tool as opposed to a text comparison tool might produce better results. Simian produced similar results, but without the corresponding code snippet. That makes the tool more difficult to use, though it integrates with Eclipse (Java IDE), and that may provide better support.

    7. Better Duplo Results? Here is a text comparison that found a Type 1 clone in three different files. This code could use some refactoring.

    8. CCFinder Scatter Plot. Horrible user interface for a large number of files. We were working with more than a thousand files, which are plotted against each other here. It might be a practical UI for a small set of files. The tool does let you refine your set to a subset of the files, but deciding which files to reduce your set to is not easy.
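The idea of plotting files against each other, as on slide 8, can be shown with a coarse text-only Python sketch. It is not CCFinder's plot, which marks clone locations rather than whole files; `clone_matrix`, `render`, the `min_shared` parameter, and the sample files are assumptions for illustration. Each cell is marked when two files share at least `min_shared` identical non-blank lines.

```python
def clone_matrix(files, min_shared=1):
    """Return sorted file names and a grid marking file pairs that share lines."""
    names = sorted(files)
    line_sets = {
        name: {" ".join(line.split())
               for line in files[name].splitlines() if line.strip()}
        for name in names
    }
    grid = []
    for a in names:
        row = []
        for b in names:
            shared = len(line_sets[a] & line_sets[b])
            row.append(shared >= min_shared)
        grid.append(row)
    return names, grid


def render(grid):
    """Draw the grid as text: '#' for a pair with shared lines, '.' otherwise."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


SAMPLE = {
    "x.c": "int a;\nint b;\n",
    "y.c": "int b;\nint c;\n",
    "z.c": "float q;\n",
}

names, grid = clone_matrix(SAMPLE)
print(names)
print(render(grid))
```

With three files the grid is three characters wide and easy to read. With more than a thousand files it becomes a grid of over a million cells, which suggests why the plot is hard to use at that scale.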

    9. CCFinder Source Compare. Good news: CCFinder found some Type 2 clones. Bad news: interpreting the results is still difficult. In this case, it is doing a side-by-side comparison of the same file.

    10. Benefits (Pros). Facilitates reducing duplicate code. Different comparison techniques can be used to locate different types of clones. Most tools can compare large codebases: my tests used a codebase of over 1000 files and 140 KLOC.

    11. Drawbacks (Cons). Tools are difficult to use: they require tuning to obtain usable results, and results are often not easy to interpret. Performance problems. Studies have shown that tools don't do a very good job of finding Type 3 clones. No one tool really does the job by itself. Somewhat limited language support. Have not verified execution time impact.

    12. Availability and Cost. Some are experimental and appear not to be available for general use (e.g. Dup, Duplix). Most others are free to download from labs and universities. Some have a cost for commercial use (e.g. Simian).

    13. Summary of Product Evaluation. Still an emerging technology. Difficult to use, tune, and interpret. Can be valuable if you work with it.

    14. Tool Links and References.
        Bellon et al., Comparison and Evaluation of Clone Detection Tools, IEEE Transactions on Software Engineering, Vol. 33, No. 9, September 2007. http://portal.acm.org/citation.cfm?id=1314037.1314085&coll=&dl=GUIDE&CFID=15151515&CFTOKEN=6184618
        http://www.cis.uab.edu/tairasr/clones/literature/
        Some tools:
        CCFinder (token): http://www.ccfinder.net
        Duplo (text): http://sourceforge.net/projects/duplo/
        Simian (text): http://www.redhillconsulting.com.au/products/simian/index.html
        Duploc (text): http://www.iam.unibe.ch/~scg/Research/Duploc/index.html
