Finding File Clones in FreeBSD Ports Collection
Yusuke Sasaki, Tetsuo Yamamoto, Yasuhiro Hayase, Katsuro Inoue
File Clones
• Two or more files with the same content
• Comments and code indentation are ignored
• Clones may occur inside one project or between different projects
• Research about file clones is scarce
• Goal: gain new knowledge about file clones
(Figure: Project A and Project B each contain a file with the same content: int main() { printf("Hello msr!"); return 0; })
FCFinder
• Input: .c and .h files
• Output: file-clone sets
• Faster than other tools
• Detection:
  • Tokenization
  • MD5 hash calculation
  • Exact matching
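The three detection steps (tokenize, hash with MD5, match exactly) can be sketched in Python as below. This is not FCFinder's code: the regex tokenizer and the names `tokenize`, `fingerprint` and `find_clone_sets` are hypothetical, and a production C tokenizer would need to handle more cases (preprocessor directives, multi-character operators, trigraphs). The sketch only shows why dropping comments and whitespace before hashing makes two differently formatted copies collide on the same MD5 digest.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical, simplified C tokenizer. Alternatives are tried in order,
# so string/char literals are consumed before "//" or "/*" inside them
# could be mistaken for comments.
_TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'      # string literal
    r"|'(?:\\.|[^'\\])*'"     # character literal
    r"|/\*.*?\*/"             # block comment
    r"|//[^\n]*"              # line comment
    r"|[A-Za-z_]\w*"          # identifier or keyword
    r"|\d[\w.]*"              # number (loosely matched)
    r"|\S",                   # any other single non-space character
    re.DOTALL,
)


def tokenize(source: str) -> list[str]:
    """Split C source into tokens; comments and whitespace are dropped."""
    tokens = []
    for match in _TOKEN_RE.finditer(source):
        tok = match.group(0)
        if tok.startswith("/*") or tok.startswith("//"):
            continue  # comments are ignored
        tokens.append(tok)
    return tokens


def fingerprint(source: str) -> str:
    """MD5 digest of the normalized token sequence (one token per line)."""
    normalized = "\n".join(tokenize(source))
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def find_clone_sets(files: dict[str, str]) -> list[list[str]]:
    """Group paths whose fingerprints match exactly; keep groups of 2+."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for path, text in files.items():
        buckets[fingerprint(text)].append(path)
    return [sorted(paths) for paths in buckets.values() if len(paths) > 1]


# Usage with made-up files: projB differs only in comments and layout.
files = {
    "projA/hello.c": 'int main() { printf("Hello msr!"); return 0; }\n',
    "projB/hello.c": '/* copy */\nint main()\n{\n    printf("Hello msr!");\n    return 0;\n}\n',
    "projC/other.c": "int main() { return 1; }\n",
}
print(find_clone_sets(files))  # [['projA/hello.c', 'projB/hello.c']]
```

Hashing whole token streams means each file is read once and matching reduces to grouping by digest, which is one plausible reason a hash-based exact matcher can scale to a collection of this size.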
Experiment
• Target: only .c and .h files in the FreeBSD Ports Collection
  • ~1.4M files
  • ~12 GB
  • Analysis time: 17.16 hours
• We measured:
  • File size
  • Number of files in each project
  • Size of each file-clone set
  • Number of file clones in each project
These values follow a power law.
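One common way to check a power-law claim and obtain an R² value like those reported on the following slides is a least-squares line fit in log-log space. The slides do not say how their fits were computed, so this is only a sketch under that assumption; `size_distribution` and `loglog_fit` are hypothetical names, and the data in the usage example is synthetic, not from the study.

```python
import math
from collections import Counter


def size_distribution(clone_sets):
    """Return (sizes, counts): how many clone sets have each set size."""
    counts = Counter(len(s) for s in clone_sets)
    sizes = sorted(counts)
    return sizes, [counts[s] for s in sizes]


def loglog_fit(xs, ys):
    """Fit log10(y) = a + b*log10(x) by least squares; return (a, b, r2).

    A power law y = C * x**b appears as a straight line with slope b
    in log-log space; r2 measures how well that line fits.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(lx, ly))
    ss_tot = sum((y - mean_y) ** 2 for y in ly)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic check: an exact power law y = 1000 * x**-2 gives slope -2, R^2 = 1.
xs = list(range(2, 11))
ys = [1000 * x ** -2 for x in xs]
a, b, r2 = loglog_fit(xs, ys)
print(round(b, 3), round(r2, 3))  # -2.0 1.0
```

Real measurements scatter around the line, so R² falls below 1; a log-log regression is also a fairly weak test of a power law, and maximum-likelihood estimators are often preferred when rigor matters.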
File-clone Set Size
(Figure: x-axis "file-clone set size", R² = 0.8508. Annotated sets: Left, used in PHP5 (650 sets, 61 file clones); Right, used in PHP4 (500 sets, 59 file clones); used in both PHP4 and PHP5 (419 sets, 120 file clones).)
File-clones per Project
(Figure: axis "number of file-clone sets", ticks from 5 to 10K, R² = 0.8263. Annotated groups: Left, PHP5 modules; Center, projects related to binutils; Right, PHP4 modules.)
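Counting file clones per project needs a mapping from each file path to the project (port) it belongs to. The sketch below assumes, hypothetically, that the first path component names the project, and counts how many of a project's files fall into any clone set; the slides do not state the exact definition, and the function names and paths are illustrative only.

```python
from collections import Counter


def project_of(path: str) -> str:
    """Hypothetical mapping: the first path component names the project."""
    return path.split("/", 1)[0]


def clones_per_project(clone_sets):
    """For each project, count its files that belong to some clone set."""
    counts = Counter()
    for paths in clone_sets:
        for path in paths:
            counts[project_of(path)] += 1
    return counts


# Made-up clone sets for illustration.
clone_sets = [
    ["php5/a.c", "php4/a.c", "php5/b.c"],
    ["gcc41/x.c", "gfortran/x.c"],
]
print(dict(clones_per_project(clone_sets)))
```

The resulting counts can be fed into the same frequency-and-fit procedure as the set sizes to check for a power-law shape.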
File-clones Between Projects (1/3)
• Nodes represent projects
• An edge between two projects shows the number of file clones they share
• Example: gcc41 and gfortran share 7691 file clones
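The project graph can be built from the clone sets by adding one edge increment for every pair of distinct projects that appear together in a set. The exact edge definition is an assumption here: this sketch counts clone sets shared by two projects, whereas the slides speak of shared file clones, so a file-based count would change the increment. The path-to-project rule and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_of(path: str) -> str:
    """Hypothetical mapping: the first path component names the project."""
    return path.split("/", 1)[0]


def shared_clone_edges(clone_sets):
    """Weight each unordered project pair by the clone sets both appear in."""
    edges = Counter()
    for paths in clone_sets:
        projects = sorted({project_of(p) for p in paths})
        for a, b in combinations(projects, 2):
            edges[(a, b)] += 1
    return edges


# Made-up clone sets; the last one lies inside a single project and adds no edge.
clone_sets = [
    ["gcc41/x.c", "gfortran/x.c"],
    ["gcc41/y.h", "gfortran/y.h", "gcc42/y.h"],
    ["gcc41/z.c", "gcc41/z2.c"],
]
for (a, b), weight in sorted(shared_clone_edges(clone_sets).items()):
    print(a, b, weight)
```

Sorting project names before pairing keeps each undirected edge under one canonical key, so (gcc41, gfortran) and (gfortran, gcc41) are never counted separately.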
File-clones Between Projects (2/3)
• Nodes represent projects
• An edge between two projects shows the number of file clones they share
File-clones Between Projects (3/3)
• Nodes represent projects
• An edge between two projects shows the number of file clones they share
Conclusions & Future Work
Conclusions
• Measured several features of the FreeBSD Ports Collection
• Found that the measured features follow a power law
Future Work
• Investigate logical coupling between projects