Towards an Analysis of Who Creates Clone and Who Reuses it. ○ Takuya Moriwaki † Hiroshi Igaki † Yuki Yamanaka † Norihiro Yoshida †† Katsuro Inoue † Shinji Kusumoto †. † Graduate School of Information Science and Technology, Osaka University †† Nara Institute of Science and Technology.
Background
• Source code reuse can be a reasonable decision [1]
  • It can improve the productivity and reliability of software development
• However, code reuse is considered difficult [2]
• How developers reuse existing source code
  • Sojer and Henkel reported a survey-based empirical study of OSS development [3]
    • Developers who believe in the benefits of reuse rely more on existing code
    • Developers with experience in a greater number of projects reuse existing code more
• These results motivated us to quantitatively clarify individual differences among developers in source code reuse, using code clone detection

[1] C. Kapser, M. W. Godfrey. "Cloning considered harmful" considered harmful: patterns of cloning in software. Empirical Software Engineering 13(6):645–692, December 2008.
[2] W. Tracz. Confessions of a used-program salesman: Lessons learned. In Proceedings of the 1995 Symposium on Software Reusability, SSR '95:11–13, 1995.
[3] M. Sojer, J. Henkel. Code Reuse in Open Source Software Development: Quantitative Evidence, Drivers, and Impediments. Journal of the Association for Information Systems 11(12):868–901, 2010.
Objectives
• Analyze repositories across multiple projects
  • Developers may join multiple projects
• Investigate the reuse behavior of each developer
  • Who reuses existing code, and who originally implemented the reused code
[Figure: developers in one organization working across Project A and Project B; one developer implements a code fragment (author) and other developers clone it (users).]
Overview of Our Reuse Analysis
• STEP1: Repository Concatenation
  • Input: multiple repositories
  • Output: composite repository
• STEP2: Derive Clone Genealogies
  • Input: composite repository
  • Output: clone genealogies
• STEP3: Extract individual reuse behaviors
  • Input: clone genealogies
  • Output: clone set author and clone set users
[Figure: the three steps in sequence, with clone genealogies traced over rev. i to rev. i+3.]
STEP1: Repository Concatenation
• Every revision is merged sequentially along the time axis
• We can check out files from multiple repositories
[Figure: Repository A (Rev. A1 at t1 with a1.java, Rev. A2 at t3 with a1'.java) and Repository B (Rev. B1 at t2 with b1.java, Rev. B2 at t4 with b1.java and b2.java) are merged into a composite repository with Rev. 1 (t1) to Rev. 4 (t4).]
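The concatenation step can be illustrated with a minimal sketch. It assumes each repository history is already available as a time-ordered list of revisions with a full file snapshot; the `Revision` class, the `concatenate` function, and the repo-name path prefix are hypothetical names chosen for illustration, not the authors' tool. The sample data mirrors the slide's figure, with the change from a1.java to a1'.java modeled as a content change.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass
class Revision:
    repo: str        # name of the source repository (hypothetical field)
    rev_id: str      # revision label inside that repository
    timestamp: int   # commit time; any comparable value works
    files: dict      # path -> content after this revision


def concatenate(*histories):
    """Merge time-ordered histories into one composite history.

    Each composite revision contains the latest files of every
    repository seen so far, so files from all repositories can be
    checked out together.
    """
    composite = []
    latest = {}  # repo name -> files of its most recent revision
    for rev in merge(*histories, key=lambda r: r.timestamp):
        latest[rev.repo] = rev.files
        snapshot = {
            f"{repo}/{path}": content
            for repo, files in latest.items()
            for path, content in files.items()
        }
        composite.append((len(composite) + 1, rev.timestamp, snapshot))
    return composite


# Sample histories following the slide: A commits at t1 and t3, B at t2 and t4.
repo_a = [
    Revision("A", "A1", 1, {"a1.java": "v1"}),
    Revision("A", "A2", 3, {"a1.java": "v2"}),
]
repo_b = [
    Revision("B", "B1", 2, {"b1.java": "v1"}),
    Revision("B", "B2", 4, {"b1.java": "v1", "b2.java": "v1"}),
]
composite = concatenate(repo_a, repo_b)
for number, timestamp, snapshot in composite:
    print(number, timestamp, sorted(snapshot))
```

Prefixing each path with the repository name is one way to keep files with the same name in different repositories apart; the slides do not say how the original tooling handles this.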
STEP2: Derive Clone Genealogy
• Derive clone genealogies from the repository [4]
  • Three evolution patterns, "Same", "Add", and "Subtract", are used to identify reuse
  • Reuse behavior is extracted by detecting the evolution pattern "Add"
• The target is type-2 clones
[Figure: clone sets in rev. i to rev. i+3 of the repository, linked by overlapping relations and labeled with the evolution patterns Same, Add, and Subtract.]

[4] M. Kim, V. Sazawal, D. Notkin, G. Murphy. An Empirical Study of Code Clone Genealogies. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, pp. 187–196, 2005.
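The three evolution patterns can be sketched as a comparison of one clone set between consecutive revisions. This is a simplified illustration, not the genealogy extractor of [4]: it assumes code snippets have already been mapped across revisions and carry stable identifiers, and the function names `evolution_patterns` and `reuse_events` are hypothetical.

```python
def evolution_patterns(old_set, new_set):
    """Classify how a clone set changed from rev. i to rev. i+1.

    Returns a set of labels, since snippets can be added and
    removed within the same revision.
    """
    old_set, new_set = frozenset(old_set), frozenset(new_set)
    patterns = set()
    if new_set - old_set:
        patterns.add("Add")        # at least one new snippet joined
    if old_set - new_set:
        patterns.add("Subtract")   # at least one snippet left
    if not patterns:
        patterns.add("Same")       # membership unchanged
    return patterns


def reuse_events(genealogy):
    """Return (revision index, added snippets) for every "Add" step.

    genealogy: list of clone sets (sets of snippet ids), one per revision.
    """
    events = []
    for i in range(1, len(genealogy)):
        added = set(genealogy[i]) - set(genealogy[i - 1])
        if added:
            events.append((i, added))
    return events


# A clone set followed over rev. i .. rev. i+3 (hypothetical snippet ids).
genealogy = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
steps = [evolution_patterns(genealogy[i - 1], genealogy[i])
         for i in range(1, len(genealogy))]
events = reuse_events(genealogy)
print(steps)
print(events)
```

Each "Add" event marks a revision in which a snippet joined an existing clone set, which is the situation the slides treat as reuse.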
STEP3: Identify Clone Set Author
• Use the blame command of the VCS to identify reuse behaviors
  • Who implemented the source code, and when
• Detect clone set authors and clone set users
  • Clone set author: the developer who implemented the code snippet that was developed first in a clone genealogy
  • Clone set user: a developer who reused the code snippet
[Figure: a clone genealogy over rev. i to rev. i+3 annotated with blame results (A: 2013/1/1, B: 2013/2/2, C: 2013/3/3); Developer A is the clone set author, and Developers B and C are clone set users.]
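Given blame results for the snippets of one clone genealogy, the author and users can be picked out as in the following sketch. It assumes the blame output has already been reduced to (developer, date) pairs, one per snippet; the function name `author_and_users` is hypothetical, and counting a later snippet by the author as self-reuse is an assumption the slides do not settle.

```python
from datetime import date


def author_and_users(blame_records):
    """Split the developers of a clone genealogy into author and users.

    blame_records: list of (developer, date) pairs, one per snippet.
    The developer of the earliest snippet is the clone set author;
    developers of all later snippets are clone set users. If two
    snippets share the earliest date, the one listed first wins.
    """
    ordered = sorted(blame_records, key=lambda record: record[1])
    author = ordered[0][0]
    users = []
    for developer, _ in ordered[1:]:
        if developer not in users:  # report each user once
            users.append(developer)
    return author, users


# Blame dates as shown in the slide's figure.
records = [
    ("B", date(2013, 2, 2)),
    ("A", date(2013, 1, 1)),
    ("C", date(2013, 3, 3)),
]
author, users = author_and_users(records)
print(author, users)
```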
Pilot Case Study
• Derive clone genealogies by using CCFinder
• Identify clone set authors and users
[Table: eclipse.pde 156; eclipse.platform.text 5; remaining labels: Total, Inter-project.]
Result
• Some developers who actively reuse code also have their own code frequently reused, while there are developers who never reuse code
• Although developer A made more than twice as many commits as developer B, A reused code fewer times than B
• Developers with fewer than 10 commits were omitted
• This result shows that there is little relationship among #commit, #creation, and #reuse
[Table: results per developer.]
Conclusion
• We proposed a source code reuse analysis over multiple projects that takes differences among developers into consideration
• We conducted a pilot case study
  • A larger case study is currently being performed
• We are planning to create a system that promotes source code reuse
  • Calculate an impact factor for source code reuse
  • Visualize reuse relationships
[Figure: Reuse Promoting System. Developers commit to the repository and reuse code clones; the system analyzes the repository and informs developers of the analysis result.]