Extraction of Evolution History from Software Source Code Using Linear Counting

Speaker: Liu Shuchang, Osaka University

Background
• In daily software development, existing code is copied and edited to create product variants
• This copy-and-edit practice drives software evolution

Evolution History Example
[Figure: an example evolution history of product variants]
• only source code ❌

Introduction
• Evolution History Recovery
  • of product variants
  • using only their source code
• Evolution Tree
  • vertex: a variant
  • edge: a derived relation (the most similar pair)
  • key: product similarity
• Previous Study
  • diff-based (file-to-file similarity)
  • time-consuming (worst case: 2 days)
• Linear Counting Algorithm
  • estimates similarity instead of calculating it exactly
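To make "estimating instead of calculating" concrete, here is what calculating the similarity exactly could look like. This minimal Python sketch treats each variant as a multiset of source lines and computes the multiset Jaccard index directly. The helper names `line_multiset` and `exact_jaccard`, the whitespace stripping, and the toy variants are illustrative assumptions; this is not the diff-based method of the previous study.

```python
from collections import Counter

def line_multiset(source: str) -> Counter:
    # Assumption: each non-blank line, stripped of surrounding whitespace, is one
    # element; repeated lines keep their multiplicity.
    return Counter(line.strip() for line in source.splitlines() if line.strip())

def exact_jaccard(a: Counter, b: Counter) -> float:
    """Exact multiset Jaccard index |A ∩ B| / |A ∪ B|.

    Counter & keeps the minimum count of each element (multiset intersection);
    Counter | keeps the maximum count (multiset union)."""
    union = sum((a | b).values())
    if union == 0:
        return 1.0  # convention: two empty variants count as identical
    return sum((a & b).values()) / union

variant_a = "int x = 0;\nx++;\nx++;\nreturn x;\n"
variant_b = "int x = 0;\nx++;\nreturn x + 1;\n"
print(exact_jaccard(line_multiset(variant_a), line_multiset(variant_b)))  # 0.4
```

Here the intersection holds one `int x = 0;` and one `x++;` (2 elements) and the union holds 5 elements, giving 0.4.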

Linear Counting Algorithm
[Figure: an example of the Linear Counting Algorithm]
• bitmap size m = 8
• zero bits z = 2
• estimate: -m × ln(z/m) = -8 × ln(2/8) ≈ 11.09
• actual cardinality: 11
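Linear Counting inverts the expected fraction of zero bits: after n distinct items are hashed into m bits, about e^(-n/m) of the bits stay zero, so n ≈ -m × ln(z/m) = m × ln(m/z). Below is a minimal Python sketch of the estimator, assuming the bitmap is held in a Python int (bit i set means position i was hit); the name `linear_count` is hypothetical.

```python
import math

def linear_count(bitmap: int, m: int) -> float:
    """Linear Counting estimate of the number of distinct items hashed into an
    m-bit bitmap: n_hat = -m * ln(z / m) = m * ln(m / z), z = number of zero bits."""
    if bitmap < 0 or bitmap >> m:
        raise ValueError("bitmap has bits outside positions 0..m-1")
    zeros = m - bin(bitmap).count("1")
    if zeros == 0:
        # Every bit is set: the estimate diverges, so the bitmap is too small.
        raise ValueError("bitmap is full; use a larger bitmap")
    return m * math.log(m / zeros)

# The slide's example: an 8-bit bitmap with 6 bits set and 2 bits still zero.
example = 0b10111011
print(round(linear_count(example, 8), 4))  # 11.0904
```

Rounded to four places this gives 11.0904; the slide's 11.0903 is the same value truncated.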

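Estimate Product Similarity
• Initialization: a hash function maps each element of multiset A and multiset B into bitmap A and bitmap B
• Bitwise operators: bitmap A∩B = bitmap A AND bitmap B; bitmap A∪B = bitmap A OR bitmap B
• Linear Counting: LC(A∩B) and LC(A∪B) are estimated from these two bitmaps
• Similarity (Jaccard index): |A∩B| / |A∪B| ≈ LC(A∩B) / LC(A∪B)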
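The steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: BLAKE2b from `hashlib` stands in for MurmurHash3 (which is not in Python's standard library), Python ints serve as bitmaps for readability rather than speed, the bitmap size `M` is a small hypothetical value, and all function names are illustrative.

```python
import hashlib
import math

M = 1 << 16  # hypothetical bitmap size; the slides' best configuration uses 128,000,000 bits

def bit_index(element: str, m: int) -> int:
    # Stand-in hash: the slides use MurmurHash3, which Python's standard library
    # lacks, so this sketch derives a bit position from BLAKE2b instead.
    digest = hashlib.blake2b(element.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m

def build_bitmap(elements, m: int) -> int:
    # Initialization: set one bit per element. Duplicates hit the same bit, so the
    # bitmap records which distinct elements occur, not how often.
    bitmap = 0
    for element in elements:
        bitmap |= 1 << bit_index(element, m)
    return bitmap

def linear_count(bitmap: int, m: int) -> float:
    # Linear Counting: n_hat = m * ln(m / z), z = number of zero bits.
    zeros = m - bin(bitmap).count("1")
    if zeros == 0:
        raise ValueError("bitmap is full; use a larger bitmap")
    return m * math.log(m / zeros)

def estimated_jaccard(a, b, m: int = M) -> float:
    bitmap_a, bitmap_b = build_bitmap(a, m), build_bitmap(b, m)
    inter = linear_count(bitmap_a & bitmap_b, m)  # bitwise AND -> bitmap A∩B
    union = linear_count(bitmap_a | bitmap_b, m)  # bitwise OR  -> bitmap A∪B
    # Jaccard index estimate LC(A∩B) / LC(A∪B); empty inputs count as identical.
    return inter / union if union > 0 else 1.0
```

Because duplicates set the same bit, this estimates a set (distinct-element) Jaccard index. ANDing two bitmaps can also set spurious bits where an element only in A and an element only in B collide, so LC(A∩B) tends to slightly overestimate the intersection; a larger bitmap makes such collisions rarer.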

Process Flow
• Input: the source code of each variant (Variant A, Variant B, …)
• Initialization: build an initial multiset for each variant using either 1. n-gram modeling or 2. each line of the code
• Linear Counting Algorithm: estimate the Jaccard index |A∩B| / |A∪B| for every pair (A, B), (A, C), (A, D), …
• Prim’s Algorithm: connect the most similar pairs to form the Evolution Tree
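The process flow can be sketched end to end in Python. Assumptions, all hypothetical: lines are stripped and blank lines dropped, "n-gram modeling" is read as windows of n consecutive lines, and exact set Jaccard stands in for the Linear Counting estimate (any function with the same signature can be passed as `similarity`). Prim's algorithm runs on the complete graph of variants and always takes the most similar edge, which yields a maximum spanning tree. The sketch returns undirected edges and does not decide which variant was derived from which.

```python
import heapq
from itertools import combinations

def initial_elements(source: str, n: int = 1) -> set:
    # Assumption: lines are stripped and blank lines dropped. n == 1 means "each line
    # of the code"; n > 1 is a hypothetical reading of n-gram modeling as windows of
    # n consecutive lines.
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if n == 1:
        return set(lines)
    return {"\n".join(lines[i:i + n]) for i in range(len(lines) - n + 1)}

def jaccard(a: set, b: set) -> float:
    # Exact set Jaccard index; a Linear Counting estimate could be used instead.
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def evolution_tree(variants: dict, n: int = 1, similarity=jaccard) -> list:
    """Prim's algorithm that always adds the most similar edge (a maximum
    spanning tree). Returns undirected (u, v, similarity) edges."""
    names = list(variants)
    if not names:
        return []
    elems = {name: initial_elements(variants[name], n) for name in names}
    sim = {}
    for a, b in combinations(names, 2):  # all pairs: (A, B), (A, C), (A, D), ...
        sim[a, b] = sim[b, a] = similarity(elems[a], elems[b])
    in_tree = {names[0]}
    # heapq is a min-heap, so similarities are negated to pop the largest first.
    heap = [(-sim[names[0], v], names[0], v) for v in names[1:]]
    heapq.heapify(heap)
    edges = []
    while heap and len(in_tree) < len(names):
        neg_sim, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was already connected by a better edge
        in_tree.add(v)
        edges.append((u, v, -neg_sim))
        for w in names:
            if w not in in_tree:
                heapq.heappush(heap, (-sim[v, w], v, w))
    return edges

variants = {
    "v1": "a\nb\nc\n",
    "v2": "a\nb\nc\nd\n",
    "v3": "a\nb\nc\nd\ne\n",
    "v4": "a\nb\nx\n",
}
print(evolution_tree(variants))
# [('v1', 'v2', 0.75), ('v2', 'v3', 0.8), ('v1', 'v4', 0.5)]
```

For these toy variants the tree links v1-v2, v2-v3 and v1-v4; the weaker pairs (such as v3-v4 at 2/6) are left out.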

Research Data
[Table: a description of the datasets used]

Final Result of Dataset 5
[Figure: the Evolution Tree extracted with the best configuration, compared with the existing actual evolution history]

Analysis of Bitmap Size
[Figure: part of the experimental results for dataset 5]

Best Configuration
• Main factors
  • n-gram modeling: none (each line of code is one element)
  • bitmap size: 128,000,000 bits
  • hash function: MurmurHash3
• Results
  • proper edges: 86.5% on average
  • time: 10 s to 5 min
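A quick back-of-the-envelope check of the memory footprint of the 128,000,000-bit setting, assuming each bitmap is stored uncompressed at one bit per position and ignoring implementation overhead:

```latex
\frac{128{,}000{,}000\ \text{bits}}{8\ \text{bits/byte}} = 16{,}000{,}000\ \text{bytes} \approx 16\ \text{MB per bitmap}
```

Under that assumption, comparing one pair of variants needs about 32 MB for its two bitmaps.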

Contributions and Future Work
• Contributions
  • extracted Evolution Trees efficiently (86.5% proper edges on average)
  • analyzed the influence of various factors (n-gram modeling, bitmap size, hash function)
  • identified the best configuration
  • faster than the previous diff-based study, with better accuracy
• Future Work
  • larger datasets
  • other programming languages
  • solving the remaining problems
