Extraction of Evolution History from Software Source Code Using Linear Counting

This research recovers the evolution history of software products and variants from source code alone. The Linear Counting Algorithm is used to estimate product similarity, and the most similar pairs are linked into an Evolution Tree that shows how the variants relate. Experiments on several datasets identify the configuration that gives the most accurate and efficient extraction: with it, 86.5% of edges on average are proper and a run takes 10 seconds to 5 minutes, where the previous diff-based approach needed up to 2 days in the worst case. Future work covers larger datasets and other programming languages. Speaker: Liu Shuchang, Osaka University.

Presentation Transcript


  1. Extraction of Evolution History from Software Source Code Using Linear Counting. Speaker: Liu Shuchang, Osaka University

  2. Background (diagram). Labels: daily software development; existing code; copy, edit; product variant; software evolution

  3. Evolution History Example (figure). Labels: only source code, ❌

  4. Introduction
     • Evolution History Recovery
       • product variants
       • using only source code
     • Evolution Tree
       • vertex: variant
       • edge: derived relation (most similar pair)
       • key: product similarity
     • Previous Study
       • diff based (file-to-file similarity)
       • time needed (worst case: 2 days)
     • Linear Counting Algorithm
       • estimating instead of calculating

  5. Linear Counting Algorithm (example). Bitmap size: 8 bits; zero bits: 2. The estimated cardinality is -(bitmap size) × ln(zero bits / bitmap size) = -8 × ln(2/8) = 11.0903, which rounds to a cardinality of 11.
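The arithmetic on this slide can be reproduced with a short sketch. This is a minimal illustration, not the speaker's implementation: the function names `estimate_from_zeros` and `linear_count` are hypothetical, and `hashlib.sha1` is used only as a standard-library stand-in for the hash function (slide 11 names MurmurHash3 as the one chosen in the study).

```python
import hashlib
import math


def estimate_from_zeros(bitmap_size, zero_bits):
    """Linear Counting estimate: -m * ln(z / m)."""
    return -bitmap_size * math.log(zero_bits / bitmap_size)


def linear_count(items, bitmap_size):
    """Estimate the number of distinct strings in `items`.

    Each item is hashed to one bit position; duplicates hit the same
    bit, so only distinct items influence the result.
    """
    bitmap = [False] * bitmap_size
    for item in items:
        # Stand-in hash (assumption): SHA-1 from the standard library.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % bitmap_size
        bitmap[index] = True
    zero_bits = bitmap.count(False)
    if zero_bits == 0:
        raise ValueError("bitmap is saturated; use a larger bitmap_size")
    return estimate_from_zeros(bitmap_size, zero_bits)


# The slide's example: 8-bit bitmap with 2 zero bits.
print(round(estimate_from_zeros(8, 2), 4))
```

The estimate degrades as the bitmap fills up, which is presumably why bitmap size is one of the factors analysed later in the talk.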

  6. Estimate Product Similarity (diagram). Initialization: a hash function maps Multiset A to Bit Map A and Multiset B to Bit Map B. Bitwise operators combine the two into Bit Map A∪B and Bit Map A∩B. Similarity is the Jaccard Index, |A∩B| / |A∪B|, obtained by dividing LC(A∩B) by LC(A∪B).
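The flow on this slide might be sketched as follows. Several details are assumptions made for illustration: the bitmaps are stored as Python integers, bitwise OR and AND are taken to be the "bitwise operator" that yields the union and intersection bitmaps, `hashlib.sha1` stands in for the study's hash function, and all function names are hypothetical.

```python
import hashlib
import math


def build_bitmap(items, bitmap_size):
    """Hash every item to one bit of an integer used as a bitmap."""
    bitmap = 0
    for item in items:
        # Stand-in hash (assumption): SHA-1 from the standard library.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        bitmap |= 1 << (int.from_bytes(digest[:8], "big") % bitmap_size)
    return bitmap


def lc_estimate(bitmap, bitmap_size):
    """Linear Counting estimate from the number of zero bits."""
    zero_bits = bitmap_size - bin(bitmap).count("1")
    if zero_bits == 0:
        raise ValueError("bitmap is saturated; use a larger bitmap_size")
    return -bitmap_size * math.log(zero_bits / bitmap_size)


def estimate_jaccard(items_a, items_b, bitmap_size=1 << 16):
    """Estimate |A∩B| / |A∪B| as LC(A∩B) / LC(A∪B)."""
    bitmap_a = build_bitmap(items_a, bitmap_size)
    bitmap_b = build_bitmap(items_b, bitmap_size)
    union = lc_estimate(bitmap_a | bitmap_b, bitmap_size)
    if union == 0:
        return 1.0  # both inputs empty: treat as identical
    intersection = lc_estimate(bitmap_a & bitmap_b, bitmap_size)
    return intersection / union


a = ["line %d" % i for i in range(0, 1000)]
b = ["line %d" % i for i in range(500, 1500)]
print(estimate_jaccard(a, b))  # true Jaccard Index is 500 / 1500
```

Note that the AND of two bitmaps also contains bits where unrelated items happened to collide, so the intersection estimate is slightly inflated when the bitmap is small relative to the inputs.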

  7. Evolution Tree Process Flow (diagram). Initialization: the source code of Variant A and Variant B is turned into Initial Multiset A and Initial Multiset B, either by (1) n-gram modeling or (2) taking each line of the code. The Linear Counting Algorithm then estimates the Jaccard Index, |A∩B| / |A∪B|, for every pair (A, B), (A, C), (A, D), …, and Prim's Algorithm connects the most similar pairs into the Evolution Tree.
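The last step, building a tree from pairwise similarities with Prim's Algorithm, could look like the sketch below. It grows a maximum spanning tree by always adding the most similar pair that joins a new variant to the tree. To keep the example self-contained, an exact Jaccard Index on Python sets stands in for the Linear Counting estimate; the names `exact_jaccard` and `evolution_tree` and the toy variants are hypothetical.

```python
def exact_jaccard(a, b):
    """Exact Jaccard Index on sets (stand-in for the LC estimate)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def evolution_tree(variants, similarity=exact_jaccard):
    """Prim's algorithm for a maximum spanning tree.

    `variants` maps a variant name to its set of lines. Returns a list
    of (variant_in_tree, new_variant, similarity) edges.
    """
    names = list(variants)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        for u in names:
            if u not in in_tree:
                continue
            for v in names:
                if v in in_tree:
                    continue
                score = similarity(variants[u], variants[v])
                if best is None or score > best[0]:
                    best = (score, u, v)
        score, u, v = best
        in_tree.add(v)
        edges.append((u, v, score))
    return edges


variants = {
    "v1": {"a", "b", "c"},
    "v2": {"a", "b", "c", "d"},
    "v3": {"a", "b", "c", "d", "e"},
    "v4": {"a", "b", "x"},
}
for u, v, score in evolution_tree(variants):
    print(u, v, round(score, 3))
```

The edges here are undirected: similarity alone says which variants are adjacent, not which one was derived from the other. This sketch recomputes similarities on every pass for brevity; a real run would compute each pair once and cache it.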

  8. Research Data (table): a description of the datasets we dealt with

  9. Final Result of dataset5 (figures): the Evolution Tree we extracted with the Best Configuration, shown against the existing actual evolution history

  10. Analysis on Bitmap Size (table): part of the experiment results of dataset5

  11. Best Configuration
      • Main Factors
        • N-gram Modeling: no (each line of code)
        • Bitmap Size: 128,000,000 bits
        • Hashing Function: MurmurHash3
      • Results
        • Proper Edges: 86.5% (on average)
        • Time: 10 s to 5 min

  12. Contributions and Future Work
      • Contributions
        • extract an ideal Evolution Tree efficiently
        • influence of various factors
        • best configuration
        • faster, with better accuracy
      • Future Work
        • larger datasets
        • other programming languages
        • solve the remaining problems