Revision Control System Using Delta Script of Syntax Tree Yasuhiro Hayase, Makoto Matsushita, Katsuro Inoue Graduate School of Information Science and Technology, Osaka University, Japan
Contents • Revision Control System • Problem on Merging the Source Codes • Research Goal • Merging the Trees • Step 1. Converting the Source Code into a Tree • Step 2. Computing Delta of the Trees • Step 3. Merging • Implementation of the System • Experiments • Conclusion and Future Work
Open Source Software Development Open-source development is attracting increasing attention. Developers use the following tools. • Revision Control System • Stores the history of the source code and the documents throughout the development process. • Examples: CVS, Subversion … • Mailing List • Developers and users hold discussions on mailing lists. • Bug-Tracking System
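Merging on Parallel Development [Figure: Developers A and B both check out file X from the repository and edit it. Developer A checks in the result as X1. Developer B's edit produces X2.] The modification of Developer A will be lost if X2 is checked in as it is. Merging by Revision Control System: Developer B checks out the newest version (= X1), the revision control system merges it with X2 into X3, and X3 is checked in.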
Problems • The existing revision control systems used in open-source development merge files line by line. • Line-by-line merging sometimes generates inaccurate output when applied to source code: • It detects false conflicts when the same line is changed by both developers. • It overlooks real conflicts when the changes occur in different lines. If the system fails to merge the two files, the developers have to fix the result themselves.
Problem 1. False Conflict • Developers A and B are editing working copies of the same file concurrently. • If both developers change the same line, the revision control system detects a conflict. • But changes to the same line do not always conflict; they can be compatible.
Original: int refs;
Developer A: int refs=0;
Developer B: int refs; /* reference count */
Compatible merge result: int refs=0; /* reference count */
Line-by-line merging fails to produce this result and reports a conflict.
Problem 2. Overlooking Conflict • Developers A and B are editing working copies of the same file concurrently. • If the developers do not change the same line, the revision control system does not detect a conflict. • But changes to different lines may conflict.
Original: int num, sum, avg;
Developer A: int num, sum;
Developer B: int num, sum, avg; : avg = sum/num;
Illegal merging output: int num, sum; : avg = sum/num;
The merged code uses avg although its declaration has been removed.
Our Research Goal Build an intelligent merging system and reduce the load on the developers. • Avoiding false conflicts on merging. • Finer-grained merging. • Reducing problems caused by merging. • Checking that the use of a variable corresponds to its declaration. • Allowing the developers to keep their working habits. • The developers can use any editor to edit source code. • Usability of the new system should be similar to that of the existing systems.
Merging Source Codes Recognizing Tree Structure • Difference Computation and Merging of Tree Structure Step 1. Analyze the source code and convert it into trees. Step 2. Compute the delta of the trees. Step 3. Apply the delta to the target tree. [Figure: a delta is computed between the source code at the origin and at the destination of the delta computation; the delta is then applied to the target source code.]
Step 1. Source Code Conversion • The source code is parsed and an augmented parse tree is built • The tree includes white-space and comment nodes • Each node has a string value • A unique ID is assigned to each node: the current tree is compared with the previous version of the tree stored in the repository • If a corresponding node exists, the same ID is assigned • Otherwise, a new unique ID is assigned • Each node corresponding to the use of a variable is linked to the node corresponding to the declaration of that variable
Example: the tree for { int i; i; }
1 Block
  2 {
  3 <WS>
  4 Declare
    5 int
    6 <WS>
    7 i
    8 ;
  9 <WS>
  10 Statement
    11 i
    12 ;
  13 <WS>
  14 }
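The tree described on this slide can be sketched as a small data structure. This is only an illustration, not the authors' implementation: the `Node` class, its field names, and the `to_source` helper are hypothetical, and white-space nodes are assumed to hold the literal white-space text so that the source can be regenerated by concatenating the leaves.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node of the augmented parse tree."""
    node_id: int                      # unique ID kept stable across revisions
    label: str                        # string value (token text or node kind)
    children: List["Node"] = field(default_factory=list)
    declaration: Optional["Node"] = None  # link from a variable use to its declaration


def to_source(node: Node) -> str:
    """Regenerate source text by concatenating leaf strings in order.

    Assumes every inner node has at least one child, so only leaves
    contribute text.
    """
    if not node.children:
        return node.label
    return "".join(to_source(child) for child in node.children)


# The tree for "{ int i; i; }" with the IDs used on the slide.
decl_i = Node(7, "i")
use_i = Node(11, "i", declaration=decl_i)  # use of i linked to its declaration
tree = Node(1, "Block", [
    Node(2, "{"),
    Node(3, " "),                          # <WS>
    Node(4, "Declare", [Node(5, "int"), Node(6, " "), decl_i, Node(8, ";")]),
    Node(9, " "),                          # <WS>
    Node(10, "Statement", [use_i, Node(12, ";")]),
    Node(13, " "),                         # <WS>
    Node(14, "}"),
])

print(to_source(tree))  # prints: { int i; i; }
```

Keeping white-space and comments as ordinary nodes is what lets the system hand the developer back text that looks like what they wrote, rather than a pretty-printed version.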
Step 2. Delta Computation The delta of two trees is computed • Editing Operation • Insertion of a node: insert(NewID, String, ParentID, Index) • Deletion of a leaf node: delete(ID) • Updating of the node's string: update(ID, NewString) • Moving a sub-tree: move(ID, ParentID, Index) • Editing Script • A sequence of editing operations • Represents all the operations needed to transform a tree A into a tree B [Figure: example trees annotated with the operations insert(10, Declare, 1, 0), delete(8), update(3, long), and move(2, 1, 0).]
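A minimal sketch of how the four editing operations could be applied to a tree is given below. The `Tree` class and its dictionary-based storage are assumptions made for illustration; only the operation names and argument orders come from the slide. The script reuses the operations of the doA/doB example from this deck, ordered so that every operation is valid when it is applied (the ordering is an assumption, since the slide does not fix it).

```python
class Tree:
    """Hypothetical ID-indexed tree supporting the four editing operations."""

    def __init__(self):
        self.label = {}      # node ID -> string value
        self.parent = {}     # node ID -> parent ID (None for the root)
        self.children = {}   # node ID -> ordered list of child IDs

    def insert(self, new_id, string, parent_id, index):
        if new_id in self.label:
            raise ValueError(f"node {new_id} already exists")
        self.label[new_id] = string
        self.parent[new_id] = parent_id
        self.children[new_id] = []
        if parent_id is not None:
            self.children[parent_id].insert(index, new_id)

    def delete(self, node_id):
        if self.children[node_id]:
            raise ValueError("only leaf nodes can be deleted")
        parent_id = self.parent.pop(node_id)
        if parent_id is not None:
            self.children[parent_id].remove(node_id)
        del self.label[node_id]
        del self.children[node_id]

    def update(self, node_id, new_string):
        self.label[node_id] = new_string

    def move(self, node_id, parent_id, index):
        # Detach the sub-tree first; the index refers to the new parent's
        # child list after detaching (an assumption of this sketch).
        self.children[self.parent[node_id]].remove(node_id)
        self.children[parent_id].insert(index, node_id)
        self.parent[node_id] = parent_id

    def apply(self, script):
        operations = {"insert": self.insert, "delete": self.delete,
                      "update": self.update, "move": self.move}
        for name, *args in script:
            operations[name](*args)

    def outline(self, node_id, depth=0):
        lines = ["  " * depth + f"{node_id} {self.label[node_id]}"]
        for child_id in self.children[node_id]:
            lines.extend(self.outline(child_id, depth + 1))
        return lines


# Tree before: Block(doA(x), doB(y))
tree = Tree()
tree.insert(0, "Block", None, 0)
tree.insert(1, "doA", 0, 0)
tree.insert(2, "x", 1, 0)
tree.insert(3, "doB", 0, 1)
tree.insert(4, "y", 3, 0)

script = [
    ("insert", 5, "if", 0, 1),
    ("insert", 6, "then", 5, 0),
    ("move", 1, 6, 0),
    ("delete", 4),
    ("delete", 3),
]
tree.apply(script)
print("\n".join(tree.outline(0)))
```

Because delete only accepts leaves, removing the doB sub-tree takes two operations, one per node, applied bottom-up.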
Editing Script The differences between tree A and tree B are expressed by the editing script. When determining the editing script, we must take care not to include unnecessary operations. • Assign a cost to each editing operation. • Define the cost of the editing script as the sum of the costs of its editing operations. • Minimize the editing script cost. An extended version of the existing approximate algorithm FMES* is used to compute the delta between the trees. * S. S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change detection in hierarchically structured information. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 493–504, 1996.
Delta Computation Algorithm • Cost of the editing operations • insert = delete = move = 1 • update = between 0 and 2, depending on the string values before and after the update operation: 2 * (1 - 2 * length(LCS(before, after)) / (length(before) + length(after))) • Algorithm • Determine the pairs of matching nodes • Leaf nodes: string similarity. • Inner nodes except for identifier nodes: match ratio of leaf nodes. • Identifier nodes: exactly the same string, or matching of the descendant nodes. • Build the editing script
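The cost model above can be sketched as follows. The slide does not say whether LCS means longest common subsequence or substring; this sketch assumes subsequence. The function names and the `old_strings` lookup are hypothetical, and the convention that two empty strings cost 0 is also an assumption (the formula is undefined there).

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (standard dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def update_cost(before: str, after: str) -> float:
    """2 * (1 - 2 * |LCS| / (|before| + |after|)), ranging from 0 to 2."""
    total = len(before) + len(after)
    if total == 0:
        return 0.0  # assumption: updating "" to "" is free
    return 2.0 * (1.0 - 2.0 * lcs_length(before, after) / total)


def script_cost(script, old_strings) -> float:
    """Sum of operation costs; insert, delete and move cost 1 each."""
    cost = 0.0
    for name, *args in script:
        if name == "update":
            node_id, new_string = args
            cost += update_cost(old_strings[node_id], new_string)
        else:
            cost += 1.0
    return cost


print(update_cost("int", "int"))   # identical strings: 0.0
print(update_cost("abc", "xyz"))   # nothing in common: 2.0
print(update_cost("int", "long"))  # only "n" in common: 2 * (1 - 2/7)
```

With this model an update between unrelated strings (cost 2) is as expensive as a delete followed by an insert, so updates only pay off when the old and new strings share characters.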
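Example: Delta Computation
Tree before: 0 Block with children 1 doA (child 2 x) and 3 doB (child 4 y).
Tree after (IDs not yet assigned, shown as ? on the slide): Block, if, then, doA, x, nested in this order.
After node matching, Block, doA and x keep the IDs 0, 1 and 2; if and then receive the new IDs 5 and 6.
Editing script: insert(5, if, 0, 1), insert(6, then, 5, 0), move(1, 6, 0), delete(4), delete(3)
Resulting tree: 0 Block, 5 if, 6 then, 1 doA, 2 x.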
Step 3. Merging The editing script for converting tree A to tree B is applied to tree C. Problem: for some operations in the editing script, there may be no corresponding node in tree C. If no node with a matching ID is present in tree C, the system searches for a similar node. Similarity is based on: • Matching of the parent node or sibling nodes • Similar string If a suitable node is found, the original ID in the editing script is replaced with the ID of the node found.
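The substitute search can be sketched roughly as follows. The scoring is an assumption made for illustration (string similarity from Python's `difflib.SequenceMatcher`, plus a bonus of 1.0 when the parent IDs match, with a threshold of 1.0); the slides do not give the actual measure, and sibling matching is left out. The tree shapes are one reading of the if/then/else example in this deck, and an operation whose node has no substitute is marked with `None` instead of generating alternative candidate trees.

```python
from difflib import SequenceMatcher


def similarity(node, candidate):
    """Hypothetical score: string similarity plus a bonus for a matching parent."""
    score = SequenceMatcher(None, node["label"], candidate["label"]).ratio()
    if node["parent"] == candidate["parent"]:
        score += 1.0
    return score


def find_substitute(node, target_nodes, threshold):
    """Return the ID of the most similar node in the target tree, or None."""
    best_id, best_score = None, 0.0
    for cand_id, cand in target_nodes.items():
        score = similarity(node, cand)
        if score > best_score:
            best_id, best_score = cand_id, score
    return best_id if best_score >= threshold else None


def retarget(script, source_nodes, target_nodes, threshold=1.0):
    """Rewrite the IDs that delete/update/move refer to so they fit the target tree."""
    result = []
    for name, node_id, *rest in script:
        if name != "insert" and node_id not in target_nodes:
            node_id = find_substitute(source_nodes[node_id], target_nodes, threshold)
        result.append((name, node_id, *rest))
    return result


# Tree A (origin of the delta) and tree C (merge target), keyed by node ID.
tree_a = {
    0: {"label": "if", "parent": None}, 1: {"label": "then", "parent": 0},
    2: {"label": "doA", "parent": 1},   3: {"label": "x", "parent": 2},
    4: {"label": "else", "parent": 0},  5: {"label": "doB", "parent": 4},
    6: {"label": "y", "parent": 5},
}
tree_c = {
    0: {"label": "if", "parent": None}, 1: {"label": "then", "parent": 0},
    2: {"label": "doA", "parent": 1},   3: {"label": "x", "parent": 2},
    4: {"label": "else", "parent": 0},  8: {"label": "doC", "parent": 4},
    9: {"label": "z", "parent": 8},
}

script = [("update", 6, "i"), ("move", 5, 1, 0), ("delete", 4)]
result = retarget(script, tree_a, tree_c)
print(result)
```

In this sketch node 5 (doB) is redirected to node 8 (doC), which shares its parent and most of its string, while node 6 (y) finds no substitute. A fuller version would keep both the applied and the skipped variant as candidate trees for the developer to choose from.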
Example of Merging [Figure: trees A and B, the target tree C with intermediate trees C1 and C2, and the candidate results D1, D2 and D3.] The editing script from A to B is update(6, i), move(5, 1, 0), delete(4). Tree C contains nodes 8 (doC) and 9 (z) in place of nodes 5 (doB) and 6 (y). • For one operation, no node can be substituted. • Node 8 is slightly similar, so two trees are built: one with the operation applied to node 8, move(8, 1, 0), and one without the operation applied. • Node 4 has a child node in tree C2, so two trees are built: one to which the operation is not applied, and one in which the sub-tree rooted at node 4 is deleted. The developer selects one of the resulting candidates.
System Implementation The implementation of our system is based on the existing revision control system Subversion. • Client-server system • The delta computation and the merge operations are performed on the client side. • The target programming language is Java. • The repository stores the augmented parse trees instead of the raw source files. • The tree is stored in XML format.
System Overview [Figure: the developer works with the Subversion client, which performs delta computation, delta application, merging, node matching, and mutual conversion between XML and source code. The Subversion server holds the repository, which stores XML.]
Check-in and check-out [Figure: dataflow on check-out and on check-in between the developer, the Subversion client, the Subversion server and the repository. Data shown: original XML file, source code, edited source code, XML file without node IDs, XML file with node IDs. Client components: mutual conversion between XML and source code, node matching, delta computation, delta application.]
Dataflow on Merging [Figure: dataflow on merging between the developer, the Subversion client, the Subversion server and the repository. Data shown: original XML file, edited source code, XML file without node IDs, XML file with node IDs, the newest version of the XML file, delta, and several XML files of merging results. Client components: mutual conversion between XML and source code, node matching, delta computation, delta application, merging. The XML files of the merging results are sorted, converted into source code, and offered to the developer.]
Experiment 1 Checking the proper functionality of the system with a trivial test case • A small source file was written. (Original) • From Original, three variants were derived: Variant 1: The variable avg was deleted. Variant 2: A method accessing the variable avg was added. Variant 3: The variable avg was renamed to average. • The deltas between Original and each of the three variants were computed (Delta 1…3). • Each delta was applied to each variant.
Original: class C { double num, sum, avg; … }
Variant 1: class C { double num, sum; … }
Variant 2: class C { double num, sum, avg; … m() { … avg … } … }
Variant 3: class C { double num, sum, average; … }
Result of Experiment 1 [Table comparing our algorithm with line-by-line merging; the cell contents are not recoverable.] Our algorithm gave a correct result in 5 out of 6 cases. In just one case our algorithm failed to find a valid substitute node and generated too many candidates.
Experiment 2 Evaluating the effectiveness of the algorithm in actual software development • Two open source projects have been selected as test cases: • Jakarta Project (22,606 files, 162,683 revisions) • Eclipse Project (19,420 files, 103,358 revisions) • 84 pairs of check-ins where a merge occurred have been identified. • Line-by-line merging and our algorithm have been compared.
Result of Experiment 2 • Our algorithm succeeded in the cases in which line-by-line merging succeeded. • Our algorithm also succeeded in 9 of the 13 cases in which line-by-line merging failed.
Result of Experiment 2: Detail when line-by-line merging failed 3 of the 4 cases in which our algorithm failed are real conflicts. In the remaining case, our algorithm failed to find substitute nodes and positions, and generated too many candidates. In one of the cases in which our algorithm succeeded, many candidates were also generated.
Conclusion and Future Work • Summary of this presentation • Problems with the existing revision control systems used in open source development. • Syntactic merging of source code as a solution. • Implementation of the system. • Two evaluations. • Future work • Improving the precision of the search algorithm • Improving the user interface for selecting the merging result: highlighting the differences between the candidates. • Creating inter-file links