Analyzing differences between W1 and GDLs using tree alignment. Morten Rhiger (The IT-University of Copenhagen)
Outline • The problem (the upgrade problem: migrating partner customizations from version N to version N+1) • Our solution (daisy-chaining procedures, à la AOP) • Other solutions (repositories with versioning, software merging, …) • Validating our solution (measuring the number of good customizations using a tree diff) • Numbers…
NAV lifecycle (diagram along a time axis): W1 version 5.0 → partners customize → DE, GB, BE, and DK version 5.0; W1 version 5.0 → Microsoft evolves → W1 version 2009; DK version 2009.
The NAV upgrade problem • There are no language features for controlling customization in NAV's C/AL • Customizations are (destructive) source-code modifications • There is no versioning in NAV • Some (clever) partners maintain repositories of their edits • www.mergetool.com • On the other hand, few (VAR) partners are IT professionals • Consequently, partners face a serious problem when migrating their old customizations to the new version • Migration takes up to 30% of the effort required to implement the first derived version
Our solution • Distinguish between • The location of a customization in the original version (a customization point), and • the modification a customization performs • (Reminiscent of AOP)
NAV lifecycle (diagram): W1 version 5.0 → DK version 5.0: modifying customizations, but leaving customization points unchanged. W1 version 5.0 → W1 version 2009: moving customization points around. DK version 5.0 → DK version 2009: plugging the old customizations into the (possibly moved) customization points (trivial).
Customization points where?
W1 version 5.0:
  PROCEDURE UpdateBalance();
  BEGIN
    GenJnlManagement.CalcBalance(…);
  END
W1 version 5.0, with the customization point marked:
  PROCEDURE UpdateBalance();
  BEGIN
    GenJnlManagement.CalcBalance(…);
    <<customization point>>
  END
DE version 5.0:
  PROCEDURE UpdateBalance();
  BEGIN
    GenJnlManagement.CalcBalance(…);
    TotalPayAmount := 0;
    TempGenJourLine.COPY(Rec);
  END
W1 version 2009 (customization point moved; legal?):
  PROCEDURE UpdateBalance();
  BEGIN
    <<customization point>>
    GenJnlManagement.CalcBalance(…);
  END
Customization points where? • When is it legal to move a customization point? Where can it be moved to? … • Procedure calls are useful customization points (we hypothesize): • If a procedure call can be moved, so can the customizations at that point • We probably still need something more fine-grained (we also hypothesize)
Daisy-chaining procedures • Daisy-chaining procedures and triggers (a proposal due to Lars) • Reminiscent of aspect-oriented programming • A property (Trigger) on a procedure or trigger controls what is (also) invoked when that procedure is called
Daisy-chaining procedures • Existing procedure and trigger property: [Trigger(“*”)] PROCEDURE Foo(…) = … • Adding code to execute at the end of Foo: PROCEDURE FooMorten() = … • The “*” says that after Foo is invoked, all procedures with prefix Foo should also be invoked (in some unspecified order) • Resolved “late”
Customization points where?
W1 version 5.0:
  [Trigger("*")]
  PROCEDURE UpdateBalance();
  BEGIN
    GenJnlManagement.CalcBalance(…);
  END
DE version 5.0:
  PROCEDURE UpdateBalanceDE();
  BEGIN
    TotalPayAmount := 0;
    TempGenJourLine.COPY(Rec);
  END
W1 version 2009:
  [Trigger("*")]
  PROCEDURE UpdateBalance();
  BEGIN
    …
    GenJnlManagement.CalcBalance(…);
    …
  END
• The late partner (here the GDL) has decided that the new code should be invoked whenever UpdateBalance is invoked, after the original.
• The early partner (here Microsoft) is free to modify the body of the procedure.
• There is an understanding that the calls to UpdateBalance are the relevant customization point for the German customization.
Other Trigger properties • Daisy chaining:
  [Trigger("*")] PROCEDURE Foo() = …
  [Trigger("*")] PROCEDURE FooMorten() = …
  PROCEDURE FooMortenMore() = …
Invoking Foo also invokes FooMorten and FooMortenMore (in that order).
Other Trigger properties • “Hijacking” (or replacing) a procedure:
  [Trigger("Other")] PROCEDURE Foo() = …
  PROCEDURE Other() = …
Calls to Foo discard the body of Foo and execute Other instead.
Other Trigger properties • Dynamic dispatch:
  [Trigger("=Dispatch")] PROCEDURE Foo() = …
  PROCEDURE Dispatch() = …
Calls to Foo invoke Dispatch to produce the string controlling the trigger. For example,
  PROCEDURE Dispatch() = RETURN "*";
or
  PROCEDURE Dispatch() = RETURN "Other";
or even
  PROCEDURE Dispatch() = RETURN "=NewDispath";
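To make the three Trigger forms concrete, here is a toy interpreter in Python. It is a sketch of the proposed semantics under stated assumptions, not C/AL and not an existing NAV feature. The names `Procedure`, `Program`, `define`, `extensions`, and `call` are hypothetical. Two choices are ours: the "unspecified order" of extensions is made alphabetical, and a procedure is assumed to extend the procedure whose name is its longest proper prefix, which reproduces the Foo → FooMorten → FooMortenMore chain above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Procedure:
    body: Callable[[], object]      # stands in for the procedure body
    trigger: Optional[str] = None   # the proposed [Trigger("...")] property, if any


class Program:
    """Toy model of the proposed Trigger property (hypothetical, not C/AL)."""

    def __init__(self) -> None:
        self.procs: Dict[str, Procedure] = {}

    def define(self, name: str, body: Callable[[], object],
               trigger: Optional[str] = None) -> None:
        self.procs[name] = Procedure(body, trigger)

    def extensions(self, name: str) -> List[str]:
        # Assumption: P extends the procedure whose name is the longest proper
        # prefix of P's name; the "unspecified" order is made alphabetical.
        result = []
        for other in sorted(self.procs):
            if other != name and other.startswith(name):
                parent = max((p for p in self.procs
                              if p != other and other.startswith(p)), key=len)
                if parent == name:
                    result.append(other)
        return result

    def call(self, name: str) -> None:
        proc = self.procs[name]
        trigger = proc.trigger
        # Dynamic dispatch: "=Dispatch" runs Dispatch to obtain the actual
        # trigger string, which may itself be another "=..." string.
        while trigger is not None and trigger.startswith("="):
            trigger = self.procs[trigger[1:]].body()
        if trigger is None:
            proc.body()
        elif trigger == "*":
            proc.body()                        # the original body runs first...
            for ext in self.extensions(name):  # ...then the daisy-chained extensions
                self.call(ext)
        else:
            self.call(trigger)                 # hijacking: run the named procedure instead


# The UpdateBalance example: the DE code runs after the W1 body.
log = []
w1 = Program()
w1.define("UpdateBalance", lambda: log.append("UpdateBalance"), trigger="*")
w1.define("UpdateBalanceDE", lambda: log.append("UpdateBalanceDE"))
w1.call("UpdateBalance")
print(log)  # ['UpdateBalance', 'UpdateBalanceDE']
```

Because extensions are resolved when `call` runs, the sketch also models the "resolved late" remark: a partner can add `FooMorten` without touching `Foo`. Cyclic triggers are not detected.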
Evaluating the proposal • Benefits • Little or no new C/AL syntax required • A class of existing customizations can be handled without modifying the corresponding W1 code • Drawbacks • Probably not flexible enough • The editing experience suffers
More fine-grained customizations • Inserting a new customization point (a procedure call) in W1:
Before:
  PROCEDURE Foo() =            PROCEDURE Foo() =
    A();                         A();
    B();                         <<customization>>
                                 B();
-------------------------------------------------
After:
  PROCEDURE Foo() =            PROCEDURE InFooMorten() =
    A();                         <<customization>>
    InFoo();
    B();
  [Trigger("*")]
  PROCEDURE InFoo();
• The need for a customization point must be passed back through the chain of developers
Goals … • … to measure how well existing customizations fit the model • … to test-drive our analysis tool (currently) • A tree-diff engine discovering tree alignments • Test data: • W1 5.0 SP1, and • 39 GDLs of the same version: DK 5.0 SP1, … • … to make the tool available for other analyses (long term)
Sequence-based diff for source code? • Traditional sequence-based diff (e.g., UNIX diff) does not take program structure into account • Valid for software merging (e.g., UNIX diff3) • Not appropriate for identifying whole-statement modifications:
Old:
  IF X = 0 THEN BEGIN
    Foo();
    Bar();
  END;
New:
  IF X = 0 THEN BEGIN
    Foo();
  END ELSE BEGIN // add
    Bar();
  END
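To see the problem concretely, the two versions above can be compared line by line with Python's standard `difflib.SequenceMatcher`. We use it only as a stand-in for UNIX diff; the two tools use different matching heuristics.

```python
import difflib

old = [
    "IF X = 0 THEN BEGIN",
    "  Foo();",
    "  Bar();",
    "END;",
]
new = [
    "IF X = 0 THEN BEGIN",
    "  Foo();",
    "END ELSE BEGIN",
    "  Bar();",
    "END",
]

matcher = difflib.SequenceMatcher(a=old, b=new)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    # Tags here: equal, insert, equal, replace
    print(f"{tag:8} old={old[i1:i2]} new={new[j1:j2]}")
```

The line diff reports an inserted line `END ELSE BEGIN` (not a statement at all) and a changed `END;` line. Nothing in its output says that `Bar();` moved into a new ELSE branch, which is the statement-level change a partner actually made.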
Tree-based diff? • Yes, but what is a tree-based diff? • Preserve depth? • Allow general movements? • Allow re-ordering siblings? • … • We propose a tree alignment for ordered trees [Jiang,Wang,Zhang CPM’94] as an appropriate way to identify customizations.
Tree alignment • A tree alignment A of two trees T and U is a tree whose nodes are pairs of the form (t, u) (copy node), (t, -) (delete node), or (-, u) (insert node), where t and u are nodes from T and U, respectively. It must satisfy an erasure property: discarding the second components and then removing the “-” nodes (attaching their children to their parents) gives the original T, and, vice versa, discarding the first components gives the original U.
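The erasure property can be checked mechanically. The sketch below uses a hypothetical representation: a tree is `(label, children)`, an alignment node is `((t_label, u_label), children)`, and `None` plays the role of “-”. The helper names `tree`, `anode`, `project`, and `satisfies_erasure` are our own.

```python
from typing import List, Optional


def tree(label: str, *children: tuple) -> tuple:
    """Ordered tree as (label, children); hypothetical representation."""
    return (label, children)


def anode(t: Optional[str], u: Optional[str], *children: tuple) -> tuple:
    """Alignment node ((t, u), children); None stands for "-"."""
    return ((t, u), children)


def project(forest: tuple, side: int) -> List[tuple]:
    """Keep component `side` (0 for T, 1 for U) and erase nodes whose component is "-"."""
    out: List[tuple] = []
    for pair, children in forest:
        kids = project(children, side)
        if pair[side] is None:
            out.extend(kids)  # an erased node's children move up to its parent
        else:
            out.append((pair[side], tuple(kids)))
    return out


def satisfies_erasure(alignment: tuple, t: tuple, u: tuple) -> bool:
    return project((alignment,), 0) == [t] and project((alignment,), 1) == [u]


T = tree("f", tree("x"), tree("y"))
U = tree("f", tree("x"), tree("g", tree("y")))
# y is aligned with y even though it sits one level deeper in U.
A = anode("f", "f", anode("x", "x"), anode(None, "g", anode("y", "y")))
print(satisfies_erasure(A, T, U))  # True
```

The example also shows that an alignment need not preserve depth: the insert node (-, g) pushes y one level down in U.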
Tree alignment • A tree alignment for ordered trees • … does not preserve depth, • … does not allow re-ordering of siblings, • … does not allow general movements of subtrees • From the alignment, an edit script (similar to the output of UNIX diff) can be generated • Interactive examples…
Tree alignment algorithm • Dynamic programming for sequence-based diff (figure) • Dynamic programming for tree alignment: more “complicated” and more complex, with time complexity $O(|T| \cdot |U| \cdot (\deg(T) + \deg(U))^2)$ • The minimum-cost solution gives the (optimal) edit script
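For small trees, the minimum alignment cost can be computed by a memoized recursion that follows the definition of an alignment directly. This is a sketch of our own, not the optimized Jiang-Wang-Zhang algorithm with the complexity quoted above, so it is only suitable for small inputs. The recursion looks at the last (rightmost) tree of an optimal alignment forest, whose root is a copy node, a delete node, or an insert node. The unit costs and the `(label, children)` representation are assumptions.

```python
from functools import lru_cache


def tree(label, *children):
    """Ordered tree as (label, children); hypothetical representation."""
    return (label, children)


def alignment_cost(t, u, delete=1, insert=1, update=1):
    def relabel(a, b):
        return 0 if a == b else update

    @lru_cache(maxsize=None)
    def forest(F, G):
        # F, G: tuples of trees; cost of an optimal alignment of the two forests.
        if not F and not G:
            return 0
        best = float("inf")
        if F and G:
            # Last alignment root is (a, b): copy/update node over both last trees.
            (a, Fa), (b, Gb) = F[-1], G[-1]
            best = forest(F[:-1], G[:-1]) + relabel(a, b) + forest(Fa, Gb)
        if F:
            # Last alignment root is (a, -): a's children align with a suffix G[k:].
            _, Fa = F[-1]
            for k in range(len(G) + 1):
                best = min(best, forest(F[:-1], G[:k]) + delete + forest(Fa, G[k:]))
        if G:
            # Last alignment root is (-, b): b's children align with a suffix F[k:].
            _, Gb = G[-1]
            for k in range(len(F) + 1):
                best = min(best, forest(F[:k], G[:-1]) + insert + forest(F[k:], Gb))
        return best

    return forest((t,), (u,))


T = tree("f", tree("x"), tree("y"))
U = tree("f", tree("x"), tree("g", tree("y")))
print(alignment_cost(T, U))  # 1: insert g above y
```

Every recursive call works on strictly fewer nodes, so the recursion terminates. Recording which case achieved the minimum would yield the edit script itself.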
Sizes of code pieces • Code piece = procedure or trigger • Code pieces are uniquely identified by a code path, e.g., • Table/317/FIELDS/0/OnValidate • Codeunit/530/CODE/ValidateEnumVal • Form/31/CONTROLS/4/Menu/MENUITEMS/2/OnPush • Code size measured in AST nodes (≈ number of statements)
W1 5.0 SP1 numbers • 39,946 code pieces: • 45% have 3 statements or fewer, • another 30% have 4-10 statements, • yet another 16% have 11-30 statements. • 357,713 statements: • 8% are in code with 3 statements or fewer, • another 17% are in code with 4-10 statements, • yet another 18% are in code with 11-20 statements. • Roughly the same numbers for GDLs. • The complexity of the tree alignment algorithm is under control (a W1-GDL diff takes 14-18 minutes on my laptop)
W1 5.0 SP1 numbers (details) • Four code pieces have more than 1,000 statements: • Codeunit 80 “Sales-Post” PROPERTIES/OnRun (1,462 statements, nontrivial) • Codeunit 90 “Purch.-Post” PROPERTIES/OnRun (1,492 statements, nontrivial) • Report 83 “Change Global Dimensions” CODE/ChangeGlobalDim (1,751 statements, trivial code duplication) • Codeunit 406 “Setup Checklist Management” CODE/TransferContents (2,033 statements, trivial code duplication)
Amount of customization (details) • Much variance: • 91 very mild customizations in IS 5.0 SP1 • 2,593 customizations in TH 5.0 SP1 • Some agreement, too: • 2,593 customizations in each of APAC, ID, MY, PH, SG, and TH • Same for {GB, IE}, {NA-US, NA-USCA, NA-USCAMX}, and {DE, AT} • Not a coincidence: these versions differ only in language • (But this gives a “proof of concept”)
Customization point usage (hotspots) (chart ranging from cold spots to hotspots; labeled values 4005, 2053, and 1125; annotation: probably false positives due to a hotfix)
Customization point usage(Hotspots, details) • Many cold customization points used by only one (4000), two (2000), or three (1000) GDLs. • A nontrivial number of customization points (42) used by all GDLs! • (Consistent renamings of, e.g., “.name” to “.Name”) • Probably a hotfix not captured in the repository • (But gives a “proof of concept”)
Hot objects (chart ranging from cold objects to hot objects)
Hot objects
  Object              Number of GDLs customizing object
  Codeunit/2          30
  Codeunit/11         31
  Codeunit/80         31
  Table/39            32
  Table/37            33
  Table/38            33
  Table/36            36
  Codeunit/12         37
  Table/81            37
  Codeunit/1          39
  Codeunit/424        39
  Codeunit/5054       39
  Codeunit/5300       39
  Codeunit/7152       39
  Codeunit/99008517   39
  Report/99008512     39
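A table like this can be derived from per-GDL lists of customized code paths by grouping each path under its object (the first two path segments, e.g. Table/317). The sketch below assumes a hypothetical input format, a mapping from GDL name to customized code paths; the sample data is made up for illustration, not taken from the measurements.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def object_of(code_path: str) -> str:
    """'Table/317/FIELDS/0/OnValidate' -> 'Table/317' (object type and number)."""
    kind, number, *_ = code_path.split("/")
    return f"{kind}/{number}"


def gdls_per_object(customizations: Dict[str, Iterable[str]]) -> List[Tuple[str, int]]:
    """Count the distinct GDLs customizing each object, least customized first."""
    gdls = defaultdict(set)
    for gdl, paths in customizations.items():
        for path in paths:
            gdls[object_of(path)].add(gdl)
    return sorted(((obj, len(names)) for obj, names in gdls.items()),
                  key=lambda item: (item[1], item[0]))


# Hypothetical sample data (not from the study):
sample = {
    "DK": ["Table/317/FIELDS/0/OnValidate", "Codeunit/530/CODE/ValidateEnumVal"],
    "DE": ["Table/317/FIELDS/0/OnValidate"],
}
print(gdls_per_object(sample))  # [('Codeunit/530', 1), ('Table/317', 2)]
```

Counting a GDL once per object, rather than once per customized code piece, matches the column heading "Number of GDLs customizing object".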
Classes of customization by version (chart; annotation: false positives due to measuring)
Example modifications: modification that should be avoided! • Codeunit/80/CODE/FillInvPostingBuffer • W1 5.0 SP1:
  InvPostingBuffer[1]."Line Discount Amount" := "Line Discount Amount";
  InvPostingBuffer[1]."Inv. Discount Amount" := "Inv. Discount Amount";
• TH 5.0 SP1 (the same two assignments, swapped):
  InvPostingBuffer[1]."Inv. Discount Amount" := "Inv. Discount Amount";
  InvPostingBuffer[1]."Line Discount Amount" := "Line Discount Amount";
Example modifications: modification that could be avoided • Table/4/CODE/InitRoundingPrecision • W1 5.0 SP1:
  "Unit-Amount Rounding Precision" := 0.00001
• ES 5.0 SP1:
  "Unit-Amount Rounding Precision" := 0.000001
Example modifications • Table/14/FIELDS/5703/OnValidate • W1 5.0 SP1:
  BEGIN
    Postcode.ValidateCity(City, "Post Code");
  END
• ES 5.0 SP1:
  BEGIN
    Postcode.ValidateCity(City, "Post Code", County);
  END
• APAC 5.0 SP1:
  BEGIN
    PostCodeCheck.ValidateCity(CurrFieldNo, DATABASE::Location, Rec.GETPOSITION, 0, Name, "Name 2", Contact, Address, "Address 2", City, "Post Code", County, "Country/Region Code");
  END
• Candidate for hijacking
Example modifications • Codeunit 99000889/CODE/SetSalesHeader • W1 5.0 SP1:
  REPEAT SalesLine.NEXT = 0 BEGIN
    "Entry No." := SalesLine."Line No."
    TransferFromSalesLine(SalesLine)
    SalesLine.CALCFIELDS("Reserved Qty. (Base)")
    ...
  END
• APAC 5.0 SP1:
  REPEAT SalesLine.NEXT = 0 BEGIN
    IF SalesLine."Build Kit" THEN
      TransferFromKitSalesLine(SalesLine,OrderPromisingLine)
    ELSE BEGIN
      "Entry No." := SalesLine."Line No."
      TransferFromSalesLine(SalesLine)
      "Source Sub Line No." := 0
      SalesLine.CALCFIELDS("Reserved Qty. (Base)")
      ...
    END
  END
Using tree alignment for C/AL source code
Identifying code modifications • Operations that can be applied to the old document: • Here (and elsewhere): delete(L1), add(L2), update(L1, L2), copy(L1) • These operations have costs • An edit script is a sequence of operations transforming the old document into the new one. • An optimal edit script is one with the least cost
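For sequences (the UNIX diff setting), an optimal edit script over exactly these four operations can be found with the classic weighted edit-distance dynamic program (Wagner-Fischer style), followed by a walk back through the table. This is a sketch of the sequence case only. The unit costs are assumptions, and copy is taken to cost 0.

```python
def edit_script(old, new, delete=1, add=1, update=1):
    """Return (cost, script) for turning `old` into `new` with delete/add/update/copy."""
    n, m = len(old), len(new)
    # cost[i][j] = least cost of turning old[:i] into new[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + delete
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + add
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = old[i - 1] == new[j - 1]
            cost[i][j] = min(cost[i - 1][j] + delete,
                             cost[i][j - 1] + add,
                             cost[i - 1][j - 1] + (0 if same else update))
    # Walk back from the bottom-right corner to recover one optimal script.
    script, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = old[i - 1] == new[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + (0 if same else update):
                script.append(("copy", old[i - 1]) if same
                              else ("update", old[i - 1], new[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + delete:
            script.append(("delete", old[i - 1]))
            i -= 1
        else:
            script.append(("add", new[j - 1]))
            j -= 1
    script.reverse()
    return cost[n][m], script


print(edit_script(["A();", "B();"], ["A();", "X();", "B();"]))
# (1, [('copy', 'A();'), ('add', 'X();'), ('copy', 'B();')])
```

The tree case replaces the two-dimensional table with subproblems over pairs of subtrees and sub-forests, which is where the extra $(\deg(T) + \deg(U))^2$ factor comes from.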
Finding optimal edit scripts • In revision control systems, • for merging code: UNIX diff (sequence based) • In bioinformatics, • for globally aligning protein sequences (sequence based), • for comparing RNA secondary structure (tree based) • In (semi-) structured data models, • for comparing XML documents, etc
Assigning costs to edit operations • Costs for deleting an old node, adding a new node, and updating an old node with the label of a new one. • Costs with the right properties give rise to a distance between two trees (in a certain metric space):
  $D(x,y) \ge 0$
  $D(x,y) = 0$ if and only if $x = y$
  $D(x,y) = D(y,x)$
  $D(x,z) \le D(x,y) + D(y,z)$
Which edit costs give the “best” edit scripts? • High costs for updates, low costs for adds and deletes: • Pro: Doesn’t equate unrelated statements • Con: Fails to detect actual updates • Low costs for updates, high costs for adds and deletes: • Pro: Detects actual updates • Con: Equates unrelated statements
High costs for updates • codeunit 73/properties/OnRun from W1-5.0 SP1:
  ...
  PurchHeader.TESTFIELD(Status,PurchHeader.Status::Open);
  FromBOMComp.SETRANGE("Parent Item No.","No.");
  NoOfBOMComp := FromBOMComp.COUNT;
  IF NoOfBOMComp = 0 THEN
    ERROR(Text001, "No.");
  Selection := STRMENU(Text005,2);
  ...
• codeunit 73/properties/OnRun from TH-5.0 SP1:
  ...
  PurchHeader.TESTFIELD(Status,PurchHeader.Status::Open);
  Item.GET("No.");
  IF Item."Kit BOM No." = '' THEN
    ERROR(Text001, "No.");
  KitManagement.GetKitProdBOM(...);
  IF NoOfBOMComp = 0 THEN
    ERROR(Text001, "No.");
  Selection := STRMENU(Text005,2);
  ...
Annotation: “Aha! A smart way to achieve this. Thus, updating the IF should have low cost.”
Annotation: “Hmm… no. These lines replaced these. Thus, updating the IF should have high cost.”
Low costs for updates • codeunit 73/properties/OnRun from W1-5.0 SP1:
  REPEAT
    ToPurchLine.INIT;
    NextLineNo := NextLineNo + LineSpacing;
    ToPurchLine."Line No." := NextLineNo;
    CASE FromBOMComp.Type OF
      FromBOMComp.Type::" ":
        ToPurchLine.Type := ToPurchLine.Type::" ";
      FromBOMComp.Type::Item:
      ...
• codeunit 73/properties/OnRun from TH-5.0 SP1:
  REPEAT
    ToPurchLine.INIT;
    NextLineNo := NextLineNo + LineSpacing;
    ToPurchLine."Line No." := NextLineNo;
    CASE TempProdBOMLine.Type OF
      TempProdBOMLine.Type::" ":
        ToPurchLine.Type := ToPurchLine.Type::" ";
      TempProdBOMLine.Type::Item:
      ...
Which edit costs give the “best” edit scripts? • Ideally, we would require update(L1, L2) < delete(L1) + add(L2), while still taking the content into account. (For example, updating an IF to a WHILE should have a very high cost.)
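One possible way to encode this requirement is sketched below, under our own assumptions. Statements carry a hypothetical `kind` (IF, WHILE, …) and their own source `text`. Updating between different kinds costs as much as deleting one statement and adding the other. Updating within the same kind stays strictly below that bound and grows with textual dissimilarity, measured here with Python's `difflib.SequenceMatcher.ratio`. The constants are arbitrary, and this cost is not checked to satisfy the metric properties above.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    kind: str  # statement kind, e.g. "IF", "WHILE", "ASSIGN" (hypothetical)
    text: str  # the statement's own source text, without sub-statements

DELETE = ADD = 10  # hypothetical unit costs


def update_cost(old: Stmt, new: Stmt) -> float:
    if old == new:
        return 0
    if old.kind != new.kind:
        # e.g. IF -> WHILE: never cheaper than deleting one and adding the other
        return DELETE + ADD
    # Same kind: between 1 and DELETE + ADD - 1, growing with textual distance.
    similarity = difflib.SequenceMatcher(a=old.text, b=new.text).ratio()
    return 1 + (DELETE + ADD - 2) * (1 - similarity)


a = Stmt("IF", "IF NoOfBOMComp = 0 THEN")
b = Stmt("IF", "IF Item.\"Kit BOM No.\" = '' THEN")
print(update_cost(a, b) < DELETE + ADD)  # True
```

As the codeunit 73 example shows, textual similarity alone cannot tell whether one IF really replaced another, so a cost like this only narrows the choice.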