Hands-on Refactoring with Wrangler Simon Thompson Huiqing Li, Xingdong Bian University of Kent
Overview • What is refactoring? • Examples • The process of refactoring • Tool building and infrastructure • What is in Wrangler … demo • Latest advances: data, processes, erlide.
Soft-ware • There’s no single correct design … • … different options for different situations. • Maintain flexibility as the system evolves.
Refactoring • Refactoring means changing the design or structure of a program … without changing its behaviour. Modify Refactor
Generalisation

Before:

-module(test).
-export([f/1]).

add_one([H|T]) -> [H+1 | add_one(T)];
add_one([]) -> [].

f(X) -> add_one(X).

After generalisation:

-module(test).
-export([f/1]).

add_one(N, [H|T]) -> [H+N | add_one(N,T)];
add_one(N, []) -> [].

f(X) -> add_one(1, X).

After generalisation and renaming:

-module(test).
-export([f/1]).

add_int(N, [H|T]) -> [H+N | add_int(N,T)];
add_int(N, []) -> [].

f(X) -> add_int(1, X).
Generalisation

Before:

-export([printList/1]).

printList([H|T]) ->
    io:format("~p\n", [H]),
    printList(T);
printList([]) -> true.

Call: printList([1,2,3])

After:

-export([printList/2]).

printList(F, [H|T]) ->
    F(H),
    printList(F, T);
printList(F, []) -> true.

Call: printList(fun(H) -> io:format("~p\n", [H]) end, [1,2,3]).
Generalisation (keeping the original interface)

Before:

-export([printList/1]).

printList([H|T]) ->
    io:format("~p\n", [H]),
    printList(T);
printList([]) -> true.

After:

-export([printList/1]).

printList(F, [H|T]) ->
    F(H),
    printList(F, T);
printList(F, []) -> true.

printList(L) ->
    printList(fun(H) -> io:format("~p\n", [H]) end, L).
Asynchronous to synchronous

Before, sender:      Pid ! {self(), msg}
Before, receiver:    {Parent, msg} -> body

After, sender:       Pid ! {self(), msg},
                     receive {Pid, ok} -> ok end
After, receiver:     {Parent, msg} -> Parent ! {self(), ok}, body
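The synchronous form of the slide above can be tried out as a small module. This is only a sketch: the module name `sync_demo` and the functions `start/0`, `loop/0` and `call/1` are hypothetical, and the receiver's "body" is reduced to looping.

```erlang
-module(sync_demo).
-export([start/0, call/1]).

%% Hypothetical receiver: acknowledges every {Parent, msg} it gets.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {Parent, msg} ->
            Parent ! {self(), ok},   % acknowledgement added by the refactoring
            loop()                   % stands in for "body"
    end.

%% Hypothetical sender: sends the message, then blocks until the
%% acknowledgement from that same process arrives.
call(Pid) ->
    Pid ! {self(), msg},
    receive
        {Pid, ok} -> ok
    end.
```

Because `Pid` is already bound in `call/1`, the pattern `{Pid, ok}` only matches an acknowledgement from the process that was addressed.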
Refactoring = Transformation + Condition • Transformation: ensure change at all those points needed; ensure change at only those points needed. • Condition: is the refactoring applicable? Will it preserve the semantics of the module? Of the program?
Transformations full stop one
Condition > Transformation • Renaming an identifier • "The existing binding structure should not be affected. No binding for the new name may intervene between the binding of the old name and any of its uses, since the renamed identifier would be captured by the renaming. Conversely, the binding to be renamed must not intervene between bindings and uses of the new name."
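A minimal sketch of why the renaming condition matters in Erlang. The module and function names are hypothetical. Renaming the parameter `X` to a fresh name is safe, but renaming it to `Y` collides with an existing binding, and in Erlang `Y = 10` then becomes a match against the argument instead of a new binding.

```erlang
-module(rename_demo).
-export([original/1, safe/1, captured/1]).

%% Original function: X and Y are distinct variables.
original(X) ->
    Y = 10,
    X + Y.

%% X renamed to N: N is not bound anywhere else in the clause,
%% so the binding structure is unchanged.
safe(N) ->
    Y = 10,
    N + Y.

%% X renamed to Y: the new name is captured by the existing binding.
%% `Y = 10` now pattern-matches the argument against 10.
captured(Y) ->
    Y = 10,
    Y + Y.
```

Here `original(1)` and `safe(1)` agree, while `captured(1)` fails with a `badmatch` error, which is the kind of change the condition is meant to rule out.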
Which refactoring exactly? • Generalise f by making 23 a parameter of f:

f(X) ->
    Con = 23,
    g(X) + Con + 23.

• This one occurrence? • All occurrences (in the body)? • Some of the occurrences … to be selected.
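The first two choices above can be written out side by side. This is an illustrative sketch, not Wrangler output: `g/1` is a hypothetical helper, and the names `f_one` and `f_all` are invented so that both results fit in one module.

```erlang
-module(gen_demo).
-export([g/1, f/1, f_one/1, f_one/2, f_all/1, f_all/2]).

%% Hypothetical helper, only here to make the example runnable.
g(X) -> X * 2.

%% The function from the slide.
f(X) ->
    Con = 23,
    g(X) + Con + 23.

%% Generalising only the selected occurrence (the final 23).
f_one(X) -> f_one(X, 23).
f_one(X, N) ->
    Con = 23,
    g(X) + Con + N.

%% Generalising all occurrences of 23 in the body.
f_all(X) -> f_all(X, 23).
f_all(X, N) ->
    Con = N,
    g(X) + Con + N.
```

With the default argument 23 all three versions agree, but the two generalised functions differ for any other argument: `f_one(1, 5)` keeps `Con` at 23 while `f_all(1, 5)` does not.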
Compensate or crash?

Compensate:

-export([oldFun/1, newFun/1]).

oldFun(L) -> newFun(L).
newFun(L) -> … … .

or crash:

-export([newFun/1]).

newFun(L) -> … … .
Tool support • Bureaucratic and diffuse. • Tedious and error prone. • Semantics: scopes, types, modules, … • Undo/redo • Enhanced creativity
Semantic analysis • Binding structure • Dynamic atom creation, multiple binding occurrences, pattern semantics etc. • Module structure and projects • No explicit projects for Erlang; cf Erlide / Emacs. • Type and effect information • Need effect information for e.g. generalisation.
Erlang refactoring: challenges • Multiple binding occurrences of variables. • Indirect function call or function spawn: apply (lists, rev, [[a,b,c]]) • Multiple arities … multiple functions: rev/1 • Concurrency • Refactoring within a design library: OTP. • Side-effects.
Static vs dynamic • Aim to check conditions statically. • Static analysis tools possible … but some aspects intractable: e.g. dynamically manufactured atoms. • Conservative vs liberal. • Compensation?
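The indirect calls and dynamically manufactured atoms mentioned above can be illustrated with a small sketch. The module and function names are hypothetical. All three callers end up in `rev/1`, but only the first is visible as a direct call, and the last cannot be found by looking for the atom `rev` in the source at all.

```erlang
-module(indirect_demo).
-export([rev/1, direct/0, via_apply/0, via_atom/0]).

%% The function a "rename function" refactoring would target.
rev(L) -> lists:reverse(L).

%% Direct call: easy to find and update statically.
direct() ->
    rev([a, b, c]).

%% Indirect call: the function name appears as an ordinary atom.
via_apply() ->
    apply(?MODULE, rev, [[a, b, c]]).

%% The function name is manufactured at run time, so the atom `rev`
%% never occurs literally in this clause.
via_atom() ->
    F = list_to_atom("r" ++ "ev"),
    apply(?MODULE, F, [[a, b, c]]).
```

A conservative tool would refuse or warn on the rename here; a liberal one would rename `rev/1`, update the callers it can see, and leave `via_atom/0` to fail at run time.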
Refactorings in Wrangler • Renaming variable, function, module, process • Function generalisation • Move function between modules • Function extraction • Fold against definition • Introduce and fold against macros • Tuple function arguments together • Register a process • From function to process • Add a tag to messages • All these refactorings work across multiple-module projects and respect macro definitions.
Wrangler and RefactorErl • Wrangler: lightweight; better integration with interactive tools (e.g. emacs); undo/redo external?; ease of implementing conditions. • RefactorErl: higher entry cost; better for a series of refactorings on a large project; transaction support; ease of implementing transformations.
Duplicate Code Detection • Especially for Erlang/OTP programs. • Report syntactically well-formed code fragments that are identical after consistent renaming of variables … • … ignoring differences in literals and layout. • Integrated with the refactoring environment.
Code Inspection Support • Variable use/binding information. • Caller functions. • Caller/callee modules. • Case/if/receive expressions nested more than a specified level. • Long function/modules. • Non tail-recursive servers. • Non-flushed unknown messages • . . .
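Two of the inspections listed above, non tail-recursive servers and non-flushed unknown messages, can be pictured with a pair of server loops. This is a hedged sketch with hypothetical names; it shows the kind of code such checks are aimed at, not how Wrangler detects it.

```erlang
-module(server_demo).
-export([bad_loop/0, good_loop/0]).

%% Would be flagged: the recursive call is not the last expression, so
%% the stack grows with every message. There is also no catch-all clause,
%% so unknown messages pile up in the mailbox.
bad_loop() ->
    receive
        stop -> ok;
        {From, ping} ->
            From ! pong,
            bad_loop(),
            ok
    end.

%% Tail-recursive loop that also flushes messages it does not understand.
good_loop() ->
    receive
        stop -> ok;
        {From, ping} ->
            From ! pong,
            good_loop();
        _Unknown ->
            good_loop()
    end.
```

A process started with `spawn(fun server_demo:good_loop/0)` discards an unexpected message and still answers the next `{self(), ping}`.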
Integration … with IDEs • Back to the future? Programmers' preference for emacs and gvim … • … though some IDE interest: Eclipse, NetBeans … • Issue of integration with multiple IDEs: building common interfaces.
Integration … with tools • Test data sets and test generation. • Makefiles, etc. • Working with macros e.g. QuickCheck uses Erlang macros … • … in a particular idiom.
APIs … programmer / user • API in Erlang to support user-programmed refactorings: • declarative, straightforward and complete • but relatively low-level. • Higher-level combining forms? • OK for transformations, but need a separate condition language.
Verification and validation • Possible to write formal proofs of correctness: • check conditions and transformations • different levels of abstraction • possibly-name binding substitution for renaming etc. • more abstract formulation for e.g. data type changes. • Use of Quviq QuickCheck to verify refactorings in Wrangler.
The Wrangler Clone Detector • Uses syntactic and static semantic information. • Syntactically well-formed code fragments • … identical after consistent renaming of variables, • … with variations in literals, layout and comments. • Integrated within the refactoring environment.
The Wrangler Clone Detector • Make use of token stream and annotated AST. • Token–based approaches • Efficient. • Report non-syntactic clones. • AST-based approaches. • Report syntactic clones. • Checking for consistent renaming is easier.
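To make "consistent renaming" concrete, here is a minimal sketch of such a check. It is an assumption about what the check amounts to (a one-to-one mapping between variable names), not Wrangler's implementation; the module name and the representation of a fragment as a list of variable occurrences in source order are both hypothetical.

```erlang
-module(renaming_check).
-export([consistent/2]).

%% Vs1 and Vs2 are the variable occurrences of two fragments, in source
%% order, given as atoms. The fragments are consistently renamed if
%% pairing the occurrences position by position yields a one-to-one
%% mapping between the two sets of names.
consistent(Vs1, Vs2) when length(Vs1) =:= length(Vs2) ->
    Pairs = lists:usort(lists:zip(Vs1, Vs2)),
    Lefts = lists:usort([L || {L, _} <- Pairs]),
    Rights = lists:usort([R || {_, R} <- Pairs]),
    length(Lefts) =:= length(Pairs) andalso
        length(Rights) =:= length(Pairs);
consistent(_, _) ->
    false.
```

For example, `S = W * H, S + W + 1` and `T = P * Q, T + P + 2` give the occurrence lists `['S','W','H','S','W']` and `['T','P','Q','T','P']`, which are consistent; the literals differ, but literals are ignored. Replacing the second fragment by `T = P * Q, T + Q + 1` maps `W` to both `P` and `Q`, so the check fails even though the token shapes match.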
The Wrangler Clone Detector [pipeline diagram, built up over four slides] • Token path: Source Files → Tokenisation → Token Stream → Normalisation → Normalised Token Stream → Suffix Tree Construction → Suffix tree → Clone Collector → Initial Clones → Clone Filter → Filtered Initial Clones. • AST path: Source Files → Parsing + Static Analysis → Annotated ASTs. • Combined: Filtered Initial Clones + Annotated ASTs → Clone Decomposition → Syntactic Clones → Consistent Renaming Checking → Clones to report → Formatting → Reported Code Clones.
Support for clone removal • Refactorings to support clone removal. • Function extraction. • Generalise a function definition. • Fold against a function definition. • Move a function between modules.
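A sketch of how the refactorings listed above combine to remove a clone. All names are hypothetical and the code is written by hand, not produced by Wrangler: the duplicated body is extracted into a function, the differing literal is generalised into a parameter, and both original functions are folded against the new definition.

```erlang
-module(extract_demo).
-export([f1/1, f2/1, g1/1, g2/1]).

%% Before: f1 and f2 are clones, identical up to variable names
%% and the literals 2 and 3.
f1(Xs) ->
    S = lists:sum([X * 2 || X <- Xs]),
    S + 1.

f2(Ys) ->
    T = lists:sum([Y * 3 || Y <- Ys]),
    T + 1.

%% After: one extracted and generalised definition ...
scaled_sum(Xs, K) ->
    S = lists:sum([X * K || X <- Xs]),
    S + 1.

%% ... and the former clones folded against it.
g1(Xs) -> scaled_sum(Xs, 2).
g2(Ys) -> scaled_sum(Ys, 3).
```

`g1/1` and `g2/1` return the same results as `f1/1` and `f2/1`, which is the behaviour-preservation property each step is expected to maintain.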
Case studies • Applied the clone detector to Wrangler itself with threshold values of 30 and 2. • 36 final clone classes were reported … 12 are across modules, and 3 are duplicated function definitions. • Without syntactic checking and consistent variable renaming checking, 191 would have been reported. • Applied to a third-party code base (32k LOC, 89 modules): 109 clone classes reported.
Tupling parameters

Before:

-module(tup1).
-export([gcd/2]).

gcd(X,Y) ->
    if
        X>Y -> gcd(X-Y,Y);
        Y>X -> gcd(Y-X,X);
        true -> X
    end.

After:

-module(tup1).
-export([gcd/1]).

gcd({X,Y}) ->
    if
        X>Y -> gcd({X-Y,Y});
        Y>X -> gcd({Y-X,X});
        true -> X
    end.
Introduce records …

Before:

-module(rec1).

g({A, B}) -> A + B.

h(X, Y) ->
    g({X, X}),
    g(Y).

After:

-module(rec1).
-record(rec, {f1, f2}).

g(#rec{f1=A, f2=B}) -> A + B.

h(X, Y) ->
    g(#rec{f1=X, f2=X}),
    g(#rec{f1=element(1,Y), f2=element(2,Y)}).