Scalab: a Build Tool for Scala

Master Thesis, July 4, 2008 • Author: Vincent Pazeller • Supervisor: Gilles Dubochet • Professor: Martin Odersky • Programming Methods Laboratory / LAMP

Outline • Interest of Build Tools • Interest of Scalab • Definition of a Build Tool • Model • Caches • Internal Operation (update) • Sabbus • Further Work

Interest of Build Tools • Build process: a sequence of tasks that transforms the sources of a project into its executable equivalent. • Not every task needs to be executed on every build. • Not all sources have been modified since the last build. ⇒ A build tool automates the choice of the tasks to execute and optimizes the build process. ⇒ This increases developers' productivity.

Interest of Scalab • Existing tools make non-conservative approximations. • Existing tools are also too difficult to use: • Sabbus written with Ant ⇒ 1200 lines of XML. • Sabbus written with Scalab ⇒ fewer than 100 (reasonable) lines of Scala code. • Scalab makes it possible to describe situations that cannot be described with any other build system.

Task • A task employs sources to produce products. • Sources and products are resources (i.e. files). • The universe is the set of all resources.
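The definitions on this slide can be written down as a small Scala sketch. It assumes a resource is identified by its path; the names `Resource`, `Task`, and `touched` are illustrative and are not taken from Scalab.

```scala
object TaskModel {
  // Assumption of this sketch: a resource (a file) is identified by its path.
  final case class Resource(path: String)

  // A task employs sources to produce products.
  final case class Task(name: String, sources: Set[Resource], products: Set[Resource])

  // The part of the universe that a set of tasks reads or writes.
  // The universe itself may contain further resources that no task mentions.
  def touched(tasks: Set[Task]): Set[Resource] =
    tasks.flatMap(t => t.sources ++ t.products)
}
```

For a single compilation task with source `A.java` and product `A.class`, `touched` returns both resources.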

Up-to-date • Given a set of tasks T, a task t is up-to-date with respect to T when the products of t cannot be modified by the execution of any sequence s of tasks, ∀i. i ∈ s ∧ i ∈ T (every task of the sequence belongs to T). ⇒ The purpose of a build tool is to make t up-to-date ∀t ∈ T. ⇒ The build tool needs to know (at run-time) the sources and the products of each task.

Model: Universe • Static representation of a common Java project: [figure] • Note: all resources depend directly on the universe. • Idea: make them depend on the task that created them instead (more dynamic): [figure]

Model: Filters • Idea: indicate in the graph how resources can be extracted from the universe. • The build tool can then detect changes dynamically. • Filters are divided into three categories: • Selectors • Scanners • Mappers • Filters can also be applied to the products of components (tasks and filters, so far).
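As an illustration of the selector category only, here is a minimal sketch in which a selector keeps the paths that satisfy a predicate. The names `Selector`, `endsWith`, `startsWith`, and `andThen` are assumptions of this sketch (they echo the `EndsWith`, `StartsWith`, and `complement` names used later in the Sabbus slides but do not reproduce Scalab's implementation). Scanners and mappers are not modelled.

```scala
object FilterSketch {
  // A selector keeps the resources (here: plain paths) that satisfy a predicate.
  final case class Selector(keep: String => Boolean) {
    def apply(resources: Set[String]): Set[String] = resources.filter(keep)

    // The complement keeps exactly what this selector rejects.
    def complement: Selector = Selector(p => !keep(p))

    // Chaining: a path must pass this selector and then the next one.
    def andThen(next: Selector): Selector = Selector(p => keep(p) && next.keep(p))
  }

  def endsWith(suffix: String): Selector = Selector(_.endsWith(suffix))
  def startsWith(prefix: String): Selector = Selector(_.startsWith(prefix))
}
```

For example, `endsWith(".scala").andThen(startsWith("ant/").complement)` keeps the Scala sources that are not under `ant/`.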

Model: Pipes • We can now simplify the graph: [figure] becomes [figure] ⇒ Arrows are called pipes and carry resources from component to component.

Model: Gates • Inputs • Interest: distinguish subsets of sources. • Mandatory (■) or optional (□). • Output: each component has a single output which is implicit.

Model: Black Boxes • Interest: • Hide a sub-graph and/or make it reusable and distributable. • Reduce the risk of errors. • The inside looks like: [figure] • This time, the output must be explicitly provisioned. ⇒ Black boxes behave exactly as if they were tasks.

Model: Build Schemes • Generalization of black boxes. • This scheme can then be used with any compiler and any archiver. • The class-path input has been omitted because it cannot be generalized to all compilers. • Build schemes are black-box generators.
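The idea of a build scheme as a black-box generator can be sketched as a higher-order function. This is only an analogy under simplifying assumptions: a component is modelled as a plain function from source paths to product paths, and `compileAndArchive` is a hypothetical name.

```scala
object SchemeSketch {
  type Resources = Set[String]

  // Assumption: a component is a function from its sources to its products.
  type Component = Resources => Resources

  // A build scheme: given any compiler and any archiver, generate a black box
  // that compiles its sources and then archives the compiled products.
  def compileAndArchive(compiler: Component, archiver: Component): Component =
    sources => archiver(compiler(sources))
}
```

The same scheme can be instantiated with a Java compiler and a jar archiver, or with any other pair of components that have the same shape.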

Model: Targets • Purpose: indicate clearly which components are relevant to build. • Interests: • Build process is more explicit. • Avoid any misuse.

Model: Dependencies • Hard dependencies: • If a hard source dependency of a task does not exist, the task cannot be executed. • A hard product dependency indicates that the resource has certainly been created. • Soft dependencies: • The absence of a soft source dependency does not prevent the task from executing. • A soft product dependency indicates that the creation of the resource is uncertain. ⇒ Pipes can form cycles in the build graph! • Filters are not affected by (hard) dependencies.
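The rule for source dependencies can be sketched as a simple check. The types `Strength` and `SourceDep` and the function `canExecute` are hypothetical names introduced for this illustration, not Scalab's API.

```scala
object DependencySketch {
  sealed trait Strength
  case object Hard extends Strength
  case object Soft extends Strength

  final case class SourceDep(path: String, strength: Strength)

  // A task can execute only if every hard source dependency exists;
  // a missing soft source dependency is tolerated.
  def canExecute(deps: Seq[SourceDep], exists: String => Boolean): Boolean =
    deps.forall(d => d.strength == Soft || exists(d.path))
}
```

The `exists` parameter stands for whatever test the build tool uses to decide that a resource is present, for instance a file-system lookup.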

Caches • Interest: prevent tasks from repeating work they have already done. • Principle: load/store resources from/to a repository. • Tasks write directly into the cache repository (no copy).

Caches: Behavior • The behavior of a cache is defined by three policies: • Change Detection Policy: used to detect when a source has changed. • Eviction Policy: used to select and delete the least pertinent information in a cache. • Core Policy: defines how the cache loads and stores information, and coordinates the other two policies.
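One way to picture the three policies is as traits that are mixed together. The sketch below is an assumption-laden illustration: the trait names, the use of `Long` timestamps, and the `reusable` and `victim` methods are all invented here and do not describe Scalab's actual cache classes.

```scala
object CachePolicySketch {
  // Change detection policy: has a source changed since it was recorded?
  trait ChangeDetection {
    def changed(recorded: Long, current: Long): Boolean
  }
  trait TimestampDetection extends ChangeDetection {
    def changed(recorded: Long, current: Long): Boolean = current > recorded
  }

  // Eviction policy: which cache entry should be deleted first?
  trait Eviction {
    def victim(lastUse: Map[String, Long]): Option[String]
  }
  trait LruEviction extends Eviction {
    // Least recently used: the entry with the smallest last-use time.
    def victim(lastUse: Map[String, Long]): Option[String] =
      if (lastUse.isEmpty) None else Some(lastUse.minBy(_._2)._1)
  }

  // Core policy: relies on the other two policies through a self-type.
  trait CorePolicy { self: ChangeDetection with Eviction =>
    // A cached result is reusable only if every recorded source still
    // exists and is reported unchanged.
    def reusable(recorded: Map[String, Long], current: Map[String, Long]): Boolean =
      recorded.forall { case (src, t) => current.get(src).exists(c => !changed(t, c)) }
  }

  val cache = new CorePolicy with TimestampDetection with LruEviction
}
```

Swapping `TimestampDetection` for a content-hash based trait would change how changes are detected without touching the core policy.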

Caches: Conservativeness • Caches can be conservative or not. • Conservative caches ensure that the result of cached tasks is always sound. ⇒ Conservative caches need to know the inter-dependencies among resources. [figure: Scalac turns Caller.scala and Callee.scala into Caller.class, Caller$.class, Callee.class and Callee$.class]
Caller.scala:
    object Caller {
      def main(args: Array[String]): Unit = {
        Callee.invoke()
      }
    }
Callee.scala:
    object Callee {
      def invoke(): Unit = {}
    }
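To see why inter-dependencies matter, consider the Caller/Callee example: when Callee.scala changes, the cached classes of Caller.scala may no longer be sound either. The sketch below computes the set of sources whose cached products must be discarded. The function `stale` and the hand-written dependency map are assumptions of this illustration; how a real tool obtains the dependencies is not covered here.

```scala
object ConservativeSketch {
  // dependsOn(a) = the sources that a refers to (supplied by hand in this sketch).
  // Returns every source that changed or depends, directly or transitively,
  // on a source that changed.
  def stale(changed: Set[String], dependsOn: Map[String, Set[String]]): Set[String] = {
    // Sources that refer to something already known to be stale.
    val more = dependsOn.collect { case (src, deps) if deps.exists(changed) => src }.toSet
    val next = changed ++ more
    // Stop once a fixed point is reached.
    if (next == changed) changed else stale(next, dependsOn)
  }
}
```

With `Caller.scala` depending on `Callee.scala`, a change to the callee marks both files as stale, while a change to the caller marks only the caller.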

Update • First try:
    trait ExecutableComponent {
      …
      protected def update: Boolean =
        this.inputs forall { i => i.providers forall { p => p.update } } && this.exec
      …
    }
• Wrong! If the graph contains a cycle, the algorithm never terminates.

Update: Cycle Detection
    protected def update(visited: Set[Component], cycles: Set[(Output, Input)]): (Boolean, Set[Component], Set[(Output, Input)]) = {
      var newVisited = visited + this // add this node to the visited set
      var newCycles = cycles
      val inputsUpdated = this.inputs forall { i =>
        i.providers forall { p =>
          if (newCycles contains ((p.output, i))) // ensures termination
            true
          else {
            if (visited contains p) // cycle detection
              newCycles = newCycles + ((p.output, i))
            val (updated, moreVisited, moreCycles) = p.update(newVisited, newCycles) // update providers
            newVisited = newVisited ++ moreVisited
            newCycles = newCycles ++ moreCycles
            updated
          }
        } // input providers are up-to-date
      } // inputs are up-to-date
      (inputsUpdated && this.exec, newVisited, newCycles) // update this component
    }

Update: Redundancy • The update algorithm presented so far is not optimal: [figure: a component with inputs in0 and in1] ⇒ We need to remember which components have already been updated.

Update: Efficient Version
    protected def update(visited: Set[Component], cycles: Set[(Output, Input)], updated: Set[Component]): (Boolean, Set[Component], Set[(Output, Input)], Set[Component]) = {
      if (updated contains this) // avoid redundant updates
        (true, visited, cycles, updated)
      else {
        var newVisited = visited + this // add this node to the visited set
        var newCycles = cycles
        var newUpdated = updated
        val inputsUpdated = this.inputs forall { i =>
          i.providers forall { p =>
            if (newCycles contains ((p.output, i))) // ensures termination
              true
            else {
              if (visited contains p) // cycle detection
                newCycles = newCycles + ((p.output, i))
              // renamed from "updated" to avoid shadowing the parameter
              val (providerUpdated, moreVisited, moreCycles, moreUpdated) = p.update(newVisited, newCycles, newUpdated)
              newVisited = newVisited ++ moreVisited
              newCycles = newCycles ++ moreCycles
              newUpdated = newUpdated ++ moreUpdated
              providerUpdated
            }
          } // input providers are up-to-date
        } // inputs are up-to-date
        (inputsUpdated && this.exec, newVisited, newCycles, newUpdated + this) // update this component
      }
    }
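The following self-contained sketch shows the same two ideas (termination on cycles, no redundant updates) on a deliberately simplified model. It is not the algorithm from the slide: components list their providers directly instead of going through inputs and outputs, and cycles are cut by tracking the components on the current traversal path rather than a set of (output, input) pairs. All names are illustrative.

```scala
object UpdateSketch {
  // Simplified component: providers are listed directly, and exec only
  // counts how often it was called.
  final class Component(val name: String) {
    var providers: List[Component] = Nil
    var executions = 0
    def exec(): Boolean = { executions += 1; true }
  }

  // onPath: components on the current traversal path (used to cut cycles).
  // done:   components that have already been updated (avoids redundancy).
  // Returns the success flag and the enlarged set of updated components.
  def update(c: Component, onPath: Set[Component], done: Set[Component]): (Boolean, Set[Component]) =
    if (done.contains(c) || onPath.contains(c)) (true, done)
    else {
      var newDone = done
      val providersOk = c.providers.forall { p =>
        val (ok, d) = update(p, onPath + c, newDone)
        newDone = d
        ok
      }
      // Execute this component only if all its providers succeeded.
      (providersOk && c.exec(), newDone + c)
    }
}
```

On a diamond-shaped graph (d depends on b and c, which both depend on a), every component is executed once even if a pipe from d back to a closes a cycle.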

Sabbus

Sabbus: Preamble
    val scalaHome = "/home/pazeller/pdm/scala/"
    val scalaSrcs = scalaHome + "srcs/"
    val scalacSrcs = scalaSrcs + "compiler/"
    val scalaLib = scalaHome + "lib/"
    //sources
    val compilerSrcs = Universe -> Files(scalacSrcs) -> ListDirs() -> EndsWith(".scala") ->
      StartsWith(scalacSrcs + "scala/tools/ant/").complement
    val libSrcs = Universe -> Files(scalaSrcs + "library/") -> ListDirs() -> EndsWith(".scala")
    //old classes
    val oldLib = Files(scalaLib + "scala-library.jar")
    val bytecodeGen = Files(scalaLib + "fjbg.jar") // bytecode generator
    val oldScalac = Files(scalaLib + "scala-compiler.jar")
    Universe >> (oldLib, oldScalac, bytecodeGen)

Sabbus: Instantiating Compilers
    //Building compilers
    val starr = DynamicScalac("starr")
    val starrLib = DynamicScalac("starrLib")
    val locker = DynamicScalac("locker")
    val lockerLib = DynamicScalac("lockerLib")
    val quick = DynamicScalac("quick")
    val quickLib = DynamicScalac("quickLib")
    //grouping
    val compilers = List(starr, locker, quick)
    val libCompilers = List(starrLib, lockerLib, quickLib)
    val allCompilers = compilers ::: libCompilers

Sabbus: Connecting Pipes
    //sources
    libSrcs >> (libCompilers map {c => c.src})
    compilerSrcs >> (compilers map {c => c.src})
    //loading classes
    bytecodeGen >> (allCompilers map {c => c.load})
    oldScalac >> (starrLib.load, starr.load)
    starr.runDirectory >> (lockerLib.load, locker.load)
    locker.runDirectory >> (quickLib.load, quick.load)
    //libraries
    oldLib >> (starrLib.load, starr.load)
    starrLib.runDirectory >> (starr.boot, lockerLib.load, locker.load)
    lockerLib.runDirectory >> (locker.boot, quickLib.load, quick.load)
    quickLib.runDirectory -> quick.boot

Sabbus: Concluding
    Stopwatch(allCompilers) //timing compilers executions
    val stability = SameContent("stability", locker, quick) //stability check
    val distr = Jar("jar", "scala-compiler.jar", stability)
    //setting cache
    val cache = new ConservativeCCP with TimestampCDP with LRUCEP
    cache.setCacheDirectory("/home/pazeller/shared/cache/")
    allCompilers foreach {c => c.setCache(cache)}
    //targets
    val buildDir = "./build/"
    val newDistribution = Target(buildDir + "distr/", distr)
    val newLibrary = Target(buildDir + "library/", lockerLib)
    val newCompiler = Target(buildDir + "compiler/", locker)
    val default = newCompiler //default target

Further Work • Parallel Task Execution • Graphical Interface • Interactive Debug Mode • Filters Caching • Automatic Graph Dismantling • Extending Library

Thank you for your attention. • The project can be consulted at http://scalab.googlecode.com • Feel free to ask questions.
