Dynamic Parallelization of JavaScript Applications Using an Ultra-lightweight Speculation Mechanism
Mojtaba Mehrara, Po-Chun Hsu, Mehrzad Samadi, Scott Mahlke
Advanced Computer Architecture Laboratory, University of Michigan
Web 2.0 is Here
• More computation is moved to client-side
• More responsive browsing
• Avoid unnecessary network traffic
[Figure: Web 1.0 vs. Web 2.0, with labels static HTML, JavaScript + DHTML, server-side computation, client-side computation, and client-side rendering [Vikram et al., CCS’09]]
Client-side Computation in JavaScript
• Flexibility, ease of prototyping, and portability
• Poor performance is one of the main challenges
Client-side Applications
• Interaction-intensive:
  • Largely composed of event handlers, triggered by the user
  • Examples are Gmail, Facebook, etc.
• Compute-intensive:
  • Dominated by loops and hot functions
  • Online image editing such as Adobe’s Photoshop.com and Google’s Picnik
• A lot more potential:
  • Online games
  • Video editing
  • Sound editing and voice recognition
Improving JavaScript Performance
• Many efforts underway by browser developers to improve JavaScript sequential performance
  • Hits a performance wall as multi-cores are becoming dominant
• Parallelism must be exploited to make use of multi-core clients
• JavaScript is inherently sequential
  • Language and run-time system provide little/no concurrency support
Our proposal: Low-cost dynamic & speculative parallelization of JavaScript applications
JavaScript Parallelization
• A typical static parallelization flow
• Dynamic parallelization
[Figure: stages labeled source code, memory profiling, data flow analysis, memory dependence analysis, and parallel code generation (compile time); speculation engine (software transactional memory) and parallel execution (runtime); several stages carry a "$" mark]
Our Approach: ParaScript
• Light-weight dynamic analysis & code generation for speculative DOALL loops
• Low-cost customized SW speculation with a single checkpoint
[Figure: runtime flow with stages labeled hot loop detection, initial parallelizability assessment, loop selection, parallel code generation, parallel execution with customized speculation, finish, abort, and sequential execution]
Dependence Analysis
• JIT compilation time data flow analysis
• Runtime initial tests + range-based monitoring
• Runtime reference-counting-based monitoring
Scalar Array Conflict Detection
• Initial assessment catches trivial conflicts
• Keep track of max and min accessed element indices
• Cross-check RD/WR sets after thread execution

Example:
  Thread 1           Thread 2
  A[0] = …           B[7] = …
  A[5] = …           A[6] = A[5]+1

  set               ptr   min   max
  Array read-set    &A    5     5
  Array write-set   &A    6     6
  Array write-set   &A    0     5
  Array write-set   &B    7     7
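The range-based check described on this slide can be sketched roughly as follows. This is an illustration only, not ParaScript's actual implementation: the names `RangeSet`, `setsOverlap`, and `threadsConflict` are hypothetical, and the example assumes one thread writes A[0] and A[5] while the other writes B[7] and runs A[6] = A[5]+1.

```javascript
// Sketch of range-based conflict detection (all names are hypothetical).
class RangeSet {
  constructor() {
    this.ranges = new Map(); // array object -> { min, max } of accessed indices
  }
  // Record an access to arr[index], widening the tracked range if needed.
  record(arr, index) {
    const r = this.ranges.get(arr);
    if (r === undefined) {
      this.ranges.set(arr, { min: index, max: index });
    } else {
      r.min = Math.min(r.min, index);
      r.max = Math.max(r.max, index);
    }
  }
}

// Two index ranges overlap if neither one ends before the other starts.
function overlaps(a, b) {
  return a.min <= b.max && b.min <= a.max;
}

// True if some array appears in both sets with overlapping ranges.
function setsOverlap(first, second) {
  for (const [arr, r1] of first.ranges) {
    const r2 = second.ranges.get(arr);
    if (r2 !== undefined && overlaps(r1, r2)) return true;
  }
  return false;
}

// Cross-check after both threads finish: write/write, write/read, read/write.
function threadsConflict(t1, t2) {
  return setsOverlap(t1.writes, t2.writes) ||
         setsOverlap(t1.writes, t2.reads) ||
         setsOverlap(t1.reads, t2.writes);
}

// Accesses from the slide's example (thread assignment is an assumption).
const A = new Array(8).fill(0);
const B = new Array(8).fill(0);
const thread1 = { reads: new RangeSet(), writes: new RangeSet() };
const thread2 = { reads: new RangeSet(), writes: new RangeSet() };
thread1.writes.record(A, 0); // A[0] = …
thread1.writes.record(A, 5); // A[5] = …
thread2.writes.record(B, 7); // B[7] = …
thread2.reads.record(A, 5);  // … = A[5]
thread2.writes.record(A, 6); // A[6] = …
const conflict = threadsConflict(thread1, thread2);
```

Keeping only a min and a max per array is cheap but conservative: the first thread's write range on A covers indices 1 through 4 even though it never writes them, so a harmless access there would also be reported as a conflict.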
Object Array Conflict Detection
• More involved than scalar arrays
• Different indices of the same array may point to the same object
[Figure: arrays A and B with elements pointing to objects myObj0 and myObj1; each object header holds ptr/RefCnt entries (&A, &B) with counts such as 2 and 1; annotated "If dependent based on data-flow analysis"]
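The aliasing problem named on this slide can be illustrated with a small sketch. It is a simplification under stated assumptions: the slide suggests reference counts live in object headers inside the engine, whereas here a `Map` stands in for them, and `countReferences` and `hasSharedObject` are hypothetical names.

```javascript
// Sketch only: count how many array slots reference each object.
// A Map stands in for the per-object header counts shown on the slide.
function countReferences(arrays) {
  const refCount = new Map(); // object -> number of array slots pointing at it
  for (const arr of arrays) {
    for (const element of arr) {
      if (element !== null && typeof element === "object") {
        refCount.set(element, (refCount.get(element) || 0) + 1);
      }
    }
  }
  return refCount;
}

// An object reachable from more than one slot may be touched by two
// different iterations, so index ranges alone cannot prove independence.
function hasSharedObject(arrays) {
  for (const count of countReferences(arrays).values()) {
    if (count > 1) return true;
  }
  return false;
}

// Hypothetical data: two indices of arrA alias the same object.
const myObj0 = { value: 0 };
const myObj1 = { value: 1 };
const arrA = [myObj0, myObj0, myObj1];
const arrB = [{ value: 2 }];
const aliased = hasSharedObject([arrA]);
const disjoint = hasSharedObject([arrB]);
```

Here `aliased` is true because writing through `arrA[0]` also changes what `arrA[1]` sees, while `disjoint` is false.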
Loop Selection
• Focus on DOALL-counted loops (e.g. for loops)
• Avoid parallelizing loops with:
  • Browser interactions: requires locks on browser internals
  • HTTP request functions: requires server-side speculation
  • Runtime code insertion, for example:
      var addFunction = new Function("a", "b", "return a+b;");
        corresponds to
      function addFunction(a, b) { return a+b; }

      eval("a = 7; b = 13; document.write(a+b);");
        corresponds to
      a = 7; b = 13; document.write(a+b);
Checkpointing Mechanism
• Go through all references, clone them, and ask GC not to touch the clones
• Monitor overhead, back out if more expensive than a threshold
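A single-checkpoint scheme along these lines might look like the sketch below. It rests on several assumptions: `takeCheckpoint` and `rollback` are hypothetical names, cloning goes through JSON (so it handles plain data only), and the serialized size is used as a deterministic stand-in for the overhead that the slide says is monitored.

```javascript
// Sketch of a single checkpoint with an overhead budget (names hypothetical).
// Returns a snapshot object, or null if cloning exceeds the budget.
function takeCheckpoint(state, maxCost) {
  const snapshot = {};
  let cost = 0; // assumed cost model: total serialized characters
  for (const name of Object.keys(state)) {
    const serialized = JSON.stringify(state[name]);
    cost += serialized.length;
    if (cost > maxCost) return null; // too expensive: back out
    // Holding the clone in `snapshot` keeps it reachable, so the garbage
    // collector will not reclaim it while speculation is in flight.
    snapshot[name] = JSON.parse(serialized);
  }
  return snapshot;
}

// Restore every checkpointed variable after a failed speculation.
function rollback(state, snapshot) {
  for (const name of Object.keys(snapshot)) {
    state[name] = snapshot[name];
  }
}

const state = { pixels: [1, 2, 3], total: 0 };
const checkpoint = takeCheckpoint(state, 1000);

// Speculative execution modifies the live state...
state.pixels[0] = 99;
state.total = 42;

// ...and a detected conflict restores the checkpointed values.
rollback(state, checkpoint);

// With a tiny budget the checkpoint is abandoned instead.
const tooCostly = takeCheckpoint(state, 2);
```

When `takeCheckpoint` returns null, the caller would presumably run the loop sequentially rather than speculate.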
Checkpointing Optimizations
• Selective variable cloning
  • Only clone a variable if it is touched during speculative execution
• Array clone elimination
  • Large arrays holding results of browser functions
  • Instead of cloning the array, just call the function again for recovery
  • E.g. getImageData in the canvas HTML5 element
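Both optimizations can be sketched together as a checkpoint that clones lazily. This is an illustration under assumptions, not the paper's mechanism: `SelectiveCheckpoint` and `regenerateWith` are hypothetical names, writes are assumed to go through an explicit `write` call, and `loadPixels` is a made-up stand-in for a browser function such as getImageData.

```javascript
// Sketch of selective cloning plus array clone elimination (names hypothetical).
class SelectiveCheckpoint {
  constructor(state) {
    this.state = state;
    this.touched = new Set();      // variables written during speculation
    this.saved = new Map();        // variable name -> clone taken at first write
    this.regenerators = new Map(); // variable name -> function that rebuilds it
  }
  // Declare that a variable can be rebuilt by calling fn, so it is never cloned.
  regenerateWith(name, fn) {
    this.regenerators.set(name, fn);
  }
  // Speculative write: clone the old value only on the first touch.
  write(name, value) {
    if (!this.touched.has(name)) {
      this.touched.add(name);
      if (!this.regenerators.has(name)) {
        this.saved.set(name, JSON.parse(JSON.stringify(this.state[name])));
      }
    }
    this.state[name] = value;
  }
  // Undo speculation: restore clones, or re-run the generating function.
  rollback() {
    for (const name of this.touched) {
      this.state[name] = this.regenerators.has(name)
        ? this.regenerators.get(name)()
        : this.saved.get(name);
    }
    this.touched.clear();
    this.saved.clear();
  }
}

const loadPixels = () => [10, 20, 30, 40]; // hypothetical stand-in for getImageData
const appState = { pixels: loadPixels(), sum: 0, label: "untouched" };
const cp = new SelectiveCheckpoint(appState);
cp.regenerateWith("pixels", loadPixels);

cp.write("pixels", [0, 0, 0, 0]);
cp.write("sum", 100);
const clonedNames = [...cp.saved.keys()]; // only "sum" was cloned
cp.rollback();
```

In this sketch `label` is never cloned because it is never written, and `pixels` is never cloned because it can be regenerated. In-place mutation of an array element would bypass `write`, which is one reason a real engine would hook this at a lower level.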
Parallel Code Generation
- Take checkpoint
- Spawn threads
Parallel loop:
    IE = min(IS + CS*SS, n);
    for (i = IS; i < IE; i += SS)
        // original loop code
    conflictcheck();
    chunkbarrier();
    IS += CS * TC * SS;
Loop barrier:
- Reduction variable & conditional live-out aggregation
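The chunking arithmetic in the generated loop can be checked with a small sequential simulation. The slide does not define its variables, so this sketch assumes IS and IE are a chunk's start and end indices, CS is the chunk size in iterations, SS the loop step, TC the thread count, and n the loop bound; it also assumes thread t begins at start + t*CS*SS and that SS is positive. `iterationsForThread` is a hypothetical name.

```javascript
// Sketch: which loop indices one thread would execute under the
// chunked schedule from the slide (variable meanings are assumptions).
function iterationsForThread(threadId, TC, CS, SS, start, n) {
  const visited = [];
  let IS = start + threadId * CS * SS; // assumed initial chunk for this thread
  while (IS < n) {
    const IE = Math.min(IS + CS * SS, n);
    for (let i = IS; i < IE; i += SS) {
      visited.push(i); // original loop body would run here
    }
    // conflictcheck() and chunkbarrier() would run here in the real code
    IS += CS * TC * SS; // skip over the chunks owned by the other threads
  }
  return visited;
}

// Two threads, chunks of two iterations, unit step, loop over 0..9.
const perThread = [0, 1].map((t) => iterationsForThread(t, 2, 2, 1, 0, 10));

// Same schedule with a step of 3: the original loop visits 0, 3, 6, 9.
const strided = [0, 1].map((t) => iterationsForThread(t, 2, 2, 3, 0, 10));
```

With these inputs the first thread gets indices 0, 1, 4, 5, 8, 9 and the second gets 2, 3, 6, 7, so every iteration of the original loop is executed exactly once.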
Experimental Setup
• Implemented in Firefox 3.7a1pre
• Subset of SunSpider benchmark suite
  • Others identified as not parallelizable early on, causing 2% slow-down due to the initial analysis
• A set of Pixastic Image Processing filters
• 8-processor system: 2 Intel Xeon Quad-cores, running Ubuntu 9.10
• Ran each benchmark 10 times and took the average
Parallelism Coverage
[Figure: parallelism coverage chart]
• High fraction of sequential execution in the getImageData() browser function
• DOALL loop that extracts pixel RGB & alpha values
SunSpider
[Figure: SunSpider results chart]
• A long iteration dominates execution
Pixastic Image Processing
[Figure: Pixastic results chart with series for 1, 2, 4, and 8 threads]
• High memory op to computation ratio
Conclusion
• The dominance of web applications is pushing JavaScript to the forefront of computing
• Dynamic environment and performance constraints make parallelization challenging
• We introduce efficient solutions for exploiting parallelism
  • 17% speculation overhead across all benchmarks
• ParaScript achieved an average of 2.55x and 1.82x speedup on 8 processors for SunSpider and Pixastic, respectively
Thank You! Questions?