Selective, Embedded Just-in-Time Specialization (SEJITS) As a platform for implementing communication-avoiding algorithms accessible from Python
Traditional deployment • C/C++ libraries • LAPACK • MKL • Accessible from high-level languages like Python using C bindings • Most execution time spent in the library, so it’s fast
Problems: Static source code • C/C++ library has static source code • Cannot adapt to different architectures • Cannot adapt to different input data • Optimizations and functionality are mixed together • SEJITS solves these problems with runtime code generation
Problems: Composition • Example: Suppose we want to compute a complex matrix expression like (ABx) · (Cx) • Library interface requires decomposing it into a sequence of operations: • T1 = matrix_matrix_multiply(A, B) • t1 = matrix_vector_multiply(T1, x) • t2 = matrix_vector_multiply(C, x) • result = dot_product(t1, t2) • Performance problem! Not the best sequence: forming T1 = AB costs O(n³) for n × n matrices, while computing A(Bx) as two matrix-vector products costs only O(n²)
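To make the cost gap concrete, here is a small pure-Python sketch that evaluates (ABx) · (Cx) both in the library's order and with A(Bx) reassociated, and counts multiply-adds for square n × n inputs. The helper names (matmul, matvec, dot, flops_*) are hypothetical and only illustrate the point; the counts ignore memory traffic.

```python
# Dense matrices as lists of rows; pure Python so the sketch runs anywhere.

def matmul(A, B):
    """Matrix-matrix product: n*n*n multiply-adds for square n x n inputs."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def matvec(A, x):
    """Matrix-vector product: n*n multiply-adds for a square n x n input."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def flops_library_order(n):
    # T1 = A*B (n^3), t1 = T1*x (n^2), t2 = C*x (n^2), dot product (n)
    return n**3 + 2 * n**2 + n

def flops_reordered(n):
    # t1 = A*(B*x) (2 n^2), t2 = C*x (n^2), dot product (n)
    return 3 * n**2 + n

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]
x = [1, 1]

library = dot(matvec(matmul(A, B), x), matvec(C, x))
reordered = dot(matvec(A, matvec(B, x)), matvec(C, x))
print(library, reordered)                                 # 20 20
print(flops_library_order(1000), flops_reordered(1000))   # 1002001000 3001000
```

Both orders give the same answer; only the amount of work differs, which is exactly what a library that sees one call at a time cannot exploit.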
Problems: Composition • Complex operations are formed by combining simpler ones • Application programmer interface consists of an awkward sequence of low-level operations • User doesn’t know how to choose the best sequence • Library can’t see future operations • SEJITS solves these problems by providing a rich fluent interface that gives the specializer access to the entire expression
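One way to "see the entire expression" is deferred evaluation: operators build a tree instead of computing immediately, and a rewrite pass reorders it before anything runs. The sketch below is a hypothetical illustration (the Expr/Leaf/MatMul classes are not Asp's API) that rewrites (A @ B) @ x into A @ (B @ x) when the result is a vector.

```python
# Hypothetical lazy-expression classes; names are illustrative, not Asp's API.

class Expr:
    def __matmul__(self, other):
        # Build a tree node instead of computing anything yet.
        return MatMul(self, other)

class Leaf(Expr):
    def __init__(self, name, value, is_vector=False):
        self.name, self.value, self.is_vector = name, value, is_vector

class MatMul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    @property
    def is_vector(self):
        # A product is a vector exactly when its right operand is one.
        return self.right.is_vector

def reassociate(e):
    """Rewrite (L @ M) @ v into L @ (M @ v) whenever the result is a vector."""
    if isinstance(e, MatMul):
        left, right = reassociate(e.left), reassociate(e.right)
        if right.is_vector and isinstance(left, MatMul):
            return reassociate(MatMul(left.left, MatMul(left.right, right)))
        return MatMul(left, right)
    return e

def evaluate(e):
    """Evaluate the tree in the order it is written."""
    if isinstance(e, Leaf):
        return e.value
    l, r = evaluate(e.left), evaluate(e.right)
    if e.right.is_vector:  # matrix-vector product
        return [sum(a * b for a, b in zip(row, r)) for row in l]
    return [[sum(l[i][k] * r[k][j] for k in range(len(r)))  # matrix-matrix
             for j in range(len(r[0]))] for i in range(len(l))]

def show(e):
    if isinstance(e, Leaf):
        return e.name
    return "(" + show(e.left) + " @ " + show(e.right) + ")"
```

For example, with A, B as matrix leaves and x as a vector leaf, `A @ B @ x` parses as `((A @ B) @ x)`; after `reassociate` it prints as `(A @ (B @ x))` and evaluates to the same vector using only matrix-vector products.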
SEJITS architecture • Applications are written in Python, a high-level productivity language • When certain functions are called, a specializer is invoked that compiles the function down to C/C++ and executes it on the fly • Specializers are written in Python and supported by the Asp infrastructure
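SEJITS architecture • [Figure: productivity app (.py) with functions f() and h(); productivity-level language (PLL) interpreter; Asp specializer (ASP.py) generating .c code, compiled by cc/ld into a .so; OS/HW]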
Implementation methods • Templates • Static C/C++ code with “holes” filled in at runtime, e.g.: for (int i = 0; i < ${num_items}; i++) { arr[i] *= 2.0; } • Facilitates compiler optimizations • Allows adapting to machine parameters • Allows choosing among implementations based on architecture
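A small sketch of filling the holes and choosing among implementations. It uses Python's standard `string.Template`, whose `${name}` placeholder syntax matches the Mako templates used later in the exercise. Both C templates and the `unroll_ok` flag (standing in for a machine parameter) are hypothetical.

```python
from string import Template

# Two hypothetical C templates for the same kernel; ${...} holes are filled
# at runtime (string.Template uses the same ${name} syntax as Mako).
SIMPLE = Template(
    "for (int i = 0; i < ${num_items}; i++) { arr[i] *= 2.0; }")
UNROLLED = Template(
    "for (int i = 0; i < ${num_items}; i += 2) "
    "{ arr[i] *= 2.0; arr[i+1] *= 2.0; }")

def render_double(num_items, unroll_ok):
    """Pick an implementation from input/machine properties, then fill holes.

    unroll_ok is a hypothetical stand-in for an architecture parameter.
    """
    # The 2x-unrolled variant is only correct when num_items is even.
    template = UNROLLED if unroll_ok and num_items % 2 == 0 else SIMPLE
    return template.substitute(num_items=num_items)
```

Because `num_items` becomes a literal constant in the generated C, the C compiler can see the trip count, which is one way templates "facilitate compiler optimizations".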
Implementation methods • Tree transformations • Input and output code are expressed as abstract syntax trees • Specializer walks over the tree and translates nodes • Facilitates complex transformations and optimizations • Can be used together with templates
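To illustrate "walk the tree and translate nodes", here is a hypothetical translator built on Python's standard `ast.NodeVisitor`. It handles only a tiny subset (single-level `for i in range(n)` loops, subscript assignment, numeric constants, `+ - * /`) and assumes Python 3.9+ (where `Subscript.slice` is a plain expression). It is not Asp's translator.

```python
import ast

class PyToC(ast.NodeVisitor):
    """Translate a tiny Python subset into C source, one node type at a time.

    Hypothetical and illustrative only; assumes Python 3.9+.
    """

    OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit_Module(self, node):
        return "\n".join(self.visit(stmt) for stmt in node.body)

    def visit_For(self, node):
        # Only `for <name> in range(<expr>)` is supported.
        it = node.iter
        if not (isinstance(it, ast.Call) and isinstance(it.func, ast.Name)
                and it.func.id == "range" and len(it.args) == 1):
            raise NotImplementedError("only for-range loops are supported")
        var = node.target.id
        bound = self.visit(it.args[0])
        body = "\n".join("  " + self.visit(s) for s in node.body)
        return f"for (int {var} = 0; {var} < {bound}; {var}++) {{\n{body}\n}}"

    def visit_Assign(self, node):
        return f"{self.visit(node.targets[0])} = {self.visit(node.value)};"

    def visit_BinOp(self, node):
        op = self.OPS[type(node.op)]
        # Parenthesize so C precedence matches the Python tree.
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_Subscript(self, node):
        return f"{self.visit(node.value)}[{self.visit(node.slice)}]"

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise NotImplementedError("only numeric constants are supported")
        return repr(node.value)

    def generic_visit(self, node):
        # Anything outside the supported subset is rejected explicitly.
        raise NotImplementedError(type(node).__name__)
```

For instance, translating `for i in range(n):\n    arr[i] = arr[i] * 2.0` with `PyToC().visit(ast.parse(src))` yields a C `for (int i = 0; i < n; i++)` loop whose body is `arr[i] = (arr[i] * 2.0);`.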
Akx specializer • Built by Jeffrey Morlan • Uses a communication-avoiding algorithm to compute A^k x for many values of k • Building block in other algorithms like Conjugate Gradient • Generates different code depending on the dimensions of the input matrices as well as their contents
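To pin down what the kernel produces, here is a straightforward reference version that computes [x, Ax, A²x, …, Aˢx] with s successive sparse matrix-vector products on a matrix in CSR form. This is a hypothetical sketch of the output only, not the communication-avoiding algorithm the specializer uses; that algorithm produces the same vectors while reading A from memory (and exchanging data between processors) far less often.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A x for A stored in CSR form (row pointers, column indices, values)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

def matrix_powers(indptr, indices, data, x, s):
    """Return [x, A x, A^2 x, ..., A^s x] by s successive SpMVs.

    Naive reference version: every step streams all of A again, which is the
    communication cost a communication-avoiding kernel is designed to cut.
    """
    vectors = [list(x)]
    for _ in range(s):
        vectors.append(csr_matvec(indptr, indices, data, vectors[-1]))
    return vectors
```

For the 3 × 3 tridiagonal matrix with 2 on the diagonal and -1 off it (indptr [0, 2, 5, 7], indices [0, 1, 0, 1, 2, 1, 2], data [2, -1, -1, 2, -1, -1, 2]) and x = [1, 1, 1], the kernel returns x, then [1, 0, 1], then [2, -2, 2].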
Akx specializer • [Figure] Conjugate Gradient solver performance using the communication-avoiding matrix powers kernel. A matrix labeled 141K/7.3M has 141K rows and 7.3M nonzero elements. The dark part of each bar shows time spent on matrix powers, while the light part shows time in the remainder of the solver.
Live exercise • SSH to: moonflare.com • Log in as username “cs294-76”, password “2xyb3pex” • Do: • mkdir yourname • cp *.py *.mako yourname • cd yourname • Run with: python double.py • View generated C++ in the “cache” subdirectory
Live exercise • Edit double_template.mako • Use your favorite editor, or “nano” if you don’t have one • Try changing it to multiply the vector by 3.0 instead of 2.0 • Then run “python double.py” again • Don’t worry about the assertion failure (sorry!)
Live exercise • Next we’ll make it possible to multiply by any scalar you want • Replace the constant in double_template.mako with a placeholder, ${scalar} • Edit double.py • Add a parameter to double_using_template for the scalar multiple • Pass it to mytemplate.render • Update test_generated to add the argument • Then run “python double.py” again
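A rough sketch of where the exercise ends up, under stated assumptions: the real double.py and double_template.mako are not shown here, so `scale_using_template` and `check_generated` are hypothetical functions that only loosely mirror double_using_template and test_generated. It uses `string.Template` instead of Mako (with Mako, the equivalent step is passing `scalar=...` to `mytemplate.render`), and the generated kernel is Python so it runs without a C++ compiler.

```python
from string import Template

# Hypothetical stand-in for double_template.mako after the exercise: the
# constant has been replaced by a ${scalar} placeholder. The kernel body is
# Python here (not C++) so the sketch runs without a compiler.
SCALE_TEMPLATE = Template(
    "def kernel(arr):\n"
    "    for i in range(len(arr)):\n"
    "        arr[i] *= ${scalar}\n"
)

def scale_using_template(arr, scalar):
    """Render the template for this scalar, 'compile' it, and run it in place."""
    source = SCALE_TEMPLATE.substitute(scalar=repr(float(scalar)))
    namespace = {}
    exec(source, namespace)  # stand-in for compiling the generated C++
    namespace["kernel"](arr)
    return arr

def check_generated(scalar):
    """Test that takes the scalar as an argument, like the updated test."""
    data = [1.0, 2.0, 3.0]
    expected = [v * scalar for v in data]
    assert scale_using_template(list(data), scalar) == expected
```

Passing the scalar through the template bakes it into the generated code as a constant, so each distinct scalar yields its own specialized kernel.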
Download / Questions? • Download SEJITS at: • https://github.com/shoaibkamil/asp • Or just Google “SEJITS” • Contact parlab-sejits@lists.eecs for support • Questions?