Scientific Computing Beyond Matlab
Nov 19, 2012
Jason Su
Motivation
• I'm interested in (re-)coding a general solver for sc/mcDESPOT relaxometry mapping
  • Open source
  • Extensibility to new/additional sequences with better sensitivity to certain parameters, e.g. B0 and MWF
  • Better parallelization
• But:
  • Large-scale code development in Matlab is cumbersome
  • Matlab is slow
  • C is hard (to write, read, debug)
  • This creates a large barrier for others to contribute
Matlab
Pros
• Ubiquitous, code is cross-platform
• Can be fast with vectorized code
• Data visualization
• Quick development time
• Great IDE for general research, but poor for large projects
• Many useful native libraries/toolboxes
• Built-in profiling tools
Cons
• Requires a license, not free (though there is Octave)
• Vectorized code is often non-intuitive to write and hard to read
• Slow for general computations
• Limited parallel computing and GPU support
C/C++
Pros
• Fast
• Great IDEs for large coding projects, but not as great for general science work
• Strong parallel computing support and CUDA
• Community libraries for scientific computing
• Profiling depends on the IDE
Cons
• Steep learning curve and long development time
• No data visualization
• Compiled code is platform-specific
• A compiler is not installed by default on OSX or Windows
Python
Pros
• Preinstalled on OSX and Linux-based systems
• Readability is a core tenet ("pythonic")
• Quick development time
• Built-in parallel computing support (multiprocessing) and community GPU modules
• Extensive community support
  • Including neuroimaging-specific packages: Nipype, NiBabel
• Built-in profiling module (cProfile) and some IDE tools
Cons
• Slow for general computation
• Mixed bag of IDEs: some are great for coding, others for research
• Out of the box it's a poor alternative: no linear algebra or data visualization
Python & Friends
Cons and their solutions
• Slow for general computation
  • Solution: Cython, or JIT compilers like PyPy
• Mixed bag of IDEs: some are great for coding, others for research
  • Solution: there are a few good options out there that I've found:
    • Eclipse + PyDev, NetBeans
    • Spyder: closest to MATLAB
    • Sage Math Notebook, IPython: like Mathematica
    • It may come down to preference
• Out of the box it's a poor alternative: no linear algebra or data visualization
  • Solution: NumPy + SciPy + Matplotlib = PyLab
  • Sage Math includes these as well as other capabilities like symbolic math and graph theory
Pythonic?
• A term of praise used by the community for clean code that is readable, intuitive, and explicit, and that takes advantage of coding idioms
• Python
    people = ['John Doe', 'Jane Doe', 'John Smith']

    # Explicit loop
    smith_family = []
    for name in people:
        if 'Smith' in name:
            smith_family.append(name)

    # The same thing as a list comprehension
    smith_family = [name for name in people if 'Smith' in name]
• Matlab
    people = {'John Doe', 'Jane Doe', 'John Smith'};
    smith_family = {};
    for name = people
        % each iteration yields a 1x1 cell, so unwrap it with name{1}
        if ~isempty(strfind(name{1}, 'Smith'))
            smith_family = [smith_family name];
        end
    end
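Two more idioms that come up constantly when porting Matlab loops are `enumerate` (index and item together) and `zip` (walk sequences in lockstep). The sketch below reuses the slide's `people` list; `ages` is a hypothetical companion list added only for illustration.

```python
people = ['John Doe', 'Jane Doe', 'John Smith']
ages = [34, 31, 52]  # hypothetical values, for illustration only

# Index-based loop, as one might write it coming from Matlab
numbered = []
for i in range(len(people)):
    numbered.append('%d: %s' % (i, people[i]))

# Pythonic: enumerate yields (index, item) pairs directly
numbered_pythonic = ['%d: %s' % (i, name) for i, name in enumerate(people)]

# Pythonic: zip pairs up two sequences without manual indexing
age_by_name = dict(zip(people, ages))

print(numbered_pythonic)
print(age_by_name['Jane Doe'])
```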
Installation
• On any OS:
  • Sage Math (http://www.sagemath.org/): easy unzip installation, but many "extraneous" packages (500 MB)
  • Some issues on OSX with matplotlib
• On OSX:
  • Use MacPorts to install Python (2.7), SciPy, matplotlib, and Cython
  • Requires the gcc compiler, available through Apple Developer
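Once installed, a quick sanity check confirms that each package imports and reports a version. This is a minimal sketch. `check_modules` is a hypothetical helper, and not every module defines `__version__`, so the sketch falls back to a placeholder when it is missing.

```python
import importlib


def check_modules(names):
    """Return {module name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        # Not every module exposes __version__; use a placeholder if absent
        found[name] = getattr(module, '__version__', 'unknown version')
    return found


if __name__ == '__main__':
    report = check_modules(['numpy', 'scipy', 'matplotlib', 'Cython'])
    for name, version in sorted(report.items()):
        print('%-12s %s' % (name, version or 'NOT FOUND'))
```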
NumPy + SciPy vs. Matlab
• Same core libraries: LAPACK
• Equivalent functionality, but the syntax is not trying to be similar
• http://www.scipy.org/NumPy_for_Matlab_Users
• Key differences:
  • Python uses 0 (zero) based indexing: the initial element of a sequence is accessed with [0]
  • In NumPy, arrays have pass-by-reference semantics: slice operations are views into an array, not copies
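The two key differences are easy to trip over when coming from Matlab, where indexing creates a copy. The following sketch shows 0-based indexing, a slice acting as a view, an explicit `.copy()`, and a function modifying its argument in place. The helper `scale_in_place` is hypothetical.

```python
import numpy as np

a = np.arange(5)          # values 0, 1, 2, 3, 4
first = a[0]              # 0-based: the first element is a[0], not a(1)

v = a[1:4]                # slicing returns a view, not a copy
v[0] = 100                # writing through the view...
print(a)                  # ...changes the original: a[1] is now 100

c = a[1:4].copy()         # ask for an explicit copy to get Matlab-like behavior
c[0] = -1                 # the copy is independent; a[1] stays 100


def scale_in_place(arr, factor):
    # No copy is made when an array is passed in, so this alters the caller's data
    arr *= factor


b = np.ones(3)
scale_in_place(b, 2.0)    # b now holds 2.0 in every element
```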
Syntax (Matlab → NumPy)
• a\b → linalg.lstsq(a,b)
• max(a(:)) → a.max()
• a(end-4:end) → a[-5:]
• [0:9] → arange(10.) or r_[:10.]
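Here is a runnable version of these equivalences, with small made-up arrays. Note that `lstsq` returns a tuple (solution, residuals, rank, singular values). For a square, nonsingular `a`, as in this example, its solution matches what Matlab's `a\b` gives.

```python
import numpy as np

a = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Matlab: a\b  (least-squares solve; exact here because a is square and nonsingular)
x, residuals, rank, sv = np.linalg.lstsq(a, b, rcond=None)

# Matlab: max(a(:))  (maximum over all elements)
largest = a.max()

# Matlab: [0:9]  (floats because of the trailing "."), built two ways
r = np.arange(10.)
same_range = np.r_[:10.]

# Matlab: a(end-4:end)  (the last five elements, shown on the 1-D range)
last_five = r[-5:]

print(x, largest, last_five)
```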
Cython
• Requires a C compiler
• Cython is Python with C data types
  • Python's dynamic typing has overhead, which makes it slow for computation
• Allows seamless mixing of Python code and embedded C-speed routines
  • Python values and C values can be freely intermixed, with conversions occurring automatically wherever possible
  • This means that when debugging C-level code, we can use all the plotting tools available in Python
• The build process is somewhat like EPIC:
  • Write a .pyx source file
  • Run the Cython compiler to generate a C file
  • Run a C compiler to generate a compiled library
  • Run the Python interpreter and import the module
Code Comparison – Matlab
• A really basic speed comparison: add up the integers from 1 to 1e8, first with a loop, then vectorized
    % Explicit loop
    s = 0;
    tic
    for i = 1:1e8
        s = s + i;
    end
    toc

    % Vectorized
    tic
    x = 1:1e8;
    sum(x)
    toc
Code Comparison – C
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        unsigned long long sum = 0;
        unsigned long long i = 0;
        unsigned long long max = 100000000;

        /* Note: an optimizing compiler (e.g. -O2) may simplify this loop away */
        clock_t tic = clock();
        for (i = 0; i <= max; i++) {
            sum = sum + i;
        }
        clock_t toc = clock();

        /* %llu matches the unsigned long long type of sum */
        printf("%15llu, Elapsed: %f seconds\n", sum, (double)(toc - tic) / CLOCKS_PER_SEC);
        return 0;
    }
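Every version of the benchmark computes 0 + 1 + ... + 1e8; the Matlab version starts at 1, which changes nothing. The closed form n(n+1)/2 gives the expected answer without timing anything. It also shows why the C code needs a 64-bit (`long long`) accumulator: the result is far too large for a 32-bit integer. Python integers have arbitrary precision, so this check cannot overflow.

```python
n = 100000000                      # upper limit used in the benchmarks
expected = n * (n + 1) // 2        # Gauss's formula for 0 + 1 + ... + n

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

print(expected)                    # 5000000050000000
print(expected > INT32_MAX)        # True: a 32-bit accumulator would overflow
print(expected <= INT64_MAX)       # True: a 64-bit accumulator is wide enough
```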
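Code Comparison – Python
    import time
    import numpy as np

    # Explicit loop (Python 2.7: xrange avoids building a 1e8-element list; use range on Python 3)
    s = 0
    t = time.time()
    for i in xrange(100000001):
        s += i
    print(time.time() - t)

    # Vectorized with NumPy; int64 keeps the sum from overflowing on platforms with 32-bit default ints
    t = time.time()
    x = np.arange(100000001, dtype=np.int64)
    s = x.sum()
    print(time.time() - t)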
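For comparison on a current interpreter, here is a Python 3 sketch of the same test, unlike the Python 2.7 code on the slide. It wraps each variant in a function and times it with `time.perf_counter()`, which is intended for measuring short intervals. It also contrasts the interpreted loop with the C-implemented built-in `sum()`. The function names are hypothetical, and `n` is scaled down from 1e8 so the sketch finishes quickly.

```python
import time


def loop_sum(n):
    """Add 0..n one at a time in interpreted Python, like the slide's loop."""
    s = 0
    for i in range(n + 1):
        s += i
    return s


def builtin_sum(n):
    """Let the C-implemented built-in sum() do the same work."""
    return sum(range(n + 1))


def time_it(func, n):
    """Return (result, elapsed seconds) for a single call of func(n)."""
    t0 = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - t0


if __name__ == '__main__':
    n = 10**7  # smaller than the slide's 1e8 to keep the run short
    for f in (loop_sum, builtin_sum):
        result, elapsed = time_it(f, n)
        print('%-12s %d  %.3f s' % (f.__name__, result, elapsed))
```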
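Code Comparison – Cython
• addCy.pyx
    import time

    # C-typed variables: no Python object overhead inside the loop
    cdef long long n = 100000000
    cdef long long s = 0
    cdef long long i = 0

    t = time.time()
    # Because i is a C integer, Cython compiles this range() loop into a plain C for loop
    for i in range(n + 1):
        s += i
    print(time.time() - t)
• runCy.py
    import pyximport; pyximport.install()   # compile .pyx files automatically on import
    import addCy                            # builds addCy.pyx on first import, then runs it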
Summary
• Python
  • Full-featured programming language with an emphasis on "pythonic" readability
• NumPy/SciPy
  • Core libraries for linear algebra and computation (FFT, optimization)
• Cython
  • Allows as much optimization as you want, degrading gracefully from high-level Python to low-level C
• Profile first; don't optimize too early!
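To act on "profile first", the standard-library `cProfile` and `pstats` modules can show where time is actually spent before anything is moved to NumPy or Cython. This is a minimal Python 3 sketch. `square_sum_loop` and `analysis` are hypothetical stand-ins for real solver code.

```python
import cProfile
import io
import pstats


def square_sum_loop(n):
    """Hypothetical hot spot: an interpreted loop, a candidate for NumPy or Cython."""
    s = 0
    for i in range(n):
        s += i * i
    return s


def analysis(n):
    """Hypothetical top-level routine whose cost we want to break down."""
    data = list(range(n))
    return square_sum_loop(n) + max(data)


profiler = cProfile.Profile()
profiler.enable()
result = analysis(200000)
profiler.disable()

# Summarize the ten most expensive entries, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(10)
report = stream.getvalue()
print(report)
```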