Instruction Set Extensions for Computation on Complex Floating Point Numbers
Authors: Philipp Digeser, Marco Tubolino, Martin Klemm, Daniel Shapiro, Axel Sikora and Miodrag Bolic
Email: {digeserp, tubolinm, klemmm, sikora}@dhbw-loerrach.de, {dshap092, mbolic}@site.uottawa.ca
Overview
• Prior Art
• Complex Floating Point Division
• Instruction Set Extensions (ISE)
• Instruction Hardware
• Software Interface
• Experiment
• Performance Evaluation
• Hardware Resource Utilization
• Future Work
• Conclusion
Prior Art
• We previously described the possibility of accelerating scientific observation by using ISEs instead of software libraries such as carith [1]
• In this work, we demonstrate this possibility
• The extended ISE supports several operations (complex addition, subtraction, multiplication and division), which makes it more widely applicable
Complex Floating Point Computations
• Unlike real multiplication or division, operations on complex numbers are usually provided by slow software routines
• Consider complex division, which requires:
  • 3 additions/subtractions
  • 6 multiplications
  • 2 divisions
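For reference, the standard textbook formula is shown below, written for a numerator $a + jb$ and a denominator $c + jd$ (symbols chosen here for illustration). Counting its terms reproduces the operation counts above:

$$\frac{a + jb}{c + jd} = \frac{(ac + bd) + j\,(bc - ad)}{c^2 + d^2}$$

The six multiplications are $ac$, $bd$, $bc$, $ad$, $c^2$ and $d^2$; the three additions/subtractions form $ac + bd$, $bc - ad$ and $c^2 + d^2$; and the two divisions scale the real and imaginary numerators by the common denominator.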
Complex Floating Point Computations
• Fast complex computations are needed for:
  • Image and audio processing
  • Multi-antenna systems
  • Correlation
  • Other applications
• Example: STSDAS offers math libraries for image analysis, including stsdas.analysis.fourier.carith, which multiplies or divides two complex images [1]
Instruction Set Extension
• Instruction-set extensions (ISEs), as the name implies, add custom instructions to a processor's instruction set
(Figure: generic custom instruction datapath [2])
Instruction Set Extension
• An ISE candidate has limited I/O access to the register file
• We use multicycle reads and writes to pass several operands through the two-input, one-output register-file interface [4]
• The computations can be distributed across one adder, one multiplier and one divider
• These operations can be pipelined
• Flags are set on division by zero and on overflow
(Figure: original custom logic block [3])
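To illustrate how the 11 floating-point operations of one complex division can be routed through a single shared adder, multiplier and divider, here is a small software model. It is a hypothetical sketch (the names `unit_usage_t`, `fp_add`, `complex_div_shared` and the others are invented for illustration): it only counts how often each unit is used and does not model the actual hardware schedule, pipelining or the overflow/divide-by-zero flags.

```c
#include <assert.h>
#include <stddef.h>

/* Usage counters for the three shared floating-point units. */
typedef struct {
    int add_uses;  /* adder, also used for subtraction */
    int mul_uses;  /* multiplier */
    int div_uses;  /* divider */
} unit_usage_t;

static float fp_add(unit_usage_t *u, float x, float y) { u->add_uses++; return x + y; }
static float fp_sub(unit_usage_t *u, float x, float y) { u->add_uses++; return x - y; }
static float fp_mul(unit_usage_t *u, float x, float y) { u->mul_uses++; return x * y; }
static float fp_div(unit_usage_t *u, float x, float y) { u->div_uses++; return x / y; }

/* (a + jb) / (c + jd), routing every operation through the shared units. */
static void complex_div_shared(unit_usage_t *u, float a, float b, float c,
                               float d, float *re, float *im)
{
    assert(u != NULL && re != NULL && im != NULL);

    /* Six multiplications on the shared multiplier */
    float ac = fp_mul(u, a, c), bd = fp_mul(u, b, d);
    float bc = fp_mul(u, b, c), ad = fp_mul(u, a, d);
    float cc = fp_mul(u, c, c), dd = fp_mul(u, d, d);

    /* Three additions/subtractions on the shared adder */
    float num_re = fp_add(u, ac, bd);
    float num_im = fp_sub(u, bc, ad);
    float den    = fp_add(u, cc, dd);

    /* Two divisions on the shared divider (no divide-by-zero flag here) */
    *re = fp_div(u, num_re, den);
    *im = fp_div(u, num_im, den);
}
```

For $(1 + 2j)/(3 + 4j) = 0.44 + 0.08j$, the counters end at 6 multiplications, 3 additions/subtractions and 2 divisions, matching the operation count of the textbook formula.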
Instruction Hardware
(Figure: operation when n = 0 (above) and n = 1 (right))
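Software Interface
• The complex division hardware can be used from assembly (via inline assembly) or from C/C++ code, as shown below:
    /* n = 0: pass operands A and C */
    ALT_CI_COMPLEX_CORE_INST(0, in_A, in_C);
    /* n = 1: pass operands B and D; returns the real part of the result */
    out_real = ALT_CI_COMPLEX_CORE_INST(1, in_B, in_D);
    /* n = 0 with dummy operands: returns the imaginary part */
    out_imag = ALT_CI_COMPLEX_CORE_INST(0, 0, 0);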
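The call sequence can be understood through a host-side software model of a stateful custom instruction. This is a hypothetical sketch inferred only from the three calls above: it assumes the operation is $(A + jB)/(C + jD)$, that an `n = 0` call latches A and C and returns the pending imaginary part, and that an `n = 1` call takes B and D and returns the real part. The names `complex_core_emul` and `complex_core_state_t` are invented, and the real core works on 32-bit register values rather than C `float` arguments.

```c
#include <assert.h>

/* Hypothetical model of the core's internal state. */
typedef struct {
    float a, c;       /* operands latched by an n = 0 call */
    float imag;       /* imaginary part held for the next n = 0 call */
    int div_by_zero;  /* set when c*c + d*d == 0 */
} complex_core_state_t;

static complex_core_state_t core_state;  /* zero-initialized */

/* Emulates ALT_CI_COMPLEX_CORE_INST(n, dataa, datab) under the assumed
 * semantics (A + jB) / (C + jD). */
static float complex_core_emul(int n, float dataa, float datab)
{
    assert(n == 0 || n == 1);
    if (n == 0) {
        float pending_imag = core_state.imag;  /* result of the last division */
        core_state.a = dataa;                  /* A */
        core_state.c = datab;                  /* C */
        return pending_imag;
    }
    /* n == 1: dataa = B, datab = D */
    float a = core_state.a, b = dataa, c = core_state.c, d = datab;
    float den = c * c + d * d;
    core_state.div_by_zero = (den == 0.0f);
    if (core_state.div_by_zero) {
        core_state.imag = 0.0f;
        return 0.0f;
    }
    core_state.imag = (b * c - a * d) / den;
    return (a * c + b * d) / den;
}
```

Replaying the three calls of the slide with $A = 1$, $B = 2$, $C = 3$, $D = 4$ yields a real part of about 0.44 and an imaginary part of about 0.08. Splitting the result over two calls is how a two-input, one-output interface can deliver both halves of a complex quotient.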
Experiment
• h(u,v) is a blurred image taken by a telescope
• Motion blur arises from long exposure times combined with camera movement, e.g., on the Hubble Space Telescope
• g(u,v) is the image to be recovered
• f(u,v), the blur kernel known as the point spread function (PSF), can be calculated from the known movement of the target
(Figure: h(u,v), g(u,v), f(u,v))
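Under the simplest assumption, namely that the target moved uniformly and horizontally by `len` pixels during the exposure, the PSF is a short line of equal weights that sums to 1, so blurring preserves overall brightness. The sketch below builds such a kernel; `motion_psf` and its parameters are hypothetical, and a real telescope PSF would be computed from the measured trajectory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PSF for uniform horizontal motion over `len` pixels.
 * Writes a rows x cols row-major kernel with its origin at (0, 0). */
static void motion_psf(float *psf, size_t rows, size_t cols, size_t len)
{
    assert(psf != NULL && rows >= 1 && len >= 1 && len <= cols);
    for (size_t i = 0; i < rows * cols; i++)
        psf[i] = 0.0f;                /* no blur contribution off the path */
    for (size_t k = 0; k < len; k++)
        psf[k] = 1.0f / (float)len;   /* equal weight along the path in row 0 */
}
```

The kernel is placed at the origin because it is later transformed with an FFT, where a kernel anchored at (0, 0) only introduces a circular shift of the restored image.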
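Experiment
• To restore the image, h(u,v) and f(u,v) are transformed into the frequency domain with an FFT, and the result is transformed back with an IFFT
• By the convolution theorem, convolution in the spatial domain becomes element-wise multiplication in the frequency domain, so restoration requires dividing complex arrays:
  $f(u,v) * g(u,v) = h(u,v) \;\Rightarrow\; H(u,v) = F(u,v)\,G(u,v) \;\Rightarrow\; G(u,v) = H(u,v)/F(u,v)$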
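The division step $G = H/F$ is one complex division per frequency bin, which is exactly the workload the ISE targets. The following is a minimal sketch of a naive inverse filter in C99, assuming `H` and `F` already hold the 2-D FFTs (same size, flattened) of the blurred image and the PSF; the FFT itself is not shown. The function name `inverse_filter` and the `eps` threshold are assumptions: bins where $|F|^2$ is tiny are set to zero rather than divided, a common safeguard that the slides do not specify.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Naive inverse filter: G[i] = H[i] / F[i] for every frequency bin.
 * Bins with |F|^2 < eps are zeroed to avoid dividing by (near) zero. */
static void inverse_filter(const float complex *H, const float complex *F,
                           float complex *G, size_t n, float eps)
{
    assert(H != NULL && F != NULL && G != NULL);
    for (size_t i = 0; i < n; i++) {
        float re = crealf(F[i]), im = cimagf(F[i]);
        float mag2 = re * re + im * im;
        /* one complex division per bin */
        G[i] = (mag2 < eps) ? 0.0f : H[i] / F[i];
    }
}
```

On the Nios II, the `H[i] / F[i]` expression is the operation that would be replaced by the custom-instruction call sequence from the Software Interface slide.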
Performance Evaluation
• Image size: 256 × 256 pixels
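To give a sense of scale, here is a rough count, assuming one complex division per frequency bin and the 3 + 6 + 2 floating-point operations per division of the textbook formula:

$$256 \times 256 = 65\,536 \text{ complex divisions}, \qquad 65\,536 \times (3 + 6 + 2) = 720\,896 \text{ floating-point operations}$$

This covers only the division step $G = H/F$ for one image, not the FFT and IFFT.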
Hardware Resource Utilization
• The resource usage is considerable:
  • The entire system requires 8864 logic elements (LEs) and 27 9-bit DSP units
  • The complex core alone requires 2520 LEs and 23 9-bit DSP units
• Optimizing the ISE hardware for maximum reuse was essential to limit its size
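Expressed as shares of the whole system, the figures above give the complex core's footprint:

$$\frac{2520}{8864} \approx 28.4\,\% \text{ of the logic elements}, \qquad \frac{23}{27} \approx 85.2\,\% \text{ of the 9-bit DSP units}$$

The DSP units, which implement the multipliers, are therefore the scarcer resource, which is consistent with sharing a single multiplier across all six products of a complex division.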
Future Work
• Adding FFT and IFFT
• Accelerating other embedded complex mathematics algorithms
  • Correlation of images
    • Instead of a slow time-domain correlation
    • Heavy complex multiplication in the frequency domain
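The frequency-domain alternative to time-domain correlation rests on the cross-correlation theorem: the inverse FFT of $A \cdot \overline{B}$ gives the circular cross-correlation $r[k] = \sum_n a[n+k]\,\overline{b[n]}$. A minimal sketch of the per-bin step is shown below; the function name `cross_power_spectrum` is hypothetical, the FFT/IFFT are not shown, and the inputs are assumed to be the already transformed images.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Cross-power spectrum P = A * conj(B), bin by bin. An IFFT of P yields
 * the circular cross-correlation of the two images (FFT/IFFT not shown). */
static void cross_power_spectrum(const float complex *A,
                                 const float complex *B,
                                 float complex *P, size_t n)
{
    assert(A != NULL && B != NULL && P != NULL);
    for (size_t i = 0; i < n; i++)
        P[i] = A[i] * conjf(B[i]);  /* one complex multiplication per bin */
}
```

For a 256 × 256 image this replaces a direct correlation, whose cost grows with the product of the image and search-window sizes, by one complex multiplication per bin plus the transforms.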
Conclusion
• The designed ISE can accelerate embedded complex mathematics operations
• Significant speedup (up to 12×)
References
[1] Space Telescope Science Institute. (2010) carith. [Online]. Available: http://stsdas.stsci.edu/cgi-bin/gethelp.cgi?carith.hlp
[2] Altera Corporation. (2007) Nios II custom instruction user guide. [Online]. Available: http://www.altera.com/literature/tt/tt_nios2_multiprocessor_tutorial.pdf
[3] P. Digeser, M. Tubolino, M. Klemm, D. Shapiro, and M. Bolic, “Instruction set extension in the NIOS II: A floating point divider for complex numbers,” in CCECE, 2010.
[4] L. Pozzi and P. Ienne, “Exploiting pipelining to relax register-file port constraints of instruction-set extensions,” in CASES ’05: Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. New York, NY, USA: ACM, 2005, pp. 2–10.