Instruction Set Extensions for Computation on Complex Floating Point Numbers

Authors: Philipp Digeser, Marco Tubolino, Martin Klemm, Daniel Shapiro, Axel Sikora and Miodrag Bolic Email: {digeserp, tubolinm, klemmm, sikora}@dhbw-loerrach.de, {dshap092, mbolic}@site.uottawa.ca

Overview • Prior Art • Complex Floating Point Division • Instruction Set Extensions (ISE) • Instruction Hardware • Software Interface • Experiment • Performance Evaluation • Hardware Resource Utilization • Future Work • Conclusion

Prior Art • In prior work we described the possibility of accelerating scientific observation using ISEs instead of software libraries such as carith • This work demonstrates that possibility • The extension of our prior work performs several operations (complex addition, subtraction, multiplication and division), which improves the chances of our ISE being widely applicable

Complex Floating Point Computations • Unlike real multiplication or division, mathematical operations on complex numbers are usually provided by slow software • Consider complex division, which is slow in software: • 3 additions/subtractions • 6 multiplications • 2 divisions
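The operation count on this slide can be checked against the textbook formula (a + bi) / (c + di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2). The following is a minimal software sketch of that formula in C; the `cfloat` type and the `cfloat_div` name are hypothetical and chosen only for illustration, and the sketch does not guard against a zero divisor or overflow.

```c
/* Hypothetical type for illustration; not taken from the slides. */
typedef struct {
    float re;
    float im;
} cfloat;

/* Textbook complex division x / y with x = a + bi and y = c + di.
 * Operation count: 6 multiplications, 3 additions/subtractions,
 * 2 divisions, as listed on the slide.
 * No check for y == 0 or for overflow is performed here. */
static cfloat cfloat_div(cfloat x, cfloat y)
{
    float denom  = y.re * y.re + y.im * y.im;  /* 2 mul, 1 add */
    float num_re = x.re * y.re + x.im * y.im;  /* 2 mul, 1 add */
    float num_im = x.im * y.re - x.re * y.im;  /* 2 mul, 1 sub */
    cfloat q;
    q.re = num_re / denom;                     /* 1 div */
    q.im = num_im / denom;                     /* 1 div */
    return q;
}
```

For example, dividing 4 + 2i by 2 + 0i with this function yields 2 + 1i.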

Complex Floating Point Computations • Fast complex computations are necessary • Image and audio manipulation • Multi-antenna • Correlation • Others • Example: STSDAS offers math libraries for image analysis, including stsdas.analysis.fourier.carith, which is used to multiply or divide two complex images [1].

Instruction Set Extension • Instruction-set extensions (ISEs), as the name implies, involve the addition of custom instructions to a processor’s instruction set • [Figure: Generic custom instruction datapath [2]]

Instruction Set Extension • An ISE candidate has limited I/O access to the register file • We use multicycle reads from and writes to the register bank in order to squeeze several operands through the two-input, one-output register file interface [4] • The computations can be distributed across one adder, one multiplier and one divider • They can be pipelined • On divide-by-zero or overflow, flags are set • [Figure: Original custom logic block [3]]

Instruction Hardware • [Figure: operation when n=0 (above) and when n=1 (at right)]

Software Interface • The designed hardware for complex division can be used easily from inline assembly or from C/C++ code, as shown below:
    ALT_CI_COMPLEX_CORE_INST(0, in_A, in_C);
    out_real = ALT_CI_COMPLEX_CORE_INST(1, in_B, in_D);
    out_imag = ALT_CI_COMPLEX_CORE_INST(0, 0, 0);
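The three-call sequence can be tried on a host machine with a software stand-in for the custom instruction. The sketch below is only a guess at the behaviour: `mock_complex_core_inst` and `complex_div_ise` are hypothetical names, and the operand mapping (A + jB) / (C + jD), as well as the idea that a call with n=0 returns the pending imaginary part while latching new operands, are assumptions rather than facts stated on the slides.

```c
/* Host-side stand-in for the custom instruction, for illustration only.
 * ASSUMPTIONS (not confirmed by the slides):
 *   - the division computed is (A + jB) / (C + jD);
 *   - n == 0 latches A and C and returns the imaginary part left over
 *     from the previous division;
 *   - n == 1 supplies B and D, performs the division and returns the
 *     real part. */
static float mock_complex_core_inst(int n, float x, float y)
{
    static float a, c, imag;

    if (n == 0) {
        float pending_imag = imag;  /* result of the previous division */
        a = x;
        c = y;
        return pending_imag;
    }

    /* n == 1: x is B, y is D */
    float denom = c * c + y * y;
    imag = (x * c - a * y) / denom;
    return (a * c + x * y) / denom;
}

/* Hypothetical wrapper mirroring the call sequence from the slide.
 * On a Nios II target the mock would be replaced by the generated
 * ALT_CI_COMPLEX_CORE_INST macro. */
static void complex_div_ise(float in_A, float in_B, float in_C, float in_D,
                            float *out_real, float *out_imag)
{
    (void)mock_complex_core_inst(0, in_A, in_C);
    *out_real = mock_complex_core_inst(1, in_B, in_D);
    *out_imag = mock_complex_core_inst(0, 0.0f, 0.0f);
}
```

Under these assumptions, `complex_div_ise(4, 2, 2, 0, &re, &im)` stores 2 in `re` and 1 in `im`. The wrapper keeps the multicycle protocol in one place, so callers never issue the three instructions out of order.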

Experiment • h(u,v) is a blurred picture taken by a telescope, e.g. Hubble • Motion blurring is caused by long exposure time and movement of the camera • g(u,v) is the image to be recovered • f(u,v) is the degradation, called a point spread function, which can be calculated from the known movement of the target • [Figure panels: h(u,v), g(u,v), f(u,v)]

Experiment • To restore the image, the images must be transformed into the frequency domain by applying an FFT, and the result transformed back using an IFFT • This transformation leads to complex arrays in the frequency domain that need to be divided: f(u,v) ∗ g(u,v) = h(u,v), so G(u,v) = H(u,v) / F(u,v) • [Figure panels: h(u,v), g(u,v), f(u,v)]
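The frequency-domain step G(u,v) = H(u,v) / F(u,v) is an element-wise complex division over the whole array. The following is a minimal software sketch in C, assuming the spectra are stored as separate real and imaginary arrays; the function name `spectrum_divide` and this planar layout are illustrative assumptions, the FFT and IFFT are assumed to be done elsewhere, and no protection against zeros of F is included.

```c
#include <stddef.h>

/* Element-wise complex division G = H / F over n frequency bins.
 * The planar layout (separate real and imaginary arrays) is an
 * assumption made for this sketch. Bins where F is zero are not
 * handled specially. */
static void spectrum_divide(const float *H_re, const float *H_im,
                            const float *F_re, const float *F_im,
                            float *G_re, float *G_im, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float denom = F_re[i] * F_re[i] + F_im[i] * F_im[i];
        G_re[i] = (H_re[i] * F_re[i] + H_im[i] * F_im[i]) / denom;
        G_im[i] = (H_im[i] * F_re[i] - H_re[i] * F_im[i]) / denom;
    }
}
```

For a 256x256 image, n would be 65536, so the loop body is where a hardware complex divider would replace the software arithmetic.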

Performance Evaluation • Image size: 256x256 pixels

Hardware Resource Utilization • Considerable • The entire system requires 8864 logic elements and 27 9-bit DSP units • The complex core requires 2520 logic elements and 23 9-bit DSP units • Optimizing the ISE hardware to maximize reuse was essential to limiting the hardware size

Future Work • Adding FFT and IFFT • To accelerate other embedded complex mathematics algorithms • Correlation of pictures • Instead of doing a slow time-domain correlation • Heavy complex multiplication in the frequency domain

Conclusion • The designed ISE can be used to accelerate embedded complex mathematics operations • Significant speedup (up to 12x)

Questions?

References
[1] Space Telescope Science Institute. (2010) carith. [Online]. Available: http://stsdas.stsci.edu/cgi-bin/gethelp.cgi?carith.hlp
[2] Altera Corporation. (2007) Nios II custom instruction user guide. [Online]. Available: http://www.altera.com/literature/tt/tt_nios2_multiprocessor_tutorial.pdf
[3] P. Digeser, M. Tubolino, M. Klemm, D. Shapiro, and M. Bolic, “Instruction set extension in the NIOS II: A floating point divider for complex numbers,” in CCECE, 2010.
[4] L. Pozzi and P. Ienne, “Exploiting pipelining to relax register-file port constraints of instruction-set extensions,” in CASES ’05: Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems. New York, NY, USA: ACM, 2005, pp. 2–10.
