
Introduction to SSE or How to put your algorithms on steroids!

Introduction to SSE or How to put your algorithms on steroids! Christian Kerl. Outline: What is SSE?; Basic Operations; Example: Image Pyramid; Summary; Further Resources. What is SSE? SSE = Streaming SIMD Extensions; SIMD = Single Instruction, Multiple Data.


Presentation Transcript


  1. Introduction to SSE or How to put your algorithms on steroids! Christian Kerl

  2. Outline • What is SSE? • Basic Operations • Example: Image Pyramid • Summary • Further Resources

  3. What is SSE? • SSE = Streaming SIMD Extensions • SIMD = Single Instruction, Multiple Data • Developed by Intel in 1999 • Further extensions SSE2, SSE3 (SSSE3, SSE4) • Allows parallel processing of multiple integer or floating point values
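To make "single instruction, multiple data" concrete, here is a minimal sketch that adds four floats once with a scalar loop and once with a single SSE addition. The function names `add4_scalar` and `add4_sse` are hypothetical, and the example assumes an x86 compiler that ships `<xmmintrin.h>`.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Scalar version (hypothetical helper): four separate additions.
void add4_scalar(const float* a, const float* b, float* out) {
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
}

// SIMD version (hypothetical helper): one ADDPS adds all four float lanes at once.
void add4_sse(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);       // load 4 floats (unaligned load, no alignment needed)
    __m128 vb = _mm_loadu_ps(b);
    __m128 vsum = _mm_add_ps(va, vb);  // single instruction, multiple data
    _mm_storeu_ps(out, vsum);          // write the 4 results back to memory
}
```

Both functions produce the same four sums; the SSE version just issues one addition instead of four.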

  4. What is SSE? • 8 XMM registers (special CPU registers; 16 in 64-bit mode) • Each XMM register is 128 bits wide and holds either • 2 int64 / doubles • 4 int32 / floats • 8 int16 • 16 int8 (doubles and integer types require SSE2)
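A small sketch, assuming C++11 and SSE2 (available on every x86-64 CPU), that checks the register layout listed above at compile time: all three intrinsic register types are 16 bytes. The helper `lanes<T>()` is hypothetical and simply computes how many values of type `T` fit in one 128-bit register; `sum_int8_lanes` is equally illustrative and broadcasts one byte into all 16 lanes before adding them back up.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, __m128i and integer intrinsics (includes SSE)
#include <cstdint>

// Every XMM register type is 128 bits (16 bytes) wide.
static_assert(sizeof(__m128) == 16, "4 x float");
static_assert(sizeof(__m128d) == 16, "2 x double");
static_assert(sizeof(__m128i) == 16, "2 x int64, 4 x int32, 8 x int16 or 16 x int8");

// Hypothetical helper: number of lanes of type T in one 128-bit register.
template <typename T>
constexpr int lanes() { return static_cast<int>(sizeof(__m128i) / sizeof(T)); }

// Hypothetical helper: broadcast one byte to all 16 int8 lanes, then sum them.
int sum_int8_lanes(std::int8_t value) {
    __m128i v = _mm_set1_epi8(value);  // same value in all 16 byte lanes
    alignas(16) std::int8_t bytes[16];
    _mm_store_si128(reinterpret_cast<__m128i*>(bytes), v);  // aligned 128-bit store
    int total = 0;
    for (std::int8_t b : bytes) total += b;
    return total;
}
```

For example, `lanes<float>()` is 4 and `lanes<std::int8_t>()` is 16, matching the slide.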

  5. What is SSE? • Special instructions working on XMM registers • SSE 70, SSE2 144, SSE3 13 instructions • Different instructions for each data type • Usable in • Assembly • C/C++ through SSE “intrinsics”
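Because an XMM register carries no type of its own, the intrinsic you call decides how its 128 bits are interpreted. The hypothetical functions below add the same bit pattern (four copies of 1.0f) once as floats with `_mm_add_ps` and once as 32-bit integers with the SSE2 intrinsic `_mm_add_epi32`. The float version yields 2.0f; the integer version just adds the raw bit patterns 0x3F800000 + 0x3F800000 and yields 0x7F000000.

```cpp
#include <emmintrin.h>  // SSE2 (also pulls in the SSE float intrinsics)
#include <cstdint>

// Hypothetical helper: add 1.0f + 1.0f in every lane with the float instruction (ADDPS).
float add_as_float() {
    __m128 one = _mm_set1_ps(1.0f);
    __m128 r = _mm_add_ps(one, one);
    return _mm_cvtss_f32(r);  // lowest lane as a scalar float
}

// Hypothetical helper: add the very same 128 bits with the 32-bit integer instruction (PADDD).
std::uint32_t add_as_int32() {
    __m128i bits = _mm_castps_si128(_mm_set1_ps(1.0f));  // reinterpret bits, no conversion
    __m128i r = _mm_add_epi32(bits, bits);
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(r));  // lowest 32-bit lane
}
```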

  6. Basic Operations • Load / Store • Arithmetic • Comparison / Logical • Type conversion • …
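The following sketch touches each category in the list: a load, a comparison that produces a per-lane mask, a logical AND with that mask, an arithmetic multiply, a float-to-int conversion and a store. The function name and the operation itself (zero out negatives, double, truncate to int32) are made up for illustration; SSE2 is assumed for the integer conversion.

```cpp
#include <emmintrin.h>  // SSE2 for _mm_cvttps_epi32 and __m128i stores
#include <cstdint>

// Hypothetical helper processing 4 floats: zero out negatives, scale by 2, convert to int32.
void clamp_scale_to_int(const float* in, std::int32_t* out) {
    __m128 x = _mm_loadu_ps(in);                              // load
    __m128 positive = _mm_cmpgt_ps(x, _mm_setzero_ps());      // compare: all-ones lane where x > 0
    x = _mm_and_ps(x, positive);                              // logical: keep positive lanes, zero the rest
    x = _mm_mul_ps(x, _mm_set1_ps(2.0f));                     // arithmetic
    __m128i n = _mm_cvttps_epi32(x);                          // type conversion (truncates toward zero)
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), n);     // store
}
```

The compare-then-AND pattern replaces a per-element `if`, which is how SSE code usually expresses branches.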

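  7. Basic Operations • Requirements on memory layout for loading and storing data • Aligned loads/stores (e.g. _mm_load_ps, _mm_store_ps) need memory addresses (pointers) that are 16-byte aligned! • Unaligned variants (e.g. _mm_loadu_ps, _mm_storeu_ps) exist but are slower on older CPUs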
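A minimal sketch of meeting the alignment requirement, assuming a compiler such as GCC or Clang whose `<xmmintrin.h>` also provides `_mm_malloc` and `_mm_free`. `is_aligned16` and `sum_aligned` are hypothetical helpers; `sum_aligned` uses the aligned `_mm_load_ps`, so handing it a pointer that is not 16-byte aligned is undefined behaviour and typically crashes.

```cpp
#include <xmmintrin.h>  // SSE intrinsics; on GCC/Clang also declares _mm_malloc/_mm_free
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of 16, as _mm_load_ps/_mm_store_ps require.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: sums n floats (n a multiple of 4) from a 16-byte-aligned buffer.
float sum_aligned(const float* data, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4) {
        acc = _mm_add_ps(acc, _mm_load_ps(data + i));  // aligned load: data + i must be 16-byte aligned
    }
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);  // aligned store into an alignas(16) stack array
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Heap buffers can be obtained with `_mm_malloc(n * sizeof(float), 16)` and released with `_mm_free`; small stack arrays can simply be declared `alignas(16)`.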

  8. Example: Image Pyramid • Performance on 2560x1920 image • Standard C++ version: 7.4 ms • SSE optimized version: 1.62 ms => ≈ 4.5x speedup
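The slides do not include the pyramid code, so what follows is only a sketch of one pyramid level under stated assumptions: a single-channel float image, a plain 2x2 box average (a real pyramid may use a Gaussian filter instead), width a multiple of 8, even height and no border handling. `downsample2x2_scalar` and `downsample2x2_sse` are hypothetical names. The SSE version adds two rows vertically, then uses `_mm_shuffle_ps` to separate even and odd columns so that one addition produces four output pixels.

```cpp
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_add_ps, _mm_shuffle_ps, _mm_mul_ps
#include <cstddef>

// Hypothetical scalar reference: each output pixel is the mean of a 2x2 input block.
void downsample2x2_scalar(const float* src, std::size_t width, std::size_t height, float* dst) {
    const std::size_t out_w = width / 2;
    for (std::size_t y = 0; y < height / 2; ++y) {
        const float* r0 = src + (2 * y) * width;  // upper input row
        const float* r1 = r0 + width;             // lower input row
        for (std::size_t x = 0; x < out_w; ++x) {
            float left  = r0[2 * x] + r1[2 * x];
            float right = r0[2 * x + 1] + r1[2 * x + 1];
            dst[y * out_w + x] = (left + right) * 0.25f;
        }
    }
}

// Hypothetical SSE version: 8 input columns of two rows -> 4 output pixels per iteration.
void downsample2x2_sse(const float* src, std::size_t width, std::size_t height, float* dst) {
    const std::size_t out_w = width / 2;
    const __m128 quarter = _mm_set1_ps(0.25f);
    for (std::size_t y = 0; y < height / 2; ++y) {
        const float* r0 = src + (2 * y) * width;
        const float* r1 = r0 + width;
        for (std::size_t x = 0; x < width; x += 8) {  // assumes width % 8 == 0
            // Vertical sums of the two rows, 4 columns at a time.
            __m128 s0 = _mm_add_ps(_mm_loadu_ps(r0 + x),     _mm_loadu_ps(r1 + x));
            __m128 s1 = _mm_add_ps(_mm_loadu_ps(r0 + x + 4), _mm_loadu_ps(r1 + x + 4));
            // even = (s0[0], s0[2], s1[0], s1[2]), odd = (s0[1], s0[3], s1[1], s1[3]).
            __m128 even = _mm_shuffle_ps(s0, s1, _MM_SHUFFLE(2, 0, 2, 0));
            __m128 odd  = _mm_shuffle_ps(s0, s1, _MM_SHUFFLE(3, 1, 3, 1));
            // Horizontal pair sums, scaled by 1/4: four output pixels.
            _mm_storeu_ps(dst + y * out_w + x / 2, _mm_mul_ps(_mm_add_ps(even, odd), quarter));
        }
    }
}
```

Both versions perform the same float operations in the same order, so their outputs match exactly; no timing is claimed for this sketch.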

  9. Summary • SSE is available on all modern x86 CPUs • Well suited to processing contiguous (sequential) data • Provides considerable speedups (2-4x) • SSE intrinsic code is harder to write and read => Use a wrapper library, e.g. EasySSE, ut-sse • Such a library still needs to be evaluated, extended, or written
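To illustrate what a wrapper library buys you, here is a hypothetical minimal C++ wrapper. `Float4` is invented for this example and is not the API of EasySSE or ut-sse; it only hides `__m128` behind ordinary operators so that SIMD code reads like scalar code.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical minimal wrapper (not the EasySSE or ut-sse API).
struct Float4 {
    __m128 v;

    Float4() : v(_mm_setzero_ps()) {}
    explicit Float4(float s) : v(_mm_set1_ps(s)) {}  // broadcast s to all 4 lanes
    explicit Float4(__m128 raw) : v(raw) {}

    static Float4 load(const float* p) { return Float4(_mm_loadu_ps(p)); }  // unaligned load
    void store(float* p) const { _mm_storeu_ps(p, v); }                     // unaligned store
};

inline Float4 operator+(Float4 a, Float4 b) { return Float4(_mm_add_ps(a.v, b.v)); }
inline Float4 operator*(Float4 a, Float4 b) { return Float4(_mm_mul_ps(a.v, b.v)); }

// a * x + y on 4 floats at once, written like ordinary scalar code.
inline Float4 axpy(Float4 a, Float4 x, Float4 y) { return a * x + y; }
```

Usage: `axpy(Float4(2.0f), Float4::load(x), Float4::load(y)).store(r);` computes `r[i] = 2 * x[i] + y[i]` for four elements.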

  10. Further Resources • Tutorials: • http://supercomputingblog.com/optimization/getting-started-with-sse-programming/ • http://www.codeproject.com/Articles/4522/Introduction-to-SSE-Programming • http://sci.tuomastonteri.fi/programming/sse • MSDN: good reference manual for intrinsics • http://msdn.microsoft.com/de-de/library/y0dh78ez • Wrapper Libraries: • http://sourceforge.net/projects/easysse/ • http://code.google.com/p/ut-sse/
