
Platform based design


Presentation Transcript


  1. Platform based design: Abstracting SIMD Hardware
     By: Joris Dobbelsteen, Lennart de Graaf
     References: Liquid SIMD: Abstracting SIMD Hardware using Lightweight Dynamic Mapping – Clark, Hormati et al.

  2. Some issues with SIMD
     • Issue 1: SIMD programming requires in-depth knowledge of the architecture.
     • Issue 2: SIMD is still evolving fast (especially in embedded systems):
        • the vector width changes (e.g. from 64-bit to 128-bit vectors);
        • functionality increases (the instruction set expands).
        This leads to problems with (forward) compatibility.
     • Our objective: find possible solutions to these issues in published papers.

  3. Short web research
     • Several papers on SIMDization (issue 1):
        • Exploiting Vector Parallelism in Software Pipelined Loops – Larsen, Rabbah, Amarasinghe
           • Software pipelining is often used to exploit ILP; however, it does not make use of the dedicated vector resources without explicit vector instructions.
           • For vectorization, compilers generate separate loops for the vectorizable and non-vectorizable parts. This diminishes ILP.
           • A 'selective vectorization' algorithm improves on this.

  4. Short web research
     • Several papers on SIMDization (issue 1):
        • Compilation Techniques for Multimedia Processors – Krall, Lelait
           • Scalar code is converted to SIMD code.
           • Loop unrolling is used to make it possible to perform iterations in parallel.
           • This does not work when iterations have data dependencies on each other!
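
To make the last point concrete, here is a minimal Python sketch (an emulation of our own, not code from Krall and Lelait) that imitates a W-lane SIMD unit. The width W = 4 and all function names are hypothetical. Scaling an array has independent iterations, so unrolling by W and issuing one vector operation per group gives the same result as the scalar loop. A prefix sum carries a[i-1] into iteration i, so executing W iterations at once reads stale values and produces a wrong result.

```python
# Hypothetical emulation of W-lane SIMD execution in plain Python.
W = 4  # assumed vector width (number of lanes)


def scale_scalar(a, k):
    # Independent iterations: out[i] depends only on a[i].
    return [x * k for x in a]


def scale_unrolled(a, k):
    """Unroll by W: each group of W iterations becomes one W-wide multiply.
    Leftover iterations (len(a) % W of them) stay scalar."""
    n_vec = len(a) - len(a) % W
    out = []
    for base in range(0, n_vec, W):
        out.extend([x * k for x in a[base:base + W]])  # one "vector" op
    out.extend(x * k for x in a[n_vec:])  # scalar remainder
    return out


def prefix_sum_scalar(a):
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + a[i]  # uses the value written by iteration i-1
    return a


def prefix_sum_unrolled(a):
    """Naive vectorization of the same loop: all W lanes read their operands
    before any lane writes, as a single vector instruction would."""
    a = list(a)
    for base in range(1, len(a), W):
        idx = range(base, min(base + W, len(a)))
        new = [a[i - 1] + a[i] for i in idx]  # lanes see stale a[i-1] values
        for i, v in zip(idx, new):
            a[i] = v
    return a


data = [1] * 9
print(scale_scalar(data, 2) == scale_unrolled(data, 2))  # True
print(prefix_sum_scalar(data))    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(prefix_sum_unrolled(data))  # [1, 2, 2, 2, 2, 3, 2, 2, 2]
```

The unrolled scale is safe because no lane needs another lane's result; the prefix sum is exactly the "iterations with data dependencies" case the slide warns about.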

  5. Short web research
     • Compatibility (issue 2):
        • Abstracting SIMD Hardware using Lightweight Dynamic Mapping – Clark, Hormati, Yehia, Mahlke, Flautner
     • This paper describes a way to abstract SIMD hardware. It describes:
        • a compiler/translation framework that realizes Liquid SIMD and is independent of the vector width;
        • a lightweight dynamic (runtime) SIMD code generator.
     • This is a first step toward solving the issues above.
     • (N.B. this paper does not address SIMDization.)
     • We zoom in on this paper!

  6. Decoupling principle
     • The paper describes a way to 'abstract' SIMD hardware, specifically with regard to the second issue (compatibility).
     • Start from SIMD instructions for a specific SIMD architecture.
     • Then, broadly, two steps are taken:
        • Mapping SIMD instructions to an equivalent scalar representation
           • can take place at compile time;
           • SIMD and scalar code are functionally equivalent.
        • Dynamic translation from scalar instructions to SIMD instructions for a (different) SIMD architecture
           • takes place at runtime,
           • either in hardware or in software (like a JIT compiler).

  7. Step 1: SIMD to scalar
     • The paper describes a table of rules for converting SIMD instructions to scalar instructions.
     • Rule example: all elements are independent (a relatively simple operation).
        [Figure: element-wise additions, one + per vector element]
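
A minimal Python sketch of one such rule, assuming a made-up rule table (`ELEMENTWISE_RULES`) and made-up opcodes (`vadd`, `vsub`, `vmul`); the paper's actual table and instruction set are not reproduced here. Because every lane is independent, a loop that processes `width` lanes per iteration can be rewritten as a stride-1 scalar loop that no longer mentions the width at all, which is what makes the scalar form portable.

```python
import operator

# Hypothetical rule table for the "all elements are independent" case: each
# element-wise SIMD opcode maps to the scalar operator applied to every lane.
ELEMENTWISE_RULES = {"vadd": operator.add, "vsub": operator.sub, "vmul": operator.mul}


def run_simd_loop(opcode, a, b, width):
    """Reference semantics of the original SIMD loop: `width` lanes per
    iteration. Assumes len(a) == len(b) and len(a) is a multiple of width."""
    op = ELEMENTWISE_RULES[opcode]
    out = []
    for base in range(0, len(a), width):
        # One vector instruction: every lane is computed independently.
        out.extend(op(x, y) for x, y in zip(a[base:base + width], b[base:base + width]))
    return out


def simd_loop_to_scalar(opcode):
    """Apply the rule: the vector loop becomes a stride-1 scalar loop.
    A missing rule (KeyError) means the instruction cannot be converted."""
    op = ELEMENTWISE_RULES[opcode]

    def scalar_loop(a, b):
        out = []
        for i in range(len(a)):  # one element per iteration, no width involved
            out.append(op(a[i], b[i]))
        return out

    return scalar_loop


a, b = list(range(8)), [10] * 8
scalar_add = simd_loop_to_scalar("vadd")
print(scalar_add(a, b) == run_simd_loop("vadd", a, b, width=4))  # True
```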

  8. Step 2: Scalar to SIMD
     • Based on the scalar instructions and information about the (other) SIMD architecture, the conversion is performed in the opposite direction.
        [Figure: independent scalar additions regrouped into vector additions]
     • N.B. scalar-to-SIMD conversion is only applied to a number of scalar instructions that is divisible by the width of the SIMD architecture.
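
One way to read the N.B. above, sketched in Python under our own assumptions (the function name `scalar_to_simd` and the trace format are hypothetical, not the paper's): the stride-1 scalar loop is strip-mined for the target width, the largest multiple of `width` iterations is issued as vector instructions, and the remaining `len(a) % width` iterations stay scalar. The same scalar loop can then be retargeted to different widths.

```python
import operator


def scalar_to_simd(op, a, b, width):
    """Re-vectorize an element-wise scalar loop for a target with `width` lanes.
    Returns the result and a trace of the instructions that would be issued."""
    n = len(a)
    n_vec = n - n % width  # largest multiple of width not exceeding n
    out, trace = [], []
    for base in range(0, n_vec, width):
        out.extend(op(x, y) for x, y in zip(a[base:base + width], b[base:base + width]))
        trace.append(("vector", width))
    for i in range(n_vec, n):  # leftover iterations run as scalar code
        out.append(op(a[i], b[i]))
        trace.append(("scalar", 1))
    return out, trace


a, b = list(range(10)), [1] * 10
for width in (2, 4, 8):
    out, trace = scalar_to_simd(operator.add, a, b, width)
    print(width, out == [x + 1 for x in a], trace)
```

With 10 elements, a 4-lane target issues two vector instructions and two scalar ones, while an 8-lane target issues one vector instruction and two scalar ones; the result is the same in every case.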

  9. Liquid SIMD
     • Migrating code from one SIMD architecture to another is now easy:
        Compile time: SIMD instructions for arch N → Liquid SIMD (to scalar) → scalar instructions for any arch (a.k.a. virtualized SIMD code)
        Runtime: scalar instructions → to SIMD for arch X, arch Y or arch Z → SIMD instructions for that arch

  10. Dynamic translation
     • The paper describes a hardware design for dynamic translation:
        • simple combinational logic that detects and translates scalar code (the reverse of the SIMD-to-scalar rules).
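
The paper's translator is combinational hardware; the following Python sketch is only a software analogue of the idea, using a hypothetical mini-ISA with instruction tuples `(opcode, dest, *sources)`, registers named `r0`, `r1` and so on, and memory operands written like `a[i]`. It first checks that no register is read before it is written inside the loop body (such a register would carry a value between iterations, as an accumulator does), and then maps each scalar opcode to its vector counterpart through the inverse of the element-wise rules. If either step fails, the loop keeps running as scalar code.

```python
# Hypothetical mini-ISA: instructions are tuples (opcode, dest, *sources).
# Registers are named "r<n>"; memory operands are strings such as "a[i]".
SCALAR_TO_VECTOR = {"ld": "vld", "st": "vst", "add": "vadd", "sub": "vsub", "mul": "vmul"}


def is_elementwise(body):
    """Conservative check: a register read before it is written in the body
    holds a value from the previous iteration, so iterations are dependent."""
    written = set()
    for _opcode, dest, *srcs in body:
        for src in srcs:
            if src.startswith("r") and src not in written:
                return False
        written.add(dest)
    return True


def translate_loop_body(body):
    """Return the vectorized body, or None if the loop must stay scalar."""
    if not is_elementwise(body):
        return None
    vector_body = []
    for opcode, *operands in body:
        vop = SCALAR_TO_VECTOR.get(opcode)
        if vop is None:  # no rule for this instruction
            return None
        vector_body.append((vop, *operands))
    return vector_body


add_loop = [("ld", "r1", "a[i]"), ("ld", "r2", "b[i]"),
            ("add", "r3", "r1", "r2"), ("st", "c[i]", "r3")]
min_loop = [("ld", "r1", "a[i]"), ("min", "r0", "r0", "r1")]  # accumulator r0

print([ins[0] for ins in translate_loop_body(add_loop)])  # ['vld', 'vld', 'vadd', 'vst']
print(translate_loop_body(min_loop))  # None
```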

  11. Is life really that simple?
     • No: converting from SIMD to scalar becomes more difficult when multiple vector elements are used to compute one result.
     • Example: determining a minimum.
        [Figure: vector elements combined by min operations into a single result]
     • Translation back to SIMD is impossible in the current implementation.
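
A small Python sketch of why the minimum is awkward, under our own assumptions (power-of-two width, hypothetical function names): on a SIMD unit a horizontal minimum can be computed as a tree of lane-wise `min` operations, log2(W) steps for W lanes, while its scalar equivalent is a loop whose accumulator `m` is carried from one iteration to the next. Both compute the same value, but the scalar form is not element-wise, so the independent-elements rule does not apply to it and a translator that only knows that rule cannot turn it back into the tree.

```python
def simd_horizontal_min(v):
    """Tree reduction as a SIMD unit might perform it. Each step is one
    lane-wise min of the vector with a copy shifted by `step` lanes.
    Assumes len(v) is a power of two. Returns (minimum, number of steps)."""
    v = list(v)
    step, steps = len(v) // 2, 0
    while step >= 1:
        v = [min(v[i], v[i + step]) if i < step else v[i] for i in range(len(v))]
        step //= 2
        steps += 1
    return v[0], steps


def scalar_min(v):
    """Scalar equivalent: len(v) - 1 dependent steps through accumulator m."""
    m, steps = v[0], 0
    for x in v[1:]:
        m = min(m, x)  # each iteration needs the previous iteration's m
        steps += 1
    return m, steps


data = [7, 2, 9, 4, 6, 1, 8, 3]
print(simd_horizontal_min(data))  # (1, 3)
print(scalar_min(data))           # (1, 7)
```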

  12. SIMD to scalar
     • It gets worse when vector elements are reordered.
     • Example: butterfly operation.
        [Figure: butterfly reordering of elements a b c d into d c b a]
     • Resource usage increases!
     • Translation back to SIMD is not possible.
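
A minimal Python sketch, with hypothetical names, of the two problems this slide names, using the reordering from the figure (a b c d → d c b a) as the example. The scalar version needs extra index arithmetic per element, which is where the additional resource usage comes from, and that arithmetic hard-codes the original vector width. On an 8-lane target the scalar code still describes reversal within groups of four, so a translator cannot simply substitute the target's 8-wide reverse instruction.

```python
def simd_reverse(v, width):
    """Reference: a per-vector reverse (lanes a b c d -> d c b a) on a
    machine with `width` lanes."""
    out = []
    for base in range(0, len(v), width):
        out.extend(reversed(v[base:base + width]))
    return out


def scalar_reverse(v, width):
    """Scalar equivalent: each element needs extra index arithmetic (more
    instructions and registers), and the original width is baked into it.
    Assumes len(v) is a multiple of width."""
    out = [None] * len(v)
    for i in range(len(v)):
        base, lane = divmod(i, width)
        out[i] = v[base * width + (width - 1 - lane)]
    return out


v = list("abcdefgh")
print("".join(scalar_reverse(v, 4)))  # dcbahgfe (the 4-lane semantics)
print("".join(simd_reverse(v, 8)))    # hgfedcba (what an 8-lane reverse does)
```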

  13. Drawbacks
     • Code is less efficient than code optimized (by hand) for a specific SIMD architecture.
     • Because of the translations, more registers are used than before.
     • Some instructions cannot be supported.
     • The code cache needs to be larger because both the scalar code and the generated SIMD code must be stored (in the current implementation).
     • It only works up to a predefined maximum vector width.

  14. Strong points
     • + (fairly) independent of the SIMD architecture
     • + less code rewriting
     • + new hardware = speedup without extra effort
     • + code even runs when no SIMD accelerator is present
     • + the approach is 'general': it recognizes the 'structure' of the code by means of a set of rules (instead of using conversions for specific instructions)

  15. Weak points
     • Extra overhead (although small).
     • The efficiency of SIMDization depends on the quality of the set of translation rules.
     • Not a perfect exploitation of SIMD: in really demanding applications, we suspect the processor's capabilities are not fully utilized because
        • some instructions cannot be translated;
        • future instructions are not utilized.
     • In our opinion, the approach:
        • primarily solves the second issue;
        • is a first small step … but there's a long way to go!
