“What and how can the Open64 community collaborate more closely?” - our experiences and ideas

Jenq Kuen Lee, Chairman, MOE Embedded Software Consortium, Taiwan; Professor, Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan

Presentation Transcript


  1. Open64 Developers Forum 2010. Embedded Software Consortium. “What and how can the Open64 community collaborate more closely?” - our experiences and ideas. Jenq Kuen Lee, Chairman, MOE Embedded Software Consortium, Taiwan; Professor, Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan. jklee@cs.nthu.edu.tw

  2. Workshop on Embedded Systems Education, 2009. Outline
     • Our experience with Open64 for compiler research
     • Programming Language and Compiler Research Lab, Tsing-Hua Univ., Taiwan
     • Major research funding from Taiwan MOEA
     • What and how can the Open64 community collaborate more closely?

  3. Compiler for VLIW DSP processors with distributed register files
     • Local register allocation: register bank assignment and cluster assignment for distributed register architectures. “PALF: Compiler Supports for Irregular Register Files in Clustered VLIW DSP Processors” [Lin, CCPE’07]
     • Global register allocation: global decisions on register bank assignment across multiple basic blocks. “LC-GRFA: Global Register File Assignment with Local Consciousness for VLIW DSP Processors with Non-uniform Register Files” [Lu, CCPE’09]
     • Improved register spilling: spilling data to unoccupied register banks rather than to memory. “Expression Rematerialization for VLIW DSP Processors with Distributed Register Files” [Wu, CPC’09]
     • SIMD intrinsics: means and essential optimizations for users to write high-performance code for VLIW DSPs. “SIMD Intrinsic Supports for VLIW DSP Processors with Distributed Register Files” [Kuan, CPC’10]

  4. PPA in SSA Form of Open64
     Probabilistic pointer analysis (PPA)
     • Quantitative: aggressive optimizations can be applied
     • Fast: with the aid of SSA form, explicit def-sites can be found in linear time
     • χ in SSA form helps find potential def-sites that cannot be identified by symbolic checking
     • Implemented in Opt_ssa.cxx in the WOPT phase
     • Combined with edge profiling to acquire more accurate points-to information
     Optimizations
     • Points-to information can be used to guide memory locality optimization in the presence of pointers
     • Speculative execution, transactional memory, code specialization, data layout assignments
     Example: int *p, *q, v, u; p = &v; q = &u; while ( … ) /* condition 1 */ if ( … ) /* condition 2 */ p = q; else q = p; *p = … /* where does p point to? */
     Multi-level memory systems: internal memory (small & fast) and external memory (large & slow)
     [Figure: DSPs with internal memory and a software cache in front of external memory; software cache API]
     Reference: Interprocedural Probabilistic Pointer Analysis, Peng-Sheng Chen, Yuan-Shin Hwang, Dz-Ching Ju, Jenq Kuen Lee, IEEE Transactions on Parallel and Distributed Systems, Volume 15, Issue 10, pp. 893-907, Oct. 2004.
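To make the question in the pointer example on slide 4 concrete ("where does p point to?"), the following is a minimal sketch in C that propagates a probability distribution over the possible (p, q) targets through the loop. It is not the Open64 implementation in Opt_ssa.cxx and not the algorithm of the cited paper; it simply enumerates the four possible states by brute force. The names `PointsTo`, `ppa_initial`, `ppa_step` and `ppa_prob_p` are hypothetical, and the branch probability `prob_then` stands in for a value that edge profiling might supply.

```c
#include <assert.h>

/* The two variables that p and q can point to in the slide's example. */
enum { TGT_V = 0, TGT_U = 1, NUM_TGT = 2 };

/* dist[a][b] = probability that p points to target a while q points to target b. */
typedef struct {
    double dist[NUM_TGT][NUM_TGT];
} PointsTo;

/* State after "p = &v; q = &u;": p -> v and q -> u with probability 1. */
PointsTo ppa_initial(void)
{
    PointsTo s = {{{0.0, 0.0}, {0.0, 0.0}}};
    s.dist[TGT_V][TGT_U] = 1.0;
    return s;
}

/* One loop iteration: "p = q" with probability prob_then (an assumed
 * profile value for condition 2), otherwise "q = p". */
PointsTo ppa_step(PointsTo in, double prob_then)
{
    PointsTo out = {{{0.0, 0.0}, {0.0, 0.0}}};
    assert(prob_then >= 0.0 && prob_then <= 1.0);
    for (int a = 0; a < NUM_TGT; a++) {
        for (int b = 0; b < NUM_TGT; b++) {
            double w = in.dist[a][b];
            out.dist[b][b] += w * prob_then;         /* p = q: both take q's target */
            out.dist[a][a] += w * (1.0 - prob_then); /* q = p: both take p's target */
        }
    }
    return out;
}

/* Probability that p points to `target` at "*p = ..." after `trips` iterations. */
double ppa_prob_p(int trips, double prob_then, int target)
{
    PointsTo s = ppa_initial();
    double total = 0.0;
    for (int i = 0; i < trips; i++)
        s = ppa_step(s, prob_then);
    for (int b = 0; b < NUM_TGT; b++)
        total += s.dist[target][b];
    return total;
}
```

Under these assumptions, calling `ppa_prob_p(trips, prob_then, TGT_U)` with any trip count of at least one yields `prob_then`, because after the first iteration p and q always share a target. A quantitative answer of this kind, rather than "p may point to v or u", is what lets a compiler decide whether an aggressive or speculative optimization is likely to pay off.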

  5. Our Ongoing Work with OpenCL: OpenCL Compiler Support Based on Open64 for MPUs+GPUs
     • OpenCL is an emerging standard for heterogeneous multicore programming.
     • We’ve incorporated the Open64 compiler into the ATI SDK
     • Syntax support: qualifiers, vector data types, built-in functions
     • Future directions: WHIRL/CGIR optimizations; data locality and SIMD optimizations
     [Figure: two compilation flows in the ATI SDK, the LLVM approach and the Open64 approach. Labels: kernel.cl, clc, prelink.bc, builtin-x86.bc, internal optimizer and linker, opt.s, as, ld, libatiocl.so, stub/metadata ("Reuse stub code and metadata"), llvm-extract/llc, lib.c, opencc, OpenCL_kernel.s. Legend: clc is the OpenCL-LLVM front-end; .bc files are LLVM IR files.]

  6. MOE ESW Consortium, Taiwan
     [Organization chart. Labels: MOE Advisory Office; SoC Consortium; ESW Consortium; other consortiums; Advisory Committee; ES design contest; partner universities; collaboration with TEIA; collaboration with NSC; OpenSource/Embedded Program]

  7. Develop 25 courses and lab modules on Embedded System Software

  8. Embedded course development flow of the ESW

  9. Promote Open64 via Collaboration Curriculums
     • Open64 courses and textbooks
     • Hands-on labs
     • Make it easy to overcome engineering challenges with Open64, so that students can focus on scientific innovation.
     [Figure labels: Encyclopedia; Compilers; Open-64 lecture notes; hands-on labs; how to devise a compiler to deliver optimal performance on Open-64]

  10. Possible Collaboration on Joint Research Projects
     Potential collaboration items with OpenCL on Open-64
     • Front-ends: OpenCL and CUDA front-end; up-to-date C/C++ front-end
     • Optimizations for embedded or green computing: OpenCL optimizations at WHIRL & CGIR; code size; low power
     • Code generation for new targets: IBM & UIUC Blue Waters; embedded multicore; Google Android; Nvidia GPU; ATI GPU

  11. Wish List: Serializable CGIR
     As a research compiler:
     • Being able to save and restore the current state is really important
       • a valuable observation may disappear after other team members change a prior phase’s implementation, and then we have to find the same case again and again in other benchmarks/applications
     • Providing an interface at the entry point of the CG phase is also important
       • sometimes we want to use a different compiler’s optimizations just before the CG phase
       • in order to compare the optimization capabilities of different compilers’ front-ends & middle-ends
       • in order to take advantage of other implementations
       • for example, to use LLVM to optimize the code and pass the result directly to the CG phase, which performs the existing optimizations and generates code
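The save/restore wish on slide 11 can be illustrated with a small round-trip sketch in C. Everything here is hypothetical: `ToyOp` is a toy three-address instruction and not Open64's CGIR data structure, and the one-line-per-instruction text format is only an assumption chosen for readability. The point is the shape of the interface: dump the state before a phase, then reload it later so an observed case can be reproduced even after earlier phases change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy instruction; NOT Open64's CGIR representation. */
typedef struct {
    char opcode[16];      /* mnemonic without whitespace, at most 15 chars */
    int  dst, src1, src2; /* register numbers */
} ToyOp;

/* Save n instructions as text, one per line. Returns the number of
 * characters written, or -1 if the buffer is too small. */
int toy_save(const ToyOp *ops, int n, char *buf, size_t cap)
{
    size_t used = 0;
    assert(ops != NULL && buf != NULL);
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int w = snprintf(buf + used, cap - used, "%s %d %d %d\n",
                         ops[i].opcode, ops[i].dst, ops[i].src1, ops[i].src2);
        if (w < 0 || (size_t)w >= cap - used)
            return -1; /* output would have been truncated */
        used += (size_t)w;
    }
    return (int)used;
}

/* Restore up to max instructions from text produced by toy_save.
 * Returns the number of instructions read. */
int toy_restore(const char *buf, ToyOp *ops, int max)
{
    int n = 0;
    int consumed = 0;
    assert(ops != NULL && buf != NULL);
    while (n < max &&
           sscanf(buf, "%15s %d %d %d%n", ops[n].opcode, &ops[n].dst,
                  &ops[n].src1, &ops[n].src2, &consumed) == 4) {
        buf += consumed; /* advance past the line just parsed */
        n++;
    }
    return n;
}
```

A text format like this is easy to diff and to produce from another compiler's output, which matches the second wish on the slide (feeding externally optimized code into the CG phase). A real serializable IR would also need symbol tables, basic-block structure and annotations, which this sketch leaves out.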

  12. Wish List: Replace Build System by CMake
     • More powerful dependency analysis
       • it makes parallel make easy
       • on an Intel Core 2 Extreme QX9650 3.4GHz (O.C.) machine, building a full PACC32 compiler (based on Open64 4.0) with gmake -j5 takes no more than 5 minutes
     • It is convenient for releasing a product in binary form
       • rpath can be set up easily with simple CMake commands
       • any required runtime libraries can be included in binary packages automatically
     • System inspection and the build process are faster than with autotools (autoconf/automake/libtool), which the Open64 project does not use so far either