Performance Comparison between OpenCL and OpenMP on Multi-core CPUs

Is there a performance difference?
YES, there is ... but there DOESN'T have to be ...

Vectorize your code!
But be careful with data-dependent branches!

Figure out your memory access patterns!
Row-major is CPU friendly, column-major is NOT. Transposing OR undoing the transposition works!

Use zero-copies!
Replacing device-to-host (D2H) and host-to-device (H2D) copies with zero-copy works!

Check if fine(-grain) works fine!
Fine-grain parallelism doesn't work for all apps ... but increasing workload granularity might fix it!

Optimize the math!
Using the fast-math compiler option works! Make sure accuracy is still OK!

[Before/After figures not reproduced]

Jie Shen
j.shen@tudelft.nl
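The memory access pattern and granularity tips can be illustrated with a minimal C/OpenMP sketch. It is not code from the poster: the function names `sum_row_major` and `sum_strided` are hypothetical, and the matrix is assumed to be stored row-major in one contiguous array. The first function walks each row with stride 1 and hands whole rows to each thread (coarse granularity); the second walks the same data column by column, so every access jumps `cols` elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum a rows x cols matrix stored in row-major order.
   The inner loop reads contiguous memory (stride 1), which suits CPU
   caches and SIMD units. The pragmas are ignored if OpenMP is disabled. */
double sum_row_major(const double *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    double total = 0.0;

    /* Coarse-grained work sharing: each iteration processes a whole row. */
    #pragma omp parallel for reduction(+:total)
    for (ptrdiff_t i = 0; i < (ptrdiff_t)rows; ++i) {
        const double *row = a + (size_t)i * cols;
        double row_sum = 0.0;

        /* Ask the compiler to vectorize the contiguous inner loop. */
        #pragma omp simd reduction(+:row_sum)
        for (size_t j = 0; j < cols; ++j)
            row_sum += row[j];

        total += row_sum;
    }
    return total;
}

/* Hypothetical contrast case: same data, column-by-column traversal.
   Consecutive accesses are cols elements apart (strided), the pattern
   the poster describes as not CPU friendly. */
double sum_strided(const double *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    double total = 0.0;
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            total += a[i * cols + j];
    return total;
}
```

Both functions compute the same sum; only the order in which memory is touched differs. Timing them on a large matrix (for example compiled with `-O2 -fopenmp` under GCC, an assumed toolchain) is one way to check whether the access pattern matters for a given machine.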