PARSEC FACESIM: study and parallelization
Dmitri Makarov, Dmitri Shtilman
facesim facts
• iterative numerical methods: data-parallel floating-point computation
• problem size: ~370K tetrahedra
• included in PARSEC; C++ application
• parallelized: taskQ-based thread pool with a custom barrier implementation
• does not scale beyond 16 threads (only ~16x speedup with 128 threads on a T2+)
facesim issues
• in-house barrier built on pthread_cond_wait() / pthread_cond_signal()
• some stages of the simulation are not parallelized
• task overhead and extra computations
• resource contention: 128 threads share 32 FPUs and 32 load/store units
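To make the barrier issue concrete, below is a minimal sketch of a condition-variable barrier of the kind described (pthread_cond_wait/pthread_cond_signal); it is an assumed shape, not facesim's actual code. Every waiter blocks in the kernel and the last arrival pays for a broadcast, which becomes expensive with 128 threads per barrier episode.

```cpp
// Hypothetical condition-variable barrier, similar in spirit to the
// in-house one described on this slide; not the real facesim/PhysBAM code.
#include <pthread.h>

struct cond_barrier {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             count;  // threads still expected in this phase
    int             total;  // number of participating threads
    unsigned        phase;  // distinguishes consecutive barrier episodes
};

void cond_barrier_init(cond_barrier *b, int nthreads) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->count = nthreads;
    b->total = nthreads;
    b->phase = 0;
}

void cond_barrier_wait(cond_barrier *b) {
    pthread_mutex_lock(&b->lock);
    unsigned my_phase = b->phase;
    if (--b->count == 0) {
        // last thread to arrive: reset and wake everyone (kernel round trip)
        b->count = b->total;
        b->phase++;
        pthread_cond_broadcast(&b->cond);
    } else {
        // every other waiter sleeps in the kernel until signalled
        while (b->phase == my_phase)
            pthread_cond_wait(&b->cond, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}
```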
improvements
• reworked the thread pool implementation (see the barrier sketch after this list):
  • N-1 worker threads are created before the simulation starts
  • each thread works on its own partition of the data
  • the master sets a flag when work is ready
  • all threads wait at a barrier until everyone has finished the task
  • no need to add tasks to a queue (same entry point)
• use a spin barrier instead of the pthread barrier
• parallelized the previously sequential stages
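The sketch below shows one common way to build the spin barrier mentioned above, a sense-reversing barrier on atomics where threads busy-wait instead of sleeping in the kernel. It is an illustrative assumption, not the authors' exact implementation.

```cpp
// Hypothetical sense-reversing spin barrier: threads busy-wait on a shared
// flag instead of blocking in the kernel, which pays off when the wait
// between simulation phases is short.
#include <atomic>

class spin_barrier {
    std::atomic<int>  count_;  // threads still expected in this phase
    std::atomic<bool> sense_;  // global sense, flipped once per phase
    const int         total_;  // number of participating threads
public:
    explicit spin_barrier(int nthreads)
        : count_(nthreads), sense_(false), total_(nthreads) {}

    void wait() {
        // each thread tracks its own sense; thread_local keeps the sketch short
        thread_local bool local_sense = false;
        local_sense = !local_sense;
        if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // last arrival: reset the count and release everyone
            count_.store(total_, std::memory_order_relaxed);
            sense_.store(local_sense, std::memory_order_release);
        } else {
            // spin until the last arrival flips the global sense
            while (sense_.load(std::memory_order_acquire) != local_sense)
                ; // a pause/yield hint could go here
        }
    }
};
```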
observations
• a generic API may be portable, but an inefficient library implementation can kill performance
• C++ can introduce a lot of redundancy if not used carefully:
  • new T[N] calls the default constructor T() sequentially, N times
  • a library should implement default constructors as efficiently as possible
• flexibility has a cost: load-balancing overhead can outweigh its benefits
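The snippet below illustrates the new T[N] point with a made-up Vec3 type standing in for a small PhysBAM-style value class; it is an assumption for illustration, not code from the library.

```cpp
// Illustrates "new T[N] calls T() sequentially N times": a non-trivial
// default constructor turns allocation into a sequential O(N) loop,
// while a trivial one leaves the elements to be filled in later.
#include <cstddef>

struct Vec3 {
    double x, y, z;
    // Does work per element: new Vec3[n] runs this n times on the
    // allocating thread, even if the values are overwritten right after.
    Vec3() : x(0.0), y(0.0), z(0.0) {}
};

struct Vec3Trivial {
    double x, y, z;
    // Trivial default constructor: new Vec3Trivial[n] performs no
    // per-element initialization, so the data can be written later,
    // e.g. in parallel by the worker threads.
};

void allocate_example(std::size_t n) {
    Vec3        *a = new Vec3[n];         // n sequential constructor calls
    Vec3Trivial *b = new Vec3Trivial[n];  // no per-element construction
    delete[] a;
    delete[] b;
}
```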
future work
• apply our observations to other PhysBAM-based applications
• study and optimize the cloth simulation
• port facesim to CUDA or OpenCL (real-time rendering)
• implement important parts of PhysBAM in Scala