Parallel Image Processing Programming and Architecture IST PhD Lunch Seminar Wouter Caarls Quantitative Imaging Group
Why Parallel? • Processing time • Smaller timesteps, more scales, faster response times • Memory • Larger images, more dimensions • Energy consumption • More applications, smaller devices
Data parallelism • Many image processing operations have locality of reference (segmentation, filtering, distance transforms, etc.) • This allows data parallelism: different parts of the same image are processed at the same time
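This is a minimal sketch of data parallelism. It assumes a grey-value image stored as a list of rows and uses a 3x3 mean filter as the local operation. The function names `mean3x3`, `filter_rows` and `parallel_filter` are hypothetical. The image is cut into horizontal strips. Each strip reads a one-pixel border from its neighbours but writes only its own rows, so the only synchronization needed is collecting the results. Threads keep the example short. In CPython, however, the global interpreter lock prevents pure-Python threads from speeding this up, so a real implementation would use processes or native kernels.

```python
from concurrent.futures import ThreadPoolExecutor


def mean3x3(img, r, c):
    """Mean of the 3x3 neighbourhood around (r, c), clamping at the image border."""
    h, w = len(img), len(img[0])
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            total += img[rr][cc]
    return total / 9.0


def filter_rows(img, r0, r1):
    """Filter rows r0..r1-1. Neighbours outside the strip (the 'halo') are read,
    but only the strip's own rows are written, so strips never conflict."""
    return [[mean3x3(img, r, c) for c in range(len(img[0]))] for r in range(r0, r1)]


def parallel_filter(img, n_strips=4):
    """Hypothetical data-parallel driver: one task per horizontal strip."""
    h = len(img)
    bounds = [(i * h // n_strips, (i + 1) * h // n_strips) for i in range(n_strips)]
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        strips = list(pool.map(lambda b: filter_rows(img, *b), bounds))
    out = []
    for strip in strips:  # strips come back in submission order
        out.extend(strip)
    return out
```

`parallel_filter(img, 4)` returns the same result as filtering the whole image sequentially with `filter_rows(img, 0, len(img))`. Only the division of work changes.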
Task farm parallelism • An application consists of many different operations • Some of these operations are independent (scale spaces, parameter sweeps, noise realizations, etc.) • Independent operations can be handed out to different processors at the same time: task farm parallelism
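A task farm can be sketched with a worker pool. The sketch assumes a parameter sweep in which each task counts the pixels above one threshold. The names `threshold_count` and `farm` are hypothetical. All tasks share read-only input and produce independent results, so the pool can give each one to whichever worker is idle.

```python
from concurrent.futures import ThreadPoolExecutor


def threshold_count(img, t):
    """One independent task: count the pixels above threshold t."""
    return sum(1 for row in img for v in row if v > t)


def farm(img, thresholds, workers=3):
    """Hypothetical task farm: the pool's shared work queue hands the next pending
    task to whichever worker becomes free, which balances unequal task durations."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {t: pool.submit(threshold_count, img, t) for t in thresholds}
        return {t: f.result() for t, f in futures.items()}
```

For example, `farm([[0, 1, 2], [3, 4, 5], [6, 7, 8]], [0, 4, 7])` returns `{0: 8, 4: 4, 7: 1}`.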
Pipeline parallelism • An image processing algorithm consists of consecutive stages • If multiple objects are to be processed, they may be in different stages at the same time • Running the stages concurrently on successive objects is pipeline parallelism
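The pipeline below is a sketch in which each stage runs in its own thread and passes items to the next stage through a bounded queue. Numbers stand in for images, and the stage functions, `stage` and `run_pipeline`, are hypothetical placeholders. While stage 2 works on object n, stage 1 can already work on object n+1. A sentinel value is propagated down the pipeline to shut the stages down in order.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


def stage(fn, inbox, outbox):
    """Run one pipeline stage in its own thread: take an item, process it, pass it on."""
    def run():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # forward the sentinel so the next stage stops too
                return
            outbox.put(fn(item))
    t = threading.Thread(target=run)
    t.start()
    return t


def run_pipeline(items, fns):
    qs = [queue.Queue(maxsize=2) for _ in range(len(fns) + 1)]  # bounded buffers
    threads = [stage(fn, qs[i], qs[i + 1]) for i, fn in enumerate(fns)]

    def feed():  # separate feeder so the bounded queues cannot deadlock the caller
        for x in items:
            qs[0].put(x)
        qs[0].put(STOP)

    feeder = threading.Thread(target=feed)
    feeder.start()
    results = []
    while True:
        item = qs[-1].get()
        if item is STOP:
            break
        results.append(item)
    feeder.join()
    for t in threads:
        t.join()
    return results
```

Each stage is a single thread reading a FIFO queue, so output order matches input order. For example, `run_pipeline([1, 2, 3, 4], [lambda x: x + 1, lambda x: x * 10, lambda x: x - 3])` returns `[17, 27, 37, 47]`.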
Parallel hardware architectures: fine grained • Irregular • Superscalar (most modern microprocessors) • VLIW (DSPs) • Regular • Vector (supercomputers, MMX) • SIMD (graphics processors) • Custom • FPGA
Parallel hardware architectures: coarse grained • Homogeneous • Multi-core, SMP • Cluster • Heterogeneous • Embedded systems • Grid
Obstacles • Programming • Synchronization, bookkeeping • Different systems, languages, optimization strategies • Choosing an architecture • Analyze program before it is written • Additional requirements or unexpected performance may require rewrite
Architecture-independent parallel programming • Data parallelism • Separate the synchronization pattern from the computation • Library provides pattern, user provides computation • Task farm & pipeline parallelism • Operations do not work on images, but on streams • A sequence of operation calls does not imply an execution order; it defines a stream graph
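The split between synchronization pattern and computation can be illustrated with a hypothetical pixel skeleton. The library function owns the distribution of rows over workers, and the user passes only a per-pixel function. The name `pixel_skeleton` and its signature are assumptions for this sketch, not the API of the SmartCam library.

```python
from concurrent.futures import ThreadPoolExecutor


def pixel_skeleton(op, *images, workers=4):
    """Library side (hypothetical): apply op to corresponding pixels of all inputs.
    Rows are independent, so they are spread over a worker pool; the user never
    deals with threads, strips or synchronization."""
    h = len(images[0])

    def do_row(r):
        # zip the r-th row of every input image into per-pixel tuples
        return [op(*px) for px in zip(*(img[r] for img in images))]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(do_row, range(h)))
```

On the user side, `pixel_skeleton(lambda a, b: a + b, A, B)` adds two images. Passing `math.hypot` with two gradient images would compute the gradient magnitude.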
Algorithmic Skeletons [figure: a skeleton combined with a user-supplied computation yields a parallel operation]
Example skeletons • Pixel • Neighbourhood • Recursive neighbourhood • Stack • Filter • Associative reduction
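An associative reduction skeleton can be sketched in the same way. The sketch assumes the user supplies a binary operation such as `max` or addition. Because the operation is associative, each worker can reduce a contiguous chunk of pixels, and the partial results can be combined afterwards. Keeping the chunks contiguous and in order means the operation does not also need to be commutative. `reduction_skeleton` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def reduction_skeleton(op, img, workers=4):
    """Reduce all pixels of a non-empty image with an associative binary op.
    Each worker reduces one contiguous chunk; partial results are combined in order."""
    pixels = [v for row in img for v in row]
    n = len(pixels)
    bounds = [(i * n // workers, (i + 1) * n // workers) for i in range(workers)]
    bounds = [(a, b) for a, b in bounds if b > a]  # skip empty chunks
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda ab: reduce(op, pixels[ab[0]:ab[1]]), bounds))
    return reduce(op, partials)
```

String concatenation is associative but not commutative, and it shows that the order is preserved: `reduction_skeleton(lambda a, b: a + b, [["a", "b"], ["c", "d"]])` returns `"abcd"`.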
Constructing stream graphs
• By program (dynamic):
  capture(orig);
  normalize(orig, norm);
  dx(orig, x_der, 1.0);
  dy(orig, y_der, 1.0);
  direction(x_der, y_der, dir);
  display(dir);
• Visually (static) [figure: stream graph with nodes capture, normalize, dx, dy, direction, display]
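The sketch below shows how a dynamically constructed stream graph could be recorded. It assumes that each operation call registers a node instead of executing, and that buffers are identified by name. `StreamGraph`, `call` and `levels` are hypothetical. An edge runs from the operation that last wrote a buffer to every later operation that reads it. Grouping nodes into waves then shows which operations are independent. Only read-after-write dependencies are tracked here.

```python
from collections import defaultdict


class StreamGraph:
    """Hypothetical recorder: calls add nodes instead of executing.
    Node names are assumed unique."""

    def __init__(self):
        self.producer = {}            # buffer name -> node that last wrote it
        self.deps = defaultdict(set)  # node -> nodes whose output it reads
        self.nodes = []

    def call(self, name, inputs=(), outputs=()):
        self.nodes.append(name)
        for buf in inputs:
            if buf in self.producer:  # read-after-write dependency
                self.deps[name].add(self.producer[buf])
        for buf in outputs:
            self.producer[buf] = name

    def levels(self):
        """Group nodes into waves; nodes in one wave do not depend on each other
        and may run concurrently. Dependencies only point to earlier calls, so
        the graph is acyclic and every wave is non-empty."""
        done, waves, remaining = set(), [], list(self.nodes)
        while remaining:
            wave = [n for n in remaining if self.deps[n] <= done]
            waves.append(wave)
            done.update(wave)
            remaining = [n for n in remaining if n not in done]
        return waves
```

Replaying the program from the slide (`capture` writes `orig`, `dx` reads `orig` and writes `x_der`, and so on) yields the waves `[['capture'], ['normalize', 'dx', 'dy'], ['direction'], ['display']]`.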
Mapping stream graphs to processors [figure: stream graph operations mapped onto Processor 1 and Processor 2]
Dealing with heterogeneous tasks [figure: task schedule on Processor 1 and Processor 2]
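The slides do not spell out the mapping algorithm, so this sketch swaps in a common greedy heuristic. Each task, taken in order, goes to the processor on which it would finish earliest. The same task may take a different time on each processor. The cost numbers and the name `map_heterogeneous` are hypothetical.

```python
def map_heterogeneous(costs):
    """costs[t][p] = run time of task t on processor p (hypothetical numbers).
    Greedy earliest-finish-time heuristic for independent tasks.
    Returns (processor chosen per task, overall finish time)."""
    n_proc = len(costs[0])
    ready = [0] * n_proc  # time at which each processor becomes free
    assignment = []
    for c in costs:
        p = min(range(n_proc), key=lambda q: ready[q] + c[q])
        ready[p] += c[p]
        assignment.append(p)
    return assignment, max(ready)
```

With `costs = [[1, 3], [2, 1], [1, 2], [3, 1]]`, each task lands on the processor where it is cheap, so the tasks alternate between the two processors and both processors finish at time 2.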
Dealing with interconnect [figure: task schedule on Processor 1 and Processor 2 with transfers over the interconnect]
Dealing with dependencies [figure: schedule of dependent tasks on Processor 1, the interconnect and Processor 2]
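Dependencies and interconnect costs can be added to the same greedy idea. The result is a simplified earliest-finish-time list scheduler in the spirit of HEFT. This is a swapped-in technique without HEFT's rank-based ordering or slot insertion, and not necessarily the one used in this work. A task can start only when its processor is free and all its inputs have arrived. An input produced on the other processor arrives `comm` time units later. Tasks are assumed to be listed in topological order, and `map_with_deps` is a hypothetical name.

```python
def map_with_deps(costs, preds, comm):
    """costs[t][p]: run time of task t on processor p.
    preds[t]: tasks whose output t needs (tasks listed in topological order).
    comm: interconnect transfer time between different processors (0 on the same one).
    Returns (processor per task, finish time per task)."""
    n_proc = len(costs[0])
    free = [0] * n_proc  # time at which each processor becomes free
    proc, finish = [], []
    for t, c in enumerate(costs):
        best = None
        for p in range(n_proc):
            # all inputs must have arrived, paying comm for cross-processor data
            data_ready = max((finish[d] + (0 if proc[d] == p else comm) for d in preds[t]),
                             default=0)
            start = max(free[p], data_ready)
            if best is None or start + c[p] < best[0]:
                best = (start + c[p], p)
        end, p = best
        proc.append(p)
        finish.append(end)
        free[p] = end  # tasks are appended to the end of the processor's schedule
    return proc, finish
```

For a diamond of four tasks with costs `[[1, 1], [2, 2], [2, 2], [1, 1]]` and predecessors `[[], [0], [0], [1, 2]]`, raising `comm` from 0 to 1 stretches the schedule from 4 to 5 time units.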
Choosing an architecture automatically • An architecture-independent program allows automatic analysis after it is written, but before an architecture is chosen • Given a set of constraints, the architecture can then be chosen automatically to optimize a cost function • The tradeoff between cost, power and performance must still be made by the designer
Design Space Exploration [figure: exploration loop linking Architecture, Explore, Program, Analyze and Metrics]
Search strategy: constrained single objective [figure: performance vs. cost, with a minimum-performance constraint]
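Once each candidate architecture has been analyzed, the constrained single-objective strategy reduces to a simple selection: minimize cost subject to a minimum performance. The candidate names, their metrics and the function name `cheapest_meeting` are hypothetical. In a real design space exploration, the metrics would come from analyzing the program on each explored architecture.

```python
def cheapest_meeting(candidates, min_perf):
    """candidates: (name, cost, performance) triples (hypothetical metrics).
    Minimize cost subject to performance >= min_perf; None if nothing is feasible."""
    feasible = [c for c in candidates if c[2] >= min_perf]
    return min(feasible, key=lambda c: c[1], default=None)


candidates = [("1xDSP", 10, 20), ("2xDSP", 20, 38), ("FPGA", 35, 60), ("4xDSP", 40, 70)]
```

With `min_perf = 50`, only `"FPGA"` and `"4xDSP"` are feasible, and the cheaper `("FPGA", 35, 60)` is chosen. With `min_perf = 100`, no candidate qualifies and the constraint must be relaxed.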
Search strategy: multiobjective tradeoff iteration [figure: performance vs. cost]
Search strategy: Strength Pareto [figure: performance vs. cost]
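Strength Pareto approaches rank candidates by Pareto dominance instead of by a single cost function. The sketch below computes the raw fitness used in SPEA2 for (cost, performance) pairs, with cost minimized and performance maximized. The density term and archive handling of the full algorithm are omitted, and the sample points are hypothetical. Points with raw fitness 0 are non-dominated and form the current cost/performance tradeoff front.

```python
def dominates(a, b):
    """a = (cost, performance) dominates b if it is no worse in both objectives
    (lower or equal cost, higher or equal performance) and strictly better in one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def spea2_raw_fitness(points):
    """Strength S(i) = number of points that i dominates.
    Raw fitness R(i) = sum of S(j) over all points j that dominate i (lower is better)."""
    strength = [sum(dominates(p, q) for q in points) for p in points]
    return [sum(strength[j] for j, q in enumerate(points) if dominates(q, p))
            for p in points]
```

For the points `[(10, 20), (20, 38), (35, 60), (40, 70), (30, 30)]`, only `(30, 30)` is dominated, by `(20, 38)`. Its raw fitness is therefore 1, and the other four points form the front.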
Conclusions Architecture-independent programming allows • Parallel programming without bookkeeping • Targeting heterogeneous systems • Choosing the most appropriate architecture automatically http://www.qi.tnw.tudelft.nl/~wcaarls/smartcam