Distributed Programming CA107 Topics in Computing Series

Distributed Programming CA107 Topics in Computing Series Martin Crane, Karl Podesta

The Basics • What is a Distributed System (DS)? • How does it differ from a Parallel Computer (a massively parallel processor, MPP)? • the differences have become fuzzy; both are now called Supercomputers or High Performance Computers (HPC) • Supercomputers and Supermodels: • both are expensive • both are hard to deal with and prone to tantrums • both look glamorous but... • both spend lots of time doing tedious tasks for others: • mostly matrix-vector products for Supercomputers • being live mannequins for Supermodels

Why High Performance Computing? • Solve larger and larger scientific problems • advanced product design • economic analysis • weather prediction/climate modelling • Store and process huge amounts of data • data mining and knowledge discovery • image processing, multimedia information • internet information storage and search (e.g. Google)

Different Supercomputers (MPPs) in Your Neighbourhood • Single Instruction, Multiple Data (SIMD) • as seen on PlayStation 2 • very useful for processing large arrays, e.g. a(i) = b(i) + c(i)*d(i), as found in games • Multiple Instruction, Multiple Data (MIMD) • as seen in Deep Blue • But these are dinosaurs: we want something more flexible
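The array update a(i) = b(i) + c(i)*d(i) from the SIMD bullet can be sketched in Python. This is only an illustration of the idea: the function names and the `lanes` parameter are hypothetical, and real SIMD hardware applies the operation to a whole group of elements with one machine instruction, which plain Python does not do.

```python
def scalar_update(b, c, d):
    """One element per step: a[i] = b[i] + c[i] * d[i]."""
    return [bi + ci * di for bi, ci, di in zip(b, c, d)]


def simd_style_update(b, c, d, lanes=4):
    """Mimic SIMD: the same operation is applied to `lanes` elements per step.

    `lanes` is a hypothetical vector width chosen for illustration.
    """
    a = []
    for start in range(0, len(b), lanes):
        stop = start + lanes
        # One "instruction" covers the whole chunk b[start:stop].
        a.extend(
            bi + ci * di
            for bi, ci, di in zip(b[start:stop], c[start:stop], d[start:stop])
        )
    return a


b = [1, 2, 3, 4, 5]
c = [1, 1, 1, 1, 1]
d = [2, 2, 2, 2, 2]
print(simd_style_update(b, c, d))  # [3, 4, 5, 6, 7]
```

Both functions give the same answer; the point is that every element goes through the identical instruction, which is what makes large arrays such a good fit for SIMD machines.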

Problems with Traditional Supercomputers (i.e. MPPs) • Expensive • Very high starting cost ($10,000s per node) • Expensive software • High maintenance cost • Costly to upgrade • Vendor dependent • lots of companies have come and gone (datacube, Connection Machines etc.) So, real/poor people cannot do HPC!

PC Cluster: a poor man's supercomputer! • built from high-end PCs and a high-speed communications network • supports standard parallel programming based on the message-passing model (the Message Passing Interface, MPI, which is a library standard rather than a language) • cheap (a 16-node cluster can cost less than $10k)
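The message-passing model can be illustrated with a minimal sketch. This is not MPI: it uses Python threads and queues from the standard library to stand in for cluster nodes and the network, and the names `worker`, `parallel_sum` and `n_workers` are hypothetical. What it shows is the pattern itself: workers share no data and cooperate only by sending and receiving messages.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """A stand-in for one cluster node: receive work, send back a result."""
    chunk = inbox.get()              # blocking receive
    outbox.put((rank, sum(chunk)))   # send partial result to the master


def parallel_sum(data, n_workers=4):
    """Scatter `data` across workers, then gather and combine partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Scatter: worker `rank` gets every n_workers-th element starting at rank.
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])
    for t in threads:
        t.join()
    # Gather: exactly one message arrives from each worker.
    return sum(results.get()[1] for _ in range(n_workers))


print(parallel_sum(list(range(100))))  # 4950
```

On a real cluster the queues would be replaced by network messages between separate machines, which is where the interconnect speed starts to matter.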

Cluster Diagram Here

DCU CA Cluster Resources • “John the Baptist” Cluster • built by Redbrick using old CA machines • 24 individual 450 MHz machines • connected by a Fast Ethernet switch • harbinger of better things... • “The one that is to come” • 24 SMP machines • each running at 2 GHz • plus loads of memory! • arrives about Xmas time, appropriately enough

What are the issues in HPC? • Communication vs. Computation • size/nature of problem • interconnect speed/processor speed • Fault tolerance • quality of hardware • nature of problem • Load balancing • nature of problem/quality of programmer • even an easy problem can be made difficult and slow by a bad implementation
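The load-balancing point can be made concrete with a small sketch. It assumes a parallel run finishes only when its busiest node finishes; the task costs are made-up numbers, and the two splitting strategies (`contiguous_split`, `greedy_split`) are hypothetical illustrations rather than anything prescribed in these slides.

```python
def finish_time(assignment):
    """The run ends when the most heavily loaded node finishes."""
    return max(sum(tasks) for tasks in assignment)


def contiguous_split(tasks, n_nodes):
    """Naive split: each node gets a contiguous slice of the task list."""
    size = -(-len(tasks) // n_nodes)  # ceiling division
    return [tasks[i * size:(i + 1) * size] for i in range(n_nodes)]


def greedy_split(tasks, n_nodes):
    """Hand each task, largest first, to the currently least-loaded node."""
    nodes = [[] for _ in range(n_nodes)]
    for cost in sorted(tasks, reverse=True):
        min(nodes, key=sum).append(cost)
    return nodes


tasks = [8, 7, 6, 5, 4, 3, 2, 1]  # hypothetical task costs in seconds

print(finish_time(contiguous_split(tasks, 4)))  # 15
print(finish_time(greedy_split(tasks, 4)))      # 9
```

The work is identical in both cases (36 seconds in total), yet the naive split leaves one node with 15 seconds of work while the others sit idle. This is how an easy problem gets slow through a bad implementation.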

Influence of Nature of Problem on Speed • What is speed? • speed-up is a better measure: speed-up = (time on 1 node) / (time on n nodes) • Speed-up and Problems • very good: embarrassingly parallel problems • fair to middling: regular and synchronous problems • a bit of cross-talk between nodes • bad: irregular/asynchronous problems • lots of cross-talk between nodes
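The speed-up definition above translates directly into code. The timings here are invented for illustration, and `efficiency` (speed-up divided by the number of nodes) is a commonly used companion measure that these slides do not themselves define.

```python
def speed_up(time_on_1_node, time_on_n_nodes):
    """Speed-up = time on 1 node / time on n nodes."""
    return time_on_1_node / time_on_n_nodes


def efficiency(time_on_1_node, time_on_n_nodes, n_nodes):
    """Fraction of the ideal n-fold speed-up actually achieved (an added measure)."""
    return speed_up(time_on_1_node, time_on_n_nodes) / n_nodes


# Hypothetical timings: 120 s on 1 node, 20 s on 8 nodes.
t1, t8 = 120.0, 20.0
print(speed_up(t1, t8))       # 6.0
print(efficiency(t1, t8, 8))  # 0.75
```

An embarrassingly parallel problem would push the speed-up towards 8 on 8 nodes; the more cross-talk between nodes, the further the measured value falls below that.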
