Parallel and Distributed Systems

Parallel and Distributed Systems Peter Brezany Institute for Software Science University of Vienna

Typical One-Processor Architecture (SISD Architecture) SISD : Single Instruction stream Single Data stream

Array Processor (SIMD Architecture) SIMD: Single Instruction stream Multiple Data streams

A typical scientific program spends approx. 90% of its execution time in loops. Example in Java: float A[1000], B[1000]; for (int i = 1; i < 1000; i++) { A[i-1] = B[i]; } The above loop can be expressed in Fortran 95 in the following way: A(0:998) = B(1:999) This statement can be directly mapped onto a SIMD processor. There is an initiative to extend Java by similar constructs. Loop Parallelizing for SIMDSs

Distributed-memory machines (DM Multiprocessors, DM MIMDS) Each processor has local memory and disk Communication via message-passing Hard to program: explicit data distribution Goal: minimize communication Shared-memory machines (SM Multiprocessors, SM MIMDs, SMPs) Shared global address space and disk Communication via shared memory variables Ease of programming Goal: maximize locality, minimize false sharing Current trend: Cluster of SMPs Parallel Multi-Processor-Hardware(MIMD Architectures)

Distributed Memory Architecture(Shared Nothing) Interconnection Network CPU CPU CPU CPU Local Memory Local Memory Local Memory Local Memory

DMM: Shared Disk Architecture Interconnection Network CPU CPU CPU CPU Local Memory Local Memory Local Memory Local Memory Global Shared Disk Subsystem

Example in Java: float A[10], B[10]; for (int i = 1; i < 10; i++) { A[i] = B[i-1]; } For two processors, P1 and P2, a straightforward solution would be: Loop Parallelizing for DM MIMDs

Loop Parallelizing for DM MIMDs (2)

Code on P1: float A[5], B[5]; float temp; for (int i = 1; i < 6; i++) { if ( i == 5 ) { receive message from P2 into temp;A[i-1] = temp; { else A[i-1] = B[i]; } Code on P2: float A[5], B[5]; float temp; for (int i = 0; i < 5; i++) { if ( i == 0 ) { temp = B[0]; send temp to P1; { else A[i-1] = B[i]; } Loop Parallelizing for DM MIMDs (3)

Shared Memory Architecture(Shared Everything, SMP) Interconnection Network CPU CPU CPU CPU Global Shared Memory

Example in Java: float A[1000], B[1000]; for (int i = 1; i < 1000; i++) { A[i-1] = B[i]; } If we have, e.g. two processors, P1 and P2, a straightforward (non-optimal) solution would be: Code on P1: for (int i = 1; i < 500; i++) { A[i-1] = B[i]; } Code on P2: for (int i = 500; i < 1000; i++) { A[i-1] = B[i]; } Data elements of A and B are stored in the shared memory. Loop Parallelizing for SMPs

Cluster of SMPs Interconnection Network CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU 4-CPU SMP 4-CPU SMP 4-CPU SMP 4-CPU SMP

Abstract Maschine Model

Cluster von PCs

No Pipeline Pipeline

Towards Parallel Databases

Relational Data Model Example Cities Name Population Land Munich 1211617 Bayern Bremen 535058 Bremen . . . . . . . . . Schema: (Cities: STRING, Population: INTEGER, Land: STRING Relation (represented by a table): {(Munich, 1.211.617, Bayern), (Bremen, 535.058, Bremen), ...} Key : {Name}

Relational Data Model - Queries SELECT <Attribute List> FROM <Relation Name> [WHERE <Condition>] – option .......... other options Example: This is equivalent to: SELECT * SELECT Name, Population, Land FROM Cities FROM Cities SELECT Name, Land FROM Cities WHERE Population > 600000

Grid Idea

Parallel and Distributed Systems