Models and Terminology
Degree of Parallelism • number of computations that can be executed in parallel.
Degree of Parallelism • Suppose an application supports a degree of parallelism A, • the representation (language) of the algorithm allows a degree of parallelism L, • the compiler produces object code with a degree of parallelism C, and the architecture supports a degree of parallelism H; • then, for the most efficient processing, what conditions must hold among the degrees of parallelism of these factors?
Degree of Parallelism • For the most efficient processing, the degrees of parallelism should satisfy A ≥ L ≥ C ≥ H, since each level can exploit at most the parallelism exposed by the level above it. • This implies that efficient development of parallel applications must consider all of: algorithms, programming languages, compilers, operating systems, and hardware structures.
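A minimal sketch of this matching condition in Python; the numeric degrees below are made-up values for illustration only:

```python
# Check the matching condition between the degrees of parallelism
# named on this slide (A = application, L = language,
# C = compiled code, H = hardware).

def efficient(A: int, L: int, C: int, H: int) -> bool:
    """True when no level promises more parallelism than the level
    above it exposes, i.e. A >= L >= C >= H."""
    return A >= L >= C >= H

# Example: the application exposes 64-way parallelism, the language
# captures 32, the compiler emits 16-way code, the hardware has 8 PEs.
print(efficient(64, 32, 16, 8))   # True: each level is fully fed
print(efficient(8, 16, 16, 32))   # False: hardware exceeds what the code exposes
```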
Application characteristics • An application is partitioned into a set of tasks that can be executed in parallel; the evaluation must consider the following characteristics: • granularity, • degree of parallelism, • level of parallelism, and • data dependencies
Granularity • Granularity is determined as a function of the execution time R of the task and its communication time C with other tasks. • If R >> C, the task granularity is large, i.e., coarse-grained (the least communication overhead). • If C >> R, communication overhead dominates and the task is fine-grained. A medium-grained task is the compromise; a classification sketch follows.
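A hedged sketch of the classification; the factor of 10 used to approximate ">>" is an assumption for illustration, not part of the definition:

```python
# Classify task granularity from the slide's two quantities:
# R = task execution time, C = time spent communicating with other tasks.

def granularity(R: float, C: float, factor: float = 10.0) -> str:
    if R >= factor * C:
        return "coarse-grained"   # R >> C: communication overhead is negligible
    if C >= factor * R:
        return "fine-grained"     # C >> R: communication overhead dominates
    return "medium-grained"       # the compromise in between

print(granularity(R=100.0, C=2.0))   # coarse-grained
print(granularity(R=1.0, C=50.0))    # fine-grained
print(granularity(R=5.0, C=4.0))     # medium-grained
```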
Level of Parallelism • The level of parallelism defines granularity: • task level: an application has multiple tasks that can be executed in parallel; • procedure level: a task consists of procedures that can be executed in parallel; • instruction level: instructions in a task can execute in parallel; • operation level: may be considered the same as the instruction level (and perhaps system code); • microcode level: internal to the chip design. The sketch below shows the two levels visible to application code.
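A hedged illustration (the function names are hypothetical) of the two levels an application programmer controls directly. The procedures run serially inside each task here, but they are the units a finer-grained decomposition would parallelize; instruction, operation, and microcode levels sit below what application code controls.

```python
from concurrent.futures import ProcessPoolExecutor

def procedure(i: int) -> int:      # procedure level: a callable inside a task
    return i * i

def task(n: int) -> int:           # task level: whole tasks run in parallel
    return sum(procedure(i) for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # Four independent tasks of one application execute in parallel.
        print(list(pool.map(task, [10, 20, 30, 40])))
```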
Data dependencies are determined by the precedence constraints between tasks; a scheduling sketch follows.
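One way to see precedence constraints in action: a sketch, assuming four hypothetical tasks A through D, that uses Python's standard graphlib to derive which tasks may run concurrently.

```python
# Precedence constraints between tasks, represented as a DAG, determine
# which tasks may run concurrently. Tasks in the same "wave" have no
# path between them, so they can execute in parallel.
from graphlib import TopologicalSorter   # Python 3.9+

# deps[t] = set of tasks that must finish before t starts
deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

ts = TopologicalSorter(deps)
ts.prepare()
while ts.is_active():
    wave = list(ts.get_ready())      # every task in `wave` is dependency-free
    print("run in parallel:", wave)  # ['A'], then ['B', 'C'], then ['D']
    ts.done(*wave)
```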
Processing paradigms • Applications can be modeled as: serial, serial-parallel-serial without dependencies, and serial-parallel-serial with dependencies.
Processing paradigms • Serial: the degree of parallelism is 1.
Processing paradigms • Serial-parallel-serial without dependencies: models the master-slave condition; a sketch of this paradigm appears below.
Processing paradigms • Serial-parallel-serial with dependencies: possible blocking conditions on some processors.
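A minimal sketch of the serial-parallel-serial (master-slave) paradigm without dependencies, using Python's multiprocessing; the chunking scheme and worker function are illustrative assumptions.

```python
# A serial master partitions the work, slaves process the pieces in
# parallel, and the master serially combines the results.
from multiprocessing import Pool

def slave(chunk: list) -> int:
    return sum(x * x for x in chunk)          # independent work, no dependencies

if __name__ == "__main__":
    data = list(range(1000))
    chunks = [data[i::4] for i in range(4)]   # serial phase: master splits work
    with Pool(processes=4) as pool:
        partial = pool.map(slave, chunks)     # parallel phase: slaves compute
    print(sum(partial))                       # serial phase: master combines
```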
Treleaven and Myers taxonomy • Milutinovic combined the taxonomies of Treleaven and Myers based on what drives the computational flow of the architecture: • Control-flow • Data-flow • Demand-driven
1. Control-driven (control-flow) architecture models • In control-flow architectures the flow of computation is determined by the instruction sequence and by the flow of data as instructions execute. • This sequential flow of execution is controlled by the programmer. • Control-flow architectures: • RISC architectures • CISC architectures • HLL architectures
Control-flow • What motivated the design of RISC processors?
CISC-RISC • The complexity of the CISC control unit left very little chip space for anything else. • By reducing the complexity of the instruction set, more real estate became available for other processor functions, • such as pipelines and larger register files.
CISC-RISC • Historically the x86 instruction set has had too much inertia for RISC processors to catch on in the mass market. • Today the characteristics that distinguish a RISC from a CISC are getting blurred.
HLL Architecture • Direct execution of a high-level language by the processor. • One language per processor. • The SYMBOL machine, developed in 1971, is one example. • An embedded JVM?
2. Data-driven (data-flow) architecture models • Instruction execution is determined by data availability instead of by a program counter; instructions are ready to execute as soon as their operands are available. • Computational results (data tokens) are passed directly between instructions. Once consumed by an executing instruction, data tokens are not reusable by other instructions. A token consists of data and the address (tag) of a destination node.
data-flow • An arriving token is compared against those in a matching store. If matched, the tokens are extracted and the instruction is issued for execution. A data-dependency (data-flow) graph is used to represent the program; a toy simulation follows.
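A toy simulation of token matching and data-driven firing. The graph, which evaluates the hypothetical expression (a + b) * (a - b), and all names in it are assumptions for illustration.

```python
# An instruction fires as soon as all of its operand tokens have
# arrived in the matching store; there is no program counter.
import operator

# node -> (operation, number of operands it waits for, destination node)
graph = {
    "add": (operator.add, 2, "mul"),
    "sub": (operator.sub, 2, "mul"),
    "mul": (operator.mul, 2, None),
}
matching_store = {name: [] for name in graph}

def send_token(dest, value):
    """Deliver a data token; fire the node once its operands match."""
    op, arity, nxt = graph[dest]
    matching_store[dest].append(value)
    if len(matching_store[dest]) == arity:    # all operands available
        result = op(*matching_store[dest])    # the instruction fires
        if nxt is None:
            print("result:", result)
        else:
            send_token(nxt, result)           # result becomes a new token

a, b = 6, 2
for dest in ("add", "sub"):                   # initial tokens enter the graph
    send_token(dest, a)
    send_token(dest, b)
# prints: result: 32   i.e. (6 + 2) * (6 - 2)
```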
3. Reduction architecture (demand driven) models • instructions execute only when results are required as operands for another instruction already enabled for execution.
Consider the evaluation of an expression: • a data-driven mechanism follows a bottom-up approach, while the demand-driven mechanism uses a top-down approach (the two are contrasted in the sketch below).
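A sketch contrasting the two evaluation orders on (a + b) * (a - b); the expression is hypothetical, chosen only to make the traversal orders visible.

```python
# Data-driven (bottom-up): operands trigger computation as they appear.
def data_driven(a, b):
    s = a + b          # computed as soon as a and b are available
    d = a - b          # likewise, even if the product were never needed
    return s * d

# Demand-driven (top-down): nothing is computed until its result is
# demanded; Python lambdas stand in for unevaluated (suspended) expressions.
def demand_driven(a, b):
    s = lambda: a + b          # suspended computation
    d = lambda: a - b
    return s() * d()           # the demand for the product forces both

print(data_driven(6, 2), demand_driven(6, 2))   # 32 32
```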
Flynn’s Taxonomy • Based on the degree of parallelism exhibited by the architecture in its data and control flow mechanisms.
SISD (Single-Instruction [stream] Single Data [stream]) • Serial machine (the typical von Neumann design). Instructions are executed sequentially. • Execution stages may be overlapped (pipelined). An SISD machine may have more than one functional unit, but all functional units are supervised by one control unit.
SIMD (Single-Instruction [stream] Multiple Data [stream]) • Multiple processing elements (PEs) are supervised by the same control unit. The control unit broadcasts the same instruction to all PEs, which operate on different data. • All PEs share the same memory, though it may be subdivided into different modules. • SIMD systems with n processors can provide an n-fold speedup as long as a high degree of parallelism is supported at the instruction level; a miniature illustration follows.
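A miniature illustration of the SIMD idea, assuming NumPy is available; the vectorized operation stands in for an instruction broadcast to all PEs.

```python
# One "instruction" (the vectorized expression) is applied to many data
# elements at once, much as a SIMD control unit drives all PEs in lockstep.
import numpy as np

data = np.arange(8)            # 8 data elements, one per notional PE
result = data * 2 + 1          # the same instruction applied to every element
print(result)                  # [ 1  3  5  7  9 11 13 15]
```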
MISD (Multiple-Instruction [stream] Single Data [stream]) • This architecture has no literal architectural implementation.
MIMD (Multiple-Instruction [stream] Multiple Data [stream]) • The Multiple Instruction Multiple Data architecture is the area in which the vast majority of recent developments in parallel architectures have taken place. • In this architecture each processor has its own instruction stream, executed under the control of that processor's own control unit. • Furthermore, each processor will often have an amount of local memory upon which the instructions primarily operate. • Execution therefore cannot be synchronous without explicit interprocessor synchronization mechanisms (illustrated below).
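A sketch of MIMD in miniature, using threads as stand-ins for processors; the worker functions are hypothetical.

```python
# Each thread runs its own instruction stream on its own local data;
# because the streams are asynchronous, an explicit mechanism (here a
# Barrier) is needed to synchronize them.
import threading

barrier = threading.Barrier(2)

def worker_a():
    local = sum(range(100))                 # its own instructions, local data
    barrier.wait()                          # explicit synchronization point
    print("A done:", local)

def worker_b():
    local = max(x * x for x in range(50))   # a different instruction stream
    barrier.wait()
    print("B done:", local)

threads = [threading.Thread(target=worker_a), threading.Thread(target=worker_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```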
MIMD - shared memory • In shared-memory systems, n processors and p memory modules exchange information through an interconnection network; any pair of processors can communicate via shared locations. Ideally, p ≥ n and the interconnection network should allow p simultaneous accesses to keep all processors busy. A sketch of communication through shared locations follows.
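A minimal sketch, assuming Python's multiprocessing: two processes exchange information only by reading and writing shared locations, as in the shared-memory MIMD model. The producer/consumer roles are illustrative assumptions.

```python
from multiprocessing import Process, Array, Lock

def producer(shared, lock):
    with lock:
        for i in range(len(shared)):
            shared[i] = i * i                # write through the shared locations

def consumer(shared, lock):
    with lock:
        print("consumer sees:", list(shared))

if __name__ == "__main__":
    lock = Lock()
    shared = Array("i", 5)                   # 5 shared integer locations
    p = Process(target=producer, args=(shared, lock))
    p.start(); p.join()                      # ensure the writes happen first
    c = Process(target=consumer, args=(shared, lock))
    c.start(); c.join()
```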
MIMD - shared memory • Shared-memory models: • The uniform-memory-access (UMA) model, • The nonuniform-memory-access (NUMA) model, and • The cache-only memory architecture (COMA) model.
UMA Model • In the UMA model physical memory is uniformly shared by all the processors, and all processors have equal access time to all memory modules. Each processor, however, may have its own private cache. Systems with a high degree of sharing are referred to as tightly coupled.
symmetric multiprocessor system • When all processors have equal access to all resources, the system is referred to as a symmetric multiprocessor system; • i.e., all processors are equally capable of running executive programs (the OS kernel) and I/O service routines.
asymmetric system • An asymmetric system features only one processor, or a subset of the processors, with executive capabilities; the remaining processors are referred to as attached processors.
SMP vs aSMP • Symmetric multiprocessor systems have identical processors with identical functions. (By contrast, an asymmetric multiprocessor system allocates resources to a specific processor even if that CPU is overloaded and others are relatively free.) The clear advantage of the symmetric approach is the balancing of the processing load across all resources.
NUMA Model • In this model access time depends on the location of the memory item: shared memory modules are physically distributed to the processors as local memory.
COMA Model • This model assumes cache-only memory; it is a special case of the NUMA model in which the distributed memory modules are converted to local cache memories.
cc-NUMA • Cache-coherent NUMA (cc-NUMA) combines distributed shared memory with cache directories.
DMM • Distributed-memory multicomputers • Clusters • These systems consist of multiple independent computer nodes interconnected via a message-passing network that provides point-to-point interconnection between nodes; a minimal message-passing sketch follows.
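A minimal sketch of point-to-point message passing, using a Pipe between local processes as a stand-in for the interconnection network; the node function and payload are illustrative assumptions.

```python
# Nodes share nothing; they exchange data only as explicit messages
# over a channel, the communication style of a distributed-memory
# multicomputer.
from multiprocessing import Process, Pipe

def node(conn):
    msg = conn.recv()                    # block until a message arrives
    conn.send(msg * 2)                   # reply with a computed result
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=node, args=(child_end,))
    p.start()
    parent_end.send(21)                  # send work to the remote node
    print("reply:", parent_end.recv())   # reply: 42
    p.join()
```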
DMM • Advantages: • high throughput, • fault-tolerance, • dynamic reconfiguration in response to processing loads.