Large Computer Systems CE 140 A1/A2 27 August 2003
Rationale • Although computers are getting faster, the demands placed on them are increasing at least as fast • High-performance applications: simulations and modeling • Circuit speed cannot be increased indefinitely; eventually, physical limits will be reached and quantum mechanical effects will become a problem
Rationale • To handle larger problems, parallel computers are used • Machine level parallelism • Replicates entire CPUs or portions of them
Design Issues • What are the nature, size, and number of the processing elements? • What are the nature, size, and number of the memory modules? • How are the processing and memory elements interconnected? • What applications are to be run in parallel?
Grain Size • Coarse-grained parallelism • Unit of parallelism is larger • Running large pieces of software in parallel with little or no communication between the pieces • Example: large time-sharing systems • Fine-grained parallelism • Unit of parallelism is smaller • Parallel programs whose pieces communicate with each other heavily
Tightly Coupled versus Loosely Coupled • Loosely coupled • Small number of large, independent CPUs that have relatively low-speed connections to each other • Tightly coupled • Smaller processing units that work closely together over high-bandwidth connections
Design Issues • In most cases • Coarse-grained is well suited for loosely coupled • Fine-grained is well suited for tightly coupled
Communication Models • In a parallel computer system, CPUs communicate with each other to exchange information • Two general types • Multiprocessors • Multicomputers
Multiprocessors • Shared Memory System • All processors may share a single virtual address space • Easy model for programmers • Global memory • any processor can access any memory module without intervention by another processor
Uniform Memory Access (UMA) Multiprocessor [diagram: processors P1 … Pn connected to memory modules M1 … Mk through an interconnection network]
Non-Uniform Memory Access (NUMA) Multiprocessor [diagram: processor/memory pairs (P1, M1) … (Pn, Mn) connected by an interconnection network]
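Below is a minimal sketch, not part of the original slides, of how the shared-memory model above is typically programmed: POSIX threads in C, where all threads read and write the same global data directly and a mutex guards the shared accumulator. The array, thread count, and variable names are illustrative assumptions.

```c
/* Minimal shared-memory sketch (POSIX threads, C).
 * Hypothetical names: data[], NTHREADS, total. */
#include <pthread.h>
#include <stdio.h>

#define N        1000
#define NTHREADS 4

static double data[N];          /* single address space: visible to all threads */
static double total = 0.0;      /* shared accumulator                            */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;
    double local = 0.0;
    /* each thread sums a contiguous slice of the shared array */
    for (long i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
        local += data[i];
    pthread_mutex_lock(&lock);  /* serialize updates to the shared value */
    total += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < N; i++) data[i] = 1.0;
    for (long i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, worker, (void *)i);
    for (long i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
    printf("total = %f\n", total);  /* no explicit messages: memory is shared */
    return 0;
}
```

The key point is that the threads never exchange messages; they simply touch the same memory, which is what the shared-memory model promises.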
Multicomputers • Distributed Memory System • Each CPU has its own private memory • Local/private memory – a processor cannot access a remote memory without the cooperation of the remote processor • Cooperation takes place in the form of a message passing protocol • Programming for a multicomputer is much more difficult than programming a multiprocessor
Distributed Memory System [diagram: private memories M1 … Mn attached to processors P1 … Pn, which communicate through an interconnection network]
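As a contrasting sketch, the fragment below uses message passing with standard MPI calls in C: each rank holds its value in private memory, and data moves only through explicit sends and receives. The tag and variable names are illustrative.

```c
/* Minimal message-passing sketch (MPI, C): each node owns private memory,
 * so data moves only via explicit send/receive. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = rank + 1.0;          /* value in this node's private memory */
    if (rank != 0) {
        /* remote data cannot be read directly; it must be sent as a message */
        MPI_Send(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {
        double sum = local, incoming;
        for (int src = 1; src < size; src++) {
            MPI_Recv(&incoming, 1, MPI_DOUBLE, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            sum += incoming;
        }
        printf("sum across %d nodes = %f\n", size, sum);
    }
    MPI_Finalize();
    return 0;
}
```

With a typical MPI installation this would be launched with something like mpirun -np 4 ./a.out, one process per node.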
Multiprocessors versus Multicomputers • Multiprocessors are easier to program • But multicomputers are much simpler and cheaper to build • Goal: large computer systems that combine the best of both worlds
Symmetric MultiProcessors (SMP) • Multiprocessor architecture where all processors can access all memory locations uniformly • Processors also share I/O • SMP is classified as a UMA architecture • SMP is the simplest multiprocessor system • Any processor can execute either the OS kernel or user programs
SMP • Performance improves if programs can be run in parallel • Increased availability: if one processor breaks down, the system does not stop running • Performance can also be improved incrementally by adding processors • Does not scale well beyond 16 processors
Clusters • A group of whole computers connected together to function as a parallel computer • Popular implementation: Linux computers using Beowulf clustering software
Clusters • High availability – redundant resources • Scalability • Affordable – off-the-shelf parts
Clusters [photo: Cyborg Cluster, Drexel University; 32 nodes, dual P3 per node]
Memory Organization • Shared Memory System (Multiprocessors) • each processor may also have a cache • convenient to have a global address space • For NUMA, access to remote parts of the global address space is slower than access to local memory • Distributed Memory System (Multicomputers) • Private address space for each processor • Easiest way to connect computers into a large system • Data sharing is implemented through message passing
Issues • When processors share data, different processors must see the same value for a given data item • When a processor updates its cache, it must also update the caches of other processors, or invalidate other processors’ copies • shared data must be coherent
Cache Coherence • All cached copies of shared data must have the same value at all times
Snooping Caches • So-called because individual caches “snoop” on the bus
Write-Through Protocol • Write-Through with Update (Write Update) • Update cache and memory, and update the caches of the other processors • Write-Through without Update (Write Invalidate) • Update cache and memory, and invalidate the copies in the caches of the other processors
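A simplified software model of what a snooping cache does when it sees another processor's write on the bus may help contrast the two policies. The structures and names below are assumptions for illustration, not real cache hardware.

```c
/* Simplified model of a snooped bus write under the two write-through
 * policies; line/struct names are illustrative only. */
#include <stdbool.h>

typedef struct {
    unsigned tag;
    unsigned data;
    bool     valid;
} cache_line_t;

typedef enum { WRITE_UPDATE, WRITE_INVALIDATE } policy_t;

/* Called in every *other* cache when a write to 'tag' appears on the bus. */
void snoop_bus_write(cache_line_t *line, unsigned tag, unsigned new_data,
                     policy_t policy)
{
    if (!line->valid || line->tag != tag)
        return;                      /* this cache holds no copy: nothing to do   */

    if (policy == WRITE_UPDATE)
        line->data = new_data;       /* refresh the local copy in place            */
    else
        line->valid = false;         /* drop the copy; the next read misses to memory */
}
```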
Write-Back Protocol • When a processor wants to write to a block, it must acquire exclusive control/ownership of the block • All other copies are invalidated • Block’s contents may be changed at any time • When another processor requests to read the block, owner processor sends block to requesting processor, and returns control of block to the memory module which updates block to contain the latest value
MESI Protocol • Popular write-back cache coherence protocol named after the initials of the four possible states of each cache line • Modified – entry is valid; the copy in memory is out of date; no other copies exist • Exclusive – no other cache holds the line; memory is up to date • Shared – multiple caches may hold the line; memory is up to date • Invalid – cache entry does not contain valid data
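The four states can be tied together with a small state-transition sketch. The function below is a simplified model covering only a few common events; its names and simplifications are assumptions, not a full MESI implementation.

```c
/* Simplified MESI transitions for one cache line; only a few common
 * events are modeled, and names are illustrative. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

typedef enum {
    LOCAL_READ,     /* this processor reads the line                    */
    LOCAL_WRITE,    /* this processor writes the line                   */
    BUS_READ,       /* another processor's read is snooped on the bus   */
    BUS_WRITE       /* another processor's write/ownership request seen */
} event_t;

/* 'others_have_copy' tells whether any other cache answered the snoop. */
mesi_t mesi_next(mesi_t state, event_t ev, int others_have_copy)
{
    switch (ev) {
    case LOCAL_READ:
        if (state == INVALID)
            return others_have_copy ? SHARED : EXCLUSIVE;
        return state;                  /* M, E, S: the read hits locally            */
    case LOCAL_WRITE:
        return MODIFIED;               /* gain ownership; other copies are invalidated */
    case BUS_READ:
        if (state == MODIFIED || state == EXCLUSIVE)
            return SHARED;             /* supply the data, keep a shared copy        */
        return state;
    case BUS_WRITE:
        return INVALID;                /* another cache takes ownership              */
    }
    return state;
}
```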
Snoopy Cache Issues • Snoopy caches require broadcasting information over the bus, leading to increased bus traffic as the system grows in size
Directory Protocols • Uses a directory that keeps track of the locations where copies of a given data item are present • Eliminates the need for broadcasts • If the directory is centralized, it becomes a bottleneck
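One common full-map organization keeps, per memory block, a state plus one presence bit per processor. The sketch below uses illustrative names to show how a write request would then send invalidations only to the caches that actually hold a copy, instead of broadcasting on the bus.

```c
/* Sketch of a full-map directory entry: a presence bit per processor
 * plus a block state. Names and the callback are illustrative. */
#include <stdint.h>

#define NPROC 32

typedef enum { UNCACHED, SHARED_CLEAN, DIRTY } dir_state_t;

typedef struct {
    dir_state_t state;
    uint32_t    presence;   /* bit i set => processor i caches this block */
} dir_entry_t;

/* On a write request from 'writer', invalidate only the actual sharers. */
void handle_write(dir_entry_t *e, int writer,
                  void (*send_invalidate)(int proc))
{
    for (int p = 0; p < NPROC; p++)
        if (p != writer && (e->presence & (1u << p)))
            send_invalidate(p);        /* point-to-point message, no broadcast */

    e->presence = 1u << writer;        /* the writer is now the only holder    */
    e->state    = DIRTY;
}
```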
Performance • According to Amdahl’s law, introducing machine parallelism will not have a significant effect on performance if the program cannot take advantage of the parallel architecture • Not all programs parallelize well
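The law can be stated compactly (standard formulation, not spelled out on the slide): if a fraction f of a program's execution is inherently sequential, the speedup on p processors can never exceed 1/f, as the worked example in the comments shows.

```latex
% Amdahl's law: f = sequential (non-parallelizable) fraction, p = processor count
\[
  \mathrm{Speedup}(p) \;=\; \frac{1}{\,f + \frac{1-f}{p}\,} \;\le\; \frac{1}{f}
\]
% Worked example: f = 0.10 and p = 16 give
%   Speedup = 1 / (0.10 + 0.90/16) = 1 / 0.15625 = 6.4,
% and even as p grows without bound the speedup never exceeds 1/0.10 = 10.
```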
Scalability Issues • Bandwidth • Latency • Both depend on the interconnection topology
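To make the dependence on topology concrete, the sketch below (illustrative C using textbook formulas) prints two standard scalability metrics for a ring, a 2-D mesh, and a hypercube: diameter, a proxy for worst-case latency, and bisection width, a proxy for aggregate bandwidth.

```c
/* Diameter and bisection width for common topologies with n nodes
 * (textbook formulas; the helper itself is illustrative). */
#include <math.h>
#include <stdio.h>

static void metrics(int n)
{
    int k = (int)sqrt((double)n);           /* assume n = k*k for the 2-D mesh   */
    int d = (int)round(log2((double)n));    /* assume n = 2^d for the hypercube  */

    printf("ring      : diameter %d, bisection width %d\n", n / 2, 2);
    printf("2-D mesh  : diameter %d, bisection width %d\n", 2 * (k - 1), k);
    printf("hypercube : diameter %d, bisection width %d\n", d, n / 2);
}

int main(void)
{
    metrics(16);   /* 16 nodes: ring 8/2, mesh 6/4, hypercube 4/8 */
    return 0;
}
```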