
Large Computer Systems


Presentation Transcript


  1. Large Computer Systems CE 140 A1/A2 27 August 2003

  2. Rationale • Although computers are getting faster, the demands placed on them are increasing at least as fast • High-performance applications: simulations and modeling • Circuit speed cannot be increased indefinitely → eventually physical limits will be reached and quantum-mechanical effects become a problem

  3. Rationale • To handle larger problems, parallel computers are used • Machine-level parallelism • Replicates entire CPUs or portions of them

  4. Design Issues • What are the nature, size, and number of the processing elements? • What are the nature, size, and number of the memory modules? • How are the processing and memory elements interconnected? • What applications are to be run in parallel?

  5. Grain Size • Coarse-grained parallelism • Unit of parallelism is larger • Running large pieces of software in parallel with little or no communication between the pieces • Example: large time-sharing systems • Fine-grained parallelism • Unit of parallelism is smaller • Parallel pieces of a program that communicate with each other frequently

  6. Tightly Coupled versus Loosely Coupled • Loosely coupled • Small number of large, independent CPUs that have relatively low-speed connections to each other • Tightly coupled • Smaller processing units that work closely together over high-bandwidth connections

  7. Design Issues • In most cases • Coarse-grained is well suited for loosely coupled • Fine-grained is well suited for tightly coupled

  8. Communication Models • In a parallel computer system, CPUs communicate with each other to exchange information • Two general types • Multiprocessors • Multicomputers

  9. Multiprocessors • Shared Memory System • All processors may share a single virtual address space • Easy model for programmers • Global memory • any processor can access any memory module without intervention by another processor
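
Not part of the original slides, but as a rough sketch of the shared-memory model: in the C fragment below (assuming a POSIX threads environment), every thread reads and writes the same global array through ordinary loads and stores, with no explicit data transfer between processors.

```c
/* Shared-memory sketch: all threads operate on one global array.
 * Illustrative only; assumes a POSIX system (compile with -pthread). */
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4
#define N_ITEMS   1000

static double data[N_ITEMS];              /* visible to every thread */

static void *scale_chunk(void *arg)
{
    int id = (int)(long)arg;
    int chunk = N_ITEMS / N_THREADS;
    for (int i = id * chunk; i < (id + 1) * chunk; i++)
        data[i] *= 2.0;                   /* plain store into shared memory */
    return NULL;
}

int main(void)
{
    pthread_t tid[N_THREADS];

    for (int i = 0; i < N_ITEMS; i++)
        data[i] = i;

    for (long i = 0; i < N_THREADS; i++)
        pthread_create(&tid[i], NULL, scale_chunk, (void *)i);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);

    printf("data[999] = %.1f\n", data[999]);   /* prints 1998.0 */
    return 0;
}
```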

  10. Uniform Memory Access (UMA) Multiprocessor [Diagram: processors P1 … Pn connected through an interconnection network to shared memory modules M1 … Mk]

  11. Non-Uniform Memory Access (NUMA) Multiprocessor [Diagram: nodes P1/M1 … Pn/Mn, each pairing a processor with its local memory, connected by an interconnection network]

  12. Multiprocessor

  13. Multicomputers • Distributed Memory System • Each CPU has its own private memory • Local/private memory – a processor cannot access a remote memory without the cooperation of the remote processor • Cooperation takes place in the form of a message passing protocol • Programming for a multicomputer is much more difficult than programming a multiprocessor
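
For contrast with the shared-memory sketch above, here is a minimal message-passing sketch in C, assuming an MPI installation (the slides do not name a particular library): the value in rank 0's private memory is invisible to rank 1 until it is explicitly sent and received.

```c
/* Message-passing sketch: each process owns private memory and data
 * moves only via explicit send/receive.  Assumes MPI is installed
 * (compile with mpicc, run with: mpirun -np 2 ./a.out). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                      /* exists only in rank 0's memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```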

  14. Distributed Memory System [Diagram: processors P1 … Pn, each with a private memory M1 … Mn, connected by an interconnection network]

  15. Distributed Memory System

  16. Multiprocessors versus Multicomputers • Easier to program for multiprocessors • But multicomputers are much simpler and cheaper to build • Goal: large computer systems that combine the best of both worlds

  17. Taxonomy of Large Computer Systems

  18. Taxonomy of Large Computer Systems

  19. Symmetric MultiProcessors (SMP) • Multiprocessor architecture in which all processors can access all memory locations uniformly • Processors also share I/O • SMP is classified as a UMA architecture • SMP is the simplest multiprocessor system • Any processor can execute either the OS kernel or user programs

  20. SMP • Performance improves if programs can be run in parallel • Increased availability: if one processor breaks down, the system does not stop running • Performance can also be improved incrementally by adding processors • Does not scale well beyond 16 processors

  21. SMP

  22. SMP

  23. Clusters • A group of whole computers connected together to function as a parallel computer • Popular implementation: Linux computers using Beowulf clustering software

  24. Clusters • High availability – redundant resources • Scalability • Affordable – off-the-shelf parts

  25. Clusters • Example: the Cyborg Cluster at Drexel University • 32 nodes • Dual Pentium III (P3) per node

  26. Clusters

  27. Memory Organization • Shared Memory System (Multiprocessors) • Each processor may also have a cache • Convenient to have a global address space • For NUMA, accesses to remote portions of the global address space are slower than accesses to local memory • Distributed Memory System (Multicomputers) • Private address space for each processor • Easiest way to connect computers into a large system • Data sharing is implemented through message passing

  28. Issues • When processors share data, every processor must see the same value for a given data item • When a processor updates its cache, it must also update the caches of other processors, or invalidate the other processors’ copies • → shared data must be kept coherent

  29. Cache Coherence • All cached copies of shared data must have the same value at all times

  30. Snooping Caches • So called because each individual cache controller “snoops” on (monitors) the bus to observe memory transactions made by the other processors

  31. Write-Through Protocol • Write-Through with Update (Write Update) • Update the local cache and memory, and also update the cached copies held by the other processors • Write-Through without Update (Write Invalidate) • Update the local cache and memory, and invalidate the cached copies held by the other processors
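
The difference between the two policies can be sketched in software. The toy C model below is illustrative only (real coherence is implemented in the cache hardware): it keeps one copy of a single cache line per CPU and applies either the update rule or the invalidate rule on a write.

```c
/* Toy model of the two write-through snooping policies for one cache line.
 * Purely illustrative; not a hardware-accurate simulation. */
#include <stdio.h>
#include <stdbool.h>

#define N_CPUS 4

struct cache { int value; bool valid; };

static struct cache caches[N_CPUS];
static int memory;                       /* write-through: memory is always current */

/* Write-update: the writer pushes the new value into every other valid copy. */
static void write_update(int cpu, int v)
{
    memory = v;
    caches[cpu].value = v;  caches[cpu].valid = true;
    for (int i = 0; i < N_CPUS; i++)
        if (i != cpu && caches[i].valid)
            caches[i].value = v;
}

/* Write-invalidate: the writer marks every other copy invalid instead. */
static void write_invalidate(int cpu, int v)
{
    memory = v;
    caches[cpu].value = v;  caches[cpu].valid = true;
    for (int i = 0; i < N_CPUS; i++)
        if (i != cpu)
            caches[i].valid = false;
}

int main(void)
{
    for (int i = 0; i < N_CPUS; i++) { caches[i].value = 0; caches[i].valid = true; }

    write_update(0, 7);       /* every valid copy now holds 7 */
    printf("after update:     cpu1 sees %d (valid=%d)\n", caches[1].value, caches[1].valid);

    write_invalidate(2, 9);   /* other copies are dropped, not refreshed */
    printf("after invalidate: cpu1 valid=%d, memory=%d\n", caches[1].valid, memory);
    return 0;
}
```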

  32. Write-Back Protocol • When a processor wants to write to a block, it must first acquire exclusive control/ownership of the block • All other copies are invalidated • The block’s contents may then be changed at any time • When another processor requests to read the block, the owner sends the block to the requester and returns control of the block to the memory module, which updates the block to contain the latest value

  33. MESI Protocol • Popular write-back cache coherence protocol, named after the initials of the four possible states of each cache line • Modified – entry is valid; the memory copy is stale; no other copies exist • Exclusive – no other cache holds the line; memory is up to date • Shared – multiple caches may hold the line; memory is up to date • Invalid – the cache entry does not contain valid data
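
A minimal C sketch of the four states and one sample transition, a write by the local CPU. This is a simplification for illustration; a full implementation would also handle bus events such as remote reads and remote invalidations.

```c
/* MESI states and the transition taken when the local CPU writes a line.
 * Side effects (e.g. invalidating other copies when writing a Shared line)
 * are noted in comments only. */
#include <stdio.h>

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_state;

static const char *name[] = { "Modified", "Exclusive", "Shared", "Invalid" };

static mesi_state on_local_write(mesi_state s)
{
    switch (s) {
    case MODIFIED:  return MODIFIED;   /* already owned and dirty                      */
    case EXCLUSIVE: return MODIFIED;   /* sole copy: can be dirtied silently           */
    case SHARED:    return MODIFIED;   /* must first invalidate the other cached copies */
    case INVALID:   return MODIFIED;   /* write miss: read the line for ownership, then write */
    }
    return INVALID;                    /* unreachable */
}

int main(void)
{
    printf("Shared --local write--> %s\n", name[on_local_write(SHARED)]);
    return 0;
}
```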

  34. Snoopy Cache Issues • Snoopy caches require broadcasting information over the bus, which leads to increased bus traffic as the system grows in size

  35. Directory Protocols • Uses a directory that keeps track of the locations where copies of a given data item are present • Eliminates the need for broadcast • If the directory is centralized, it becomes a bottleneck
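
A directory entry can be as simple as a state field plus a bit vector of sharers. The hypothetical C layout below (not from the slides) shows why no broadcast is needed: on a write, invalidations are sent only to the caches whose bits are set.

```c
/* Sketch of a per-block directory entry: one bit per processor records
 * which caches hold a copy.  Hypothetical layout for illustration. */
#include <stdint.h>
#include <stdio.h>

enum dir_state { UNCACHED, SHARED_CLEAN, EXCLUSIVE_DIRTY };

struct dir_entry {
    enum dir_state state;
    uint64_t sharers;          /* bit i set => cache i holds a copy (up to 64 CPUs) */
};

/* Handle a write by 'writer': notify only the recorded sharers, no broadcast. */
static void dir_on_write(struct dir_entry *e, int writer, int n_cpus)
{
    for (int i = 0; i < n_cpus; i++)
        if (i != writer && (e->sharers & (1ULL << i)))
            printf("send invalidation to cache %d\n", i);  /* point-to-point message */
    e->sharers = 1ULL << writer;       /* the writer is now the sole holder */
    e->state   = EXCLUSIVE_DIRTY;
}

int main(void)
{
    struct dir_entry e = { SHARED_CLEAN, 0x0D };  /* caches 0, 2, 3 hold the block */
    dir_on_write(&e, 2, 4);                       /* cache 2 writes: invalidate 0 and 3 */
    return 0;
}
```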

  36. Performance • According to Amdahl’s law, introducing machine parallelism will not have a significant effect on performance if the program cannot take advantage of the parallel architecture • Not all programs parallelize well
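
Amdahl's law gives the speedup as 1 / ((1 − f) + f / p), where f is the fraction of the program that can run in parallel and p is the number of processors. The small C sketch below evaluates it for f = 0.90:

```c
/* Amdahl's law: speedup = 1 / ((1 - f) + f / p). */
#include <stdio.h>

static double amdahl(double f, int p)
{
    return 1.0 / ((1.0 - f) + f / p);
}

int main(void)
{
    int procs[] = { 2, 16, 1024 };
    for (int i = 0; i < 3; i++)
        printf("f=0.90, p=%4d -> speedup %.2f\n", procs[i], amdahl(0.90, procs[i]));
    /* Even with 1024 processors the speedup stays below the limit 1/(1-0.90) = 10. */
    return 0;
}
```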

  37. Performance

  38. Scalability Issues

  39. Scalability Issues • Bandwidth • Latency • Both depend on the interconnection topology
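
As a rough illustration (standard textbook formulas, not taken from the slides), the C sketch below compares the diameter (worst-case latency in hops) and the bisection width (a proxy for aggregate bandwidth) of a ring, a 2-D mesh, and a hypercube with the same number of nodes:

```c
/* Diameter and bisection width of three common topologies for n nodes.
 * Compile with -lm for the math library. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    int n = 64;                           /* 64 nodes = 8x8 mesh = 6-dimensional hypercube */
    int k = (int)sqrt((double)n);         /* side length of a square mesh */
    int d = (int)round(log2((double)n));  /* dimension of the hypercube */

    printf("ring:      diameter %2d, bisection %2d\n", n / 2, 2);
    printf("2-D mesh:  diameter %2d, bisection %2d\n", 2 * (k - 1), k);
    printf("hypercube: diameter %2d, bisection %2d\n", d, n / 2);
    return 0;
}
```

The hypercube keeps latency low and bandwidth high as the node count grows, at the cost of more links per node; the ring is cheapest to build but scales worst on both measures.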
