Symmetric and CC-NUMA

Scope
• Design experiences of SMPs and Cache-Coherent Nonuniform Memory Access (CC-NUMA) systems
• NUMA
  • Natural extension of SMP systems

Architectures
[Figure: Shared memory logic structure and SMP architecture. Processors with caches connect through an interconnect (bus/crossbar) to memory and I/O.]
[Figure: Node 1 ... Node N, each with processors and caches, a bus/crossbar, memory, a remote cache, and I/O.]

Advantages of shared memory systems (SMP or CC-NUMA)
• Symmetry
  • Any processor can access any memory location and I/O device
• Single address space
• Single system image
  • One copy of the OS, database app, etc.
  • These reside in the shared memory
  • The user does not need to control data distribution or redistribution
  • A single OS schedules processes
  • Easy workload management, dynamic load balancing

Advantages of shared memory systems (SMP or CC-NUMA)
• Caching
  • Data locality supported in the hierarchy
• Coherency
  • Enforced by the hardware
  • MESI-like snoopy protocol
• Memory communication
  • Low latency
  • Simple load/store instructions
  • Hardware generates coherency information
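
To make the "MESI-like snoopy protocol" bullet concrete, here is a minimal sketch of the state changes for a single cache line shared by several caches on one bus. It is a simplified illustration, not a description of any particular machine: the class name `SnoopyBus` and its methods are hypothetical, and data movement and write-backs are only noted in comments.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class SnoopyBus:
    """Toy model (hypothetical): one cache line, N caches snooping one bus."""

    def __init__(self, n_caches):
        self.states = [State.I] * n_caches

    def read(self, cpu):
        if self.states[cpu] is not State.I:
            return  # read hit: no bus traffic, state unchanged
        # Read miss: the request is broadcast and every other cache snoops it.
        others_have_copy = False
        for other, st in enumerate(self.states):
            if other != cpu and st is not State.I:
                others_have_copy = True
                # A Modified holder would supply/write back the data here;
                # Modified and Exclusive holders both drop to Shared.
                self.states[other] = State.S
        self.states[cpu] = State.S if others_have_copy else State.E

    def write(self, cpu):
        if self.states[cpu] is State.M:
            return  # write hit on a line this cache already owns
        if self.states[cpu] is State.E:
            self.states[cpu] = State.M  # silent upgrade, no bus traffic
            return
        # Shared or Invalid: broadcast an invalidate so all other copies die.
        for other in range(len(self.states)):
            if other != cpu:
                self.states[other] = State.I
        self.states[cpu] = State.M


bus = SnoopyBus(2)
bus.read(0)   # CPU 0 is the only holder: Exclusive
bus.read(1)   # both caches now hold the line: Shared, Shared
bus.write(1)  # CPU 1 invalidates CPU 0's copy: Invalid, Modified
print([s.name for s in bus.states])
```

Every miss and every invalidate goes over the shared bus, so all caches see it; this is why the programmer only issues plain loads and stores.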

Basic issues that SMPs must address
• Availability
  • Biggest problem
  • Failure of the bus, memory, or OS brings down the whole system
• Bottleneck
  • Processors compete for the memory bus and shared memory
  • Packet-switched bus (split transactions)
• Latency
  • Low latency, but still large compared to CPU speed
  • Memory bandwidth vs. processor speed vs. memory capacity
• Scalability
  • A bus is not scalable
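
The bottleneck and scalability bullets can be illustrated with a back-of-the-envelope model. This is only a sketch under a strong assumption: every processor places the same steady bandwidth demand on the bus, and the bus saturates when total demand reaches its bandwidth. The function names and the figures used below are hypothetical and do not come from the slides.

```python
def bus_utilization(n_cpus, per_cpu_demand_mb_s, bus_bandwidth_mb_s):
    """Fraction of bus bandwidth consumed if every CPU demands the same rate."""
    return n_cpus * per_cpu_demand_mb_s / bus_bandwidth_mb_s


def max_cpus_before_saturation(per_cpu_demand_mb_s, bus_bandwidth_mb_s):
    """Largest CPU count whose combined demand still fits on the bus."""
    return int(bus_bandwidth_mb_s // per_cpu_demand_mb_s)


# Illustrative numbers only: 100 MB/s of miss traffic per CPU, 800 MB/s bus.
print(bus_utilization(4, 100, 800))
print(max_cpus_before_saturation(100, 800))
```

In this model, faster processors raise the per-CPU demand, so the same bus saturates with fewer of them. This is one way to read "a bus is not scalable".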

CC-NUMA
• Extends SMPs by connecting several SMP nodes into a larger system
• Employs a directory-based cache coherence protocol
• Attacks the scalability problem while maintaining the advantages of SMPs
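
A directory-based protocol replaces bus broadcast with point-to-point messages to the nodes that actually hold a copy. The sketch below is a minimal illustration of that idea, assuming one directory that records sharers and a single owner per memory block. The class `Directory`, its message names, and its fields are hypothetical, not taken from any specific CC-NUMA design.

```python
class Directory:
    """Toy home-node directory (hypothetical): tracks who caches each block."""

    def __init__(self):
        self.sharers = {}   # block -> set of node ids holding a read copy
        self.owner = {}     # block -> node id holding a modified copy
        self.messages = []  # point-to-point messages sent, for illustration

    def read(self, block, node):
        owner = self.owner.get(block)
        if owner == node:
            return  # requester already holds the modified copy
        if owner is not None:
            # Fetch the dirty data from the owner, which becomes a sharer.
            self.messages.append(("fetch", owner, block))
            del self.owner[block]
            self.sharers.setdefault(block, set()).add(owner)
        self.sharers.setdefault(block, set()).add(node)

    def write(self, block, node):
        owner = self.owner.get(block)
        if owner == node:
            return  # requester already owns the block
        if owner is not None:
            self.messages.append(("fetch_invalidate", owner, block))
        # Invalidate only the nodes the directory lists, not every node.
        for sharer in sorted(self.sharers.get(block, set()) - {node}):
            self.messages.append(("invalidate", sharer, block))
        self.sharers[block] = set()
        self.owner[block] = node


d = Directory()
for n in (0, 1, 2):
    d.read("x", n)
d.write("x", 1)  # only nodes 0 and 2 receive invalidations
print(d.messages)
```

Because messages go only to listed sharers, coherence traffic grows with the amount of sharing and not with the number of nodes. That is the property that lets CC-NUMA scale past a single bus.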

Distributed shared memory enhances:
• Scalability
  • Memory capacity and I/O capabilities increase by adding more nodes
• Bandwidth
  • An app can access multiple local memories concurrently
• Availability
  • Multiple copies of a portion of the OS can run on multiple nodes
  • Failure of one will not disrupt the entire system

Programming
• We said that
  • “data structures get distributed”
  • “Cache coherency then tracks the changes”
• Any issues? (remote cache vs local memory)
• P, Q: processes
• A, B: arrays

           P:        Q:
Phase 1:   use(A)    use(B)
Phase 2:   use(B)    use(A)
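
One way to see the issue in the P/Q example is to count how many accesses go to another node's memory in each phase. The sketch below makes two assumptions that are not stated in the slides: P and Q run on different nodes, and each array is placed in the memory of the node that touches it first. All names are hypothetical.

```python
# Assumption: P runs on node 0 and Q on node 1.
PROCESS_NODE = {"P": 0, "Q": 1}

# The two phases from the slide: which array each process uses.
PHASES = [
    {"P": "A", "Q": "B"},  # Phase 1
    {"P": "B", "Q": "A"},  # Phase 2
]


def first_touch_homes(phases, process_node):
    """Place each array on the node of the first process that uses it."""
    homes = {}
    for phase in phases:
        for proc, array in phase.items():
            homes.setdefault(array, process_node[proc])
    return homes


def remote_users_per_phase(phases, process_node, homes):
    """Per phase, count processes whose array lives on another node."""
    return [
        sum(1 for proc, array in phase.items()
            if homes[array] != process_node[proc])
        for phase in phases
    ]


homes = first_touch_homes(PHASES, PROCESS_NODE)
print(homes)
print(remote_users_per_phase(PHASES, PROCESS_NODE, homes))
```

Under these assumptions Phase 1 is entirely local, while in Phase 2 both processes work on data whose home is the other node. Coherence keeps the result correct, but each miss must then be served by the remote cache or across the interconnect, not from local memory.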