Operating System Issues in Multi-Processor Systems
John Sung
Hardware Engineer
Compaq Computer Corporation
Outline
• Multi-Processor Hardware Issues
• Snoopy Bus System Architecture
• AMD Athlon’s Snoopy Protocol
• ccNUMA System Architecture
• AMD Athlon’s LDT System Bus
• SGI Origin’s ccNUMA System Architecture
• Alpha 21364 System Architecture
• ccNUMA and CPU Scheduling
• Conclusion
Multi-Processor Hardware Issues
• Bandwidth/Latency
  • Processor to Processor
  • Processor to Memory
  • Processor to I/O
• Scalability
  • Performance should increase as you add CPUs and memory
• Coherency/Synchronization
  • Give software a coherent view of memory
  • Provide synchronization primitives
Snoopy Bus System Architecture
• A bus connects processors, memory, and I/O
  • Scales up to ~16 processors
  • Limited by bus bandwidth
• Cache Coherency Protocol
  • Snoops the bus for memory traffic
  • Each processor/cache set has to “listen” for addresses held in its cache
  • Does the “right thing” to give software a coherent view of memory
Snoopy Bus System Architecture
[Diagram: three CPU cores, each with its own cache, attached to a single shared bus along with memory and I/O]
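To make the “listen and do the right thing” idea concrete, here is a minimal sketch of a snooping protocol. The slides do not name a specific protocol, so this assumes a simplified three-state MSI scheme (Modified, Shared, Invalid); the class and method names (`Bus`, `Cache`, `snoop`) are hypothetical and chosen only for illustration.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class Bus:
    """Shared bus: every transaction is broadcast to all other caches."""

    def __init__(self):
        self.caches = []
        self.memory = {}  # address -> value

    def broadcast(self, sender, op, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(op, addr)


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> (State, value)
        bus.caches.append(self)

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]

    def read(self, addr):
        st, val = self.lines.get(addr, (State.INVALID, None))
        if st is State.INVALID:
            # Miss: announce the read so a Modified owner writes back first.
            self.bus.broadcast(self, "read", addr)
            val = self.bus.memory.get(addr, 0)
            self.lines[addr] = (State.SHARED, val)
        return val

    def write(self, addr, value):
        if self.state(addr) is not State.MODIFIED:
            # Announce the write so every other copy gets invalidated.
            self.bus.broadcast(self, "write", addr)
        self.lines[addr] = (State.MODIFIED, value)

    def snoop(self, op, addr):
        """React to another cache's bus transaction for an address we hold."""
        st, val = self.lines.get(addr, (State.INVALID, None))
        if st is State.MODIFIED:
            self.bus.memory[addr] = val  # write back the dirty data
        if op == "write" and st is not State.INVALID:
            self.lines[addr] = (State.INVALID, None)
        elif op == "read" and st is State.MODIFIED:
            self.lines[addr] = (State.SHARED, val)
```

Because every cache sees every transaction, each added processor increases the snoop traffic on the one shared bus, which is one way to read the “limited by bus bandwidth” bullet.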
ccNUMA System Architecture
• Cache-Coherent Non-Uniform Memory Access
  • Memory is distributed and attached to processors
  • A network connects the processor/memory sets
  • Each processor owns part of the memory space
• Cache coherency protocol
  • Gives software a coherent view of memory
  • Protocol primitives for synchronization
  • Directory keeps track of who has a copy of each piece of memory
ccNUMA System Architecture
[Diagram: two nodes, each with a CPU core, cache, memory, directory, I/O, and network router; the routers connect through a network fabric]
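The directory's job, keeping track of who has a copy, can be sketched as a small bookkeeping structure. This is an illustrative model only, not the directory format of any of the systems named in these slides; the `Directory` class, its methods, and the node numbering are assumptions.

```python
class Directory:
    """Home-node directory: records which nodes hold a copy of each block."""

    def __init__(self):
        self.sharers = {}  # block address -> set of node ids with a clean copy
        self.owner = {}    # block address -> node id holding a dirty copy

    def read(self, node, block):
        """Record a read; return the nodes that must supply the data first."""
        owner = self.owner.pop(block, None)
        holders = self.sharers.setdefault(block, set())
        fetch_from = set()
        if owner is not None:
            holders.add(owner)  # the owner is downgraded to a sharer
            if owner != node:
                fetch_from = {owner}
        holders.add(node)
        return fetch_from

    def write(self, node, block):
        """Record a write; return the nodes whose copies must be invalidated."""
        holders = self.sharers.pop(block, set())
        previous_owner = self.owner.get(block)
        if previous_owner is not None:
            holders.add(previous_owner)
        self.owner[block] = node
        return holders - {node}
```

The point of the design is that coherence messages go only to the nodes listed for a block, instead of being broadcast to every processor as on a snoopy bus.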
SGI CrayLink™
• Node = 2 CPUs and their caches
• Module = Memory + Directory + HUB
• 2 Modules per Router
• System = Modules + Routers + CrayLink™ Network
OS’s Questions
• Single CPU System
  • What to schedule next?
• ccNUMA System
  • What to schedule next?
  • Which CPU to schedule it on?
  • Where should the process’s information be located?
  • One or many instances of the OS?
OS’s Choices for a Process
• Single CPU System
  • Process has 1 choice
  • Process information has 1 choice
• ccNUMA System with N CPUs and M Memories
  • Process has N choices
  • Process information has M choices per virtual page
  • “Distance” between a process and its information
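One way to picture the “distance” between a process and its information is as a weighted sum: pages on each memory, times the distance from the candidate CPU to that memory. The sketch below assumes a hypothetical hop-count matrix `distance[cpu][mem]` and a per-memory page count; neither the metric nor the function names come from the slides.

```python
def placement_cost(cpu, pages_on_memory, distance):
    """Total distance from `cpu` to all of the process's pages."""
    return sum(count * distance[cpu][mem]
               for mem, count in pages_on_memory.items())


def best_cpu(pages_on_memory, distance):
    """Of the N candidate CPUs, pick the one closest to the process's pages."""
    return min(range(len(distance)),
               key=lambda cpu: placement_cost(cpu, pages_on_memory, distance))


# Hypothetical 3-node system: distance[cpu][mem] is a hop count.
distance = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
# The process has 10 pages on memory 0 and 30 pages on memory 2.
pages = {0: 10, 2: 30}
chosen = best_cpu(pages, distance)
```

With these made-up numbers the costs are 60, 40, and 20 for CPUs 0, 1, and 2, so the process lands on CPU 2, next to most of its pages.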
Context Switch Penalty
• Single CPU System
  • Saving/Restoring process state (PCB)
  • Scheduling routine
• ccNUMA System
  • Saving/Restoring process state (PCB)
  • Scheduling routine
  • Moving process’s information
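The extra ccNUMA term can be written as a simple additive cost model. This is a rough sketch under stated assumptions: the transfer is modeled as one link latency plus bytes divided by bandwidth, and every parameter name and number here is hypothetical rather than measured.

```python
def switch_penalty(save_restore_us, schedule_us, moved_bytes=0,
                   link_bytes_per_us=1.0, link_latency_us=0.0):
    """Estimated context-switch cost in microseconds.

    The first two terms apply to any system. The transfer term applies only
    when the process's information has to move to another node.
    """
    penalty = save_restore_us + schedule_us
    if moved_bytes > 0:
        penalty += link_latency_us + moved_bytes / link_bytes_per_us
    return penalty


# Same switch, with and without moving a 4 KiB page across the network.
local = switch_penalty(5, 2)
remote = switch_penalty(5, 2, moved_bytes=4096,
                        link_bytes_per_us=1024, link_latency_us=1)
```

In this model, higher network bandwidth and lower latency shrink only the third term, which is the part a single-CPU system never pays.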
Some Common Sense
• Replicate parts of the OS across processors
  • System calls will happen often
• Minimize process movement
  • Cost of moving a process to another CPU is high
  • Less than swapping to disk, most of the time
  • Higher than simple context switching
• But if you have to move a process
  • Minimize the amount of information to move
  • Opportunity for a cache?
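“Minimize process movement” is often expressed as CPU affinity: keep a process where it last ran unless the load imbalance is large enough to justify the move. The sketch below is one possible policy, not the one any particular OS uses; `pick_cpu` and `imbalance_threshold` are hypothetical names.

```python
def pick_cpu(last_cpu, run_queue_len, imbalance_threshold=2):
    """Choose a CPU for a runnable process.

    Stay on `last_cpu` unless its run queue is longer than the shortest
    queue by more than `imbalance_threshold` entries.
    """
    idlest = min(range(len(run_queue_len)), key=run_queue_len.__getitem__)
    if last_cpu is None:
        return idlest  # first placement: no affinity yet
    if run_queue_len[last_cpu] - run_queue_len[idlest] > imbalance_threshold:
        return idlest  # imbalance is large enough to pay the move cost
    return last_cpu
```

For example, `pick_cpu(1, [0, 2, 3])` keeps the process on CPU 1 because the gap of 2 does not exceed the threshold, while `pick_cpu(1, [0, 5, 3])` moves it to CPU 0.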
Conclusion
• Hardware
  • Bandwidth and Latency for performance
  • Cache Coherency for correctness
• Operating System
  • ccNUMA adds complexity in CPU scheduling
  • Better HW performance => lower context switch penalty => more flexibility in scheduling choices for a process
Abbreviation Index
• AMD - Advanced Micro Devices
• SGI - Silicon Graphics Inc.
• ECC - Error Correction Code
• SECDED - Single Error Correct Double Error Detect
• API - Alpha Processor Inc.
• AGP - Accelerated Graphics Port
• DDR DRAM - Double Data Rate Dynamic RAM
• LDT - Lightning Data Transport
• PCI - Peripheral Component Interconnect
• CMOS - Complementary Metal Oxide Semiconductor
• CAS - Column Address Strobe
• TPC-C - Transaction Processing Performance Council Benchmark
• ccNUMA - Cache-Coherent Non-Uniform Memory Access
• SMP - Symmetric Multi-Processing