Tornado: Maximizing Locality and Concurrency in a SMMP OS
Contents
• Types of Locality
• Locality: A closer look
• Requirements for locality
• Design Basics of Tornado
• Test Results
• Conclusion
Types of Locality*
• Temporal locality: “The concept that a resource that is referenced at one point in time will be referenced again sometime in the near future.”
• Spatial locality: “The concept that the likelihood of referencing a resource is higher if a resource near it has been referenced.”
• Sequential locality: “The concept that memory is accessed sequentially.”
*Source: Wikipedia
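The three kinds of locality can be seen in a single loop nest. The sketch below is illustrative only: the function names and the row-major matrix layout are assumptions made for this example, not part of the original slides. Both functions compute the same sum; the first walks memory in layout order, the second jumps `cols` elements between accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a rows x cols matrix stored in row-major order, visiting elements
// in the order they sit in memory (sequential and spatial locality).
long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;  // touched on every iteration: temporal locality
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];  // neighbouring element each step
    return total;
}

// Same result, but consecutive accesses are `cols` elements apart,
// so neighbouring elements of a cache line are not used together.
long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

On large matrices the first version typically incurs fewer cache misses, although the size of the difference depends on the hardware.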
Locality: A closer look, Read-only case

    bool x = true;
    while (x) {
        // Do some work, reading but not writing x…
    }

[Diagram: Processor #1 and Processor #2 each hold a copy of x in their own cache; x also resides in memory.]
• Notes:
• Once both caches hold x, there are no accesses on the bus
• Because accesses are reads that are satisfied in local caches and no invalidations are sent
Locality: A closer look, Read/Write case
Each processor runs the same loop on the shared variable x:

    bool x = true;
    while (x) {
        x = false;
        // Do other work…
    }

[Diagram: Processor #1 and Processor #2, each with its own cache, share x in memory. The animation steps are:]
1. Cache miss
2. Read request
3. Data
4. Write
5. Invalidate block containing x
• Notes:
• x becomes a bottleneck: the valid copy keeps jumping from one cache to the other
• Every write causes an invalidation
• Almost every read causes a read miss and a bus read
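The difference between the read-only and read/write cases can be counted with a toy model. This is a minimal sketch under stated assumptions: two caches, a single block, and a simplified write-invalidate protocol with Invalid/Shared/Modified states. `BusModel`, `run_read_only`, and `run_read_write` are hypothetical names, and the model only counts an invalidation when another cache actually holds a copy.

```cpp
#include <array>
#include <cassert>

enum class State { Invalid, Shared, Modified };

// Toy model of one memory block cached by two processors.
struct BusModel {
    std::array<State, 2> cache{State::Invalid, State::Invalid};
    int bus_reads = 0;      // read requests that had to go over the bus
    int invalidations = 0;  // copies invalidated in the other cache

    void read(int cpu) {
        assert(cpu == 0 || cpu == 1);
        if (cache[cpu] != State::Invalid) return;  // hit in the local cache
        ++bus_reads;                               // miss: fetch over the bus
        int other = 1 - cpu;
        if (cache[other] == State::Modified) cache[other] = State::Shared;
        cache[cpu] = State::Shared;
    }

    void write(int cpu) {
        assert(cpu == 0 || cpu == 1);
        if (cache[cpu] == State::Modified) return;      // already the only copy
        int other = 1 - cpu;
        if (cache[cpu] == State::Invalid) ++bus_reads;  // must fetch the block first
        if (cache[other] != State::Invalid) {           // kill the remote copy
            ++invalidations;
            cache[other] = State::Invalid;
        }
        cache[cpu] = State::Modified;
    }
};

// Both processors only read x.
BusModel run_read_only(int iters) {
    BusModel m;
    for (int i = 0; i < iters; ++i) { m.read(0); m.read(1); }
    return m;
}

// Both processors alternately test x (read) and clear it (write).
BusModel run_read_write(int iters) {
    BusModel m;
    for (int i = 0; i < iters; ++i)
        for (int cpu = 0; cpu < 2; ++cpu) { m.read(cpu); m.write(cpu); }
    return m;
}
```

In this model the read-only run costs two bus reads in total (one initial miss per cache) regardless of the iteration count, while the read/write run pays a bus read on every loop pass of each processor and an invalidation on nearly every write.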
Locality: A closer look, Effect of Cache Line Length
Processor #1 runs:

    bool x = true;
    while (x) {
        x = false;
        // Do other work…
    }

Processor #2 runs:

    bool y = true;
    while (y) {
        y = false;
        // Do other work…
    }

[Diagram: x at address 0x0 and y at address 0x4 in memory; both processors cache the block containing x and y; a write triggers "Invalidate block containing x & y".]
• Notes:
• x & y have different addresses but fall into the same cache line (block)!
• Reads don’t cause any problem
• Remember: invalidations are per cache line/block, not per word!
• So we have pretty much the same behavior as the read/write case on a single variable
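One common remedy for this kind of false sharing is to place each variable on its own cache line. The sketch below assumes a 64-byte line (`kLineSize` is an assumption; the real value is hardware dependent) and uses 4-byte integers rather than `bool` so that the offsets match the 0x0 and 0x4 addresses in the diagram. `Packed`, `Padded`, and `line_of` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineSize = 64;  // assumed cache line size in bytes

// x and y sit 4 bytes apart, so they share one cache line.
struct alignas(kLineSize) Packed {
    std::int32_t x;
    std::int32_t y;
};

// Each member is aligned to its own cache line, trading memory for isolation.
struct Padded {
    alignas(kLineSize) std::int32_t x;
    alignas(kLineSize) std::int32_t y;
};

// Index of the cache line that holds byte offset `off` of a line-aligned object.
constexpr std::size_t line_of(std::size_t off) { return off / kLineSize; }

static_assert(offsetof(Packed, y) == 4, "y follows x directly");
static_assert(sizeof(Padded) == 2 * kLineSize, "one full line per member");
```

With `Padded`, a write to `x` by one processor no longer invalidates the line that holds `y`, at the cost of 128 bytes for two small values.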
Requirements for Locality
• Exploit spatial and temporal locality
• Minimize read/write and write sharing
• Minimize false sharing
• Minimize the distance between the accessing processor and the target memory module
Design Basics of Tornado
• Individual resources are individual objects
• Clustered objects
• Protected procedure calls (PPC)
• Semi-automatic garbage collection
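Clustered Objects
• A clustered object appears as a single object from the outside but is internally split into representatives (reps)
• Each rep handles requests from one or more processors
• This design has several advantages, including high data locality and a diminished need for locking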
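The idea can be sketched with a counter that has one rep per processor. This is an illustrative sketch, not Tornado's actual interface: `ClusteredCounter`, its methods, and the 64-byte line size are assumptions. It is written for C++17 or later, and it is single-threaded as shown; a real implementation would need atomic reads in `value()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLineSize = 64;  // assumed cache line size in bytes

// Looks like one counter to callers, but keeps one rep per processor.
class ClusteredCounter {
public:
    explicit ClusteredCounter(std::size_t num_cpus) : reps_(num_cpus) {}

    // Common case: touches only the calling processor's rep.
    void increment(std::size_t cpu) {
        assert(cpu < reps_.size());
        ++reps_[cpu].count;
    }

    // Rare case: walks all reps to produce the global value.
    long value() const {
        long total = 0;
        for (const Rep& r : reps_) total += r.count;
        return total;
    }

private:
    // Each rep occupies its own cache line so reps do not falsely share.
    struct alignas(kLineSize) Rep {
        long count = 0;
    };
    std::vector<Rep> reps_;
};
```

Frequent updates stay local to one processor's cache line, and only the infrequent global read touches every rep.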
Clustered Objects (cont.)
• Per-processor translation tables
• Partitioned global translation table
• Default “miss” handlers
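A rough sketch of how a per-processor translation table and a default miss handler could interact is shown below. All names (`TranslationTables`, `lookup`, `handle_miss`, `Rep`) are hypothetical, and the partitioned global table is left out: here the miss handler simply creates a new rep, whereas a real system would consult the global table to decide which rep to install.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Rep {
    std::size_t cpu = 0;        // processor this rep serves
    std::size_t object_id = 0;  // clustered object it belongs to
};

class TranslationTables {
public:
    TranslationTables(std::size_t num_cpus, std::size_t num_objects)
        : tables_(num_cpus) {
        for (auto& table : tables_) table.resize(num_objects);  // all slots empty
    }

    // Fast path: the slot already points at the local rep.
    // Slow path: an empty slot triggers the miss handler once.
    Rep& lookup(std::size_t cpu, std::size_t object_id) {
        assert(cpu < tables_.size() && object_id < tables_[cpu].size());
        std::unique_ptr<Rep>& slot = tables_[cpu][object_id];
        if (!slot) slot = handle_miss(cpu, object_id);
        return *slot;
    }

    int misses() const { return misses_; }

private:
    // Stand-in for a default miss handler: lazily creates the rep.
    std::unique_ptr<Rep> handle_miss(std::size_t cpu, std::size_t object_id) {
        ++misses_;
        auto rep = std::make_unique<Rep>();
        rep->cpu = cpu;
        rep->object_id = object_id;
        return rep;
    }

    std::vector<std::vector<std::unique_ptr<Rep>>> tables_;  // one table per processor
    int misses_ = 0;
};
```

Because reps are installed on first use, a processor that never touches an object never pays for a rep of it.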
Protected Procedure Calls
• Microkernel: relies on servers to carry out part of the OS's job
• As many server threads as there are clients
• A request is handled on the same processor where it was issued
*Image source: Wikipedia
Garbage Collection
• Semi-automatic
• Distinguishes between temporary and persistent references to objects
• Eliminates the need for two locks to guarantee an object's existence, and eliminates locking altogether for read-only data
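One way to picture the temporary/persistent distinction is sketched below. This is a simplified illustration under assumptions, not Tornado's mechanism: `Reclaimer` and its methods are hypothetical, temporary references are modelled as per-processor counts of in-flight operations, and dropping the persistent references is a manual step, which is what makes the scheme semi-automatic. The sketch is single-threaded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Reclaimer {
public:
    explicit Reclaimer(std::size_t num_cpus) : active_(num_cpus, 0) {}

    // A temporary reference lives only for the duration of one operation.
    void begin_op(std::size_t cpu) {
        assert(cpu < active_.size());
        ++active_[cpu];
    }
    void end_op(std::size_t cpu) {
        assert(cpu < active_.size() && active_[cpu] > 0);
        --active_[cpu];
    }

    // Manual step: the programmer removes all persistent references.
    void drop_persistent_refs() { persistent_gone_ = true; }

    // Safe to free once no persistent references remain and every
    // processor has finished its in-flight operations.
    bool can_destroy() const {
        if (!persistent_gone_) return false;
        for (int n : active_)
            if (n != 0) return false;
        return true;
    }

private:
    std::vector<int> active_;  // in-flight operations per processor
    bool persistent_gone_ = false;
};
```

Readers never take a lock on the object itself in this scheme; they only bump a counter local to their own processor.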
Conclusion
• Tornado performs much better than many commercial OSes
• The concept of clustered objects gives it several advantages:
  • High locality of data
  • Diminished need for locking
  • Higher degree of sharing, concurrency and modularity