Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System • Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm • By: Priya Limaye
Locality • What is locality of reference? • Example: sum = 0; for (int i = 0; i < 10; i++) { sum = sum + number[i]; }
Locality • What is locality of reference? • Temporal locality: recently accessed data and instructions are likely to be accessed again in the near future (sum and i are reused on every iteration) • Example: sum = 0; for (int i = 0; i < 10; i++) { sum = sum + number[i]; }
Locality • What is locality of reference? • Spatial locality: data and instructions close to recently accessed data and instructions are likely to be accessed in the near future (number[i] and number[i + 1] are adjacent in memory) • Example: sum = 0; for (int i = 0; i < 10; i++) { sum = sum + number[i]; }
Locality • What is locality of reference? • Recently accessed data and instructions, and those stored near them, are likely to be accessed in the near future • Grab a larger chunk than you immediately need • Once you have grabbed a chunk, keep it
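The two kinds of locality can be made concrete with a slightly larger version of the summing loop. The following is a minimal sketch, assuming a matrix stored row by row in one flat vector; the function names `sum_row_order` and `sum_column_order` are illustrative and do not come from the slides. Both functions return the same sum, but only the first walks memory in address order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored row by row in one flat vector.
// The inner loop visits adjacent elements, so each fetched chunk of
// memory is fully used (spatial locality); `sum` is reused on every
// iteration (temporal locality).
long sum_row_order(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

// Same result, but the inner loop jumps `cols` elements at a time, so
// consecutive accesses are far apart in memory for large matrices.
long sum_column_order(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```

On large matrices the column-order version typically runs slower on cached hardware, although the size of the gap depends on the machine.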
Locality in a multiprocessor • Computation depends on data local to the processor • Each processor uses data from its own cache • Once data is brought into the cache, it stays there
Locality in a multiprocessor • [Diagram: two CPUs, each with its own cache, above one shared memory that holds a counter]
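Counter: Shared • [Diagram: two CPUs above shared memory; the counter in memory is 0 and neither CPU holds a copy]
Counter: Shared • [Diagram: one CPU holds a copy of the counter, 0; memory holds 0]
Counter: Shared • [Diagram: that CPU increments the counter; its copy and memory both hold 1]
Counter: Shared • Read: OK • [Diagram: both CPUs hold a copy of the counter, 1; memory holds 1]
Counter: Shared • Invalidate • [Diagram: one CPU increments the counter to 2; the other CPU no longer holds a copy; memory holds 2]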
Comparing counters • The shared counter scales well on older architectures • It performs worse on a shared-memory multiprocessor
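The shared counter from the preceding slides can be sketched as a single atomic variable. This is a minimal sketch, not code from the presentation: `SharedCounter` and `inc_n` are hypothetical names, and relaxed memory ordering is an assumption that suits a pure statistics counter. Every increment, from whichever CPU, needs exclusive access to the same memory location, which is what forces the copy to move between caches.

```cpp
#include <atomic>
#include <cassert>

// One counter shared by all CPUs. Each increment must own the single
// location that holds `value`, so concurrent writers invalidate each
// other's cached copies.
struct SharedCounter {
    std::atomic<long> value{0};

    void inc() { value.fetch_add(1, std::memory_order_relaxed); }
    long read() const { return value.load(std::memory_order_relaxed); }
};

// Hypothetical helper: perform n increments, as one CPU would in a loop.
void inc_n(SharedCounter& counter, long n) {
    assert(n >= 0);
    for (long i = 0; i < n; ++i) counter.inc();
}
```

Reads alone are cheap because several caches may hold the same value at once; it is the writes that cause invalidations.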
Counter: Array • Sharing requires moving the counter back and forth between CPU caches • Split the counter into an array • Each CPU gets its own counter
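Counter: Array • [Diagram: memory holds an array of two counters, one per CPU; both are 0]
Counter: Array • [Diagram: the first CPU increments its own counter; the array holds 1, 0]
Counter: Array • [Diagram: the second CPU increments its own counter; the array holds 1, 1]
Counter: Array • Read counter • Add all counters (1 + 1) • [Diagram: the reading CPU obtains 2; the array holds 1, 1]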
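The array counter can be sketched as follows. This is an illustrative sketch only: `ArrayCounter`, the CPU count `kCpus`, and passing the CPU number as an argument are assumptions made for the example. Each CPU increments its own slot, and a read adds all slots together.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCpus = 4;  // assumed number of CPUs

// One slot per CPU. The slots are adjacent in memory, so several of
// them normally end up in the same cache line.
struct ArrayCounter {
    std::array<std::atomic<long>, kCpus> slots;

    ArrayCounter() {
        for (auto& s : slots) s.store(0, std::memory_order_relaxed);
    }

    // Called by CPU number `cpu`; touches only that CPU's slot.
    void inc(std::size_t cpu) {
        assert(cpu < kCpus);
        slots[cpu].fetch_add(1, std::memory_order_relaxed);
    }

    // Reading the counter means adding all per-CPU slots.
    long read() const {
        long total = 0;
        for (const auto& s : slots) total += s.load(std::memory_order_relaxed);
        return total;
    }
};
```

The design trades a cheaper increment for a more expensive read, which suits counters that are updated far more often than they are read.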
Counter: Array • This solves the problem • What about performance?
Comparing counters • The array counter does not perform better than the shared counter
Counter: Array • This solves the problem • What about performance? • What about false sharing?
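Counter: False Sharing • [Diagram: the two counters are stored together in memory as 0,0; neither CPU holds a copy]
Counter: False Sharing • [Diagram: one CPU holds a copy of both counters, 0,0]
Counter: False Sharing • Sharing • [Diagram: both CPUs hold a copy of 0,0]
Counter: False Sharing • Invalidate • [Diagram: one CPU increments its counter to get 1,0; the other CPU no longer holds a copy]
Counter: False Sharing • Sharing • [Diagram: both CPUs hold a copy of 1,0]
Counter: False Sharing • Invalidate • [Diagram: one CPU increments its counter to get 1,1; the other CPU no longer holds a copy]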
Solution? • Use a padded array • Different elements map to different cache lines
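Counter: Padded Array • [Diagram: memory holds two separately placed counters, one per CPU; both are 0]
Counter: Padded Array • Updates are independent of each other • [Diagram: each CPU holds a copy of its own counter only; both counters are 1]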
Comparing counters • The padded array counter works better
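Padding can be expressed directly in C++ with `alignas`. The sketch below assumes a 64-byte cache line (`kCacheLine`), which is common but not universal, and reuses the hypothetical per-CPU layout with `kCpus` slots; none of these names come from the slides.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCpus = 4;        // assumed number of CPUs
constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes

// alignas pads every slot out to a full cache line, so two CPUs never
// write to the same line.
struct alignas(kCacheLine) PaddedSlot {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedSlot) % kCacheLine == 0, "slot must fill whole lines");

struct PaddedCounter {
    std::array<PaddedSlot, kCpus> slots;

    // Called by CPU number `cpu`; touches only that CPU's cache line.
    void inc(std::size_t cpu) {
        assert(cpu < kCpus);
        slots[cpu].value.fetch_add(1, std::memory_order_relaxed);
    }

    long read() const {
        long total = 0;
        for (const auto& s : slots) total += s.value.load(std::memory_order_relaxed);
        return total;
    }
};
```

The cost is memory: each counter slot now occupies a whole line instead of a few bytes.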
Locality in an OS • Serious performance impact • Difficult to retrofit • Tornado: ground-up design • Object-oriented approach gives natural locality
Tornado • Object Oriented Approach • Clustered Objects • Protected Procedure Call • Semi-automatic garbage collection • Simplified locking protocol
Object Oriented Approach • [Diagram: a process table with entries for Process 1, Process 2, ...]
Object Oriented Approach • [Diagram: Process 1 takes a lock to access its entry in the process table]
Object Oriented Approach • [Diagram: Process 2 also accesses its entry in the process table while Process 1 holds the lock]
Object Oriented Approach • [Diagram: the process table now shows two locks, one for Process 1 and one for Process 2]
Object Oriented Approach • class ProcessTableEntry { data; lock; code } • Each entry carries its own data, lock, and code
Object Oriented Approach • Each resource is represented by a different object • Requests to virtual resources are handled independently • No shared data structure access • No shared locks
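The `ProcessTableEntry` idea can be sketched in C++ as a class that owns its lock. This is a hedged illustration: the field `cpu_ticks_` and the methods `add_cpu_time` and `cpu_time` are invented for the example and are not part of Tornado's interface. The point is only that work on one entry never takes a lock that another entry needs.

```cpp
#include <array>
#include <cassert>
#include <mutex>

// Each entry bundles its data, its own lock, and the code that uses
// them. There is no table-wide lock for callers to contend on.
class ProcessTableEntry {
public:
    // Hypothetical operation on per-process data.
    void add_cpu_time(long ticks) {
        assert(ticks >= 0);
        std::lock_guard<std::mutex> guard(lock_);
        cpu_ticks_ += ticks;
    }

    long cpu_time() const {
        std::lock_guard<std::mutex> guard(lock_);
        return cpu_ticks_;
    }

private:
    mutable std::mutex lock_;  // protects only this entry
    long cpu_ticks_ = 0;       // hypothetical per-process data
};

// A table is then just a collection of independent entries.
using ProcessTable = std::array<ProcessTableEntry, 8>;
```

Two threads updating `table[0]` and `table[1]` lock different mutexes and therefore do not wait on each other.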
Object Oriented Approach • [Diagram: a page fault exception arrives at the Process object]
Object Oriented Approach • [Diagram: page fault exception, Process object, and two Region objects]
Object Oriented Approach • [Diagram: page fault exception, Process object, two Region objects, each with an FCM] • FCM: File Cache Manager
Object Oriented Approach • Search for responsible region • [Diagram: page fault exception, Process object, HAT, two Region objects, each with an FCM] • HAT: Hardware Address Translation • FCM: File Cache Manager
Object Oriented Approach • [Diagram: page fault exception, Process object, two Region objects, each with an FCM and a COR, plus DRAM] • FCM: File Cache Manager • COR: Cached Object Representative • DRAM: memory manager
Object Oriented Approach • Multiple implementations for system objects • Dynamically change the objects used for a resource • Provides the foundation for other Tornado features
Clustered Objects • Improve locality for widely shared objects • Appears as a single object • Composed of multiple component objects • Has a representative ('rep') for processors • How reps are assigned to processors defines the degree of clustering • Clients use a common clustered object reference
Clustered Objects: Implementation • One translation table per processor • Every table is located at the same virtual address • Each entry is a pointer to a rep • A clustered object reference is just a pointer into the table • Reps are created on demand when first accessed • A special global miss-handling object
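The on-demand creation of reps can be approximated in ordinary user-level C++. The sketch below is a simplification under stated assumptions, not Tornado's implementation: the per-processor translation table is modelled as one array slot per processor, the processor number is passed explicitly, `handle_miss` stands in for the global miss-handling object, and the names `ClusteredCounter` and `CounterRep` are hypothetical. It uses one rep per processor and leaves out synchronization.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

constexpr std::size_t kCpus = 4;  // assumed number of processors

// A rep holds one processor's share of the object's state.
struct CounterRep {
    long local = 0;
};

// Clients see one object; each call is routed to the caller's rep.
class ClusteredCounter {
public:
    void inc(std::size_t cpu) { rep_for(cpu).local += 1; }

    // Gather the state of every rep created so far.
    long read() const {
        long total = 0;
        for (const auto& rep : table_)
            if (rep) total += rep->local;
        return total;
    }

    std::size_t reps_created() const {
        std::size_t n = 0;
        for (const auto& rep : table_)
            if (rep) ++n;
        return n;
    }

private:
    // Stand-in for looking up this object in the processor's table.
    CounterRep& rep_for(std::size_t cpu) {
        assert(cpu < kCpus);
        if (!table_[cpu]) table_[cpu] = handle_miss();  // first access on this processor
        return *table_[cpu];
    }

    // Stand-in for the miss handler: builds a rep on demand.
    std::unique_ptr<CounterRep> handle_miss() {
        return std::make_unique<CounterRep>();
    }

    std::array<std::unique_ptr<CounterRep>, kCpus> table_;  // empty slot = no rep yet
};
```

Processors that never touch the object never pay for a rep, which is the benefit of creating reps lazily.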