Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System

This presentation explores locality and why it matters in modern multiprocessor operating systems. Using a shared counter as a running example, it shows how per-CPU arrays of counters and padded array elements improve locality and avoid false sharing on machines with high memory latency, large cache lines, and large system sizes. It then introduces Tornado, an operating system that makes locality its primary design goal through an object-oriented design, clustered objects, a new locking strategy, and protected procedure calls.

karennjones

Presentation Transcript


  1. Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System • Presented by: Holly Grimes • By: Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm

  2. Locality • In uniprocessors • Spatial Locality: access neighboring memory locations • Temporal Locality: access memory locations that were accessed recently • In multiprocessors, locality involves each processor using data from its own cache

  3. Why is Locality Important? • Modern multiprocessors exhibit • Higher memory latency • Large write-sharing costs • Large cache lines (false sharing) • Larger system sizes • Large secondary caches • NUMA effects

  4. Why is Locality Important?

  5. Why is Locality Important?

  6. Why is Locality Important?

  7. Why is Locality Important?

  8. Why is Locality Important?

  9. Why is Locality Important?

  10. Why is Locality Important? • Sharing the counter requires moving it back and forth between the CPU caches • Solution?? • Split the counter into an array of integers • Each CPU gets its own counter
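
The split-counter idea on slide 10 can be sketched in C++ as follows. This is a minimal illustration, not code from the presentation: the class name `SplitCounter`, the constant `kNumCpus`, and the convention of passing the CPU index explicitly are all assumptions made for the example.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical CPU count for this sketch; not taken from the slides.
constexpr std::size_t kNumCpus = 4;

// One counter slot per CPU instead of a single shared integer.
class SplitCounter {
public:
    // Each CPU increments only its own slot, so no two CPUs ever
    // write the same integer.
    void increment(std::size_t cpu) {
        slots_[cpu].fetch_add(1, std::memory_order_relaxed);
    }

    // Reading the total is the less frequent operation: it visits every slot.
    long total() const {
        long sum = 0;
        for (const auto& slot : slots_) {
            sum += slot.load(std::memory_order_relaxed);
        }
        return sum;
    }

private:
    // Adjacent slots are packed together and may share a cache line.
    std::array<std::atomic<long>, kNumCpus> slots_{};
};
```

A caller on CPU 1 would use `counter.increment(1)`, and any CPU can call `counter.total()` to sum the slots. The trade-off is that updates become local while reads become more expensive.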

  11. Achieving Locality

  12. Achieving Locality

  13. Achieving Locality

  14. Achieving Locality • Using an array of counters seems to solve our problem… • But what happens if both array elements map to the same cache line? • False sharing • Solution: • Pad each array element • Different elements map to different cache lines
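
The padding fix on slide 14 might look like the sketch below. It assumes a 64-byte cache line (`kCacheLineSize`), which is common but hardware-dependent. The names `PaddedSlot`, `PaddedCounter`, and `kNumCpus` are hypothetical and chosen only for this example.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Assumed cache line size; the real value depends on the hardware.
constexpr std::size_t kCacheLineSize = 64;
// Hypothetical CPU count for this sketch.
constexpr std::size_t kNumCpus = 4;

// alignas forces each slot to start on its own cache line, and the
// compiler pads the struct so the next slot starts on the following line.
struct alignas(kCacheLineSize) PaddedSlot {
    std::atomic<long> value{0};
};

static_assert(sizeof(PaddedSlot) == kCacheLineSize,
              "each slot should occupy exactly one assumed cache line");

class PaddedCounter {
public:
    // A write by one CPU no longer invalidates a line holding another
    // CPU's slot, which removes the false sharing.
    void increment(std::size_t cpu) {
        slots_[cpu].value.fetch_add(1, std::memory_order_relaxed);
    }

    long total() const {
        long sum = 0;
        for (const auto& slot : slots_) {
            sum += slot.value.load(std::memory_order_relaxed);
        }
        return sum;
    }

private:
    std::array<PaddedSlot, kNumCpus> slots_{};
};
```

The cost of this design is memory: each counter now occupies a full cache line rather than a few bytes.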

  15. Comparing Counter Implementations

  16. Why is Locality Important? • Modern multiprocessors exhibit • Higher memory latency • Large write-sharing costs • Large cache lines (false sharing) • Larger system sizes • Large secondary caches • NUMA effects

  17. NUMA Effects

  18. NUMA Effects

  19. Achieving Locality in an OS • We’ve seen several ways to achieve locality in the implementation of a counter • Now we extend these concepts to see how locality can be achieved in an OS • Tornado’s Approach – • Built a new OS from the ground up • Make locality the primary design goal

  20. Tornado’s Approach • [Diagram: User Application and OS Implementation, each labeled with Locality and Independence] • Illustration by Philip Howard

  21. Tornado’s Locality-Maximizing features • Object-oriented Design • Clustered Objects • A New Locking Strategy • Protected Procedure Calls

  22. Object-oriented Design

  23. Object-oriented Structure • Each OS resource is represented as a separate object • All locks and data structures are internal to the objects • This localizes and encapsulates the locks and data • This structure allows different resources to be managed • Without accessing shared data structures • Without acquiring shared locks • Simplifies OS implementation • Modular design

  24. Example: Memory Management Objects • [Diagram of objects: Process, Region, FCM, COR, HAT, DRAM] • HAT: Hardware Address Translation • FCM: File Cache Manager • COR: Cached Object Representative • DRAM: Memory manager • Illustration by Philip Howard

  25. Object-oriented Design • For each resource, this design provides one object to be shared by all CPUs • Comparable to having one copy of the counter shared among all CPUs • To maximize locality, something more needs to be done

  26. Clustered Objects

  27. Clustered Objects • Clustered objects are composed of a set of representative objects (reps) • There can be one rep for the system, one rep per processor, or one rep for a cluster of processors • Clients access a clustered object using a common reference to the object • Each call to an object using this reference is automatically directed to the appropriate rep • Clients do not need to know anything about the location or organization of the reps to use a clustered object • Impact on locality is similar to the effect of padded arrays on the counter

  28. Clustered Objects

  29. Keeping Clustered Objects Consistent • When a clustered object has multiple reps, there must be a way of keeping the reps consistent • Coordination between reps can happen via • Shared Memory • Protected Procedure Calls

  30. Clustered Object Implementation • Each processor has a translation table • The table is located at the same virtual address in every processor • For each clustered object, the table contains a pointer to the rep that serves the given processor • A clustered object reference is just a pointer into this table • Reps for a clustered object are created dynamically when they are first accessed • Dynamic rep creation is dealt with by the global miss handler
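
The translation-table mechanism on slide 30 can be modeled with a simplified sketch. Everything named here (`ClusteredCounterSystem`, `CounterRep`, `ObjectRef`, `kNumCpus`, `kTableSize`) is hypothetical. In particular, the sketch passes the CPU index explicitly and stores all tables in one array, whereas the slide describes a per-processor table mapped at the same virtual address on every processor. It also creates one rep per processor and ignores the shared and per-cluster rep arrangements.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <memory>

constexpr std::size_t kNumCpus = 4;     // hypothetical CPU count
constexpr std::size_t kTableSize = 16;  // hypothetical number of table entries

// A representative ("rep") of a hypothetical clustered counter object.
struct CounterRep {
    long count = 0;
};

// A clustered object reference is modeled as an index into the table.
using ObjectRef = std::size_t;

class ClusteredCounterSystem {
public:
    // Direct a call to the rep that serves this CPU, creating the rep
    // on first access. Each CPU must only pass its own index.
    CounterRep& rep_for(std::size_t cpu, ObjectRef ref) {
        std::unique_ptr<CounterRep>& entry = tables_[cpu][ref];
        if (!entry) {
            entry = miss_handler();
        }
        return *entry;
    }

    std::size_t reps_created() const { return reps_created_.load(); }

private:
    // Stand-in for the global miss handler: allocates a new rep.
    std::unique_ptr<CounterRep> miss_handler() {
        reps_created_.fetch_add(1);
        return std::make_unique<CounterRep>();
    }

    // One translation table per CPU; empty entries mean "no rep yet".
    std::array<std::array<std::unique_ptr<CounterRep>, kTableSize>, kNumCpus> tables_{};
    std::atomic<std::size_t> reps_created_{0};
};
```

With this model, `sys.rep_for(0, ref)` and `sys.rep_for(1, ref)` use the same reference but reach different reps, and a rep exists only for processors that have actually used the object.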

  31. Clustered Object Implementation

  32. A New Locking Strategy

  33. Synchronization • Two kinds of synchronization issues must be dealt with • Using locks to protect data structures • Ensuring the existence of needed data structures

  34. Locks • Tornado uses spin-then-block locks to minimize the overhead of the lock/unlock instructions • Tornado limits lock contention by • Encapsulating the locks in objects to limit the scope of locks • Using clustered objects to provide multiple copies of a lock
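
A spin-then-block lock, as named on slide 34, can be approximated in portable C++ as below. This is a sketch of the general technique, not Tornado's lock: the class name `SpinThenBlockLock`, the spin limit `kSpinLimit`, and the use of `std::mutex` with `std::condition_variable` for the blocking phase are assumptions.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

class SpinThenBlockLock {
public:
    void lock() {
        // Phase 1: spin for a bounded number of attempts, which is cheap
        // when the lock is held only briefly.
        for (int i = 0; i < kSpinLimit; ++i) {
            if (try_lock()) {
                return;
            }
        }
        // Phase 2: block until an unlock wakes this thread and the
        // lock can actually be acquired.
        std::unique_lock<std::mutex> guard(mutex_);
        cv_.wait(guard, [this] { return try_lock(); });
    }

    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire);
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
        // Taking the mutex before notifying prevents a lost wakeup for a
        // waiter that has checked the flag but not yet gone to sleep.
        // A production lock would skip this when nobody is blocked.
        std::lock_guard<std::mutex> guard(mutex_);
        cv_.notify_one();
    }

private:
    static constexpr int kSpinLimit = 1000;  // hypothetical tuning value
    std::atomic<bool> locked_{false};
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

Because the class provides `lock()`, `unlock()`, and `try_lock()`, it can be used with `std::lock_guard<SpinThenBlockLock>`. Embedding one such lock inside each object or rep, rather than sharing a global one, is what limits contention in the way the slide describes.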

  35. Existence Guarantees • Traditional Approach: use locks to protect object references • Prevents races where one process destroys an object that another process is using • Tornado’s Approach: semi-automatic garbage collection • Garbage collection destroys a clustered object only when there are no more references to the object • When clients use an existing reference to access a clustered object, they have a guarantee that the referenced object still exists • References can be safely accessed without the use of (global) locks
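
The existence guarantee on slide 35 can be illustrated with a stand-in technique. The sketch below uses `std::shared_ptr` reference counting, which is not Tornado's semi-automatic garbage collector; it only demonstrates the property that an object is destroyed once no references remain. `Resource` and `ResourceTable` are hypothetical names, and the sketch is single-threaded: concurrent `acquire()` and `remove()` calls would need extra synchronization.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for an OS object such as a region or file.
struct Resource {
    explicit Resource(bool* destroyed_flag) : destroyed(destroyed_flag) {}
    ~Resource() { *destroyed = true; }  // records destruction for the demo
    bool* destroyed;
    int value = 42;
};

// Hypothetical owner that may drop its own reference at any time.
class ResourceTable {
public:
    explicit ResourceTable(std::shared_ptr<Resource> r)
        : resource_(std::move(r)) {}

    // Hand out a counted reference; holding it keeps the object alive.
    std::shared_ptr<Resource> acquire() const { return resource_; }

    // The owner gives up the object. Destruction is deferred until the
    // last outstanding reference is released.
    void remove() { resource_.reset(); }

private:
    std::shared_ptr<Resource> resource_;
};
```

A client that calls `acquire()` before the owner calls `remove()` can keep using the object safely, and the destructor runs only when that client releases its reference.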

  36. Protected Procedure Calls

  37. Interprocess Communication • Tornado uses Protected Procedure Calls (PPCs) to bring locality and concurrency to interprocess communication • A PPC is a call from a client object to a server object • Acts like a clustered object call that passes between the protection domains of the client and server processes • Advantages of PPC • Client requests are always serviced on their local processor • Clients and servers share the CPU in a manner similar to handoff scheduling • The server has one thread of control for each client request

  38. Protected Procedure Calls

  39. Performance

  40. Conclusions • Tornado’s increased locality and concurrency have produced a scalable OS design for multiprocessors • This locality was provided by several key system features • An object-oriented design • Clustered objects • A new locking strategy • Protected procedure calls