Fast-OS Shrinking AIX as a compute node OS July-10-2002 Terry Jones, Integrated Computing & Communications Dept trj@llnl.gov
Outline • Introduction • Today’s landscape • Directions • Problem Areas Ripe for Investigation • Parallel Aware Scheduling • Parallel Aware Memory Management • Metrics for evaluating system software • Why would anyone want to muck with AIX? • Bottom-up and Top-down Approaches • Why AIX? • How AIX? • Conclusion
The Landscape • Parallel applications need to span thousands of nodes • Architectures are adding more processor state • Applications are not “mission critical” • Both interrupts and busy-waiting are bad • Cache effects (processor affinity) cannot be ignored • Two modes: • Capability mode (jobs are dedicated) • Capacity mode (jobs may space-share the machine)
Directions • Continue to move from a monolithic operating system which communicates via shared-memory TO a decentralized design which communicates via efficient messages • Small kernel & process level managers • Modularity • Fault-tolerance • Extensibility Question: How much should system software offer in terms of features? Answer: Everything required, and as much desired as possible
Problem Areas Ripe For Investigation • Add “parallel awareness” • CPU resource (local/global program context, scheduling) • Memory resource (demand paging, address space extent) • Metrics • Other possibilities: Fault tolerance/Membership services • Re-visit where we insert boundaries (e.g. boundary between kernel and user-level code)
Scheduling Is An Overloaded Word • Spatial Scheduling • Assigns processes to nodes • For example, batch schedulers & gang-schedulers • Coarse-grain view of work to be done • Temporal Scheduling • For example, native operating system scheduling • Fine-grain view of work to be done (e.g. efficient pthread-level scheduling) • Lacks the necessary global view • Coscheduling
The Need for Parallel Aware Scheduling • Even on the most bare-bones operating systems, there can be more runnable processes than processors • Many parallel algorithms are extremely sensitive to serializations • A first order goal is to maximize the overlap of competing (interfering) processes during a parallel application.
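The sensitivity to serialization can be illustrated with a toy model. The sketch below is not from the talk: it assumes a bulk-synchronous job in which every step ends at a barrier, and one interfering daemon per node that steals `delay` time units once every `period` steps. The function name `run_time` and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def run_time(nodes, steps, period, delay, coordinated, seed=0):
    """Total time of a bulk-synchronous job in a toy interference model.

    Each step costs 1.0 time unit of compute and ends at a barrier, so a
    step is slowed by `delay` if ANY node runs its daemon during that step.
    """
    rng = random.Random(seed)
    if coordinated:
        # Overlapped interference: every node runs its daemon on the same step.
        offsets = {0}
    else:
        # Uncoordinated: each node picks its own step within the period.
        offsets = {rng.randrange(period) for _ in range(nodes)}
    total = 0.0
    for step in range(steps):
        hit = (step % period) in offsets
        total += 1.0 + (delay if hit else 0.0)
    return total

if __name__ == "__main__":
    for n in (1, 16, 1024):
        free = run_time(n, 1000, 100, 0.5, coordinated=False)
        cosched = run_time(n, 1000, 100, 0.5, coordinated=True)
        print(f"{n:5d} nodes: uncoordinated={free:.1f} coordinated={cosched:.1f}")
```

In this model the coordinated run pays for the daemon only once per period regardless of node count, while the uncoordinated run slows down as nodes are added, because some node is increasingly likely to be interrupted during any given step. That is the motivation for overlapping the interfering processes rather than letting each node schedule them independently.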
Improving Memory Management • Provide as much “memory” as possible with as little pain as possible • Memory systems are becoming more complex • Improved mechanisms to counter false-sharing.
Why Demand Paging • External storage (secondary & networked) will continue to exceed local memory • Memory requirements for certain simulations are almost unbounded • Removing constraints on memory is very desirable, but the cost of a page fault is too high to be hidden from an application • A default process-level manager provides page-cache management, as in Stanford DASH.
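Why the cost of a page fault matters can be made concrete with the standard effective-access-time formula, (1 - p) * t_mem + p * t_fault, where p is the fraction of accesses that fault. The sketch below is illustrative only: the latencies (100 ns for a memory access, 10 ms to service a fault from disk) are assumed round numbers, not measurements from the talk, and `effective_access_ns` is a hypothetical helper name.

```python
def effective_access_ns(fault_rate, memory_ns=100.0, fault_ns=10_000_000.0):
    """Average cost of one memory access when a fraction `fault_rate`
    of accesses trigger a page fault (all latencies are assumed values)."""
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * memory_ns + fault_rate * fault_ns

if __name__ == "__main__":
    for rate in (0.0, 1e-6, 1e-4, 1e-3):
        avg = effective_access_ns(rate)
        print(f"fault rate {rate:g}: {avg:.1f} ns per access "
              f"({avg / 100.0:.1f}x the no-fault cost)")
```

Under these assumed latencies, one fault per thousand accesses makes the average access about 100 times slower, which is why an application benefits from seeing and managing paging rather than having it hidden.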
Challenges For A Virtual Memory Environment • Thought to preclude OS-bypass communications or make them more difficult • An application cannot know the amount of physical memory it has available • An application cannot efficiently control the contents of the physical memory allocated to it • An application cannot control the read-ahead, writeback and discarding of pages within its physical memory.
Metrics For Evaluating System Software • An aid for reaching agreement on what we want • A quantitative measure of different approaches • Compared to the scheduler work and the virtual memory work, this may be the most difficult area
Bottom-up & Top-down Approaches • Bottom-up • Start with a clean slate • Add features as the need arises • Settle on a reasonable boundary • Top-down • Start with a full-featured implementation • Remove the unnecessary cruft • Settle on a reasonable boundary
Why AIX? • AIX is ubiquitous in supercomputer centers • AIX already has extensive capabilities • Not required to build everything before we try anything • AIX is mature (read: is not in radical change mode) • AIX scalability (32-way with AIX 5.x)
How AIX? • In close conjunction with IBM • Expect successes to pay off in IBM products • Done in an operating-system-independent manner • Findings apropos and available to other operating systems • Evaluated with real applications on very large machines
Conclusion • New needs arising from today’s parallel machines pose new challenges for system software • Among the key needs which emerge... • Parallel aware scheduling • Improved memory management • Metrics for evaluating operating systems • These can be investigated from a bottom-up approach, or a top-down approach, or both • AIX is a reasonable choice for a top-down approach
This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.