
Nomad: A Scalable Operating System for Clusters of Uni and Multiprocessors



Presentation Transcript


  1. Nomad: A Scalable Operating System for Clusters of Uni and Multiprocessors
     Eduardo Pinheiro, Ricardo Bianchini
     Rutgers University

  2. Goals
     • Scalability
        • No centralization.
        • No nodes dedicated to specific tasks.
     • Ease of use
        • Single system image. Backward compatible.
     • Efficient and automatic management of all resources
        • CPUs, memory, I/O devices.
     • Fault tolerance
        • Resistant to individual node crashes. Redundant.

  3. Overview
     [Figure: layered diagram with the labels Applications, Nomad Daemon, and Base Operating System]

  4. Mechanisms
     • Single system image
        • Unique process identifiers across the cluster.
        • Signal delivery is independent of process location.
        • Process creation automatically picks the best node.
     • Efficient resource utilization
        • Load balancing by migration when a resource (CPU, memory, or I/O bandwidth) is exhausted.
        • Implicit co-scheduling across nodes.
        • Co-scheduling on multiprocessors.
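
The slides do not say how Nomad builds cluster-wide process identifiers or how it ranks nodes at process creation. As a minimal sketch of one possible approach, the code below packs a node number and a local PID into a single integer and picks the node whose most heavily used resource is least loaded. The bit widths, the `NodeLoad` fields, and the `pressure` heuristic are all assumptions made for illustration, not details taken from the presentation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed layout (not from the slides): high bits hold the node that
# created the process, low bits hold that node's local PID.
NODE_BITS = 16
LOCAL_BITS = 16


def make_global_pid(node_id: int, local_pid: int) -> int:
    """Pack (node_id, local_pid) into one cluster-wide identifier."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= local_pid < (1 << LOCAL_BITS):
        raise ValueError("local_pid out of range")
    return (node_id << LOCAL_BITS) | local_pid


def split_global_pid(gpid: int) -> Tuple[int, int]:
    """Recover (creating node, local PID) from a cluster-wide identifier."""
    return gpid >> LOCAL_BITS, gpid & ((1 << LOCAL_BITS) - 1)


@dataclass
class NodeLoad:
    node_id: int
    cpu: float     # utilization as a fraction in [0, 1] (hypothetical metric)
    memory: float
    io: float

    def pressure(self) -> float:
        # Hypothetical heuristic: a node is as loaded as its scarcest resource.
        return max(self.cpu, self.memory, self.io)


def pick_node(loads: List[NodeLoad]) -> int:
    """Return the node with the lowest pressure; ties go to the lower node id."""
    return min(loads, key=lambda n: (n.pressure(), n.node_id)).node_id
```

In this layout the node field only records where a process was created. Once a process migrates, delivering a signal to it would require a separate lookup of its current location, which this sketch does not model.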

  5. Mechanisms
     • Scalability
        • No centralization. Nodes are autonomous.
        • High-throughput striped and randomized file system (software RAID).
        • Load information is piggybacked on file system messages, so no extra intra-cluster communication is needed to disseminate it.
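
To illustrate the piggybacking idea, here is a minimal sketch in which every file system message carries a copy of the sender's load table and the receiver keeps the newest entry per node. The message layout, the `Node` and `LoadInfo` classes, and the per-node version counter are assumptions for illustration; the slides do not describe Nomad's actual message format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LoadInfo:
    load: float
    version: int   # incremented by the owning node on every update (assumed)


@dataclass
class Node:
    node_id: int
    table: Dict[int, LoadInfo] = field(default_factory=dict)

    def update_local_load(self, load: float) -> None:
        """Record this node's own load with a fresh version number."""
        prev = self.table.get(self.node_id)
        version = prev.version + 1 if prev else 1
        self.table[self.node_id] = LoadInfo(load, version)

    def make_fs_message(self, payload: bytes) -> dict:
        """Build a file system message with the load table attached."""
        loads = {nid: (info.load, info.version) for nid, info in self.table.items()}
        return {"payload": payload, "loads": loads}

    def receive_fs_message(self, msg: dict) -> bytes:
        """Merge piggybacked load entries, keeping only newer versions."""
        for nid, (load, version) in msg["loads"].items():
            known = self.table.get(nid)
            if known is None or version > known.version:
                self.table[nid] = LoadInfo(load, version)
        return msg["payload"]
```

Because the table travels with traffic that would be sent anyway, a node learns about nodes it has never contacted directly, at the cost of a larger message. Stale entries arriving late are ignored thanks to the version check.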

  6. Mechanisms
     • Fault tolerance
        • Periodic checkpoints to stable storage.
        • If faults occur, applications can be restarted with minimal losses.
        • The redundant file system can operate with up to one faulty disk/node. Recovery happens online.
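
The slides do not state which redundancy scheme lets the file system survive one faulty disk or node. XOR parity across a stripe is one common way to get that property, so the sketch below uses it purely as an illustration. The function names and the stripe layout (data blocks followed by one parity block) are assumptions.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must have equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a stripe in which at most one block (marked None) is lost."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot recover more than one lost block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks (data and parity) equals the lost block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired
```

For example, building a stripe with `make_stripe([b"node", b"mad!", b"1999"])`, replacing any one entry with `None`, and passing the result to `recover` returns the original stripe, while losing two entries raises `ValueError`.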

  7. Results

  8. Results
     • Simulated results for load balancing.
     • Note: overdemand time is due to the sum of the CPU, memory, and I/O demands and is expressed in seconds.

  9. Publications & Future Work
     • Eduardo Pinheiro and Ricardo Bianchini, "Nomad: An Efficient Operating System for Clusters of Uni and Multiprocessors", In Proceedings of the 1st IEEE Computer Society International Workshop on Cluster Computing (IWCC'99), Melbourne, Australia, December 1999.
     • Eduardo Pinheiro, "Nomad, a Scalable Operating System for Clusters of Uni and Multiprocessors", XIII Dissertation Thesis Contest (CTD2000), Curitiba, PR, Brazil, July 16-21. Best MSc thesis of 1999.
     Future Work:
     • Explore the use of user-level protocols (VIA) for communication between daemons.
     • Explore the use of remote memory writes to manage the cluster resources more aggressively.
