Nomad: A Scalable Operating System for Clusters of Uni and Multiprocessors
Eduardo Pinheiro, Ricardo Bianchini
Rutgers University
Goals
• Scalability
  • No centralization.
  • No nodes dedicated to specific tasks.
• Ease of use
  • Single system image. Backward compatible.
• Efficient and automatic management of all resources
  • CPUs, memory, I/O devices.
• Fault tolerance
  • Resistant to individual node crashes. Redundant.
Overview
[Figure: system layers as labeled on the slide: Applications, Nomad Daemon, Base Operating System]
Mechanisms
• Single system image
  • Unique process identifiers across the cluster.
  • Signal delivery is independent of process location.
  • Process creation automatically picks the best node.
• Efficient resource utilization
  • Load balancing by migration when a resource (CPU, memory, or I/O bandwidth) is exhausted.
  • Implicit co-scheduling across nodes.
  • Co-scheduling on multiprocessors.
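The slide does not say how Nomad builds cluster-wide identifiers or how it ranks nodes at process creation. As a minimal sketch under stated assumptions, the code below packs a node id and a node-local pid into one integer, and picks the node whose most-contended resource is least loaded. The names `make_global_pid`, `split_global_pid`, `NodeLoad`, and `pick_best_node`, the bit widths, and the ranking rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

NODE_BITS = 16  # assumption: node ids fit in 16 bits
PID_BITS = 32   # assumption: node-local pids fit in 32 bits


def make_global_pid(node_id: int, local_pid: int) -> int:
    """Pack the creating node's id and a node-local pid into one cluster-wide id."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= local_pid < (1 << PID_BITS):
        raise ValueError("local_pid out of range")
    return (node_id << PID_BITS) | local_pid


def split_global_pid(gpid: int) -> Tuple[int, int]:
    """Recover (creating node id, node-local pid) from a cluster-wide id."""
    return gpid >> PID_BITS, gpid & ((1 << PID_BITS) - 1)


@dataclass
class NodeLoad:
    node_id: int
    cpu: float     # fraction of CPU in use, 0.0 to 1.0
    memory: float  # fraction of memory in use
    io: float      # fraction of I/O bandwidth in use


def pick_best_node(loads: List[NodeLoad]) -> int:
    """Return the node whose busiest resource is least loaded (hypothetical rule).

    Ties are broken by the lower node id so the choice is deterministic.
    """
    if not loads:
        raise ValueError("no candidate nodes")
    best = min(loads, key=lambda n: (max(n.cpu, n.memory, n.io), n.node_id))
    return best.node_id
```

In this scheme the identifier names the node that created the process, not the node where it currently runs, so a migrated process keeps its id; locating it for signal delivery would still need a separate lookup that this sketch does not cover.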
Mechanisms
• Scalability
  • No centralization. Nodes are autonomous.
  • High-throughput striped and randomized file system (software RAID).
  • Load information is piggybacked on file system messages, which removes the need for extra intra-cluster communication.
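The piggybacking idea can be sketched as follows: every file system message carries the sender's current load, and each receiver updates a local table as a side effect of handling the message. This is an illustration only, not Nomad's actual message format; `LoadSample`, `FsMessage`, `Node`, and their fields are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class LoadSample:
    cpu: float
    memory: float
    io: float


@dataclass
class FsMessage:
    sender: int
    payload: bytes
    load: LoadSample  # piggybacked load report, no separate message needed


class Node:
    def __init__(self, node_id: int, clock: Callable[[], float] = time.monotonic):
        self.node_id = node_id
        self.clock = clock
        self.current_load = LoadSample(0.0, 0.0, 0.0)
        # Last known load of each peer, with the local time it was received.
        self.load_table: Dict[int, Tuple[LoadSample, float]] = {}

    def make_fs_message(self, payload: bytes) -> FsMessage:
        """Build a file system message and attach this node's current load."""
        return FsMessage(self.node_id, payload, self.current_load)

    def receive_fs_message(self, msg: FsMessage) -> bytes:
        """Record the sender's piggybacked load, then hand back the payload."""
        self.load_table[msg.sender] = (msg.load, self.clock())
        return msg.payload
```

Because a striped and randomized file system sends requests to many nodes, ordinary file traffic would refresh these tables without dedicated load broadcasts. The stored timestamp lets a node ignore entries it considers stale.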
Mechanisms
• Fault tolerance
  • Periodic checkpoints to stable storage.
  • If faults occur, applications can be restarted with minimal loss of work.
  • The redundant file system can operate with up to one faulty disk or node. Recovery happens online.
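The slide does not state which redundancy scheme the file system uses. One common way for a software RAID to survive a single failed disk is XOR parity per stripe, and the sketch below assumes that scheme purely for illustration. `xor_blocks`, `make_stripe`, and `recover_stripe` are hypothetical helpers, not Nomad functions.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must have equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks: Sequence[bytes]) -> List[bytes]:
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover_stripe(stripe: Sequence[Optional[bytes]]) -> List[bytes]:
    """Rebuild a stripe in which at most one block (data or parity) is None."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("XOR parity tolerates only one missing block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks equals the missing one.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired
```

With this layout, reads can continue while one disk or node is down, since any single block is recomputable from the others. A second simultaneous failure is not recoverable, which matches the "up to one faulty disk/node" limit on the slide.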
Results
• Simulated results for load balancing.
Note: overdemand time results from the sum of CPU, memory, and I/O demands and is expressed in seconds.
Publications & Future Work
• Eduardo Pinheiro and Ricardo Bianchini, "Nomad: An Efficient Operating System for Clusters of Uni and Multiprocessors", in Proceedings of the 1st IEEE Computer Society International Workshop on Cluster Computing (IWCC'99), Melbourne, Australia, December 1999.
• Eduardo Pinheiro, "Nomad, a Scalable Operating System for Clusters of Uni and Multiprocessors", XIII Dissertation Thesis Contest (CTD2000), Curitiba, PR, Brazil, July 16-21. Best MSc thesis of 1999.
Future Work:
• Explore the use of user-level protocols (VIA) for communication between daemons.
• Explore the use of remote memory writes to manage the cluster resources more aggressively.