Distributed Systems Major Design Issues

Distributed Systems Major Design Issues • DM Rasanjalee Himali • CSc8320 – Advanced Operating Systems (Section 2.6) • Fall 2009

Section I The Basics

Introduction • A distributed system consists of concurrent processes accessing distributed resources • Resources are shared through message passing in a network environment that may be unreliable and may contain untrusted components

Major Design Issues • Object Models and Naming Schemes • Distributed Coordination • Interprocess Communication • Distributed Resources • Fault Tolerance and Security

Object Models and Naming Schemes • Objects in a computer system: • Ex: processes, data files, memory, devices, processors, networks • Each object is represented by the set of allowable operations on it • Physical details of an object are transparent to other objects • Object servers: • An object server is the process that manages an object • Objects are encapsulated in servers • The only visible entities in the system are servers • Ex: process servers, file servers, memory servers, etc. • A client is a null server that accesses the object server

Object Models and Naming Schemes • Identifying a server: • To be contacted, a server must be identifiable • Three identification methods: • Identification by name • Identification by physical or logical address • Identification by the service that servers provide

Object Models and Naming Schemes • Identification by name: • Names are generally assumed to be unique • But multiple addresses may exist for the same server, and addresses need to change if the server moves • Names are more intuitive than addresses • Identification by physical or logical address: • Name-to-logical-address mapping is done by a name server in the OS • Logical-to-physical address mapping is a network service • The port used by many systems is a logical address • Associating more than one port with a server provides multiple entry points to the server • Identification by the service that servers provide: • Multiple servers can share the same port • This can be used for service identification in a distributed system • The client is only interested in the requested service • Who provides the service is irrelevant • Multiple servers can provide the same service • This approach is critical for implementing an autonomous system • A resolution protocol is needed to translate a service to a server
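The resolution steps on this slide (name to logical address, logical address to physical address, and service to server) can be sketched as three table lookups. This is only an illustration under stated assumptions: the server names, port numbers, host addresses, and service labels below are hypothetical and do not come from the slides, and a real name server would be a separate process reached by message passing.

```python
# Hypothetical tables; none of these names or addresses come from the slides.
NAME_TO_PORT = {"file_server": 2049, "print_server": 631}   # name -> logical address
PORT_TO_HOSTS = {2049: ["10.0.0.5", "10.0.0.6"],            # logical -> physical
                 631: ["10.0.0.9"]}
SERVICE_TO_NAMES = {"storage": ["file_server"],             # service -> server names
                    "printing": ["print_server"]}


def resolve_name(name):
    """Name server step: map a server name to its logical address (port)."""
    return NAME_TO_PORT[name]


def resolve_port(port):
    """Network service step: map a logical address to physical addresses."""
    return PORT_TO_HOSTS[port]


def resolve_service(service):
    """Resolution protocol: translate a service into (host, port) endpoints."""
    endpoints = []
    for name in SERVICE_TO_NAMES[service]:
        port = resolve_name(name)
        endpoints.extend((host, port) for host in resolve_port(port))
    return endpoints
```

A client that calls `resolve_service("storage")` gets back every endpoint offering that service without knowing which server it will end up talking to, which is the point of service-based identification.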

Object Models and Naming Schemes • Object models and naming : • Must be addressed early in the system design as many things depend on the naming scheme: • Ex: • Structure of the system • Management of the namespace • Name resolution • Access methods

Distributed Coordination • Interacting concurrent processes require coordination to achieve synchronization. • Types of Synchronization Requirements: • In general there are three types of synchronization requirements: • Barrier Synchronization • A set of processes or events must reach a common synchronization point before they can continue • Condition coordination • A process or event must wait for a condition that will be set asynchronously by other interacting processes to maintain some ordering of execution • Mutual Exclusion • Concurrent processes must have mutual exclusion when accessing a critical shared resource
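The three synchronization requirements can be seen side by side in a small single-machine analogue. The sketch below uses threads and shared memory from Python's standard `threading` module, which is exactly what a distributed system lacks, so it only illustrates the semantics; the function name `run_demo` and the log format are made up for this example.

```python
import threading


def run_demo(n_workers=4):
    barrier = threading.Barrier(n_workers)  # barrier synchronization
    ready = threading.Event()               # condition coordination
    lock = threading.Lock()                 # mutual exclusion
    counter = {"value": 0}
    log = []

    def worker(i):
        ready.wait()          # wait for a condition set asynchronously elsewhere
        with lock:            # one thread at a time touches the shared state
            counter["value"] += 1
            log.append(("arrived", i))
        barrier.wait()        # nobody continues until every worker has arrived
        with lock:
            log.append(("passed", i))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    ready.set()               # the condition the workers are waiting for
    for t in threads:
        t.join()
    return counter["value"], log
```

In the returned log every "arrived" entry precedes every "passed" entry, because the barrier releases the workers only after all of them have reached it.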

Distributed Coordination • Synchronization implies the need for state information about other processes • Problems with synchronization: • Complete state information is difficult to obtain • Ex: no shared memory environment • Solution: use message passing to convey state information • State information may be inaccurate or incomplete • Ex: message transfer delays • Solution: use a centralized coordinator role that moves from one process to another (so there is no single point of failure)

Distributed Coordination • Deadlock of processes: • Interacting processes can lead to deadlock • Deadlock: circular waiting of processes • Problem: • Sometimes it is not practical to implement deadlock prevention or avoidance strategies in a distributed system • Solution: • Detect and recover from deadlocks • Problem: • Detection of deadlocks in a distributed system is non-trivial (because the global state of the system is not available) • Who should initiate the detection algorithm? • How can the algorithm be implemented in a distributed fashion by message passing? • Who should be the victim to abort in order to resolve the deadlock? • How can the victim be recovered? • The efficiency of deadlock resolution and recovery seems more important than that of detection
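Circular waiting can be made concrete as a cycle in a wait-for graph. The sketch below assumes the whole graph has already been collected at one site, which is precisely the part the slide calls non-trivial in a distributed system; it shows only the detection and victim-selection steps. The function names and the "largest identifier" victim policy are hypothetical choices for illustration.

```python
def find_cycle(wait_for):
    """Return a list of processes in a circular wait, or None.

    wait_for maps each process to the processes it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}
    path = []

    def visit(p):
        color[p] = GREY
        path.append(p)
        for q in wait_for.get(p, ()):
            state = color.get(q, WHITE)
            if state == GREY:               # q is already on the current path
                return path[path.index(q):]
            if state == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        color[p] = BLACK
        return None

    for p in list(wait_for):
        if color.get(p, WHITE) == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None


def pick_victim(cycle):
    """Hypothetical policy: abort the process with the largest identifier."""
    return max(cycle)
```

For example, `find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]})` reports the three processes as deadlocked, and aborting any one of them breaks the cycle.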

2. Distributed Coordination • Distributed solutions to synchronization and deadlock problems: • Use partial global state for decision making • Many applications do not need absolute global knowledge of the system • Exchange of local knowledge among cooperating sites

3. Interprocess Communication • Communication: • The most important issue in any distributed system • In operating systems, interaction between processes and information flow between objects depend on communication • Message passing is the only means of communication in a distributed system • Goal: • Make communication transparent by providing higher-level communication methods that hide the physical details of message passing • Two concepts are used to achieve this goal: • Client/Server model • Remote Procedure Calls (RPC)

3. Interprocess Communication • Client/Server model: • Programming paradigm for structuring processing in distributed systems • All system interactions are viewed as a pair of message exchanges • The client process sends a request to the server • The server responds with a reply message • Remote Procedure Calls: • The client/server request/reply message exchange is represented as a procedure call in a programming language • RPC: a procedure call to a remote server
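The way RPC dresses a request/reply exchange as a procedure call can be sketched with a client stub. This is a minimal illustration, not a real RPC library: the message format (JSON with `proc` and `args` fields), the `RpcStub` class, and the two example procedures are all assumptions, and the "network" is replaced by a direct in-process function call.

```python
import json

# Server side: hypothetical procedures exposed to clients.
PROCEDURES = {"add": lambda a, b: a + b, "upper": lambda s: s.upper()}


def server_handle(request_bytes):
    """Decode a request message, run the procedure, encode a reply message."""
    request = json.loads(request_bytes.decode("utf-8"))
    try:
        result = PROCEDURES[request["proc"]](*request["args"])
        reply = {"ok": True, "result": result}
    except Exception as exc:   # report any failure back to the client
        reply = {"ok": False, "error": repr(exc)}
    return json.dumps(reply).encode("utf-8")


class RpcStub:
    """Client stub: makes a request/reply exchange look like a local call."""

    def __init__(self, transport):
        self._transport = transport   # callable: request bytes -> reply bytes

    def __getattr__(self, proc):
        def call(*args):
            message = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
            reply = json.loads(self._transport(message).decode("utf-8"))
            if not reply["ok"]:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return call


# The in-process transport stands in for the network in this sketch.
remote = RpcStub(server_handle)
```

The caller writes `remote.add(2, 3)` as if `add` were local; the stub builds the request message, hands it to the transport, and unpacks the reply.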

3. Interprocess Communication • Multicast and Broadcast: • Client/Server and RPC are unicast (point-to-point) • The notion of "groups" is inherent to distributed systems • Processes cooperate in group activities • Group communication in distributed systems is logical multicast (perhaps without broadcasting hardware) • A message needs to go through several layers of protocols and be propagated to a number of physically distributed nodes • Thus it is more susceptible to failures in the system • Reliable and atomic group broadcast remains an open issue in distributed systems
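Why atomic group delivery is hard shows up even in the simplest form of logical multicast, where the sender issues one unicast per group member. The sketch below assumes a caller-supplied `send(member, message)` function that raises `ConnectionError` on failure; both that signature and the function name `multicast` are hypothetical.

```python
def multicast(group, message, send):
    """Logical multicast built from repeated unicast sends.

    Returns (delivered, failed). If failed is non-empty, some members got
    the message and others did not, so the delivery was not atomic.
    """
    delivered, failed = [], []
    for member in group:
        try:
            send(member, message)
            delivered.append(member)
        except ConnectionError:
            failed.append(member)
    return delivered, failed
```

A single unreachable member leaves the group in a partially delivered state; an atomic protocol would need extra rounds of messages to make the outcome all-or-nothing.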

4. Distributed Resources • The only resources needed for computation are data and processing capacity • Data: • May reside physically in distributed memory or secondary storage • Processing capacity: • Aggregate processing power of all processors • Goal: • Achieve transparency in allocating processing capacity to processes (distributing processes/load to the processors)

4. Distributed Resources • Static load distribution: • Also called multiprocessor scheduling • Goal: • Minimize the completion time of a set of related processes • Issue: • Effect of communication overhead on the design of scheduling strategies • Dynamic load distribution: • Also called load sharing • Goal: • Maximize utilization of the set of processors • Issue: • Process migration
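The two goals can be contrasted with two small heuristics. Both are sketches under simplifying assumptions, not the algorithms of any particular system: `static_schedule` uses a greedy longest-task-first rule and ignores communication overhead entirely, and `share_load` models process migration as moving one unit of queue length per step. All names and the threshold value are hypothetical.

```python
import heapq


def static_schedule(task_times, n_processors):
    """Greedy longest-task-first assignment; returns (completion_time, assignment)."""
    heap = [(0, p) for p in range(n_processors)]   # (current load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_processors)}
    for t in sorted(task_times, reverse=True):
        load, p = heapq.heappop(heap)              # least loaded processor so far
        assignment[p].append(t)
        heapq.heappush(heap, (load + t, p))
    return max(load for load, _ in heap), assignment


def share_load(queue_lengths, threshold=2):
    """Migrate one process at a time from the busiest to the idlest node."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2, or migration never settles")
    loads = list(queue_lengths)
    migrations = []
    while max(loads) - min(loads) >= threshold:
        src = loads.index(max(loads))
        dst = loads.index(min(loads))
        loads[src] -= 1
        loads[dst] += 1
        migrations.append((src, dst))
    return loads, migrations
```

The static version decides everything before execution starts, while the dynamic version reacts to observed queue lengths; each entry in `migrations` stands for one process migration, which is where the real cost lies.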

4. Distributed Resources • Distributed Shared Memory: • Transparent memory system • Assume data resides in distributed memory modules • Present single shared memory view of physically distributed memories • Goal: • Maximize transparency • Other issues (for distributed file systems & distributed shared memory): • Sharing & replication of data • Need protocols to maintain consistency & coherency of data • Existence of replicas should be transparent to the user

5. Fault Tolerance & Security • Distributed systems are vulnerable to failures and security threats • Failures: • Faults due to unintentional intrusion • Security Violations: • Faults due to intentional intrusion • Dependable Distributed System: • Fault tolerant system • System faults are transparent to the user

5. Fault Tolerance & Security • Solution for failures: • Redundancy in the system: • An inherent property of distributed systems, as data and resources can be replicated • Rollback: • Recovery from failures requires rolling back the execution of the failed process and other affected processes • The execution state must be kept for rollback recovery (a difficult task in distributed systems) • Solution for security: • Issues: • Trustworthiness of the communicating processes • Confidentiality and integrity of messages & data • Authentication & authorization • Solutions: • Authentication: clients, servers & messages must be authenticated • Authorization: access control across a physical network with heterogeneous components under different administrative units, using different security models
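Keeping execution state for rollback can be illustrated for a single process. The sketch below is a local simplification: the `CheckpointedProcess` class is hypothetical, state is a plain Python object copied in memory, and the hard distributed part (rolling back other affected processes to a mutually consistent set of checkpoints) is not modeled.

```python
import copy


class CheckpointedProcess:
    """A process whose state can be saved and later rolled back."""

    def __init__(self, state):
        self.state = state
        self._checkpoints = []

    def checkpoint(self):
        """Save a copy of the current state and return its checkpoint id."""
        self._checkpoints.append(copy.deepcopy(self.state))
        return len(self._checkpoints) - 1

    def rollback(self, checkpoint_id):
        """Restore the saved state and discard any later checkpoints."""
        self.state = copy.deepcopy(self._checkpoints[checkpoint_id])
        del self._checkpoints[checkpoint_id + 1:]
```

A process would call `checkpoint()` at points it can safely restart from, then `rollback(id)` after a failure; work done since that checkpoint is lost and must be redone.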

Section II Related Work

Related Work • Peer-to-Peer Networks: • distributed network architecture • composed of participants that make a portion of their resources available directly to their peers without intermediary network hosts or servers. • Peers are both suppliers and consumers of resources • Research: • Security and privacy in P2P systems • Resource discovery/management in P2P systems

Related Work • Peer-to-Peer Search: • BFS (Breadth-First Search): • (-) Sacrifices performance and network utilization for simplicity • (+) Guarantees high hit rates, at the expense of a large number of messages • Random BFS (RBFS): • (-) The algorithm is probabilistic, so the query might not reach some large network segments • (+) Does not require global knowledge
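The trade-off between the two search strategies can be reproduced on a toy overlay. The slides do not define RBFS precisely, so this sketch assumes the common reading in which each peer forwards the query to a random fraction of its neighbors; the function name, the `ttl` hop limit, and the `fraction` parameter are assumptions made for illustration.

```python
import random
from collections import deque


def p2p_search(neighbors, start, has_item, ttl, fraction=1.0, rng=None):
    """Forward a query up to ttl hops; returns (hits, messages_sent).

    fraction=1.0 is plain BFS flooding; fraction<1.0 forwards to a random
    subset of each peer's neighbors (the assumed Random BFS behavior).
    """
    if rng is None:
        rng = random.Random()
    visited = {start}
    frontier = deque([(start, 0)])
    hits, messages = set(), 0
    while frontier:
        node, depth = frontier.popleft()
        if has_item(node):
            hits.add(node)
        if depth == ttl:
            continue                     # hop limit reached, stop forwarding
        peers = list(neighbors[node])
        if fraction < 1.0 and peers:
            k = max(1, int(len(peers) * fraction))
            peers = rng.sample(peers, k)
        for peer in peers:
            messages += 1                # every forwarded query costs one message
            if peer not in visited:
                visited.add(peer)
                frontier.append((peer, depth + 1))
    return hits, messages
```

On a six-node ring with the item three hops away, full BFS finds it at the cost of a message on every link it explores, while the random variant sends far fewer messages and may miss the item altogether.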

Section III Future Work

Future Work • Develop a model for P2P search: • Bayesian inference • Value of information • Extend P2P search to P2P Web search: • Most centralized Web search engines currently find it hard to keep up with the growth in information needs • Local & decentralized global directory • Semantic P2P overlay networks: • Node connections are influenced by content; multiple overlay networks can exist, based on content • Dynamic restructuring of the overlay

References • Randy Chow, Theodore Johnson, “Distributed Operating Systems & Algorithms”, Addison Wesley, 1997 • Semantic Overlay Networks for P2P Systems, Arturo Crespo and Hector Garcia-Molina, 2002 • Random walks in peer-to-peer networks: algorithms and evaluation , Christos Gkantsidis, Milena Mihail, Amin Saberi , 2006 • www.en.wikipedia.com
