This lecture discusses distributed systems and middleware, with a focus on the forms of transparency middleware should provide: access, location, concurrency, replication, failure, migration, performance, and scaling. It also covers abstractions such as processes, threads, communication channels, and global state, and explores the challenge of process coordination in the presence of errors. Finally, it examines communication in real-life distributed systems, including message loss, duplication, and error handling techniques.
COT 5611 Operating Systems Design Principles
Spring 2012
Dan C. Marinescu
Office: HEC 304
Office hours: M-Wd 5:00-6:00 PM
Lecture 5 – Wednesday January 25
• Reading assignment: the class notes “Distributed systems-basic concepts” available online.
• Last time
  • Names and fundamental abstractions
  • Parallel systems
  • The speedup
  • Parallelism
    • Bit-level parallelism
    • Instruction-level parallelism
    • Data parallelism
    • Task-parallelism
  • Parallel architectures: SISD, SIMD, MIMD
  • Storage systems
  • Atomicity
  • Why it is hard to implement atomicity
Today
• Distributed systems
• Middleware
  • Transparency (Access, Location, Concurrency, Replication, Failure, Performance, Scaling)
• Motivating example: molecular dynamics computation
• Communication in distributed systems
• Abstractions: process, thread, communication channel
• The state of a process
• Events: internal, communication
• Process history
• Space-time diagrams
• Global state
Desirable properties of middleware
• Access transparency - local and remote information objects are accessed using identical operations.
• Location transparency - information objects are accessed without knowledge of their location.
• Concurrency transparency - several processes run concurrently using shared information objects without interference among them.
• Replication transparency - multiple instances of information objects are used to increase reliability without the knowledge of users or applications.
• Failure transparency - the concealment of faults.
• Migration transparency - the information objects in the system are moved without affecting the operations performed on them.
• Performance transparency - the system can be reconfigured based on the load and quality of service requirements.
• Scaling transparency - the system and the applications can scale without a change in the system structure and without affecting the applications.
Abstractions
• Process: a program in execution.
• Thread: a light-weight process. A thread of execution is the smallest unit of processing that can be scheduled by an operating system.
• The state of a process: the ensemble of information about the process that we need to restart it after it has been suspended.
• Communication channel: provides the means for processes or threads to communicate with one another and coordinate their actions by exchanging messages.
• Message: a structured unit of information, which can be interpreted only in a semantic context by the sender and the receiver. Communication among processes is done only by means of send(m) and receive(m) communication events, where m is a message.
• State of a communication channel: given two processes p_i and p_k, the state of the channel C(i,k) from p_i to p_k consists of the messages sent by p_i but not yet received by p_k.
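The definition of the channel state can be made concrete with a small sketch. The `Channel` class and its method names are hypothetical, and the sketch assumes a reliable FIFO channel, which the slides do not require.

```python
from collections import deque


class Channel:
    """Unidirectional channel C(i,k) from process p_i to process p_k (hypothetical helper)."""

    def __init__(self):
        # Assumption: the channel is reliable and delivers messages in FIFO order.
        self._in_transit = deque()

    def send(self, m):
        """Communication event send(m), executed by p_i."""
        self._in_transit.append(m)

    def receive(self):
        """Communication event receive(m), executed by p_k; returns None if nothing is pending."""
        return self._in_transit.popleft() if self._in_transit else None

    def state(self):
        """Channel state: the messages sent by p_i but not yet received by p_k."""
        return list(self._in_transit)


c_ik = Channel()
c_ik.send("m1")
c_ik.send("m2")
first = c_ik.receive()          # p_k receives "m1"
channel_state = c_ik.state()    # only "m2" is still in transit
```

After the single receive, the channel state holds exactly the one message that has been sent but not yet received.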
More abstractions
• Local event: an event internal to the process.
• Communication event: send(m), receive(m).
• Local history of a process: a sequence of events, possibly an infinite one; it can be presented graphically as a space-time diagram where events are ordered by their time of occurrence.
• Distributed system: multiple processes active at any one time and communicating with each other.
• Global state of a distributed system: the union of the states of the individual processes and channels.
  • The state of the channels does not appear explicitly in this definition of the global state because the state of the channels is encoded as part of the local state of the processes communicating through the channels.
• Protocol: a finite set of messages exchanged among processes to help them coordinate their actions.
Space-time diagrams
The lattice of global states
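Since the figure itself is not reproduced here, the following is a minimal sketch of how the global states of two processes could be enumerated. The two local histories are made up for illustration, each global state is named by the pair (j1, j2) of events executed so far, and excluding states in which a message is received before it is sent is an assumption of this sketch rather than something stated on the slide.

```python
from itertools import product

# Hypothetical local histories: p1 sends m as its 2nd event, p2 receives m as its 1st event.
SEND_INDEX = 2     # send(m) is event number 2 in p1's history
RECEIVE_INDEX = 1  # receive(m) is event number 1 in p2's history


def channel_state(j1, j2):
    """State of the channel from p1 to p2 after p1 executed j1 events and p2 executed j2."""
    sent = j1 >= SEND_INDEX
    received = j2 >= RECEIVE_INDEX
    return ["m"] if sent and not received else []


def is_possible(j1, j2):
    """Assumption: a state in which m was received but never sent cannot occur."""
    return j1 >= SEND_INDEX or j2 < RECEIVE_INDEX


# Each process executes 2 events in this example, so j1 and j2 range over 0..2.
global_states = [s for s in product(range(3), range(3)) if is_possible(*s)]
in_transit = [s for s in global_states if channel_state(*s)]
```

Of the nine index pairs, five remain under this assumption, and (2, 0) is the only one in which m is in the channel.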
Process coordination in the presence of errors
• A critical problem for a distributed system.
• Statement: given two processes connected by a communication channel that can lose a message with probability 𝛆 > 0, no protocol capable of guaranteeing that the two processes will reach agreement exists, regardless of how small the probability 𝛆 is.
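A short calculation hints at why a small 𝛆 does not help. It is an illustration, not a proof of the statement, and it assumes that losses are independent, which the slide does not say. If a message is transmitted n times, the probability that every copy is lost is:

```latex
P(\text{all } n \text{ copies lost}) = \varepsilon^{n} > 0
\quad \text{for every finite } n \text{ and every } \varepsilon > 0 .
```

Retransmission can make failure arbitrarily unlikely, but no finite number of transmissions makes delivery certain.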
Communication in real-life distributed systems
• Messages can be
  • Lost
  • Duplicated
  • Affected by errors
• In practice
  • Each message has a unique name, its sequence number.
  • To confirm reception of a message, the receiver sends an acknowledgment.
  • Each acknowledgment has its own sequence number as well as the sequence number of the message it acknowledges.
  • The sender sets up a timeout for the receipt of an acknowledgment.
  • To deal with lost messages, the sender retransmits the message if the timeout expires.
  • To deal with duplicated messages, the receiver discards messages with a duplicate sequence number.
  • To deal with errors, error detection and possibly error correction are used.
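The practices listed above can be sketched as a small simulation of a sender and a receiver. This is a sketch under stated assumptions, not a real network protocol: the function `transfer`, the scripted `loss_pattern`, and the `max_tries` limit are all hypothetical. A `continue` stands in for an expired timeout, and error detection is left out.

```python
from itertools import cycle


def transfer(messages, loss_pattern=(True, False, True, False, False), max_tries=10):
    """Send messages over a simulated lossy channel; return (delivered, duplicates_discarded)."""
    lost = cycle(loss_pattern)   # scripted losses, one flag per transmission (assumption)
    delivered = []               # payloads accepted by the receiver, in order
    seen = set()                 # sequence numbers already accepted by the receiver
    duplicates = 0
    ack_seq = 0                  # acknowledgments carry their own sequence numbers

    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            if next(lost):
                continue         # message lost: the timeout expires, so retransmit
            # Receiver side: accept new messages, discard duplicates, always acknowledge.
            if seq in seen:
                duplicates += 1
            else:
                seen.add(seq)
                delivered.append(payload)
            ack = (ack_seq, seq)  # (acknowledgment number, acknowledged message number)
            ack_seq += 1
            if next(lost):
                continue         # acknowledgment lost: the timeout expires, so retransmit
            if ack[1] == seq:
                break            # acknowledged: move on to the next message
        else:
            raise TimeoutError(f"message {seq} was not acknowledged in {max_tries} tries")
    return delivered, duplicates


delivered, duplicates = transfer(["a", "b", "c"])
```

With the default pattern, each message is lost once, then delivered with its acknowledgment lost, then retransmitted. The receiver therefore sees every message twice and discards the second copy by its sequence number.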