340 likes | 528 Views
CS 603 Mid-Semester Review. March 4, 2002. One or Two Day Review?. One day: Skim material and Test Overview What to do with Wednesday? More on replication Start on distributed processes Two day: Discuss material to date Wednesday: Finish Review Work out sample question. Basics.
E N D
CS 603Mid-Semester Review March 4, 2002
One or Two Day Review? • One day: Skim material and Test Overview • What to do with Wednesday? • More on replication • Start on distributed processes • Two day: Discuss material to date • Wednesday: • Finish Review • Work out sample question
Basics • Why do we want distributed systems? • Scaling • Heterogeneity • Geographic Distribution • What is a distributed system? • Transparency vs. Exposing Distribution • Hardware Basics • Communication Mechanisms
Basic Software Concepts • Hiding vs. Exposing • Distribution – Distributed OS • Location, but not distribution – Middleware • None – Network OS • Concurrency Primitives • Semaphores • Monitors • Distributed System Models • Client-Server • Multi-Tier • Peer to Peer
Communication Mechanisms • Shared Memory • Enforcement of single-system view • Delayed consistency: δ-Common Storage • Message Passing • Reliability and its limits • Stream-oriented Communications • Remote Procedure Call • Remote Method Invocation
RPC Example: DCE • Language / Platform Independent • Implementation Issues: • Data Conversion • Underlying Mechanisms • Fault Tolerance Approaches
Java RMI • Supports remote invocation of Java objects • Key: Java Object SerializationStream objects over the wire • Language specific • Advantages • True object-orientation: Objects as arguments and values • Mobile behavior: Returned objects can execute on caller • Integrated security • Built-in concurrency (through Java threads) • Disadvantage – Java only • Implementation / Use • Registry
SOAP • Goal: RPC protocol that works over wide area networks • Interoperable • Language independent • Problem: Firewalls • Solution: HTTP/XML • Client side: Ability to generate http calls and listen for response • Server: • Listen for HTTP • Bind to procedure • Respond with HTTP • SOAP message format and use mechanisms
Naming Requirements • Disambiguate only • Access resource given the name • Build a name to find a resource • Do humans need to use name? • Static/Dynamic Resource • Performance Requirements
Naming Approaches • Scope • Global vs. Hierarchical • Unique ID vs. Non-Unique Description • Namespaces • URN, URI, URL • Registries
Registry Example: X.500 • Goal: Global “white pages” • Lookup anyone, anywhere • Developed by Telecommunications Industry • ISO standard directory for OSI networks • Idea: Distributed Directory • Application uses Directory User Agent to access a Directory Access Point
Directory Information Base(X.501) • Tree structure • Root is entire directory • Levels are “groups” • Country • Organization • Individual • Entry structure • Unique name • Build from tree • Attributes: Type/value pairs • Schema enforces type rules • Alias entries
X.500 • Directory Entry: • Organization level – CN=Purdue University, L=West Lafayette • Person level – CN=Chris Clifton, SN=Clifton, TITLE=Associate Professor • Directory Operations • Query, Modify • Authorization / Access control • To directory • Directory as mechanism to implement for others
X.500 – Distributed Directory • Directory System Agent • Referrals • Replication • Cache vs. Shadow copy • Access control • Modifications at Master only • Consistency • Each entry must be internally consistent • DSA giving copy must identify as copy
X.500 Subsets • LDAP • X.500 without OSI • Intended for use over IP • Active Directory • Microsoft’s answer to LDAP • Extensible “default” naming schema • Limited replication facilities
Clock Synchronization • Definition: All nodes agree on time • What do we mean by time? • What do we mean by agree? • Lamport Definition: Events • Events partially ordered • Clock “counts” the order
Event-based definition(Lamport ’78) Define partial order of processes • A B: A “happened before” B: Smallest relation such that: • If A and B in same process and A occurs first, A B • If A is sending a message and B is receipt of a message, A B • If A B and B C, then A C • Clock: C(x) is time x occurs: • C(x) = Ci(x) where x running on node i. • Clocks correct if a,b: ab C(a) < C(b)
Lamport Clock Implementation • Node i Increments Ci between any two successive events • If event a is sending of a message m from i to j, • m contains timestamp Tm = Ci(a) • Upon receiving m, set Cj≥ current Cj and > Tm • Can now define total ordering. a b iff: • Ci(a) < Cj(b) • Ci(a) = Cj(b) and Pi < Pj
What if we want “wall clock” time? • Ci must run at correct rate: • κ << 1 such that | dCi(t)/dt – 1 | < κ • Synchronized: • small ε such that i,j: | Ci(t) – Cj(t) | < ε • Assume transmission time between μ and μ+ξ • Algorithm: Upon receiving message m,set Cj(t) = max(Cj(t), Tm+μ) • Theorem: Assume every τ seconds a message with unpredictable delay ξ is sent over every arc. Then t ≥ t0 + τd, ε≈ d(2κτ + ξ)
Clock Synchronization:Limits • Best Possible: Delay Uncertainty • Actually ε(1 – 1/n) • Synchronization with Faults • Faulty clock • Communication Failure • Malicious processor • Worst case: Can only synchronize if < 1/3 processors faulty • Better if clocks can be authenticated
Real example: NTP I doubt you need to review this...
Process Synchronization • Problem: Shared Resources • Model as sequential or parallel process • Assumes global state! • Alternative: Mutual Exclusion when Needed • Coordinator approach • Token Passing • Timestamp
Mutual Exclusion • Requirements • Does it guarantee mutual exclusion? • Does it prevent starvation? • Is it fair? • Does it scale? • Does it handle failures?
CS 603Mid-Semester Review March 6, 2002
Mutual Exclusion:Colored Ticket Algorithm • Goals: • Decentralized • Fair • Fault tolerant • Space Efficient • Idea: Numbered Tickets • Next number gets resource • Problem: Unbounded Space • Solution: Reissue blocks
Multi-ResourceMutual Exclusion • New Problem: Deadlock • Processes using all resources • Each needs additional resource to proceed • Dining Philosophers Problem • Coordinated vs. truly distributed solutions • Problems with deterministic solutions • Probabilistic solution – Lehman & Rabin • Starvation / fairness properties
Distributed Transactions • ACID properties • Issues: • Commit Protocols • Fault Tolerance Why is this enough? • Failure Models and Limitations • Mechanisms: • Two-phase commit • Three-phase commit
Two-Phase Commit(Lamport ’76, Gray ’79) • Central coordinator initiates protocol • Phase 1: • Coordinator asks if participants can commit • Participants respond yes/no • Phase 2: • If all votes yes, coordinator sends Commit • Participants respond when done • Blocks on failure • Participants must replace coordinator • If participant and coordinator fail, wait for recovery • While blocked, transaction must remain Isolated • Prevents other transactions from completing
Transaction Model • Transaction Model • Global Transaction State • Reachable State Graph • Local states potentially concurrent if a reachable global state contains both local states • Concurrency set C(s) is all states potentially concurrent with s • Sender set S(s) = {local states t | t sends m and s can receive m} • Failure Model • Site failure assumed when expected message not received in time • Independent Recovery
Problems with 2-PC • Blocking on failure • 3-PC as solution • Theorems on recovery limits • Independent recovery: No two-site failure • Non-independent recovery • Anything short of total failure okay • Recovery protocol for total failure
c1 a1 c2 a2 3PC assuming timeout on receipt of message Coordinator Participant q1 q2 start xact/ no start xact/ yes xact request/ start xact abort/ - w1 w2 no/ abort yes/ pre-commit pre-commit/ ack p1 p2 ack/commit commit/ -
Termination Protocol • If participant times out in w2 or p2: • Elect new Coordinator If coordinator alive, would have committed/aborted • New coordinator requests state of all processes. Termination rules: • If any aborted, broadcast abort • If any committed, broadcast commit • If all w2, broadcast abort • If any p2, send pre-commit and enter state p1 • Complete failure protocol
Test Basics • Mechanics: Open book/notes • No electronic aids • Two questions • Each multi-part • Will include scoring suggestions • Underlying question: Do you understand the material? • No need to regurgitate “best in literature” answer • Reasonable self-designed solution fine • Key: Do you really understand your answer • Can you build CORRECT distributed systems?
Develop synchronization protocol for a four processor system with fully-connected processors. Linear envelope of real time Bounded difference between clocks on correct processors. Time set to 0 when the protocol begins (but not synchronized). Assume: Clocks don't drift Messages take between time 0 and e At most one faulty processor No authentication Discuss the correctness of your algorithm, including the types of faults handled. Scoring: Protocol: Up to five points Argument for correctness: 2 points requires believable proof sketch for full 2 points Faults supported / not supported: 1-3 points 3 points requires proof sketch that it handles supported faults and examples showing failure with unsupported fault types. Sample Question:Clock Synchronization