CS 603 Review April 24, 2002
Seminar Announcements • Saurabh Bagchi, “Hierarchical Error Detection in a Distributed Software Implemented Fault Tolerance (SIFT) Environment” • April 25, 10:30-11:30, MSEE 239 • Fabian E. Bustamante, “The Active Streams Approach to Adaptive Distributed Systems” • April 29, 10:30-11:30, CS 101
Review • Why do we want distributed systems? • Scaling • Heterogeneity • Geographic Distribution • What is a distributed system? • Transparency vs. Exposing Distribution • Hardware Basics • Communication Mechanisms
Basic Software Concepts • Hiding vs. Exposing • Distribution – Distributed OS • Location, but not distribution – Middleware • None – Network OS • Concurrency Primitives • Semaphores • Monitors • Distributed System Models • Client-Server • Multi-Tier • Peer to Peer
Communication Mechanisms • Shared Memory • Enforcement of single-system view • Delayed consistency: δ-Common Storage • Message Passing • Reliability and its limits • Stream-oriented Communications • Remote Procedure Call • Remote Method Invocation
RPC Mechanisms • DCE • Language / Platform Independent • Implementation Issues: • Data Conversion • Underlying Mechanisms • Fault Tolerance Approaches • Java RMI • SOAP • Interoperable • Language independent • Transport independent (anything that moves XML)
Naming Requirements • Disambiguate only • Access resource given the name • Build a name to find a resource • Do humans need to use name? • Static/Dynamic Resource • Performance Requirements
Registry Example: X.500 • Goal: Global “white pages” • Lookup anyone, anywhere • Developed by Telecommunications Industry • ISO standard directory for OSI networks • Idea: Distributed Directory • Application uses Directory User Agent to access a Directory Access Point • Basis for LDAP, ActiveDirectory
Directory Information Base (X.501) • Tree structure • Root is entire directory • Levels are “groups” • Country • Organization • Individual • Entry structure • Unique name • Built from tree • Attributes: Type/value pairs • Schema enforces type rules • Alias entries
X.500 • Directory Entry: • Organization level – CN=Purdue University, L=West Lafayette • Person level – CN=Chris Clifton, SN=Clifton, TITLE=Associate Professor • Directory Operations • Query, Modify • Authorization / Access control • To directory • Directory as mechanism to implement for others
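The idea of building an entry's unique name from its position in the tree can be sketched as follows. This is a minimal illustration, not the X.500 protocol: the `Entry` class is hypothetical, the country level `C=US` is an assumption added to complete the path, and joining components root-first is just one possible rendering.

```python
class Entry:
    """One node of a toy directory tree with type/value attributes."""

    def __init__(self, rdn, **attributes):
        self.rdn = rdn                # relative name, unique among siblings
        self.attributes = attributes  # type/value pairs
        self.children = {}
        self.parent = None

    def add(self, child):
        child.parent = self
        self.children[child.rdn] = child
        return child

    def name(self):
        """Unique name: relative names on the path from the root."""
        parts = []
        node = self
        while node.parent is not None:  # the root contributes no component
            parts.append(node.rdn)
            node = node.parent
        return ", ".join(reversed(parts))


def lookup(root, rdns):
    """Walk down the tree one relative name at a time; None if missing."""
    node = root
    for rdn in rdns:
        node = node.children.get(rdn)
        if node is None:
            return None
    return node


root = Entry("")
us = root.add(Entry("C=US"))  # assumed country level, not on the slide
purdue = us.add(Entry("CN=Purdue University", L="West Lafayette"))
clifton = purdue.add(
    Entry("CN=Chris Clifton", SN="Clifton", TITLE="Associate Professor")
)
full_name = clifton.name()
found = lookup(root, ["C=US", "CN=Purdue University", "CN=Chris Clifton"])
```

A real directory would also enforce schema rules on the attributes and resolve alias entries, both of which this sketch leaves out.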
X.500 – Distributed Directory • Directory System Agent • Referrals • Replication • Cache vs. Shadow copy • Access control • Modifications at Master only • Consistency • Each entry must be internally consistent • DSA giving copy must identify as copy
Clock Synchronization • Definition: All nodes agree on time • What do we mean by time? • What do we mean by agree? • Lamport Definition: Events • Events partially ordered • Clock “counts” the order
Event-based definition (Lamport ’78) Define partial order of events • A → B: A “happened before” B: Smallest relation such that: • If A and B are in the same process and A occurs first, A → B • If A is the sending of a message and B is the receipt of the same message, A → B • If A → B and B → C, then A → C • Clock: C(x) is time x occurs: • C(x) = Ci(x) where x is an event on node i. • Clocks correct if ∀a,b: a → b ⇒ C(a) < C(b)
Lamport Clock Implementation • Node i increments Ci between any two successive events • If event a is sending of a message m from i to j, • m contains timestamp Tm = Ci(a) • Upon receiving m, set Cj ≥ current Cj and > Tm • Can now define total ordering. a ⇒ b iff: • Ci(a) < Cj(b), or • Ci(a) = Cj(b) and Pi < Pj
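The clock rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, and `max(time, tm) + 1` is one common way to satisfy "at least the current Cj and greater than Tm", not the only one.

```python
class LamportClock:
    """Logical clock for one node, following the slide's rules."""

    def __init__(self, node_id):
        self.node_id = node_id  # used only to break ties in the total order
        self.time = 0

    def tick(self):
        """Local event: advance the clock and return the new time."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value is the timestamp Tm."""
        return self.tick()

    def receive(self, tm):
        """Receipt: move past both the current value and Tm."""
        self.time = max(self.time, tm) + 1
        return self.time

    def stamp(self):
        """(time, node_id) tuples compare lexicographically: the total order."""
        return (self.time, self.node_id)


p1, p2 = LamportClock(1), LamportClock(2)
a = p1.tick()        # local event on node 1
tm = p1.send()       # node 1 sends a message carrying Tm
p2.tick()            # unrelated local event on node 2
b = p2.receive(tm)   # node 2 receives the message
```

Because the send happened before the receipt, the receiver's clock value `b` ends up greater than `tm`, which is exactly the correctness condition C(a) < C(b).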
What if we want “wall clock” time? • Ci must run at correct rate: • ∃ κ << 1 such that | dCi(t)/dt – 1 | < κ • Synchronized: • ∃ small ε such that ∀i,j: | Ci(t) – Cj(t) | < ε • Assume transmission time between μ and μ+ξ • Algorithm: Upon receiving message m, set Cj(t) = max(Cj(t), Tm+μ) • Theorem: Assume every τ seconds a message with unpredictable delay ξ is sent over every arc. Then ∀t ≥ t0 + τd, ε ≈ d(2κτ + ξ), where d is the diameter of the network
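To make the receive rule and the bound concrete, here is a small sketch. The function names are hypothetical and the numbers (drift 10^-6, a message every 10 seconds, 5 ms of delay uncertainty, diameter 3) are illustrative assumptions, not values from the lecture.

```python
def on_receive(cj, tm, mu):
    """Receive rule: never move backwards, but catch up to Tm plus the
    minimum transmission time mu."""
    return max(cj, tm + mu)


def skew_bound(d, kappa, tau, xi):
    """Approximate synchronization bound from the theorem: d(2*kappa*tau + xi)."""
    return d * (2 * kappa * tau + xi)


# Receiver is behind: its clock jumps forward to Tm + mu.
ahead = on_receive(cj=10.0, tm=9.8, mu=0.5)
# Receiver is already ahead of Tm + mu: its clock is left alone.
unchanged = on_receive(cj=10.0, tm=9.0, mu=0.5)

# Assumed parameters: diameter 3, drift 1e-6, resync every 10 s, 5 ms uncertainty.
eps = skew_bound(d=3, kappa=1e-6, tau=10.0, xi=0.005)
```

With these assumed numbers the delay uncertainty ξ dominates the bound; the drift term 2κτ contributes only a few tens of microseconds per hop.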
Clock Synchronization: Limits • Best Possible: Delay Uncertainty • Actually ε(1 – 1/n) • Synchronization with Faults • Faulty clock • Communication Failure • Malicious processor • Worst case: Can only synchronize if < 1/3 of processors are faulty • Better if clocks can be authenticated
Process Synchronization • Problem: Shared Resources • Model as sequential or parallel process • Assumes global state! • Alternative: Mutual Exclusion when Needed • Coordinator approach • Token Passing • Timestamp
Mutual Exclusion • Requirements • Does it guarantee mutual exclusion? • Does it prevent starvation? • Is it fair? • Does it scale? • Does it handle failures?
Mutual Exclusion: Colored Ticket Algorithm • Goals: • Decentralized • Fair • Fault tolerant • Space Efficient • Idea: Numbered Tickets • Next number gets resource • Problem: Unbounded Space • Solution: Reissue blocks
Multi-Resource Mutual Exclusion • New Problem: Deadlock • Processes using all resources • Each needs additional resource to proceed • Dining Philosophers Problem • Coordinated vs. truly distributed solutions • Problems with deterministic solutions • Probabilistic solution – Lehmann & Rabin • Starvation / fairness properties
Distributed Transactions • ACID properties • Issues: • Commit Protocols • Fault Tolerance • Why is this enough? • Failure Models and Limitations • Mechanisms: • Two-phase commit • Three-phase commit
Two-Phase Commit (Lamport ’76, Gray ’79) • Central coordinator initiates protocol • Phase 1: • Coordinator asks if participants can commit • Participants respond yes/no • Phase 2: • If all votes yes, coordinator sends Commit; otherwise it sends Abort • Participants respond when done • Blocks on failure • Participants must replace coordinator • If participant and coordinator fail, wait for recovery • While blocked, transaction must remain Isolated • Prevents other transactions from completing
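The failure-free message flow of the two phases can be sketched as below. This is a minimal in-process illustration with hypothetical class and function names; it assumes no site or communication failures, so it shows none of the blocking behavior the slide warns about.

```python
class Participant:
    """A site taking part in the transaction (no failures modeled)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        """Phase 1: vote yes/no. A yes voter must stay ready to commit."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        """Phase 2: apply the coordinator's decision and acknowledge."""
        self.state = decision
        return True


def two_phase_commit(participants):
    """Coordinator: collect every vote, then broadcast a single decision."""
    votes = [p.prepare() for p in participants]           # phase 1
    decision = "committed" if all(votes) else "aborted"
    acks = [p.finish(decision) for p in participants]     # phase 2
    assert all(acks)
    return decision


happy = [Participant("A"), Participant("B")]
outcome_ok = two_phase_commit(happy)

mixed = [Participant("A"), Participant("B", can_commit=False)]
outcome_bad = two_phase_commit(mixed)
```

The blocking problem arises between the two phases: a participant in the `prepared` state cannot decide on its own and must hold its locks until it learns the decision.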
Transaction Model • Transaction Model • Global Transaction State • Reachable State Graph • Local states potentially concurrent if a reachable global state contains both local states • Concurrency set C(s) is all states potentially concurrent with s • Sender set S(s) = {local states t | t sends m and s can receive m} • Failure Model • Site failure assumed when expected message not received in time • Independent Recovery
Problems with 2-PC • Blocking on failure • 3-PC as solution • Theorems on recovery limits • Independent recovery: No two-site failure • Non-independent recovery • Anything short of total failure okay • Recovery protocol for total failure
Data Replication • Fault Tolerance • Hot backup • Catastrophic failure • Performance • Parallelism • Decreased reliance on network • Correctness criterion: Replication invisible • One-copy serializability (1SR)
Data Replication: How? • Goal: Ensure one-copy serializability • Write-all solution: All copies identical • Write goes to every site • Read from any site • Standard single-copy concurrency control • Guarantees 1SR • Single-copy concurrency control gives serializable execution • Equivalent to serial execution where all writes happen in one transaction
Problem: Site Failure • Failure causes write to block • Must maintain locks • Clogs up entire system • Is this fault tolerance? • What about “write all available”? • T0: w0[xA] w0[xB] w0[yC] c0 • B fails • T1: r1[yC] w1[xA] c1 • B recovers • T2: r2[xB] w2[yC] c2 • What is the serial equivalent order?
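One way to answer the question on this slide is to try every serial order of T0, T1, T2 on a single logical copy and compare its reads-from relation with what the replicated execution actually produced. The sketch below does that by brute force; the encoding of the history is an assumption made for illustration. In the replicated run, r1[yC] reads T0's write, and r2[xB] also reads T0's write because B was down when T1 wrote x.

```python
from itertools import permutations

# One-copy view of each transaction: items read, then items written.
transactions = {
    "T0": {"reads": [], "writes": ["x", "y"]},
    "T1": {"reads": ["y"], "writes": ["x"]},
    "T2": {"reads": ["x"], "writes": ["y"]},
}

# Reads-from relation observed in the replicated execution.
observed = {("T1", "y"): "T0", ("T2", "x"): "T0"}


def reads_from(order):
    """Reads-from relation of a serial execution on a single copy."""
    last_writer = {}
    result = {}
    for t in order:
        for item in transactions[t]["reads"]:
            result[(t, item)] = last_writer.get(item)  # None = initial value
        for item in transactions[t]["writes"]:
            last_writer[item] = t
    return result


# Serial orders whose reads match what the replicated run observed.
equivalent = [
    order for order in permutations(transactions)
    if reads_from(order) == observed
]
```

No serial order survives: T2 must precede T1 to read x from T0, while T1 must precede T2 to read y from T0. Under this encoding, "write all available" alone does not give one-copy serializability.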
Solutions • Validate availability on commit • Check if any failed writes now available • Check that all sites read or written still available • Enforces serializability for site failures • Doesn’t work with communication failures!
Formalisms for Relaxed consistency • Goal: Relaxed consistency constraints • Meet application needs • Outperform true transparent replication • How do we ensure constraints meet needs? • Formalisms to describe application needs • Methods to prove constraints adequate
Quasi-Copies (Alonso, Barbará, Garcia-Molina ’90) • Data Caching • Each site keeps copy of data likely to be used locally • Propagation cost of writes high • User-Defined Cache • Controlled Divergence • Weak consistency constraints • Bounds on the differences between copies • User defines constraints
Assumptions • Read-only copies • Updates sent to master copy • E.g., ORACLE Materialized View • User Specified Coherency • Strict limits • “Hints” • Example: Stock Purchase • Place order based on delayed price • Limit order to ensure price paid okay
Selection Conditions • Identification clause • Select/Project Query • Modifier Clause • Add / drop from cache • Compulsory or advisory cache • Static / Dynamic: As new objects meet the identification clause, are they cached? • Triggering delay on dynamic
Coherency Conditions • Default (always enforced): Value was true once • Delay W(x,α): Max time lag • Version V(x): Number of updates • Periodic P(x): Time for refresh • Arithmetic A(x): Bounded Difference • Combine conditions with logical operators • Multi-object conditions • Consistency conditions on a group • Order of application in a group
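A few of these conditions can be sketched as simple predicates. Everything here is a hypothetical illustration: the `Master` and `QuasiCopy` records, the idea of tracking when the copy first became stale, and the stock-price numbers are assumptions, and the sketch presumes a checker that can see both the master and the cached copy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Master:
    value: float
    version: int
    stale_since: Optional[float]  # time of first update the copy has not seen


@dataclass
class QuasiCopy:
    value: float
    version: int


def delay_ok(master, now, alpha):
    """Delay condition: the copy lags the master by at most alpha time units."""
    return master.stale_since is None or now - master.stale_since <= alpha


def version_ok(master, copy, max_missed):
    """Version condition: the copy has missed at most max_missed updates."""
    return master.version - copy.version <= max_missed


def arithmetic_ok(master, copy, max_diff):
    """Arithmetic condition: values differ by at most max_diff."""
    return abs(master.value - copy.value) <= max_diff


master = Master(value=101.5, version=7, stale_since=100.0)
copy = QuasiCopy(value=100.0, version=5)

within_delay = delay_ok(master, now=130.0, alpha=60.0)
within_versions = version_ok(master, copy, max_missed=1)
within_price = arithmetic_ok(master, copy, max_diff=2.0)

# Conditions combine with ordinary logical operators.
coherent = within_delay and (within_versions or within_price)
```

In this example the copy has missed two updates, so the version condition fails, but the combined condition still holds because the price is within the arithmetic bound.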
CS 603 Review April 26, 2002
Remote Operation Mechanisms • Client-Server Model: • Remote Procedure Call • Problem: Remote site must already know what we want to do! • Process consists of: • Code • Resources (files, devices, etc.) • Execution (data, stack, registers, etc.) • Fork copies everything • Is this needed? • Solution: Copy part of the process
So where are we? • Models for Remote Processing • Server: Request documented service • RPC: Request execution of existing procedure • What if operation we want isn’t offered remotely? • Solution: Agents / Code Migration
Types of Code Migration From Andrew Tanenbaum, Distributed Operating Systems, 1995.
DCOM – What is it? • Start with COM – Component Object Model • Language-independent object interface • Add interprocess communication
DCOM: Distributed COM • Looks like COM to the client • Built on DCE RPC • Extends to support full COM functionality
Locating Objects: Activation • CoCreateInstance(Ex)(<CLSID>) • Interface pointer to uninitialized instance • Same as COM • CoGetInstanceFromFile, FromStorage • Create new instance • CoGetClassObject(<CLSID>) • Factory object that creates objects of <CLSID> • CoGetClassObjectFromURL • Downloads necessary code from URL and instantiates • Can take server name as parameter • Or default to server specified in DCOM configuration on client machine [HKEY_CLASSES_ROOT\APPID\{<appid-guid>}] "RemoteServerName"="<DNS name>" • Also store information in ActiveDirectory
DCOM vs. CORBA • CORBA • Single interface name • Multiple inheritance • Dynamic Invocation Interface • C++-style exception handling • Explicit and implicit reference counts • Implemented by ORB with replaceable services • DCOM • Distinction between class and instance identifier • Implement multiple interfaces • Type libraries for on-demand marshaling • 32-bit error code • Explicit reference count only • Implemented by many independent services
What is .NET? • Language for distributed computation • C#, VB.NET, JScript • Protocols • SOAP, HTTP • Run-time environment • Common Language Runtime (CLR) • ActiveDirectory • Web Servers (ASP.NET)
COM/DCOM vs. .NET • COM/DCOM • IDL • Name, Monikers • Registry / ActiveDirectory • C++, Visual Basic • DCE RPC • DCOM network protocol (based on DCE standards) • .NET • Web Services Description Language (WSDL) • DISCO (URI grammar) • Universal Description Discovery and Integration (UDDI) • C#, VB.NET • SOAP • HTTP (presumed ubiquitous), SMTP (!?)
How .NET works • Query UDDI directory to get service location • Query service to get WSDL (interface specification) • Build call (XML) based on WSDL spec. • Make call using SOAP • Parse XML results based on WSDL spec.
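The last three steps (build the XML call, send it with SOAP, parse the XML result) can be sketched with the standard library alone. This is a rough illustration under stated assumptions: the service namespace `urn:example:stock`, the `GetPrice` operation, and its `Symbol` and `Price` elements are all hypothetical stand-ins for what a real WSDL description would specify, and the response is canned rather than fetched over HTTP. The UDDI and WSDL queries are not shown.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "urn:example:stock"                       # hypothetical service


def build_call(operation, params):
    """Wrap an operation and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_result(xml_text, result_tag):
    """Pull one result element out of a SOAP response, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//{{{SERVICE_NS}}}{result_tag}")
    return None if node is None else node.text


request = build_call("GetPrice", {"Symbol": "XYZ"})

# Canned response standing in for what the service would send back.
response = (
    f'<s:Envelope xmlns:s="{SOAP_NS}"><s:Body>'
    f'<m:GetPriceResponse xmlns:m="{SERVICE_NS}">'
    f"<m:Price>42.5</m:Price>"
    f"</m:GetPriceResponse></s:Body></s:Envelope>"
)
price = parse_result(response, "Price")
```

In practice the element names and types would come from the service's WSDL rather than being hard-coded, and the request string would be posted to the service location returned by the directory lookup.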
Jini: Java Middleware • Tools to construct federation • Multiple devices, each with Java Virtual Machine • Multiple services • Uses (doesn’t replace) Java RMI • Adds infrastructure to support distribution • Registration • Lookup • Security
Service • Basic “unit” of Jini system • Members provide services • Federate to share access to services • Services combined to accomplish tasks • Communicate using service protocol • Initial set defined • Add more on the fly
Infrastructure: Key Components • RMI • Basic communication model • Distributed Security System • Integrated with RMI • Extends JVM security model • Discovery/join protocol • How to register and advertise services • Lookup services • Returns object implementing service (really a local proxy)