Distributed Systems Go Over Minqi Zhou(周敏奇) mqzhou@sei.ecnu.edu.cn Room 111 (East) Mathematics Building 021-32204750-167
Overview • Why distributed systems • Naming • Communication • Synchronization • Security
What can we do now that we could not do before? Technology advances: • Networking • Processors • Memory • Storage • Protocols
Flynn's Taxonomy (1972): number of instruction streams and number of data streams
SISD • traditional uniprocessor system
SIMD • array (vector) processor • examples: APU (attached processor unit in the Cell processor), SSE3 (Intel's Streaming SIMD Extensions), PowerPC AltiVec (Velocity Engine)
MISD • generally not used and doesn't make sense • sometimes applied to classifying redundant systems
MIMD • multiple computers, each with: program counter, program (instructions), data • parallel and distributed systems
Subclassifying MIMD
memory • shared memory systems: multiprocessors • no shared memory: networks of computers, multicomputers
interconnect • bus • switch
delay/bandwidth • tightly coupled systems • loosely coupled systems
You know you have a distributed system when the crash of a computer you’ve never heard of stops you from getting any work done. – Leslie Lamport
Coupling Tightly versus loosely coupled software Tightly versus loosely coupled hardware
Design issues: Transparency
High level: hide distribution from users
Low level: hide distribution from software
• Location transparency: users don't care where resources are
• Migration transparency: resources move at will
• Replication transparency: users cannot tell whether there are copies of resources
• Concurrency transparency: users share resources transparently
• Parallelism transparency: operations take place in parallel without the user's knowledge
Design issues Reliability • Availability: fraction of time system is usable • Achieve with redundancy • Reliability: data must not get lost • Includes security Performance • Communication network may be slow and/or unreliable Scalability • Distributable vs. centralized algorithms • Can we take advantage of having lots of computers?
Centralized model • No networking • Traditional time-sharing system • Direct connection of user terminals to system • One or several CPUs • Not easily scalable • Limiting factor: number of CPUs in system • Contention for same resources
Client-server model
(figure: clients connected to a file server, print server, and directory server)
Environment consists of clients and servers
Service: task a machine can perform
Server: machine that performs the task
Client: machine that is requesting the service
Workstation model: assume the client is used by one user at a time
Peer to peer model • Each machine on network has (mostly) equivalent capabilities • No machines are dedicated to serving others • E.g., collection of PCs: • Access other people’s files • Send/receive email (without server) • Gnutella-style content sharing • SETI@home computation
Processor pool model What about idle workstations(computing resources)? • Let them sit idle • Run jobs on them Alternatively… • Collection of CPUs that can be assigned processes on demand • Users won’t need heavy duty workstations • GUI on local machine • Computation model of Plan 9
Grid computing Provide users with seamless access to: • Storage capacity • Processing • Network bandwidth Heterogeneous and geographically distributed systems
Naming things • User names • Login, email • Machine names • rlogin, email, web • Files • Devices • Variables in programs • Network services
Naming Service Allows you to look up names • Often returns an address as a response Might be implemented as • Search through file • Client-server program • Database query • …
What’s a name? RFC 1498: Inter-network Naming, addresses, routing Name: identifies what you want Address: identifies where it is Route: identifies how to get there Binding: associates a name with an address • “choose a lower-level-implementation for a higher-level semantic construct”
Names Need names for: • Services: e.g., time of day • Nodes: computer that can run services • Paths: route • Objects within service: e.g. files on a file server Naming convention can take any format • Ideally one that will suit application and user • E.g., human readable names for humans, binary identifiers for machines
Flat naming
Problem: given an essentially unstructured name (e.g., an identifier), how can we locate its associated access point?
• Simple solutions (broadcasting)
• Home-based approaches
• Distributed hash tables (structured P2P)
• Hierarchical location service
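The structured-P2P (DHT) idea above can be sketched with a toy hash ring in Python: hash the flat identifier onto a numeric ring and let the first node at or past that position own it. The node names and the 16-bit ring size here are illustrative assumptions, not part of any real DHT protocol.

```python
import hashlib
import bisect

def ring_position(name: str) -> int:
    """Map an unstructured identifier onto a position on a small numeric ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % 2**16

class Ring:
    """Toy DHT: each node owns the identifiers up to its ring position."""
    def __init__(self, nodes):
        self.points = sorted((ring_position(n), n) for n in nodes)
        self.keys = [p for p, _ in self.points]

    def locate(self, name: str) -> str:
        # First node clockwise from the identifier's position (wrap at the end).
        i = bisect.bisect_left(self.keys, ring_position(name)) % len(self.points)
        return self.points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.locate("movie.mp4"))   # same identifier always maps to the same node
```

A lookup needs no broadcast and no central directory: any party that knows the node set (or, in a real DHT, a few ring neighbors) can compute where the name lives.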
Problems with sockets
The sockets interface is straightforward: • [connect] • read/write • [disconnect]
BUT… it forces a read/write mechanism • we usually use a procedure call
To make distributed computing look more like centralized computing, I/O is not the way to go
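The forced read/write style is easy to see in code. This minimal sketch uses a `socketpair` to stand in for a connected client and server; the request and reply strings are made up for the demo. Note the programmer must frame, send, and parse raw bytes by hand rather than just calling a function.

```python
import socket

# A connected socket pair stands in for a client/server connection.
client, server = socket.socketpair()

client.sendall(b"what's the time?")   # client writes raw bytes...
request = server.recv(1024)           # ...server reads raw bytes
server.sendall(b"3:42:19")            # server writes the reply bytes
reply = client.recv(1024)             # client reads them back
print(reply.decode())

client.close()
server.close()
```

Everything is explicit I/O; nothing here looks like `get_time()`. Hiding this plumbing behind an ordinary-looking call is exactly what RPC (next slides) does.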
Remote Procedure Call
1984: Birrell & Nelson • mechanism to call procedures on other machines
Goal: it should appear to the programmer that a normal call is taking place
Implementing RPC The trick: Create stub functions to make it appear to the user that the call is local Stub function contains the function’s interface
Stub functions
(figure: on the client, client code → client stub → network routines; across the network to the server's network routines → server stub (skeleton) → server functions; the stubs marshal the call and unmarshal the return on each side)
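The stub idea can be sketched in a few lines of Python. This is a toy, not a real RPC system: the "network" is a direct function call, `pickle` stands in for a real marshalling format, and the `add` procedure and dispatch table are invented for the example.

```python
import pickle

def add(a, b):
    """The 'remote' procedure, registered on the server side."""
    return a + b

SERVER_TABLE = {"add": add}

def server_stub(request: bytes) -> bytes:
    """Skeleton: unmarshal the request, call the real function, marshal the result."""
    name, args = pickle.loads(request)
    return pickle.dumps(SERVER_TABLE[name](*args))

def client_stub(name, *args):
    """Client stub: marshal the call, 'send' it, unmarshal the reply."""
    request = pickle.dumps((name, args))
    reply = server_stub(request)       # a real RPC would cross the network here
    return pickle.loads(reply)

print(client_stub("add", 2, 3))  # looks just like a local call: 5
```

To the caller, `client_stub("add", 2, 3)` reads like an ordinary invocation; all the marshalling and transport is hidden in the stubs, which is the whole point of the mechanism.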
Parameter passing Pass by value • Easy: just copy data to network message Pass by reference • Makes no sense without shared memory
Representing data
There are no incompatibility problems on a local system
A remote machine may have: • different byte ordering • different sizes of integers and other types • different floating point representations • different character sets • alignment requirements
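Byte ordering is the easiest of these problems to demonstrate. A minimal sketch using Python's `struct` module: pack the same 32-bit integer in network (big-endian) order and in little-endian order, as an x86 host would store it, and compare the wire bytes.

```python
import struct

value = 0x12345678
big    = struct.pack(">I", value)   # network (big-endian) byte order
little = struct.pack("<I", value)   # little-endian, e.g. x86 host order

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# As long as both sides agree on the wire order, the value survives the trip:
assert struct.unpack(">I", big)[0] == value
```

This is why marshalling layers (XDR, protobuf, etc.) fix a canonical representation: each machine converts between its native formats and the agreed-on wire format, so heterogeneity stays invisible to the application.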
Schedules
• Transactions must be scheduled so that the result is serially equivalent
• Use mutual exclusion to ensure that only one transaction executes at a time
• or…
• Allow multiple transactions to execute concurrently • but ensure serializability • concurrency control • schedule: valid order of interleaving
Methods • two-phase locking • strict two-phase locking • read/write locks • two-version locking
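The growing/shrinking discipline of (strict) two-phase locking can be sketched as follows. This is a single-threaded illustration of the locking rule, not a real lock manager: the lock table, the transaction class, and the item names are all invented for the example.

```python
class TwoPhaseTransaction:
    """Strict 2PL sketch: acquire locks as you go, release only at commit."""

    def __init__(self, lock_table):
        self.lock_table = lock_table   # item -> owning transaction (or None)
        self.held = set()

    def lock(self, item):
        # Growing phase: locks are only ever acquired, never released early.
        owner = self.lock_table.get(item)
        if owner is not None and owner is not self:
            raise RuntimeError(f"{item} is locked by another transaction")
        self.lock_table[item] = self
        self.held.add(item)

    def commit(self):
        # Shrinking phase (strict variant): release everything at once at commit.
        for item in self.held:
            self.lock_table[item] = None
        self.held.clear()

locks = {}
t1 = TwoPhaseTransaction(locks)
t1.lock("x")
t1.lock("y")     # growing phase: both locks held until commit
t1.commit()      # shrinking phase: all locks released together
```

Holding every lock until commit (the strict variant) is what prevents other transactions from reading uncommitted values, at the cost of reduced concurrency and the possibility of deadlock.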
Physical clocks in computers
Real-time clock: CMOS clock (counter) circuit driven by a quartz oscillator • battery backup to continue measuring time when power is off
The OS generally programs a timer circuit to generate an interrupt periodically • e.g., 60, 100, 250, 1000 interrupts per second (Linux 2.6+ adjustable up to 1000 Hz) • Programmable Interval Timer (PIT): Intel 8253, 8254 • the interrupt service procedure adds 1 to a counter in memory
Problem: getting two systems to agree on time
Two clocks hardly ever agree • quartz oscillators oscillate at slightly different frequencies
Clocks tick at different rates, creating an ever-widening gap in perceived time: clock drift
The difference between two clocks at one point in time: clock skew
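A quick back-of-the-envelope sketch of how drift accumulates into skew. The drift rates below (50 ppm fast, 20 ppm slow) are assumed for illustration; they are within the tolerance of ordinary quartz oscillators.

```python
# A clock running at rate (1 + rho) gains rho seconds per real second.
rho_a = 50e-6     # clock A: 50 ppm fast (assumed)
rho_b = -20e-6    # clock B: 20 ppm slow (assumed)

elapsed = 3600.0  # one hour of real time

# Skew between the two clocks grows linearly with elapsed time.
skew = (rho_a - rho_b) * elapsed
print(f"skew after one hour: {skew * 1000:.1f} ms")   # 252.0 ms
```

Even modest ppm-level drift produces a quarter-second gap in an hour, which is why clocks must be resynchronized periodically rather than set once.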
RPC: the simplest synchronization technique • issue an RPC to obtain the time • set the local clock to the reply
(figure: client asks the server "what's the time?"; server replies "3:42:19")
Does not account for network or processing latency
Cristian's algorithm
Compensate for delays • note the times: request sent at T0, reply received at T1 • assume network delays are symmetric
(figure: client sends the request at T0; the server reads its clock, Tserver; the client receives the reply at T1)
Estimated overhead in each direction: (T1 − T0) / 2
Client sets its time to: Tnew = Tserver + (T1 − T0) / 2
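The adjustment is a one-liner once the timestamps are in hand. In this sketch the server is faked as a local function whose clock runs 2.5 seconds ahead (an assumption for the demo); a real client would read Tserver out of the reply message.

```python
import time

def fake_server_time() -> float:
    """Stand-in for the server: its clock runs 2.5 s ahead (assumed)."""
    return time.time() + 2.5

t0 = time.time()                # T0: request sent
t_server = fake_server_time()   # server reads its clock while handling the request
t1 = time.time()                # T1: reply received

# Assume symmetric delays: about half the round trip elapsed after the
# server read its clock, so advance Tserver by that estimate.
t_new = t_server + (t1 - t0) / 2
print(f"round trip {t1 - t0:.6f} s, adjusted time {t_new:.3f}")
```

If the delays are not actually symmetric the estimate is off by at most half the round-trip time, which is why Cristian suggested repeating the exchange and keeping the reply with the smallest T1 − T0.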
Time synchronization Berkeley algorithm NTP SNTP
Logical clocks Assign sequence numbers to messages • All cooperating processes can agree on order of events • vs. physical clocks: time of day Assume no central time source • Each system maintains its own local clock • No total ordering of events • No concept of happened-when
Happened-before
Lamport's "happened-before" notation: a → b means event a happened before event b
e.g.: a: message being sent, b: message receipt
Transitive: if a → b and b → c then a → c
Lamport’s algorithm • Each message carries a timestamp of the sender’s clock • When a message arrives: • if receiver’s clock < message timestamp set system clock to (message timestamp + 1) • else do nothing • Clock must be advanced between any two events in the same process
Lamport’s algorithm Algorithm allows us to maintain time ordering among related events • Partial ordering
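The two rules above, incrementing between local events and jumping past an incoming timestamp, can be sketched as a small class. One common formulation folds both rules into `max(local, ts) + 1` on receipt, which is what this illustration does; the two processes and the single message are invented for the demo.

```python
class LamportClock:
    """Minimal Lamport logical clock for one process."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Any local event advances the clock."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the new time is the message timestamp."""
        return self.tick()

    def receive(self, ts: int) -> int:
        """Receipt is an event; jump past the sender's timestamp if needed."""
        self.time = max(self.time, ts) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
ts = p1.send()          # p1's clock: 1; message carries timestamp 1
p1.tick()               # p1's clock: 2
print(p2.receive(ts))   # p2 jumps to max(0, 1) + 1 = 2
```

Note the guarantee is one-directional: a → b implies L(a) < L(b), but as the next slide points out, L(a) < L(b) alone does not imply a → b.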
Event counting example
(figure: events a–f on P1, g–i on P2, and j–k on P3, each labeled with its Lamport timestamp; timestamps jump forward where messages arrive)
Problem: detecting causal relations
If L(e) < L(e′), we cannot conclude that e → e′
Looking at Lamport timestamps, we cannot tell which events are causally related
Solution: use a vector clock
Vector clocks
Rules:
• The vector is initialized to 0 at each process: Vi[j] = 0 for i, j = 1, …, N
• A process increments its own element of the vector before timestamping an event: Vi[i] = Vi[i] + 1
• A message sent from process Pi carries Vi attached to it
• When Pj receives the message, it compares the vectors element by element and sets its local vector to the higher of the two values: Vj[k] = max(Vj[k], Vi[k]) for k = 1, …, N
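The rules above can be sketched directly: bump your own slot on every event, and take an element-wise max (then bump your own slot, since receipt is itself an event) on every message. The three-process setup and the single message are invented for the illustration.

```python
N = 3   # number of processes (assumed for the demo)

class VectorClock:
    """Minimal vector clock for process `pid` out of N processes."""

    def __init__(self, pid: int):
        self.pid = pid
        self.v = [0] * N

    def event(self) -> list:
        """Local event: increment this process's own element."""
        self.v[self.pid] += 1
        return list(self.v)

    def send(self) -> list:
        """Sending is an event; the resulting vector is the timestamp."""
        return self.event()

    def receive(self, ts: list) -> list:
        """Element-wise max against the incoming timestamp, then bump own slot."""
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        return self.event()

p0, p1 = VectorClock(0), VectorClock(1)
ts = p0.send()         # p0: [1, 0, 0]
print(p1.receive(ts))  # p1: [1, 1, 0]
```

Unlike Lamport timestamps, vectors can be compared element-wise: a happened before b exactly when a's vector is ≤ b's in every slot and strictly less in at least one, and incomparable vectors mean the events were concurrent.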
Modes of communication
• unicast: 1 → 1 • point-to-point
• anycast: 1 → nearest one of several identical nodes • introduced with IPv6; used with BGP
• netcast: 1 → many, 1 at a time
• multicast: 1 → many • group communication
• broadcast: 1 → all
Groups Groups are dynamic • Created and destroyed • Processes can join or leave • May belong to 0 or more groups Send message to one entity • Deliver to entire group Deal with collection of processes as one abstraction