CS514: Intermediate Course in Operating Systems

Professor Ken Birman; Ben Atkin: TA; Lecture 6: Sept. 12

Client-Server Computing • 99% of all distributed systems use client-server architectures! • Today: look at the client-server problem • Discuss stateless and stateful architectures • Review major file system and database system issues (will revisit some issues in later lectures)

DCE, COM and RMI • Examples of “distributed computing environments.” • They provide tools for performing RPC in client-server systems, standards for data marshalling, etc. • Include services for authentication, binding, life-cycle management, clock synchronization • Won’t focus on specifics today: look at the big picture

Client-Server concept • Server program is shared by many clients • RPC protocol typically used to issue requests • Server may manage special data, run on an especially fast platform, or have an especially large disk • Client systems handle “front-end” processing and interaction with the human user

Server and its clients [figure: a single server shared by many clients]

Examples of servers • Network file server • Database server • Network information server • Domain name service • Microsoft Exchange • Kerberos authentication server

Business examples • Risk manager for a bank: tracks exposures in various currencies or risk in investments • Theoretical price for securities or bonds: traders use this to decide what to buy and what to sell • Server for an ATM: decides if your withdrawal will be authorized

Bond pricing example • Server receives market trading information, currency data, interest rates data • Has a database of all the bonds on the market • Client expresses interest in a given bond, or in finding a bond with certain properties • Server calculates what that bond (or what each bond) should cost given current conditions

Why use a client-server approach? • Pricing parameters are “expensive” (in terms of computing resources) to obtain: must monitor many data sources and precompute many time-value-of-money projections for each bond • Computing demands may be extreme, requiring a very high-performance machine • Database of bonds is huge: large storage, more precomputation

On client side • Need a lot of CPU and graphics power to display the data and interact with the user • Dedicated computation provides snappy response time and powerful decision making aids • Can “cache” or “save” results of old computations so that if user revisits them, won’t need to reissue identical request to server
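A minimal sketch of client-side result caching, assuming a hypothetical `price_bond` RPC stub (here just a local function that records each call). A repeated query is answered from the client's own dictionary without contacting the server. Entries never expire in this sketch, so the client alone must decide when an old result is too stale to trust.

```python
from typing import Callable, Dict


class CachingClient:
    """Client-side cache in front of a (hypothetical) server RPC stub."""

    def __init__(self, rpc: Callable[[str], float]) -> None:
        self.rpc = rpc
        self.cache: Dict[str, float] = {}

    def price(self, bond_id: str) -> float:
        # Reuse an earlier answer instead of reissuing an identical request.
        if bond_id not in self.cache:
            self.cache[bond_id] = self.rpc(bond_id)
        return self.cache[bond_id]


calls = []


def price_bond(bond_id: str) -> float:
    # Stand-in for the expensive server computation; records each call.
    calls.append(bond_id)
    return 100.0 + len(bond_id)


client = CachingClient(price_bond)
client.price("US-10Y")
client.price("US-10Y")   # served from the cache, no second RPC
print(len(calls))        # 1
```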

Summary of typical split • Server deals with bulk data storage, high perf. computation, collecting huge amounts of background data that may be useful to any of several clients • Client deals with the “attractive” display, quick interaction times • Use of caching to speed response time

Statefulness issues • Client-server system is stateless if: Client is independently responsible for its actions, server doesn’t track set of clients or ensure that cached data stays up to date • Client-server system is stateful if: Server tracks its clients, takes actions to keep their cached states “current”. Client can trust its cached data.

Best known examples? • The UNIX NFS file system is stateless. Bill Joy: “Once they replaced my file server during the evening while my machine was idle. The next day I resumed work right where I had left off, and didn’t even notice the change!” • Database systems are usually stateful: Client reads database of available seats on plane, information stays valid during transaction

Typical issues in design • Client is generally simpler than server: may be single-threaded, can wait for reply to RPCs • Server is generally multithreaded, designed to achieve extremely high concurrency and throughput. Much harder to develop • Reliability issue: if server goes down, all its clients may be “stuck”. Usually addressed with some form of backup or replication.
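To make the client/server asymmetry concrete, here is a minimal sketch of a server that dispatches each request to a worker thread using Python's standard `concurrent.futures.ThreadPoolExecutor`. The `handle_request` function is a hypothetical stand-in for real request processing; the lock shows that shared server state must be protected once requests run concurrently, while each simple client just blocks on its own reply.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()
served = []


def handle_request(req_id: int) -> str:
    # Hypothetical request handler: simulate I/O-bound work.
    time.sleep(0.05)
    with lock:                 # shared server state must be protected
        served.append(req_id)
    return f"reply-{req_id}"


# Server side: a pool of worker threads serves many requests concurrently.
with ThreadPoolExecutor(max_workers=8) as pool:
    start = time.monotonic()
    futures = [pool.submit(handle_request, i) for i in range(8)]
    # A single-threaded client simply blocks until its reply arrives.
    replies = [f.result() for f in futures]
    elapsed = time.monotonic() - start

print(replies[0], len(served))
print(f"8 requests in {elapsed:.2f}s")
```

Run one at a time, the eight requests would take about 0.4 s; with the pool they overlap.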

Use of caching • In stateless architectures, cache is responsibility of the client. Client decides to remember results of queries and reuse them. Example: caching Web proxies, the NFS client-side cache. • In stateful architectures, cache is owned by server. Server uses “callbacks” to its clients to inform them if cached data changes, becomes invalid. Cache is “shared state” between them.
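The stateful variant can be sketched as a server that remembers which clients cache each item and calls them back when it changes. This is an in-process sketch: direct method calls stand in for real RPC callbacks, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class Server:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        # Server state about its clients: who caches which item.
        self.holders: Dict[str, Set["Client"]] = defaultdict(set)

    def read(self, client: "Client", key: str) -> str:
        self.holders[key].add(client)      # server now tracks this client
        return self.data[key]

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        for c in self.holders.pop(key, set()):
            c.invalidate(key)              # callback: cached copy is no longer valid


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[str, str] = {}

    def get(self, key: str) -> str:
        # Cached copies can be trusted until the server invalidates them.
        if key not in self.cache:
            self.cache[key] = self.server.read(self, key)
        return self.cache[key]

    def invalidate(self, key: str) -> None:
        self.cache.pop(key, None)


srv = Server()
srv.data["seat-12A"] = "free"
a, b = Client(srv), Client(srv)
a.get("seat-12A")
b.get("seat-12A")
srv.write("seat-12A", "taken")
print(a.get("seat-12A"), b.get("seat-12A"))  # taken taken
```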

Butler Lampson’s advice • Cache “hints” • Speed up system when hint is correct • Some mechanism can detect when hint is wrong and seek more up to date information • If cached data is viewed as hints, relationship of client and server can be stateless • Example: information about location of a mailbox: if hint is wrong, can run more costly protocol
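A sketch of the hint pattern applied to the mailbox example. It assumes a hypothetical authoritative `lookup_mailbox` (the “more costly protocol”) and hosts that can confirm whether they really hold a given mailbox. The hint is used only if it checks out, so a stale hint costs an extra lookup but never produces a wrong answer.

```python
from typing import Dict, Set

# Hypothetical world state: which host really stores each mailbox.
hosts: Dict[str, Set[str]] = {"mail1": {"alice"}, "mail2": {"bob"}}
slow_lookups = 0


def lookup_mailbox(user: str) -> str:
    """Authoritative but expensive lookup, e.g. asking a name server."""
    global slow_lookups
    slow_lookups += 1
    return next(h for h, users in hosts.items() if user in users)


hints: Dict[str, str] = {}


def locate(user: str) -> str:
    host = hints.get(user)
    if host is not None and user in hosts[host]:  # cheap check that the hint is still right
        return host
    host = lookup_mailbox(user)                    # hint missing or stale: pay full cost
    hints[user] = host
    return host


locate("alice")                  # slow path, fills the hint
locate("alice")                  # fast path, hint is correct
hosts["mail1"].remove("alice")   # mailbox moves; hint becomes stale
hosts["mail2"].add("alice")
print(locate("alice"), slow_lookups)  # mail2 2
```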

Butler Lampson’s advice [figure: an application caches a name server’s answer as a hint; after the object was moved, the hint became stale]

Example of stateless approach • NFS is stateless: clients obtain “vnodes” when opening files; server hands out vnodes but treats each operation as a separate event • NFS trusts: vnode information, user’s claimed machine id, user’s claimed uid • Client uses write-through caching policy
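The stateless style can be sketched as a server where every request carries all the context it needs: a file handle, an offset, and a byte count. The server keeps no open-file table and no per-client cursor, which is why it can be replaced between two requests without the client noticing. The handles are hypothetical opaque strings, not the real NFS wire format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReadRequest:
    handle: str   # opaque handle handed out earlier by the server
    offset: int   # the client, not the server, tracks the file position
    count: int


class StatelessFileServer:
    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files   # the only state: the stored files themselves

    def read(self, req: ReadRequest) -> bytes:
        data = self.files[req.handle]
        return data[req.offset:req.offset + req.count]


files = {"fh-42": b"hello, stateless world"}
server = StatelessFileServer(files)
req = ReadRequest("fh-42", 7, 9)
first = server.read(req)
server = StatelessFileServer(files)   # "replace the server overnight"
assert server.read(req) == first      # same request, same answer
print(first)                          # b'stateless'
```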

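... example issues raised • Cache may be stale if someone writes the file after it is opened • Create operation may fail with error EEXIST if the server is slow to respond when the create is issued: the client times out and retransmits, the first attempt has already created the file, and the retry finds it present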
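The `EEXIST` anomaly comes from retrying an operation that is not idempotent. The sketch below simulates it under assumed conditions: the server performs the create, but its reply is lost or late, so the client's RPC layer retransmits and the second attempt finds the file already there. The `drop_next_reply` flag is a hypothetical test hook, not part of any real NFS API.

```python
import errno
from typing import Set


class FileServer:
    def __init__(self) -> None:
        self.names: Set[str] = set()
        self.drop_next_reply = False   # hypothetical hook: simulate a lost/late reply

    def create(self, name: str) -> int:
        if name in self.names:
            return errno.EEXIST
        self.names.add(name)           # side effect happens even if the reply is lost
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise TimeoutError("reply lost")
        return 0


def rpc_create(server: FileServer, name: str, retries: int = 3) -> int:
    # Client RPC layer: retransmit on timeout.
    for _ in range(retries):
        try:
            return server.create(name)
        except TimeoutError:
            continue
    raise TimeoutError("server not responding")


srv = FileServer()
srv.drop_next_reply = True
result = rpc_create(srv, "report.txt")
print(result == errno.EEXIST, "report.txt" in srv.names)  # True True
```

The file was created exactly once, yet the application is told the create failed.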

Example of stateful approach • Transactional software structure: • Data manager holds database • Transaction manager does begin op1 op2 ... opn commit • Transaction can also abort; abort is default on failure • Transaction on database system: • Atomic: all or nothing effects • Concurrent: can run many transactions at same time • Independent: concurrent transactions don’t interfere • Durable: once committed, results are persistent
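The begin/op/commit/abort structure can be sketched with an undo log. The transaction records each item's old value before updating it. Commit discards the log, and abort (the default if the body raises) restores the old values, so the effects are all or nothing. This is a single-threaded toy that assumes an in-memory dict as the “database”.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional, Tuple


class Transaction:
    def __init__(self, store: Dict[str, int]) -> None:
        self.store = store
        # Undo log: (key, old value); None means the key did not exist.
        self.undo: List[Tuple[str, Optional[int]]] = []

    def write(self, key: str, value: int) -> None:
        self.undo.append((key, self.store.get(key)))
        self.store[key] = value

    def abort(self) -> None:
        for key, old in reversed(self.undo):   # undo newest change first
            if old is None:
                self.store.pop(key, None)
            else:
                self.store[key] = old
        self.undo.clear()


@contextmanager
def transaction(store: Dict[str, int]) -> Iterator[Transaction]:
    t = Transaction(store)      # begin
    try:
        yield t                 # op1 op2 ... opn
    except Exception:
        t.abort()               # abort is the default on failure
        raise
    t.undo.clear()              # commit: keep new values, drop the undo log


db: Dict[str, int] = {"A": 100, "B": 0}


def transfer(amount: int) -> None:
    with transaction(db) as t:
        t.write("A", db["A"] - amount)
        if db["A"] < 0:
            raise ValueError("insufficient funds")
        t.write("B", db["B"] + amount)


transfer(30)
try:
    transfer(500)               # fails midway; the debit of A is rolled back
except ValueError:
    pass
print(db)   # {'A': 70, 'B': 30}
```

Only atomicity is shown here. Keeping concurrent transactions independent would also need locking or a similar concurrency-control mechanism, and durability would need a persistent log.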

Comments on transactions • Well matched to database applications • Requires special programming style • Typically, splits operations into read and update categories. Transactional architecture can distinguish these • Idea is to run transactions concurrently but to make it look as if they ran one by one in some sequential (serial) order
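One standard way to check whether an interleaved execution “looks as if” the transactions ran one by one is the textbook conflict-serializability test. Draw an edge Ti → Tj whenever an operation of Ti precedes a conflicting operation of Tj (same item, at least one of them a write), then look for a cycle. The lecture does not prescribe this test; the schedule format below is an assumption for illustration.

```python
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str, str]   # (transaction, "r" or "w", item)


def precedence_graph(schedule: List[Op]) -> Dict[str, Set[str]]:
    graph: Dict[str, Set[str]] = {t: set() for t, _, _ in schedule}
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (ai, aj):
                graph[ti].add(tj)      # ti must come before tj in any equivalent serial order
    return graph


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in graph}

    def visit(t: str) -> bool:         # depth-first search for a back edge
        color[t] = GREY
        for u in graph[t]:
            if color[u] == GREY or (color[u] == WHITE and visit(u)):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in graph)


def serializable(schedule: List[Op]) -> bool:
    return not has_cycle(precedence_graph(schedule))


ok = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "y")]
bad = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "w", "y"), ("T1", "w", "y")]
print(serializable(ok), serializable(bad))   # True False
```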

Why are transactions stateful? • Client knows what updates it has done, what locks it holds. Database knows this too • Client and database share the guarantees of the model. See consistent states • Approach is free of the inconsistencies and potential errors observed in NFS

Stateful file servers? • Several file servers use aspects of transactional approach: • Andrew file system caches whole files, server knows who has a copy • Sprite file system caches 4 KB blocks, uses sophisticated cache consistency protocols • Quicksilver adopts aspects of transactional model to ensure that either all of a set of operations occur, or none

Looking ahead • Later in course will study transactional model in detail; right now will leave topic open • Notice its “database assumptions” • Separation of computation from data • Operations are split into reads and writes • Transactions are designed to run independently • Can be applied in object oriented systems but proves awkward for very general distributed uses

Client-server performance issues • Performance revolves around degree of concurrency achieved in server, effectiveness of caching, quality of prefetching • Comparing NFS with AFS and Sprite illustrates this point

NFS performance: from prefetching • User opens file, reads sequentially • Each read is done as a separate RPC to the server, but... • If reads are sequential, client starts to prefetch blocks by anticipating the read and pre-issuing the RPC • Result: data is usually in cache when needed
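A sketch of the read-ahead policy. It assumes a hypothetical `fetch_block` RPC and a simple rule: once two consecutive reads hit blocks n and n+1, also fetch block n+2 in advance. Real clients issue the read-ahead asynchronously, so it overlaps with the application's work. This sequential version only shows the policy and counts how many reads find their block already cached.

```python
from typing import Callable, Dict, List, Optional


class ReadAheadClient:
    def __init__(self, fetch_block: Callable[[int], bytes]) -> None:
        self.fetch_block = fetch_block      # hypothetical RPC: block number -> data
        self.cache: Dict[int, bytes] = {}
        self.last: Optional[int] = None     # previous block read, to detect sequential access
        self.hits = 0

    def read(self, n: int) -> bytes:
        if n in self.cache:
            self.hits += 1                  # prefetched earlier: no wait for the server
        else:
            self.cache[n] = self.fetch_block(n)
        sequential = self.last is not None and n == self.last + 1
        if sequential and n + 1 not in self.cache:
            self.cache[n + 1] = self.fetch_block(n + 1)   # anticipate the next read
        self.last = n
        return self.cache[n]


fetched: List[int] = []


def fetch_block(n: int) -> bytes:
    fetched.append(n)                       # record each simulated RPC
    return bytes([n])


client = ReadAheadClient(fetch_block)
for n in range(6):
    client.read(n)
print(client.hits, fetched)   # 4 [0, 1, 2, 3, 4, 5, 6]
```

The final prefetch of block 6 shows the cost of guessing: for a six-block file it is wasted work.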

AFS, Sprite do large block transfers • User opens file • Server sends a large amount of data, or whole file, using a streaming protocol • Approach reduces server workload: n requests from client become one request, file is sent very efficiently • But client must cache much more data

Experience? • NFS, AFS and Sprite behave comparably for “random” requests • AFS and Sprite do much better under heavy load • RPC approach consumes much more CPU time and network time

NFS security problems • NFS supports an authentication protocol, but it is rarely used: it can’t be exported and is available mostly for Sun systems • Lacking authentication, the server can be fooled by applications that construct fake packets! • Implication is that almost any user of a network can access any NFS file stored on that network without real permissions being enforced!

Current issues in client-server systems • Research is largely at a halt: we know how to build these systems • Challenges are in the applications themselves, or in the design of the clients and servers for a specific setting • Biggest single problem is that client systems know the nature of the application, but servers have all the data

Typical debate topic? • Ship code to the data (e.g. program from client to server)? • ... or ship data to the code? (e.g. client fetches the data needed) • Will see that Java, Tacoma and Telescript offer ways to trade these off, avoiding inefficient use of channels while keeping maximum flexibility
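A back-of-the-envelope sketch of the trade-off. It assumes a hypothetical table of bond records and measures only the bytes that cross the channel, using the standard `json` module for serialization. Shipping data to the code moves the whole table to the client. Shipping a small query to the data moves only the matching rows. In this in-process sketch, the “shipped” code is just a Python function passed by reference: its own size is not counted, and the security and portability costs of running client code at the server are ignored.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

Bond = Dict[str, Any]

# Hypothetical server-side table of bonds.
bonds: List[Bond] = [
    {"id": f"B{i}", "coupon": 2.0 + (i % 50) / 10, "years": 1 + i % 30}
    for i in range(10_000)
]


def ship_data_to_code(pred: Callable[[Bond], bool]) -> Tuple[int, List[Bond]]:
    payload = json.dumps(bonds)                    # whole table crosses the channel
    return len(payload), [b for b in json.loads(payload) if pred(b)]


def ship_code_to_data(pred: Callable[[Bond], bool]) -> Tuple[int, List[Bond]]:
    matches = [b for b in bonds if pred(b)]        # predicate runs next to the data
    payload = json.dumps(matches)                  # only the results cross the channel
    return len(payload), json.loads(payload)


def wanted(b: Bond) -> bool:
    return b["coupon"] > 6.5 and b["years"] >= 25


data_bytes, r1 = ship_data_to_code(wanted)
code_bytes, r2 = ship_code_to_data(wanted)
print(r1 == r2, data_bytes, code_bytes)
```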

Message Oriented Middleware • Emerging extension to client-server architectures • Concept is to weaken the link between the client and server but to preserve most aspects of the “model” • Client sees an asynchronous interface: request is sent independent of reply. Reply must be dequeued from a reply queue later

MOMS: How they work • MOM system implements a queue in between clients and servers • Each sends to other by enqueuing messages on the queue or queues for this type of request/reply • Queues can have names, for “subject” of the queue • Client and server don’t need to be running at the same time.

MOMS: How they work (client) • Client places a message into a “queue” without waiting for a reply. From the client’s point of view, the MOMS is the “server”

MOMS: How they work (server) • Server removes the message from the queue and processes it.

MOMS: How they work (server) • Server places any response in a “reply” queue for eventual delivery to the client. May have a timeout attached (“delete after xxx seconds”)

MOMS: How they work (client) • Client retrieves the response and resumes its computation.
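The four steps above can be sketched with Python's standard `queue.Queue` standing in for the MOMS. There is a named request queue, a reply queue, a server thread that dequeues and processes requests, and a client that enqueues without waiting and collects replies later. The queue names, the message format, and the `msg_id` correlation scheme are assumptions for illustration. A real MOM product also persists messages, so client and server need not run at the same time, and supports reply timeouts; neither is modeled here.

```python
import queue
import threading
from typing import Dict

# Hypothetical named queues managed by the MOMS intermediary.
queues: Dict[str, queue.Queue] = {
    "bond.price.request": queue.Queue(),
    "bond.price.reply": queue.Queue(),
}


def server() -> None:
    requests = queues["bond.price.request"]
    replies = queues["bond.price.reply"]
    while True:
        msg_id, body = requests.get()             # remove a message from the queue
        if body is None:                          # shutdown marker (sketch only)
            break
        replies.put((msg_id, f"price({body})"))   # place response in the reply queue


worker = threading.Thread(target=server)
worker.start()

# Client: enqueue requests without waiting for any reply...
for msg_id, bond in enumerate(["B1", "B2", "B3"]):
    queues["bond.price.request"].put((msg_id, bond))

# ...later, retrieve the responses (matched by msg_id) and resume.
results = dict(queues["bond.price.reply"].get(timeout=5) for _ in range(3))
queues["bond.price.request"].put((-1, None))
worker.join()
print(results)   # {0: 'price(B1)', 1: 'price(B2)', 2: 'price(B3)'}
```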

Pros and Cons of MOMS • Decoupling of sender, destination is a plus: can design the server and client without each knowing much about the other, can extend easily • Performance is poor, a (big) minus: overhead of passing through an intermediary • Priority, scheduling, recoverability are pluses .... use this approach if you can afford the performance hit, a factor of 10-100 compared to RPC

Other MOMS issues? • Management of the queues • Handling runaway applications that flood queue with requests or fail to collect responses • Cleanup after a crash in applications not designed to handle this case

Major uses of MOMS • IBM MQ Server used heavily to connect older mainframe applications to new distributed computing front-ends • DEC Message Q popular in process-control and production settings that mix midsize computers with dedicated machine tools • Will see “message bus” examples of MOMS later in this lecture series

Client-Server Issues • A big challenge is to make servers scale • Add more and more clients • Or somehow clone the server • Or perhaps partition data so that different requests can go to different servers with the corresponding subset of the data • Management of server clusters is a very hard problem • Fault-tolerance is also a hot topic

Scaling • A farm of servers is a physically distributed large-scale pool of machines with the same service available at lots of places • Farms usually are built to exploit physical proximity • Example: Exodus and Akamai are companies that operate server farms for their clients

Akamai • Looks like a server with unusual capacity

Akamai server • Main page still comes from the server • But “static content” is fetched from a close-by Akamai server running at your ISP

RACS and RAPS • RACS: Reliable array of cloned servers • Servers are identical • Just spray requests for even load • RAPS: Reliable array of partitioned servers • Servers partition data: A-F, G-Z… • A partition might be a RACS • Typically, if needed, use some form of backup for fault-tolerance
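The difference between the two arrays shows up in how a front end routes requests. A minimal sketch, assuming hypothetical server names, round-robin spraying for the RACS, and the slide's A-F / G-Z key ranges for the RAPS:

```python
import itertools
from typing import Iterator, List, Tuple


class RACS:
    """Identical clones: any server can handle any request."""

    def __init__(self, servers: List[str]) -> None:
        self._next: Iterator[str] = itertools.cycle(servers)

    def route(self, key: str) -> str:
        return next(self._next)                   # spray requests for even load


class RAPS:
    """Partitioned data: the key decides which server holds it."""

    def __init__(self, partitions: List[Tuple[str, str, str]]) -> None:
        self.partitions = partitions              # (first letter, last letter, server)

    def route(self, key: str) -> str:
        c = key[0].upper()
        for lo, hi, server in self.partitions:
            if lo <= c <= hi:
                return server
        raise KeyError(f"no partition for {key!r}")


racs = RACS(["clone1", "clone2", "clone3"])
raps = RAPS([("A", "F", "part-AF"), ("G", "Z", "part-GZ")])
print([racs.route(k) for k in ["ann", "bob", "cy", "dee"]])
# ['clone1', 'clone2', 'clone3', 'clone1']
print(raps.route("birman"), raps.route("lampson"))   # part-AF part-GZ
```

Because each RAPS entry is only a routing target, a partition's "server" could itself be the front end of a RACS, as the slide suggests.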


Akamai • Main page still comes from the server • The server itself is a cluster • Backups provide fault-tolerance • But “static content” is fetched from a close-by Akamai server running at your ISP

Fault-Tolerance • We showed one backup per node • In real settings might prefer one per n nodes • A complex topic that will be covered later in the course • Challenge is to replicate the data needed so the backup can seamlessly take over when the primary fails • Also need a good way to detect failure!
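A common starting point for failure detection is a timeout-based heartbeat detector. The primary sends periodic heartbeats, and the backup suspects failure when none has arrived within the timeout. This sketch uses an injectable clock so it can be exercised without real delays; the 3-second timeout is an arbitrary assumption. A timeout alone cannot distinguish a crashed primary from a slow or unreachable one.

```python
import time
from typing import Callable


class HeartbeatDetector:
    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_heard = clock()

    def heartbeat(self) -> None:
        self.last_heard = self.clock()    # primary known alive as of now

    def suspects_failure(self) -> bool:
        return self.clock() - self.last_heard > self.timeout


now = [0.0]                               # fake clock for the example
detector = HeartbeatDetector(timeout=3.0, clock=lambda: now[0])
now[0] = 2.0
detector.heartbeat()
now[0] = 4.5
print(detector.suspects_failure())        # False: last heartbeat 2.5 s ago
now[0] = 5.5
print(detector.suspects_failure())        # True: backup may take over
```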

Next lecture? • Will study issues associated with consistency • Reading: Chapters 9, 10 • Homework: identify as many examples of client-server structures as possible in your local area networking environment. • Hint: look for examples of X.500 “servers”, SNMP “servers”, file servers, time servers, database servers, authentication servers...
