Introduction

Explore the primary characteristics, motivations, and challenges of building distributed systems through real-world examples like banking, retail, and air-traffic control. Understand the necessity of transparency, scalability, and reliability in distributed applications.

jimmier


Presentation Transcript


  1. Introduction

  2. Outline • Definitions • Challenges • Examples to Illustrate Challenges • Goals in Application Development • Summary

  3. Definition of a Distributed System • A distributed system: • Multiple connected CPUs working together • A collection of independent computers that appears to its users as a single coherent system • Examples: parallel machines, networked machines • Lamport’s definition: A distributed system is one in which I cannot get something done because a machine I've never heard of is down.

  4. Primary Characteristics of a Distributed System • Multiple computers • Concurrent execution • Independent operation and failures • Communications • Ability to communicate • No tight synchronization • Relatively easy to expand or scale • Transparency

  5. Example: A Typical Intranet (Coulouris)

  6. Example: A Typical Portion of the Internet (Coulouris) • Figure labels: intranet, ISP, backbone, satellite link • Legend: desktop computer, server, network link

  7. Example: Portable and Handheld Devices in a Distributed System (Coulouris)

  8. Motivation for Building Distributed Systems • Economics • Share resources • Relatively easy to expand or scale • Speed – A distributed system may have more total computing power than a mainframe. • Cost • Personalized environments • Location independence • People and information are distributed • Expandability • Availability and Reliability • If a machine crashes, the system as a whole can survive.

  9. Distributed Application Examples • Banking, stock markets, stock brokerages • Health care, hospital automation • Control of power plants, electric grid • Telecommunications infrastructure • Electronic commerce and electronic cash on the Web (very important emerging area) • Corporate “information” base: a company’s memory of decisions, technologies, strategy • Military command, control, intelligence systems • Retail • Air-traffic control • GAUL

  10. Examples in More Detail • Air-Traffic Control • This is not an Internet application. • In many countries, airspace is divided into areas which in turn may be divided into sectors. • Each area is managed by a control center. • Control systems communicate with tower control and other control systems (to allow a plane to cross boundaries). • The planes and air-traffic controls are “distributed”. A single centralized system is not feasible.

  11. Examples in More Detail • World Wide Web • Shared Resources: Documents • Unique identification using URLs • Users interested in the documents are distributed. • The documents are also distributed. • Banking • Clients may access their accounts from ATMs. • Multiple clients may attempt to access their accounts simultaneously. • Multiple copies of account information allow quicker access.

  12. Examples in More Detail • Retail • Stores are located near their customer base. • Point of Sale (POS) terminals are used for customer interactions while mobile units are used for inventory control. • These units talk to a local processor which in turn may communicate with remote processors. • GAUL • What is being shared includes disk space, e-mail server, web server, software

  13. Challenges • Heterogeneity • Networks • Hardware • Operating systems • Programming languages

  14. Challenges • Failure Handling • Partial failures • Can non-failed components continue operation? • Can the failed components easily recover? • Detecting failures • Recovery • Replication • We will now examine the challenges in the context of two applications

  15. Illustrative Example: Banking • Figure: bank branches connected by a network • A withdrawal of 100 euros is requested in Paris

  16. Illustrative Example: Banking • Figure: bank branches connected by a network • Money is given right away

  17. Illustrative Example: Banking • Figure: bank branches connected by a network • Later, the ATM contacts the bank

  18. Illustrative Example: Banking • Accounts are replicated • Why replication? • Performance; A single server does not scale very well • Reliability; What if the single server went down?

  19. Illustrative Example: Banking • Figure: bank branches connected by a network • A university contacts another bank branch to deposit a salary

  20. Illustrative Example: Banking • Figure: bank branches connected by a network • A bank branch applies interest to an account

  21. Illustrative Example: Banking • Hmm. If the operations on the account are not done in the same order then the accounts will have different amounts. • Replicas of an account should be consistent. • What’s the big deal? The ATM transaction goes first, followed by salary deposit which is followed by the interest operation. • How do you actually know which operation occurred first?
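
The effect of operation order on replicas can be sketched in a few lines of Python. This is only an illustration: the 100-euro withdrawal comes from the slides, while the opening balance of 1000, the salary of 2000, the 1% interest rate, and all function names are assumptions made up for the example.

```python
# Hypothetical account operations; each returns a function balance -> balance.
def withdraw(amount):
    return lambda balance: balance - amount

def deposit(amount):
    return lambda balance: balance + amount

def interest(percent):
    # Integer arithmetic keeps the example exact (whole euros only).
    return lambda balance: balance + balance * percent // 100

def apply_all(balance, operations):
    """Apply the operations to a replica in the order given."""
    for op in operations:
        balance = op(balance)
    return balance

atm = withdraw(100)      # ATM withdrawal from the slides
salary = deposit(2000)   # assumed salary amount
payout = interest(1)     # assumed 1% interest

# Two replicas start equal but see the operations in different orders.
replica_a = apply_all(1000, [atm, salary, payout])
replica_b = apply_all(1000, [payout, atm, salary])
print(replica_a, replica_b)
```

Because interest depends on the balance at the moment it is applied, the two replicas end with different amounts even though both applied the same three operations.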

  22. Illustrative Example: Banking • Use clocks • There is no global clock; we must rely on local clocks. • It is very difficult to synchronize local physical clocks. • Network latency is a factor. • Let’s say that the ATM operation occurs at 10:00 AM, the salary deposit occurs at 10:01 AM and the interest payout occurs at 10:02 AM. • Network latency may mean that the ATM operation arrives at the bank branches at 10:05 AM, which may be after the other operations have arrived at the bank branches. • It’s actually worse. The ATM operation may arrive at one bank branch after an interest calculation but arrive before the interest calculation at another branch. • How does a bank branch know to wait for the ATM operation?
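
A small simulation makes the latency problem concrete. The send times (minutes after 10:00 AM) follow the slide; the per-branch delays, the branch names, and the function name are assumptions chosen only to reproduce the situation described.

```python
def arrival_order(send_minute, delay_minutes):
    """Return operation names sorted by arrival time at one branch."""
    arrivals = {op: send_minute[op] + delay_minutes[op] for op in send_minute}
    return sorted(arrivals, key=arrivals.get)

# Minutes after 10:00 AM at which each operation is issued (from the slide).
send = {"atm": 0, "salary": 1, "interest": 2}

# Hypothetical network delays (in minutes) on the paths to two branches.
branch_x = arrival_order(send, {"atm": 5, "salary": 1, "interest": 1})
branch_y = arrival_order(send, {"atm": 1, "salary": 1, "interest": 3})

print(branch_x)  # ATM operation arrives last at branch X
print(branch_y)  # ATM operation arrives first at branch Y
```

Neither branch can tell from arrival order alone that the ATM operation was issued first, which is why arrival time is not a usable ordering.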

  23. Illustrative Example: Banking • Replication is a headache. Don’t replicate. • Could do that but it overloads a server and causes poor performance. • A bank does not want to limit the number of its users as the result of slowness. • An e-commerce site does not want to lose customers as the result of a slow system. • What if the server goes down? • Wait … You may have replication and one of the servers goes down. • Operations at the other branches continue. • What if the server comes back up? Isn’t it going to have different contents?

  24. Illustrative Example: Banking • Can’t rely on clocks and we want to replicate? Then what? • We will study algorithms that provide the notion of “logical” clocks. • The concept of logical clocks will be the basis of several algorithms that provide consistency across replicas in a transparent fashion. • Transparent: Should users have to know that the system is replicated? For example, should ATM users have to know that their account is replicated in order to use the system?
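
As a rough idea of what a logical clock looks like, here is a minimal sketch of a Lamport-style counter. The slides do not say which algorithms the course covers, so the choice of this scheme and the class and method names are assumptions.

```python
class LamportClock:
    """Logical clock sketch: a counter, not wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned value is attached to the message.
        return self.tick()

    def receive(self, message_time):
        # Jump past both the local time and the sender's timestamp.
        self.time = max(self.time, message_time) + 1
        return self.time

atm = LamportClock()
branch = LamportClock()

for _ in range(3):                  # three local events at the branch
    branch.tick()

stamp = atm.send()                  # ATM sends a withdrawal request
received_at = branch.receive(stamp)

reply = branch.send()               # branch answers the ATM
atm_after = atm.receive(reply)
print(stamp, received_at, reply, atm_after)
```

The useful property is that a message is always received at a logical time later than the time at which it was sent, regardless of how the machines' physical clocks are set.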

  25. Illustrative Example: Banking • Bank mergers: • Different (heterogeneous) systems • How do we integrate them? • How open should systems be? • Can the system be extended and re-implemented? • Are interfaces published? • Is there a uniform mechanism to access resources? • How do we ensure that updates to an account are valid?

  26. Illustrative Example: Game

  27. Illustrative Example: Game • Some games require that the game state (or part of the game state) is held by each player. • We would like to make sure that the game state is consistent, e.g., • Three users (U1, U2, U3) participate in a first-person shooter. • As viewed from U1: U1 pushes a button that disarms all opponents. • As viewed from U2: Just before U1 pushes the button, U2 shoots U1. • What does U3 see? • Ordering of events (even if they appear to happen concurrently) is required. • Ensuring every user views events in the same order is commonly termed identical ordering or total ordering.

  28. Illustrative Example: Game • Consistency is important but so is speed. • Does a game have the same consistency requirements as a banking application? • Turns out the answer is no. • We will study different types of consistency and the algorithms and system support needed to provide them.

  29. Illustrative Example: Game • A trivial attempt at satisfying ordering is to use TCP to ensure FIFO delivery and have a central server through which all messages must pass. • The central server, together with TCP, ensures all nodes receive the same messages in the same order. • What about node failure? • TCP is slow; why not use UDP? • Well, UDP is faster but doesn’t ensure FIFO ordering.
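
The central-server idea can be sketched as a sequencer that stamps every event with a sequence number and forwards it to all players. This is an in-process illustration only: direct method calls stand in for the FIFO TCP connections, failures are ignored, and the class names are hypothetical.

```python
class Client:
    """A player; records events in the order the server delivers them."""

    def __init__(self):
        self.log = []

    def deliver(self, seq, event):
        self.log.append((seq, event))

class Sequencer:
    """Central server: every event passes through it and gets a number."""

    def __init__(self):
        self.next_seq = 0
        self.clients = []

    def join(self, client):
        self.clients.append(client)

    def submit(self, event):
        seq = self.next_seq
        self.next_seq += 1
        # Forward to every client in the same order (stands in for FIFO TCP).
        for client in self.clients:
            client.deliver(seq, event)

server = Sequencer()
u1, u2, u3 = Client(), Client(), Client()
for player in (u1, u2, u3):
    server.join(player)

# Whichever event reaches the server first is ordered first for everyone.
server.submit("U2 shoots U1")
server.submit("U1 disarms all opponents")
print(u3.log)
```

All three players end up with identical logs, which is the total ordering the slides ask for; the price is that the server is a single point of failure and a bottleneck.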

  30. Goals of Application Development • Connectivity • Transparency • Reliability • Consistency • Security • Openness • Scalability

  31. Connectivity • It should be easy for users to access remote resources and to share them with other users in a controlled fashion. • Resources that can be shared include printers, storage facilities, data, files, web pages, etc. • Why? It is economical. • Connecting users and resources makes collaboration and the exchange of information easier. • Just look at e-mail.

  32. Transparency • A distributed system that is able to present itself to users and applications as if it were only a single computer system is said to be transparent. • Very difficult to make distributed systems completely transparent. • You may not want to, since transparency often comes at the cost of performance.

  33. Transparency in a Distributed System • Table: Different forms of transparency in a distributed system.

  34. Degree of Transparency • The goal of full transparency is not always desirable. • Users may be located on different continents; distribution is apparent and not something you want to hide. • Completely hiding failures of networks and nodes is (theoretically and practically) impossible: • You cannot distinguish a slow computer from a failing one. • You can never be sure that a server actually performed an operation before a crash. • Full transparency will cost in performance. Examples: • Keeping Web caches exactly up-to-date with the master copy • Immediately flushing write operations to disk for fault tolerance.
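
The point about slow versus failed computers shows up in even the simplest timeout-based failure detector. The sketch below is an assumption-laden illustration: the function name, the 2-second timeout, and the timestamps are all made up.

```python
def suspected(last_heartbeat, now, timeout):
    """Return True if no heartbeat arrived within `timeout` seconds."""
    return now - last_heartbeat > timeout

now = 13.0
timeout = 2.0

# Server A really crashed right after its heartbeat at t=10.
crashed_server_last_heartbeat = 10.0
# Server B is alive, but its later heartbeats are stuck in a slow network,
# so the observer has also seen nothing from it since t=10.
slow_server_last_heartbeat = 10.0

crashed_verdict = suspected(crashed_server_last_heartbeat, now, timeout)
slow_verdict = suspected(slow_server_last_heartbeat, now, timeout)
print(crashed_verdict, slow_verdict)
```

The observer has exactly the same information in both cases, so it reaches the same verdict; a longer timeout reduces false suspicions but delays the detection of real crashes.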

  35. Openness • An open distributed system allows for interaction with services from other open systems, irrespective of the underlying environment. • Systems should conform to well-defined interfaces. • Systems should support portability of applications. • Systems should easily interoperate. Interoperability is characterized by the extent to which two implementations of systems or components from different manufacturers can co-exist and work together. • Example: In computer networks there are rules that govern the format, contents and meaning of messages sent and received.

  36. Scalability • There are three dimensions to scalability: • The number of users and processes (size scalability) • The maximum distance between nodes (geographical scalability) • The number of administrative domains (administrative scalability)

  37. Techniques for Scaling • Partition data and computations across multiple machines • Move computations to clients (Java applets) • Decentralized naming services (DNS) • Decentralized information systems (WWW) • Make copies of data available at different machines • Replicated file servers (for fault tolerance) • Replicated databases • Mirrored web sites • Allow client processes to access local copies • Web caches (browser/Web proxy) • File caching (at server and client)
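
Partitioning data across machines is often done by hashing a key to pick a server. The following is a minimal sketch of that idea; the server names, the key format, and the function name are hypothetical, and real systems usually use more elaborate schemes so that adding a server does not move most keys.

```python
import hashlib

def server_for(key, servers):
    """Pick the server responsible for `key` by hashing the key."""
    # hashlib gives a hash that is stable across runs and machines,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["server-0", "server-1", "server-2"]  # hypothetical machines

placement = {key: server_for(key, servers)
             for key in ("account-17", "account-42", "account-99")}
print(placement)
```

Every client that knows the server list computes the same placement on its own, so no central directory has to be consulted for each request.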

  38. Scaling – The problem • Applying scaling techniques is easy, except for the following: • Having multiple copies (cached or replicated) leads to inconsistencies – modifying one copy makes that copy different from the rest. • Always keeping copies consistent requires global synchronization. • Global synchronization is expensive with respect to performance. • We have learned to tolerate some inconsistencies.
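
The inconsistency introduced by copies can be seen with a toy cache in front of an origin server. This is a sketch with made-up class names and page contents; explicit invalidation is shown as one possible remedy, not as the approach the slides prescribe.

```python
class Origin:
    """Holds the master copy of each page."""

    def __init__(self):
        self.pages = {}

class Cache:
    """Keeps local copies and serves them without asking the origin again."""

    def __init__(self, origin):
        self.origin = origin
        self.copies = {}

    def get(self, key):
        if key not in self.copies:
            self.copies[key] = self.origin.pages[key]
        return self.copies[key]

    def invalidate(self, key):
        # Drop the local copy so the next read fetches the master copy.
        self.copies.pop(key, None)

origin = Origin()
cache = Cache(origin)

origin.pages["/index"] = "version 1"
first_read = cache.get("/index")

origin.pages["/index"] = "version 2"   # master copy changes
stale_read = cache.get("/index")       # cache still serves the old copy

cache.invalidate("/index")
fresh_read = cache.get("/index")
print(first_read, stale_read, fresh_read)
```

Keeping the cache exact would require the origin to contact every cache on every write, which is the synchronization cost the slide refers to.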

  39. Summary • Distributed systems consist of autonomous computers that work together. • When properly designed, distributed systems can scale well with respect to the size of the underlying network. • There are many challenges, many of which will be addressed in the course.
