Challenge Issues in Distributed Systems

ECE7650

Glory of the Internet • 1960s: queuing theory and packet switching principles (ARPANET) • 1970s: Proprietary networks (Ethernet) and inter-networking (TCP/IP) • 1980s: Network protocols (SMTP, FTP, etc.), new networks like NSFNET and BITNET • 1990s: Killer applications (web and e-commerce), commercialization • 2000s: Applications blooming (P2P, VoIP, social networks, cloud and storage services, etc.)

Reasons for Internet Success • Cerf and Kahn’s internetworking principles (1974) • minimalism, autonomy - no internal changes required to interconnect networks • best effort service model • stateless routers • decentralized control • Design philosophy is to make it simple • A key architectural feature is “narrow-waisted hourglass model” with a well-defined small interface at the mid-level

Assumptions • Stationary hosts in wired network • Each host is assigned a topologically-dependent IP address • Routing is based on IP address • But mobile and wireless communication has become pervasive • Friendly environment • Hosts trust each other, little concern for security and privacy • TCP/IP is non-secure by design • “Identity assumption” is no longer valid: accountability problem • Small scale and uniform edge • Grew out of early small-scale ARPANET experience • No one could imagine today’s hundreds of millions of hosts, the billions of cellular phones ready to be plugged in, sensor networks, etc.

Assumptions (cont’) • Simple applications • Alternative reliable communication infrastructure based on Cerf and Kahn’s principles and the “narrow-waisted hourglass model” • Clearly defined applications are supported by a well-defined functional interface • Goodwill and cooperation • Best effort, store & forward, autonomous, distributed decisions in intra-domain as well as inter-domain (BGP) routing • Reality is a battlefield of many players; competition and economic incentives must be taken into account

Ad Hoc Work-Around • To accommodate mobile hosts • Mobile IP, but • The IP address stays bound to the host, and triangle routing is inefficient • TCP masks the high delay and loss rate of wireless networks by treating them as congestion • Hostile environment • Firewall, but • Violates the end-to-end argument; the possibility of a firewall must be taken into account by application designers • IPsec? How can you prevent attacks/harassment by unsolicited traffic? • Large scale, diversified edge • NAT relieves the address shortage, but routers should process only up to layer 3, whereas a NAT router needs to process layer 4

Ad Hoc Work-Around (cont’) • Meet various application requirements • QoS-aware routers: IntServ, DiffServ • RSVP, etc. • Hard to deploy widely (all routers along the path must support it) • Non-cooperative, competitive • Service Level Agreement (SLA) enforcement? • Big BGP problems in inter-domain routing; a single mistyped command at a router at one ISP caused disruption of connectivity across many neighbors • Economic incentive? • Hard to reach consensus between competitors; sometimes standardization may cost a vendor its market advantage

Application-Level Mitigation • TCP/UDP/IP: “best-effort service”, no guarantees on delay or loss • But multimedia apps require QoS and a level of performance to be effective! • Today’s Internet multimedia applications use application-level techniques to mitigate (as best possible) the effects of delay and loss • “Any problem in computer science can be solved with another layer of indirection [except the problem of too many layers of indirection]” (David Wheeler, PhD’51)

Distributed systems layer • Internet protocol stack: application, transport, network, link, physical • Distributed systems stack: Distributed Apps, Middleware, OS/Net module, NIC/Driver • Middleware Service: • Communication: • Sync vs. async comm • Group comm • Reliable comm • Transactional comm • Latency tolerance • Etc. • Coordination

What is a Distributed System • A system in which hw or sw components located at networked computers communicate and coordinate their actions only by passing messages. [CDK] • Autonomous: independent failures • Concurrent program execution is the norm • No global clock: coordination by exchanging messages • Examples • Basic Internet services like Web, email, FTP • Streaming apps (audio, video) • P2P file sharing (BitTorrent) • Cloud computing and storage services

Middleware • Computer sw that connects sw components or some people and their applications. The software consists of a set of services that allows multiple processes running on one or more machines to interact. • The set of services together defines a uniform computing model for use by the programmers of servers and distributed apps

Challenge Issues • Heterogeneity • Heterogeneous components must be able to interoperate • Distribution transparency • Distribution should be hidden from the user as much as possible • Fault tolerance • Failure of a component (partial failure) should not result in failure of the whole system • Scalability • System should work efficiently with an increasing number of users • System performance should increase with inclusion of additional resources.

Challenge Issues (cont’) • Concurrency • Shared access to resources must be possible • Openness • Interfaces should be publicly available to ease adding new components • Security • The system should only be used in the way intended

Heterogeneity • Variety of computers in a DS • Networks, computer HW, OS, programming languages, various implementations, etc. • E.g. network protocols, data types • Middleware is a software layer providing a programming abstraction as well as masking the heterogeneity. • E.g. CORBA and Java RMI • Virtual machine approach provides a way of making code executable on any hw, e.g. the JVM
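
One way to picture how differing data representations are masked is to agree on a single wire format. The sketch below is only an illustration, not how CORBA or Java RMI marshal data: it uses Python's standard `struct` module with network byte order, and the record layout and the function names `marshal_record` and `unmarshal_record` are assumptions made up for this example.

```python
import struct
from typing import Tuple

# Assumed wire layout for this example only:
# a 4-byte unsigned integer followed by an 8-byte IEEE 754 double.
# "!" selects network (big-endian) byte order with standard sizes and no
# padding, so the bytes do not depend on the sender's CPU or compiler.
WIRE_FORMAT = "!Id"


def marshal_record(user_id: int, balance: float) -> bytes:
    """Convert host values into the agreed machine-independent layout."""
    return struct.pack(WIRE_FORMAT, user_id, balance)


def unmarshal_record(payload: bytes) -> Tuple[int, float]:
    """Convert the agreed layout back into host values."""
    user_id, balance = struct.unpack(WIRE_FORMAT, payload)
    return user_id, balance


if __name__ == "__main__":
    wire = marshal_record(42, 99.5)
    print(len(wire), unmarshal_record(wire))
```

Both sides only need to share the format string; a little-endian and a big-endian host would then decode the same bytes to the same values.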

Openness • Characteristic that determines whether the system can be extended or re-implemented in various ways without disruption to or duplication of existing services. • HW extension: peripheral, memory, network interface • SW extension: OS features, communication protocols, resource sharing services • e.g. Unix utility, browser protocol and handler • Key interfaces are published, or standardized (ISO, IEEE, etc); industry de-facto standards that bypass cumbersome official standardization procedures • Any component implementations must conform to the published standard.

Openness: Unix • Openness is achieved by specifying and documenting the key sw interfaces • Unix features are fully accessible through system calls • add drivers • develop applications • include new features: IPC • Linux: the kernel is open too!

Openness: Web Browser • Openness is achieved through a set of helpers or content handlers (plug-ins) • Different data formats are decoded using different tools • E.g. .html/.gif/.jpeg/.pdf • Built-in content handler: extensible? • Built-in protocol handler: extensible? • A protocol is a set of communication rules

Transparency • Concealment from the user and the apps programmer of the separation of components, so that the system is perceived as a coherent system • Eight Forms of transparency (ANSA’89, ISO’92) • Access transparency: enable local and remote resources to be accessed using identical operations • Location transparency: enable resources to be accessed without knowledge of their location • Concurrency transparency: enable several processes to operate concurrently using shared resources without interference between them • Replication transparency: enable multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or appl. programmers

Transparency (Cont’) • Eight Forms of Transparency (cont’) • Failure transparency: enable the concealment of faults, allowing users and appl. programs to complete their tasks despite the failure of hw or sw components (e.g. email delivery) • Middleware generally converts the failures of networks and processes into programming-level exceptions • Mobility transparency: allow the movement of resources and clients within a system without affecting the operation of users or programs • Performance transparency: allow the system to be reconfigured to improve performance as loads vary • Scaling transparency: allow the system and application to expand in scale without change to the system structure or the application algorithms.
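
The point about middleware converting network and process failures into programming-level exceptions can be sketched as a thin client stub. This is a minimal sketch under stated assumptions: `RemoteCallError` and `remote_call` are hypothetical names invented for this example, and the one-request, one-reply exchange over a raw TCP socket stands in for whatever protocol a real middleware would use.

```python
import socket


class RemoteCallError(Exception):
    """Hypothetical exception a middleware stub might raise for any
    network or remote-process failure (name invented for this sketch)."""


def remote_call(host: str, port: int, request: bytes, timeout: float = 1.0) -> bytes:
    """Send one request and return one reply, or raise RemoteCallError.

    Connection refusals, timeouts and resets all surface as OSError
    subclasses in Python; the stub folds them into a single exception
    type that the application programmer can catch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            return conn.recv(4096)
    except OSError as exc:
        raise RemoteCallError(f"call to {host}:{port} failed: {exc}") from exc
```

The caller sees one exception type instead of many low-level errors, which is a partial form of failure transparency: the fault is not hidden, but it is reported in the program's own terms.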

Transparency • Access transparency • Location transparency • Mobility transparency • Failure transparency • Replication transparency • Concurrency transparency • Performance transparency • Scaling transparency • Network transparency: access and location transparency taken together • Different forms of transparency in a distributed system; full transparency is too costly and impossible in some situations

Scalability: High Perf./Availability • Distributed systems operate effectively and efficiently at different scales of resources and users • Size, geographical location, administration • Objectives: • Control the cost of physical resources. E.g. if a single file server can support 20 users, can two servers support 40 users? • Control the performance loss, independent of resource size? • Prevent sw resources from running out. • E.g. 32-bit Internet addresses in IPv4 and 128-bit Internet addresses in IPv6. • Cost of scalability can’t be ignored: overhead of a scalable machine (power, fans, ...) • Over-compensating for future growth may be worse than adapting to a change when we are forced to
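
The address-size example can be made concrete with a little arithmetic. The sketch below uses Python's standard `ipaddress` module to count the addresses in the whole IPv4 and IPv6 spaces; the helper name `total_addresses` is an assumption introduced for this example.

```python
import ipaddress


def total_addresses(network: str) -> int:
    """Number of addresses in a network given in CIDR notation."""
    return ipaddress.ip_network(network).num_addresses


# The /0 prefixes cover the entire address space of each protocol.
IPV4_TOTAL = total_addresses("0.0.0.0/0")  # 2**32
IPV6_TOTAL = total_addresses("::/0")       # 2**128

if __name__ == "__main__":
    print(f"IPv4: {IPV4_TOTAL:,} addresses")
    print(f"IPv6: {IPV6_TOTAL:.3e} addresses")
    print(f"IPv6 is 2**{(IPV6_TOTAL // IPV4_TOTAL).bit_length() - 1} times larger")
```

A 32-bit field gives about 4.3 billion addresses, which is the kind of software resource that can run out as a system grows.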

Scalability (Cont’) • Objectives (cont’) • Avoid performance bottleneck • Centralized vs decentralized organization

Scaling Techniques • Hide communication latency • Asynchronous communication • Distribution • Naming • Replication • Cache • Consistency
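
As a rough illustration of hiding communication latency with asynchronous communication, the sketch below issues several simulated requests at once using Python's `asyncio`. The requests are stand-ins: `fake_request`, `fetch_all` and `timed_fetch` are hypothetical names, and `asyncio.sleep` plays the role of network delay.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_request(name: str, delay: float) -> str:
    """Simulated remote request; the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def fetch_all(delays: List[float]) -> List[str]:
    """Start every request before waiting for any reply."""
    pending = [fake_request(f"req{i}", d) for i, d in enumerate(delays)]
    # gather() runs the coroutines concurrently and keeps result order.
    return list(await asyncio.gather(*pending))


def timed_fetch(delays: List[float]) -> Tuple[List[str], float]:
    """Run fetch_all and report the wall-clock time it took."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_fetch([0.2, 0.2, 0.2])
    print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the total time is close to the longest single delay rather than the sum of all delays.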

Scaling Tech. for Interactive App • Figure 1.4: The difference between letting a server or a client check forms as they are being filled in

Scalable Naming • Figure 1.5: An example of dividing the DNS name space into zones.
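
To give a feel for how dividing a name space into zones distributes the work, here is a toy lookup that picks the most specific zone responsible for a name. It is not how real DNS resolution is implemented; the zone table, the server names and the function `authoritative_zone` are all hypothetical and exist only for this sketch.

```python
from typing import Dict, Tuple

# Hypothetical zone table: zone name -> name server responsible for it.
# The empty string stands for the root zone.
ZONES: Dict[str, str] = {
    "": "root-server",
    "com": "com-server",
    "example.com": "example-server",
}


def authoritative_zone(name: str, zones: Dict[str, str]) -> Tuple[str, str]:
    """Return (zone, server) for the longest zone suffix matching `name`."""
    labels = name.rstrip(".").lower().split(".")
    # Try the full name first, then drop labels from the left.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate, zones[candidate]
    return "", zones[""]  # nothing more specific: fall back to the root


if __name__ == "__main__":
    for host in ("www.example.com", "mail.other.com", "host.org"):
        print(host, "->", authoritative_zone(host, ZONES))
```

Each server only has to know the names in its own zone, so no single machine holds the whole name space.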

Concurrency: High Perf./Availability • More than one client wants to access a shared resource at the same time; the requests need to be handled in parallel • Server-side concurrency • Server side operations: Database/mining, CGI • Servers on single CPU machines (interleaving): • multiprogramming • Servers on symmetric multiple CPU machines • multiprogramming and multithreading • Servers on networks of workstations • Scalable server technology
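
A minimal sketch of server-side concurrency on a shared resource, assuming a multithreaded server: many simulated requests are handled by a pool of worker threads, and a lock protects the shared counter. The names `SharedCounter` and `handle_requests` are hypothetical and chosen for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedCounter:
    """A shared resource that several request handlers update."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> int:
        # The lock makes the read-modify-write step atomic, so
        # concurrent handlers cannot lose each other's updates.
        with self._lock:
            self._value += 1
            return self._value

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def handle_requests(num_requests: int, workers: int = 8) -> int:
    """Serve `num_requests` simulated requests on a thread pool."""
    counter = SharedCounter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(num_requests):
            pool.submit(counter.increment)
    # Leaving the `with` block waits for all submitted work to finish.
    return counter.value


if __name__ == "__main__":
    print(handle_requests(1000))
```

The pool bounds the number of threads while the lock keeps the shared state consistent, which are the two halves of handling requests in parallel.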

Concurrency (cont.) • Clients share load with server • Data compression/decompression • Data encryption/decryption • input verification, decoration, calculation • Java applet or JavaScript • Client-side version of JavaScript allows “executable content” to be included in web pages. • Do it in parallel!

Failure Handling for High Availability • HW/SW failure is common. The challenge is how to deal with failures. • Failures in a distributed system are often partial, which makes failure handling even harder. • Service availability: a server’s ability to provide uninterrupted service over time; measured as the percentage of uptime • 99.9% availability corresponds to about 8 hours 45 minutes of downtime per year
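
The downtime figure follows from simple arithmetic: a year has 365 × 24 = 8760 hours, and 0.1% of that is 8.76 hours, or roughly 8 hours 45 minutes. The sketch below repeats the calculation for a few availability levels; the function name `downtime_hours_per_year` is an assumption for this example, and leap years are ignored.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year implied by an uptime percentage."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime: {hours:.2f} h ({hours * 60:.0f} min) downtime per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten.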

Failure Handling • How to handle failures: • Failure detection: • A checksum is used to detect corrupted data in a message • How to detect a remote crashed server? • Failure masking, e.g. retransmit messages that are lost • Recovery from failure: • SW is designed in a way that the state of permanent data can be recovered or “rolled back” after a server has crashed. • Tolerate failure, by the use of redundant components
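
Detection by checksum and masking by retransmission can be sketched together. In this illustration a CRC-32 from Python's standard `zlib` module is prepended to each message, and the sender retries when the check fails. The framing layout, the function names, and the `channel` callable that simulates the network are all assumptions made for this example.

```python
import zlib
from typing import Callable, Tuple


def frame(payload: bytes) -> bytes:
    """Prepend a 4-byte CRC-32 of the payload (layout assumed here)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum disagrees."""
    expected = int.from_bytes(message[:4], "big")
    payload = message[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: message corrupted")
    return payload


def send_with_retry(
    payload: bytes,
    channel: Callable[[bytes], bytes],
    max_attempts: int = 3,
) -> Tuple[bytes, int]:
    """Send through `channel`, retransmitting when corruption is detected.

    `channel` is a hypothetical stand-in for the network: it takes the
    framed bytes and returns what the receiver would see.
    Returns the delivered payload and the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return unframe(channel(frame(payload))), attempt
        except ValueError:
            continue  # failure detected; mask it by sending again
    raise RuntimeError(f"delivery failed after {max_attempts} attempts")
```

A usage sketch: passing a `channel` that flips one bit on its first call makes `send_with_retry` succeed on the second attempt, so the caller never sees the corrupted copy.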

Security • Security is a primary concern in an open distributed system • Secure system in three aspects: • Confidentiality (privacy): protection against disclosure to unauthorized individuals • Integrity: protect against alteration or corruption • Availability: protect against interference with the means to access the resources
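
Of the three aspects, integrity is the easiest to show in a few lines. The sketch below is one common approach rather than anything prescribed by the slides: a keyed hash (HMAC-SHA256 from Python's standard `hmac` and `hashlib` modules) lets a receiver detect alteration of a message. The function names `sign` and `verify` and the sample key are assumptions for this example.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag; compare_digest avoids leaking timing information."""
    return hmac.compare_digest(sign(key, message), tag)


if __name__ == "__main__":
    key = b"shared-secret-for-demo-only"  # hypothetical key for illustration
    tag = sign(key, b"transfer 10 to alice")
    print(verify(key, b"transfer 10 to alice", tag))
    print(verify(key, b"transfer 99 to alice", tag))
```

This protects integrity only; confidentiality would additionally require encryption, and availability is a separate concern altogether.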

Challenge Issues: In Summary • Heterogeneity • Distribution transparency • Fault tolerance • Scalability • Concurrency • Openness • Security
