Computer Networks and Distributed Systems
Wang Jianyong, Department of Computer Science and Technology, Peking University
Email: jwang@net.cs.pku.edu.cn
URL: HTTP://csnetlib.pku.edu.cn/~jwang/course/cnds.html
Why Do We Study Distributed Systems?
A dozen remaining IT problems proposed by James Gray:
• World MEMEX
• Virtual reality (TelePresence, i.e. VR)
• Trouble-free systems
• Secure systems
• Highly available systems (AlwaysUp)
• Automatic programming
• Scalability
• Passing the Turing test
• Speech to text
• Text to speech
• Machine vision
• Personal MEMEX
Textbook:
G. Coulouris, J. Dollimore, T. Kindberg. Distributed Systems: Concepts and Design. Addison-Wesley, 1994.
References:
Larry L. Peterson and Bruce S. Davie. Computer Networks: A Systems Approach. Morgan Kaufmann, 1996.
Andrew S. Tanenbaum. Distributed Operating Systems. Prentice Hall International, Inc., 1996.
Andrew S. Tanenbaum. Computer Networks. Prentice Hall International, Inc., 1996.
Xueliang Yang. Distributed Computer Systems. Graduate School of USTC, the Chinese Academy of Sciences, P.R. China.
Grading:
One programming assignment (to be submitted by November 12), one literature reading (paper submission deadline: 2000/12/26), and one final exam (60% of the total grade).
Tentative course outline
• Introduction, basic networking, ISO model, networks & internetworking
• Inter-process communication: BSD sockets, client-server model
• RPC: Sun's RPC etc.
• DOS (distributed operating system) principles
• Name service: terminology, DFS & DNS
• Distributed file systems: concepts, design and implementation; DFS case studies: NFS, AFS, Coda, COSMOS (or S2FS)
Tentative course outline (continued)
• Distributed shared memory: IVY & Munin
• Coordination in distributed systems: potential causality, clock synchronization, logical time
• Replication: Gossip & Isis
• Transactions: ACID, locks, deadlocks, nested transactions, optimistic concurrency control, timestamp ordering, distributed transactions
• Recovery & fault tolerance
• Security in DS: DES, RSA, digital signatures, Needham-Schroeder model, Kerberos
Figure: Components of a distributed system (layers, top to bottom)
Services provided by distributed systems:
- name services
- distributed file systems
- distributed shared memory
- time & coordination
- shared data services (distributed transactions & concurrency control, recovery)
- highly available services (replication & fault tolerance)
- security
The micro-kernel of DOS:
- processes & threads, naming & protection
- communication & invocation, virtual memory
Foundations of distributed systems:
- Remote Procedure Calling
- Interprocess Communication
- Networking and Internetworking
Chapter 1 Introduction to Distributed Systems
• Review of computing history
• Why should we develop distributed systems
• Key characteristics of distributed systems
1.1 Review of computing history
• Physically distributed hardware
• Logically centralized software
1.1.1 The trend of hardware
• 1960s & 1970s: timesharing systems
• 1980s: personal computers & personal workstations
• 1990s: distributed computer systems
• 2000s: mass distributed systems
1.1.2 The need for logically centralized software
• Users' requirement:
  - a system built out of large numbers of powerful PCs or workstations
  - but which act together in a coherent way
    >> that is as easy to use & understand as an old-fashioned timesharing system.
• Role of a new generation of operating systems (DOS):
  - e.g., Web OS, Cluster OS
1.2 Why should we develop distributed systems
1.2.1 The most important reason: applications are both the starting point and the end result of the development of distributed systems.
1.2.2 Many computer applications occur in a distributed or decentralized environment.
• Sharing expensive resources
• Exchanging data between systems
1.2.3 Proliferation of low-cost, high-performance PCs and workstations
1.2.4 The interface between users and computers is friendlier
1.2.5 LAN & Internet applications stimulate the development of DOS
• It is the software, not the hardware, that determines whether a system is distributed or not
1.2.6 Examples of distributed systems and applications
1. Distributed UNIX:
• Berkeley BSD UNIX + NFS + NIS
• Amoeba, Mach, Chorus
2. Commercial applications
• airline seat reservation and ticketing
• automatic teller machines
• Reliability, security
3. Wide area network applications
• Internet, ARPAnet: 100 => 1 million
• Internet information services, such as Email, Web, WWW search engines, BBS, E-commerce, digital libraries
4. Cluster systems
• IBM SP2
• Berkeley's NOW
• NCIC's Dawning superserver
5. Meta Computing
• idle computers are ubiquitous
6. Multimedia information access and conferencing applications
• continuous media services, such as VOD servers, video phone and video conferencing; their main requirement is quality of service
• ATM, real-time OS, continuous media servers
1.3 Key characteristics of distributed systems
What is a distributed system?
Definition 1: A distributed system is one in which there exists a multiplicity of interconnected processing resources able to cooperate under system-wide control on a single problem with minimal reliance on centralized procedures, data or hardware. (Formulated by the organizing committee for the 1st conf. on DCS)
Definition 2: A distributed system consists of a collection of autonomous computers linked by a computer network and equipped with distributed system software. (From our textbook)
1.3.1 Resource decentralization and sharing
- Some or all of the computing resources should be decentralized in function as well as distance
- This is a prerequisite for distinguishing distributed systems from other types of systems, such as time-sharing systems.
- Some resources are very expensive, and data sharing is an essential requirement in many computer applications
1.3.2 Cooperative autonomy
- Cooperative autonomy, especially control autonomy, increases the overall reliability and availability of the system
1.3.3 Concurrency (i.e. work parallelism)
- Concurrent vs parallel
  >> MIMD (Multiple Instruction & Multiple Data stream) vs the concurrency of a TSS (time-sharing system)
- Two reasons:
  >> Many users simultaneously invoke commands or interact with application programs;
  >> Many server processes run concurrently, each responding to different requests from client processes.
1.3.4 System transparency
- The system looks to its users like a centralized single computer system
- but runs on multiple independent machines, i.e. Single System Image.
ISO definition:
• Access transparency
• Location transparency
• Concurrency transparency
• Replication transparency
• Failure transparency
• Migration transparency
• Performance transparency
• Scaling transparency
1.3.5 Fault tolerance
- Two approaches to the design of fault-tolerant computer systems:
  >> hardware redundancy: the use of redundant components;
  >> software recovery: the design of programs to recover from faults.
- Availability of a system is a measure of the proportion of time that it is available for use.
1.3.6 Scalability
- Scalable techniques:
  >> reconfigurability; removing performance bottlenecks (serverless designs, replicated data and services, caching)
  >> e.g., NFS lacks scalability.
1.3.7 Openness
- the characteristic that determines whether the system can be extended in various ways.
- e.g., UNIX
- To summarize:
  >> Open systems are characterized by the fact that their key interfaces are published;
  >> Open distributed systems are based on the provision of a uniform inter-process communication mechanism and published interfaces for access to shared resources;
  >> Open distributed systems can be constructed from heterogeneous hardware and software, possibly from different vendors.
Chapter 2 Design Goals & Issues
• Introduction
• Basic technical issues
• Users' requirements
• Summary
Key characteristics of distributed systems
• Concurrency
• Transparency
• Fault tolerance
• Scalability
• Resource sharing
• Openness
Key design goals
• Performance
• Reliability
• Security
• Scalability
• Consistency
2.1 Basic design issues
• Naming:
  - global meaning & scalability
• Communication:
  - how to optimize the implementation of communication in a distributed system
  - while retaining a high-level programming model for its use
• Software structure:
  - how to structure a system so that new services can be introduced
    >> that will interwork fully with existing services
    >> without duplicating existing service elements
• Workload allocation:
  - how to deploy the processing and communication resources in a network to optimum effect in the processing of a changing workload
• Consistency maintenance:
  - maintenance of consistency at reasonable cost
2.1.1 Naming
• name vs identifier
• a resolved name is an identifier together with other attributes
  - internet communication: IP address + port number
  - UNIX file system: index node number
  - Mach communication system: port number
• naming design considerations
  - choose an appropriate name space
  - use a name service to resolve names to communication identifiers
  - scalability considerations
• name contexts are represented by tables or databases
  - file system: /etc/a.out vs /usr/a.out
  - internet: www.cs.pku.edu.cn vs www.cs.tsinghua.edu.cn
• names may be structured or flat, readable or unreadable, location-independent or containing location clues
• naming schemes can incorporate security mechanisms
  - file systems' directories
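The idea that a name is resolved within a context, and that the same name can mean different things in different contexts, can be sketched as follows. This is a minimal illustration, not how DNS or any real name service is implemented: the classes `Identifier` and `NameContext`, the helper `build_example`, and the placeholder addresses (taken from the documentation-only 192.0.2.0/24 range) are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """A resolved name: a communication identifier (host + port) plus attributes."""
    host: str
    port: int
    attributes: tuple = ()


class NameContext:
    """A name context represented as a table mapping names to entries."""

    def __init__(self):
        self._table = {}

    def bind(self, name, entry):
        # An entry is either an Identifier or a nested NameContext.
        self._table[name] = entry

    def resolve(self, path):
        """Resolve a structured name such as 'pku/www' one component at a time."""
        entry = self
        for component in path.split("/"):
            if not isinstance(entry, NameContext):
                raise KeyError(f"{component!r} looked up in a non-context")
            entry = entry._table[component]
        return entry


def build_example():
    """Hypothetical name space: the name 'www' is bound in two different contexts."""
    root, pku, tsinghua = NameContext(), NameContext(), NameContext()
    root.bind("pku", pku)
    root.bind("tsinghua", tsinghua)
    pku.bind("www", Identifier("192.0.2.10", 80))       # placeholder address
    tsinghua.bind("www", Identifier("192.0.2.20", 80))  # placeholder address
    return root
```

With this name space, `build_example().resolve("pku/www")` and `build_example().resolve("tsinghua/www")` return different identifiers even though the final component is the same, which mirrors the /etc/a.out vs /usr/a.out case.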
2.1.2 Communication
• Communication between a pair of processes involves:
  - transfer of data & synchronization activity
• Communication primitives: send & receive may be:
  - synchronous (i.e. blocking) or asynchronous (i.e. non-blocking)
• Two communication patterns:
  - client-server model between pairs of processes
  - group multicast model between groups of cooperating processes
Client-server communication
• it is oriented towards service provision, and an exchange consists of:
  - transmission of a request from a client process to a server process;
  - execution of the request by the server;
  - transmission of a reply to the client.
• it can be implemented in terms of message-passing operations (send & receive)
  - but is commonly presented at the language level as RPC
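The three-step client-server exchange above (request, execution, reply) can be sketched with blocking send and receive operations. This is a hedged sketch: it uses a connected socket pair and a thread inside one process to stand in for two machines, and the tiny "ADD a b" request format and the function names are assumptions made only for illustration.

```python
import socket
import threading


def serve_one_request(conn):
    """Server side: receive a request, execute it, send the reply."""
    with conn:
        request = conn.recv(1024).decode()      # blocking (synchronous) receive
        op, a, b = request.split()
        result = int(a) + int(b) if op == "ADD" else "ERROR"
        conn.sendall(str(result).encode())      # transmit the reply


def call(conn, request):
    """Client side: send a request, then block until the reply arrives."""
    conn.sendall(request.encode())
    return conn.recv(1024).decode()


def demo():
    # socketpair() returns two already-connected sockets; a real deployment
    # would use connect()/accept() across the network instead.
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=serve_one_request, args=(server_end,))
    server.start()
    with client_end:
        reply = call(client_end, "ADD 2 3")
    server.join()
    return reply
```

The function `call` already has the shape of a remote procedure call: the caller passes arguments and blocks for a result, and the message passing is hidden inside. An RPC system generates this kind of stub automatically.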
Dynamic binding in the client-server model
  - example: DNS name server
Function shipping in the client-server model
  - example: PostScript with laser printers
Group multicast
  - sending a message to the members of a specified group of processes is known as multicasting
Motivation for group multicasting
  - Locating an object
  - Fault tolerance
  - Multiple update
    >> e.g., maintaining cache coherence under a write-update mechanism
    >> e.g., time synchronization, RAID
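The multiple-update use of group multicast can be sketched as follows: one message is delivered to every member of a group, and each member applies it to its own cached copy (a write-update scheme). This is an in-process sketch under stated assumptions; the `Group` class and `apply_updates` function are hypothetical, queues stand in for network delivery, and no ordering or reliability guarantees of a real multicast protocol are modeled.

```python
import queue


class Group:
    """A named set of members, each with its own inbox."""

    def __init__(self):
        self._members = {}

    def join(self, name):
        """Add a member and return the inbox it will read messages from."""
        self._members[name] = queue.Queue()
        return self._members[name]

    def multicast(self, message):
        """Deliver one message to every member; return the number of recipients."""
        for inbox in self._members.values():
            inbox.put(message)
        return len(self._members)


def apply_updates(inbox, cache):
    """Member side: drain the inbox and apply each (key, value) update to the cache."""
    while not inbox.empty():
        key, value = inbox.get()
        cache[key] = value
```

For example, if three members join the group and the sender calls `group.multicast(("x", 1))`, each member's cache holds `x = 1` after it runs `apply_updates`, so all cached copies stay coherent without the sender addressing members individually.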
2.1.3 Software structure
Components of DOS
- operating system kernel services
  >> extending a conventional Unix kernel, like BSD Unix
  >> microkernels, like Mach, Amoeba and Chorus
- open services
  >> DFS
  >> DSM
  >> other services, like an electronic mail delivery service
- support for distributed programming
  >> RPC
  >> MPI or PVM
2.1.4 Workload allocation
• two main workload allocation models:
  - the processor pool model
  - the use of idle workstations
The processor pool model
[Figure 2.5: The processor pool model]
• examples: Amoeba, Plan 9, Cambridge Distributed Computing System, Dawning 2000 super server
Use of idle workstations
• use of idle or under-utilized workstations as a fluctuating pool of extra computers
• examples: Sprite, LSF
Shared-memory multiprocessors
  - also called Symmetric shared-memory Multi-Processors (or SMP)
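The idle-workstation model above treats under-utilized machines as a fluctuating pool. A minimal sketch of one possible placement policy follows; it is not the algorithm used by Sprite or LSF. The load figures, the `idle_threshold` and `job_cost` parameters, and both function names are assumptions chosen for illustration.

```python
def pick_workstation(loads, idle_threshold=0.25):
    """Return the least-loaded workstation below the threshold, or None if none is idle."""
    idle = {host: load for host, load in loads.items() if load < idle_threshold}
    if not idle:
        return None
    return min(idle, key=idle.get)


def place_jobs(jobs, loads, job_cost=0.2, idle_threshold=0.25):
    """Assign each job to an idle workstation, updating the estimated load as we go."""
    loads = dict(loads)          # work on a copy; the caller's table is untouched
    placement = {}
    for job in jobs:
        host = pick_workstation(loads, idle_threshold)
        placement[job] = host    # None means the job must wait or run locally
        if host is not None:
            loads[host] += job_cost
    return placement
```

Because the pool fluctuates, a job may find no idle machine at all; the sketch reports that case as `None` rather than overloading a busy workstation, which is one reason the processor pool model dedicates machines instead.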
2.1.5 Consistency maintenance
• Update consistency
  - there are likely to be many users accessing shared data;
  - the operation of the system itself depends on the consistency of certain databases
• Replication consistency
• Cache coherency
  - hypothesis of locality
• Failure consistency
• Clock consistency
• User interface consistency
2.2 User requirements
• Functionality
  - what the system should do for users
• Reconfigurability
  - the need for a system to accommodate changes without causing disruption to existing service provision
• Quality of service
  - embracing issues of performance, reliability and security
2.2.1 Functionality
• Key benefits of a distributed computer system:
  - economy & convenience from resource sharing;
  - potential improvement in performance & reliability from distributed resources.
• Enhancements to the services provided by centralized computers:
  - sharing across a network can bring access to a richer variety of resources than could be provided by any single computer;
  - the advantages of distribution can be exploited so that applications that carry out explicit sharing, and fault-tolerant or parallel applications, can be programmed.
• Three options when considering a migration from centralized computing to distributed computing:
  - adapt existing operating systems for networking
    >> example: BSD Unix + NFS
  - move to an entirely new operating system designed specifically for distributed systems
  - emulation: move to a new DOS that can emulate one or more existing operating systems
    >> examples: Mach & Chorus
2.2.2 Reconfigurability
• Requirements of a reconfigurable distributed system:
  - accommodating the changes that arise from scaling a distributed system design and from its support for heterogeneity;
  - a failed process, computer or network component is replaced by another working counterpart;
  - computational load is shifted from over-loaded to less-loaded machines, so as to increase the total throughput of the distributed system.
2.2.3 Quality of service
• Performance: in terms of the response times experienced by its users
  - optimizing the performance of all of the software components that are involved:
    >> the OS's communication services
    >> distributed programming support (e.g., RPC)
    >> and the software that implements the service.
• Reliability and availability:
  - a fault-tolerant system is one
    >> which can detect a fault
    >> and either fail gracefully (that is, predictably)
    >> or mask the fault so that no failure is perceived by users of the system.
• Security: distributed systems face two main threats
  - threats against the privacy and integrity of users' data as it travels over the network
  - their openness to interference with system software:
    >> not all machines on a network can in general be made physically secure
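The two fault-tolerance behaviours listed under reliability (mask the fault, or fail gracefully) can be sketched with replicated servers. This is a simplified illustration under stated assumptions: replicas are modeled as plain Python callables, a crash is modeled by the hypothetical `ReplicaFailure` exception, and the two demo replicas `crashed` and `healthy` are invented for the example.

```python
class ReplicaFailure(Exception):
    """Raised when a replica is detected as failed, or when every replica has failed."""


def call_with_masking(replicas, request):
    """Try each replica in turn; the caller sees a failure only if all of them fail."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)          # first working replica masks earlier faults
        except ReplicaFailure as exc:        # fault detected: record it and move on
            errors.append(exc)
    # Fail gracefully: a predictable, documented error instead of undefined behaviour.
    raise ReplicaFailure(f"all {len(errors)} replicas failed")


def crashed(request):
    """Hypothetical replica that is always down."""
    raise ReplicaFailure("replica down")


def healthy(request):
    """Hypothetical replica that answers by upper-casing the request."""
    return request.upper()
```

Calling `call_with_masking([crashed, healthy], "ping")` returns a normal reply, so the user never perceives the first replica's failure; with only crashed replicas the call raises `ReplicaFailure`, which is the predictable failure mode.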
In the next class we'll discuss:
Chapter 3 Networking & Internetworking
• Network technologies
• Protocols
• Technology case studies: Ethernet, Token Ring and ATM
• Protocol case studies: Internet protocols and FLIP
Thanks for your attention!