Topics in Reliable Distributed Systems 048961 Fall 2003-2004 Dr. Idit Keidar
Course Overview • Graduate level. • Prerequisite: an introductory course on distributed computing. • Please see me if you didn’t take one. • Format: reading group & seminar. • Covering “hot” topics in reliable distributed computing from recent research papers. • Discussion and evaluation of papers.
So What Are Those “Hot” Topics? • Peer-to-peer systems. • Application-level multicast. • Overlay networks. • Gossip-based (epidemic) protocols. • Distributed file systems. • Security (e.g., of all the above). • Solutions of the above for wireless networks.
Requirements and Grading • Reading the papers (one a week). • Handing in short paper summaries – 15% • Participating in class discussions – 10% • Presenting one of the papers – 75% • Select a paper within the next 2 weeks.
Reading The Papers • This is a reading group. • This means that you should read each paper before it is discussed. • Read the entire paper and be familiar with all its content. • Most will be conference papers. • You don’t need to understand everything, check previous work, or memorize details. • Hand in a short summary of the paper (unless you are presenting it) by e-mail to me the night before the lecture. • Any time before 8:00am the morning of the lecture is considered part of the night before.
Paper Summaries • Total of ½ a page to 1 page long (no more!!). • One paragraph overview • What question is the paper trying to answer? • What are the main results? • What did you learn? • What questions remain unanswered? • What didn’t you understand? • Short discussion of the paper’s strengths and weaknesses.
Evaluating Paper Strengths and Weaknesses • Is the paper answering the “right” question? • Does it make reasonable assumptions? • How novel is the solution? • Is the solution technically sound? • How well is the solution evaluated? • Expected impact. (Hard to guess). • Writing level: is the paper clearly written? Is it self-contained?
Paper Presentations • You should fully understand the paper, be familiar with previous work, and be able to compare the paper with other similar work. • The presentation should include: • Summary and evaluation. • Comparison with other work. • List of topics to discuss in class. • It is highly recommended to discuss the presentation with me beforehand.
Contact Me • Idit Keidar <idish@ee> • Please send me e-mail with 048961 in the subject, and I’ll add you to the course mailing list. • Office hours: Tue 10:30-11:30 Mayer 960. • Let me know in the coming two weeks what you would like to present. • Schedule will be posted on the course web page.
Unicast, Broadcast, Multicast • Unicast – point-to-point communication. • Most network services focus on unicast. • Broadcast – sending to all hosts. • Common and efficient in radio networks, LANs. • Inappropriate for very large widely distributed networks (e.g., the Internet). • Multicast – sending to a selected group of hosts. • mcast(group, msg) – multicast a message. • deliver(msg) – deliver a message.
Multicast Groups • Hosts choose to be members of a group. • Group membership = set of group members. • Group membership is usually dynamic. • Nodes join and leave over time. • Messages should be delivered by the current members. • The term “current” is not very accurate in a distributed system.
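The mcast/deliver interface and dynamic group membership can be sketched in a few lines. This is a minimal in-process stand-in, not a real network service: the class name `LocalMulticast`, the `join`/`leave` methods, and the group and host names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class LocalMulticast:
    """In-process stand-in for a multicast service (hypothetical, no networking)."""

    def __init__(self):
        self.members = defaultdict(set)   # group name -> set of member hosts
        self.handlers = {}                # host -> deliver(msg) callback

    def join(self, group, host, deliver):
        """Host becomes a member of group; deliver is its upcall."""
        self.members[group].add(host)
        self.handlers[host] = deliver

    def leave(self, group, host):
        """Host stops being a member of group."""
        self.members[group].discard(host)

    def mcast(self, group, msg):
        """Deliver msg at every member that is in the group right now."""
        for host in sorted(self.members[group]):
            self.handlers[host](msg)


inbox = defaultdict(list)                 # host -> messages delivered so far
svc = LocalMulticast()
for h in ("a", "b", "c"):
    svc.join("quotes", h, inbox[h].append)

svc.mcast("quotes", "m1")
svc.leave("quotes", "c")                  # membership changes over time
svc.mcast("quotes", "m2")
```

Here host `c` delivers only `m1` because it left before `m2` was multicast. In a single process "current members" is unambiguous; in a distributed system, a join or leave and an mcast can be concurrent, which is why the term is hard to pin down.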
Why Multicast? • Content distribution. • E.g., stock prices, live video broadcast, media-on-demand, web caching (Akamai). • Multi-user applications. • E.g., chat, multi-user games, on-line conferencing, and collaborative computing. • Replication of data/services for fault-tolerance. • Multicast to all replicas for state-machine replication. • Parallel programs running on clusters, Grid computing.
Multicast Characteristics • Number of sources – • Single source – point-to-multipoint. • E.g., content distribution (one or few sources). • Multiple sources – multipoint-to-multipoint. • E.g., chat and collaborative computing. • Types of Guarantees – • Best effort – like UDP. • Reliable. What exactly does this mean? • QoS - real time latency and guaranteed bandwidth.
IP Multicast • Best effort. • Extend hosts’ IP protocol stack to support multicast addressing. • Class D addresses. • Extend routers or add multicast routers. • Use hardware broadcast where available (for efficiency). • Minimize inter-LAN traffic by forwarding over gateways only once.
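Class D addresses are the IPv4 range 224.0.0.0 to 239.255.255.255 (the block 224.0.0.0/4). A small check using Python's standard `ipaddress` module; the helper name `is_class_d` is an illustrative choice.

```python
import ipaddress

CLASS_D = ipaddress.ip_network("224.0.0.0/4")


def is_class_d(addr: str) -> bool:
    """True if addr is an IPv4 multicast (class D) address."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip in CLASS_D
```

For example, `is_class_d("224.0.0.1")` is true, while `is_class_d("192.168.1.1")` and `is_class_d("240.0.0.0")` are false.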
IGMP – Internet Group Management Protocol • Hosts can join and leave groups. • Multicast gateways keep track of which groups have subscribers in their subnet. • Use broadcast on the subnet.
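From an application's point of view, joining a group is a socket option; the host's IP stack then handles the group membership signalling. A sketch using the standard BSD socket options as exposed by Python. The helper names, and any group address and port passed in, are assumptions for illustration, and the join only works on a host with a multicast-capable interface.

```python
import socket


def make_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(group: str, port: int) -> socket.socket:
    """Open a UDP socket and ask the IP stack to join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group))
    return sock


def leave_group(sock: socket.socket, group: str) -> None:
    """Tell the IP stack this socket no longer wants the group's traffic."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, make_mreq(group))
```

A receiver would call something like `join_group("239.1.2.3", 5007)` and then `recvfrom` on the returned socket; both values here are made up.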
Multicast Routing • Based on multicast trees. • Multiple protocols for maintaining the trees. • Source-based tree for a single active source. • Optimized for shortest paths, e.g., DVMRP (Distance Vector Multicast Routing Protocol). • Shared trees (core-based) for multiple senders. • Messages sent down-stream only. • Trees change as membership changes.
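The shape of a source-based tree can be illustrated with a fewest-hops tree rooted at the source, pruned to the branches that lead to subnets with members. This is only a sketch of the resulting tree on a made-up five-router topology; it is not how DVMRP computes or maintains its trees.

```python
from collections import deque


def shortest_path_tree(graph, source):
    """Return {node: parent} for a BFS (fewest-hops) tree rooted at source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return parent


def prune(parent, members):
    """Keep only the nodes that lie on a path from the root to some member."""
    keep = set()
    for m in members:
        node = m
        while node is not None and node not in keep:
            keep.add(node)
            node = parent[node]
    return {n: parent[n] for n in keep}


# Hypothetical router topology (adjacency lists).
graph = {
    "s": ["r1", "r2"],
    "r1": ["s", "r3"],
    "r2": ["s", "r3", "r4"],
    "r3": ["r1", "r2"],
    "r4": ["r2"],
}
tree = shortest_path_tree(graph, "s")
pruned = prune(tree, members={"r4"})
```

With only `r4` subscribed, the pruned tree keeps just `s`, `r2`, and `r4`; adding or removing members changes which branches survive, which is the sense in which trees change with membership.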
MBone – The Internet’s Multicast Backbone • Hosts and subnets supporting IP multicast. • Virtual topology covering a subset of the Internet with “islands” • Routing between islands in “tunnels”. • Virtual point-to-point links encapsulating multicast messages as IP over IP. • Incremental deployment of IP multicast in islands.
Status of IP Multicast • Gained popularity in the 90s. • 1992: 20 subnets on the MBone. • 1996: 2800 subnets on the MBone. • But did not “catch on” further. • Avoided by organizations fearing flooding of their networks with multicast traffic. • Now mostly unavailable.
Application-Level Multicast (ALM) • The new kid on the block. • A.k.a. End-host multicast. • Do-it-yourself multicast – • No network support or multicast routers. • Usually use unicast communication only. • Hosts organize themselves in a multicast group. • Fits well with the peer-to-peer philosophy: self-organizing dynamic systems.
Overlay Networks • A virtual structure imposed over the physical network (Internet). • Over the Internet, there is a (high-level) unicast link between every pair of hosts. • An overlay uses a fixed subset of these unicast links. • Most popular approach to ALM. • Many recent examples: Narada (end-host-multicast), Overcast, ScatterCast, ALMI, Scribe, Bayeux, Yoid, NICE, SelectCast, RelayCast, Jungle Monkey, I3, Bullet, SplitStream… • Some will be presented in this course.
Characteristics of ALM Overlays • Most multicast on a spanning tree. • Pros and cons? • Balanced trees are important. Why? • But many have extra links in the overlay for control, and for back-up when a spanning tree link fails. (E.g., Yoid, Overcast, SelectCast, HMTP). • Most are intended for single-source multicast. • Most provide best-effort reliability. • Replacing IP Multicast. • Some provide reliability via TCP links, buffering, loss detection, and retransmissions.
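One way to approach the question of why balance matters is to compare tree shapes over the same hosts. In an overlay tree, depth bounds the number of unicast hops a message takes, and fan-out is the number of copies a single host must send. The sketch below compares three hypothetical shapes for 15 hosts; the function name and the shapes are illustrative assumptions.

```python
def tree_stats(children, root):
    """Return (depth in overlay hops, maximum fan-out) of a multicast tree."""
    depth, max_fanout = 0, 0
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        kids = children.get(node, [])
        depth = max(depth, d)
        max_fanout = max(max_fanout, len(kids))
        stack.extend((k, d + 1) for k in kids)
    return depth, max_fanout


n = 15
chain = {i: [i + 1] for i in range(n - 1)}                 # 0 -> 1 -> ... -> 14
star = {0: list(range(1, n))}                              # source sends to all
binary = {i: [c for c in (2 * i + 1, 2 * i + 2) if c < n]  # balanced binary tree
          for i in range(n)}

stats = {name: tree_stats(t, 0)
         for name, t in (("chain", chain), ("star", star), ("binary", binary))}
```

The chain has depth 14 and fan-out 1, the star has depth 1 and fan-out 14, and the balanced binary tree has depth 3 and fan-out 2, so it keeps both latency and per-host load low.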
Reliable Multicast • In the “old days”, solutions based on IP Multicast. • RMTP (Reliable Multicast Transport Protocol) • Single source only. • SRM (Scalable Reliable Multicast) • Multiple sources. • Let the application decide what “reliability” is, determine policy for buffer management and retransmissions. • Not so scalable after all.
Group Communication Toolkits • Supporting strong reliability. • Virtual Synchrony model. • Addressing membership, reliability, flow control, message ordering, etc. • First in LAN only, later in WANs. • Example systems: Isis, Horus, Ensemble, Transis, Psync, Phoenix, Relacs, Newtop, Totem, NavTech, RMP, Spread, Xpand.
Virtual Synchrony Semantics [Birman, Joseph 87] • Group members all see events in same order • Events: messages, process crash/join. • Basic component: group membership • Reports changes in set of group members. • Each member knows of all the others. • Powerful abstraction for fault-tolerant “state-machine” replication. • Connected members go through same states. • New members get state transfer. • Inherently not scalable.
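Why a common event order is enough for state-machine replication can be shown with a deterministic replica. This sketch simplifies virtual synchrony to "every replica applies the same event list in the same order"; the event kinds (`join`, `crash`, `add`, `double`) and the state layout are invented for illustration.

```python
def apply_events(events):
    """Deterministically fold an ordered event list into (members, counter)."""
    members, counter = set(), 0
    for kind, arg in events:
        if kind == "join":
            members.add(arg)
        elif kind == "crash":
            members.discard(arg)
        elif kind == "add":
            counter += arg
        elif kind == "double":
            counter *= 2
    return frozenset(members), counter


log = [("join", "p"), ("join", "q"), ("add", 3), ("double", None), ("crash", "q")]

replica_1 = apply_events(log)
replica_2 = apply_events(list(log))            # same events, same order

# Swapping two events that do not commute leads to a different state.
reordered = [("join", "p"), ("join", "q"), ("double", None), ("add", 3), ("crash", "q")]
diverged = apply_events(reordered)
```

The two replicas end in the same state, while the reordered log yields a different counter. A new member can be brought up to date by handing it the current state (state transfer) and then feeding it subsequent events in the agreed order.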
Gossip-Based Multicast • Spread information by gossiping about it with your friends. • Also called epidemic algorithms. • Another family of ALM systems, although often not thought of this way. • Randomized algorithms with probabilistic reliability guarantees.
Gossip-Based Multicast • Each node divides its time into gossip rounds. • In each round, the node exchanges information with F random nodes. • Push-based: send contents of buffer to them. • Pull-based: ask them for their buffer content. • Push and pull can be combined. • Optimization: send digest of message names, request missing ones. Pros and cons? • Upon receipt of a message (from other node or application), insert into message buffer. • Purge old messages from buffer (different policies).
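The push variant of the round structure above can be sketched as a small simulation of a single message. The assumptions here are not from the slides: rounds are synchronized, every node knows all others, targets are chosen uniformly at random, nothing is lost, and the message is never purged from a buffer. The function name and parameters are illustrative.

```python
import random


def push_gossip(n, fanout, rounds, seed=0):
    """Simulate push gossip of one message.

    Returns a list with the number of nodes holding the message after each round.
    Requires fanout <= n - 1.
    """
    rng = random.Random(seed)
    has_msg = [False] * n
    has_msg[0] = True                       # node 0 is the source
    history = []
    for _ in range(rounds):
        # Only nodes that held the message at the start of the round send it.
        senders = [i for i in range(n) if has_msg[i]]
        for s in senders:
            # Pick `fanout` distinct targets among the other n - 1 nodes.
            for t in rng.sample(range(n - 1), fanout):
                target = t + 1 if t >= s else t   # skip the sender itself
                has_msg[target] = True
        history.append(sum(has_msg))
    return history


history = push_gossip(n=100, fanout=3, rounds=10)
```

After the first round exactly `1 + fanout` nodes hold the message; later rounds depend on the random choices, so the count grows quickly at first and then slows as senders increasingly pick nodes that already have it.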
Requirements and Guarantees (From the Math) • In order for gossip to “work”, each node sends each message O(log N) times. • F × (number of rounds a message stays in the buffer) is O(log N). • This requires each node to know at least O(log N) others (partial membership view). • Reliability very close to 100%. • Graceful degradation with increasing node crash and message loss rates. • Expected latency O(log N).
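A back-of-the-envelope argument for why a logarithmic number of sends per node is enough. It rests on simplifying assumptions that are not from the slides: each send goes to a target chosen independently and uniformly at random among all N nodes, no messages are lost, and at least N/2 nodes already hold the message and each sends it c ln N times, for a total of M sends.

```latex
\begin{align*}
M &\ge \tfrac{N}{2}\, c \ln N \\
\Pr[\text{a given node receives no copy}]
  &= \left(1 - \tfrac{1}{N}\right)^{M} \le e^{-M/N} \le N^{-c/2} \\
\Pr[\text{some node receives no copy}]
  &\le N \cdot N^{-c/2} = N^{\,1 - c/2}
\end{align*}
```

For any constant c > 2 the last bound goes to zero as N grows, which matches the claim that reliability is very close to 100% once each node sends each message O(log N) times. The analyses in the papers are more careful than this sketch.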
A Brief History of Gossip • First used for anti-entropy in maintaining consistency of replicated databases. • Demers et al. 1987. • Used for probabilistic reliable multicast (pbcast) over IP Multicast in Ensemble using complete group membership. • Recent work uses gossip by itself, without IP multicast, using its own membership. • E.g., Lightweight probabilistic broadcast (lpbcast). • Not only for multicast.
Preview • We’ll look closely at specific systems. • ALM overlays, reliable and best-effort. • Gossip-based algorithms. • We’ll also look at security considerations and mobility considerations of such systems. • We’ll put ALMs in the context of peer-to-peer computing.