
Grapevine: An Exercise in Distributed Computing

This lecture transcript covers naming and networking in distributed computing: low-level addressing, translating hostnames to IP addresses with DNS, the ARP protocol, client-server architecture, replication, and the Grapevine services.


Presentation Transcript


  1. Grapevine: An Exercise in Distributed Computing Landon Cox February 16, 2016

  2. Naming other computers • Low-level interface • Provide the destination MAC address • 00:13:20:2E:1B:ED • Middle-level interface • Provide the destination IP address • 152.3.140.183 • High-level interface • Provide the destination hostname • www.cs.duke.edu

  3. Translating hostname to IP addr • Hostname → IP address • Performed by the Domain Name System (DNS) • Used to be a central server • /etc/hosts at SRI • What’s wrong with this approach? • Doesn’t scale to the global Internet
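
To see the translation in action, here is a minimal Python sketch using the standard-library resolver (which consults /etc/hosts and DNS); the printed address is only an example and may differ:

```python
# Minimal hostname -> IP lookup via the OS resolver (stdlib only).
import socket

addr = socket.gethostbyname("www.cs.duke.edu")
print(addr)  # e.g., "152.3.140.183"; the live address may differ
```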

  4. DNS • Centralized naming doesn’t scale • Server has to learn about all changes • Server has to answer all lookups • Instead, split up data • Use a hierarchical database • Hierarchy allows local management of changes • Hierarchy spreads lookup work across many computers
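
A toy sketch of hierarchical delegation (not the real DNS wire protocol; all zone data below is invented): each zone knows only its immediate children, so a lookup walks down from the root, and each zone can be managed and served independently:

```python
# Toy hierarchical resolver: each zone maps a child label either to a
# delegation ("zone:...") or to a final address record.
ZONES = {
    ".":           {"edu": "zone:edu"},
    "edu":         {"duke": "zone:duke.edu"},
    "duke.edu":    {"cs": "zone:cs.duke.edu"},
    "cs.duke.edu": {"www": "152.3.140.183"},
}

def resolve(hostname: str) -> str:
    labels = hostname.split(".")          # ["www", "cs", "duke", "edu"]
    zone = "."
    for label in reversed(labels):        # walk from the root downward
        answer = ZONES[zone][label]
        if not answer.startswith("zone:"):
            return answer                 # reached an address record
        zone = answer[len("zone:"):]      # follow the delegation
    raise KeyError(hostname)

print(resolve("www.cs.duke.edu"))  # 152.3.140.183
```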

  5. Where is www.wikipedia.org?

  6. Example: linux.cs.duke.edu • nslookup in interactive mode

  7. Translating IP to MAC addrs • IP address → MAC address • Performed by the ARP protocol within a LAN • How does a router know the MAC address of 152.3.140.183? • ARP (Address Resolution Protocol) • If it doesn’t know the mapping, broadcast through switch • “Whoever has this IP address, please tell me your MAC address” • Cache the mapping • “/sbin/arp” • Why is broadcasting over a LAN ok? • Number of computers connected to a switch is relatively small
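
A minimal sketch of the cache-then-broadcast pattern (LAN_HOSTS and broadcast_who_has are invented stand-ins for the real link-layer broadcast and its replies):

```python
# ARP-style cache: map IP -> MAC, broadcast on a miss, then remember.
LAN_HOSTS = {"152.3.140.183": "00:13:20:2E:1B:ED"}  # simulated LAN

arp_cache: dict[str, str] = {}

def broadcast_who_has(ip: str) -> str:
    # Stands in for: "Whoever has this IP address, tell me your MAC address."
    return LAN_HOSTS[ip]

def lookup_mac(ip: str) -> str:
    if ip not in arp_cache:                   # miss: fall back to broadcast
        arp_cache[ip] = broadcast_who_has(ip)
    return arp_cache[ip]                      # later lookups hit the cache

print(lookup_mac("152.3.140.183"))  # 00:13:20:2E:1B:ED
```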

  8. Broadcast on local networks • On wired ethernet switch • ARP requests/replies are broadcast • For the most part, IP communication is not broadcast (w/ caveats) • What about on a wireless network? • Everything is broadcast • Means hosts can see all unencrypted traffic • Why might this be dangerous? • Means any unencrypted traffic is visible to others • Open WiFi access points + non-SSL web requests and pages • Many sites send cookie credentials in the clear … • Use secure APs and SSL!

  9. High-level network overview • [Diagram: several Ethernets, each with workstations and servers attached, connected to one another by gateways]

  10. Client-server • Classic and convenient structure for distributed systems • How do clients and servers differ? • Servers have more physical resources (disk, RAM, etc.) • Servers are trusted by all clients • Why are servers more trustworthy? • Usually have better, more reliable hardware • Servers are better administered (paid staff watch over them) • Servers are kind of like the kernel of a distributed system • Centralized concentration of trust • Support coordinated activity of mutually distrusting clients

  11. Client-server • Why not put everything on one server? • Scalability problems (server becomes overloaded) • Availability problems (server becomes single point of failure) • Want to retain organizational control of some data (some distrust) • How do we address these issues? • Replicate servers • Place multiple copies of server in network • Allow clients to talk to any server with appropriate functionality • What are some drawbacks to replication? • Data consistency (need sensible answers from servers) • Resource discovery (which server should I talk to?)
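
A small sketch of how replication buys availability from the client’s side (the replica names and the failure simulation in send_request are invented):

```python
# Try replicas in a shuffled order and fail over on errors.
import random

REPLICAS = ["server-a", "server-b", "server-c"]

def send_request(server: str, request: str) -> str:
    if random.random() < 0.3:                # simulate a down replica
        raise ConnectionError(server)
    return f"{server} handled {request!r}"

def call_any_replica(request: str) -> str:
    servers = random.sample(REPLICAS, k=len(REPLICAS))  # spread the load
    for server in servers:
        try:
            return send_request(server, request)
        except ConnectionError:
            continue                         # fail over to the next replica
    raise RuntimeError("all replicas unavailable")

print(call_any_replica("lookup ms.gv"))
```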

  12. Client-server • Kernels are centralized too • Subject to availability, scalability problems • Does it make sense to replicate kernels? • Perhaps for multi-core machines • Assign a kernel to each core • Separate address spaces of each kernel • Coordinate actions via message passing • Multi-core starts to look a lot like a distributed system

  13. Grapevine services • Message delivery • Send data to specified users • Access control • Only allow specified users to access name • Resource discovery • Where can I find a printer? • Authentication • How do I know who I am talking to?

  14. Registration servers • What logical data structure is replicated? • The registry • RName → Group entry | Individual entry • What does an RName look like? • Character string F.R • F is a name (individual or group) • R is a registry corresponding to a data partition • At what grain is registration data replicated? • Servers contain copies of whole registries • Individual server unlikely to have copy of all registries
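
A rough sketch of the two entry kinds as Python data structures (field names are mine, following the slide; the sample data is invented):

```python
# A registry maps a name F to either a group entry or an individual entry;
# the full RName is "F.R" for registry R.
from dataclasses import dataclass, field

@dataclass
class GroupEntry:
    members: set[str] = field(default_factory=set)        # member RNames

@dataclass
class IndividualEntry:
    authenticator: str = ""                               # password
    inbox_sites: list[str] = field(default_factory=list)  # for users
    connect_site: str = ""                                # for servers

registry_ms = {  # the "ms" registry: RNames of the form F.ms
    "MailDrop": GroupEntry(members={"server1.ms", "server2.ms"}),
    "server1":  IndividualEntry("pw1", connect_site="10.0.0.1"),
}
```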

  15. RNames • [Diagram: RName (name.registry) → Group {RName1, …, RNameN} | Individual (Authenticator (password), Inbox sites, Connect site)] • What two entities are represented by an individual entry? • Users and servers

  16. RNames • [Diagram as above] • How does an individual entry allow communication with a user? • Inbox sites for users

  17. RNames • [Diagram as above] • How does an individual entry allow communication with a server? • Connect site for servers

  18. Namespace • RNames provide a symbolic namespace • Similar to file-system hierarchy or DNS • Autonomous control of names within a registry • What is the most important part of the namespace? • *.gv (for Grapevine) • *.gv is replicated at every registration server • Who gets to define the other registries? • All other registries must have group entry under *.gv • Owners of *.gv have complete control over other registries • In what way do file systems and DNS operate similarly? • ICANN’s root DNS servers decide top-level domains • Root user controls root directory “/”

  19. Resource discovery • How do clients locate server replicas? • Get list of all registries via “gv.gv” • Find registry name for service (e.g., “ms”) • Lookup group ms.gv at registration server • ms.gv returns a list of available servers (e.g., *.ms) • At this point control is transferred to service • Service has autonomous control of its namespace • Service can define its own namespace conventions
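
A sketch of that discovery chain using plain dicts (server names are invented; gv.gv and ms.gv follow the slide):

```python
# gv.gv lists all registries; <service>.gv lists that service's servers.
registration_db = {
    "gv.gv": {"gv", "ms", "pa"},              # group of all registries
    "ms.gv": {"server1.ms", "server2.ms"},    # group of message servers
}

def discover(service: str) -> set[str]:
    registries = registration_db["gv.gv"]     # 1. get all registries
    if service not in registries:             # 2. find the service's registry
        raise KeyError(service)
    return registration_db[f"{service}.gv"]   # 3. its available servers

print(discover("ms"))  # {'server1.ms', 'server2.ms'}
```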

  20. Implementing services • Mail servers are replicated • Any message server accepts any delivery request • All message servers can forward to others • An individual may have inboxes on many servers • How does a client identify a server to send a message? • Find well-known name “MailDrop.ms” in *.ms • MailDrop.ms maps to mail servers • Any mail server can accept a message • Mail servers forward message to servers hosting users’ inboxes • Note that the mail service makes “MailDrop.ms” special • Grapevine only defines semantics of *.gv • Grapevine delegates control of semantics of *.ms to mail service • Similar to imap.cs.duke.edu or www.google.com
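
A sketch of accept-anywhere delivery (all names and tables below are invented for illustration): any mail server takes the message, then forwards it to the servers that actually host the recipient’s inboxes:

```python
# user RName -> servers hosting that user's inboxes
inbox_sites = {"landon.duke": ["server2.ms"]}

# server -> messages queued in inboxes it hosts
inboxes: dict[str, list[str]] = {}

def accept(message: str, recipient: str) -> None:
    # Any mail server can run this: forward to the recipient's inbox sites.
    for server in inbox_sites[recipient]:
        inboxes.setdefault(server, []).append(message)

accept("hello", "landon.duke")
print(inboxes)  # {'server2.ms': ['hello']}
```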

  21. Resource discovery • Bootstrapping resource discovery • Rely on lower-level methods • Broadcast to name lookup server on Ethernet • Broadcast to registration server on Ethernet • What data does the name lookup server store? • Simple string to Internet address mappings • Infrequently updated (minimal consistency issues) • Well-known “GrapevineRServer” → addrs of registration servers • What does this remind you of on today’s networks? • Dynamic host configuration protocol (DHCP) • Clients broadcast DHCP request on Ethernet • DHCP server (usually on gateway) responds with IP addr, DNS info

  22. Updating replicated servers • At some point need to update registration database • Want to add new machines • Want to reconfigure server locations • Why not require updates to be atomic at all servers? • Requires that most servers be accessible to even start • All kinds of reasons why this might not be true • Trans-Atlantic phone line might be down • Servers might be offline for maintenance • Servers might be offline due to failure • Instead embrace the chaos of eventual consistency • Might have transient differences between server state • Eventually everything will look the same (probably!)

  23. Updating the database • Information included in timestamps • Time + server address • Timestamps are guaranteed to be unique • Provides a total order on updates from a server • Does the entry itself need a timestamp (a version)? • Not really, can just compute as the max of item timestamps • Entry version is a convenient optimization • [Diagram: Registration Entry with two lists; List 1: Active items {str1|t1, …, strn|tn} and Deleted items {str1|t1, …, strm|tm}; List 2: Active items and Deleted items]

  24. Updating the database • Operations on entries • Can add/delete items from lists • Can merge lists • Operations update item timestamps, modify list content
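
A sketch of a merge under these rules (item and server names invented): each item carries its latest (time, server) timestamp, deletions are kept as timestamped tombstones, and the newer timestamp wins, so merging in either order converges to the same state:

```python
# Each replica's list: item -> (timestamp, is_active); False means deleted.
Timestamp = tuple[float, str]          # (time, server address): unique

def merge(a: dict[str, tuple[Timestamp, bool]],
          b: dict[str, tuple[Timestamp, bool]]) -> dict[str, tuple[Timestamp, bool]]:
    out = dict(a)
    for item, (ts, active) in b.items():
        if item not in out or ts > out[item][0]:   # newer timestamp wins
            out[item] = (ts, active)
    return out

r1 = {"alice": ((10.0, "srv1"), True)}     # added on one replica
r2 = {"alice": ((12.0, "srv2"), False)}    # deleted later on another
print(merge(r1, r2))  # alice ends up deleted: the newer timestamp wins
```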

  25. Updating the database • How are updates propagated? • Asynchronously via the messaging service (i.e., *.ms) • Does not require all servers to be online • Updates can be buffered and ordered

  26. Updating the database • How fast is convergence? • Registration servers check their inbox every 30 seconds • If all are online, state will converge in ~30 seconds • If server is offline, may take longer

  27. Updating the database • What happens if two admins update concurrently? • “it is hard to predict which one of them will prevail.” • “acceptable” because admins aren’t talking to each other • Anyone make sense of this?

  28. Updating the database • Why not just use a distributed lock? • What if a replica is offline during acquire, but reappears? • What if lock owner crashes? • What if lock maintainer crashes?

  29. Updating the database • What if clients get different answers from servers? • Clients just have to deal with it (•_•) ( •_•)>⌐■-■ (⌐■_■) • Inconsistencies are guaranteed to be transient • May not be good enough for some applications

  30. Updating the database • What happens if a change message is lost during prop.? • Could lead to permanent inconsistency • Periodic replica comparisons and mergers if needed • Not perfect since partitions can prevent propagation

  31. Updating the database • What happens if namespace is modified concurrently? • Use timestamps to pick a winner (last writer wins) • Why is this potentially dangerous? • Later update could be trapped in offline machine • Updates to first namespace accumulate • When offline machine goes online, all work to first is thrown out
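
A toy demo of the hazard (assuming whole-entry last-writer-wins and invented timestamps, e.g., a trapped update stamped just before the machine went offline, or skewed clocks): the trapped update carries the later stamp, so it silently replaces everything accumulated since:

```python
# Each version of the entry: ((time, server), value). Newest stamp wins.
accumulated = ((5.0, "srvA"), {"alice", "bob", "carol"})  # work on the online replicas (older stamp)
trapped     = ((7.0, "srvB"), {"dave"})   # trapped offline, with a newer stamp

winner = max(accumulated, trapped, key=lambda version: version[0])
print(winner[1])  # {'dave'} -- the accumulated updates are thrown away
```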

  32. Updating the database • What was the solution? • “Shouldn’t happen in practice.” • Humans should coordinate out-of-band • Probably true, but a little unsatisfying

  33. Why read Grapevine? • Describes many fundamental problems • Performance and availability • Caching and replication • Consistency problems • We still deal with many of these issues

  34. Keeping replicas consistent • Requirement: members of write set agree • Write request only returns if WS members agree • Problem: things fall apart • What do we do if something fails in the middle? • This is why we had multiple replicas in the first place • Need agreement protocols that are robust to failures

  35. Two-phase commit • Two phases • Voting phase • Completion phase • During the voting phase • Coordinator proposes value to rest of group • Other replicas tentatively apply update, reply “yes” to coordinator • During the completion phase • Coordinator tallies votes • Success (entire group votes “yes”): coordinator sends “commit” message • Failure (some “no” votes or no reply): coordinator sends “abort” message • On success, group member commits update, sends “ack” to coordinator • On failure, group member aborts update, sends “ack” to coordinator • Coordinator aborts/applies update when all “acks” have been received
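
A compact sketch of this flow with in-process replicas (no failures, timeouts, or persistence modeled; class and method names are mine):

```python
# Two-phase commit: voting phase, then completion phase.
class Replica:
    def __init__(self) -> None:
        self.committed: dict[str, int] = {}   # durable state
        self.tentative = None                 # update staged while voting

    def on_propose(self, key: str, value: int) -> bool:
        self.tentative = (key, value)   # tentatively apply the update
        return True                     # vote Yes (a real replica may vote No)

    def on_decision(self, commit: bool) -> str:
        if commit and self.tentative:
            key, value = self.tentative
            self.committed[key] = value  # commit the staged update
        self.tentative = None            # abort discards it
        return "ack"

def coordinator(replicas: list[Replica], key: str, value: int) -> str:
    votes = [r.on_propose(key, value) for r in replicas]  # phase 1: voting
    commit = all(votes)                                   # unanimous Yes?
    acks = [r.on_decision(commit) for r in replicas]      # phase 2: completion
    assert all(a == "ack" for a in acks)                  # await all acks
    return "commit" if commit else "abort"

group = [Replica(), Replica(), Replica()]
print(coordinator(group, "X", 1))   # commit
print(group[0].committed)           # {'X': 1}
```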

  36. Two-phase commit • Phase 1 • [Diagram: a coordinator and three replicas, before any messages]

  37. Two-phase commit • Phase 1 • [Diagram: the coordinator sends “Propose: X ← 1” to all three replicas]

  38. Two-phase commit • Phase 1 • [Diagram: each replica tentatively applies X ← 1 and replies “Yes”]

  39. Two-phase commit • Phase 2 • [Diagram: the coordinator tallies 3 Yes votes; each replica holds tentative X ← 1]

  40. Two-phase commit • Phase 2 • [Diagram: the coordinator sends “Commit: X ← 1” to all three replicas]

  41. Two-phase commit • Phase 2 • [Diagram: each replica commits X ← 1]

  42. Two-phase commit • Phase 2 • [Diagram: each replica sends “ACK” back to the coordinator]

  43. Two-phase commit • Phase 1 • What if fewer than 3 Yes votes? • [Diagram: two replicas vote Yes, one votes No; the coordinator tallies 2 Yes votes]

  44. Two-phase commit • Phase 1 • What if fewer than 3 Yes votes? • Replicas do not commit • [Diagram: the coordinator sends “Abort: X ← 1” to all three replicas]

  45. Two-phase commit • Phase 1 • Why might replica vote No? • [Diagram: two replicas vote Yes, one votes No; the coordinator tallies 2 Yes votes]

  46. Two-phase commit • Phase 1 • Why might replica vote No? • Might not be able to acquire local write lock • Might be committing w/ another coord. • [Diagram as in the previous slide]

  47. Two-phase commit • Phase 2 • What if coord. fails after vote msg, before decision msg? • Replicas will time out and assume update is aborted • [Diagram: the coordinator fails after tallying 3 Yes votes; each replica holds tentative X ← 1]

  48. Two-phase commit • Phase 2 • What if coord. fails after vote msg, before decision msg? • Replicas will time out and assume update is aborted • [Diagram as in the previous slide]

  49. Two-phase commit • Phase 2 • What if coord. fails after decision messages are sent? • Replicas commit update • [Diagram: the “Commit: X ← 1” messages reach the replicas, which commit even though the coordinator has failed]
