1 / 24

Thialfi: A Client Notification Service for Internet-Scale Applications

Learn about Thialfi, a scalable, fast, and reliable client notification service for internet-scale applications, ensuring fresh data for users. Discover its architecture, signal delivery approach, and recovery from failures.

tracygrimes
Download Presentation

Thialfi: A Client Notification Service for Internet-Scale Applications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Thialfi: A Client Notification Servicefor Internet-Scale Applications Atul Adya, Gregory Cooper, Daniel Myers, Michael Piatek Google Seattle

  2. A Case for Notifications Problem: Ensuring cached data is fresh across users and devices

  3. Common Application Patterns • Clients poll to detect changes • Simple and reliable, but slow and inefficient • Push updates to the client • Fast but complex • Add backup polling to get reliability • Tail latencies can be high: masks bugs • Application-specific protocol  sacrifice reliability

  4. Our Solution: Thialfi • Scalable: tracks millions of clients and objects • Fast: notifies clients in less than a second • Reliable: even when entire data centers fail • Easy to use: deployed in Chrome Sync, Contacts, Google Plus

  5. Talk Outline • Thialfi’s abstraction: reliable signaling • Delivering notifications in the common case • Detecting and recovering from failures • Evaluation and experience

  6. Thialfi Overview Update X Client C2 Client C1 Register X Notify X Register Thialfi client library Update X Client Data center Register • Thialfi Service Application backend Notify X Notify X X: C1, C2

  7. Thialfi Abstraction • Objects have unique IDs and version numbers, monotonically increasing on every update • Delivery guarantee • Registered clients learn latest version number • Reliable signal only: cached object ID X at version Y

  8. Why Signal, Not Data? • Developers want reliable, in-order data delivery • Adds complexity to Thialfi and application, e.g., • Hard state, arbitrary buffering • Offline applications flooded with data on wakeup • For most applications, reliable signal is enough • Invoke polling path on signal: simplifies integration

  9. API Without Failure Recovery Register(objectId) Client Library Unregister(objectId) Notify(objectId, version) Thialfi Service Publish(objectId, version)

  10. Talk Outline • Thialfi’s abstraction: reliable signaling • Delivering notifications in the common case • Detecting and recovering from failures • Evaluation and experience

  11. Architecture Registrations, notifications, acknowledgments Client Client library Data center Registrar Client Bigtable Notifications Application Backend Object Bigtable Matcher • Matcher: Object ID  registered clients, version • Registrar: Client ID  registered objects, notifications

  12. Life of a Notification Client C2 x Ack: x, v7 C1: x, v7 Data center Client Bigtable Registrar Notify: x, v7 C2: x, v7 C1: x, v5 C2: x, C1: x, v7 C2: x, v7 x, v7 Publish(x, v7) Object Bigtable Matcher x: v5; C1, C2 x: v7; C1, C2 x: v7; C1, C2

  13. Talk Outline • Thialfi’s abstraction: reliable signaling • Delivering notifications in the common case • Detecting and recovering from failures • Evaluation and experience

  14. Possible Failures Client Library Client Store Server state loss/ schema migration Client restart Data center loss Client state loss Network failures Partial storage unavailability Client Bigtable Client Bigtable Registrar Registrar Object Bigtable Object Bigtable Matcher Matcher . . . Data center n Data center 1 Thialfi Service Publish Feed

  15. Failures Addressed by Thialfi • Client restart • Client state loss • Network failures • Partial storage unavailability • Server state loss / schema migration • Publish feed loss • Data center outage

  16. Main Principle: No Hard State • Thialfi remains correct even if all state is lost • All registrations • All object versions • Detect and reconstruct after failures using: • ReissueRegistrations()client event • Registration Sync Protocol • NotifyUnknown() client event

  17. Recovering Client Registrations ReissueRegistrations() x x y y Registrar Register(x); Register(y) Object Bigtable Matcher • ReissueRegistrations: Not a burden for applications • Application stores objects in its cache, or • Object list is implicit, e.g., bookmarks for user X

  18. Syncing Client Registrations Register: x, y Hash(x, y) x y x Hash(x, y) Reg sync Registrar y Object Bigtable Matcher • Goal: Keep client-registrar registration state in sync • Every message contains hash of registered objects • Registrar initiates protocol when detects out-of-sync • Allows simpler reasoning of registration state

  19. Recovering From Lost Versions • Versions may be lost, e.g. schema migration • Refreshing from backend requires tight coupling • Inform client with NotifyUnknown(objectId) • Client must refresh, regardless of its current state

  20. Talk Outline • Thialfi’s abstraction: reliable signaling • Delivering notifications in the common case • Detecting and recovering from failures • Evaluation and experience

  21. Notification Latency Breakdown Batching accounts for significant fraction of latency

  22. Thialfi Usage by Applications

  23. Some Lessons Learned • Add complexity at the server, not the client • Deploy at server: minutes. Upgrade clients: years+ • Asynchronous events, not callbacks • Spontaneous events occur: need to handle them • Initial applications have few objects per client • Earlier use of polling forces such a model

  24. Thialfi Summary • Fast, scalable notification service • Reliable even when data centers fail • Two key ideas simplify failure handling • Deliver a reliable signal, not data • No hard state: reconstruct after failure • Deployed in Chrome Sync, Contacts, Google+

More Related