
Corona: A High Performance Publish-Subscribe System for the World Wide Web

Corona is a topic-based publish-subscribe system that enables fast update detection and optimal bandwidth utilization. It improves on existing protocols by allowing cooperative polling and sharing of updates between peers, reducing update latencies. The system is easily layered on structured overlays and incorporates analytical modeling to optimize performance and network load.


Presentation Transcript


  1. Corona: A High Performance Publish-Subscribe System for the World Wide Web • CorONA: Cornell Online News Aggregator • Authors: V. Ramasubramanian, R. Peterson and E.G. Sirer (Cornell University) • Presenter: Sara Salahi (Northwestern University)

  2. Motivation • Abundance of frequently changing information on the Web: • Weblogs, wikis, news sites etc. • Increased need to notify users of updates • Ideally want: • Fast update detection • Optimal bandwidth utilization • Existing protocols do not provide users with automatic notification of updates

  3. Background • Publish-Subscribe Systems • Publishers, subscribers and infrastructure • Topic based vs. Content based • Fundamental drawbacks of preceding systems: • Require substantial changes in the way publishers serve content • Expect subscribers to learn sophisticated query languages • Incompatible with current Web architecture

  4. Background • Micronews Systems • Micronews feeds: short descriptions of frequently updated information in XML-based formats (e.g. RSS) • Feed readers, cloud tag (pub-sub model) • Commercial services disseminate micronews updates to users • Main disadvantages: • Fragile centralized servers • Relentless polling to detect updates • Corona Improvements: • Shares updates between peers • Cooperative polling reduces update latencies

  5. Background • Overlay Networks • Large number of structured overlays that organize networks • Rings, hyperdimensional cubes, butterfly structures, de Bruijn graphs, skip-lists etc. • Corona is easily layered on structured overlays with uniform node degree (includes all of the above listed overlays)

  6. Corona: The Big Picture • Topic based pub-sub system which interoperates with current Web architecture (URLs = “channels”) • Cooperative polling of channels by geographically distributed nodes • “…n nodes polling with same polling interval and randomly distributed polling times can detect updates n times faster if they share updates with each other.” • Optimization problem • Tradeoff between update performance and network load

  7. Analytical Modeling • Pastry: underlying substrate, organizes network into a ring • Routing table, DAG rooted at each node • Node can reach another node in $\log_b N$ hops ($b$: fanout, $N$: number of nodes) • Corona assigns nodes in well-defined wedges • Optimal wedge size determined by analysis of global performance-overhead tradeoff

  8. Analytical Modeling • Channel with polling level $L$ • Polled by nodes with at least $L$ matching prefix digits in their identifiers (polling level 0: all nodes in system poll for the channel) • Polling level quantifies performance-overhead tradeoff • Channel with polling level $L$ has: • $N/b^L$ nodes polling it ($\tau$: polling interval) • Cooperatively detects updates in $(\tau/2)(b^L/N)$ time on average • Collective load placed on server of the channel is $\tau(N/b^L)$
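
The polling-level relations on this slide can be worked through numerically. The following is a minimal sketch, not Corona's implementation: the function names, the hexadecimal string identifiers, and the choice of b = 16 are assumptions made for illustration, while N = 1024 and the 30-minute polling interval are borrowed from the evaluation slide.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two identifiers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def polls_channel(node_id: str, channel_id: str, level: int) -> bool:
    """A node polls a channel at polling level L if it shares at least L prefix digits."""
    return shared_prefix_len(node_id, channel_id) >= level


def expected_pollers(n_nodes: int, base: int, level: int) -> float:
    """Expected number of polling nodes: N / b^L."""
    return n_nodes / base ** level


def expected_detection_time(tau: float, n_nodes: int, base: int, level: int) -> float:
    """Average cooperative detection time from the slide: (tau / 2) * (b^L / N)."""
    return (tau / 2) * (base ** level / n_nodes)


if __name__ == "__main__":
    # Hypothetical parameters: 1024 nodes, base 16, 30-minute polling interval.
    for level in range(3):
        pollers = expected_pollers(1024, 16, level)
        minutes = expected_detection_time(30.0, 1024, 16, level)
        print(f"level {level}: {pollers:g} pollers, {minutes:g} min to detect")
```

Each extra polling level divides the number of pollers by b and multiplies the expected detection time by b, which is the performance-overhead tradeoff the slide refers to.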

  9. Analytical Models • Corona Lite • Minimize average update detection time • Bound load placed on content servers • Overall update performance = average of the update detection time of each channel weighted by # of clients subscribed to the channels • Target network load - the total # of subscriptions in the system • Corona Fast • Achieve target average update detection time • Minimize load placed on content servers • Maintains stable performance through changes in workload • Corona Fair • Minimize average update detection time w.r.t. expected update frequency • Bound load on content servers • Incorporates update rate of channels into tradeoff to achieve a fairer distribution of update performance between channels • Defines a modified update performance metric as the ratio of the update detection time and the polling interval of the channel
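
The Corona Lite objective and its load bound can be illustrated with a small calculation. This is a sketch under stated assumptions: channels are modeled as hypothetical `(subscribers, polling_level)` pairs, load is counted as the number of nodes polling per interval, and the detection-time formula is the one given on the previous slide. None of the function names come from Corona.

```python
def detection_time(tau: float, n_nodes: int, base: int, level: int) -> float:
    """Average detection time for one channel: (tau / 2) * (b^L / N)."""
    return (tau / 2) * (base ** level / n_nodes)


def weighted_update_performance(channels, tau, n_nodes, base) -> float:
    """Average detection time over channels, weighted by subscriber count."""
    total_subs = sum(subs for subs, _ in channels)
    weighted = sum(
        subs * detection_time(tau, n_nodes, base, level)
        for subs, level in channels
    )
    return weighted / total_subs


def polling_load(channels, n_nodes, base) -> float:
    """Total pollers across channels: sum of N / b^L."""
    return sum(n_nodes / base ** level for _, level in channels)


def within_target(channels, n_nodes, base) -> bool:
    """Corona Lite target: total load no greater than the total number of subscriptions."""
    return polling_load(channels, n_nodes, base) <= sum(s for s, _ in channels)


if __name__ == "__main__":
    # Hypothetical workload: one popular channel and one unpopular channel.
    channels = [(1000, 0), (10, 1)]
    print(weighted_update_performance(channels, 30.0, 256, 16))
    print(polling_load(channels, 256, 16), within_target(channels, 256, 16))
```

Weighting by subscribers means a popular channel polled at a low level dominates the average, which is why the optimization tends to spend the load budget on popular channels first.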

  10. Decentralized Optimization • Honeycomb – determines optimal polling levels • $f_i(l)$ and $g_i(l)$ define performance and cost for channel $i$ as a function of polling level $l$ • NP-hard, so an approximate solution is used • Lagrange multiplier: • Due to monotonicity, optimal solution $L^*$ is bounded by same minima as approximated solutions $L_d^*$ and $L_u^*$ • Honeycomb aggregates global tradeoff factors • Channels grouped in tradeoff clusters, $f_i/g_i$ • Number of clusters per polling level is limited by a constant (Tradeoff_Bins) • Cluster aggregation overhead (memory state, network bandwidth) limited by size of routing table
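
To make the role of the Lagrange multiplier concrete, here is a generic, centralized Lagrangian-relaxation sketch. It is not Honeycomb's decentralized algorithm: the cost functions f_i(l) = q_i * b^l / N and g_i(l) = N / b^l, the bisection on the multiplier, and all names are assumptions chosen to match the tradeoff described on the earlier slides.

```python
def choose_levels(subscribers, n_nodes, base, max_level, lam):
    """For a fixed multiplier, each channel independently minimizes f_i(l) + lam * g_i(l)."""
    levels = []
    for q in subscribers:
        best = min(
            range(max_level + 1),
            key=lambda l: q * base ** l / n_nodes + lam * n_nodes / base ** l,
        )
        levels.append(best)
    return levels


def total_load(levels, n_nodes, base):
    """Sum of g_i(l_i) = N / b^l_i over all channels."""
    return sum(n_nodes / base ** l for l in levels)


def solve(subscribers, n_nodes, base, max_level, budget, iters=60):
    """Bisect on the multiplier to find the cheapest setting that fits the load budget."""
    def load_at(lam):
        return total_load(
            choose_levels(subscribers, n_nodes, base, max_level, lam), n_nodes, base
        )

    lo, hi = 0.0, 1.0
    # Grow the upper bound until the budget is met (or give up if it is infeasible).
    while load_at(hi) > budget and hi < 1e12:
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if load_at(mid) > budget:
            lo = mid
        else:
            hi = mid
    return choose_levels(subscribers, n_nodes, base, max_level, hi)


if __name__ == "__main__":
    # Hypothetical workload: three channels with 1000, 10 and 1 subscribers.
    levels = solve([1000, 10, 1], n_nodes=256, base=16, max_level=2, budget=300)
    print(levels, total_load(levels, 256, 16))
```

Because a larger multiplier only ever pushes channels to higher (cheaper) polling levels, the load is monotone in the multiplier and a simple bisection suffices in this toy setting.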

  11. System Management • Channel has unique identifier and one or more owner nodes managing it • Primary owner is Corona node with numerically closest identifier to channel’s identifier • Additional owners are F closest neighbors • Tolerate failures • Like all P2P systems, problem occurs if more than F adjacent nodes fail at once • Fixed because users can easily re-subscribe • Owners inform subscribers of updates and keep track of channel-specific factors that affect performance tradeoffs
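
The owner-selection rule can be sketched as follows. The 16-bit circular identifier space and the reading of "F closest neighbors" as the next F nodes closest to the channel identifier are assumptions for illustration; the function names are hypothetical.

```python
def ring_distance(a: int, b: int, space: int) -> int:
    """Shortest distance between two identifiers on a circular id space."""
    d = abs(a - b) % space
    return min(d, space - d)


def owners(channel_id: int, node_ids, f: int, space: int = 2 ** 16):
    """Return the primary owner followed by F additional owners.

    Nodes are ranked by numerical closeness to the channel identifier;
    ties are broken by the smaller node id so the result is deterministic.
    """
    ranked = sorted(
        node_ids, key=lambda n: (ring_distance(n, channel_id, space), n)
    )
    return ranked[: f + 1]


if __name__ == "__main__":
    # Hypothetical node ids; channel id 10 wraps around to node 65500.
    print(owners(10, [100, 200, 300, 65500], f=1))
```

With F additional owners, the channel state survives as long as no more than F adjacent nodes fail at once, which matches the failure case noted on the slide.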

  12. System Management • Cooperative Polling • Optimization Phase • Corona nodes apply optimization algorithm on tradeoff data • Maintenance Phase • Changes to polling levels communicated to peer nodes in routing table via maintenance messages • Aggregation Phase • Enables nodes to receive new aggregates of tradeoff factors • Polls for a channel at different nodes are randomly distributed over time

  13. Update Dissemination • Version numbers • Deltas • Studies show that amount of change in content update is typically tiny – 6.8% • Difference engine used to identify new information • When delta is generated by a node, all other nodes in channel’s polling wedge are updated • “Simultaneously” detected deltas • Primary owner makes sure latest delta is used and ignores redundant deltas
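
A rough sketch of delta generation and redundant-delta filtering is shown below. It uses Python's standard `difflib` as a stand-in for the difference engine, which the slides do not describe in detail; the dictionary layout, the integer version numbers, and the `PrimaryOwner` class are assumptions for illustration.

```python
import difflib


def make_delta(old: str, new: str, version: int) -> dict:
    """Build a versioned delta holding only the changed lines (no context lines)."""
    diff = list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    )
    return {"version": version, "diff": diff}


class PrimaryOwner:
    """Keeps the latest delta and ignores redundant, simultaneously detected ones."""

    def __init__(self) -> None:
        self.latest_version = 0

    def accept(self, delta: dict) -> bool:
        # A delta is redundant if its version is not newer than the one already seen.
        if delta["version"] <= self.latest_version:
            return False
        self.latest_version = delta["version"]
        return True


if __name__ == "__main__":
    old = "headline A\nheadline B\nheadline C"
    new = "headline A\nheadline B (updated)\nheadline C"
    delta = make_delta(old, new, version=1)
    print("\n".join(delta["diff"]))

    owner = PrimaryOwner()
    print(owner.accept(delta))  # first report of version 1
    print(owner.accept(delta))  # second node reporting the same version
```

Shipping only the changed lines keeps dissemination traffic small when, as the slide notes, a typical update touches a small fraction of the content.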

  14. User-Interface http://www.cs.cornell.edu/people/egs/beehive/corona/

  15. Implementation • Layered on Pastry • Corona handles orphan channels • Tradeoff factors are aggregated into slack cluster prior to optimization • Reliance on IM • Can’t log in from all nodes simultaneously • Prevent malicious nodes from generating spurious updates • Publish digitally signed certificates • Use threshold-cryptography to generate certificate for content

  16. Evaluation • Compare Corona performance against legacy RSS performance • Real-life RSS traces are used • The tradeoff parameters are extrapolated to a larger scale: • 1024 nodes • 100,000 channels • 5,000,000 subscribers • Polling interval – 30 minutes

  17. Evaluation • [Figures: Network Load on Content Servers; Number of Pollers per Channel; Average Update Detection Time; Update Detection Time per Channel]

  18. Evaluation • [Figures: Update Detection Time per Channel (two panels); Overall Summary]

  19. Deployment • A set of 60 PlanetLab nodes • Corona-Lite scheme is used • 7500 RSS feeds from www.syndic8.com • 150,000 subscriptions • Polling interval – 30 minutes

  20. Deployment Results • [Figures: Average Update Detection Time; Total Polling Load on Servers]

  21. Conclusions/Future Work • Corona is a topic based pub-sub system which interoperates with current Web architecture, network overlays • Fast update detection time achieved by: • Cooperative polling of channels by geographically distributed nodes • Shared updates between peers • Do all updates need to be shared? • Measure average time to deliver updates to subscribers? • Maybe optimize polling interval time depending on rate of updates in channel? • Need to run better simulation with IM interface to see true overhead of having multiple nodes logged in at once

  22. Thank you!
