This presentation explores FeedTree, an alternative RSS distribution architecture that uses peer-to-peer technology to reduce network load. The architecture enables timely distribution and updates of micronews, while reducing the burden on content providers.
FeedTree: Sharing Web Micronews with Peer-to-Peer Event Notification • D. Sandler, A. Mislove, A. Post, P. Druschel • Presented by: Andrew Sutton
Contributions • Propose alternative to RSS distribution architecture • Use peer-to-peer technology to reduce network load
RSS Distribution • RSS (Really Simple Syndication) - XML format for publishing micronews • Feed - a source of RSS items • Content Provider - responsible for publishing RSS feeds • Reader/Aggregator - user agent responsible for RSS acquisition and display
RSS Distribution Network • Readers poll content providers • Request RSS files every ~30 minutes • Readers can be online, requesting 24/7
Problems with Distribution • Polling - Requests occur on schedule • Superfluity - Full response per request • Stickiness - RSS traffic persists even if web traffic subsides • 24 Hour Traffic - requests occur all day long
Network Load Example • Readers poll every 30 minutes (48 requests per reader per day) • Slashdot • Subscribers: > 17,000 • RSS file size: ~15KB • ~11.6GB/Day of RSS data • Difficult to measure accurately • No reliable statistics
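The ~11.6GB/day figure can be reproduced with a short calculation. This is a rough sketch that assumes every subscriber polls on schedule and receives the full file each time, and that the slide's "GB" means 1024 × 1024 KB; the function name is illustrative.

```python
# Back-of-the-envelope estimate of daily RSS polling traffic.
SUBSCRIBERS = 17_000        # lower bound from the slide ("> 17,000")
FEED_SIZE_KB = 15           # approximate RSS file size from the slide
POLL_INTERVAL_MIN = 30      # polling interval from the slide


def daily_load_kb(subscribers, feed_size_kb, poll_interval_min):
    """Total KB served per day if every poll returns the full file."""
    polls_per_day = (24 * 60) // poll_interval_min
    return subscribers * feed_size_kb * polls_per_day


total_kb = daily_load_kb(SUBSCRIBERS, FEED_SIZE_KB, POLL_INTERVAL_MIN)
total_gb = total_kb / (1024 * 1024)   # assumption: binary units
print(f"{total_kb:,} KB/day = {total_gb:.2f} GB/day")
```

With these inputs the estimate lands between 11.6 and 11.7 GB/day, in line with the slide.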
Related Work • Improved Polling • Outsourced Aggregation
Improved Polling • Improved Polling • Restrict reader polling via RSS • Use HTTP caching to reduce superfluous responses • Use compression to reduce response size • Delta Encoding • Only transmit what’s changed [RFC 3229] • Seemingly ideal for RSS
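The HTTP caching idea can be illustrated with a small local simulation of a conditional request. This is a sketch, not real server code: `serve_feed` and `make_etag` are hypothetical helpers, and the ETag is assumed to be a hash of the feed body.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Hypothetical validator: a short hash of the feed body."""
    return hashlib.sha256(body).hexdigest()[:16]


def serve_feed(current_body: bytes,
               if_none_match: Optional[str]) -> Tuple[int, str, bytes]:
    """Simulated handler returning (status, etag, body)."""
    etag = make_etag(current_body)
    if if_none_match == etag:
        return 304, etag, b""          # Not Modified: no body is sent
    return 200, etag, current_body     # full file, even for a one-item change


feed = b"<rss><item>story 1</item></rss>"
status1, etag, body1 = serve_feed(feed, None)        # first poll
status2, _, body2 = serve_feed(feed, etag)           # unchanged feed

updated = b"<rss><item>story 2</item><item>story 1</item></rss>"
status3, _, body3 = serve_feed(updated, etag)        # feed changed
print(status1, status2, status3)
```

Caching removes the response body when nothing changed, but any change still triggers a full-file response. That remaining cost is what delta encoding targets.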
Outsourced Aggregation • Content Providers supply RPC interface to aggregator • Users' readers query a central server instead of providers
Outsourcing Problems • Central aggregator allows • Single point of failure for readers • Censorship of original content • Modification of original content (e.g., ads) • May not be reliable or trustworthy
FeedTree • Reduce network/provider load • Use peer-to-peer subscription • Use hybrid push/pull mechanism for timely distribution/update of micronews • Signed documents to enable trust
Pastry • Enables Peer-to-Peer networking applications • Self-organizing - nodes added, removed dynamically • Network overlay - efficiently routes messages in participating nodes • Applications: Scribe, SplitStream
Overlay Network • Logical network built on top of actual network • Can define virtual routes between nodes • Common approach for P2P networks
Pastry Network • Based on a circular namespace of node IDs (not tree-oriented) • Routing • Shortest-path based on routing • Non-receivers forward message to next-closest (proximity) node • Routes messages in O(log n) hops
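To get a feel for the logarithmic bound, the sketch below computes an upper bound on hop count, assuming Pastry's usual configuration in which IDs are treated as base-16 digits (b = 4) and each hop resolves at least one more digit. The function name and sample sizes are illustrative.

```python
def max_hops(n_nodes: int, b: int = 4) -> int:
    """Smallest h with (2**b)**h >= n_nodes, i.e. ceil(log base 2**b of n).

    Integer arithmetic is used to avoid floating-point rounding at exact
    powers of the base.
    """
    base = 2 ** b
    hops, reach = 0, 1
    while reach < n_nodes:
        reach *= base
        hops += 1
    return hops


for n in (1_000, 100_000, 1_000_000):
    print(n, "nodes ->", max_hops(n), "hops at most")
```

The hop count grows by one only when the network becomes 16 times larger, which is why the overlay scales well.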
Scribe • Group Communication and Event Notification • Highly dynamic groups (based on topics) • Uses publish/subscribe model • Allows application-level multicast and anycast • Applications: FeedTree, ???
Scribe Multicast • Subscribing to a topic • Subscriber knows publisher’s node ID • Sends “subscribe” message • Forwarding nodes become parents in the multicast tree (each keeps track of its children) • Notification of event • Events are multicast to all children of publisher, forwarders • One multicast tree per topic
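The tree-building behaviour can be sketched in a few lines. This is a simplified model, not Scribe's actual API: the overlay route of each subscribe message is passed in as a plain list (in practice Pastry would determine it), and the class and node names are made up for illustration.

```python
from collections import deque


class ScribeTopic:
    """Toy model of one multicast tree (one tree per topic)."""

    def __init__(self, root):
        self.root = root
        self.children = {root: set()}      # node -> set of child nodes

    def subscribe(self, route):
        """route: assumed overlay path from the subscriber to the root."""
        for child, parent in zip(route, route[1:]):
            parent_in_tree = parent in self.children
            self.children.setdefault(parent, set()).add(child)
            self.children.setdefault(child, set())
            if parent_in_tree:
                break      # joined the existing tree; stop forwarding

    def multicast(self):
        """Return nodes in the order an event reaches them (breadth-first)."""
        order, queue = [], deque([self.root])
        while queue:
            node = queue.popleft()
            order.append(node)
            queue.extend(sorted(self.children[node]))
        return order


topic = ScribeTopic("root")
topic.subscribe(["s1", "a", "root"])   # "a" becomes a forwarder
topic.subscribe(["s2", "a", "root"])   # stops at "a", already in the tree
print(topic.multicast())
```

The second subscription never reaches the root, so the root forwards each event to one child regardless of how many readers sit below it.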
FeedTree Distribution • Subscription • Readers subscribe to a feed (i.e., a Scribe topic) • Publication • Each item is given a timestamp and a sequence ID • Document is signed with the publisher’s private key
FeedTree Delivery • Bootstrap Delivery • Signed RSS document is multicast to overlay network • Essentially, a combined subscribe/request operation • Incremental Delivery • Only new items are multicast • If no changes, multicast a “heartbeat”
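Incremental delivery can be sketched as a publisher that compares the current feed against what it has already sent. The message layout below (dictionaries with `type`, `seq`, and `last_seq` fields) is an assumption for illustration, not the FeedTree wire format, and signing is omitted.

```python
class Publisher:
    """Toy publisher: multicasts only new items, else a heartbeat."""

    def __init__(self):
        self.next_seq = 1
        self.sent_ids = set()

    def poll(self, feed_items):
        """feed_items: list of (item_id, text). Returns messages to multicast."""
        messages = []
        for item_id, text in feed_items:
            if item_id not in self.sent_ids:
                self.sent_ids.add(item_id)
                messages.append({"type": "item", "seq": self.next_seq,
                                 "id": item_id, "text": text})
                self.next_seq += 1
        if not messages:
            # Assumption: the heartbeat carries the latest sequence number
            # so that readers can notice gaps.
            messages.append({"type": "heartbeat",
                             "last_seq": self.next_seq - 1})
        return messages


pub = Publisher()
first = pub.poll([("a", "Story A"), ("b", "Story B")])
second = pub.poll([("a", "Story A"), ("b", "Story B")])
third = pub.poll([("a", "Story A"), ("b", "Story B"), ("c", "Story C")])
print([m["type"] for m in first + second + third])
```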
Missed Deliveries • If reader is missing sequence numbers • Query parent for missing items • Nodes must buffer last n items to make re-delivery more efficient • If items still missing, query publisher
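The recovery steps above can be sketched as follows. The `Node` class, the `publisher_archive` dictionary, and the assumption that sequence numbers start at 1 are all illustrative and not taken from the paper.

```python
from collections import deque


class Node:
    """Tree node that buffers the last n items it forwarded."""

    def __init__(self, buffer_size):
        self.buffer = deque(maxlen=buffer_size)   # holds (seq, item) pairs

    def store(self, seq, item):
        self.buffer.append((seq, item))           # oldest entry drops off

    def lookup(self, seq):
        for s, item in self.buffer:
            if s == seq:
                return item
        return None


def missing_seqs(received, latest_seq):
    """Sequence numbers in 1..latest_seq that the reader never saw."""
    return [s for s in range(1, latest_seq + 1) if s not in received]


def recover(missing, parent, publisher_archive):
    """Ask the parent first, then fall back to the publisher."""
    recovered = {}
    for seq in missing:
        item, source = parent.lookup(seq), "parent"
        if item is None:
            item, source = publisher_archive.get(seq), "publisher"
        if item is not None:
            recovered[seq] = (item, source)
    return recovered


archive = {1: "i1", 2: "i2", 3: "i3", 4: "i4"}
parent = Node(buffer_size=2)
for seq, item in archive.items():
    parent.store(seq, item)                       # buffer ends with 3 and 4

gaps = missing_seqs(received={1, 4}, latest_seq=4)
result = recover(gaps, parent, archive)
print(gaps, result)
```

Here item 3 comes from the parent's buffer, while item 2 has already been evicted and must be fetched from the publisher, which shows how the buffer size n trades memory for publisher load.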
Network Overhead • Assume an RSS feed generating 4KB/hour • Interior node in tree with 16 children forwards < 20B/sec • However… • Unknown how this scales for large providers, large readers
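The forwarding figure follows from simple arithmetic. The sketch below assumes 4KB means 4096 bytes and ignores per-message overhead such as headers and signatures; the function name is illustrative.

```python
FEED_RATE_BYTES_PER_HOUR = 4 * 1024   # 4KB/hour from the slide
CHILDREN = 16                         # fan-out of the interior node


def forwarding_rate(bytes_per_hour, children):
    """Bytes per second an interior node sends to its children."""
    return bytes_per_hour / 3600 * children


rate = forwarding_rate(FEED_RATE_BYTES_PER_HOUR, CHILDREN)
print(f"{rate:.1f} B/sec")   # prints "18.2 B/sec"
```

The result stays under the 20B/sec bound quoted on the slide.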
Implementation • Implemented both publisher/reader software (proxies) • Created testbed website for real distribution of RSS feeds • No substantial experimentation • http://www.feedtree.net
Advantages/Disadvantages • Benefits - lower cost of delivering micronews • (Significantly) reduced provider load • No fear of RSS feeds being “slashdotted” • Differentiated services - different feeds for headlines/full news
Disadvantages • Requires specialized software for publishers/subscribers • P2P denial of service attacks • Malicious nodes may not forward events
Conclusions • End users receive better service than currently possible • Foresee new services based on RSS • Storing every single RSS item published on the internet • Anonymous feeds using anonymizing p2p routing algorithms • Cooperative multicast to distribute realtime media
Evaluation • Good • Appears to be a well-reasoned idea • Developed software to test hypothesis • Good workshop paper • What’s needed for research • More detailed description of protocol • Substantiate claims about performance (i.e., through experiments)
Questions • List four problems with the current RSS feed distribution model. • Which two of these four problems have the largest impact on network load?
Questions • How long does it take Pastry to route a message if there are n nodes in the network? • Suppose Slashdot has 50,000 RSS subscribers through FeedTree. What is the approximate depth of the multicast tree for the Slashdot topic?
Questions • Assume that there are 100,000 FeedTree topics on a Pastry network that all update at 4KB/Hour. An interior node with 16 children will send 20B/sec. Suppose an interior node participates in all feeds. What is the expected output (in B/sec) of this node?