Publiy+: A Peer-Assisted Publish/Subscribe Service for Timely Dissemination of Bulk Content
Reza Sherafat, Hans-Arno Jacobsen
University of Toronto
ICDCS 2012, Macau
http://msrg.org/project/publiy
The Publish/Subscribe Model
• Asynchronous event-driven messaging is widely used in building distributed systems
  • Sensor networks, e.g., traffic monitoring
  • Notification systems, e.g., distribution of news, social networks
  • Other applications, e.g., financial systems, online games
• Events (publications) are small messages
  • A "change in state" of world objects, not the entire object state itself
  • Event message sizes range from a few bytes to tens or hundreds of KBs
• Pub/sub allows event consumers to specify their interests as subscriptions and to receive matching events asynchronously as they are produced (see the sketch below)
  • Fast, near real-time delivery
  • Selective delivery: subscription matching semantics
• Scalability aspects investigated so far: number of subscribers/subscriptions and publications
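To make the subscription and matching semantics concrete, here is a minimal sketch of equality-based content matching in Java (Publiy's implementation language). The `Subscription` and `Publication` classes and their fluent methods are illustrative assumptions, not Publiy's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative content-based pub/sub primitives (assumed names, not Publiy's API).
class Publication {
    final Map<String, String> attributes = new HashMap<>();
    Publication with(String key, String value) { attributes.put(key, value); return this; }
}

class Subscription {
    final Map<String, String> predicates = new HashMap<>();
    Subscription where(String key, String value) { predicates.put(key, value); return this; }

    // Equality-based matching: a publication matches if it satisfies
    // every predicate in the subscription.
    boolean matches(Publication p) {
        return predicates.entrySet().stream()
                .allMatch(e -> e.getValue().equals(p.attributes.get(e.getKey())));
    }
}
```

Real content-based pub/sub systems also support richer operators (ranges, prefixes, and so on); equality predicates are enough to illustrate selective delivery.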
Another Dimension of Scalability for Pub/Sub
• Content sizes of hundreds of MBs
• Many application scenarios involving large content can take advantage of the reactive pub/sub model
• Traditionally, content dissemination is a receiver-initiated process
  • Content Delivery Networks (CDNs): costly, require provisioning
  • P2P file-sharing applications: slow, potentially inefficient
[Figure: example content types: video files, pictures, software, file synchronization, social networks, P2P file sharing, distribution of software updates]
Publiy+ in a Nutshell
• Publiy is a Java-based pub/sub system developed at the University of Toronto; it supports conventional reliable and multi-path event forwarding
• Publiy+ brings the benefits of event distribution to the world of content dissemination
• Design goals: selective delivery, timely delivery, and system scalability w.r.t. publication size
• Based on a peer-assisted architecture to improve scalability and lower maintenance costs
  • Elements of the system are deployed as part of the infrastructure, but the majority of the effort is contributed by the subscribers themselves
  • Other peer-assisted systems are already deployed for music and video streaming, e.g., Spotify and Skype
Software Patch Distribution: A Sample Scenario
• End-user: "I want to get software updates for my browser"
• Polling is one option
  • Periodically query for updates
  • If updates are available, start the download; otherwise, try again later
  • Prone to flash-crowd scenarios
• Pub/sub is an alternative (see the sketch below)
  • Clients register subscriptions: name=Firefox; version=3.6; OS=MacOSX
  • When an update is released, all interested clients download it: reactive delivery
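Continuing the matching sketch above, the browser-update scenario looks roughly as follows; the attribute names come from the slide, while the `PatchScenario` wrapper is illustrative.

```java
// Hypothetical driver for the patch-distribution scenario, reusing the
// Subscription/Publication sketch from earlier.
public class PatchScenario {
    public static void main(String[] args) {
        Subscription s = new Subscription()
                .where("name", "Firefox").where("version", "3.6").where("OS", "MacOSX");

        Publication update = new Publication()
                .with("name", "Firefox").with("version", "3.6").with("OS", "MacOSX");

        // Reactive delivery: when the update is published, every client whose
        // subscription matches is notified; no client-side polling loop.
        System.out.println(s.matches(update)); // prints: true
    }
}
```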
Hybrid Architecture
[Figure: pub/sub brokers form a control layer that carries metadata information, while clients (publishers/subscribers), grouped into regions, form a data layer that exchanges data messages; clients subscribe through their regional broker]
Control Layer
[Figure: a publisher publishes a descriptor at its home broker; brokers annotated with subscriber sets such as {A,B}, {A,B,X}, and {Z,Y} forward the descriptor toward matching subscribers; a subscriber can also contribute content to others]
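The descriptor is the small control-layer message that announces new bulk content; the slide shows it being routed through the brokers to matching subscribers. One possible shape for such a message is sketched below; all field names are assumptions for illustration, not Publiy's wire format.

```java
import java.util.Map;

// Hypothetical control-layer descriptor announcing one piece of bulk content.
class Descriptor {
    final String contentId;               // identifies the published content
    final int numSegments;                // the content is split into segments
    final int blocksPerSegment;           // each segment is coded into k blocks
    final int blockSizeBytes;             // e.g., 10 * 1024 in Publiy+
    final Map<String, String> attributes; // matched against subscriptions by brokers

    Descriptor(String contentId, int numSegments, int blocksPerSegment,
               int blockSizeBytes, Map<String, String> attributes) {
        this.contentId = contentId;
        this.numSegments = numSegments;
        this.blocksPerSegment = blocksPerSegment;
        this.blockSizeBytes = blockSizeBytes;
        this.attributes = attributes;
    }
}
```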
Data Layer
[Figure: the publisher segments the content; each segment's blocks (Block 1, Block 2, ..., Block k) are turned into coded blocks via linear coding and sent over the network together with the descriptor; receivers ({A, B, ...}) decode each segment once enough coded blocks arrive]
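A minimal sketch of the coding step, assuming random linear coding over GF(2): each coded block is the XOR of a random subset of the segment's source blocks, tagged with its coefficient vector. Publiy+'s actual coding field and implementation may differ.

```java
import java.util.BitSet;
import java.util.Random;

// Sketch: random linear coding over GF(2) for one segment.
class Gf2Encoder {
    static final int K = 100;                // blocks per segment (from the talk)
    static final int BLOCK_SIZE = 10 * 1024; // 10 KB blocks (from the talk)

    private final byte[][] sourceBlocks;     // the k source blocks of one segment
    private final Random rnd = new Random();

    Gf2Encoder(byte[][] sourceBlocks) { this.sourceBlocks = sourceBlocks; }

    CodedBlock encode() {
        BitSet coeffs = new BitSet(K);
        byte[] payload = new byte[BLOCK_SIZE];
        for (int i = 0; i < K; i++) {
            if (rnd.nextBoolean()) {         // random GF(2) coefficient for block i
                coeffs.set(i);
                for (int b = 0; b < BLOCK_SIZE; b++) payload[b] ^= sourceBlocks[i][b];
            }
        }
        return new CodedBlock(coeffs, payload);
    }
}

class CodedBlock {
    final BitSet coefficients; // which source blocks were XORed into the payload
    final byte[] payload;
    CodedBlock(BitSet coefficients, byte[] payload) {
        this.coefficients = coefficients;
        this.payload = payload;
    }
}
```

A receiver that has collected K coded blocks with linearly independent coefficient vectors recovers the segment by Gaussian elimination over GF(2).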
Advantages of Network Coding
• Streamlines the dissemination of blocks
  • Block sizes are small (10 KB in Publiy+)
  • Clients can start to contribute as early as having received a single coded block
• Management and scheduling of blocks are simplified: without network coding, the overhead is substantial
  • Clients downloading a segment can receive blocks from "any" other node that has "some" of the blocks
  • Coded blocks are equally useful (see the rank-check sketch below)
[Figure: without coding, a receiver must track which specific blocks of Segment 1 it is still missing across network transfers]
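The "equally useful" property can be made precise: a receiver only needs to check that an incoming coded block is innovative, i.e., linearly independent of the blocks it already holds; it never has to request specific block indices. Below is a sketch of that check via incremental Gaussian elimination over GF(2), consistent with the encoder above (the `RankTracker` name is assumed).

```java
import java.util.BitSet;

// Sketch: tracks the rank of received coefficient vectors for one segment.
class RankTracker {
    static final int K = 100;                     // blocks per segment
    private final BitSet[] basis = new BitSet[K]; // one reduced row per pivot bit
    private int rank = 0;

    // Returns true iff this coefficient vector increases the rank,
    // i.e., the coded block carries new information.
    boolean isInnovative(BitSet coefficients) {
        BitSet row = (BitSet) coefficients.clone();
        int pivot;
        while ((pivot = row.nextSetBit(0)) >= 0) {
            if (basis[pivot] == null) { // new pivot position: keep this row
                basis[pivot] = row;
                rank++;
                return true;
            }
            row.xor(basis[pivot]);      // eliminate against the stored pivot row
        }
        return false;                   // reduced to zero: redundant block
    }

    boolean segmentDecodable() { return rank == K; }
}
```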
Dissemination Strategy to Combat Flash Crowds
• Flash crowds can prolong the dissemination time
• Traditional client/server designs are easily overwhelmed: more and more servers are needed to handle the traffic surge, which is costly
• Studies show that even P2P BitTorrent file sharing faces problems [Bharambe2006]: some blocks of a file become rare and delay download completion times
• Reactive delivery using pub/sub is the ultimate flash-crowd scenario: all subscribers are already present in the system
• Coordination done by the brokers helps deal with flash-crowd scenarios
Segment Dissemination Strategy
• Effective utilization of the source's bandwidth via delegation (sketched below)
  • First, upload segments to a small number of peers (from all regions) in a PushList
  • Peers also receive similar PushLists and concurrently code/send the blocks they receive to each other
  • Once a segment has been served by the source, all PushList peers have the entire segment
• Peers then become responsible for transferring segments to other nodes: this frees up bandwidth at the source
• Peers continue to send coded blocks within their region
[Figure: the source pushes each segment once to a cluster of initial receivers, which then serve the remaining subscribers]
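A minimal sketch of the delegation step, under the assumption that the source splits a segment's blocks round-robin across the PushList; the `Peer` interface and method names are illustrative, not Publiy's classes.

```java
import java.util.List;

// Hypothetical PushList peer: receives one source block plus the list of
// fellow initial receivers, with which it concurrently exchanges coded blocks.
interface Peer {
    void receiveBlock(int segmentId, int blockIndex, byte[] block, List<Peer> pushList);
}

// Sketch of the source's per-segment delegation loop.
class Source {
    void serveSegment(int segmentId, byte[][] blocks, List<Peer> pushList) {
        for (int i = 0; i < blocks.length; i++) {
            // Round-robin: each PushList peer gets a distinct subset of blocks,
            // so the source uploads the segment roughly once in total.
            pushList.get(i % pushList.size())
                    .receiveBlock(segmentId, i, blocks[i], pushList);
        }
        // Once the PushList collectively holds the segment, its peers recode and
        // serve the remaining subscribers in their regions, freeing the source.
    }
}
```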
Evaluations
• Platform: SciNet HPC computing cluster at the University of Toronto, http://www.scinet.utoronto.ca/
  • Each node (broker, source, or subscriber) is deployed on a separate CPU core
  • 2.66 GHz CPUs and Gigabit Ethernet
  • Uplink bandwidth is throttled (100-200 KB/s)
• In all experiments, the system parameters are as follows: 100 blocks per segment and a block size of 10 KB, i.e., a segment size of 1 MB
• Experimental setup
  • 1-5 regions
  • 120, 300, or 1000 subscribers uniformly distributed among regions
Scalability w.r.t. Number of Subscribers
Network setup:
• 300 and 1000 subscribers
• 1 source publishing 100 MB of content
Contribution of Peers
[Figure: contribution of the source vs. contribution of subscribers]
• Avg blocks transferred per segment: 136 blocks (roughly 36% more than the 100 blocks needed to decode a segment)
• Avg uploaded blocks per subscriber: 102,000 coded blocks
Network setup:
• 1000 subscribers
• 10 sources each publish 100 MB of content (1 GB in aggregate): 100,000 blocks are published in total
Comparison With BitTorrent
[Figure: upon release, all clients start the download; downloads end within 1300 s]
Experiment setup:
• 120 subscribers (uplink bandwidth capped at 200 KB/s)
• 1 source publishes 100 MB of content
Comparison With BitTorrent
[Figure (BitTorrent, polling interval of 10 minutes): downloads end within 1700 s]
Experiment setup:
• 120 clients (uplink bandwidth capped at 200 KB/s)
• 1 source publishes 100 MB of content
Comparison With BitTorrent
[Figure (BitTorrent, polling interval of 2 seconds): downloads end within 1600 s]
Experiment setup:
• 120 clients (uplink bandwidth capped at 200 KB/s)
• 1 source publishes 100 MB of content
Conclusions
• Selective and reactive dissemination using the pub/sub model is applicable to many application scenarios involving bulk content
• Publiy+ enables scalable and timely dissemination of large published content using a hybrid, coordinated, peer-assisted architecture
  • Avoids the high cost and performance bottlenecks of dedicated server farms, e.g., CDNs
  • Overcomes the deficiencies of pure P2P systems, e.g., BitTorrent
• Experimental evaluation results confirm the scalability of the approach and the advantages of using network coding techniques
Thank you!
Traffic Sharing Among Competing Content with Different Popularity
[Figure: traffic share of the most popular, medium-popularity, and least popular content; about 1 TB of data transferred in total]
Experiment setup:
• 5 regions and 1000 clients (uplink bandwidth capped at 200 KB/s)
• 15 sources (3 in each region) each publish 100 MB
• Content has 1x, 2x, and 3x popularity
Traffic Sharing Among Competing Content with Uniform Popularity
Experiment setup:
• 5 regions and 1000 clients (uplink bandwidth capped at 200 KB/s)
• 15 sources (3 in each region) each publish 100 MB with uniform popularity
Content Serving Policy
Network setup:
• 300 clients
• 1 source publishes 100 MB of content
Impact of Packet Loss
Network setup:
• 300 clients
• 1 source publishes 100 MB of content
Impact of Source Fanout on Dissemination Time
Network setup:
• 300 clients
• 1 source publishes 100 MB of content
Effectiveness of Traffic Shaping
[Figure: regional vs. cross-regional traffic]
Experiment setup:
• 5 regions and 1000 clients (uplink bandwidth capped at 200 KB/s)
• 1 source publishes 100 MB