FeedEx: Collaborative Exchange of News Feeds

Seung Jun, Mustaque Ahamad. Georgia Institute of Technology. WWW 2006.


Presentation Transcript


  1. FeedEx: Collaborative Exchange of News Feeds • Seung Jun, Mustaque Ahamad • Georgia Institute of Technology • WWW 2006

  2. Outline • One line comment • Motivation/Problem • Approach • Analysis of feed publishing • Challenges • Experiments • Critique

  3. One line comment • Disseminate web feeds in a distributed (P2P) manner to improve the scalability of web servers • RSS reveals visitors to content providers • RSS decouples the fetch operation from reading • [Figure: traditional method vs. P2P method of delivering RSS to clients A and B]

  4. Motivation & Problem: Scalability • RSS/Atom feeds have become increasingly popular • Published by most traditional media and blogs • Feeding mechanism • Publisher (e.g. nyt.com, http://nyt.com/../feed.xml): updates the feed page as content is added • RSS reader: polls the server (HTTP request/response) to check for updates

  5. Approach • P2P overlay + gossip-based protocol • P2P: resources grow scalably with service demand • Gossip: scalable and robust to joins and leaves • Feature of this overlay • Does not have to guarantee delivery or bound delay • Challenges • Overlay construction • Fetching interval determination • Data dissemination • Free riding prevention • Content searching (?)

  6. Analysis of Feed Publishing • Methodology • 245 popular feeds monitored for 10 days • Popularity information taken from Gmail’s web clips and Bloglines • Feeds fetched every 2 minutes • Measured • Publishing rate • Entry count in a feed • Entry lifetime

  7. Publishing Rate by Rank • Large differences between publishers • Partly follows a Zipf distribution

  8. Entry Count • Does a high publishing rate mean a higher entry count? No • Entry lifetimes are short → entries can be lost when requests are infrequent

  9. Publishing Rate by Time • 4 types of publishing patterns

  10. Challenges: Overlay Construction (1/2) • Goal: minimize network management overhead • Join • Contact a well-known host or previous neighbors • Share subscription set info • Update subscription set info to the network • Leave • Soft state • Subscription sets are updated periodically • [Figure: gateway, neighbor list, subscription set]
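
The soft-state handling of leaves can be sketched roughly as below. This is only an illustration: the class name `NeighborTable`, the timeout value, and the explicit `now` argument are assumptions, since the slide states only that subscription sets are refreshed periodically.

```python
import time


class NeighborTable:
    """Soft-state neighbor list: an entry disappears unless it is refreshed."""

    def __init__(self, timeout_s=300.0):  # timeout value is an assumption
        self.timeout_s = timeout_s
        self._neighbors = {}  # address -> (subscription set, last refresh time)

    def refresh(self, address, subscriptions, now=None):
        """Record a join or a periodic subscription-set update."""
        now = time.time() if now is None else now
        self._neighbors[address] = (frozenset(subscriptions), now)

    def expire(self, now=None):
        """Drop neighbors that have not refreshed within the timeout."""
        now = time.time() if now is None else now
        stale = sorted(
            addr for addr, (_, seen) in self._neighbors.items()
            if now - seen > self.timeout_s
        )
        for addr in stale:
            del self._neighbors[addr]
        return stale

    def alive(self):
        """Addresses of neighbors still considered present."""
        return sorted(self._neighbors)
```

With this shape a departing node needs no explicit leave message: it stops refreshing and is eventually dropped.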

  11. Challenges: Overlay Construction (2/2) • Neighbor selection • Too many neighbors may incur overhead • The number of neighbors needs to adapt to my resource status • Select neighbors that are “useful” to me • Those whose subscription sets are similar to mine • [Figure: nodes A and B with 1 direct, 1 one-hop, and 1 two-hop subscription]
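
One way to picture similarity-based neighbor selection is the sketch below. The slide does not say how similarity is measured, so the Jaccard index is used here purely as an assumption, and `select_neighbors` and `max_neighbors` are hypothetical names.

```python
def jaccard(a, b):
    """Jaccard index of two subscription sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_neighbors(my_feeds, candidates, max_neighbors):
    """Pick the candidates whose subscription sets are most similar to mine.

    candidates maps a node address to that node's subscription set.
    max_neighbors stands in for "adapt to my resource status".
    """
    ranked = sorted(
        candidates,
        key=lambda addr: (-jaccard(my_feeds, candidates[addr]), addr),
    )
    return ranked[:max_neighbors]
```

Ties are broken by address only to keep the result deterministic.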

  12. Challenges: Fetching interval determination • Adaptive fetching • Problem: few hints about the publishing rate or entry lifetime • Frequent polling: overloads servers and consumes clients’ network bandwidth • Lazy polling: increases delay or misses entries • Adaptive algorithm • Intuition: fetching too frequently yields few new entries • Freshness rate: fraction of new entries in the fetched document • If freshness rate < target freshness → halve the fetching rate • If freshness rate > target freshness → double the fetching rate • [Figure: a fetch of the feed HANI returns entries Report 1, Report 2, Report 3, …]
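
The adaptive rule can be sketched as follows. Halving the fetching rate is expressed as doubling the interval between fetches. The target freshness of 0.5 and the interval bounds are assumptions for illustration, not values taken from the slides.

```python
def freshness_rate(fetched_ids, seen_ids):
    """Fraction of entries in the fetched document that are new to this node."""
    if not fetched_ids:
        return 0.0
    new = [entry_id for entry_id in fetched_ids if entry_id not in seen_ids]
    return len(new) / len(fetched_ids)


def next_interval(interval_s, freshness, target=0.5, min_s=120, max_s=3600):
    """Return the next fetching interval in seconds.

    target, min_s and max_s are assumed values.
    """
    if freshness < target:
        interval_s = interval_s * 2  # halve the fetching rate
    elif freshness > target:
        interval_s = interval_s / 2  # double the fetching rate
    # Clamp so that the interval neither collapses nor grows without bound.
    return max(min_s, min(max_s, interval_s))
```

For example, a node that fetches every 600 seconds and finds only one new entry out of four would back off to 1200 seconds under these assumed settings.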

  13. Challenges: Data dissemination • Goal: minimize bandwidth consumption • Limit the boundary of delivery • Forward only to matching neighbors (subscription set, hop_count) → reduces forwarding overhead • Reduce the unit of delivery • Unit of delivery: entry bundle • A set of new entries (old entries filtered out) → reduces redundant content delivery • Check before forwarding • Exchange the ID of an entry bundle (ID: SHA-1 digest of the bundle) • If the bundle is undelivered → deliver it • [Figure: fetch from the feed HANI, max subset hops = 1]
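
A minimal sketch of "check before forwarding" is shown below. The procedure names `check_did` and `put_entries` appear later in the slides, but their signatures and semantics here are assumptions, as are the `Node` class and the JSON serialization used to obtain the bytes that are hashed.

```python
import hashlib
import json


def bundle_id(entries):
    """SHA-1 digest of an entry bundle (canonical JSON is an assumed encoding)."""
    payload = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class Node:
    """Hypothetical in-memory stand-in for a remote neighbor."""

    def __init__(self):
        self.delivered = set()

    def check_did(self, did):
        """Assumed semantics: True if this node already holds the bundle."""
        return did in self.delivered

    def put_entries(self, entries):
        """Accept a bundle and remember its ID."""
        self.delivered.add(bundle_id(entries))


def forward(feed, entries, neighbors):
    """Forward a bundle to matching neighbors that do not have it yet.

    neighbors is a list of (node, subscription set) pairs.
    Returns the number of neighbors the bundle was delivered to.
    """
    did = bundle_id(entries)
    sent = 0
    for node, subscriptions in neighbors:
        if feed not in subscriptions:
            continue  # forward only to matching neighbors
        if node.check_did(did):
            continue  # already delivered: exchange only the small ID
        node.put_entries(entries)
        sent += 1
    return sent
```

The point of the design is that the cheap ID exchange happens first, so the full bundle crosses the network only when the neighbor lacks it.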

  14. Challenges: Free riding prevention • Nodes may exhibit selfish behavior • Only receive, without forwarding • Lie about their subscription set to become a preferred neighbor • Solution: provide a neighbor evaluation method • Contribution metric • Credit nodes that forward feeds that I, or my near neighbors, subscribe to • Level of contribution: direct subscription, 1-hop subscription, 2-hop subscription, … • cm_{i,j} += wf−hf • Cut off unhelpful neighbors: I helped it, but it has not helped me • d_{i,j} = cm_{i,j} − cm_{j,i} • Feature • Uses local information only → the mechanism is easy to implement and enforce
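
The bookkeeping behind the deficit d_{i,j} = cm_{i,j} − cm_{j,i} might look like the sketch below. It assumes node i keeps both counters locally, takes the per-bundle increment as an argument rather than assuming a particular weighting, and uses a hypothetical cut-off threshold.

```python
from collections import defaultdict


class ContributionLedger:
    """Per-neighbor contribution counters kept locally by one node."""

    def __init__(self):
        self.received = defaultdict(float)  # cm_{i,j}: what neighbor j gave me
        self.given = defaultdict(float)     # cm_{j,i}: what I gave neighbor j

    def credit_received(self, neighbor, amount):
        """Neighbor forwarded something useful to me; amount is caller-chosen."""
        self.received[neighbor] += amount

    def credit_given(self, neighbor, amount):
        """I forwarded something useful to the neighbor."""
        self.given[neighbor] += amount

    def deficit(self, neighbor):
        """d_{i,j} = cm_{i,j} - cm_{j,i}; negative means I helped more."""
        return self.received[neighbor] - self.given[neighbor]

    def unhelpful(self, threshold):
        """Neighbors whose deficit fell below the (assumed) threshold."""
        known = set(self.received) | set(self.given)
        return sorted(n for n in known if self.deficit(n) < threshold)
```

A node could call `unhelpful` periodically and replace the returned neighbors with new candidates.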

  15. Challenges: Entry searching • Overlay as a distributed storage • Iterative searching • Strong points: search latency, query traffic • Recursive searching (flooding) • Strong points: low overhead for the requester, caching of popular queries, can be reflected in neighbor evaluation (?)

  16. Benefits of FeedEx • Scalability • Archivability • Storage of entries • Controllability • Compared to web-based readers, e.g. the fetch interval • Filtering and recommendation • Share opinions on entries (e.g. voting) • Feed recommendation • Privacy • Users can fetch documents for others → anonymizes the actual users

  17. Architecture of FeedEx • Prototype: Python • Networking: Twisted • Protocol: XML-RPC • Interoperability, fast prototyping • Entry storage: SQLite (lightweight RDB) • RSS parser: feedparser.org
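
As a rough illustration of SQLite-backed entry storage, the sketch below uses Python's standard `sqlite3` module. The table layout, column names, and helper functions are assumptions; the slides do not describe the prototype's actual schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) an entry store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               feed_url   TEXT NOT NULL,
               entry_id   TEXT NOT NULL,
               title      TEXT,
               fetched_at REAL,
               PRIMARY KEY (feed_url, entry_id)
           )"""
    )
    return conn


def store_entries(conn, feed_url, entries, fetched_at):
    """Insert entries not seen before and return how many were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO entries VALUES (?, ?, ?, ?)",
        [(feed_url, e["id"], e.get("title"), fetched_at) for e in entries],
    )
    conn.commit()
    return conn.total_changes - before
```

The count of new rows returned by `store_entries` is the numerator a freshness-rate calculation would need.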

  18. Experimental Setup • Two modes • Stand-alone mode (SLN) • FeedEx mode (XCH) • Metrics • Time lag • Missing entries • Communication cost • Experiments • 189 PlanetLab nodes • Run for 22 hours on a weekday • Primary factor: 6 fetching intervals • Each node subscribes to 20 of 70 feeds

  19. Results: Time Lag • Average time lag • Average of node averages • Measured without the adaptive fetching algorithm → regardless of the fetching interval, content is delivered quickly • [Figure annotation: 15.8 times]
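
"Average of node averages" differs from pooling every measurement, because each node counts once regardless of how many entries it received. A small sketch, with a hypothetical function name and input layout:

```python
from statistics import mean


def average_time_lag(lags_by_node):
    """Average of per-node average time lags.

    lags_by_node maps a node name to the list of time lags (in seconds)
    observed at that node. Nodes with no observations are skipped.
    """
    node_averages = [mean(lags) for lags in lags_by_node.values() if lags]
    return mean(node_averages)
```

A node with a single slow delivery therefore weighs as much as a node with many fast ones.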

  20. Results: Missing Entries • Rate of missing entries • # of entries in a node / # of entries in a reference node • Low missing rate • Even when there is a problem in the network (DNS error or routing error) • Sometimes better than the reference node
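
The ratio on this slide can be computed as below. The function name is hypothetical, and entries are assumed to be identified by unique IDs.

```python
def collection_ratio(node_entry_ids, reference_entry_ids):
    """Entries collected by a node divided by entries at the reference node."""
    reference = set(reference_entry_ids)
    if not reference:
        raise ValueError("reference node collected no entries")
    return len(set(node_entry_ids)) / len(reference)
```

A value above 1.0 corresponds to the case where a node did better than the reference node.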

  21. Results: Communication Cost • Two most frequently called procedures: check_did, put_entries • check_did call: fits in a single IP packet • put_entries: 2 calls / minute → delivers 2.67 entries / call • Low communication cost
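
To give a feel for how small a `check_did` request body is, the sketch below marshals one with Python's standard `xmlrpc.client`. That the call takes a single bundle digest as its argument is an assumption, and the HTTP headers that would surround the body are not included.

```python
import xmlrpc.client


def marshal_check_did(bundle_digest):
    """Build the XML-RPC request body for a check_did call (assumed signature)."""
    return xmlrpc.client.dumps((bundle_digest,), methodname="check_did")


def unmarshal(xml_text):
    """Decode an XML-RPC request body into (method name, parameters)."""
    params, method = xmlrpc.client.loads(xml_text)
    return method, params
```

Calling `len(marshal_check_did(digest).encode("utf-8"))` on a 40-character SHA-1 hex digest gives the body size to compare against a typical packet payload.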

  22. Critique • Strong points • Derived a new problem from an old domain, “web caching” • Unaffected by delay or failure of nodes • Draws out possible benefits and extensions • Simple • Practically deployable • Tries to find a mechanism that is good for both servers and clients

  23. Critique • Weak points • Is RSS feed delivery really an overload? • Only a small text file is delivered • Should have considered podcasting (multimedia RSS) • Will clients donate their resources? • Is “short delay” a strong incentive? • Is “low bandwidth consumption” a strong incentive? • Will people’s subscription sets really overlap much? • Not effective for SPs providing diverse RSS feeds • e.g. Naver blog, egloos • Is it really robust to frequent leaves and joins? • Lack of server-side evaluation • Server load and network resources • Delivering critical data (e.g. timely news) using RSS?

  24. Supplementary slides

  25. Entry Lifetime • Generally CNN, • Publishers have policies (probably)

  26. New idea • Topic-based feed pub/sub system • Why should we register the address of a feed? • Need to find addresses providing the contents I want • A feed may contain contents that I don’t want • [Figure: topic of interest (maybe tags?), topic-based feed pub/sub (P2P based), contents related to the topic, feeds from web content providers]

  27. New idea • Topic-based feeding services have already been launched • Baebo • Creates new feeds by keyword from the Amazon, Yahoo, and eBay feeds • Say4 • Extracts entries from the BBC feed that contain sentences from the Bible • But a centralized server runs the service • Limits the number of input feeds • Hard to add input feeds dynamically compared to a P2P approach
