
CDNs Content Outsourcing via Generalized Communities

CDNs Content Outsourcing via Generalized Communities. Dimitrios Katsaros, Ph.D. @ Dept. of Computer & Communication Engineering, University of Thessaly @ Dept. of Informatics, Aristotle University. Heraklion, March 20th, 2008.



Presentation Transcript


  1. CDNs Content Outsourcing via Generalized Communities Dimitrios Katsaros, Ph.D. @ Dept. of Computer & Communication Engineering, University of Thessaly @ Dept. of Informatics, Aristotle University Heraklion, March 20th, 2008

  2. Outline of the talk • A summary of my research • Latest results: “CDNs Content Outsourcing via Generalized Communities” • (IEEE Transactions on Knowledge & Data Engineering) • PRIMITIVE: Community Identification • METHOD: Content Outsourcing for CDNs • GOAL: Access Latency Reduction & Robustness

  3. INTELLIGENCE Research areas: Ultimately  ??? Mobile/Pervasive Computing Web Pervasive Web Overlay Nets Caching & Air-Indexing Peer-to-Peer Networks Caching & Prefetching & Replication & Semistructured Data & Web views Webcasting Content Distribution Networks Location Tracking Ad Hoc Content-Based MIR Broadcasting & Data Dissemination Web Ranking & Search Engines Cooperative Caching & Sensor Node Clustering & Distributed Indexing & Coverage/Connectivity & Flash storage & Social Network Analysis Information Retrieval Sensors

  4. Content Outsourcing • The problem: flash crowds • The solution: CDNs • Reactive vs proactive solutions • Community identification • The CiBC algorithm • Evaluation

  5. A problem… • Feb 3, 2004: Google linked banner to “julia fractals” • Users clicking directed to Australian University web site • …University’s network link overloaded, web server taken down temporarily…

  6. The problem strikes again! • Feb 4, 2004: Slashdot ran the story about Google • …Site taken down temporarily…again

  7. The response from down under… • later…Paul Bourke asks: “They have hundreds (thousands?) of servers worldwide that distribute their traffic load. If even a small percentage of that traffic is directed to a single server … what chance does it have?” → Help him ←

  8. Existing approaches • Client-side proxying • Squid, Summary Cache, hierarchical cache, CoDeeN, Squirrel, Backslash, PROOFS, … • Problem: Not 100% coverage • Throw money at the problem • Load-balanced servers, fast network connections • Problem: Can’t afford or don’t anticipate need • Content Distribution Networks (CDNs) • Akamai, Digital Island, Mirror Image, …

  9. From Internet Mazes to … [figure: one origin server and many end users]

  10. Content distribution [figure: map marking Stockholm, Toronto, London, Seattle, Amsterdam, Boston, Chicago, New York, Frankfurt, San Jose, Paris, Denver, Zurich, Washington D.C., Los Angeles, Tokyo, Dallas, Atlanta, Hong Kong, Singapore, Miami, Sydney]

  11. Content Distribution Network (CDNs)

  12. Types of CDNs [figure: grid of pull vs. push against cooperative vs. uncooperative, with entries Coral, Akamai, X, and one cell marked “First proposed @ IEEE JSAC’03, and what is described here today”]

  13. Comparison

  14. Cooperative push • What to push? • Frequently accessed content (IEEE JSAC’03) • Hard to predict what will be popular! • Popularity changes rapidly, too! • Request statistics? Reactive approach • Can we devise a proactive solution? • Where to store the pushed content? • Easy; a lot of replica placement algorithms

  15. Communities as “attractors”

  16. Web-site communities DO exist [figure: hollins.edu] (Antonis Sidiropoulos et al., WWW Journal, 11(1), 2008)

  17. “Hard” (max-flow) communities • COMMUNITY: a subset of the nodes of a graph, with the property that: (for each node of the community) The number of links to other nodes belonging to the community is larger than the number of links to nodes NOT belonging to the community

  18. “Hard”, but inefficient

  19. Generalized communities … • COMMUNITY: a subset of the nodes of a graph, with the property that: (for each node of the community) The sum of all degrees within the community is larger than the sum of all degrees toward the rest of the graph
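The two definitions can be compared with a minimal sketch. It assumes an undirected graph stored as adjacency sets, and it reads the generalized definition as an aggregate test (links summed over all members); the function names and the example graph are illustrative and do not come from the talk.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def is_hard_community(graph: Graph, community: Set[Hashable]) -> bool:
    """'Hard' test: every member has more links inside than outside."""
    for v in community:
        inside = len(graph[v] & community)
        outside = len(graph[v] - community)
        if inside <= outside:
            return False
    return True


def is_generalized_community(graph: Graph, community: Set[Hashable]) -> bool:
    """Generalized test (assumed reading): summed over all members,
    links inside the community exceed links toward the rest of the graph."""
    inside = sum(len(graph[v] & community) for v in community)
    outside = sum(len(graph[v] - community) for v in community)
    return inside > outside


# Hypothetical example: a triangle a-b-c, where c also links to x, y, z.
example_graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "x", "y", "z"},
    "x": {"c"},
    "y": {"c"},
    "z": {"c"},
}
example_community = {"a", "b", "c"}
```

In this example node c has two links inside and three outside, so the hard test rejects {a, b, c}, while the aggregate test accepts it (six inside versus three outside). This is one way a single well-connected page can break a hard community without breaking a generalized one.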

  20. Social Network Analysis • A social network is a social structure that describes social relations (Wikipedia) • The history of social network analysis is older than everybody here (more than 100 years: Cooley 1909, Durkheim 1893) [book: Stanley Wasserman & Katherine Faust] • Mathematical Representation • Structural & Locational Properties • Centrality • Betweenness centrality • Roles & Positions • Dyadic & Triadic Methods

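  21. Betweenness Centrality • $\sigma_{uw} = \sigma_{wu}$: number of shortest paths from $u \in V$ to $w \in V$ ($\sigma_{uu} = 0$) • $\sigma_{uw}(v)$: number of shortest paths from u to w that some vertex $v \in V$ lies on • Betweenness Centrality $NI(v)$ of a vertex v is: $NI(v) = \sum_{u \neq v \neq w \in V} \sigma_{uw}(v) / \sigma_{uw}$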
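A minimal sketch of computing betweenness centrality, assuming the standard definition: for each vertex v, sum over pairs (u, w) the fraction of shortest u-to-w paths that pass through v. It uses Brandes-style accumulation on an unweighted, undirected graph, which takes O(nm) time. The function name, the adjacency-list input format, and the choice to count each unordered pair once are assumptions, not details from the talk.

```python
from collections import deque
from typing import Dict, Hashable, List


def betweenness(graph: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, float]:
    """Betweenness of every vertex of an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        # sigma[s] = 1 is only the bookkeeping seed for path counting.
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        preds = {v: [] for v in graph}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was seen from both endpoints; count it once.
    return {v: c / 2.0 for v, c in score.items()}


# Hypothetical examples: a 3-node path and a 4-node cycle.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
```

On the path, only the middle node lies between another pair, so it scores 1 and the endpoints score 0. On the cycle, each node carries one of the two shortest paths between its two neighbours, which gives a fractional score of 0.5. Dropping the final halving would count ordered pairs instead.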

  22. Betweenness Centrality in sample graphs [figure: sample graphs with nodes numbered 1 to 20 and nodes lettered A, B, C, P, Q, R, T, U, V, W, X, Y]

  23. Betweenness Centrality in sample graphs • NI values, numbered nodes: 1 (0), 2 (0), 3 (68), 4 (96), 5 (0), 6 (0), 7 (156), 8 (26), 9 (0), 10 (0), 11 (0), 12 (0), 13 (0), 14 (233), 15 (0), 16 (131), 17 (1), 18 (97), 19 (0), 20 (0) • NI values, lettered nodes: A (6.67), B (13), C (0), P (41), Q (8), R (9.33), T (1.33), U (54), V (1.33), W (3.33), X (0), Y (0) • Nodes with large NI: • Articulation nodes (in bridges), e.g., 3, 4, 7, 16, 18 • With large fanout, e.g., 14, 8, U

  24. Betweenness centrality in … • [WEB] Performing graph clustering and recognizing communities in Web site graphs

  25. CiBC Method • Target: is true • CiBC method: • Building “cliques” and clusters around representative (pole) nodes (with low CB)

  26. CiBC Method • Phase 1: NI computation, $O(nm)$ • Phase 2: Initialization of cliques, $O(n)$ [figure: graph with nodes 0 to 11]

  27. CiBC Method • Phase 2: Initialization of cliques, $O(n)$ [figure: graph with nodes 0 to 11]

  28. CiBC Method • Phase 2: Initialization of cliques, $O(n)$ [figure: graph with nodes 0 to 11]

  29. CiBC Method • Phase 2: Initialization of cliques, $O(n)$ [figure: graph with nodes 0 to 11]

  30. CiBC Method • Phase 2: Initialization of cliques, $O(n)$ [figure: graph with nodes 0 to 11]

  31. CiBC Method • Phase 3: Clique merging & creation of communities • Complexity: $O(l^2)$, where l is the number of cliques [figure: graph with nodes 0 to 11, groups labeled A, B, C, D]

  32. CiBC Method • Phase 3: Clique merging & creation of communities [figure: graph with nodes 0 to 11, groups labeled A, B, C, D]

  33. CiBC Method • Phase 3: Clique merging & creation of communities [figure: graph with nodes 0 to 11, groups labeled A, B, C]

  34. CiBC Method • Phase 3: Clique merging & creation of communities [figure: graph with nodes 0 to 11, groups labeled A, B, C]

  35. CiBC Method • Phase 3: Clique merging & creation of communities • Phase 4: Check constraints [figure: graph with nodes 0 to 11, groups labeled A, C]

  36. Evaluation … Need for: • Web site graphs • CDN • Topology • Networking issues • Request streams • Roaming over the site graph Impossible to find real data for all these … • Simulators for each of them • To compensate for the lack of any of the above

  37. Simulators • Web site graphs • Simulating the growth process of the Web • Request streams • Random surfer (following links + teleportation) • CDN • CDNSim (http://oswinds.csd.auth.gr/~cdnsim/)
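The random-surfer request stream (following links plus teleportation) can be sketched as follows. This is an illustration under assumptions, not the generator used in the evaluation: the function name, the `teleport_prob` default of 0.15, and the dictionary-of-lists site graph are all hypothetical.

```python
import random
from typing import Dict, List


def random_surfer(links: Dict[str, List[str]], n_requests: int,
                  teleport_prob: float = 0.15, seed: int = 0) -> List[str]:
    """Generate a stream of page requests by walking the site graph.

    At each step the surfer follows a random outgoing link, or, with
    probability teleport_prob (or at a dead end), jumps to a random page.
    """
    rng = random.Random(seed)       # seeded for reproducible streams
    pages = sorted(links)           # fixed order keeps runs deterministic
    current = rng.choice(pages)
    stream = []
    for _ in range(n_requests):
        stream.append(current)
        out = links[current]
        if not out or rng.random() < teleport_prob:
            current = rng.choice(pages)   # teleportation
        else:
            current = rng.choice(out)     # follow a link
    return stream


# Hypothetical three-page site graph.
site = {"index": ["a", "b"], "a": ["index"], "b": ["a"]}
stream = random_surfer(site, 50)
```

Setting `teleport_prob` to 0 on a graph without dead ends yields a pure link-following walk, which is a convenient sanity check that every consecutive pair of requests is connected by a link.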

  38. Competing methods • Communities-based methods • Clique Percolation Method (CPM) • Correlation Clustering Communities identification method (C3i) • Simple Web Caching (LRU) • No CDN (only the origin server) • Full Replication
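For reference, the simple web caching (LRU) baseline can be sketched as a byte-bounded cache that evicts the least recently used objects on a miss. The class name, the byte-capacity interface, and the rule of not caching objects larger than the whole cache are assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-bounded LRU cache for a surrogate server (illustrative sketch)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # object id -> size, oldest first

    def request(self, obj: str, size: int) -> bool:
        """Return True on a hit; on a miss, admit obj and evict as needed."""
        if obj in self.items:
            self.items.move_to_end(obj)  # mark as most recently used
            return True
        if size > self.capacity:
            return False                 # assumed policy: never cache it
        while self.used + size > self.capacity:
            _, evicted_size = self.items.popitem(last=False)  # drop oldest
            self.used -= evicted_size
        self.items[obj] = size
        self.used += size
        return False


# Hypothetical trace: three 40-byte objects against a 100-byte cache.
cache = LRUCache(100)
trace = ["a", "b", "a", "c", "a", "b"]
results = [cache.request(obj, 40) for obj in trace]
```

In this trace the second and third requests for a hit, while b is evicted when c arrives and must be fetched again.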

  39. Metrics • Mean Response Time (MRT): the expected time for a request to be satisfied • Response time CDF: the Cumulative Distribution Function (CDF) denotes the probability of having response times lower than or equal to a given response time • Replica Factor (RF): the number of replica objects across the whole CDN infrastructure, as a percentage of the total outsourced objects • Byte Hit Ratio (BHR) • Independent parameters • a) Surrogates’ cache size b) graph assortativity
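Three of these metrics can be computed from a request log with a few lines. The sketch assumes per-request response times and (size, hit) pairs are available, and it uses the usual definition of byte hit ratio (bytes served from the cache over total bytes requested), which the slide does not spell out; all names and sample values are illustrative.

```python
from typing import List, Sequence, Tuple


def mean_response_time(times: Sequence[float]) -> float:
    """MRT: average time for a request to be satisfied."""
    return sum(times) / len(times)


def response_time_cdf(times: Sequence[float], t: float) -> float:
    """Empirical CDF: fraction of requests answered within time t."""
    return sum(1 for x in times if x <= t) / len(times)


def byte_hit_ratio(requests: List[Tuple[int, bool]]) -> float:
    """BHR (assumed definition): bytes served from cache / bytes requested."""
    total = sum(size for size, _ in requests)
    hit = sum(size for size, was_hit in requests if was_hit)
    return hit / total


# Hypothetical log: response times in ms, and (size in bytes, cache hit?).
times_ms = [10.0, 20.0, 30.0, 40.0]
log = [(100, True), (300, False)]
```

With these sample values the MRT is 25 ms, half of the requests finish within 20 ms, and a quarter of the requested bytes are served from the cache.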

  40. Situations examined • Regular traffic • Network delay dominates the other components • Flash crowd event • TCP setup delay + network delay dominate the other components

  41. Regular traffic: MRT vs. comm. strength

  42. Regular traffic: BHR vs. comm. strength

  43. Regular traffic: MRT vs. cache size

  44. Surge of requests: CiBC

  45. Surge of requests: CPM

  46. Surge of requests: C3i

  47. Surge of requests: LRU

  48. Discussion • CDNs: industrial interest for them • Content outsourcing: significant issue • Proactive content outsourcing • Discovery of communities • Placement to surrogate servers • CiBC prevails

  49. References • Our work: D. Katsaros, G. Pallis, K. Stamos, A. Sidiropoulos, A. Vakali, Y. Manolopoulos. “CDNs Content Outsourcing via Generalized Communities”. IEEE Transactions on Knowledge and Data Engineering, 2008. • State-of-the-art competing method [CPM community identification method]: G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. “Uncovering the overlapping community structure of complex networks in nature and society”. Nature, 435(7043):814–818, 2005.

  50. Thanks to my collaborators at A.U.Th Thank you for your attention! Questions?
