CDNs Content Outsourcing via Generalized Communities Dimitrios Katsaros, Ph.D. @ Dept. of Computer & Communication Engineering, University of Thessaly @ Dept. of Informatics, Aristotle University Heraklion, March 20th, 2008
Outline of the talk • A summary of my research • Latest results: “CDNs Content Outsourcing via Generalized Communities” • (IEEE Transactions on Knowledge & Data Engineering) • PRIMITIVE: Community Identification • METHOD: Content Outsourcing for CDNs • GOAL: Access Latency Reduction & Robustness
Research areas [diagram] • Ultimately: INTELLIGENCE ??? • Labels on the diagram: Mobile/Pervasive Computing; Web; Pervasive Web; Overlay Nets; Caching & Air-Indexing; Peer-to-Peer Networks; Caching & Prefetching & Replication & Semistructured Data & Web views; Webcasting; Content Distribution Networks; Location Tracking; Ad Hoc; Content-Based MIR; Broadcasting & Data Dissemination; Web Ranking & Search Engines; Cooperative Caching & Sensor Node Clustering & Distributed Indexing & Coverage/Connectivity & Flash storage & Social Network Analysis; Information Retrieval; Sensors
Content Outsourcing • The problem: flash crowds • The solution: CDNs • Reactive vs proactive solutions • Community identification • The CiBC algorithm • Evaluation
A problem… • Feb 3, 2004: Google linked banner to “julia fractals” • Users clicking directed to Australian University web site • …University’s network link overloaded, web server taken down temporarily…
The problem strikes again! • Feb 4, 2004: Slashdot ran the story about Google • …Site taken down temporarily…again
The response from down under… • later…Paul Bourke asks: “They have hundreds (thousands?) of servers worldwide that distribute their traffic load. If even a small percentage of that traffic is directed to a single server … what chance does it have?” → Help him ←
Existing approaches • Client-side proxying • Squid, Summary Cache, hierarchical cache, CoDeeN, Squirrel, Backslash, PROOFS, … • Problem: Not 100% coverage • Throw money at the problem • Load-balanced servers, fast network connections • Problem: Can’t afford or don’t anticipate need • Content Distribution Networks (CDNs) • Akamai, Digital Island, Mirror Image, …
From Internet Mazes to … [figure: one origin server and six end users]
Content distribution [figure: map labeled with Stockholm, Toronto, London, Seattle, Amsterdam, Boston, Chicago, New York, Frankfurt, San Jose, Paris, Denver, Zurich, Washington D.C., Los Angeles, Tokyo, Dallas, Atlanta, Hong Kong, Singapore, Miami, Sydney]
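Types of CDNs [diagram with two axes: cooperative vs uncooperative, pull vs push] • Systems placed on the diagram: Coral, Akamai; one combination is marked X • Cooperative push: first proposed @ IEEE JSAC’03, and what is described here today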
Cooperative push • What to push? • Frequently accessed content (IEEE JSAC’03) • Hard to predict what will be popular! • Popularity changes rapidly, too! • Request statistics? Reactive approach • Can we devise a proactive solution? • Where to store the pushed content? • Easy; a lot of replica placement algorithms
Web-site communities DO exist • [figure: hollins.edu] • Source: Antonis Sidiropoulos et al., WWW Journal, 11(1), 2008
“Hard” (max-flow) communities • COMMUNITY: a subset of the nodes of a graph, with the property that: (for each node of the community) The number of links to other nodes belonging to the community is larger than the number of links to nodes NOT belonging to the community
Generalized communities … • COMMUNITY: a subset of the nodes of a graph, with the property that: (for each node of the community) The sum of all degrees within the community is larger than the sum of all degrees toward the rest of the graph
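A minimal sketch of how the two definitions differ, assuming an undirected graph stored as an adjacency dictionary. The function names and the toy graph are illustrative only, and the generalized test is read here as comparing degree sums taken over all community members, which is one interpretation of the slide.

```python
def internal_external(adj, community, v):
    """Return (links from v into the community, links from v to the rest)."""
    inside = sum(1 for w in adj[v] if w in community)
    return inside, len(adj[v]) - inside


def is_hard_community(adj, community):
    """Hard definition: EVERY member has more internal than external links."""
    return all(
        inside > outside
        for inside, outside in (internal_external(adj, community, v) for v in community)
    )


def is_generalized_community(adj, community):
    """Generalized definition (assumed reading): the SUM of internal degrees
    over all members exceeds the SUM of their external degrees."""
    pairs = [internal_external(adj, community, v) for v in community]
    return sum(i for i, _ in pairs) > sum(e for _, e in pairs)


# Toy graph: triangle a-b-c, with c also linked to two outside pages x and y.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "x", "y"],
    "x": ["c"],
    "y": ["c"],
}
triangle = {"a", "b", "c"}
print(is_hard_community(adj, triangle))         # c has 2 internal vs 2 external links
print(is_generalized_community(adj, triangle))  # internal sum 6 vs external sum 2
```

In this toy graph node c breaks the per-node condition, so the triangle fails the hard test while still passing the relaxed, sum-based one.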
Social Network Analysis • A social network is a social structure that describes social relations (Wikipedia) • The history of social network analysis is older than everybody here (more than 100 years: Cooley 1909, Durkheim 1893) [book: Stanley Wasserman & Katherine Faust] • Mathematical Representation • Structural & Locational Properties • Centrality • Betweenness centrality • Roles & Positions • Dyadic & Triadic Methods
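Betweenness Centrality • $\sigma_{uw} = \sigma_{wu}$: number of shortest paths from $u \in V$ to $w \in V$ ($\sigma_{uu} = 0$) • $\sigma_{uw}(v)$: number of shortest paths from $u$ to $w$ that some vertex $v \in V$ lies on • The betweenness centrality $NI(v)$ of a vertex $v$ is: $NI(v) = \sum_{u \neq v \neq w \in V} \frac{\sigma_{uw}(v)}{\sigma_{uw}}$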
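As a rough illustration, betweenness centrality on an unweighted graph can be computed with one breadth-first search per source node (the Brandes accumulation scheme, which is a standard technique and not something the slides specify). The sketch assumes the standard definition, in which each vertex accumulates the ratio σuw(v)/σuw over ordered pairs (u, w), so on an undirected graph every unordered pair counts twice. The function name and adjacency-dictionary input are illustrative.

```python
from collections import deque


def betweenness(adj):
    """Betweenness centrality of every node of an unweighted graph.

    adj maps each node to a list of neighbours. Runs one BFS per source,
    which gives O(nm) time overall.
    """
    ni = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        # Internal path counts; the source counts as one (empty) path so that
        # counts propagate, which differs from the sigma_uu = 0 convention.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating pair dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                ni[w] += delta[w]
    return ni


# Path a - b - c: b lies on the only shortest path between a and c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path))
```

Halving the result would give the unordered-pair variant; which convention the talk's sample values use is not stated.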
Betweenness Centrality in sample graphs [figure: sample graphs with nodes numbered 1 to 20 and nodes lettered A, B, C, P, Q, R, T, U, V, W, X, Y]
Betweenness Centrality in sample graphs [figure: node labels with NI in parentheses] • Numbered nodes: 1 (0), 2 (0), 3 (68), 4 (96), 5 (0), 6 (0), 7 (156), 8 (26), 9 (0), 10 (0), 11 (0), 12 (0), 13 (0), 14 (233), 15 (0), 16 (131), 17 (1), 18 (97), 19 (0), 20 (0) • Lettered nodes: A (6.67), B (13), C (0), P (41), Q (8), R (9.33), T (1.33), U (54), V (1.33), W (3.33), X (0), Y (0) • Nodes with large NI: • Articulation nodes (in bridges), e.g., 3, 4, 7, 16, 18 • With large fanout, e.g., 14, 8, U
Betweenness centrality in … • [WEB] Performing graph clustering and recognizing communities in Web site graphs
CiBC Method • Target: [condition missing in source] is true • CiBC method: • Building “cliques” and clusters around representative (pole) nodes (with low betweenness centrality, CB)
CiBC Method [figure: sample graph with nodes 0 to 11] • Phase 1: NI computation, O(nm) • Phase 2: Initialization of cliques, O(n)
CiBC Method [figure: sample graph with nodes 0 to 11, cliques labeled A, B, C, D] • Phase 3: Clique merging & creation of communities • Complexity: $O(l^2)$, where $l$ is the number of cliques
CiBC Method [figure: sample graph with nodes 0 to 11, labels A and C] • Phase 3: Clique merging & creation of communities • Phase 4: Check constraints
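The clique initialization of Phase 2 might be sketched as follows. The slides only say that cliques are built around pole nodes with low betweenness centrality, so the greedy rule used here (visit nodes in ascending centrality, let each still-unassigned node become a pole and absorb its unassigned neighbours) is an assumption, as are the function name and input format. Merging (Phase 3) and the constraint check (Phase 4) are not sketched because the slides give no detail on them.

```python
def init_cliques(adj, ni):
    """Group nodes into initial cliques around low-centrality pole nodes.

    adj: node -> list of neighbours; ni: node -> betweenness centrality.
    Assumed rule: poles are picked in ascending order of centrality, and each
    pole takes its not-yet-assigned direct neighbours. After the sort, this is
    a single O(n + m) pass over the adjacency lists.
    """
    assigned = {}
    cliques = []
    # Ties are broken by node name so the result is deterministic.
    for pole in sorted(adj, key=lambda v: (ni[v], str(v))):
        if pole in assigned:
            continue
        members = [pole] + [w for w in sorted(adj[pole]) if w not in assigned]
        for v in members:
            assigned[v] = len(cliques)
        cliques.append(members)
    return cliques


# Path a - b - c with hypothetical centrality values.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
ni = {"a": 0.0, "b": 2.0, "c": 0.0}
print(init_cliques(adj, ni))
```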
Evaluation … Need for: • Web site graphs • CDN • Topology • Networking issues • Request streams • Roaming over the site graph Impossible to find real data for all these … • Simulators for each of them • To compensate for the lack of any of the above
Simulators • Web site graphs • Simulating the growth process of the Web • Request streams • Random surfer (following links + teleportation) • CDN • CDNSim (http://oswinds.csd.auth.gr/~cdnsim/)
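A request stream from a random surfer could be generated along these lines. This is a sketch only: the teleportation probability of 0.15, the function name, and the toy site graph are assumptions, not values taken from the talk or from CDNSim.

```python
import random


def random_surfer(links, n_requests, teleport=0.15, seed=0):
    """Generate a page-request stream by walking a site graph.

    links maps each page to the list of pages it links to. At every step the
    surfer follows a random out-link, or jumps to a random page with
    probability `teleport` (always, if the current page has no out-links).
    """
    rng = random.Random(seed)   # seeded so that runs are reproducible
    pages = sorted(links)
    current = rng.choice(pages)
    stream = []
    for _ in range(n_requests):
        stream.append(current)
        out = links[current]
        if not out or rng.random() < teleport:
            current = rng.choice(pages)   # teleportation
        else:
            current = rng.choice(out)     # follow a link
    return stream


site = {
    "index": ["news", "about"],
    "news": ["index", "story"],
    "story": ["news"],
    "about": ["index"],
}
print(random_surfer(site, 10))
```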
Competing methods • Communities-based methods • Clique Percolation Method (CPM) • Correlation Clustering Communities identification method (C3i) • Simple Web Caching (LRU) • No CDN (only the origin server) • Full Replication
Metrics • Mean Response Time (MRT): the expected time for a request to be satisfied • Response time CDF: the Cumulative Distribution Function (CDF) denotes the probability of having response times lower than or equal to a given response time • Replica Factor (RF): the number of replica objects across the whole CDN infrastructure, as a percentage of the total outsourced objects • Byte Hit Ratio (BHR) • Independent parameters • a) Surrogates’ cache size b) graph assortativity
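For concreteness, three of these metrics could be computed from a simulation log roughly as below. The log format (a list of response times, and a list of (object size, served-from-surrogate-cache) pairs) is hypothetical, and BHR uses its usual definition, bytes served from cache over total bytes requested, which the slide itself does not spell out.

```python
def mean_response_time(times):
    """MRT: average of the observed response times."""
    return sum(times) / len(times)


def response_time_cdf(times, t):
    """Empirical CDF: fraction of requests answered within time t."""
    return sum(1 for x in times if x <= t) / len(times)


def byte_hit_ratio(requests):
    """BHR: bytes served from surrogate caches / total bytes requested.

    requests is a list of (size_in_bytes, was_cache_hit) pairs.
    """
    total_bytes = sum(size for size, _ in requests)
    hit_bytes = sum(size for size, was_hit in requests if was_hit)
    return hit_bytes / total_bytes


times = [1, 2, 3, 4]                      # hypothetical response times
log = [(100, True), (300, False)]         # hypothetical (size, hit) records
print(mean_response_time(times), response_time_cdf(times, 2), byte_hit_ratio(log))
```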
Situations examined • Regular traffic • Network delay dominates the other components • Flash crowd event • TCP setup delay + network delay dominate the other components
Discussion • CDNs: industrial interest in them • Content outsourcing: significant issue • Proactive content outsourcing • Discovery of communities • Placement to surrogate servers • CiBC prevails
References • Our work: D. Katsaros, G. Pallis, K. Stamos, A. Sidiropoulos, A. Vakali, Y. Manolopoulos. “CDNs Content Outsourcing via Generalized Communities”. IEEE Transactions on Knowledge and Data Engineering, 2008. • State-of-the-art competing method [CPM community identification method]: G. Palla, I. Derenyi, I. Farkas, T. Vicsek. “Uncovering the overlapping community structure of complex networks in nature and society”. Nature, 435(7043):814–818, 2005.
Thanks to my collaborators at A.U.Th Thank you for your attention! Questions?