This material is based upon work funded by the National Science Foundation under grant no. 9875177. Performance Evaluation of URL Routing for Content Distribution Networks. PhD defense by Zornitza Genova Prodanoff. Committee Members: Dr. K. J. Christensen (Major Professor), Dr. M. Varanasi, Dr. R. Perez, Dr. Chari, Dr. Labrador. ZGP001 (zphddef.ppt - 07/15/03)
Acknowledgements • I would like to thank: • My major professor Dr. Ken Christensen, • My committee: Dr. Varanasi, Dr. Perez, Dr. Chari, and Dr. Labrador • Dr. Suen for his comments at my proposal defense • My colleagues: K. Yoshigoe, A. Aslam, G. Perrera, and J. Shahbazian • My family ZGP002
Topics • Motivation • Problem and contributions • URL Routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP003
Motivation “…2.5 Billion Hours Spent Waiting on the Web in 1998.” - John Roth, chief executive of Nortel Networks at Telecom '99 ZGP004
Problem and contributions • Problem: • Excessive delay in the Internet caused by the inability to efficiently access distributed content on the Web • My contributions: • 1) Architected a new URL router that uses HTTP redirection • 2) Investigated a new use of CRC32 for reducing the size of routing tables • 3) Investigated a new self-adjusting hashing method for faster URL routing look-up • 4) Performed the first queueing evaluation of hashing and discovered the effects of correlation ZGP005
Topics • Motivation • Problem and contributions • URL Routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP006
URL routing • Next generation Internet: Content Distribution Networks (CDNs) • A CDN is an overlay network on the Internet • A CDN co-locates content throughout the world • CDNs are of great commercial and research interest • $15 million in NSF funding for Web services research • Akamai is one major CDN provider ZGP007
URL routing continued • Global content distribution in a CDN [Figure: clients, proxy cache, transparent cache, reverse cache, distributed server, and origin site across the Internet; the same page is reachable as http://www.some.com/page, http://214.29.2.15/page, and http://334.249.2.8/page] ZGP008
URL routing continued • HTTP redirection in a CDN • (1) HTTP request and redirect • (2) HTTP re-request and response [Figure: clients, URL router, proxy cache, reverse cache, distributed server, and origin site, with exchanges (1) and (2) marked] ZGP009
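The redirect exchange (1) can be sketched as below. This is a minimal Python illustration, not the router built in this work: the dictionary routing table, the helper names `request_target` and `redirect_for`, and the `cache-a.example` host are hypothetical, and a real router would also handle malformed requests and persistent connections.

```python
def request_target(raw: bytes) -> str:
    """Rebuild the absolute URL from the request line and the Host header."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii")
    lines = head.split("\r\n")
    _method, path, _version = lines[0].split(" ")
    host = ""
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
    return f"http://{host}{path}"


def redirect_for(raw: bytes, table: dict) -> bytes:
    """Reply 302 Found pointing at the chosen copy, or 404 if the URL is unknown."""
    location = table.get(request_target(raw))
    if location is None:
        return b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    return (f"HTTP/1.1 302 Found\r\nLocation: {location}\r\n"
            "Content-Length: 0\r\n\r\n").encode("ascii")
```

The client then re-requests the page from the `Location` URL, which is exchange (2).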
URL routing continued • Architecture of a new URL router [Figure: a one-armed URL router attached to a Layer 3 switch; HTTP requests and redirects travel over the network links] • Routing table: URL 1 → loc 1 (state), loc 2 (state), …, loc M1 (state); URL 2 → loc 1 (state), loc 2 (state), …, loc M2 (state); …; URL N → loc 1 (state), loc 2 (state), …, loc MN (state) ZGP010
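A routing-table entry maps one URL to several locations, each with its own state. The Python sketch below shows one such structure and a selection rule; the `Location` fields (`up`, `load`), the least-loaded policy, and the host names are assumptions for illustration, since selecting the best source from state is left as an open problem in these slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Location:
    address: str  # hypothetical: URL of one copy of the content
    up: bool      # hypothetical state: is this copy reachable?
    load: float   # hypothetical state: current load, lower is better


def select_location(table: Dict[str, List[Location]], url: str) -> Optional[str]:
    """Return the least-loaded reachable copy of url, or None if there is none."""
    candidates = [loc for loc in table.get(url, []) if loc.up]
    if not candidates:
        return None
    return min(candidates, key=lambda loc: loc.load).address
```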
URL routing continued • Need to exchange routing tables (digesting) • Summary Cache [17] • Uses Bloom filters to “merge” routing (hash) tables • A Bloom filter is probabilistic and does not support updates • False positives occur if hashes are non-unique • This results in a “routing collision” in the context of URLs ZGP011
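A Bloom-filter digest of a URL list can be sketched as follows. This is a minimal Python illustration in the spirit of Summary Cache, not its exact parameters: the filter size `m_bits`, the choice of up to four bit positions taken from consecutive 32-bit words of one MD5 digest, and the class name are assumptions. A URL never added can still test positive, which is the false positive (routing collision) above, and this plain filter offers no way to remove a URL.

```python
import hashlib


class BloomFilter:
    """Bit array with k positions per URL, all derived from one MD5 digest."""

    def __init__(self, m_bits: int, k: int = 4):
        if not 1 <= k <= 4:
            raise ValueError("one 128-bit MD5 digest yields at most four 32-bit indexes")
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, url: str):
        digest = hashlib.md5(url.encode("utf-8")).digest()
        for i in range(self.k):
            word = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield word % self.m

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url: str) -> bool:
        """True for every added URL; may also be True for URLs never added."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))
```

Routers would then exchange `bits` instead of full URL lists.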
URL routing continued • Need to do look-ups in routing tables • Why use hashing? • Build routing tables as hash tables for efficient look-up • Idea of a self-adjusting hash • The most frequently used keys are kept closer to the head of the chain • With chained hashing, rearrange the chain after key accesses • Transposition rule for lists [50], [7] • Move-to-front rule for lists [33] • Review of H1 hashing [74] • Self-adjusting by using transposition ZGP012
URL routing continued • Chained resolution of hash table collisions [Figure: hash table with indexes 0 to m-1; each index points to a chain of (key, record) nodes, with keys k0 … kn-1 and records r0 … rn-1] • The hashing collision at index 0 causes the chain to be created ZGP013
URL routing continued • H1 and Simple hashing algorithms based on [37] • C1. [Create lists] For i ← 0 to m-1, set LISTi ← NULL. • C2. [Hash] Set i ← h(KEY), j ← 0. • C3. [Is there a list?] If LISTi = NULL, go to C6. • C4. [Compare] If KEY = LISTi[j], terminate. • C5. [Advance to next] If LISTi[j] ≠ NULL, set j ← j+1 and go to step C4. • C6. [Insert new key] Set LISTi[j] ← KEY. • C4A. [Compare and transpose – H1 hashing; replaces C4] If KEY = LISTi[j]: if j ≠ 0, swap LISTi[j] with LISTi[j-1]; terminate. ZGP014
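Steps C1 to C6, with C4A as the H1 variant, can be sketched in Python as a search-and-insert routine that reports how many key comparisons each access costs. Assumptions: Python lists stand in for linked chains, `zlib.crc32` serves as the hash h only as a deterministic default, and the class and parameter names are hypothetical.

```python
import zlib
from typing import Callable, List, Optional


def _crc_hash(key: str) -> int:
    return zlib.crc32(key.encode("utf-8"))


class ChainedHashTable:
    """Chained hashing per steps C1-C6; rule 'simple' (C4) or 'h1' (C4A)."""

    def __init__(self, m: int, rule: str = "simple",
                 h: Optional[Callable[[str], int]] = None):
        if rule not in ("simple", "h1"):
            raise ValueError("rule must be 'simple' or 'h1'")
        self.rule = rule
        self.h = h or _crc_hash
        self.lists: List[List[str]] = [[] for _ in range(m)]  # C1: m empty chains

    def access(self, key: str) -> int:
        """Look up key, inserting it if absent; return the number of comparisons."""
        chain = self.lists[self.h(key) % len(self.lists)]  # C2: i <- h(KEY)
        for j, stored in enumerate(chain):                 # C3-C5: walk the chain
            if stored == key:                              # C4 / C4A
                if self.rule == "h1" and j != 0:           # transpose with predecessor
                    chain[j - 1], chain[j] = chain[j], chain[j - 1]
                return j + 1
        chain.append(key)                                  # C6: insert at the tail
        return len(chain) - 1
```

Passing `h=lambda k: 0` forces every key into one chain, which makes the reordering easy to observe.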
URL routing continued • Now begin my contributions in digesting and hashing (and their evaluation) ZGP015
Topics • Motivation • Problem and contributions • URL routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP016
Improvements to URL routing • Open problems: • Select the best source based on state (and location of the client) • Reduce the size of the routing table to update/share • Perform fast routing look-ups • The last two are my problems ZGP027
Improvements to URL routing continued • My idea… • Use CRC32 for URL signatures • CRC32 circuitry is already part of an Ethernet adapter • Serial shift-register with wrapped XOR terms • Use to get CRC32 signatures for URL in HTTP request header • Need to calculate a CRC32 over a subfield [53] • The subfield is the URL in an HTTP request header ZGP018
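The serial shift register with XOR feedback can be modelled one bit per step in software. This Python sketch assumes the standard IEEE 802.3 CRC-32 parameters (register preset to all ones, least significant bit of each byte first, reflected generator 0xEDB88320, final inversion); `url_from_request` is a hypothetical helper that takes the subfield to be the request-target of the HTTP request line.

```python
def crc32_bitserial(data: bytes) -> int:
    """CRC-32 one bit per step, like a serial shift register with XOR taps."""
    reg = 0xFFFFFFFF                      # register preset to all ones
    for byte in data:
        for bit in range(8):              # least significant bit of each byte first
            feedback = (reg ^ (byte >> bit)) & 1
            reg >>= 1
            if feedback:
                reg ^= 0xEDB88320         # XOR taps: bit-reversed form of 0x04C11DB7
    return reg ^ 0xFFFFFFFF               # final inversion


def url_from_request(header: bytes) -> bytes:
    """Return the request-target (the URL subfield) of an HTTP request line."""
    request_line = header.split(b"\r\n", 1)[0]
    _method, target, _version = request_line.split(b" ")
    return target
```

The standard check value for these parameters is 0xCBF43926 for the input `b"123456789"`.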
Improvements to URL routing continued • Define the following: • P is the CRC32 generator polynomial • Ai, i = 1, …, m, is a polynomial (bit sequence) • We store in a table (for all possible M) the remainders …, where M is the length of the subfield [Figure: packet layout showing the packet header, the subfield, and the rest of the packet, with segments labeled A0, A2, and A1] ZGP019
Improvements to URL routing continued • We have the following: • Returned by the adapter (from the CRC32 shift register) • What we want (the CRC32 for the subfield) ZGP020
Improvements to URL routing continued • For …, the following properties apply: ZGP021
Improvements to URL routing continued • Solve for RA2 as follows: • Let A3 be A0 shifted left M bits. • Then … • and … • (a 32-bit multiply) ZGP022
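The subfield CRC can be recovered because a CRC is linear over GF(2). Write R(A) for the remainder of A(x)·x^32 divided by P, take A0 as the bits before the subfield and A2 as the M-bit subfield (an assumed reading of the labels). Then R(A0 followed by A2) = R(A0·x^M) XOR R(A2), and R(A0·x^M) = R(A0)·(x^M mod P) mod P, so a table of x^M mod P plus one 32-bit GF(2) multiply yields R(A2). The Python sketch below checks this numerically under stated assumptions, not the adapter circuit itself: it uses a raw CRC (register preset to zero, no bit reflection, no final inversion), so the Ethernet preset and inversion terms are omitted, and it assumes the register can be read just before and just after the subfield. All names are hypothetical.

```python
P = 0x104C11DB7           # CRC32 generator polynomial including the x^32 term
POLY_LOW = 0x04C11DB7     # the same polynomial without the x^32 term


def raw_crc(data: bytes) -> int:
    """R(data) = data(x) * x^32 mod P, most significant bit first."""
    r = 0
    for byte in data:
        r ^= byte << 24
        for _ in range(8):
            if r & 0x80000000:
                r = ((r << 1) ^ POLY_LOW) & 0xFFFFFFFF
            else:
                r = (r << 1) & 0xFFFFFFFF
    return r


def mulmod(a: int, b: int) -> int:
    """Multiply two 32-bit remainders as GF(2) polynomials, reduced mod P."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100000000:
            a ^= P
    return result


def x_pow_table(max_bytes: int) -> list:
    """table[n] = x^(8n) mod P, one entry per subfield length in bytes."""
    table = [1]
    for _ in range(max_bytes):
        r = table[-1]
        for _ in range(8):
            r <<= 1
            if r & 0x100000000:
                r ^= P
        table.append(r)
    return table


def subfield_crc(r_prefix: int, r_prefix_and_subfield: int,
                 subfield_bytes: int, table: list) -> int:
    """R(A2) = R(A0 followed by A2) XOR R(A3), where A3 = A0 shifted left M bits."""
    r_a3 = mulmod(r_prefix, table[subfield_bytes])   # the 32-bit multiply
    return r_prefix_and_subfield ^ r_a3
```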
Improvements to URL routing continued • My idea… • Aggressive hashing to perform fast look-up • Self-adjusting chained collision resolution • Fast way to do hash table look-ups • Based on move-to-front rule for lists [33], [50] ZGP023
Improvements to URL routing continued • The new Aggressive hashing algorithm • C1. [Create lists] For i ← 0 to m-1, set LISTi ← NULL. • C2. [Hash] Set i ← h(KEY), j ← 0. • C3. [Is there a list?] If LISTi = NULL, go to C6. • C4. [Compare] If KEY = LISTi[j], terminate. • C5. [Advance to next] If LISTi[j] ≠ NULL, set j ← j+1 and go to step C4. • C6. [Insert new key] Set LISTi[j] ← KEY. • C4B. [Compare and move-to-front – Aggressive hashing; replaces C4] If KEY = LISTi[j]: if j ≠ 0, set TEMP ← LISTi[j]; for k = j down to 1, set LISTi[k] ← LISTi[k-1]; set LISTi[0] ← TEMP. Terminate.
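Step C4B differs from transposition only in where a found key goes: to the head of its chain rather than one position forward. A minimal Python sketch, assuming Python lists as chains and `zlib.crc32` as the hash h (both illustrative choices); `aggressive_access` is a hypothetical name.

```python
import zlib
from typing import List


def aggressive_access(lists: List[List[str]], key: str) -> int:
    """Search-and-insert with move-to-front (step C4B); returns key comparisons."""
    chain = lists[zlib.crc32(key.encode("utf-8")) % len(lists)]  # C2: i <- h(KEY)
    for j, stored in enumerate(chain):                           # C3-C5
        if stored == key:                                        # C4B
            if j != 0:
                temp = chain[j]                  # TEMP <- LIST_i[j]
                for k in range(j, 0, -1):        # k = j down to 1
                    chain[k] = chain[k - 1]      # LIST_i[k] <- LIST_i[k-1]
                chain[0] = temp                  # LIST_i[0] <- TEMP
            return j + 1
    chain.append(key)                                            # C6
    return len(chain) - 1
```

Under transposition, a key found at position 3 needs three successive accesses to reach the head of its chain; move-to-front puts it there in one.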
Topics • Motivation • Problem and contributions • URL routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP025
Evaluation of URL signatures • Evaluation done with trace-driven simulation • Response variables: • 1) Probability of false hits due to signature collisions • 2) CPU time required to generate URL signatures • 3) Reduction in processing and memory resources for URL look-up ZGP026
Evaluation of URL signatures continued • Input data used in the evaluation: • Lists of URLs obtained from 9 cache and server HTTP logs • Access lists • URL lists • CRC32 lists • Unique URLs range from 70 to 2.5 million (1.5 to 146 MBytes) • Logs covered continuous periods of months • The full URL string lists and CRC32 signature lists were generated by me • 2.1 GBytes of raw ASCII data were used ZGP027
Evaluation of URL signatures continued • Input data characteristics

| Access list name | Number of accesses | Number of URLs | Mean URL length (bytes) | Full URL list size (bytes) | CRC32 list size (bytes) |
|---|---|---|---|---|---|
| www.peak.org | 16,374 | 70 | 23.93 | 1,675 | 280 |
| SDMA | 41,941 | 153 | 33.76 | 5,165 | 612 |
| UVA | 318,899 | 45,816 | 44.91 | 2,057,625 | 183,264 |
| NLANR | 944,028 | 504,967 | 58.44 | 29,510,135 | 2,019,868 |
| UC Berkeley | 1,791,349 | 149,344 | 41.87 | 6,253,716 | 597,376 |
| mcs.net | 1,862,070 | 75,361 | 29.87 | 2,250,829 | 301,444 |
| hyperreal.org | 4,080,590 | 86,338 | 89.17 | 7,698,337 | 345,352 |
| CA*netII | 4,642,861 | 2,552,045 | 57.83 | 147,573,556 | 10,208,184 |
| USF CSEE | 8,819,454 | 49,029 | 51.84 | 2,541,483 | 196,116 |

ZGP028
Evaluation of URL signatures continued • Experiments on the performance of CRC32 • Experiment #1: Number of CRC collisions was measured • CRC32 generated for each URL • Non-unique CRC32s counted • Experiment #2: Measured CPU time to generate CRC32 URL list • Software CRC generation (8-bit look-up coded in “C”) • Experiment #3: Measured CPU time required for look-up • All entries from access list were looked up in URL list • URL list is a Simple chained hash table ZGP029
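Experiment #2 used software CRC generation with an 8-bit look-up table, coded in C. The same byte-at-a-time technique is sketched here in Python for illustration only (the original C code is not shown); it assumes the standard reflected CRC-32, whose check value for `b"123456789"` is 0xCBF43926. `signature_digest` is a hypothetical helper that packs each signature into 4 bytes, which matches the CRC32 list sizes in the input-data table being exactly four bytes per URL.

```python
import struct


def make_table() -> list:
    """256-entry table: the CRC-32 contribution of each possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = ((c >> 1) ^ 0xEDB88320) if c & 1 else (c >> 1)
        table.append(c)
    return table


CRC_TABLE = make_table()


def crc32_bytewise(data: bytes) -> int:
    """CRC-32 with one table look-up per byte (8 bits at a time)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


def signature_digest(urls) -> bytes:
    """Pack one 4-byte big-endian CRC32 signature per URL."""
    sigs = [crc32_bytewise(u.encode("utf-8")) for u in urls]
    return struct.pack(f">{len(sigs)}I", *sigs)
```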
Evaluation of URL signatures continued • Results for experiment #1

| Access list name | Collisions (measured) | Collisions (calculated) | Pr[collision] (measured) |
|---|---|---|---|
| www.peak.org | 0 | 0 | 0.0000000 |
| SDMA | 0 | 0 | 0.0000000 |
| UVA | 0 | 1 | 0.0000000 |
| NLANR | 68 | 59 | 0.0001347 |
| UC Berkeley | 2 | 5 | 0.0000134 |
| mcs.net | 0 | 1 | 0.0000000 |
| hyperreal.org | 2 | 2 | 0.0000463 |
| CA*netII | 1558 | 1516 | 0.0006105 |
| USF CSEE | 2 | 1 | 0.0000408 |

Measured and calculated values are close ZGP030
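The measured and calculated collision counts can be reproduced in outline with the Python sketch below. Two assumptions: a collision is counted for every distinct URL whose CRC32 is shared with another distinct URL, and the calculated value is the birthday-problem estimate n(n-1)/2^32 for n unique URLs. That estimate closely tracks the calculated column (NLANR: 504,967 × 504,966 / 2^32 ≈ 59.4; CA*netII: about 1,516), but the slides do not state the formula, so treat it as an inference. Function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Callable, Iterable


def colliding_urls(urls: Iterable[str],
                   sig: Callable[[bytes], int] = zlib.crc32) -> int:
    """Count distinct URLs whose signature is shared with another distinct URL."""
    counts = Counter(sig(u.encode("utf-8")) for u in set(urls))
    return sum(c for c in counts.values() if c > 1)


def expected_colliding_urls(n: int, bits: int = 32) -> float:
    """Birthday estimate: each of n URLs meets n-1 others among 2**bits values."""
    return n * (n - 1) / 2 ** bits
```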
Evaluation of URL signatures continued • Results for experiment #2

| Access list name | Time for URL list (millisec) | Time per URL (µsec) |
|---|---|---|
| www.peak.org | <10 | -- |
| SDMA | <10 | -- |
| UVA | 40 | 0.8730 |
| NLANR | 460 | 0.9109 |
| UC Berkeley | 100 | 0.6695 |
| mcs.net | 40 | 0.5307 |
| hyperreal.org | 120 | 1.3897 |
| CA*netII | 2390 | 0.9368 |
| USF CSEE | 40 | 0.8158 |

Time per URL string is small (on the order of 1 µsec) ZGP031
Evaluation of URL signatures continued • Results for experiment #3 [Figure: look-up time (sec) vs. H value (10 to 22) for full URLs and CRC32 URL signatures] CRC32 URL signature is better ZGP032
Evaluation of URL signatures continued • Experiments for CRC32 vs. MD5-Bloom filter digesting • Experiment #1: Measured digest size and generation CPU time • MD5-Bloom filter • CRC32 • 32-bit checksum • Lempel-Ziv (LZ) compression (used pkzip25) • Experiment #2: Measured digest size and CPU time • MD5-Bloom • Experiment #3: Measured collisions • Control variable is URL length • MD5-Bloom vs. CRC32 • URL length is a maximum of 25, 30, …, 80 bytes ZGP033
Evaluation of URL signatures continued • Experiments for CRC32 vs. MD5-Bloom filter digesting (continued) • Experiment #4: Measured digest size of the hash chain method • Based on the number of components • Tree structure of 32 bits for a <depth, hash code> pair ZGP034
Evaluation of URL signatures continued • Results for experiments #1 and #2

| Method (load factor) | CA*net CPU time (sec) | CA*net size (MBytes) | CA*net collisions (%) | CSE CPU time (sec) | CSE size (MBytes) | CSE collisions (%) |
|---|---|---|---|---|---|---|
| MD5-Bloom (8) | 89.13 | 9.74 | 0.03 | 1.63 | 0.19 | 0.00 |
| CRC32 | 16.22 | 9.74 | 0.03 | 0.27 | 0.19 | 0.00 |
| 32-bit checksum | 14.85 | 9.74 | 0.71 | 0.24 | 0.19 | 0.22 |
| LZ compression | 17.35 | 16.43 | 0.00 | 0.23 | 0.25 | 0.00 |
| MD5-Bloom (8) | 89.13 | 9.74 | 0.03 | 1.63 | 0.19 | 0.00 |
| MD5-Bloom (16) | 92.37 | 19.47 | 0.00 | 1.71 | 0.37 | 0.00 |
| MD5-Bloom (32) | 97.40 | 38.94 | 0.00 | 1.84 | 0.75 | 0.00 |

Similar CRC32 and Bloom filter collisions ZGP035
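One plausible reason for the higher collision rate of the 32-bit checksum is that an additive checksum ignores byte order. The Python sketch below assumes the checksum is a plain byte sum modulo 2^32 (the slides do not define it), and the two URLs are made up. They are permutations of each other, so the sum collides; CRC32 detects every error burst of 32 bits or fewer, so these URLs, which differ only within two adjacent bytes, are guaranteed different CRC32 values.

```python
import zlib


def byte_sum32(data: bytes) -> int:
    """Hypothetical 32-bit checksum: sum of byte values modulo 2**32."""
    return sum(data) & 0xFFFFFFFF


url_a = b"/images/ab.gif"
url_b = b"/images/ba.gif"   # same bytes as url_a, two adjacent bytes swapped

same_checksum = byte_sum32(url_a) == byte_sum32(url_b)   # order is ignored
same_crc = zlib.crc32(url_a) == zlib.crc32(url_b)        # order matters
```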
Evaluation of URL signatures continued • Results for experiment #3 [Figure: collisions (%) vs. URL length (bytes) for MD5-Bloom and CRC32] Collisions are the same for CRC32 and the Bloom filter ZGP036
Evaluation of URL signatures continued • Results from experiment #4 • Hash chaining results in digests that are, on average, 212% larger than CRC32 digests • Substantially larger than the other methods ZGP037
Evaluation of URL signatures continued • Discussion of results • CRC32 URL signatures reduce the size of URL lists and speed up look-up in a hash table • Require less network bandwidth to transfer • Require less memory for storage in the URL router • For CRC32, the number of collisions was found to be small • CRC32 digests require less CPU time than MD5-Bloom filter digests and produce the same collision rate ZGP038
Topics • Motivation • Problem and contributions • URL routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP039
Evaluation of hashing for URL routing continued • Look-up time experiments: • Experiment #1: Effect of hash table size (K) on look-up time (NASA access list) • Experiment #2: Effect of hash table size (K) on look-up time (Clark.net access list) ZGP040
Evaluation of hashing for URL routing continued • Hash table look-up time for experiment #1 [Figure: mean look-up time vs. hash table size K (8 to 13) for Simple, Aggressive, and H1] For dense hash tables, Aggressive is better than H1 ZGP041
Evaluation of hashing for URL routing continued • Hash table look-up time for experiment #2 [Figure: mean look-up time vs. K (8 to 13) for Simple, Aggressive, and H1] Results are similar to those of experiment #1 ZGP042
Evaluation of hashing for URL routing continued • Evaluation model (single-server queue): • Response variables: • Mean queueing delay • Drop in utilization [Figure: arrivals are URLs to be looked up; queued URLs wait for the server, which performs a hash table look-up] ZGP043
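A single-server FIFO queue can be driven directly by a trace of look-up times using Lindley's recursion, W(n+1) = max(0, W(n) + S(n) - A(n+1)), where W is a request's wait, S its look-up (service) time, and A the gap to the next arrival; by Little's law the mean number waiting equals the arrival rate times the mean wait. The Python sketch below is an illustration, not the simulator used in this work, and the function names and toy traces are hypothetical. Both traces hold the mean look-up time, and so the 80% utilization, fixed and only reorder the look-up times, yet clustering the long ones quadruples the mean wait, the kind of correlation effect examined here.

```python
from typing import List, Sequence


def waiting_times(arrivals: Sequence[float], services: Sequence[float]) -> List[float]:
    """Waits in a FIFO single-server queue, computed with Lindley's recursion."""
    waits: List[float] = []
    w = 0.0
    for n in range(len(arrivals)):
        if n > 0:
            gap = arrivals[n] - arrivals[n - 1]
            w = max(0.0, w + services[n - 1] - gap)
        waits.append(w)
    return waits


def mean_wait(arrivals: Sequence[float], services: Sequence[float]) -> float:
    waits = waiting_times(arrivals, services)
    return sum(waits) / len(waits)


arrivals = [float(t) for t in range(10)]   # one URL per time unit
clustered = [1.6] * 5 + [0.0] * 5          # mean 0.8, so utilization is 80%
alternating = [1.6, 0.0] * 5               # same values, interleaved
```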
Evaluation of hashing for URL routing continued • Mean queue length experiments: • Experiment #1: Effect of hash table size (K) on queue length (L) for utilization U = 80% (Simple chain) and exponential arrivals • Experiment #2: Effect of burstiness (Tmax) on L for U = 80% (Simple chain) and K = 8 • Experiment #3: Effect of burstiness (Tmax) on L for U = 80% and K = 8 • Experiment #4: Effect of autocorrelation (unshuffled and shuffled ordering of requests) on L for U = 80% and K = 8 • Experiment #5: Effect of autocorrelation (unshuffled and shuffled ordering of requests) on L for U = 80% (Simple chain) and K = 8 ZGP044
Evaluation of hashing for URL routing continued • Results for experiment #1 [Figure: mean queue length L vs. K (8 to 13) for Simple, Aggressive, and H1] Self-adjusting methods show similar performance ZGP045
Evaluation of hashing for URL routing continued • Results for experiment #2 [Figure: L vs. Tmax (50 to 1000) for H1 and Aggressive; Simple hashing values range from 5500 to 34000, off the plotted scale of 0 to 40] H1 shows a faster increase in L ZGP046
Evaluation of hashing for URL routing continued • Results for experiment #3 [Figure: L (up to about 120K) vs. Tmax (50 to 1000) for H1, Aggressive, and Simple] H1 has an orders-of-magnitude worse queue length ZGP047
Evaluation of hashing for URL routing continued • Results for experiment #4 (mean queue length L)

| Algorithm | Unshuffled | Shuffled | M/G/1 |
|---|---|---|---|
| Simple | 5.20 | 3.15 | 3.13 |
| H1 | 29102.01 | 8.58 | 8.57 |
| Aggressive | 294.09 | 9.93 | 9.76 |

H1 has an orders-of-magnitude worse queue length ZGP048
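The M/G/1 column can be compared with the Pollaczek-Khinchine formula, which gives the mean number waiting in an M/G/1 queue from the arrival rate λ and the first two moments of the service time: Lq = λ²E[S²] / (2(1 - ρ)), with ρ = λE[S]. The formula assumes independent service times, which is consistent with its agreement with the shuffled column rather than the unshuffled one. The Python sketch below returns both Lq and the mean number in system, Lq + ρ; the slides do not say which of the two L denotes, and the function name is hypothetical.

```python
def pk_queue_lengths(lam: float, mean_s: float, second_moment_s: float):
    """Pollaczek-Khinchine: (mean number waiting, mean number in system) for M/G/1."""
    rho = lam * mean_s
    if not 0 <= rho < 1:
        raise ValueError("utilization must be below 1 for a stable queue")
    lq = lam ** 2 * second_moment_s / (2 * (1 - rho))
    return lq, lq + rho
```

At 80% utilization, an exponential look-up time (E[S²] = 2E[S]²) gives Lq = 3.2, while a constant look-up time gives Lq = 1.6; more variable look-up times push Lq higher.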
Evaluation of hashing for URL routing continued • Results for experiment #5

| Algorithm | U | Unshuffled | Shuffled |
|---|---|---|---|
| Simple | 80.0% | 5.20 | 3.15 |
| H1 | 21.7% | 0.43 | 0.36 |
| Aggressive | 12.9% | 0.19 | 0.18 |

ZGP049
Evaluation of hashing for URL routing continued • Discussion of results • Aggressive hashing improves upon H1 hashing • Modest look-up time improvement • Significant improvement from a queueing perspective • Queueing analysis must be used for evaluating hashing algorithms • Long-range dependence (LRD) in the look-up time of H1 results in extreme queueing delay • Catastrophic effects on any application ZGP050