The Mystery of Cooperative Web Caching

Web caching is a process implemented by a caching proxy to improve the efficiency of the web. It reduces the delay in retrieving a document from the Internet by decreasing the number of requests directed to the origin server.
Cooperative Web Caching: a set of web caches located at different places in the Internet that cooperate with each other to improve the performance of the system.
The main entities in a cooperative web caching system are: the proxy, the router, and the group of proxies and routers.
• Entity requirements:
• Proxy: must act as a caching proxy.
• Router: must implement an interior gateway and an exterior gateway protocol.
• Group of proxies and routers: the main requirement is inter-cache communication.
• The mystery of cooperative web caching is the inter-cache communication technique.
The inter-cache communication techniques
Many protocols have been proposed for inter-cache communication in cooperative web caching:
• ICP (Internet Cache Protocol), proposed by Duane Wessels and K. Claffy, 1997
• Cache Digest, proposed by Alex Rousskov and Duane Wessels, 1998
• Summary Cache, proposed by Pei Cao, 1998
• HTCP (Hyper Text Caching Protocol), proposed by P. Vixie and D. Wessels, 2000
• CARP (Cache Array Routing Protocol), proposed by Vinod Valloppillil and Keith W. Ross, 1998
1. Internet Cache Protocol
• ICP is a message-based protocol in which each cache collects information about the existence of a particular web object in the caches of its neighbours by sending an ICP_query message.
• The message is composed of a fixed 20-octet header followed by a variable-size payload.
The message format (header fields aligned on 32-bit words, at bit offsets 0, 8, 16, and 32; a packing sketch follows):
• Opcode field, 8 bits: an integer indicating the type of the message (query, hit, miss, denied).
• Version field: indicates the ICP version in use.
• Message length = header length + payload length, at most 16 KB.
• Payload: contains the URL of the requested document, on which the payload length depends.
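As a concrete illustration, here is a minimal Python sketch that packs an ICP query with this layout. The slide only names the opcode, version, message length, and payload fields; the request number, options, option data, and sender host address fields are taken from RFC 2186, which defines the full 20-octet header.

```python
import struct

# Opcode and version values from RFC 2186 (ICP version 2).
ICP_OP_QUERY = 1
ICP_VERSION = 2
HEADER_LEN = 20  # the fixed 20-octet header

def build_icp_query(request_number: int, url: str) -> bytes:
    """Pack an ICP_OP_QUERY: 20-octet header + variable-size payload."""
    # Query payload: a 4-octet requester host address (zero here)
    # followed by the null-terminated URL.
    payload = struct.pack("!I", 0) + url.encode("ascii") + b"\x00"
    message_length = HEADER_LEN + len(payload)  # header + payload
    header = struct.pack(
        "!BBHIIII",
        ICP_OP_QUERY,    # opcode, 8 bits
        ICP_VERSION,     # version, 8 bits
        message_length,  # total message length, 16 bits
        request_number,  # request number, 32 bits
        0,               # options
        0,               # option data
        0,               # sender host address
    )
    return header + payload
```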
Message specification
• A cache sends an ICP_query (Opcode = 1) to all of its neighbours to collect information about a particular document.
• Each cache that receives the query extracts the URL of the document from the payload and sends back an ICP response message (Opcode = 2 or 3).
• The cache that generated the query collects all the responses, selects the best one, and sends an HTTP request to retrieve the document.
• There are two kinds of hit-response messages: ICP_OP_HIT and ICP_OP_HIT_OBJ.
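A rough sketch of this exchange from the querying cache's side, reusing build_icp_query from the previous sketch. The opcode values are from RFC 2186; the timeout and the helper's shape are illustrative assumptions.

```python
import socket

ICP_OP_HIT = 2
ICP_OP_MISS = 3

def query_neighbours(neighbours, url, timeout=2.0):
    """Send an ICP query to every neighbour and collect the peers
    that answered with a hit."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    message = build_icp_query(request_number=1, url=url)
    for host, port in neighbours:
        sock.sendto(message, (host, port))
    hits = []
    for _ in neighbours:
        try:
            data, addr = sock.recvfrom(16 * 1024)  # messages are at most 16 KB
        except socket.timeout:
            break  # stop waiting for slow or absent peers
        opcode = data[0]  # first octet of the 20-octet header
        if opcode == ICP_OP_HIT:
            hits.append(addr)
    # The querying cache then picks the best hit and retrieves the
    # document from that peer with an ordinary HTTP request.
    return hits
```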
Peer selection: the best peer from which to retrieve the document can be chosen by selection algorithms based on the following parameters:
• RTT measurement: measures the congestion between two nodes; it varies over time.
• Hop count: a constant measure.
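A toy selection function over these two parameters. The field names and the preference for RTT over hop count are assumptions made for illustration, not part of ICP itself.

```python
def select_best_peer(candidates):
    """Prefer the peer with the lowest measured RTT, breaking ties
    with the (static) hop count."""
    return min(candidates, key=lambda p: (p["rtt_ms"], p["hops"]))

peers = [
    {"host": "cache1.example.org", "rtt_ms": 12.4, "hops": 3},
    {"host": "cache2.example.org", "rtt_ms": 48.1, "hops": 2},
]
print(select_best_peer(peers)["host"])  # -> cache1.example.org
```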
2. Cache Digest
Cache Digest provides a mechanism for communication among web caches. The digest contains a list of the URLs of the documents stored in the cache.
Digest construction:
• The URLs of the documents stored in the cache are indexed in the digest by keys (sets of bits) stored in a Bloom filter.
• The keys are extracted from the URL by a number of hash functions that determine which bits must be turned on and which must be turned off:
• a bit is turned on when its state changes from 0 to 1;
• a bit is turned off when its state changes from 1 to 0.
Bloom filter (a minimal sketch follows this list):
• A hash-coding method proposed by Burton H. Bloom in 1970.
• It is based on the idea of reducing the hash area size while allowing a small number of tests to be falsely identified as members, without increasing the reject time.
• Reject time: the time needed to determine that an element does not belong to the set of elements stored in the hash area.
• The hash area is organised in N cells addressed by N distinct keys 0…N-1; a document is codified in N bits.
• Initially all the cells are empty (all bits are set to 0); to insert an element, a set of hash addresses a1…ad is generated and the corresponding bits are all set to 1.
• To search for an element, the same set of hash addresses is generated. If all the addressed bits are set to 1, the element is accepted; if any of them is 0, the element is rejected.
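The sketch below implements exactly the structure described above: N cells that start at 0, d hash addresses per element, insertion turns the addressed bits on, and a lookup accepts only when every addressed bit is 1. Deriving the addresses from salted MD5 digests is an implementation choice here, not part of the original method.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_cells: int, num_hashes: int):
        self.num_cells = num_cells
        self.num_hashes = num_hashes
        self.bits = [0] * num_cells  # initially every cell holds 0

    def _addresses(self, key: str):
        # Generate the d hash addresses a1..ad for this key.
        for i in range(self.num_hashes):
            h = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h, "big") % self.num_cells

    def insert(self, key: str) -> None:
        for a in self._addresses(key):
            self.bits[a] = 1  # turn the addressed bits on

    def contains(self, key: str) -> bool:
        # Accept only if every address holds 1; any 0 rejects the key.
        return all(self.bits[a] == 1 for a in self._addresses(key))
```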
The calculation of the public keys
• Each URL is transformed by MD5 into a 128-bit public key, composed of two parts: a numeric part (bits 1-7) and a second part representing the transformation of the URL.
• The hash functions then assign to each key the indices extracted from the URL by the following computation (sketched in code below):
1. Split the 128 bits into N parts.
2. Find the index of each part by taking its value modulo the digest size, where digest size = (number of bits per entry) × cache capacity.
3. Combine the indices of the parts to compose the index set of the corresponding public key.
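A sketch of this index computation. The choice of 4 parts and 5 bits per entry mirrors common Cache Digest configurations but is an assumption here, not something the slides specify.

```python
import hashlib

BITS_PER_ENTRY = 5  # assumed digest density (bits per cached document)

def digest_indices(url: str, cache_capacity: int, num_parts: int = 4):
    """Split the 128-bit MD5 public key of the URL into equal parts and
    map each part to a bit index modulo the digest size."""
    digest_size = BITS_PER_ENTRY * cache_capacity  # size of the digest in bits
    key = hashlib.md5(url.encode()).digest()       # 128-bit public key
    part_len = len(key) // num_parts               # 16 bytes / 4 parts = 4 bytes
    return [
        int.from_bytes(key[i * part_len:(i + 1) * part_len], "big") % digest_size
        for i in range(num_parts)
    ]
```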
Digest accuracy:
• The calculation of the public keys allows some possibility of error. There are two kinds of errors:
1. False miss: the document is stored in the cache, but the (stale) digest does not report it.
2. False hit: the digest reports the document as stored in the cache, but it is not there.
Digest requirements:
• The digest is a large data structure: roughly 200 KB to 2 MB are needed to store all the URLs of the documents stored in the cache.
• Two copies of the digest are kept, one stored on disk and the other in memory, for fast updates.
How does it work?
• The cache exchanges its own digest with its neighbours.
• A cache digest message is composed of a fixed 128-byte binary header, containing the digest specifications, followed by the entire digest.
• When a miss occurs in the local cache, the cache looks the URL up in the neighbours' digests.
• The cache then sends an HTTP request to retrieve the document from the appropriate location (a neighbour on a digest hit, the origin server otherwise).
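Putting the pieces together, a hedged sketch of this lookup path, built on the BloomFilter sketch above. fetch_via_http, origin_host, and the argument names are hypothetical stand-ins, not part of the Cache Digest specification.

```python
def fetch_via_http(host: str, url: str) -> bytes:
    """Stand-in for the proxy's HTTP client."""
    raise NotImplementedError  # a real proxy would issue the request here

def lookup(url, local_cache, neighbour_digests, origin_host):
    if url in local_cache:
        return local_cache[url]                  # local hit
    for neighbour, digest in neighbour_digests.items():
        if digest.contains(url):                 # digest hit (may be false)
            return fetch_via_http(neighbour, url)
    return fetch_via_http(origin_host, url)      # no digest hit: go to origin
```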
Conclusion
• Cache Digest eliminates the ICP query-response messages used to collect information about the requested document, but it requires a lot of memory to store the digest, and it transfers a large quantity of information over the network, proportional to the size of the digest.
3. Summary Cache
• Proposed by Pei Cao and a group of her students to reduce the internal traffic created by ICP queries.
• Each proxy keeps a summary of the URLs of the documents stored in every participating proxy.
• It scales well, because it can employ a large number of proxies to reduce the web traffic.
• Two main factors influence its scalability:
1. Update delay
2. Memory requirement
Update delay: the summary is updated periodically, or once a determined threshold of documents is no longer reflected in the summary.
Memory requirement: depends on the way the summary is represented. The summary can be represented in the following ways:
• Exact directory: requires a lot of memory; for 100 proxies, each with an 8 GB cache holding 1 million documents with an average URL length of 50 bytes, the space needed to represent the summary is 2 MB.
• Server name: reduces the summary size but increases the possibility of errors.
• Bloom filter: proposed by Pei Cao to reduce the memory requirement of the summary. The documents are stored in the filter in the same way as in Cache Digest, with a difference in the calculation of the indices (see the sketch after this list):
1. The 128 bits are divided into four 32-bit words, and an index is extracted from each word by taking it modulo the summary size.
2. Each proxy maintains a counter C(l) for each location l, so entries can be removed when documents leave the cache.
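A sketch of that per-location counter, often called a counting Bloom filter: insertion increments C(l), deletion decrements it, and membership checks that every counter is positive. The split of the 128-bit MD5 into four 32-bit words follows step 1 above; the cell count is an assumption.

```python
import hashlib

class CountingBloomFilter:
    """Summary Cache's local structure: a counter C(l) per location l,
    so entries can be deleted when documents leave the cache."""

    NUM_WORDS = 4  # the 128-bit MD5 split into four 32-bit words

    def __init__(self, num_cells: int):
        self.num_cells = num_cells
        self.counters = [0] * num_cells

    def _locations(self, url: str):
        digest = hashlib.md5(url.encode()).digest()
        for i in range(self.NUM_WORDS):
            word = int.from_bytes(digest[i * 4:(i + 1) * 4], "big")
            yield word % self.num_cells  # index = word mod summary size

    def insert(self, url: str) -> None:
        for l in self._locations(url):
            self.counters[l] += 1

    def remove(self, url: str) -> None:
        for l in self._locations(url):
            if self.counters[l] > 0:
                self.counters[l] -= 1

    def contains(self, url: str) -> bool:
        return all(self.counters[l] > 0 for l in self._locations(url))
```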
There are three kinds of errors:
• False hit
• False miss
• Remote stale hit
Comparison between the summary representation methods and ICP
4. Protocols comparison
Comparison of the three previous protocols in terms of network traffic