Peer-to-Peer Supported Cache System for File Transfer 2003.8.28 Joonbok Lee KAIST jblee@cosmos.kaist.ac.kr
Contents • Motivation • Problem Statement • Related Work • Approach • Simulation • Conclusion • Reference
1. Motivation • KAIST Netflow Measurement (2002.10.4) • Analyze the flow data of the KAIST border router. • Fig 1. The byte ratio in terms of protocols • Fig 2. Cumulative distribution function of the files transferred by FTP and HTTP (figure annotation: 10MB) • Some findings: • The bandwidth consumed by FTP is similar to that consumed by HTTP. • 78% of the FTP traffic is due to large files, that is, files larger than 10MB. 1/17
2. Problem Statement • Access to large multimedia data is not negligible. [Jung00] • FTP traffic: • 17% of total traffic. • 78% of it is due to files larger than 10MB. • 11% of the transfers failed before completing. • The large files transferred by FTP generate much traffic, and many of them take a long time to transfer. • To solve this problem, we propose an HTTP/FTP proxy cache that is scalable in terms of bandwidth and storage. 2/17
3. Related Work • Research on transferring large files: • RepliCache: A New Approach to Scalable Networking Storage System for Large Objects [Jung97] • Proactive Web Caching with Cumulative Prefetching [Jung00] • Research on scalable architectures: • Squirrel: A decentralized peer-to-peer web cache [Iyer02] • Peer-to-Peer Caching Schemes to Address Flash Crowds [Stading02] 3/17
4. Approach 4.1 Motivation 4.2 Cache with Peer-to-Peer Storage 4.3 Model 4.4 Detail Design 4/17
4.1 Motivation • Peer-to-Peer Architecture as a Cache • Scalability (bandwidth, computing power and storage) • Cost • Overhead (to find objects and to maintain the system) • Latency • One of the important metrics of cache performance. • Latency = lookup time + delivery time • Delivery time depends on the file size. • Small files: the lookup time dominates. • Large files: the delivery time dominates. 5/17
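The latency decomposition above can be illustrated with a short calculation. This is only a sketch: the 50 ms lookup time is an assumed placeholder rather than a measured value, the function name is hypothetical, and the 100 Mbps link speed is borrowed from the local network assumption in Section 5.1.

```python
def latency_seconds(file_size_bytes, bandwidth_bps, lookup_seconds):
    """Latency = lookup time + delivery time, with delivery = size / bandwidth."""
    delivery_seconds = file_size_bytes * 8 / bandwidth_bps
    return lookup_seconds + delivery_seconds


LOOKUP_S = 0.05          # assumed lookup time (hypothetical, not from the slides)
LINK_BPS = 100_000_000   # 100 Mbps local network (Section 5.1 assumption)

small = latency_seconds(100 * 1000, LINK_BPS, LOOKUP_S)          # 100 KB file
large = latency_seconds(100 * 1000 * 1000, LINK_BPS, LOOKUP_S)   # 100 MB file


def lookup_share(total_seconds):
    """Fraction of the total latency that is spent on lookup."""
    return LOOKUP_S / total_seconds


print(f"100 KB: {small:.3f} s total, lookup share {lookup_share(small):.1%}")
print(f"100 MB: {large:.3f} s total, lookup share {lookup_share(large):.1%}")
```

Under these assumed numbers the lookup accounts for most of the small file's latency and for under 1% of the large file's, which is why a slower lookup path is tolerable for large objects.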
4.2 Cache with Peer-to-Peer Storage • Hybrid Approach • Scalability: peer-to-peer storage. • Lookup and control: central cache. • Peer-to-Peer two-layer storage • The storage in the central cache • Expected to be always available, with low latency. • Stores small files. • The second-tier storage (peers) • Can be unavailable. • Stores large files. 6/17
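The two-layer placement rule can be sketched as a size-based decision. The slides do not state the cutoff between "small" and "large", so the 10 MB threshold below is an assumption taken from the boundary used in the traffic measurement; `Tier` and `choose_tier` are hypothetical names.

```python
from enum import Enum


class Tier(Enum):
    CENTRAL = "central cache"          # always available, low latency
    PEER = "peer-to-peer storage"      # may be unavailable


# Assumed cutoff between small and large files (not fixed by the slides).
LARGE_FILE_THRESHOLD = 10 * 1024 * 1024


def choose_tier(file_size_bytes, threshold=LARGE_FILE_THRESHOLD):
    """Small files go to the central cache, large files to the peers."""
    return Tier.PEER if file_size_bytes >= threshold else Tier.CENTRAL


print(choose_tier(100 * 1024).value)          # a 100 KB file
print(choose_tier(100 * 1024 * 1024).value)   # a 100 MB file
```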
4.3 Model • Fig 3. Cache with two-layer storage. HTTP/FTP Servers A and B are reached through the connectivity cloud; the Web Proxy Cache with FTP supporting module holds the small objects (OS1, OS2); Peers 1 to n on the local area network form the peer-to-peer storage and hold the large objects (OL1, OL2). • OS1, OS2: small objects • OL1, OL2: large objects 7/17
4.4 Detail Design • 2 new components to support FTP and large files. • Preserve transparency of file location. • FTP Cache Daemon • Stores the state of the FTP connection. • Makes the URL of files transferred by FTP. • Checks consistency. • P2P Storage Manager • Controls its own storage. • Managed by the object table in the central cache. • Fig 4. Control and data connections between components (FTP/HTTP Server, Object Table, Storage Manager, HTTP Cache Daemon, FTP Cache Daemon, and FTP/HTTP Clients with P2P Storage Managers) 8/17
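The two bookkeeping tasks named above, building a URL from FTP connection state and tracking where each object is stored, might look like the following. This is a minimal sketch and not the authors' implementation: the class names, fields, the `"central"` and `"peer-N"` location labels, and the host `ftp.example.org` are all assumptions made for illustration.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class FtpSession:
    """Control-connection state an FTP cache daemon would need to remember."""
    host: str
    port: int = 21
    cwd: str = "/"

    def change_dir(self, path):
        # Mirrors an FTP CWD command: resolve relative to the current directory.
        self.cwd = posixpath.normpath(posixpath.join(self.cwd, path))

    def url_for(self, filename):
        # Mirrors an FTP RETR command: turn the request into a cache key (URL).
        path = posixpath.normpath(posixpath.join(self.cwd, filename))
        netloc = self.host if self.port == 21 else f"{self.host}:{self.port}"
        return f"ftp://{netloc}{path}"


@dataclass
class ObjectEntry:
    size: int
    locations: set = field(default_factory=set)  # "central" or a peer id


class ObjectTable:
    """Central-cache table mapping a URL to the places that store the object."""

    def __init__(self):
        self._entries = {}

    def add(self, url, size, location):
        entry = self._entries.setdefault(url, ObjectEntry(size))
        entry.locations.add(location)

    def remove_location(self, url, location):
        entry = self._entries.get(url)
        if entry is None:
            return
        entry.locations.discard(location)
        if not entry.locations:      # no copy left anywhere
            del self._entries[url]

    def lookup(self, url):
        entry = self._entries.get(url)
        return sorted(entry.locations) if entry else []


session = FtpSession("ftp.example.org")   # hypothetical server name
session.change_dir("pub")
session.change_dir("linux/../iso")
url = session.url_for("a.iso")

table = ObjectTable()
table.add(url, 700_000_000, "peer-2")
table.add(url, 700_000_000, "peer-1")
print(url, table.lookup(url))
```

Keeping the table in the central cache is what lets clients stay unaware of which peer holds a file, in line with the transparency goal on the slide.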
5. Simulation 5.1 Simulation Environment 5.2 Simulation Result 9/17
5.1 Simulation Environment • Trace • Requested FTP file list • Gathered the FTP control (port 21) packets and produced the trace • 2002.10.23 ~ 2002.11.5 (two weeks) • 76,880 file requests (783GB) • 417 clients • Assumption • Local Network: 100Mbps • Simulated Caches • Cache A: 100GB Storage, 100Mbps • Cache B: Infinite Storage, 100Mbps • Cache C: Infinite Storage, Infinite Bandwidth • Cache D: Cache with Peer-to-Peer Storage 10/17
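A trace-driven comparison of a bounded cache (like Cache A) against an unbounded one (like Cache B) can be sketched as below. The slides do not say which replacement policy was simulated, so LRU is an assumption here, and the four-request trace is a toy example rather than the KAIST trace.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes=None):
    """Replay (url, size_bytes) requests through an LRU cache.

    capacity_bytes=None models infinite storage.
    Returns (hit_ratio, byte_hit_ratio).
    """
    cache = OrderedDict()   # url -> size, least recently used first
    used = 0
    requests = hits = total_bytes = hit_bytes = 0
    for url, size in trace:
        requests += 1
        total_bytes += size
        if url in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(url)      # mark as most recently used
            continue
        if capacity_bytes is not None and size > capacity_bytes:
            continue                    # object can never fit; do not cache it
        cache[url] = size
        used += size
        while capacity_bytes is not None and used > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)   # evict the LRU victim
            used -= evicted_size
    if requests == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / requests, hit_bytes / total_bytes


# Toy trace (hypothetical): two 60-byte objects requested alternately.
toy_trace = [("a", 60), ("b", 60), ("a", 60), ("b", 60)]
bounded = simulate_lru(toy_trace, capacity_bytes=100)
unbounded = simulate_lru(toy_trace)
print("bounded:", bounded, "unbounded:", unbounded)
```

In the toy trace the bounded cache can hold only one object at a time, so every request misses, while the unbounded cache hits on both repeat requests.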
5.2 Simulation Result: Hit Ratio • Fig 5. Cache hit ratio • Fig 6. Outbound traffic • No strict storage control • Some peers may hold the same files in their storage. • Even though some peers have storage available, other peers may evict a file from their cache as a victim. • This degrades the performance of the storage, but not by much. 11/17
5.2 Simulation Result: Latency • Fig 7. Average latency of 95~105MB files • Fig 8. Average latency of 95~105KB files • We can reduce the latency of large files without increasing the latency of small files. 12/17
5.2 Simulation Result: Cache Hit Ratio Degradation by Peer Failure • Fig 9. Cache hit ratio degradation by peer failure (figure annotation: 30%) 13/17
6. Conclusion • Measurement shows that FTP produces a large amount of traffic; 78% of it is caused by files larger than 10MB. • We propose a cache system with two-layer storage built on a peer-to-peer architecture. It is transparent to the location of files. • Trace-driven simulation shows that the two-layer storage performs well for large files as well as small files. • Caching with our system can reduce outbound traffic and latency. • Other issues • Collaboration between proposed systems. • Load balancing between peers. • Security problems. 15/17
7. Reference • Jaeyeon Jung, "RepliCache: Enhancing Web Caching Architecture with Replication of Large Objects" • Jaeyeon Jung, Dongman Lee and Kilnam Chon, "Proactive Web Caching with Cumulative Prefetching for Large Multimedia Data", Computer Networks 33 (2000), pp. 645-655 • Sitaram Iyer, Ant Rowstron and Peter Druschel, "Squirrel: A decentralized peer-to-peer web cache", In Proceedings of PODC '02, Monterey, CA • Tyron Stading, Petros Maniatis and Mary Baker, "Peer-to-Peer Caching Schemes to Address Flash Crowds", In Proceedings of IPTPS '02, MA, USA • Hyun-chul Kim, Joonbock Lee, Jungwon Suh and Kilnam Chon, "Measurements of File-Systems Deployed on High-Performance Research and Education Networks", Technical Report • I. Stoica, R. Morris, D. Karger, F. Kaashoek and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications", In Proceedings of the ACM SIGCOMM 2001 Technical Conference, San Diego, CA, USA, August 2001 • S. Ratnasamy, P. Francis, M. Handley, R. Karp and S. Shenker, "A scalable content-addressable network", In Proceedings of the ACM SIGCOMM 2001 Technical Conference, San Diego, CA, USA, August 2001 16/17
7. Reference • A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001 • Ian Clarke, Theodore W. Hong, Scott G. Miller, Oskar Sandberg and Brandon Wiley, "Protecting Free Expression Online with Freenet", IEEE Internet Computing 6(1), January/February 2002 • William J. Bolosky, John R. Douceur, David Ely and Marvin Theimer, "Feasibility of a Serverless Distributed File System Deployed on an Existing Set of Desktop PCs", In Proceedings of SIGMETRICS 2000 • Internet RFC 959, File Transfer Protocol 17/17
Appendix A • Request handling flow chart: Request File, then Check Protocol. • HTTP: handle the request like a web proxy cache. • FTP: look up the Object Table. If the file is cached, check consistency; if it is consistent, check the cached location. • FTP, cached and consistent, stored at the central server: the central cache opens a data connection to the client; transfer the file. • FTP, cached and consistent, stored at a peer: open FTP control connections to both the peer that has the file and the peer that requests it; make an FTP data connection between the two peers; transfer the file; update the Object Table. • FTP, not cached or inconsistent: check the file size. • Small file: the server opens a data connection to the central cache; the central cache opens a data connection to the client; transfer the file; update the Object Table. • Large file: open a data connection between the server and the client; transfer the file; update the Object Table.
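The decision flow in Appendix A can be written as a single function that returns the steps for a request. This follows one reading of the flow chart, so the branch order is an interpretation; the function name, argument names, and the 10 MB small/large cutoff are assumptions introduced for this sketch.

```python
LARGE_FILE_THRESHOLD = 10 * 1024 * 1024   # assumed small/large cutoff


def plan_transfer(protocol, cached=False, consistent=False,
                  cached_at=None, size_bytes=0,
                  threshold=LARGE_FILE_THRESHOLD):
    """Return the ordered steps the cache would take for one request."""
    protocol = protocol.upper()
    if protocol == "HTTP":
        return ["handle the request like a web proxy cache"]
    if protocol != "FTP":
        raise ValueError(f"unsupported protocol: {protocol}")

    if cached and consistent:
        if cached_at == "central":
            return ["central cache opens data connection to client",
                    "transfer file"]
        if cached_at == "peer":
            return ["open FTP control connections to holder and requester",
                    "make FTP data connection between the two peers",
                    "transfer file",
                    "update object table"]
        raise ValueError("cached_at must be 'central' or 'peer'")

    # Not cached, or cached but inconsistent: fetch from the origin server.
    if size_bytes < threshold:
        return ["server opens data connection to central cache",
                "central cache opens data connection to client",
                "transfer file",
                "update object table"]
    return ["open data connection between server and client",
            "transfer file",
            "update object table"]


for step in plan_transfer("FTP", cached=True, consistent=True, cached_at="peer"):
    print(step)
```

Returning the steps as data rather than performing them keeps the sketch testable without any network connections.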