A Low-bandwidth Network File System
Athicha Muthitacharoen, Benjie Chen, and David Mazieres
MIT Laboratory for Computer Science and NYU Department of Computer Science
Presented by: Khaled Elmeleegy
Overview • LBFS is a network file system designed for low-bandwidth networks. • LBFS provides traditional file system semantics and consistency. • To reduce its bandwidth requirements, LBFS exploits cross-file similarities.
LBFS Design • The client keeps a persistent file cache. • When a file is modified, the client must transmit the changes to the server. • LBFS divides the files it stores into chunks and indexes the chunks by hash value. • It avoids transmitting chunks that the recipient already has.
File Chunking and Indexing • Files are divided into non-overlapping chunks. • LBFS selects the boundary regions between chunks using Rabin fingerprints. • LBFS indexes the files’ chunks to recognize identical chunks.
File Chunking and Indexing (cont’d) • If the client and server both have chunks that produce the same SHA-1 hash, LBFS assumes the chunks are identical and avoids transferring them.
1. C1 C2 C3 C4 C5 C6 C7
2. C1 C2 C8 C4 C5 C6 C7
3. C1 C2 C8 C4 C9 C10 C6 C7
4. C11 C8 C4 C9 C10 C6 C7
Fig. Chunks of a file after various edits
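The chunking and indexing idea above can be illustrated with a minimal sketch. It is not the LBFS implementation: a CRC-32 over a sliding window stands in for the Rabin fingerprint, and `WINDOW`, `MASK`, `TARGET`, `split_chunks`, and `index_chunks` are hypothetical names and values chosen only to keep the demo small. Because a boundary depends only on the bytes inside the window, an edit in the middle of a file disturbs the chunks near the edit and leaves the rest unchanged.

```python
import hashlib
import random
import zlib

WINDOW = 16   # bytes in the sliding window (illustrative value)
MASK = 0x3F   # compare the low 6 bits, so a boundary occurs about every 64 bytes
TARGET = 0    # hypothetical bit pattern that marks a chunk boundary


def split_chunks(data):
    """Split data into non-overlapping chunks at content-defined boundaries."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        # CRC-32 of the window is a stand-in for a Rabin fingerprint.
        if (zlib.crc32(data[end - WINDOW:end]) & MASK) == TARGET:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


def index_chunks(data):
    """Map the SHA-1 digest of each chunk to the chunk itself."""
    return {hashlib.sha1(c).hexdigest(): c for c in split_chunks(data)}


# Pseudo-random file contents, then the same file with a small insertion.
rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(4096))
new = old[:2000] + b"a small edit" + old[2000:]

server_index = index_chunks(old)   # chunks the recipient already holds
new_index = index_chunks(new)
missing = [h for h in new_index if h not in server_index]
print(f"{len(missing)} of {len(new_index)} chunks need to be sent")
```

Only the digests listed in `missing` would have to cross the network; every other chunk is already known to the recipient by its hash.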
File Consistency • Whenever a client makes any RPC on a file in LBFS, it gets back a read lease on the file. • When a user opens a file, if the lease on the file has not expired, then the open succeeds immediately with no messages sent to the server.
File Consistency (Cont’d) • If the lease has expired, the client asks the server for the attributes of the file and is implicitly granted a new lease on it. • If the file is unchanged since it was stored in the cache, the client uses the cached version. • If the file has been modified, the client must fetch the new contents from the server.
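The open-time logic from the two consistency slides can be sketched as follows. This is a toy model under stated assumptions, not LBFS code: `LEASE_SECONDS` is a hypothetical lease length (the slides give no value), a simple version counter stands in for the file attributes, and the `Server` and `Client` classes and their method names are invented for illustration.

```python
LEASE_SECONDS = 60.0  # hypothetical lease length; the slides give no value


class Server:
    """Toy server that stores files and counts the RPCs it receives."""

    def __init__(self):
        self.files = {}  # path -> (contents, version)
        self.rpcs = 0

    def write(self, path, contents):
        _, version = self.files.get(path, (b"", 0))
        self.files[path] = (contents, version + 1)

    def get_attributes(self, path):
        self.rpcs += 1
        return self.files[path][1]  # the version stands in for real attributes

    def fetch(self, path):
        self.rpcs += 1
        return self.files[path][0]


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # path -> [contents, version, lease_expiry]

    def open(self, path, now):
        entry = self.cache.get(path)
        if entry and now < entry[2]:
            return entry[0]  # lease still valid: no messages to the server
        version = self.server.get_attributes(path)  # this RPC renews the lease
        if entry and entry[1] == version:
            entry[2] = now + LEASE_SECONDS
            return entry[0]  # cached copy is still current
        contents = self.server.fetch(path)  # file changed or never cached
        self.cache[path] = [contents, version, now + LEASE_SECONDS]
        return contents


server = Server()
server.write("/a", b"v1")
client = Client(server)
client.open("/a", now=0.0)    # cold cache: attributes plus contents
client.open("/a", now=10.0)   # within the lease: served from the cache
client.open("/a", now=100.0)  # lease expired, file unchanged: attributes only
print(server.rpcs)
```

Passing `now` explicitly instead of reading the clock keeps the sketch deterministic and makes the lease expiry easy to follow.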
File Reads
Client: file not in cache; sends GETHASH(..).
Server: breaks up the file into chunks and returns their hashes.
Client: first hash not in DB; sends READ(..) /* Chunk #1 */.
Client: second hash not in DB; sends READ(..) /* Chunk #2 */.
Server: returns the first chunk (Chunk #1).
Client: puts hash #1 in DB.
Server: returns the second chunk (Chunk #2).
Client: puts hash #2 in DB.
Client: file reconstructed, return to user.
Fig. Reading a file using LBFS
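The read exchange in the figure can be sketched as a small simulation. The sketch makes simplifying assumptions that are not in the slides: chunks are a fixed `CHUNK` bytes long instead of content-defined, the calls are plain method invocations with no RPC layer, and the `Server` class, `read_file` helper, and argument lists are hypothetical.

```python
import hashlib

CHUNK = 4  # fixed-size chunks keep the sketch short; LBFS picks boundaries by content


class Server:
    def __init__(self, files):
        self.files = files    # name -> contents
        self.bytes_sent = 0   # chunk data returned by READ

    def gethash(self, name):
        """GETHASH: return (sha1, offset, size) for each chunk of the file."""
        data = self.files[name]
        result = []
        for offset in range(0, len(data), CHUNK):
            piece = data[offset:offset + CHUNK]
            result.append((hashlib.sha1(piece).digest(), offset, len(piece)))
        return result

    def read(self, name, offset, size):
        """READ: return the raw bytes of one chunk."""
        self.bytes_sent += size
        return self.files[name][offset:offset + size]


def read_file(server, name, chunk_db):
    """Client side: fetch only the chunks whose hashes are not in chunk_db."""
    parts = []
    for digest, offset, size in server.gethash(name):
        if digest not in chunk_db:
            chunk_db[digest] = server.read(name, offset, size)  # put hash in DB
        parts.append(chunk_db[digest])
    return b"".join(parts)  # file reconstructed


server = Server({"a": b"AAAABBBBCCCC", "b": b"AAAAXXXXCCCC"})
chunk_db = {}
read_file(server, "a", chunk_db)  # all three chunks are transferred
read_file(server, "b", chunk_db)  # only the chunk b"XXXX" is new
print(server.bytes_sent)
```

The second read shares two chunks with the first file, so only the one differing chunk is requested from the server.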
File Writes • File updates are atomic, using a temporary file. • In case of multiple concurrent writes to a file, the last writer to the file wins.
File Writes
Client: user closes file; sends MKTMPFILE.
Client: breaks the file into chunks and sends their corresponding hashes with CONDWRITE (one per chunk).
Server: creates tmp file; replies OK.
Server: first hash in DB, writes data into tmp file; replies OK.
Server: second hash not in DB; replies HASHNOTFOUND.
Client: server has hash #1.
Client: server needs hash #2, sends data with TMPWRITE.
Server: puts hash #2 into database, writes data into tmp file; replies OK.
Client: server has everything, commits with COMMITTMP.
Server: no error, copies data from tmp file into target file; replies OK.
Client: file closed, return to user.
Fig. Writing a file using LBFS
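The write exchange in the figure can be sketched in the same style. Again this is an illustration under assumptions, not the LBFS protocol definition: chunks are fixed-size, each chunk is addressed by a list index, the client waits for each reply before sending the next message, and the `Server` class, `write_file` helper, and all argument lists are hypothetical. Only the message names come from the slide.

```python
import hashlib

OK, HASHNOTFOUND = "OK", "HASHNOTFOUND"


class Server:
    def __init__(self):
        self.files = {}          # name -> committed contents
        self.chunk_db = {}       # sha1 digest -> chunk data
        self.tmp = {}            # fd -> {chunk index: chunk data}
        self.bytes_received = 0  # chunk data carried by TMPWRITE

    def mktmpfile(self, fd):
        self.tmp[fd] = {}
        return OK

    def condwrite(self, fd, index, digest):
        """Write the chunk from the database if its hash is already known."""
        if digest not in self.chunk_db:
            return HASHNOTFOUND
        self.tmp[fd][index] = self.chunk_db[digest]
        return OK

    def tmpwrite(self, fd, index, data):
        """Receive chunk data, record its hash, and write it to the tmp file."""
        self.bytes_received += len(data)
        self.chunk_db[hashlib.sha1(data).digest()] = data
        self.tmp[fd][index] = data
        return OK

    def committmp(self, fd, name):
        """Replace the target file with the tmp file in a single step."""
        chunks = self.tmp.pop(fd)
        self.files[name] = b"".join(chunks[i] for i in sorted(chunks))
        return OK


def write_file(server, name, data, fd=1, chunk=4):
    """Client side: send hashes first, then data only for unknown chunks."""
    server.mktmpfile(fd)
    pieces = [data[o:o + chunk] for o in range(0, len(data), chunk)]
    for index, piece in enumerate(pieces):
        digest = hashlib.sha1(piece).digest()
        if server.condwrite(fd, index, digest) == HASHNOTFOUND:
            server.tmpwrite(fd, index, piece)
    return server.committmp(fd, name)


server = Server()
write_file(server, "a", b"AAAABBBBCCCC")  # every chunk is new to the server
write_file(server, "b", b"AAAAXXXXCCCC")  # only b"XXXX" must be sent as data
print(server.bytes_received)
```

Building the file in `tmp` and installing it with one assignment in `committmp` mirrors the atomic update through a temporary file described on the previous slide.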
Bandwidth Consumption • Fig. Normalized bandwidth consumed by three workloads. The first four bars of each workload show upstream bandwidth; the second four show downstream bandwidth.
Performance vs. Bandwidth/Latency • Fig. Performance of the gcc workload over various bandwidths with a fixed round-trip time of 10 ms. • Fig. Performance of the gcc workload over a range of round-trip times with fixed 1.5 Mbit/sec symmetric links.
Performance vs. Loss Rate • Fig. Performance of a shortened ed benchmark over various loss rates, on a network with fixed 1.5 Mbit/sec symmetric links and a fixed round-trip time of 10 ms.