A Low-Bandwidth Network File System A. Muthitacharoen, MIT B. Chen, MIT D. Mazieres, NYU
Key Ideas • A network file system for slow or wide-area networks • Exploits similarities between files or versions of the same file • Avoids sending data that can be found in the server’s file system or the client’s cache • Also uses conventional compression and caching • Requires 90% less bandwidth than traditional network file systems
Working on slow networks • Make local copies • Must worry about update conflicts • Use remote login • Only for text-based applications • Use LBFS instead • Better than remote login • Must deal with issues like auto-saves blocking the editor for the duration of the transfer
LBFS • Exploits cross-file similarities especially with previous versions of the same file • Auto-save files, … • LBFS file server divides the files it stores into chunks and indexes the chunks by hash value • LBFS client similarly indexes a large persistent file cache • LBFS never transfers chunks that the recipient already has
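The rule that LBFS never transfers chunks the recipient already has can be sketched as a simple lookup against the recipient's hash index. This is a minimal illustration, not the LBFS wire protocol: `chunk_key` and `plan_transfer` are hypothetical names, and the recipient's index is modelled as a plain Python set.

```python
import hashlib

def chunk_key(chunk: bytes) -> bytes:
    """Hash used to index a chunk (SHA-1, as in the chunk database slide)."""
    return hashlib.sha1(chunk).digest()

def plan_transfer(chunks, recipient_index):
    """Split chunks into those the recipient already holds and those to send.

    recipient_index is an assumed stand-in for the recipient's chunk index:
    a set of SHA-1 digests. Returns (known_hashes, chunks_to_send).
    """
    known, to_send = [], []
    for chunk in chunks:
        key = chunk_key(chunk)
        if key in recipient_index:
            known.append(key)      # recipient can rebuild this chunk locally
        else:
            to_send.append(chunk)  # data must cross the network
    return known, to_send
```

Only the chunks in `to_send` cost bandwidth proportional to their size; the rest are named by hash.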
Previous Work (I) • AFS callbacks require the server to notify clients when a cached file has been modified • Leases achieve the same goal but have an expiration time • Coda supports slow networks and even disconnected operation • Defers some updates to save bandwidth • OceanStore applies Bayou’s conflict resolution mechanisms to a file system
Previous Work (II) • Operation-based updates (Lee et al.) • Proxy-client close to the server duplicates client computations in the hope of duplicating its output files • Spring and Wetherall propose to use two large cooperating caches storing identical copies of the last n megabytes of network traffic • Rsync uses directory tree mirroring at client and server.
LBFS • LBFS provides close-to-open consistency • Similar to AFS session consistency • LBFS assumes clients will have a cache large enough to contain a user’s entire working set of files • When possible, LBFS reconstitutes files using chunks of existing data in the file system and client cache instead of transmitting those chunks over the network
Indexing Issues • Major challenge is keeping the index a reasonable size while dealing with shifting offsets • Indexing conventional file blocks would not work • Indexing and hashing overlapping file blocks at all offsets would require too much space
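A small experiment illustrates why indexing conventional fixed-size blocks breaks down when offsets shift. The 8-byte block size and the sample data are assumptions chosen only to keep the example tiny.

```python
import hashlib

def fixed_block_hashes(data: bytes, block_size: int = 8) -> set:
    """Hash every aligned, fixed-size block of data."""
    return {
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    }

old = bytes(range(64))        # 8 aligned blocks of 8 bytes
inserted = b"\xff" + old      # one byte inserted at the front
appended = old + b"\xff"      # one byte appended at the end

# Every block boundary after the insertion point moves by one byte,
# so no aligned block of the new file matches a block of the old one.
shared_after_insert = fixed_block_hashes(old) & fixed_block_hashes(inserted)
shared_after_append = fixed_block_hashes(old) & fixed_block_hashes(appended)
```

Here a one-byte insertion leaves no blocks in common, while an append preserves all eight original blocks.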
LBFS Solution • Considers only non-overlapping chunks of files • Sets chunk boundaries based on file contents to avoid sensitivity to shifting file offsets • Examines every overlapping 48-byte region of the file to select boundary regions, or breakpoints, using Rabin fingerprints • Expected chunk size is 8 KB plus the size of the 48-byte breakpoint window
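Content-defined breakpoints can be sketched as follows. Note the substitution: this example uses a simple polynomial rolling hash in place of Rabin fingerprints, and `BASE`, `MOD`, the `magic` value and the function names are assumptions for illustration. A region is a breakpoint when the low-order bits of its hash (13 bits by default, giving an expected chunk size of about 8 KB on random data) equal a chosen value.

```python
WINDOW = 48             # bytes per sliding window, as on the slide
BASE = 257              # assumption: polynomial hash base
MOD = (1 << 61) - 1     # assumption: large prime modulus

def breakpoints(data: bytes, mask: int = (1 << 13) - 1, magic: int = 0) -> list:
    """Return the offset just past each window whose masked hash equals magic."""
    if len(data) < WINDOW:
        return []
    top = pow(BASE, WINDOW - 1, MOD)    # weight of the byte leaving the window
    h = 0
    for b in data[:WINDOW]:
        h = (h * BASE + b) % MOD
    cuts = [WINDOW] if (h & mask) == magic else []
    for i in range(WINDOW, len(data)):
        h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + data[i]) % MOD           # take in the newest byte
        if (h & mask) == magic:
            cuts.append(i + 1)
    return cuts

def split(data: bytes, **kw) -> list:
    """Cut data into non-overlapping chunks at the breakpoints."""
    chunks, start = [], 0
    for cut in breakpoints(data, **kw):
        chunks.append(data[start:cut])
        start = cut
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Because each decision depends only on the 48 bytes inside the window, inserting bytes near the start of a file moves later breakpoints by the same amount and leaves the chunks between them unchanged.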
More Indexing Issues • Pathological cases • Very small chunks • Sending hashes of chunks would consume as much bandwidth as just sending the file • Very large chunks • Cannot be sent in a single RPC • LBFS imposes minimum and maximum chunk sizes
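One way to impose minimum and maximum chunk sizes is to filter a list of candidate breakpoints. The sketch below is an assumption about how such a filter could look, not the LBFS code: `enforce_limits` is a hypothetical name, and the 2 KB and 64 KB defaults are illustrative values.

```python
def enforce_limits(candidates, length, min_size=2048, max_size=65536):
    """Turn candidate breakpoints into final chunk end offsets.

    Candidates that would create a chunk shorter than min_size are ignored,
    and a boundary is forced whenever a chunk would exceed max_size.
    The final chunk ends at `length` and may be shorter than min_size.
    """
    boundaries, start = [], 0
    for cut in sorted(candidates):
        if cut > length:
            break
        while cut - start > max_size:      # gap too large: force boundaries
            start += max_size
            boundaries.append(start)
        if cut - start >= min_size:        # accept only sufficiently large chunks
            boundaries.append(cut)
            start = cut
    while length - start > max_size:       # tail with no candidates left
        start += max_size
        boundaries.append(start)
    if start < length:
        boundaries.append(length)
    return boundaries
```

For example, with candidates `[100, 3000, 3500, 200000]` in a 210000-byte file, the candidates at 100, 3500 and 200000 are dropped as too close to the previous boundary, and three boundaries are forced in the long gap after 3000.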
The Chunk Database • Indexes each chunk by the first 64 bits of its SHA-1 hash • To avoid synchronization problems, LBFS always recomputes the SHA-1 hash of any data chunk before using it • Simplifies crash recovery • Recomputed SHA-1 values are also used to detect hash collisions in the database
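The chunk database behaviour described above can be sketched with an in-memory index. Everything here is a simplified assumption: `ChunkDB` is a hypothetical class, and the file system is modelled as a dictionary from file name to bytes. The point is that the index is only a hint; the SHA-1 hash is recomputed from the actual data before a chunk is used.

```python
import hashlib

class ChunkDB:
    """Index chunks by the first 64 bits (8 bytes) of their SHA-1 hash."""

    def __init__(self, files):
        self.files = files   # assumed stand-in for the file system: name -> bytes
        self.index = {}      # 8-byte key -> (name, offset, length)

    def add(self, name, offset, length):
        data = self.files[name][offset:offset + length]
        self.index[hashlib.sha1(data).digest()[:8]] = (name, offset, length)

    def lookup(self, sha1_digest):
        """Return the chunk with this full SHA-1 digest, or None on a miss."""
        loc = self.index.get(sha1_digest[:8])
        if loc is None:
            return None
        name, offset, length = loc
        data = self.files.get(name, b"")[offset:offset + length]
        # Recompute the hash: the file may have changed since it was indexed,
        # or a different chunk may share the same 64-bit key.
        if hashlib.sha1(data).digest() != sha1_digest:
            return None
        return data
```

Because every lookup is verified against the data itself, a stale or colliding index entry only costs a cache miss and never returns wrong contents.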
Protocol • Based on NFS version 3 • Adds • Extensions to exploit inter-file commonality (GETHASH) • Leases • Compresses all traffic using conventional gzip
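A read that exploits inter-file commonality could look roughly like this. The sketch assumes GETHASH returns a list of (SHA-1 hash, size) pairs for the chunks of a file and that the client then issues ordinary reads only for chunks missing from its cache. The `Server` class and `fetch` function are hypothetical in-memory stand-ins, not the real RPC interface.

```python
import hashlib

class Server:
    """Toy server holding one file as a list of chunks."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.data = b"".join(chunks)

    def gethash(self):
        # Describe the file as (hash, size) pairs instead of sending its data.
        return [(hashlib.sha1(c).digest(), len(c)) for c in self.chunks]

    def read(self, offset, count):
        return self.data[offset:offset + count]

def fetch(server, cache):
    """Rebuild the file, reading from the server only chunks absent from cache.

    cache maps SHA-1 digest -> chunk bytes. Returns (file_bytes, bytes_read).
    """
    parts, offset, bytes_read = [], 0, 0
    for digest, size in server.gethash():
        chunk = cache.get(digest)
        if chunk is None:
            chunk = server.read(offset, size)   # only missing chunks are read
            bytes_read += size
            cache[digest] = chunk
        parts.append(chunk)
        offset += size
    return b"".join(parts), bytes_read
```

A second fetch of the same file then reads no chunk data at all, since every hash is already in the cache.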
File Consistency (I) • Whenever a client makes any RPC on an LBFS file, it gets back a read lease on the file. • If a user opens a file whose lease has expired, the client asks the server for the attributes of the file • This request also grants the client a lease on the file. • Client can check if it has the current version of the file in its cache • If the file times have changed, client must obtain new contents of file from server
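The open-time check described above can be sketched as follows. The names `CacheEntry`, `open_file`, `getattr_rpc` and `fetch_rpc` are hypothetical, the 60-second lease length is an arbitrary assumption, and file attributes are reduced to a single modification time.

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    data: bytes
    mtime: float
    lease_expires: float

def open_file(entry, now, getattr_rpc, fetch_rpc, lease_secs=60.0):
    """Return file contents, contacting the server only if the lease expired.

    getattr_rpc() returns the server's current mtime and implicitly renews
    the lease; fetch_rpc() returns the current file contents.
    """
    if now < entry.lease_expires:
        return entry.data                  # lease still valid: trust the cache
    server_mtime = getattr_rpc()           # attribute request renews the lease
    entry.lease_expires = now + lease_secs
    if server_mtime != entry.mtime:        # file changed: cached copy is stale
        entry.data = fetch_rpc()
        entry.mtime = server_mtime
    return entry.data
```

While the lease is valid an open costs no network traffic, and an expired lease costs one attribute request unless the file actually changed.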
File Consistency (II) • No need for write leases • LBFS provides close-to-open consistency • Server never demands back a dirty file • If multiple clients are writing the same file, the last one to close the file will overwrite changes from the others • File updates are atomic • Limits damage caused by concurrent updates
Security Issues • LBFS uses SFS security infrastructure • Servers have public keys • Messages are encrypted • Specific security issue: • A user could check whether the file system contains a particular chunk of data by observing subtle timing differences in server’s answer to CONDWRITE request
Implementation (II) • Uses NFS • Two NFS-related issues • When server commits a temporary file to a target file, it must copy the contents of the temporary file onto the target file to preserve the target file i-node • Hard to preserve previous contents of a truncated file • Message order is guaranteed by TCP
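The first NFS-related issue can be illustrated on a local file system: copying the temporary file's contents onto the target keeps the target's i-node, whereas renaming the temporary file over the target would replace it. This is only a local sketch of the idea with a hypothetical helper name, not the LBFS server code.

```python
import os
import shutil

def commit_by_copy(tmp_path, target_path):
    """Commit a temporary file by copying its contents onto the target.

    Opening the existing target with "wb" truncates it in place, so the
    target keeps its i-node. The temporary file is removed afterwards.
    """
    with open(tmp_path, "rb") as src, open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(tmp_path)
```

The trade-off is that the copy costs extra I/O and, unlike a rename, is not a single atomic step on the local file system.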
Evaluation (I) • Commonality of data in /usr/local
Evaluation (II) • Normalized bandwidth consumption (2 of 3 benchmarks)
Key • The first four bars of each workload show upstream bandwidth, the second four downstream bandwidth. • CIFS is Windows’ native network file system • “Leases+Gzip” uses LBFS file caching, leases, and data compression but not its chunking scheme • “LBFS, new DB” is LBFS starting with a new database
Evaluation (III) Normalized application times
Key • Execution times were normalized • Measurements made over a cable modem link with 384 Kb/s uplink and 1.5 Mb/s downlink • LAN data were obtained on a 100 Mb/s full-duplex LAN.
Conclusion • Under normal circumstances, LBFS consumes 90% less bandwidth than traditional file systems. • Makes transparent remote file access a viable and less frustrating alternative to running interactive programs on remote machines.