Weed File System Simple and highly scalable distributed file system (NoFS)
Project Objectives
Yes:
• Store billions of files!
• Serve the files fast!
Not:
• Namespaces
• POSIX compliance
Design Goals • Separate volume metadata from file metadata. • Each volume can optionally have several replicas. • Flexible volume placement control, with several volume placement policies. • When saving data, users can specify a replication factor or a desired replication policy
Challenges for common FS • POSIX compliance costs space and is inefficient • One folder cannot store too many files • Need to generate a deep path • Reading one file requires visiting the whole directory path, and each step may require one disk seek • Moving deep directories across computers is slow
Challenges for HDFS • Stores large files, not lots of small files • Designed for streaming files, not on-demand random access • Name node keeps all metadata, a single point of failure (SPOF) and a bottleneck
How are Weed-FS files stored? • Files are stored in 32GB-sized volumes • Each volume server has multiple volumes • The master server tracks each volume's location and free space • The master server generates unique keys and directs clients to a volume server to store the file • Clients remember the fid.
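A minimal client-side sketch of this flow in Go, assuming the master's /dir/assign HTTP endpoint and the fid/url JSON fields as documented for weed-fs; the host/port values and file content are placeholders:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"mime/multipart"
	"net/http"
)

// assignResult mirrors the JSON the master is assumed to return from /dir/assign.
type assignResult struct {
	Fid string `json:"fid"`
	Url string `json:"url"`
}

func main() {
	// Step 1: ask the master for a file id and a volume server to write to.
	resp, err := http.Get("http://localhost:9333/dir/assign")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var a assignResult
	if err := json.NewDecoder(resp.Body).Decode(&a); err != nil {
		panic(err)
	}

	// Step 2: upload the file content directly to the assigned volume server.
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, _ := w.CreateFormFile("file", "hello.txt")
	part.Write([]byte("hello, weed-fs"))
	w.Close()

	uploadURL := fmt.Sprintf("http://%s/%s", a.Url, a.Fid)
	res, err := http.Post(uploadURL, w.FormDataContentType(), &body)
	if err != nil {
		panic(err)
	}
	res.Body.Close()

	// The client remembers the fid; reading the file back is a plain GET on the same URL.
	fmt.Println("stored as fid:", a.Fid)
}
```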
Master Node • Generates unique keys • Tracks volume status: <volume id, <url, free size>> • Maintained via heartbeats • Can be restarted
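A sketch of that bookkeeping, assuming the <volume id, <url, free size>> table is just an in-memory map refreshed by heartbeats (type and function names here are illustrative, not the project's actual code):

```go
package main

import "fmt"

// VolumeStatus is what the master is assumed to keep per volume,
// refreshed by volume-server heartbeats.
type VolumeStatus struct {
	Url      string // volume server address
	FreeSize uint64 // remaining bytes in the volume
}

// volumes maps volume id -> <url, free size>; since this is all the master
// holds, it can be restarted and rebuilt from the next round of heartbeats.
var volumes = map[uint32]VolumeStatus{}

// onHeartbeat refreshes the table from a volume server report.
func onHeartbeat(volumeId uint32, url string, freeSize uint64) {
	volumes[volumeId] = VolumeStatus{Url: url, FreeSize: freeSize}
}

func main() {
	onHeartbeat(3, "127.0.0.1:8080", 30<<30)
	fmt.Printf("volume 3 -> %+v\n", volumes[3])
}
```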
fid format • Sample file key: 3,01637037d6 • Each key has 3 components: • Volume ID = 3 • File key = 01 • File cookie = 637037d6 (4 bytes)
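The components can be pulled apart mechanically. The Go sketch below parses the sample key above, treating everything after the comma minus the last 8 hex characters as the file key (an assumption based on the 4-byte cookie):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFid splits a weed-fs file id like "3,01637037d6" into its three parts:
// volume id (before the comma), file key, and a 4-byte cookie (last 8 hex chars).
// This layout follows the slide's sample key; treat it as illustrative.
func parseFid(fid string) (volumeId uint32, fileKey string, cookie uint32, err error) {
	parts := strings.SplitN(fid, ",", 2)
	if len(parts) != 2 || len(parts[1]) <= 8 {
		return 0, "", 0, fmt.Errorf("malformed fid: %q", fid)
	}
	vid, err := strconv.ParseUint(parts[0], 10, 32)
	if err != nil {
		return 0, "", 0, err
	}
	keyAndCookie := parts[1]
	fileKey = keyAndCookie[:len(keyAndCookie)-8]
	c, err := strconv.ParseUint(keyAndCookie[len(keyAndCookie)-8:], 16, 32)
	if err != nil {
		return 0, "", 0, err
	}
	return uint32(vid), fileKey, uint32(c), nil
}

func main() {
	vid, key, cookie, _ := parseFid("3,01637037d6")
	fmt.Printf("volume=%d key=%s cookie=%08x\n", vid, key, cookie)
}
```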
Volume Node • Keeps several volumes • Each volume keeps a map: Map<key, <offset, size>>
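A sketch of that per-volume map in Go; field names and sizes are illustrative:

```go
package main

import "fmt"

// needleValue records where a file lives inside a volume file on disk.
type needleValue struct {
	Offset uint64 // byte offset inside the volume file
	Size   uint32 // file size in bytes
}

// One in-memory map per volume: file key -> <offset, size>.
var needleMap = map[uint64]needleValue{}

func main() {
	// After writing a file, the volume server records its location...
	needleMap[0x01] = needleValue{Offset: 4096, Size: 13}
	// ...so a later read needs no disk access for the metadata.
	nv := needleMap[0x01]
	fmt.Printf("read %d bytes at offset %d\n", nv.Size, nv.Offset)
}
```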
Compared to HDFS
HDFS:
• Namenode stores all file metadata
• Namenode loss cannot be tolerated
WeedFS:
• MasterNode only stores volume locations
• MasterNode can be restarted fresh
• Easy to have multiple instances (TODO)
Serve Files Fast • Each volume server maintains a map<key, <offset, size>> for each of its volumes. • No disk read is needed for file metadata • The file can usually be read with one disk read, O(1) • Unless the file is already in the buffer cache, or • The file on disk is not in one contiguous block (use XFS to store it contiguously)
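A sketch of that O(1) read path: the in-memory map supplies <offset, size>, and a single positioned read pulls the file body. The volume file name and layout here are hypothetical, not the actual weed-fs on-disk format:

```go
package main

import (
	"fmt"
	"os"
)

// readNeedle performs the one-disk-read lookup: the caller already has
// <offset, size> from the in-memory map, so no metadata read is needed.
func readNeedle(volume *os.File, offset uint64, size uint32) ([]byte, error) {
	buf := make([]byte, size)
	// One positioned read, assuming the file body is stored contiguously.
	if _, err := volume.ReadAt(buf, int64(offset)); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	f, err := os.Open("volume_3.dat") // hypothetical volume file name
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer f.Close()

	data, err := readNeedle(f, 4096, 13)
	fmt.Println(len(data), err)
}
```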
Automatic Compression • Compresses data based on mime type • Transparent • Works with browsers that accept gzip encoding
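A sketch of how such transparent compression could look in Go, keying off the mime type and the client's Accept-Encoding header; the list of compressible types is illustrative, not the project's actual rules:

```go
package main

import (
	"compress/gzip"
	"net/http"
	"strings"
)

// compressible is a rough check: text-like content gzips well,
// already-compressed media does not.
func compressible(mimeType string) bool {
	return strings.HasPrefix(mimeType, "text/") ||
		mimeType == "application/json" ||
		mimeType == "application/javascript"
}

// serveFile writes data gzipped only when the mime type benefits from it
// and the client advertised gzip support, so browsers see it transparently.
func serveFile(w http.ResponseWriter, r *http.Request, data []byte, mimeType string) {
	w.Header().Set("Content-Type", mimeType)
	if compressible(mimeType) && strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
		w.Header().Set("Content-Encoding", "gzip")
		gz := gzip.NewWriter(w)
		defer gz.Close()
		gz.Write(data)
		return
	}
	w.Write(data)
}

func main() {
	http.HandleFunc("/demo.txt", func(w http.ResponseWriter, r *http.Request) {
		serveFile(w, r, []byte("hello, weed-fs"), "text/plain")
	})
	http.ListenAndServe(":8080", nil)
}
```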
Volume Replica Placement • Each volume can have several replicas. • Flexible volume placement control, with several volume placement policies. • When saving data, users can specify a replication factor or a desired replication policy
Flexible Replica Placement Policy • No replication. • 1 replica on the local rack • 1 replica in the local data center, but on a different rack • 1 replica in a different data center • 2 replicas: the first on the local rack on a random other server, the second in the local data center on a random other rack. • 2 replicas: the first on a random other rack in the same data center, the second in a different data center
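These policies can be summarized by counting extra copies per failure domain. The sketch below encodes each listed policy as a three-digit <other data center / other rack / same rack> tuple; this particular encoding is an assumption for illustration, not necessarily the project's own notation:

```go
package main

import "fmt"

// placement counts how many extra copies go to each failure domain.
type placement struct {
	DiffDataCenter int // replicas placed in a different data center
	DiffRack       int // replicas on a different rack in the same data center
	SameRack       int // replicas on another server in the same rack
}

func (p placement) String() string {
	return fmt.Sprintf("%d%d%d", p.DiffDataCenter, p.DiffRack, p.SameRack)
}

func main() {
	// One entry per policy listed on the slide above.
	policies := map[string]placement{
		"no replication":                       {0, 0, 0},
		"1 replica on local rack":              {0, 0, 1},
		"1 replica on other rack, same DC":     {0, 1, 0},
		"1 replica in a different data center": {1, 0, 0},
		"2 replicas: same rack + other rack":   {0, 1, 1},
		"2 replicas: other rack + other DC":    {1, 1, 0},
	}
	for name, p := range policies {
		fmt.Printf("%-40s -> %s\n", name, p)
	}
}
```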
Future work • Tools to manage the file system