Weed File System

Simple and highly scalable distributed file system (NoFS)

Project Objectives
Yes:
• Store billions of files!
• Serve the files fast!
Not:
• Namespaces
• POSIX compliant

Design Goals
• Separate volume metadata from file metadata.
• Each volume can optionally have several replicas.
• Flexible volume placement control, with several volume placement policies.
• When saving data, the user can specify a replication factor or a desired replication policy.

Challenges for common FS
• POSIX compliance costs space and is inefficient
• One folder cannot store too many files
• Need to generate a deep path
• Reading one file requires visiting the whole directory path, and each level may require one disk seek
• Moving deep directories across computers is slow

Challenges for HDFS
• Stores large files, not lots of small files
• Designed for streaming files, not on-demand random access
• Name node keeps all metadata, making it a single point of failure (SPOF) and a bottleneck

How are Weed-FS files stored?
• Files are stored in 32GB-sized volumes
• Each volume server has multiple volumes
• The master server tracks each volume's location and free space
• The master server generates unique keys and directs clients to a volume server to store the file
• Clients remember the fid.

Workflow

Master Node
• Generates unique keys
• Tracks volume status
• <volume id, <url, free size>>
• Maintained via heartbeat
• Can restart
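The master's bookkeeping can be sketched in a few lines of Python. This is only an illustration under assumptions, not the Weed-FS implementation: the names `Master`, `VolumeStatus`, `heartbeat`, and `assign` are hypothetical, and the idea that a heartbeat also reports the highest key a volume server has seen (so a freshly restarted master does not reuse keys) is an assumption the slides do not state.

```python
from dataclasses import dataclass


@dataclass
class VolumeStatus:
    url: str        # address of the volume server holding the volume
    free_size: int  # free bytes last reported for the volume


class Master:
    """Hypothetical master: holds only volume status, never file metadata."""

    def __init__(self):
        self.volumes = {}   # volume id -> VolumeStatus
        self.next_key = 1   # next unique file key to hand out

    def heartbeat(self, url, free_sizes, max_key=0):
        """Record the status a volume server reports about its volumes.

        max_key is an assumption of this sketch: the largest file key the
        server has stored, so a restarted master can skip past used keys.
        """
        for volume_id, free_size in free_sizes.items():
            self.volumes[volume_id] = VolumeStatus(url, free_size)
        self.next_key = max(self.next_key, max_key + 1)

    def assign(self, size):
        """Pick a volume with enough free space and generate a unique key."""
        for volume_id, status in self.volumes.items():
            if status.free_size >= size:
                key = self.next_key
                self.next_key += 1
                # free_size is refreshed by the next heartbeat, not here.
                return volume_id, key, status.url
        raise RuntimeError("no volume with enough free space")
```

Because all state is rebuilt from heartbeats, a new `Master()` becomes usable again as soon as the volume servers report in, which is one way to read the "Can restart" bullet.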

fid format
• Sample file key: 3,01637037d6
• Each key has 3 components:
• Volume ID = 3
• File key = 01
• File cookie = 637037d6 (4 bytes)
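The sample fid above can be split into its three components with a short parser. This is a sketch under assumptions drawn from the sample only: the volume ID is taken to be the decimal number before the comma, the cookie is taken to be the trailing 8 hex digits (4 bytes), and whatever hex digits precede the cookie are taken to be the file key. The names `Fid` and `parse_fid` are hypothetical.

```python
from typing import NamedTuple

COOKIE_HEX_DIGITS = 8  # the cookie is 4 bytes, i.e. 8 hex digits


class Fid(NamedTuple):
    volume_id: int
    file_key: int
    cookie: int


def parse_fid(fid: str) -> Fid:
    """Split a fid such as '3,01637037d6' into volume ID, key, and cookie."""
    volume_part, sep, rest = fid.partition(",")
    # Need a comma and at least one key digit in front of the cookie.
    if not sep or len(rest) <= COOKIE_HEX_DIGITS:
        raise ValueError(f"malformed fid: {fid!r}")
    key_hex = rest[:-COOKIE_HEX_DIGITS]
    cookie_hex = rest[-COOKIE_HEX_DIGITS:]
    return Fid(int(volume_part), int(key_hex, 16), int(cookie_hex, 16))


fid = parse_fid("3,01637037d6")
print(fid.volume_id, fid.file_key, format(fid.cookie, "08x"))
```

For the sample key this yields volume ID 3, file key 1, and cookie 637037d6, matching the breakdown on the slide.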

Volume Node
• Keeps several volumes
• Each volume keeps a map
• Map<key, <offset, size>>

File Entry in Volume

Compared to HDFS
HDFS:
• Namenode stores all file metadata
• Namenode loss cannot be tolerated
WeedFS:
• MasterNode only stores volume locations
• MasterNode can be restarted fresh
• Easy to have multiple instances (TODO)

Serve File Fast
• Each volume server maintains a map<key, <offset, size>> for each of its volumes.
• No disk read for file metadata
• Possibly read the file with one disk read, O(1)
• Unless the file is already in the buffer, or
• The file on disk is not in one contiguous block (use XFS to store it in one contiguous block)
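The in-memory map<key, <offset, size>> can be illustrated with a small append-only volume. This is a minimal sketch, not the Weed-FS on-disk format: the `Volume` class and its methods are hypothetical, and the real file entry presumably carries more than the raw bytes (the fid cookie, for instance), which is left out here.

```python
import os
import tempfile


class Volume:
    """Hypothetical volume: one append-only file plus an in-memory index."""

    def __init__(self, path):
        self.index = {}                # key -> (offset, size), kept in memory
        self.file = open(path, "a+b")  # append for writes, seekable for reads

    def write(self, key, data):
        """Append the data and remember where it landed."""
        self.file.seek(0, os.SEEK_END)
        offset = self.file.tell()
        self.file.write(data)
        self.file.flush()
        self.index[key] = (offset, len(data))

    def read(self, key):
        """Look up the location in memory, then do one seek and one read."""
        offset, size = self.index[key]
        self.file.seek(offset)
        return self.file.read(size)

    def close(self):
        self.file.close()


path = os.path.join(tempfile.mkdtemp(), "3.dat")
volume = Volume(path)
volume.write(1, b"hello")
volume.write(2, b"world!")
print(volume.read(2))
volume.close()
```

The lookup itself touches no disk, so the only I/O per request is the read of the file content.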

Automatic Compression
• Compresses the data based on MIME type
• Transparent
• Works with browsers that accept gzip encoding
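One way to picture MIME-based transparent compression is the sketch below. It uses Python's standard `gzip` module. The list of compressible types, the function names `store` and `serve`, and the simple substring check on the `Accept-Encoding` value are all assumptions for illustration; the slides do not say which types Weed-FS compresses or how it negotiates encoding.

```python
import gzip

# Assumed for this sketch: which MIME types are worth compressing.
COMPRESSIBLE_TYPES = {"application/json", "application/javascript", "application/xml"}


def is_compressible(mime_type):
    return mime_type.startswith("text/") or mime_type in COMPRESSIBLE_TYPES


def store(data, mime_type):
    """Return (bytes to keep on disk, whether they are gzipped)."""
    if is_compressible(mime_type):
        return gzip.compress(data), True
    return data, False


def serve(body, is_gzipped, accept_encoding):
    """Return (response body, extra headers) for one client request."""
    if is_gzipped and "gzip" in accept_encoding.lower():
        # Client accepts gzip: send the stored bytes as they are.
        return body, {"Content-Encoding": "gzip"}
    if is_gzipped:
        # Client does not accept gzip: decompress before sending.
        return gzip.decompress(body), {}
    return body, {}


stored, gzipped = store(b"hello " * 200, "text/plain")
body, headers = serve(stored, gzipped, "gzip, deflate")
print(len(stored), headers)
```

The client never has to know whether the stored copy is compressed, which is what makes the feature transparent.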

Volume Replica Placement
• Each volume can have several replicas.
• Flexible volume placement control, with several volume placement policies.
• When saving data, the user can specify a replication factor or a desired replication policy.

Flexible Replica Placement Policy
• No replication
• 1 replica on the local rack
• 1 replica in the local data center, but on a different rack
• 1 replica in a different data center
• 2 replicas: the first on a random other server in the local rack, the second on a random other rack in the local data center
• 2 replicas: the first on a random other rack in the same data center, the second in a different data center
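The six policies above can be modeled as lists of constraints relative to the server holding the primary copy. This is a hedged sketch: the `Server` type, the policy names, and `place_replicas` are hypothetical and are not Weed-FS identifiers or configuration values.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    rack: str
    data_center: str


def same_rack(primary, s):
    return s != primary and s.data_center == primary.data_center and s.rack == primary.rack


def other_rack(primary, s):
    return s.data_center == primary.data_center and s.rack != primary.rack


def other_dc(primary, s):
    return s.data_center != primary.data_center


# Hypothetical policy names, one per bullet on the slide, in the same order.
POLICIES = {
    "none": [],
    "same_rack": [same_rack],
    "other_rack": [other_rack],
    "other_dc": [other_dc],
    "same_rack_and_other_rack": [same_rack, other_rack],
    "other_rack_and_other_dc": [other_rack, other_dc],
}


def place_replicas(primary, servers, policy, rng=random):
    """Pick one random server per constraint of the chosen policy."""
    chosen = []
    for constraint in POLICIES[policy]:
        candidates = [s for s in servers if constraint(primary, s) and s not in chosen]
        if not candidates:
            raise RuntimeError(f"no server satisfies policy {policy!r}")
        chosen.append(rng.choice(candidates))
    return chosen
```

A caller would pass the primary's server, the list of known servers, and a policy name, and get back the servers that should hold the extra replicas.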

Future work
• Tools to manage the file system
