Self Stabilizing Distributed File System

Department of Computer Science, Ben-Gurion University. A BGU-IBM joint project.

DFS Motivation • Performance. • Fault tolerance: any server can take responsibility for any role. • Place files closer to users (local file access).

What Is Self-stabilizing? A self-stabilizing system is a system that can automatically recover following the occurrence of (transient) faults. The idea is to design a system that can be started in an arbitrary state and still converge to a desired behavior. (Source: Self-Stabilization, S. Dolev.)

Self-Stabilization Motivation • The combination and type of faults cannot be totally anticipated in ongoing systems. • Any ongoing system must be self-stabilizing (or manually monitored). • A self-stabilizing algorithm can recover from any arbitrary state reached due to the occurrence of faults.

Design • File system replication servers are coordinated using a spanning tree. • The tree is constructed by a self-stabilizing update algorithm using multicast messages. • Updates are propagated using a self-stabilizing β-synchronizer.

Design (Cont.) • Clients join the replication tree and form a caching tree. • File leases are used to provide cache consistency.

Replication Tree • Using a layered self-stabilizing algorithm, we construct a single spanning tree consisting of the file system servers.

Leader Election • A single leader coordinates the construction of the spanning tree. • If no leader exists, a server becomes a leader. • If more than one leader exists, the server with the minimal ID survives. • Messages are sent periodically using global multicast (or broadcast).

Leader Election Algorithm • Every T1 do: • If (p = leader) then send-multicast(‘I’m a leader’) • Leader-exists = true • Every T1+Td do: • If (not leader-exists) then leader = p • Leader-exists = false • Upon arrival of message do: • If (p.volume=volume) then • If (p=leader) then leader = min(leader,sender) • Else leader = sender • Leader-exists = true
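
The rules above can be tried out in a small simulation. The following is a minimal sketch, not the project's implementation: the `Server` class, the `run_round` helper, and the synchronous round model (all heartbeats, then delivery, then all timeouts) are assumptions made for illustration, and the nesting of the pseudocode's statements is one plausible reading of the slide. The two timers T1 and T1+Td are collapsed into one round.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """One server running the leader election rules (hypothetical class)."""
    pid: int                     # this server's ID ("p" in the pseudocode)
    volume: str                  # volume this server replicates
    leader: int = -1             # current belief; may start arbitrary
    leader_exists: bool = False  # set when an announcement was seen

    def heartbeat(self):
        """Every T1: a server that believes it is the leader announces itself."""
        if self.leader == self.pid:
            self.leader_exists = True
            return (self.pid, self.volume)
        return None

    def timeout(self):
        """Every T1 + Td: take over if no leader announced itself, then reset."""
        if not self.leader_exists:
            self.leader = self.pid
        self.leader_exists = False

    def receive(self, sender, volume):
        """Upon arrival of an announcement from `sender`."""
        if volume != self.volume:
            return  # announcements for other volumes are ignored
        if self.leader == self.pid:
            # Two leaders met: the one with the minimal ID survives.
            self.leader = min(self.leader, sender)
        else:
            self.leader = sender
        self.leader_exists = True


def run_round(servers):
    """Assumed synchronous round: heartbeats, delivery to the others, timeouts."""
    messages = []
    for s in servers:
        msg = s.heartbeat()
        if msg is not None:
            messages.append(msg)
    for sender, volume in messages:
        for s in servers:
            if s.pid != sender:
                s.receive(sender, volume)
    for s in servers:
        s.timeout()
```

Starting three servers with arbitrary `leader` values and calling `run_round` three times leaves them agreeing on one leader in the cases tried for this sketch. Note that the surviving leader is the minimal ID among servers that claimed leadership, which is not necessarily the minimal ID in the system.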

Spanning Tree Construction • A network version of the self-stabilizing update algorithm. • Multicast messages with a limited (local-radius) TTL. • Define the neighboring relation for the update algorithm. • Keep the communication graph connected.

Induced Graph Example

Update Algorithm • Collect routing tables from all neighbors in the induced graph. • Build a distributed BFS spanning tree from the tables. • Select a manager (local leader) for the tree: the server with the minimal ID.
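
To make the outcome of these steps concrete, here is a centralized sketch of the result the update algorithm is meant to reach: a BFS spanning tree per connected component, with the minimal-ID server as manager. It is not the distributed, self-stabilizing algorithm itself. The function names, the adjacency-dictionary input, and the choice to root the tree at the manager are assumptions for illustration.

```python
from collections import deque


def bfs_tree(neighbors, root):
    """Return {node: parent} for a BFS spanning tree of root's component.

    `neighbors` maps each server ID to the set of its neighbors in the
    induced graph (a hypothetical stand-in for the collected routing tables).
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors[node]):  # sorted for a deterministic tree
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent


def build_component_tree(neighbors, start):
    """Find start's component, pick the minimal ID as manager, root the tree there."""
    component = bfs_tree(neighbors, start)   # keys are the reachable servers
    manager = min(component)
    return manager, bfs_tree(neighbors, manager)
```

For example, with `neighbors = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}`, calling `build_component_tree(neighbors, 4)` selects server 1 as manager and returns a tree in which 2 and 3 are children of 1 and 4 is a child of 2.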

Tree Optimization • The update algorithm creates connected components of the communication graph induced by the radius. • Goal: find the minimal radius that keeps connectivity. • Increase the radius by a factor of 2 until a single component spans the system. • Run a second instance of the update algorithm with a smaller radius and compare outputs; if they are the same, decrease the radius. • Search for the minimal radius using binary search.
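
The doubling-then-binary-search idea can be sketched as follows. This is an illustration under stated assumptions, not the project's code: connectivity is checked by a global predicate instead of by comparing the outputs of two update instances, `hops` is a hypothetical table of pairwise hop counts, and `max_radius=255` is an assumed upper bound. The search relies on connectivity being monotone in the radius, which holds here because a larger radius only adds edges.

```python
def connected_at(hops, radius):
    """True if the graph with an edge for every pair with hops <= radius is connected."""
    nodes = {n for pair in hops for n in pair}
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for (a, b), h in hops.items():
            if h <= radius and node in (a, b):
                other = b if node == a else a
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen == nodes


def minimal_radius(is_connected, max_radius=255):
    """Smallest radius in [1, max_radius] for which is_connected holds, else None."""
    lo, hi = 0, 1  # lo is always a radius known to be too small (0 reaches nobody)
    # Phase 1: double the radius until a single component spans the system.
    while not is_connected(hi):
        if hi >= max_radius:
            return None
        lo, hi = hi, min(hi * 2, max_radius)
    # Phase 2: binary search between the last failing and first working radius.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_connected(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

A call such as `minimal_radius(lambda r: connected_at(hops, r))` ties the two helpers together. Doubling finds a working radius quickly, and the binary search then narrows it down with a logarithmic number of further connectivity checks.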

Tree Structure

Replication Consistency • A self-stabilizing β-synchronizer verifies that the signatures of accessed files are identical in all servers. • If more than a single signature exists, there is a conflict. • The leader decides (using a user-defined algorithm) on the correct file content and notifies the servers.
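
A minimal sketch of the signature comparison, leaving out the synchronizer and all messaging. The slides do not say how signatures are computed, so SHA-256 is an assumption here, as are the function names. `majority` is only one hypothetical example of the user-defined resolution algorithm; its tie-break is arbitrary.

```python
import hashlib
from collections import Counter


def signature(content: bytes) -> str:
    """File signature; SHA-256 is an assumed choice, not taken from the slides."""
    return hashlib.sha256(content).hexdigest()


def check_file(replicas, resolve):
    """Compare one file's signatures across servers.

    `replicas` maps server ID -> file content (bytes).
    Returns (conflict, content), where content is the agreed or resolved version.
    """
    signatures = {signature(content) for content in replicas.values()}
    if len(signatures) <= 1:
        # All servers agree (or there are no replicas at all).
        return False, next(iter(replicas.values()), None)
    # More than one signature exists: the leader applies the resolution policy.
    return True, resolve(replicas)


def majority(replicas):
    """Example policy: most common content wins; ties go to the larger byte string."""
    counts = Counter(replicas.values())
    return max(counts, key=lambda content: (counts[content], content))
```

For instance, `check_file({1: b"v2", 2: b"v2", 3: b"v1"}, majority)` reports a conflict and selects `b"v2"`, which the leader would then announce to the servers.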

Caching Tree • Clients extend the replication tree to a caching tree. • The same update algorithm constructs both the replication and caching trees (minor modifications are required).

Cache Tree Diagram

File Access • A read request is sent to the tree parent (either a server or a cache). • A write request travels to the replication tree root (the leader) and is propagated by the β-synchronizer. • Caching consistency depends on the propagation mechanism.
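
The read and write paths can be illustrated with an in-memory tree. This is a sketch under several assumptions that do not come from the slides: the `Node` class is hypothetical, a plain recursive push stands in for the β-synchronizer, a read that misses locally is forwarded upward and cached on the way back, and a cache is refreshed on write only if it already holds a copy. Leases are not modeled.

```python
class Node:
    """A server or cache in the combined replication/caching tree (hypothetical)."""

    def __init__(self, name, parent=None, is_server=False):
        self.name = name
        self.parent = parent
        self.is_server = is_server
        self.children = []
        self.files = {}  # path -> content held at this node
        if parent is not None:
            parent.children.append(self)

    def read(self, path):
        """Serve from the local copy, otherwise ask the tree parent."""
        if path in self.files:
            return self.files[path]
        if self.parent is None:
            raise FileNotFoundError(path)
        data = self.parent.read(path)
        self.files[path] = data  # keep a copy for later reads
        return data

    def write(self, path, data):
        """Send the write up to the root (leader), which pushes it down the tree."""
        node = self
        while node.parent is not None:
            node = node.parent
        node._propagate(path, data)

    def _propagate(self, path, data):
        # Servers always store the update; caches only refresh existing copies.
        if self.is_server or path in self.files:
            self.files[path] = data
        for child in self.children:
            child._propagate(path, data)
```

In this model a client cache that has never read a file holds nothing after a write, fetches the content from its parent on first read, and sees later writes because the root's push refreshes its copy.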

Read/Write Example

Linux Based bguFS (1) [Diagram; labels: SyncDaemon (cache manager and server), Application, Network Communication, User Level, Kernel Level, Upcalls, bguFS Module, Cache: valid data?, VFS, Local file system, Updates, Kernel update]

Linux Based bguFS (2) [Diagram; labels: SyncDaemon (cache manager and server), Application, Network Communication, User Level, Linux libc library, Upcalls, Library File Commands] • New implementation of the C library file functions: fopen, fclose, fread, fwrite, etc.

Tasks • Leader election and a radius based spanning tree. • Optimal radius (binary) search and beta-synchronizer. • Distributed file R/W (operations) implementation. • Kernel VFS module (1). • C library “hacking” solution (2).
