Introduction to DFS

Distributed File Systems • A file system whose clients, servers and storage devices are dispersed among the machines of a distributed system • File system operations have to be carried out over the network • A good DFS should ensure transparency • Clients should have the look and feel of a conventional file system

Naming and Transparency • Mapping between the logical and physical objects • Location transparency – The name gives no hint of the file's physical storage location • Location independence – Name and physical storage are independent • Name need not be changed if the physical location is changed • Location independent files are essentially logical data containers • Location transparency only hides the association between names and physical storage

Naming Schemes • Combination of host name and local name • Local name is a path similar to Unix • Neither transparent nor independent • Attaching remote directories to the local directory • Popularized by Sun’s NFS • Appears as a coherent directory tree • Globally unique names • Truly transparent • Global naming structure spans all names • Difficult to achieve due to special files
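As a rough illustration of the second scheme (attaching remote directories to the local tree), the sketch below resolves a local path through a mount table. The table contents, server names, and the `resolve` function are hypothetical and are not taken from NFS itself.

```python
# Hypothetical mount table: local mount point -> (server, exported directory).
MOUNTS = {
    "/home": ("fileserver1", "/export/home"),
    "/usr/share": ("fileserver2", "/export/share"),
}

def resolve(path):
    """Return (server, remote_path) for a mounted path, or (None, path) if local."""
    best = None
    for mount_point in MOUNTS:
        # A mount point matches only on a whole path component.
        if path == mount_point or path.startswith(mount_point + "/"):
            # Prefer the longest matching mount point.
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return None, path
    server, export = MOUNTS[best]
    return server, export + path[len(best):]
```

The user sees one coherent directory tree, but each client keeps its own mount table, so the same file may carry different names on different machines.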

Implementing Naming Schemes • Transparent naming requires mapping between names and their associated locations • Aggregating files into components for scalability and manageability • Hierarchical directory trees • Replication and caching • Maintaining consistency of cached view • Location independent file identifiers
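One way to picture location independent file identifiers is a two-level mapping: names map to identifiers, and identifiers map to the machines holding replicas. The following is a minimal sketch under that assumption; the identifier format and server names are made up for illustration.

```python
# Level 1: textual name -> location-independent file identifier (hypothetical ids).
name_to_id = {"/projects/report.txt": "fid-17"}

# Level 2: file identifier -> machines currently holding a replica.
id_to_locations = {"fid-17": ["serverA", "serverB"]}

def locate(name):
    """Resolve a name to the list of machines that store the file."""
    return id_to_locations[name_to_id[name]]

def migrate(file_id, old, new):
    """Move one replica; only the second-level table changes."""
    locations = id_to_locations[file_id]
    locations[locations.index(old)] = new
```

Because migration touches only the second table, the name seen by clients stays the same when the file moves.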

Accessing Remote Files • Needs network data transfer • Remote service mechanism • Remote procedure call • Caching for improved performance

Caching • Idea is fetch once, use multiple times • If requested data is not available, get it from server • Store fetched data • Perform access on local data • Replace data when cache becomes full • One master copy at the server, several secondary copies at clients • Granularity – File blocks to entire file
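The fetch-once, use-many-times idea can be sketched as a small client-side block cache. The slide does not say which replacement policy is used when the cache is full; least-recently-used (LRU) is assumed here, and `BlockCache` and its `fetch` callback (a stand-in for the server) are hypothetical names.

```python
from collections import OrderedDict

class BlockCache:
    """Client-side cache of file blocks with LRU replacement (an assumption)."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch            # function (file, block_no) -> bytes; stands in for the server
        self.capacity = capacity      # maximum number of cached blocks
        self.blocks = OrderedDict()   # (file, block_no) -> data, oldest first
        self.misses = 0

    def read(self, file, block_no):
        key = (file, block_no)
        if key in self.blocks:
            # Hit: serve the local copy and mark it as recently used.
            self.blocks.move_to_end(key)
            return self.blocks[key]
        # Miss: get the block from the server and store it.
        self.misses += 1
        data = self.fetch(file, block_no)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            # Cache is full: drop the least recently used block.
            self.blocks.popitem(last=False)
        return data
```

The cache key is a single block here; caching whole files instead would only change the key and the unit passed to `fetch`.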

Cache Location • Main memory • Workstations can be diskless • Faster access • Technology trend: memory accesses are becoming faster • Server caches will be in main memory – code reusability • Local disks • Reliability via persistence • Hybrid schemes • Best of both worlds

Cache Update Policy • Policy regarding when the modified data is reflected on the master copy • Can have significant impact on the performance • Write through policy • All writes are reflected immediately on the master copy • Blocking • Delayed writes • Write on flush • Periodic writes • Write on close
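The difference between write-through and one delayed-write variant (write on close) can be sketched by counting how many times the master copy is updated. All class names below are hypothetical, and the server is an in-process stand-in rather than a real network service.

```python
class Server:
    """Holds the master copies and counts updates sent by clients."""
    def __init__(self):
        self.master = {}
        self.writes = 0

    def store(self, name, data):
        self.master[name] = data
        self.writes += 1

class WriteThroughFile:
    """Every write is reflected immediately on the master copy."""
    def __init__(self, server, name):
        self.server, self.name, self.data = server, name, ""

    def write(self, text):
        self.data += text
        self.server.store(self.name, self.data)  # caller waits for the server each time

    def close(self):
        pass  # nothing pending

class WriteOnCloseFile:
    """Writes stay in the client cache until the file is closed."""
    def __init__(self, server, name):
        self.server, self.name, self.data = server, name, ""
        self.dirty = False

    def write(self, text):
        self.data += text
        self.dirty = True  # remember that the master copy is stale

    def close(self):
        if self.dirty:
            self.server.store(self.name, self.data)
            self.dirty = False
```

Write on close sends one update instead of one per write, at the price of losing unflushed data if the client crashes before closing the file.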

Ensuring consistency • Ensuring that data being read is consistent with master copy • Client initiated approach • Client validates with the server whether its data is up-to-date • Frequency of validation is the main issue • Check on first access • Check on every access • Periodic checking
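A client-initiated check can be sketched with version numbers, assuming the server bumps a per-file version on every write. The version scheme and the names `VersionedServer` and `ValidatingClient` are illustrative assumptions; the variant shown is check on every access.

```python
class VersionedServer:
    """Master copies, each tagged with a version number (an assumed scheme)."""
    def __init__(self):
        self.files = {}  # name -> (version, data)

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def version(self, name):
        return self.files[name][0]   # cheap validation request

    def fetch(self, name):
        return self.files[name]      # full transfer: (version, data)

class ValidatingClient:
    """Validates its cached copy with the server on every access."""
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> (version, data)

    def read(self, name):
        cached = self.cache.get(name)
        if cached is None or cached[0] != self.server.version(name):
            # Missing or stale: refetch the data from the master copy.
            cached = self.server.fetch(name)
            self.cache[name] = cached
        return cached[1]
```

Checking on every access keeps reads fresh but costs one round trip per read; checking only on first access or periodically trades freshness for fewer messages.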

Server Initiated Approaches • Server records the files each client is accessing • Detects potential inconsistency and notifies clients • Conflicts occur when at least two clients cache the same file and one of them is writing • Invalidation/Update based mechanisms • Session semantics • Consistency enforced upon file closing • Unix semantics • Consistency enforced upon write
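An invalidation-based, server-initiated scheme might look like the sketch below: the server records which clients cache each file and notifies the others when one of them writes. The class and method names are hypothetical, and consistency is enforced on every write, closer to Unix semantics than to session semantics.

```python
class CallbackServer:
    """Tracks which clients cache each file and invalidates stale copies."""
    def __init__(self):
        self.data = {}
        self.cachers = {}  # name -> set of clients holding a cached copy

    def fetch(self, client, name):
        self.cachers.setdefault(name, set()).add(client)
        return self.data.get(name)

    def write(self, writer, name, value):
        self.data[name] = value
        # Notify every other client that its cached copy is now stale.
        for client in self.cachers.get(name, set()) - {writer}:
            client.invalidate(name)
        self.cachers[name] = {writer}

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def write(self, name, value):
        self.cache[name] = value
        self.server.write(self, name, value)

    def invalidate(self, name):
        self.cache.pop(name, None)  # next read goes back to the server
```

An update-based variant would push the new value to the other clients instead of dropping their copies.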

Why or Why not Caching • Locality of accesses • Gains in performance and scalability • Big chunks of data lead to lower overhead • Disk accesses can be optimized for larger chunks of data • Consistency maintenance is the cost • Memory/disk space requirements at clients

Stateful vs. Stateless Servers • Stateful servers maintain information about files being accessed by clients • Clients are given connection IDs, which act as indexes into the server's inode table • Performance gains – Prefetching file blocks • Stateless servers maintain no state • Each request is self-contained • Reliability is the trade-off: a crash wipes out a stateful server's state, while a stateless server simply restarts
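The contrast can be sketched with two toy read interfaces. In the stateful version the server remembers the file and offset behind a connection ID; in the stateless version every request carries the name, offset, and count. The `FILES` table and both classes are hypothetical stand-ins, not a real protocol.

```python
FILES = {"notes.txt": b"distributed file systems"}

class StatefulServer:
    """Keeps per-connection state: which file is open and the current offset."""
    def __init__(self):
        self.open_files = {}  # connection id -> [name, offset]
        self.next_id = 0

    def open(self, name):
        self.next_id += 1
        self.open_files[self.next_id] = [name, 0]
        return self.next_id

    def read(self, conn_id, count):
        name, offset = self.open_files[conn_id]
        data = FILES[name][offset:offset + count]
        self.open_files[conn_id][1] = offset + len(data)  # advance the stored offset
        return data

class StatelessServer:
    """Keeps nothing between requests; each request is self-contained."""
    def read(self, name, offset, count):
        return FILES[name][offset:offset + count]
```

If the stateful server restarts, its `open_files` table is empty and old connection IDs no longer work, whereas a client of the stateless server can repeat the same request unchanged.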
