Synchronizing Lustre file systems Dénes Németh (nemeth.denes@iit.bme.hu) Balázs Fülöp (fulop.balazs@ik.bme.hu) Dr. János Török (torok@ik.bme.hu) Dr. Imre Szeberényi (szebi@iit.bme.hu)
The current state of the art • Partially solved • Conventional local file systems • Off-line operation (rsync) • Problems • Must walk through the entire directory structure • Must know in advance what will change (inotify) • Does not work on distributed file systems • Scalability problems
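The scalability problem above can be made concrete with a toy model. This is a minimal sketch (not rsync's actual algorithm): an offline scan must `stat()` every file to find changes, so its cost grows with the total file count rather than with the number of changes. The helper name `changed_files` and the mtime-based check are illustrative assumptions.

```python
import os

def changed_files(root, since):
    """Naive offline-sync scan: walk the whole tree and stat() every
    file, keeping those modified after `since`. Cost is O(total files),
    even if nothing changed -- the core weakness of offline operation."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                changed.append(path)
    return changed
```

Even with zero changes, the walk still touches every inode, which is why this approach does not scale to a distributed file system with thousands of active clients.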
The environment - Lustre • Distributed • Stripes (parts of a file) on separate hosts • ~100-1000 clients (reading/writing) • Redundant • File system and file metadata • Fault tolerance • Transaction-driven operations • Rollback capability
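Striping can be illustrated with a small model of Lustre's round-robin layout. This is a sketch, not Lustre's API: the 1 MiB stripe size is a common default, and the stripe count of 4 is an assumed per-file setting.

```python
STRIPE_SIZE = 1 << 20   # 1 MiB -- a common Lustre default (assumed here)
STRIPE_COUNT = 4        # number of OSTs this file is striped over (assumed)

def stripe_index(offset):
    """Which stripe object (and thus which OST) holds the byte at
    `offset`. Round-robin layout: stripe 0 gets bytes [0, 1 MiB),
    stripe 1 the next MiB, and so on, wrapping after STRIPE_COUNT."""
    return (offset // STRIPE_SIZE) % STRIPE_COUNT
```

Because consecutive megabytes land on different hosts, a single large file can be read and written by many clients in parallel, which is exactly what makes synchronization by whole-file comparison impractical.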
Lustre – synchronization • Distributed • Hosts: absolute event sequencing • Is the time accurate enough? • Clients: extreme efficiency • Redundant – fault tolerance • Pulling the plug during synchronization • Moving, tracking events • Rollback: synchronize to transactions
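The question "is the time accurate enough?" is why sequencing cannot rely on wall clocks alone: two hosts' clocks can disagree by more than the interval between two metadata events. A standard remedy is a Lamport-style logical clock, sketched below as an assumed model of a per-host local sequencer (not the actual Lustre implementation).

```python
class LocalSequencer:
    """Per-host logical clock (Lamport-style). Instead of trusting
    wall-clock time, each event gets a counter, and the counter is
    advanced past any counter value seen from another host, so the
    resulting order respects causality across hosts."""
    def __init__(self):
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def receive(self, remote_clock):
        # Jump past the sender's clock so our next events sort after it.
        self.clock = max(self.clock, remote_clock) + 1
        return self.clock
```

Usage: if host A emits an event, and host B learns of it before emitting its own, B's event is guaranteed a larger sequence number regardless of clock skew.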
The basic Lustre concept [Diagram: an „inode” spans the Lustre server side – Metadata Server (with failover) and ~100-1000 Object Storage Targets – and the Lustre client side.]
Events – moving the information (metadata) [Diagram: on the Metadata Server (kernel space), Lustre Metadata Access feeds a Local Event Sequencer and an Event Reporter; events flow through the Event Multiplexer and Global Event Sequencer to the Event Processor; Object Storage Targets serve ~100-1000 clients.]
Events – how to move the information • Global Event Sequencer – big difficulties • Sequencing = accurate timing • Network delay • Delay from FS overload • Connection to all MDSs • Can be a bottleneck • Event Multiplexer – just multiplexing events • No problems • No authorization, registration (fixed configuration) • Minimal network usage • Usually not a bottleneck • ER & EM can be deployed together or separately • Event Reporter – asynchronous notification • System calls: select (timeout), read/write (blocking) • Max 100,000 events/sec • Relatively complicated access • Proc file system – easy access from user space • Notifications through signals • Possibility for multiple reporters [Diagram: Metadata Server exposing proc file system and block device interfaces; Local Event Sequencer, Event Reporter, Event Multiplexer, Global Event Sequencer, and Event Processor connected over TCP/IP networks.]
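The multiplexer's job above ("just multiplexing events, no problems") can be sketched as a merge of already-sequenced per-MDS event streams into one global order. This is an illustrative model, not the Lustre wire protocol: events are assumed to be `(seq_no, host, op)` tuples, with ties on the sequence number broken deterministically by host id.

```python
import heapq

def multiplex(streams):
    """Event Multiplexer sketch: merge per-host event streams, each
    already sorted by its local sequence number, into one globally
    ordered stream. heapq.merge keeps the merge lazy and O(log n)
    per event in the number of streams."""
    return list(heapq.merge(*streams))
```

Because each input stream is already ordered by its local sequencer, the multiplexer itself does no sequencing work, which is why it is "usually not a bottleneck".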
Accurate sequencing [Chart: output grows linearly with the number of local sequencers.]
Average sequence performance [Chart] • Server has enough threads: performance OK, constant QoS • Server needs more threads: performance drops – why? • ~5000 events/thread • „Graceful degradation”: linear drop in performance
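The saturation point in the measurements above can be captured with a back-of-the-envelope capacity model. Only the ~5000 events/s per-thread figure comes from the slide; the function and its parameters are illustrative assumptions. The model only shows the plateau at saturation; the measured linear *drop* beyond it (from contention and queueing) is not modeled here.

```python
def expected_throughput(clients, client_rate, threads, per_thread_cap=5000):
    """Simplified capacity model: the offered event load is served in
    full until the sequencer's thread pool saturates, after which
    throughput is capped at threads * per_thread_cap events/sec."""
    offered = clients * client_rate      # total events/sec generated
    capacity = threads * per_thread_cap  # max events/sec the server handles
    return min(offered, capacity)
```

For example, 100 clients at 1000 events/s each offer 100,000 events/s, but a 4-thread sequencer caps out at about 20,000 events/s, matching the regime where the slide reports degraded performance.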
Resource usage on the global sequencer [Chart: at most 2 ms in each second, i.e. ~0.]
How to commit the changes [Diagram: three synchronized file systems (SFS 1, SFS 2, SFS 3), each with an MDS and OST; Event Reporters feed Event Multiplexers, which feed Event Processors and Committer Clients; events „3” and „4”, touching files A and B, arrive out of order.] How to execute event „3” if event „4” has already happened? Unfortunately, there is no really good solution.
Event sequence error resolution • Ostrich policy • Drop all events with a conflicting sequence • Conflict detection • Is the event applicable? • In design stage … • Replaying the already committed events • Currently lacks Lustre support
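The "is the event applicable?" check above can be sketched with a toy model. This is an assumed illustration, not the deck's actual design: the target file system is modeled as a set of paths, and an out-of-order event is dropped (ostrich-style) when its precondition no longer holds.

```python
def apply_if_possible(fs_state, event):
    """Conflict-detection sketch: before committing an event to the
    target file system, check that it is still applicable; if a
    conflicting event already won (e.g. the path exists / is gone),
    drop it and return False. `fs_state` is a set of existing paths."""
    op, path = event
    if op == "create":
        if path in fs_state:       # already created by a conflicting event
            return False
        fs_state.add(path)
    elif op == "unlink":
        if path not in fs_state:   # nothing left to remove: drop the event
            return False
        fs_state.remove(path)
    return True
```

This resolves the earlier „3”-after-„4” example: the late event simply fails its applicability check and is discarded, at the cost of silently losing it, which is why replaying committed events would be the better (but currently unsupported) option.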
Questions? Thank you for your attention!