Synchronizing Lustre file systems Dénes Németh (nemeth.denes@iit.bme.hu) Balázs Fülöp (fulop.balazs@ik.bme.hu) Dr. János Török (torok@ik.bme.hu) Dr. Imre Szeberényi (szebi@iit.bme.hu)
The current state of the art • Partially solved • Conventional local file systems • Off-line operation (rsync) • Problems • Must walk through the entire directory structure • Must know in advance what will change (inotify) • Does not work on distributed file systems • Scalability problems
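The scalability problem above can be made concrete with a toy model. This is a minimal sketch (not rsync's actual algorithm): an offline scan must `stat()` every file to find changes, so its cost grows with the total file count rather than with the number of changes. The helper name `changed_files` and the mtime-based check are illustrative assumptions.

```python
import os

def changed_files(root, since):
    """Naive offline-sync scan: walk the whole tree and stat() every
    file, keeping those modified after `since`. Cost is O(total files),
    even if nothing changed -- the core weakness of offline operation."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                changed.append(path)
    return changed
```

Even with zero changes, the walk still touches every inode, which is why this approach does not scale to a distributed file system with thousands of active clients.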
The environment - Lustre • Distributed • Stripes (parts of a file) on separate hosts • ~100-1000 clients (reading/writing) • Redundant • File system and file metadata • Fault tolerance • Transaction-driven operations • Rollback capability
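Striping can be illustrated with a small model of Lustre's round-robin layout. This is a sketch, not Lustre's API: the 1 MiB stripe size is a common default, and the stripe count of 4 is an assumed per-file setting.

```python
STRIPE_SIZE = 1 << 20   # 1 MiB -- a common Lustre default (assumed here)
STRIPE_COUNT = 4        # number of OSTs this file is striped over (assumed)

def stripe_index(offset):
    """Which stripe object (and thus which OST) holds the byte at
    `offset`. Round-robin layout: stripe 0 gets bytes [0, 1 MiB),
    stripe 1 the next MiB, and so on, wrapping after STRIPE_COUNT."""
    return (offset // STRIPE_SIZE) % STRIPE_COUNT
```

Because consecutive megabytes land on different hosts, a single large file can be read and written by many clients in parallel, which is exactly what makes synchronization by whole-file comparison impractical.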
Lustre – synchronization • Distributed • Hosts: absolute event sequencing • Is the time accurate enough? • Clients: extreme efficiency • Redundant – fault tolerance • Pulling the plug during synchronization • Moving, tracking events • Rollback: synchronize to transactions
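The question "is the time accurate enough?" is why sequencing cannot rely on wall clocks alone: two hosts' clocks can disagree by more than the interval between two metadata events. A standard remedy is a Lamport-style logical clock, sketched below as an assumed model of a per-host local sequencer (not the actual Lustre implementation).

```python
class LocalSequencer:
    """Per-host logical clock (Lamport-style). Instead of trusting
    wall-clock time, each event gets a counter, and the counter is
    advanced past any counter value seen from another host, so the
    resulting order respects causality across hosts."""
    def __init__(self):
        self.clock = 0

    def local_event(self):
        self.clock += 1
        return self.clock

    def receive(self, remote_clock):
        # Jump past the sender's clock so our next events sort after it.
        self.clock = max(self.clock, remote_clock) + 1
        return self.clock
```

Usage: if host A emits an event, and host B learns of it before emitting its own, B's event is guaranteed a larger sequence number regardless of clock skew.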
The basic Lustre concept [Diagram: an „inode” spans the Lustre server side – Metadata Server (with failover) and ~100-1000 Object Storage Targets – and the Lustre client side.]
Events – moving the information (metadata) [Diagram: on the Metadata Server (kernel space), Lustre Metadata Access feeds a Local Event Sequencer and an Event Reporter; events flow through the Event Multiplexer and Global Event Sequencer to the Event Processor; Object Storage Targets serve ~100-1000 clients.]
Events – how to move the information • Global Event Sequencer – big difficulties • Sequencing = accurate timing • Network delay • Delay from FS overload • Connection to all MDSs • Can be a bottleneck • Event Multiplexer – just multiplexing events • No problems • No authorization, registration (fixed configuration) • Minimal network usage • Usually not a bottleneck • ER & EM can be deployed together or separately • Event Reporter – asynchronous notification • System calls: select (timeout), read/write (blocking) • Max 100,000 events/sec • Relatively complicated access • Proc file system – easy access from user space • Notifications through signals • Possibility for multiple reporters [Diagram: Metadata Server exposing proc file system and block device interfaces; Local Event Sequencer, Event Reporter, Event Multiplexer, Global Event Sequencer, and Event Processor connected over TCP/IP networks.]
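The multiplexer's job above ("just multiplexing events, no problems") can be sketched as a merge of already-sequenced per-MDS event streams into one global order. This is an illustrative model, not the Lustre wire protocol: events are assumed to be `(seq_no, host, op)` tuples, with ties on the sequence number broken deterministically by host id.

```python
import heapq

def multiplex(streams):
    """Event Multiplexer sketch: merge per-host event streams, each
    already sorted by its local sequence number, into one globally
    ordered stream. heapq.merge keeps the merge lazy and O(log n)
    per event in the number of streams."""
    return list(heapq.merge(*streams))
```

Because each input stream is already ordered by its local sequencer, the multiplexer itself does no sequencing work, which is why it is "usually not a bottleneck".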
Accurate sequencing [Chart: output grows linearly with the number of local sequencers.]
Average sequence performance [Chart] • Server has enough threads: performance OK, constant QoS • Server needs more threads: performance drops – why? • ~5000 events/thread • „Graceful degradation”: linear drop in performance
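The saturation point in the measurements above can be captured with a back-of-the-envelope capacity model. Only the ~5000 events/s per-thread figure comes from the slide; the function and its parameters are illustrative assumptions. The model only shows the plateau at saturation; the measured linear *drop* beyond it (from contention and queueing) is not modeled here.

```python
def expected_throughput(clients, client_rate, threads, per_thread_cap=5000):
    """Simplified capacity model: the offered event load is served in
    full until the sequencer's thread pool saturates, after which
    throughput is capped at threads * per_thread_cap events/sec."""
    offered = clients * client_rate      # total events/sec generated
    capacity = threads * per_thread_cap  # max events/sec the server handles
    return min(offered, capacity)
```

For example, 100 clients at 1000 events/s each offer 100,000 events/s, but a 4-thread sequencer caps out at about 20,000 events/s, matching the regime where the slide reports degraded performance.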
Resource usage on the global sequencer [Chart: at most 2 ms in each second, i.e. ~0.]
How to commit the changes [Diagram: three synchronized file systems (SFS 1, SFS 2, SFS 3), each with an MDS and OST; Event Reporters feed Event Multiplexers, which feed Event Processors and Committer Clients; events „3” and „4”, touching files A and B, arrive out of order.] How to execute event „3” if event „4” has already happened? Unfortunately, there is no really good solution.
Event sequence error resolution • Ostrich policy • Drop all events with a conflicting sequence • Conflict detection • Is the event applicable? • In design stage … • Replaying the already committed events • Currently lacks Lustre support
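The "is the event applicable?" check above can be sketched with a toy model. This is an assumed illustration, not the deck's actual design: the target file system is modeled as a set of paths, and an out-of-order event is dropped (ostrich-style) when its precondition no longer holds.

```python
def apply_if_possible(fs_state, event):
    """Conflict-detection sketch: before committing an event to the
    target file system, check that it is still applicable; if a
    conflicting event already won (e.g. the path exists / is gone),
    drop it and return False. `fs_state` is a set of existing paths."""
    op, path = event
    if op == "create":
        if path in fs_state:       # already created by a conflicting event
            return False
        fs_state.add(path)
    elif op == "unlink":
        if path not in fs_state:   # nothing left to remove: drop the event
            return False
        fs_state.remove(path)
    return True
```

This resolves the earlier „3”-after-„4” example: the late event simply fails its applicability check and is discarded, at the cost of silently losing it, which is why replaying committed events would be the better (but currently unsupported) option.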
Questions? Thank you for your attention!