Small File File Systems
Jim Pepin, USC
Level Setting
• Small files are 'normal' for lots of people
  • Metadata substitute (lots of image data are handled this way)
  • Comes from the PC/desktop world
• These users have discovered HPC but don't want to change their programs (not even to MPI)
  • Find ways to help (the best fix is a rewrite, but that is not reasonable to expect)
• Small files are deadly to most file systems
  • Some more than others
  • Impact of cluster systems
Level Setting
• Disks
  • SATA
    • Not fast
    • Reliability issues
    • Cheap
  • Fast disk (15k RPM, etc.)
    • Not cheap
    • Fast
• People are looking at 'cheap'
  • Drives the need for better backup/maintainability solutions
• Distributed doesn't mean 'faster'
• Virtualization can be your enemy (in some ways)
Basics
• 1800-node cluster
  • Presents special problems
• Myrinet interconnect
• Ethernet (Gb) data plane
• Fibre Channel disk/tape data plane (2 Gb/s)
• 256+ disk/tape devices
• 15+ file servers
• 250+ TB of disk
• Tape backup
• DR site
Basics
• QFS as the base FS
  • Archiving and distributed access
  • A Sun product
• Local parallel FS on nodes
• NFS
  • Issues around it
• "Condo" disk versus condo nodes
Basics
• Three types of file systems
  • Parallel FS on compute nodes (temporary)
    • Exception on 'condo' nodes
• Small files
  • More directory transactions
  • Small frames win
  • No stripes
• Large files
  • More data transactions
  • Jumbo frames win
  • Stripes win
• Tuning is a matter of stripe factors and block sizes (micro-benchmark sketch below)
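The point about directory transactions versus data transactions can be seen with a small experiment. The micro-benchmark below is a sketch only (sizes and temporary paths are arbitrary, not from the talk): it writes the same total payload once as thousands of 4 KB files and once as a single large file, so the cost of per-file creates and directory updates shows up directly.

    # Minimal sketch: why small-file workloads stress metadata while large-file
    # workloads stress data movement. Paths and sizes are illustrative only.
    import os
    import time
    import tempfile

    TOTAL_BYTES = 64 * 1024 * 1024   # 64 MB total payload in both cases
    SMALL_FILE_SIZE = 4 * 1024       # 4 KB per small file
    NUM_SMALL = TOTAL_BYTES // SMALL_FILE_SIZE

    def write_small_files(root):
        """One create + open + write + close per file: mostly metadata work."""
        data = b"x" * SMALL_FILE_SIZE
        start = time.time()
        for i in range(NUM_SMALL):
            with open(os.path.join(root, f"f{i:06d}"), "wb") as f:
                f.write(data)
        return time.time() - start

    def write_large_file(root):
        """One create, many sequential writes: mostly data movement."""
        data = b"x" * SMALL_FILE_SIZE
        start = time.time()
        with open(os.path.join(root, "big.dat"), "wb") as f:
            for _ in range(NUM_SMALL):
                f.write(data)
        return time.time() - start

    if __name__ == "__main__":
        with tempfile.TemporaryDirectory() as small_dir, \
             tempfile.TemporaryDirectory() as large_dir:
            t_small = write_small_files(small_dir)
            t_large = write_large_file(large_dir)
            print(f"{NUM_SMALL} x {SMALL_FILE_SIZE} B files: {t_small:.2f} s")
            print(f"1 x {TOTAL_BYTES} B file: {t_large:.2f} s")

On most file systems the small-file run is dominated by create and directory-update overhead rather than bandwidth, which is why stripes and jumbo frames help large files but not small ones.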
Small Files
• Examples
  • Genomics group
    • 10,000s of files in a single directory (sharding sketch below)
  • Natural Language group
    • 50-250k files in a directory
    • Many nodes accessing the same data
      • Dictionaries
• Backups are 'slower' / 'harder'
  • Reasons
    • Updating directory data
    • Blocking of data on tape
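A common mitigation for directories holding tens or hundreds of thousands of entries, not described in the talk itself, is to shard files across hashed subdirectories so no single directory carries the whole load. The sketch below is illustrative only; DATA_ROOT, the fan-out, and shard_path are made-up names. It reduces per-directory transaction pressure but does not by itself fix tape blocking.

    # Sketch of one common workaround (not from the slide): shard files across
    # hashed subdirectories so no single directory holds tens of thousands of
    # entries. Names here (shard_path, DATA_ROOT) are hypothetical.
    import hashlib
    import os

    DATA_ROOT = "/tmp/sharded"   # stand-in for a project directory
    FANOUT = 256                 # number of subdirectories

    def shard_path(filename):
        """Map a flat file name to <root>/<hh>/<filename> using a hash prefix."""
        h = hashlib.md5(filename.encode()).hexdigest()
        bucket = int(h[:2], 16) % FANOUT
        return os.path.join(DATA_ROOT, f"{bucket:02x}", filename)

    if __name__ == "__main__":
        # Spread 50,000 tiny files over 256 directories (~200 entries each)
        for i in range(50_000):
            name = f"doc{i:06d}.txt"
            path = shard_path(name)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write("small payload\n")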
Small Files
• Ways to help
  • "Faster" disk (helps the metadata/directory space)
  • Distributed file access (QFS)
    • Metadata is still a 'block'
    • Read/write locks
      • Updating for distributed access
    • Next version scales better (lock improvements)
    • No free lunch
  • Special-purpose file systems and/or local space on cluster nodes (replication; staging sketch below)
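One way to read "local space on cluster nodes (replication)" is a staging step: copy a read-only dataset, such as the NLP group's dictionaries, from the shared file system to node-local scratch once per node, then let the job read the local copy instead of hammering the shared metadata server from every node. The sketch below assumes hypothetical paths (SHARED_DICT_DIR, LOCAL_SCRATCH) and is not the site's actual tooling.

    # Sketch of the "local space on cluster nodes" idea: stage a read-only
    # dataset from the shared FS to node-local scratch, then read locally.
    # SHARED_DICT_DIR and LOCAL_SCRATCH are assumed paths, not from the talk.
    import os
    import shutil

    SHARED_DICT_DIR = "/qfs/projects/nlp/dictionaries"   # hypothetical shared path
    LOCAL_SCRATCH = os.environ.get("TMPDIR", "/tmp")     # node-local space

    def stage_to_local(shared_dir, local_root):
        """Copy the dataset to local scratch if it is not already there."""
        local_copy = os.path.join(local_root, os.path.basename(shared_dir))
        if not os.path.isdir(local_copy):
            shutil.copytree(shared_dir, local_copy)
        return local_copy

    if __name__ == "__main__":
        if not os.path.isdir(SHARED_DICT_DIR):
            raise SystemExit("adjust SHARED_DICT_DIR for your site")
        dict_dir = stage_to_local(SHARED_DICT_DIR, LOCAL_SCRATCH)
        # The job now opens dictionary files from dict_dir instead of from
        # the shared file system.
        print("reading dictionaries from", dict_dir)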
Next generation
• Why change is needed
  • NFS doesn't cut it
    • Why?
• GPFS
  • Helps some
• 10 Gb hosts on the 'data plane'
  • Next month
• RAM disk for 'metadata'? (timing sketch below)
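To see why a RAM disk for metadata is tempting, the sketch below times a burst of tiny metadata operations (create/stat/unlink) on a RAM-backed path versus a disk-backed one. It assumes a Linux /dev/shm mount and is only an illustration of the question raised on the slide, not a description of how QFS metadata devices are configured.

    # Illustrative only: times many tiny metadata operations on a RAM-backed
    # path versus a disk-backed path. /dev/shm is a Linux assumption.
    import os
    import time
    import tempfile

    def metadata_churn(root, n=5000):
        """Create, stat, and remove n empty files; return elapsed seconds."""
        start = time.time()
        for i in range(n):
            path = os.path.join(root, f"m{i}")
            open(path, "w").close()
            os.stat(path)
            os.unlink(path)
        return time.time() - start

    if __name__ == "__main__":
        ram_root = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
        with tempfile.TemporaryDirectory(dir=ram_root) as ram_dir, \
             tempfile.TemporaryDirectory() as disk_dir:
            print(f"RAM-backed ({ram_dir}): {metadata_churn(ram_dir):.2f} s")
            print(f"disk-backed ({disk_dir}): {metadata_churn(disk_dir):.2f} s")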
Next generation
• Storage management solutions
  • SRB and friends
  • Database-based solutions (sketch below)
• Lustre possible
• Object storage
  • Performance for small files/objects is a question in my mind
• All of these have potential, but…
  • Back to "don't change code"
• "Virtualization" conundrum
  • How to build massively parallel data spaces
  • HPCS/other projects
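One possible reading of "database-based solutions": keep many small objects as rows in a single database file, so the file system only ever sees one large file instead of thousands of tiny ones. The sketch below uses SQLite purely as an illustration; the talk does not name a specific product or schema, and the store layout here is invented.

    # Hypothetical small-object store: many tiny "files" as blobs in one
    # SQLite database file, so metadata lives inside the database rather
    # than in the file system's directories.
    import sqlite3

    def open_store(path):
        """Create (or open) a simple name -> blob store in one SQLite file."""
        db = sqlite3.connect(path)
        db.execute("CREATE TABLE IF NOT EXISTS objects (name TEXT PRIMARY KEY, data BLOB)")
        return db

    def put(db, name, data):
        db.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)", (name, data))

    def get(db, name):
        row = db.execute("SELECT data FROM objects WHERE name = ?", (name,)).fetchone()
        return row[0] if row else None

    if __name__ == "__main__":
        db = open_store("/tmp/small_objects.db")
        for i in range(10_000):                  # 10k "files", one real file
            put(db, f"doc{i:05d}.txt", b"tiny payload")
        db.commit()
        print(get(db, "doc00042.txt"))
        db.close()

The trade-off, as the slide notes, is that this only helps if applications can be pointed at the store, which runs into the "don't change code" constraint.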