Dealing with Data: Choosing a Good Storage Technology for Your Application Rick Wagner HPC Systems Manager July 1st, 2014. Storage choices should be driven by application need, not just what’s available. Application Focus. But, applications need to adapt as they scale.
Dealing with Data: Choosing a Good Storage Technology for Your Application. Rick Wagner, HPC Systems Manager. July 1st, 2014.
Application Focus: Storage choices should be driven by application need, not just what’s available. But applications need to adapt as they scale. Writing a few small files to an NFS server is fine; writing thousands simultaneously will wipe out the server. If you use binary files, don’t invent your own format. Consider HDF5.
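One common way to adapt to the small-file problem above is to collect many small outputs into a single archive before it touches the shared server. The slide does not prescribe this. The sketch below is a minimal illustration using Python's standard `tarfile` module. The record names, the `bundle_small_outputs` and `read_member` helpers, and the temporary directory standing in for an NFS mount are all assumptions made for the example.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def bundle_small_outputs(records, archive_path):
    """Write many small named byte records into one tar archive.

    records: dict mapping a member name to its bytes payload.
    Returns the number of members written.
    """
    with tarfile.open(archive_path, "w") as tar:
        for name, payload in records.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return len(records)


def read_member(archive_path, name):
    """Read one record back out of the archive by member name."""
    with tarfile.open(archive_path, "r") as tar:
        return tar.extractfile(name).read()


# Hypothetical per-task outputs; a real job would gather these from its tasks.
records = {f"task_{i:04d}.txt": f"result {i}\n".encode() for i in range(1000)}

# The temporary directory is a stand-in for a directory on the shared server.
with tempfile.TemporaryDirectory() as shared_dir:
    archive = Path(shared_dir) / "results.tar"
    count = bundle_small_outputs(records, archive)
    sample = read_member(archive, "task_0042.txt")
```

The shared server then sees one file create and one sequential write instead of a thousand separate creates. For structured numeric data, a format such as HDF5 plays the same role.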
Storage Technologies. Devices: memory, block. Services: Cloud, MySQL, CouchDB. File Systems: ext4, NFS, Lustre, PVFS, FUSE. Each has its own performance characteristics. Not all are available everywhere.
File Systems
• Classic access: POSIX, Windows
• Most relevant: local; remote (NFS, CIFS); parallel (Lustre, GPFS)
• Local file systems are good for small and temporary files
• Network file systems are very convenient for sharing data between systems
Parallel File Systems [architecture diagram; labels only]
• Clusters (3 distinct network architectures): TRESTLES (IB cluster), GORDON (IB cluster), TRITON (Myrinet cluster)
• Network: Mellanox 5020 Bridge (12 GB/s); 64 Lustre LNET Routers (100 GB/s); Myrinet 10G Switch (25 GB/s)
• Redundant switches for reliability and performance: two Arista 7508 10G
• Metadata Servers (MDS)
• 32 OSS (Object Storage Servers) provide 100 GB/s performance and >4 PB raw capacity; each OSS is labeled 72 TB
A Cautionary Tale http://www.youtube.com/watch?v=gDfLXAtRJfY&feature=youtu.be
Devices: Raw block device (/dev/sdb) or RAM FS (/dev/shm). Useful in specific cases, like fast scratch. Can be very good for small I/O.
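As a rough illustration of using a RAM FS for fast scratch, the sketch below prefers `/dev/shm` when it exists and is writable, and otherwise falls back to the default temporary directory. The `pick_scratch_dir` helper and the file name `block.bin` are hypothetical. Whether `/dev/shm` is available, and how much memory it may consume, depends on the system.

```python
import os
import tempfile


def pick_scratch_dir(candidates=("/dev/shm",)):
    """Return the first candidate that is a writable directory,
    or the platform's default temporary directory if none qualifies."""
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return tempfile.gettempdir()


scratch_root = pick_scratch_dir()

# The scratch directory and everything in it is removed when the block exits.
with tempfile.TemporaryDirectory(dir=scratch_root) as workdir:
    path = os.path.join(workdir, "block.bin")

    # Many small writes: 1000 four-byte little-endian integers.
    with open(path, "wb") as f:
        for i in range(1000):
            f.write(i.to_bytes(4, "little"))

    # Small random read: seek straight to record 123.
    with open(path, "rb") as f:
        f.seek(4 * 123)
        value = int.from_bytes(f.read(4), "little")

    size = os.path.getsize(path)
```

Data in a RAM FS counts against node memory and disappears when the node reboots, so anything worth keeping has to be copied to persistent storage before the job ends.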
Services: Things accessed programmatically. Frequently the last thought for HPC applications: A MISTAKE. Examples: databases, cloud storage (Amazon S3), document storage (MongoDB, CouchDB).
Know What You Need http://www.youtube.com/watch?v=F4OIDszDA9E
Choosing. My application needs to: write a checkpoint dump from memory from a large parallel simulation. I should consider: a parallel file system and a binary file format like HDF5.
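A minimal sketch of what a checkpoint in HDF5 might look like, assuming the third-party `h5py` and `numpy` packages are installed. It writes from a single process. A real parallel simulation would more likely use an MPI-enabled HDF5 build or one file per group of ranks, which is not shown here. The field names, array shapes, and the `write_checkpoint` and `read_checkpoint` helpers are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def write_checkpoint(path, step, fields):
    """Write the step number and each named array to one HDF5 file."""
    with h5py.File(path, "w") as f:
        f.attrs["step"] = step
        for name, array in fields.items():
            f.create_dataset(name, data=array)


def read_checkpoint(path):
    """Read back the step number and all datasets in the file's root group."""
    with h5py.File(path, "r") as f:
        step = int(f.attrs["step"])
        fields = {name: f[name][...] for name in f}
    return step, fields


# Hypothetical simulation state.
fields = {
    "density": np.ones((4, 4)),
    "velocity": np.zeros((4, 4, 3)),
}

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint_000010.h5")
    write_checkpoint(path, 10, fields)
    step, restored = read_checkpoint(path)
```

Because the file is self-describing, a restart or analysis tool can look up arrays by name and read their shapes and types without knowing how the writer laid out the bytes.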
Choosing. My application needs to: run analysis on remote systems and return the results to a web portal for users. I should consider: cloud storage for results and input, and local scratch space for the job.
Choosing. My application needs to: randomly access many small files, or read and write small blocks from large files. I should consider: a database, RAM FS, or local scratch space.
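To make the database option concrete, here is a small sketch that keeps many small records in one SQLite database instead of one file per record, using Python's standard `sqlite3` module. The table name `blobs`, the key scheme, and the helper functions are illustrative assumptions. The in-memory database could be replaced with a path on local scratch or a RAM FS.

```python
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) a key/value store for small binary records."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "name TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def put_many(conn, items):
    """Insert (name, bytes) pairs in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO blobs (name, data) VALUES (?, ?)", items
        )


def get(conn, name):
    """Return the bytes stored under name, or None if it is absent."""
    row = conn.execute(
        "SELECT data FROM blobs WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]


conn = open_store()
put_many(
    conn,
    ((f"item_{i:05d}", f"payload {i}".encode()) for i in range(10000)),
)
value = get(conn, "item_01234")
missing = get(conn, "no_such_item")
total = conn.execute("SELECT COUNT(*) FROM blobs").fetchone()[0]
conn.close()
```

Lookups go through the primary-key index, so random access to one record does not require opening a separate file or scanning the rest of the data.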
Many Boxes Make a Sad Panda http://www.youtube.com/watch?v=N2zK3sAtr-4 Database logos courtesy of RRZEicons http://commons.wikimedia.org/