Distributed Storage And WAN Transport
Peter Kunszt
SyBIT Tech Day, Nov. 23 2011, Bern
Distributed Storage Systems
• Distributed FS
  • Make it look like a local FS
  • User sees one space
  • Remote users see the same space as local users
  • Policies for sharing and access should be available
• Caching FS
  • Data lives somewhere else
  • But looks local thanks to a smart WAN cache
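The "one space" idea can be pictured as a single global namespace that maps path prefixes to wherever the data actually lives. A minimal sketch, assuming a hypothetical mount table (`MOUNTS`) and hypothetical site names; the systems below each resolve names in their own way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    site: str  # hypothetical site name
    root: str  # path on that site's own storage


# Hypothetical mount table: global path prefix -> where the data really lives.
MOUNTS = {
    "/": Backend("local", "/data"),
    "/projects/genomics": Backend("site-b", "/export/genomics"),
    "/scratch": Backend("local", "/fast/scratch"),
}


def resolve(path: str) -> tuple[str, str]:
    """Map a path in the single global namespace to (site, site-local path).

    The longest matching prefix wins, so every user, local or remote,
    resolves the same global path to the same data.
    """
    best = "/"
    for prefix in MOUNTS:
        matches = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if matches and len(prefix) > len(best):
            best = prefix
    backend = MOUNTS[best]
    rest = path if best == "/" else path[len(best):]
    return backend.site, backend.root + rest
```

For example, `resolve("/projects/genomics/run1/reads.fq")` returns `("site-b", "/export/genomics/run1/reads.fq")` for every user, which is the property the slide asks for.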
Gluster (bought by Red Hat)
• www.gluster.org
• GlusterFS: many commercial users
• The software is open source; they sell an appliance and support (just like Red Hat)
• Single global namespace
• Block storage clustering, no central metadata server
• Works over 1GbE, 10GbE, InfiniBand
• Replication
• ‘NFS-like’ native access
• No kernel dependencies, simple installation
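"No central metadata" means any client can compute where a file lives instead of asking a metadata server. A minimal sketch of hash-based placement, assuming MD5 truncated to 32 bits and equal hash ranges per brick; GlusterFS uses its own hash function and per-directory layouts, so this shows only the general idea, and the brick names are hypothetical.

```python
import hashlib


def name_hash(filename: str) -> int:
    """32-bit hash of a file name (MD5 is used here for illustration only)."""
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def layout(bricks: list[str]) -> list[tuple[int, int, str]]:
    """Split the 32-bit hash space into contiguous, equal ranges, one per brick."""
    space = 2**32
    step = space // len(bricks)
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs any remainder so the whole space is covered.
        end = space - 1 if i == len(bricks) - 1 else start + step - 1
        ranges.append((start, end, brick))
    return ranges


def locate(filename: str, ranges: list[tuple[int, int, str]]) -> str:
    """Every client computes the same brick; no metadata server is consulted."""
    h = name_hash(filename)
    for start, end, brick in ranges:
        if start <= h <= end:
            return brick
    raise ValueError("hash space not fully covered")
```

Because placement is a pure function of the name and the layout, two clients that share the same brick list always agree on where a file is stored.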
XtreemFS
• Part of the XtreemOS project (EU FP7). In production, the latest version is used only by the German MosGrid project.
• Object-based design, global FS namespace
• The Metadata and Replica Service stores the metadata; data lives on Object Storage Servers; the two are linked through the Replica Management Service
• Written in Java, using native Memblocking; key-value store: BabuDB
• Uses the Linux FUSE kernel module, and MIT's Vivaldi algorithm for automated replica placement and selection
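Vivaldi gives every node synthetic coordinates whose distances predict round-trip times, so a client can pick the "closest" replica without probing all of them. A minimal sketch of the published Vivaldi update rule with an adaptive timestep, assuming 2-D coordinates and constants of 0.25 (a commonly used value); `closest_replica` is a hypothetical helper, and XtreemFS's own implementation may differ.

```python
import math
import random
from dataclasses import dataclass, field

CC = 0.25  # coordinate step constant (0.25 is a commonly used value)
CE = 0.25  # error smoothing constant


@dataclass
class Node:
    coord: list[float] = field(default_factory=lambda: [0.0, 0.0])  # 2-D assumed
    error: float = 1.0  # local confidence estimate (1.0 = no confidence yet)


def vivaldi_update(node: Node, remote: Node, rtt: float) -> None:
    """Adjust node's coordinate and error after one RTT sample to remote."""
    w = node.error / (node.error + remote.error)      # trust remote more if it is surer
    dist = math.dist(node.coord, remote.coord)
    sample_error = abs(dist - rtt) / rtt
    node.error = sample_error * CE * w + node.error * (1 - CE * w)
    # Unit vector pointing from remote to node; random direction if they coincide.
    if dist > 0:
        unit = [(a - b) / dist for a, b in zip(node.coord, remote.coord)]
    else:
        angle = random.uniform(0, 2 * math.pi)
        unit = [math.cos(angle), math.sin(angle)]
    delta = CC * w
    # Move away if predicted distance is too short, closer if too long.
    node.coord = [c + delta * (rtt - dist) * u for c, u in zip(node.coord, unit)]


def closest_replica(client: Node, replicas: dict[str, Node]) -> str:
    """Pick the replica with the smallest predicted latency."""
    return min(replicas, key=lambda name: math.dist(client.coord, replicas[name].coord))
```

Coordinates are refined passively from RTTs observed during normal traffic, which is why the approach suits automated replica selection.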
DDN WOS
• www.ddn.com/industry/life-sciences
• Storage appliance sold with several interfaces, including S3 and REST; GPFS based; highly resilient to failure
• Policy-based replication
• Data protection mechanism: several copies stored
  • Data is broken into fragments, and each fragment is stored x times
  • Can be combined with replication
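"Break data into fragments, store those x times" can be sketched as plain fragment replication: each fragment goes to x distinct nodes, and a read succeeds as long as one copy of every fragment survives. A minimal sketch, assuming round-robin placement and hypothetical node names; it is not DDN's actual placement or protection scheme.

```python
def fragment(data: bytes, size: int) -> list[bytes]:
    """Break data into fixed-size fragments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store(data: bytes, size: int, nodes: list[str], copies: int):
    """Store every fragment on `copies` distinct nodes (round-robin placement)."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    stores: dict[str, dict[int, bytes]] = {n: {} for n in nodes}
    frags = fragment(data, size)
    for idx, frag in enumerate(frags):
        for k in range(copies):
            stores[nodes[(idx + k) % len(nodes)]][idx] = frag
    return stores, len(frags)


def read(stores: dict[str, dict[int, bytes]], count: int, failed: set[str]) -> bytes:
    """Reassemble the object from any surviving copy of each fragment."""
    parts = []
    for idx in range(count):
        for node, frags in stores.items():
            if node not in failed and idx in frags:
                parts.append(frags[idx])
                break
        else:  # no surviving node held this fragment
            raise IOError(f"fragment {idx} lost on all copies")
    return b"".join(parts)
```

With x = 2 copies on four nodes, any single node failure is tolerated, but losing two neighbouring nodes can lose a fragment; raising x trades capacity for resilience.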
IBM Panache aka Active Cloud Engine
• www.almaden.ibm.com/storagesystems/projects/panache/
• Clustered file system cache for parallel I/O
• Can cache from multiple nodes
• GPFS for the local FS; pNFS for remote access, also using parallel I/O
• No proprietary hardware or software needed for installation
• Very resilient to failures; late synchronization if necessary
[Figure: Home cluster site (SONAS system) serving two cache cluster sites over NFS across the WAN; each site has IO nodes and a SONAS layer; caches pull on a cache miss and push on write.]
IBM Active Cloud Engine™: WAN Caching Capabilities (Statement of Direction)
• A fileset on the home cluster is associated with a fileset on one or more cache clusters
• If data is in cache…
  • Cache hit at local disk speed
  • Client sees local GPFS performance if the file or directory is in cache
• If data is not in cache…
  • Data and metadata (files and directories) are pulled on demand at network line speed and written to GPFS
  • Uses NFS for WAN data transfer
• If data is modified at home…
  • Revalidation is done at a configurable timeout
  • Close to NFS-style close-to-open consistency across sites
  • POSIX strong consistency within a cache site
• If data is modified at cache…
  • Writes see no WAN latency
  • Writes go to the cache (i.e. local GPFS), then are pushed home asynchronously
• If the network is disconnected…
  • Cached data can still be read, and writes to the cache are written back after reconnection
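The caching behaviour listed above (pull on miss, revalidate after a configurable timeout, local writes pushed home asynchronously, reads served while disconnected) can be modelled in a few lines. A toy sketch, assuming the home cluster is a plain dict and that "revalidation" simply re-reads the home copy; the real system checks attributes over NFS and runs on GPFS, none of which is modelled here. `WanCache` and its methods are hypothetical names.

```python
import time
from collections import deque


class WanCache:
    """Toy model of a cache site in front of a home cluster (a dict here)."""

    def __init__(self, home: dict, revalidate_after: float, clock=time.monotonic):
        self.home = home                      # stands in for the home fileset
        self.revalidate_after = revalidate_after
        self.clock = clock
        self.entries = {}                     # path -> (data, time last validated)
        self.pending = deque()                # local writes not yet pushed home
        self.connected = True                 # WAN link state

    def _is_dirty(self, path) -> bool:
        return any(p == path for p, _ in self.pending)

    def read(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None:
            data, validated = entry
            fresh = now - validated < self.revalidate_after
            # Serve locally if fresh, if home is unreachable, or if the
            # newest version is a local write that has not been pushed yet.
            if fresh or not self.connected or self._is_dirty(path):
                return data
        if not self.connected:
            raise IOError(f"{path} is not cached and home is unreachable")
        data = self.home[path]                # pull on miss (or revalidate)
        self.entries[path] = (data, now)
        return data

    def write(self, path, data):
        self.entries[path] = (data, self.clock())  # local write: no WAN latency
        self.pending.append((path, data))          # queued for async push home

    def flush(self):
        """Push queued writes home in order; a no-op while disconnected."""
        while self.connected and self.pending:
            path, data = self.pending.popleft()
            self.home[path] = data
```

Between revalidations a cache site may serve a stale copy of data changed at home, which is the "close to close-to-open" consistency trade-off the slide describes.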
IBM Active Cloud Engine™
• What is IBM Active Cloud Engine?
  • A policy-driven engine that helps improve storage efficiency by automatically:
    • Distributing files, images, and application updates to multiple locations*
    • Identifying files for backup or replication to a DR location
    • Moving files to the right tier of storage, including tape in a TSM hierarchy
    • Deleting expired or unwanted files
  • High performance: can scan billions of files in minutes
• What client value does Active Cloud Engine deliver?
  • Enables ubiquitous access to files from across the globe*
  • Reduces network costs and helps improve application performance by distributing files closer to users*
  • Improves data protection by identifying candidates for backup or DR
  • Lowers storage cost by moving files transparently to the most appropriate tier of storage
  • Controls storage growth by moving older files to tape and deleting unwanted or expired files
  • Enhances administrator productivity by automating file management
• What capabilities are supported by Active Cloud Engine in SONAS?
  • Active Cloud Engine on SONAS supports all the functions described above
• What capabilities are supported by Active Cloud Engine in Storwize V7000 Unified?
  • Active Cloud Engine on Storwize V7000 Unified supports all the functions described above except distribution to multiple locations
* Active Cloud Engine Statement of Direction
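A policy-driven engine boils down to scanning file metadata and applying the first rule that matches each file. A minimal sketch, assuming hypothetical tier names, rule names, and action labels; GPFS has its own SQL-like policy language, which this Python model does not reproduce.

```python
from dataclasses import dataclass
from typing import Callable

DAY = 86_400  # seconds


@dataclass
class FileInfo:
    path: str
    size: int      # bytes
    atime: float   # last access time, seconds since the epoch
    pool: str      # current storage tier (hypothetical names: ssd, nearline, tape)


@dataclass
class Rule:
    name: str
    matches: Callable[[FileInfo, float], bool]
    action: str    # hypothetical action labels, e.g. "delete", "migrate:tape"


# Hypothetical rule set; order matters because the first match wins.
RULES = [
    Rule("expire-tmp",
         lambda f, now: f.path.endswith(".tmp") and now - f.atime > 30 * DAY,
         "delete"),
    Rule("cold-to-tape",
         lambda f, now: f.pool != "tape" and now - f.atime > 180 * DAY,
         "migrate:tape"),
    Rule("big-off-ssd",
         lambda f, now: f.pool == "ssd" and f.size > 10**9,
         "migrate:nearline"),
]


def evaluate(files, now, rules=RULES):
    """Return (path, action) for every file that matches a rule."""
    plan = []
    for f in files:
        for rule in rules:
            if rule.matches(f, now):
                plan.append((f.path, rule.action))
                break
    return plan
```

Producing a plan first and executing it separately keeps the metadata scan fast and lets an administrator review deletions before they happen.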
Fast Transport
• Network bandwidth maximization
• Fair share
• Congestion control
• Scheduling
• TCP based: GridFTP and similar
  • FTP block size adjustment
  • Many parallel threads
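Block size tuning and parallelism matter because a single TCP stream can send at most one window per round trip. The bandwidth-delay product (BDP) says how many bytes must be in flight to fill the path. A small worked sketch, assuming a hypothetical 10 Gbit/s path with a 20 ms round-trip time and a 64 KiB default window.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound for one TCP stream: at most one window per round trip."""
    return window_bytes * 8 / rtt_s


def per_stream_buffer(bandwidth_bps: float, rtt_s: float, streams: int) -> float:
    """Buffer each of `streams` parallel streams needs so together they fill the path."""
    return bdp_bytes(bandwidth_bps, rtt_s) / streams


# Hypothetical path: 10 Gbit/s with a 20 ms round-trip time.
link, rtt = 10e9, 0.020
print(f"BDP: {bdp_bytes(link, rtt) / 1e6:.1f} MB")
print(f"One stream, 64 KiB window: {window_limited_throughput_bps(65536, rtt) / 1e6:.1f} Mbit/s")
print(f"Per-stream buffer with 8 streams: {per_stream_buffer(link, rtt, 8) / 1e6:.3f} MB")
```

On this path the BDP is 25 MB, while one stream with a 64 KiB window tops out near 26 Mbit/s. Larger buffers and several parallel streams (here about 3.1 MB each for 8 streams) close that gap.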
Aspera
• www.asperasoft.com
• Built into other appliances; many users
• UDP-based transport
• Swarming: can look like a DoS attack
• Also has an FTP connection for control information
• Configurable, with server and client UIs for transport control
• Congestion control
• Fair share control
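A UDP sender gets no congestion control from the kernel, so the tool must supply its own. A generic sketch, assuming an additive-increase/multiplicative-decrease (AIMD) controller driven by a queuing-delay signal and a toy bottleneck model; Aspera's and FileCatalyst's protocols are proprietary and are not reproduced here. It shows how congestion control and fair share fit together: each multiplicative backoff halves the gap between competing flows.

```python
def next_rate(rate: float, queuing_delay: float, target_delay: float,
              max_rate: float, step: float, backoff: float = 0.5) -> float:
    """One control step: back off multiplicatively when queues build up,
    otherwise probe additively up to the configured maximum rate."""
    if queuing_delay > target_delay:
        return rate * backoff
    return min(rate + step, max_rate)


def simulate(initial_rates, capacity: float, steps: int,
             step: float = 1.0, max_rate: float = 1000.0) -> list[float]:
    """Flows sharing one toy bottleneck, all running the same controller."""
    rates = list(initial_rates)
    for _ in range(steps):
        total = sum(rates)
        # Toy bottleneck model: queuing delay appears only when the offered
        # load exceeds capacity (arbitrary units).
        delay = max(0.0, total - capacity) / capacity
        rates = [next_rate(r, delay, 0.0, max_rate, step) for r in rates]
    return rates
```

Starting two flows at very unequal rates, for example `simulate([10, 90], capacity=100, steps=1000)`, ends with nearly equal shares, while `max_rate` plays the role of the configurable rate cap.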
FileCatalyst
• www.filecatalyst.com
• Similar to Aspera: UDP-based transport
Signiant
• www.signiant.com
• And one more. It is not cheap, but I did not find out more.