Spinnaker Networks, Inc. www.spinnakernet.com 301 Alpha Drive Pittsburgh, PA 15238 (412) 968-SPIN
Storage Admin’s Problem • “Everything you know is wrong” … at least eventually • space requirements change • “class of service” changes • desired location changes
Solution • System scaling • add resources easily • without client-visible changes • Online reconfiguration • no file name or mount changes • no disruption to concurrent accesses • System performance
Spinnaker Design • Cluster servers for scaling • using IP (Gigabit Ethernet) for cluster links • Separate physical from virtual resources • directory trees from their disk allocation • IP addresses from their network cards • resources can be added without changing the client’s view of the system
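The separation of virtual resources from physical ones amounts to a pair of indirection tables that administrators can repoint without clients noticing. A minimal Python sketch of that idea (all names and structures are illustrative assumptions, not Spinnaker’s actual implementation):

```python
# Hypothetical sketch of the virtual-to-physical indirection described above.
# A directory tree (VFS) is addressed by name, not by the disks that hold it;
# a virtual IP is addressed by clients, not by the NIC that currently carries it.

vfs_to_pool = {            # directory trees -> disk allocation
    "Eng":   "pool-A",
    "Users": "pool-B",
}

vip_to_nic = {             # client-visible addresses -> physical ports
    "10.0.0.10": "server1:gige0",
    "10.0.0.11": "server2:gige1",
}

def repoint_vfs(vfs, new_pool):
    """Give a VFS space from a different (e.g. larger or faster) pool.
    Clients keep using the same path names, so they see no change."""
    vfs_to_pool[vfs] = new_pool

def move_vip(vip, new_nic):
    """Rehome a virtual IP onto another card or server."""
    vip_to_nic[vip] = new_nic

repoint_vfs("Eng", "pool-C")
move_vip("10.0.0.10", "server3:gige0")
print(vfs_to_pool, vip_to_nic)
```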
Spinnaker Design • Within each server, storage pools aggregate all storage with a single service class • e.g. all RAID 1, RAID 5, or extra-fast storage • think “virtual partition” or “logical volume”
Spinnaker Architecture • Create virtual file systems (VFSes) • a VFS is a tree with a root dir and subdirs • many VFSes can share a storage pool • VFS allocation changes dynamically with usage • without administrative intervention • can manage limits via quotas • Similar in concept to • AFS volume / DFS fileset
Spinnaker Architecture • [diagram: storage pools A and B on a server, each holding several VFSes (e.g. Spin, Depts, Users, Eng, Bach, Bobs, adam, ant) plus net and disk resources]
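A minimal sketch of the VFS/storage-pool relationship on this and the previous slide: many VFSes draw space from one pool as they grow, with optional quotas instead of fixed partitions. The classes and numbers below are hypothetical, for illustration only.

```python
# Hypothetical sketch: several VFSes drawing blocks from one storage pool,
# with per-VFS quotas but no fixed pre-partitioning of the pool.

class StoragePool:
    def __init__(self, name, total_blocks):
        self.name = name
        self.free = total_blocks

class VFS:
    def __init__(self, name, pool, quota_blocks=None):
        self.name = name
        self.pool = pool
        self.used = 0
        self.quota = quota_blocks     # optional administrative limit

    def allocate(self, blocks):
        if self.quota is not None and self.used + blocks > self.quota:
            raise RuntimeError(f"{self.name}: quota exceeded")
        if blocks > self.pool.free:
            raise RuntimeError(f"{self.pool.name}: pool out of space")
        self.pool.free -= blocks      # space is claimed only as it is used
        self.used += blocks

pool_a = StoragePool("A", total_blocks=1_000_000)
eng = VFS("Eng", pool_a)
users = VFS("Users", pool_a, quota_blocks=200_000)
eng.allocate(50_000)
users.allocate(10_000)
print(pool_a.free, eng.used, users.used)
```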
Spinnaker Architecture • Create global “export” name space • choose a root VFS • mount other VFS, forming a tree • by creating mount point files within VFSes • export tree spans multiple servers in cluster • VFSes can be located anywhere in the cluster • export tree can be accessed from any server • different parts of tree can have different CoS
Global Naming and VFSes • [diagram: the export tree rooted at the Spin VFS, with Depts, Eng, Users, and user VFSes (adam, ant, Bobs, Bach) mounted into one namespace that spans storage pools A and B]
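The global namespace can be pictured as a path walk that crosses from one VFS into another whenever it hits a mount-point file. A rough Python sketch using the VFS names from the diagram above (the table layout and resolve function are assumptions for illustration):

```python
# Hypothetical sketch of walking the global export tree: each VFS is a small
# directory tree, and a mount-point entry names another VFS rather than a
# subdirectory, so a single path can span several VFSes (and servers).

vfses = {
    "Spin":  {"Depts": ("mount", "Depts"), "net": ("file", None)},
    "Depts": {"Eng": ("mount", "Eng"), "Users": ("mount", "Users")},
    "Eng":   {"disk": ("file", None)},
    "Users": {"adam": ("file", None), "ant": ("file", None)},
}

def resolve(path, root_vfs="Spin"):
    """Return (vfs, entry) for a path in the global namespace."""
    vfs = root_vfs
    entry = None
    for name in path.strip("/").split("/"):
        kind, target = vfses[vfs][name]
        if kind == "mount":
            vfs = target          # cross into the mounted VFS
            entry = None
        else:
            entry = name
    return vfs, entry

print(resolve("/Depts/Users/adam"))   # -> ('Users', 'adam')
```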
Clustered Operation • Each client connects to any server • requests are “switched” over cluster net • from incoming server • to server with desired data • based on • desired data • proximity to data (for mirrored data)
Server/Network Implementation • [diagram: clients access each server over Gigabit Ethernet; a Network Process (TCP termination, VLDB lookup, NFS server over SpinFS) speaks the SpinFS protocol across a Gigabit Ethernet cluster switch to a Disk Process (caching, locking) with Fibre Channel-attached storage]
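One way to picture the request switching on these slides: the server that terminates the client’s TCP connection consults the volume location database (VLDB) and either serves the request locally or forwards it over the cluster network to a server holding the data, preferring a nearby copy when the data is mirrored. The sketch below is a simplification with made-up names, not the SpinFS wire protocol:

```python
# Hypothetical sketch of cluster request switching: look up which server(s)
# hold the target VFS, serve locally if possible, otherwise forward.

vldb = {                      # VFS -> servers holding it (mirrors allowed)
    "Eng":   ["server2"],
    "Users": ["server1", "server3"],
}

def route(request, incoming_server):
    vfs = request["vfs"]
    owners = vldb[vfs]
    if incoming_server in owners:
        return incoming_server            # data is local: serve directly
    # For mirrored data, prefer the "closest" owner (here: lowest name).
    return sorted(owners)[0]

req = {"op": "read", "vfs": "Users", "file": "adam"}
print(route(req, "server3"))   # local mirror -> 'server3'
print(route(req, "server2"))   # remote data  -> forwarded to 'server1'
```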
Security • At enterprise scale, security is critical • no implicit departmental “trust” • Kerberos V5 support • for NFS clients, with groups from NIS • for CIFS, using Active Directory
Virtual Servers • A virtual server consists of • a global export name space (VFSes) • a set of IP addresses that can access it • Benefits • an additional security firewall • a user guessing file IDs is limited to that VS • rebalance users among NICs • move virtual IP addresses around dynamically
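A small sketch of the virtual-server idea: a namespace paired with the set of client addresses allowed to reach it, so even a guessed file ID outside that set gets nowhere. Class and field names are illustrative assumptions.

```python
# Hypothetical sketch of a virtual server: an export namespace plus the set of
# client IP addresses permitted to access it.

class VirtualServer:
    def __init__(self, name, root_vfs, allowed_clients):
        self.name = name
        self.root_vfs = root_vfs
        self.allowed = set(allowed_clients)

    def admit(self, client_ip):
        return client_ip in self.allowed

eng_vs = VirtualServer("eng", root_vfs="Eng",
                       allowed_clients={"10.1.0.5", "10.1.0.6"})
print(eng_vs.admit("10.1.0.5"))    # True
print(eng_vs.admit("10.2.0.99"))   # False: outside this virtual server
```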
Performance – single stream • 94 MB/sec read • single stream read, 9K MTU • 99 MB/sec write • single stream write, 9K MTU • All files much larger than cache • real I/O scheduling was occurring
Benefits • Scale a single export tree to high capacity • both in gigabytes • and in ops/second • Keep server utilization high • create VFSes wherever space exists • independent of where the data is located in the name space • Use expensive classes of storage • only when needed • anywhere in the global name space
Benefits • Use third-party or SAN storage • Spinnaker sells storage • but will also support LSI storage and others • Kerberos and virtual servers • independent security mechanisms • cryptographic authentication as well as IP address-based security
Near Term Roadmap • Free data from its physical constraints • data can move anywhere desired within a cluster • VFS move • move data between servers online • VFS mirroring • Mirror snapshots between servers • High availability configuration • multiple heads supporting shared disks
VFS Movement • VFSes move between servers • balance server cycle or disk space usage • allows servers to be easily decommissioned • Move performed online • NFS and CIFS lock/open state preserved • Clients see no changes at all
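The online move described above can be sketched as a bulk copy while the source stays live, a brief quiesce for the final delta plus the NFS/CIFS open and lock state, and a location-database update that redirects future requests. This simplified simulation uses hypothetical structures, not the real move machinery:

```python
# Hypothetical sketch of an online VFS move: bulk-copy while the source stays
# live, then briefly quiesce, ship the remaining delta plus open/lock state,
# and update the location database so clients never see a change.

class Server:
    def __init__(self, name):
        self.name = name
        self.data = {}        # vfs -> file contents
        self.locks = {}       # vfs -> NFS/CIFS open and lock state

def move_vfs(vfs, src, dst, vldb):
    dst.data[vfs] = dict(src.data[vfs])        # 1. bulk copy, source still live
    # ... source keeps serving; writes during the copy are tracked ...
    dst.data[vfs].update(src.data[vfs])        # 2. quiesce + copy the delta
    dst.locks[vfs] = src.locks.pop(vfs)        # 3. preserve open/lock state
    vldb[vfs] = dst.name                       # 4. flip location; clients unaware
    del src.data[vfs]

s1, s2 = Server("server1"), Server("server2")
s1.data["Eng"] = {"disk": "blocks"}
s1.locks["Eng"] = {"client-7": "write-lock /disk"}
vldb = {"Eng": "server1"}
move_vfs("Eng", s1, s2, vldb)
print(vldb["Eng"], s2.locks["Eng"])
```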
VFS Mirror • Multiple identical copies of a VFS • version-number based • provides efficient update after a mirror is broken • thousands of snapshots possible • similar to AFS replication or NetApp’s SnapMirror
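A toy illustration of version-number-based mirroring: every change bumps a version, so resynchronizing a broken mirror only requires shipping the changes made since the version the mirror last saw (which is also what makes many cheap snapshots practical). Names below are hypothetical.

```python
# Hypothetical sketch of version-based mirror resync: only changes newer than
# the mirror's last-seen version need to be transferred.

class MirroredVFS:
    def __init__(self):
        self.version = 0
        self.changes = {}       # version -> (path, new contents)

    def write(self, path, data):
        self.version += 1
        self.changes[self.version] = (path, data)

    def changes_since(self, mirror_version):
        return {v: c for v, c in self.changes.items() if v > mirror_version}

src = MirroredVFS()
src.write("/a", "1")
mirror_version = src.version      # mirror is up to date at version 1
src.write("/b", "2")              # mirror connection breaks here
src.write("/a", "3")
print(src.changes_since(mirror_version))   # only versions 2 and 3 are shipped
```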
Failover Pools • Failover based upon storage pools • upon server failure, peer takes over pool • each pool can failover to different server • don’t need 100% extra capacity for failover
Failover Configuration • [diagram: storage pools P1–P4 spread across SpinServers 1–3, with each pool assigned a different failover peer]
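A sketch of the pool-level failover shown in the diagram: each pool names its own takeover peer, so a failed server’s pools are spread across the survivors instead of landing on a dedicated 100%-spare standby. The specific pool-to-server assignments here are assumptions for illustration.

```python
# Hypothetical sketch of per-pool failover: each pool has its own takeover
# peer, so one server's failure spreads load across the remaining servers.

failover_peer = {
    "P1": ("SpinServer1", "SpinServer2"),   # (normal owner, takeover peer)
    "P2": ("SpinServer1", "SpinServer3"),
    "P3": ("SpinServer2", "SpinServer3"),
    "P4": ("SpinServer3", "SpinServer1"),
}

def owners_after_failure(failed_server):
    return {pool: (peer if owner == failed_server else owner)
            for pool, (owner, peer) in failover_peer.items()}

print(owners_after_failure("SpinServer1"))
# P1 -> SpinServer2, P2 -> SpinServer3; the survivors share the load.
```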
Additional Benefits • Higher system utilization • by moving data to under-utilized servers • Decommission old systems • by moving storage and IP addresses away • without impacting users • Change storage classes dynamically • move data to cheaper storage pools when possible • Inexpensive redundant systems • don’t need 100% spare capacity
Extended Roadmap • Caching • helps in MAN / WAN environments • provides high read bandwidth to a single file • Fibre Channel as an access protocol • simple, well-understood client protocol stack • NFS V4
Summary • Spinnaker’s view of NAS storage • network of storage servers • accessible from any point • with data flowing throughout system • with mirrors and caches as desired • optimizing various changing constraints • transparently to users
Thank You • Mike Kazar, CTO
Design Rationale • Why integrate move with server? • VFS move must move open/lock state • Move must integrate with snapshot • Final transition requires careful locking at source and destination servers
Design Rationale • Why not stripe VFSes across servers? • Distributed locking is very complex • and very hard to make fast • enterprise loads have poor server locality • unlike supercomputer large-file access patterns • Failure isolation • limits the impact of serious crashes • partial restores are difficult if a stripe is lost
Design Rationale • VFSes vs. many small partitions • can overbook disk utilization • if 5% of users may need 2X their storage within 24 hours • you can double everyone’s partition, or • pool 100 users in a storage pool with 5% free
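The overbooking arithmetic on this slide, worked out with illustrative numbers (100 users at 10 GB each, 5% of them doubling within a day):

```python
# Worked version of the slide's overbooking argument, using assumed numbers.

users = 100
per_user_gb = 10
fraction_doubling = 0.05

# Dedicated small partitions: any user might be one who doubles,
# so every partition must be provisioned at 2x.
dedicated = users * per_user_gb * 2

# One shared storage pool: only the users who actually double need the
# extra space, so ~5% headroom covers the same guarantee.
shared = users * per_user_gb * (1 + fraction_doubling)

print(dedicated, shared)   # 2000 GB vs 1050 GB
```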