Cluster File System George Hoenig VERITAS Product Management
Presentation Overview • Cluster File System - Shared File System for SANs • CFS Benefits • CFS Architecture • CFS and Applications
Forward-looking architecture that fully leverages SAN investment (Diagram: corporate data moves from single-host devices, each with its own file system, to shared devices with shared data under the VERITAS Cluster File System)
Cluster File System • Shared File System • A Cluster File System allows multiple hosts to mount and perform file operations on the same file system concurrently • File operations include: Create, Open, Close, Rename, Delete, … (Diagram: one host reads file X while another writes file X)
Traditional Shared File Systems • NFS/CIFS - access to the file system and data is controlled by a single server • Clients send file operations to the server for execution • High availability is not an inherent feature of NFS/CIFS • Data path is across the LAN even though other servers have direct access to the disk over the SAN
Cluster File System • Cluster File System leverages investment in SAN • Provides all nodes with direct access to data • Fault tolerance/high availability built into CFS • Data path is across the SAN • Eliminates I/O bottlenecks • Eliminates single point of failure • Mainframe shared storage model for open systems
Cluster File System Overview • CFS can provide many features - all with the goal of enhancing applications: • Availability - applications are always usable • Performance - applications can manage increasing transaction loads • Manageability - easier to manage growth and change • Scalability - applications can perform more work
CFS Benefits • Easier to Manage • Put files on any disk, and have them accessible by any server • Deploy very large RAID sets efficiently - because all servers can properly share the set • No need to worry about data location • Increase total I/O throughput • Easier to Extend Clusters • No data partitioning • True plug-and-play for cluster nodes • Availability
CFS Benefits • All applications can benefit from a CFS shared-disk environment • Unmodified applications benefit from: • Deployment flexibility • Easy application load balancing • Modified or cluster-ready applications can take advantage of: • Increased cluster-wide throughput
Clustering Implementations • Shared Disk Clusters: share hardware; all systems can access the same disks concurrently; scalable beyond 2-3 nodes • Shared Nothing Clusters: duplicate hardware & data; provide application failover; serial use of shared resources; practical for 2-3 nodes
CFS Deployment • Cluster File System built upon the shared disk cluster model • Cluster Volume Manager - concurrent access to physical storage from multiple hosts • Cluster File System - coordination among nodes in the cluster to manage read/write coherency (Diagram: cluster-shareable disk group accessed by all nodes)
Cluster Volume Manager • Simultaneous access to volumes from multiple hosts • Common logical device name • Presents the same virtualized storage to each host • Managed from any host in the cluster - updates seen by all nodes • Only raw device access supported from CVM • Volumes remain accessible from other hosts after a single host failure • Failover does not require volume migration
Cluster File System Architecture • Shared disk storage • Asymmetric File Manager/Client architecture • Global Lock Manager (GLM) for cache coherency • Redundant heartbeat links based upon VERITAS Cluster Server • Failover of the File Manager maintains CFS availability • Built on VERITAS File System • Solaris, HP-UX and NT (Diagram: nodes connected over LAN/WAN with shared disk)
CFS - Major Features • Large files • Quick I/O • Storage Checkpoints/BLIB • Cluster-wide freeze/thaw of mounted file systems • Support for both cluster mounts and local mounts • Concurrent I/O from all nodes • Cluster reconfiguration and failover support • Rolling upgrades • VxFS and CFS can co-exist
Metadata Updates (Diagram: the client sends metadata updates to the File Manager over the public network; the File Manager applies the metadata updates to the shared disk)
Data Flow (Diagram: data reads/writes go directly from each node to the shared disk rather than across the public network)
VERITAS Cluster File System (Diagram: the Application and the CFS Administrator, handling file system administration, sit above the Cluster File System; the GLM provides I/O coherency management, GAB provides inter-node messaging, and shared device access sits below)
CFS Components • Global Atomic Broadcast Messaging (GAB) • Used by CFS and GLM for cluster-wide messaging • Provides cluster membership service • Low Latency Transport (LLT) • Transport for cluster-wide messaging • Implemented using DLPI over Ethernet • Supports redundant links
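To make the redundant-links idea concrete, here is a minimal sketch of a heartbeat monitor that sends and watches for beats on two private links and only suspects the peer when every link has gone silent. This is illustrative only: the class name, addresses, and timing constants are hypothetical, and the real LLT/GAB stack runs over DLPI on Ethernet rather than UDP sockets.

```python
# Illustrative only: a toy heartbeat sender/monitor over redundant links.
# The real LLT runs over DLPI/Ethernet and GAB layers membership and atomic
# broadcast on top of it; everything below is a simplified sketch.
import socket, threading, time

HEARTBEAT_INTERVAL = 0.5      # seconds between beats (hypothetical tuning)
PEER_TIMEOUT = 2.0            # suspect the peer after this long with no beat

class RedundantHeartbeat:
    def __init__(self, node_name, links):
        # links: list of ((local_ip, local_port), (peer_ip, peer_port)) pairs,
        # one entry per private interconnect
        self.node_name = node_name
        self.links = links
        self.last_seen = [time.monotonic()] * len(links)
        self.socks = []
        for (local_ip, local_port), _peer in links:
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            s.bind((local_ip, local_port))
            s.settimeout(HEARTBEAT_INTERVAL)
            self.socks.append(s)

    def run(self):
        threading.Thread(target=self._send_loop, daemon=True).start()
        threading.Thread(target=self._recv_loop, daemon=True).start()

    def _send_loop(self):
        while True:
            for sock, (_local, peer) in zip(self.socks, self.links):
                sock.sendto(self.node_name.encode(), peer)
            time.sleep(HEARTBEAT_INTERVAL)

    def _recv_loop(self):
        while True:
            for i, sock in enumerate(self.socks):
                try:
                    sock.recvfrom(64)
                    self.last_seen[i] = time.monotonic()
                except socket.timeout:
                    pass
            # Only when *all* redundant links are silent is the peer suspected
            # dead, which is when a membership change would be triggered.
            if all(time.monotonic() - t > PEER_TIMEOUT for t in self.last_seen):
                print("peer unreachable on all links - membership change needed")
```

A node might be started as, for example, RedundantHeartbeat('node-a', [(('192.168.10.1', 5001), ('192.168.10.2', 5001)), (('192.168.20.1', 5002), ('192.168.20.2', 5002))]).run(), with one entry per private link; the addresses and port numbers are placeholders.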
Global Lock Manager • The GLM is: • High performance - requires minimal inter-node messaging • Fully featured - includes capabilities required by the most complex application environments • Straightforward to use
Global Lock Manager Locking Architecture • Locking performed in the client • Locks are distributed among clients • Locking nodes communicate lock status to other nodes in the cluster • Multiple lock modes supported - exclusive or shared • Lock requests may be queued
GLM - Modes • There are four lock levels, each allowing a different level of access to the resource: • Null (NL): Grants no access to the resource; a placeholder mode compatible with all other modes • Share (SH): Grants read access; allows other readers • Update (UP): Grants write access; allows other writers • Exclusive (EX): Grants exclusive access; no other accessors; compatible only with NL mode
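A minimal sketch of how these modes might be checked for compatibility follows. The table is an assumption inferred from the descriptions above (SH coexists with SH, EX coexists only with NL); the actual VxFS GLM compatibility matrix is not spelled out on this slide, so entries marked as assumed are just that.

```python
# Illustrative lock-mode compatibility check for the four GLM levels above.
# The matrix is inferred from the mode descriptions, not taken from VxFS GLM
# documentation; SH/UP compatibility in particular is assumed, not documented.
from enum import Enum

class Mode(Enum):
    NL = "null"        # placeholder: compatible with everything
    SH = "share"       # read access, shared with other readers
    UP = "update"      # write access
    EX = "exclusive"   # sole accessor

COMPAT = {
    Mode.NL: {Mode.NL, Mode.SH, Mode.UP, Mode.EX},  # NL conflicts with nothing
    Mode.SH: {Mode.NL, Mode.SH},                    # readers share with readers
    Mode.UP: {Mode.NL, Mode.UP},                    # assumed: UP shares only with UP
    Mode.EX: {Mode.NL},                             # exclusive: only NL may coexist
}

def can_grant(held, requested):
    """True if a new request is compatible with every lock already granted."""
    return all(requested in COMPAT[mode] for mode in held)

# Matches the lock-processing slides that follow: two SH readers coexist on
# Dev123:I50, but an EX request forces the SH grants to be revoked first.
assert can_grant([Mode.SH], Mode.SH)
assert not can_grant([Mode.SH, Mode.SH], Mode.EX)
```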
GLM - Lock Processing: shared read (Diagram): Node A sends Lock SH for Dev123:I50 to the lock master, which grants SH (grant list: (A,SH)); A starts and finishes its read. Node B then sends Lock SH; the master grants SH (grant list: (A,SH), (B,SH)); B starts and finishes its read.
GLM - Lock Processing: exclusive write (Diagram): With (A,SH), (B,SH) granted on Dev123:I50, Node A starts a write and requests EX; the master revokes B to NL, B purges its cached data and releases to NL, and the master grants (A,EX); A finishes its write. Node B then starts a write and requests EX; the master revokes A to NL, A flushes its data and releases to NL, and the master grants (B,EX); B finishes its write.
GLM - Lock Processing: downgrade (Diagram): With (B,EX) granted on Dev123:I50, Node A starts a read and requests SH; the master revokes B down to SH, B holds its cached data and releases to SH, and the master grants SH to A (grant list: (B,SH), (A,SH)); A finishes its read.
GLM - Master Recovery (Diagram): With (B,SH), (A,SH) granted on Dev123:I50: 1. Lock master fails 2. Elect a new lock master 3. Gather client lock states (rebuilding the grant list (B,SH), (A,SH)) 4. Resume CFS operations
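To tie the four sequence examples together, here is a simplified, single-process model of the lock-master role: compatible requests are granted directly, conflicting holders are asked to flush or purge and release before the new grant, and recovery rebuilds the grant table from states reported by surviving clients. The class and function names are hypothetical; this is a sketch of the protocol shown on the slides, not the VxFS GLM implementation.

```python
# Toy model of the GLM lock-master flow from the preceding slides. The real
# GLM distributes lock mastering across nodes and messages over GAB; none of
# that is modeled here.
class LockMaster:
    def __init__(self, resource):
        self.resource = resource      # e.g. "Dev123:I50" from the slides
        self.grants = {}              # node -> mode ("SH" or "EX")

    def request(self, node, mode, revoke_cb):
        """Grant if compatible; otherwise revoke conflicting holders first."""
        if self._compatible(node, mode):
            self.grants[node] = mode
            return "granted"
        # Revoke/Release exchange: each conflicting holder flushes or purges
        # its cached data (the callback) and its grant drops to NL.
        for holder, held in list(self.grants.items()):
            if holder != node and not self._pair_ok(held, mode):
                revoke_cb(holder)
                del self.grants[holder]
        self.grants[node] = mode
        return "granted after revoke"

    def release(self, node):
        self.grants.pop(node, None)   # drop back to NL

    def _compatible(self, node, mode):
        return all(self._pair_ok(held, mode)
                   for holder, held in self.grants.items() if holder != node)

    @staticmethod
    def _pair_ok(held, requested):
        return held == "SH" and requested == "SH"   # only SH/SH coexist here


def recover(resource, client_reports):
    """Master recovery as on this slide: a new master is elected, client lock
    states are gathered, and the grant table is rebuilt before CFS resumes."""
    new_master = LockMaster(resource)
    for report in client_reports:         # each report maps node -> mode
        new_master.grants.update(report)
    return new_master


# Mirrors the earlier sequences: two SH readers, then an EX writer.
master = LockMaster("Dev123:I50")
master.request("A", "SH", revoke_cb=lambda n: None)       # grant (A,SH)
master.request("B", "SH", revoke_cb=lambda n: None)       # grant (A,SH),(B,SH)
master.request("A", "EX", revoke_cb=lambda n: print(f"revoke {n} to NL"))
# After the old master fails, a new one rebuilds the table from client reports.
master = recover("Dev123:I50", [{"A": "EX"}])
```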
CFS/CVM - Applications • Applications requiring high availability • Storage concurrently mounted; IP failed over • Applications partitioned for scale • Web Servers - read-mostly/load balanced • Databases - mostly use direct I/O or Quick I/O • Parallel applications • Oracle Parallel Server (OPS) database • Second Host Backup - a dedicated system reads/writes data from/to offline media and offloads the application server
Higher Availability • Oracle Failover: detect failure, import disk group, mount file system, initiate Oracle, Oracle log recovery, fail over IP addresses • Faster failover: eliminate DG import & FS mount; Oracle fast recovery mode (Diagram: Payroll, Sales, Accounts and Inventory instances redistributed across servers after failover)
File & Print • CFS operation: file system defined globally; shared files located on any available disk; no failover needed after server shutdown • Standard shared operation: NFS mounts defined on one system at a time; NFS files locked to explicitly defined disks; failover after server loss (Diagram: NFS exports D, E, F, G must fail over between servers, while CFS mounts D, E, F, G are active on both servers)
Web Servers (Read-mostly) • Standard shared-nothing operation: web database is replicated on two disks - one for each web server; double the disk space and regular copy operations required; alternative - an NFS server • CFS operation: no web database replication required; saves disk space with no synchronization problems; no NFS server required (Diagram: replicated web data per server vs. a single shared copy)
Database Servers (Read/Write) • Standard shared-nothing operation: database is partitioned on two disks (A-M, N-Z) - each server accessing half the data; tedious to manage, and partitioning becomes increasingly impractical as the cluster grows; difficult to load balance • CFS operation: partitioning separated from file system configuration (A-Z shared); cluster can grow to multiple servers easily; simple to manage and load balance
Oracle Parallel Server • Standard operation: shared array or CVM, run on raw disk; difficult to manage storage - file-system-based backup/replication doesn't work • CFS operation: separate nodes of OPS share the same volumes and file system; cluster can grow to multiple servers without repartitioning; simple to manage and load balance; requires cluster membership coordination with OPS
Off-host backup with CFS (Diagram: the application server runs Oracle over the Cluster File System on shared disks; Storage Checkpoints (Checkpoint 1, Checkpoint 2) of the live data feed hot and incremental backups, which the backup server writes out with NetBackup)
VERITAS Cluster File System • Built on foundation of existing VERITAS technologies • VERITAS File System • VERITAS Cluster Server • Leverages investment in SAN • Concurrent access to shared storage • Higher levels of storage availability, performance and manageability