
Continuously Available File Server: Under the Hood



Presentation Transcript


  1. WSV410 Continuously Available File Server: Under the Hood Claus Joergensen Principal Program Manager Microsoft Corporation

  2. Agenda • Remote File Storage for Server Applications • Scale-Out File Server for application data • Setup and configuration • Cluster Shared Volumes • Scale-Out File Server cluster group • Scale-Out File Server scalability • SMB Transparent Failover • This session assumes familiarity with: • Windows Server 2008 R2 Failover Clustering, including Cluster Shared Volumes • Windows Server 2008 R2 File Server

  3. Remote File Storage for Server Applications • New scenario in Windows Server 2012 • Server apps storing data files on file shares • Examples: • Hyper-V VHDs, configuration files, snapshots, etc. • SQL Server database and log files • IIS content and configuration files • Benefits: • Easy provisioning and management • Share management instead of LUNs and zoning • Flexibility • Dynamically relocate servers in the datacenter without reconfiguring network or storage access • Leverage network investments • Specialized storage networking infrastructure or knowledge is not required • Lower CapEx and OpEx • (Diagram: SQL Server, Hyper-V, and IIS application servers storing data on clustered file servers backed by shared storage)

  4. Scale-Out File Server for Application Data • New clustered file server • Targeted for server app storage • Key capabilities*: • Dynamic scaling with active-active file shares • Fault tolerance with zero downtime • Fast failure recovery • Cluster Shared Volumes cache • CHKDSK with zero downtime • Application-consistent snapshots • Support for RDMA-enabled networks • Simpler management • Requirements: • Windows Failover Cluster with Cluster Shared Volumes • Both the application server and the file server cluster must be running Windows Server 2012 • (Diagram: application servers connecting over the datacenter network (Ethernet, InfiniBand, or a combination) to a single logical file server (\\fs\share) with a single file system namespace on Cluster Shared Volumes) *) Capabilities highlighted in orange are unique to Scale-Out File Server

  5. Setup and Configuration

  6. Setup and Configuration • Install the necessary roles and features on all nodes • File Server role • Failover Clustering feature • Create the cluster • No special requirements • Add cluster disks to Cluster Shared Volumes • Configure networks for: • Client Access Points (CAP) • Cluster Shared Volumes (CSV) • Create the File Server role • Select "Scale-Out File Server for application data" • Give it a network name • Create file shares

  7. Windows PowerShell Example

# Install roles and features
Import-Module ServerManager
Add-WindowsFeature -Name File-Services, Failover-Clustering, RSAT-Clustering

# Create the failover cluster
New-Cluster -Name smbclu -Node FSF-260403-07, FSF-260403-08, FSF-260403-09

# Add Cluster Disk 1 to Cluster Shared Volumes
Add-ClusterSharedVolume -Name "Cluster Disk 1"

# Configure Cluster Network 1 for client access and Cluster Network 2 for CSV (may not be needed)
(Get-ClusterNetwork -Name "Cluster Network 1").Role = 3
(Get-ClusterNetwork -Name "Cluster Network 2").Role = 1

# Create the Scale-Out File Server
Add-ClusterScaleOutFileServerRole -Name smbsofs

# Create a file share
New-SmbShare -Name vm1 -Path C:\ClusterStorage\Volume1\vm1 -FullAccess domain\hvhost$

  8. Cluster Shared Volumes

  9. Cluster Shared Volumes File System • Fundamental to and required for Scale-Out File Servers • Scale-out file shares require CSVFS paths • Supports VSS for SMB file shares • CSVFS supports most NTFS features and operations • Detailed information is available with the Windows Server 2012 Release Candidate documentation • Direct I/O support for file data access • Caching of CSVFS file data (controlled by oplocks) • Redirects I/O for metadata operations to the coordinator node • Redirects I/O for data operations when a file is accessed simultaneously by multiple CSVFS instances • Leverages SMB Direct and SMB Multichannel for internode communication
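To confirm which paths are CSVFS (and therefore usable for scale-out shares), the cluster shared volumes and their mount points under C:\ClusterStorage can be enumerated. A minimal sketch, assuming the FailoverClusters PowerShell module is available on a cluster node:

```powershell
# List each CSV, its current coordinator (owner) node, and its CSVFS mount point
Get-ClusterSharedVolume |
    Select-Object Name, OwnerNode, State,
        @{Name = "Path"; Expression = { $_.SharedVolumeInfo.FriendlyVolumeName }}
```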

  10. Cluster Shared Volumes Caching: Improve CSV I/O Performance • Windows Cache Manager integration • Buffered read/write I/O is cached the same way as on NTFS • Cluster Shared Volumes Block Cache • Read-only cache for unbuffered I/O • I/O which is excluded from the Windows Cache Manager • Distributed cache guaranteed to be consistent across the cluster • Significant value for pooled-VM VDI scenarios • Enabling the CSV Block Cache: • SharedVolumeBlockCacheSizeInMB (cluster common property) • 0 = disabled • Non-zero = the amount of RAM in MB to be used for the cache on each cluster node • Recycling the resource is not needed • CsvEnableBlockCache (Physical Disk resource private property) • 0 = disabled (default) • 1 = enabled for that cluster shared volume • Requires recycling the resource to take effect
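Putting the two properties above together, the block cache might be configured as follows; the 512 MB size and the disk name are illustrative, not recommendations:

```powershell
# Cluster common property: cache size in MB per node (0 disables the cache);
# no resource recycling is required for this one
(Get-Cluster).SharedVolumeBlockCacheSizeInMB = 512

# Physical Disk private property: enable block caching for one CSV;
# the resource must be taken offline/online for this to take effect
Get-ClusterSharedVolume -Name "Cluster Disk 1" |
    Set-ClusterParameter -Name CsvEnableBlockCache -Value 1
```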

  11. CHKDSK with Cluster Shared Volumes • CHKDSK is seamless with CSV • CHKDSK is significantly improved, with scanning (online) separated from repair (offline) • With CSV, repair is online as well • CHKDSK processing with CSV: • Cluster checks (once a minute) to see if CHKDSK (spotfix) is required • Cluster enumerates NTFS $corrupt to identify affected files • Cluster pauses the affected CSV file system (CSVFS) to pend I/O • The underlying NTFS volume is dismounted • CHKDSK (spotfix) is run against affected files for a maximum of 15 seconds to ensure applications are not affected • The underlying NTFS volume is mounted and the CSV namespace is un-paused • If CHKDSK (spotfix) did not process all records • Cluster will wait 3 minutes before continuing • Enables a large set of affected files to be processed over time • If corruption is too large • CHKDSK (spotfix) is not run and is marked to run at the next Physical Disk online
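The scan/repair split described above can also be driven manually with CHKDSK's new switches; a sketch, with an illustrative volume path:

```powershell
# Online scan: detects corruptions and queues them in NTFS $corrupt
chkdsk C:\ClusterStorage\Volume1 /scan

# Spotfix: brief pause of the volume while only the queued records are repaired
chkdsk C:\ClusterStorage\Volume1 /spotfix
```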

  12. Anatomy of a Scale-Out File Server

  13. Scale-Out File Server group
• Contains:
  • Distributed Server Name
  • Scale-Out File Server
• Group type: ScaleoutFileServer
• Resource types:
  • Scale Out File Server
  • Distributed Network Name

Get-ClusterGroup | ? {$_.GroupType -eq "ScaleoutFileServer"} | FL Name, OwnerNode, State, GroupType

Name      : smbsofs33
OwnerNode : FSF-260403-07
State     : Online
GroupType : ScaleoutFileServer

Get-ClusterGroup | ? {$_.GroupType -eq "ScaleoutFileServer"} | Get-ClusterResource

Name                   State    OwnerGroup   ResourceType
----                   -----    ----------   ------------
Scale-Out File Server  Online   smbsofs33    Scale Out File Server
smbsofs33              Online   smbsofs33    Distributed Network Name

  14. Distributed Network Name (DNN)
• Client Access Point (CAP) for a Scale-Out File Server: a DNS name on the network
• Security
  • Creates and manages the computer object in AD
  • Registers credentials with LSA on each node
• DNS
  • Registers the CAP with DNS
  • Registers node IP addresses for all nodes
  • Does not use virtual IP addresses
• DNN updates DNS when:
  • The DNN resource comes online, and every 24 hours
  • A node is added to or removed from the cluster
  • A cluster network is added or removed as a client network
  • An IP address changes
• If not using dynamic DNS, you must manually add the DNS records with the node IPs for the cluster networks enabled for client access, for each node

> nslookup smbsofs33
Server:  stb-red-dc-01.stbtest.microsoft.com
Address: 10.200.81.201

Non-authoritative answer:
Name: smbsofs33.ntdev.corp.microsoft.com
Addresses: 2001:4898:0:fff:0:5efe:10.217.108.49
           2001:4898:0:fff:0:5efe:10.217.108.103
           2001:4898:0:fff:0:5efe:10.217.108.148
           10.217.108.148
           10.217.108.49
           10.217.108.103

(IPs are on the same subnet, one for each node.)
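When dynamic DNS is unavailable, the per-node records can be added by hand; a hedged sketch using the Windows Server 2012 DnsServer module, where the zone name, host name, and addresses are illustrative and reuse the examples above:

```powershell
Import-Module DnsServer

# One A record per node IP on the client-access networks, all under the DNN name
"10.217.108.49", "10.217.108.103", "10.217.108.148" | ForEach-Object {
    Add-DnsServerResourceRecordA -ZoneName "ntdev.corp.microsoft.com" `
        -Name "smbsofs33" -IPv4Address $_
}
```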

  15. Distributed Network Name (DNN)
• DNS will round robin client DNS lookups, as two consecutive lookups show:

> nslookup smbsofs33
Server:  stb-red-dc-01.stbtest.microsoft.com
Address: 10.200.81.201

Non-authoritative answer:
Name: smbsofs33.ntdev.corp.microsoft.com
Addresses: 2001:4898:0:fff:0:5efe:10.217.108.49
           2001:4898:0:fff:0:5efe:10.217.108.103
           2001:4898:0:fff:0:5efe:10.217.108.148
           10.217.108.148
           10.217.108.49
           10.217.108.103

> nslookup smbsofs33
Server:  stb-red-dc-01.stbtest.microsoft.com
Address: 10.200.81.201

Non-authoritative answer:
Name: smbsofs33.ntdev.corp.microsoft.com
Addresses: 2001:4898:0:fff:0:5efe:10.217.108.103
           2001:4898:0:fff:0:5efe:10.217.108.148
           2001:4898:0:fff:0:5efe:10.217.108.49
           10.217.108.49
           10.217.108.148
           10.217.108.103

• DNS sorts IPv6 and IPv4 addresses separately and concatenates them with IPv6 at the top
• The SMB client is resilient to unavailable IPs
  • Attempts to connect to the first IP address
  • After 1 second, the client attempts the next 7 IP addresses
  • If any of the previous attempts fail, the client attempts the next IP address
  • The client continues until it reaches the end of the list
  • The client proceeds with the first server to respond
• SMB client
  • Connects to one and only one cluster node for a given scale-out file server
  • Can connect to different cluster nodes for each scale-out file server
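On the application server, the SMB cmdlets show which cluster node the client actually settled on for each scale-out name; one way to observe the one-node-per-name behavior (server names follow the earlier examples):

```powershell
# Each scale-out file server name should appear with exactly one ServerName;
# different names may land on different cluster nodes
Get-SmbConnection |
    Select-Object ServerName, ShareName, Dialect, NumOpens
```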

  16. Scale Out File Server (SOFS)
• The Scale Out File Server resource is responsible for:
  • Bringing scale-out file shares online on each node
  • Listening for scale-out share creations, deletions, and changes
  • Replicating changes to the other nodes
  • Ensuring consistency across all nodes for the Scale-Out File Server
• Implemented using cluster clone resources
  • All nodes run a SOFS clone
  • The clones are started and stopped by the SOFS leader
  • The SOFS leader runs on the node where the Scale Out File Server resource is online

  17. Scale-Out File Server group behavior • The group is online on one of the nodes • Moving the group • Moves the responsibility for coordination • Does not affect the availability of the name or shares • Admin can constrain which cluster nodes can be used • Modify the "possible owners" list for the DNN and SOFS resources • Useful if some nodes must be reserved for other workloads
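Constraining the possible owners might look like the following sketch (the node names reuse the earlier examples):

```powershell
# Restrict the DNN and SOFS resources to two of the cluster nodes,
# reserving the remaining nodes for other workloads
Get-ClusterGroup |
    Where-Object { $_.GroupType -eq "ScaleoutFileServer" } |
    Get-ClusterResource |
    Set-ClusterOwnerNode -Owners FSF-260403-07, FSF-260403-08
```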

  18. Client Redirection
• SMB clients are distributed at initial connect through DNS round robin
• SMB clients are not redistributed automatically
• SMB clients connected to a Scale-Out File Server can be redirected to use a different cluster node
(Diagram: SQL Server with SMB communication to Node A and witness communication to Node B of a three-node Scale-Out File Server cluster)

Get-SmbWitnessClient | FL ClientName, FileServerNodeName, WitnessNodeName

ClientName         : SQLServer
FileServerNodeName : A
WitnessNodeName    : B

Move-SmbWitnessClient -ClientName SQLServer -DestinationNode C

  19. Cluster Network Planning • SMB client to SMB server • Use cluster networks enabled for client access • If using multiple network adapters, each must be on a separate IP subnet • CSV traffic • Metadata updates • Infrequent for Hyper-V and SQL Server workloads • Mirrored Storage Spaces • No storage connectivity • Prefers cluster networks not enabled for client access • Leverages SMB Multichannel and SMB Direct (SMB over RDMA) • Disable iSCSI networks for cluster use, to prevent unpredictable latencies • (Diagram: traffic classes: SMB client to SMB server, CSV metadata, mirrored Spaces, storage link failures, and storage I/O over FC/iSCSI/SAS)

  20. Scale-Out File Server Scalability and Performance

  21. Test Bed Topology • SMB clients • 8 computers, each with 2x10Gbps • Scale-Out File Server cluster • 8 nodes, each with 2x10Gbps • SAN storage • 2x8Gbps FC fabric to the file servers • 4x4Gbps FC fabric to storage • RAID 5 LUNs

  22. Bandwidth Scalability (preliminary results based on Windows Server 2012 Beta) • IOMeter parameters: • 512KiB IO size • 100% sequential read • 1 thread, 144 outstanding IOs • Bottlenecked on the 2x4Gbps FC fabric • Remote was within ~2% of local

  23. Hyper-V Boot-Storm (preliminary results based on Windows Server 2012 RC) • Local vs. remote; CSV cache enabled vs. disabled • Uses parent/diff VHDX • 8GB CSV block cache • Measured from VM state change to user logon complete • 320 virtual machines per host: 2,560 virtual machines (8 hosts), 5,120 virtual machines (16 hosts) • With the CSV cache enabled, 90% booted in <40s

  24. SMB Transparent Failover

  25. Historical: Windows Server 2008 R2 Failovers Are Not Transparent
• Targeted for traditional file server use scenarios
• Server applications expect storage to be continuously available
• In Windows Server 2008 R2, connection and file handles are lost on share failover, leading to:
  • Application disruption
  • Administrator intervention required to recover
• Failure sequence:
  1. Normal operation
  2. Share fails over; connections and handles are lost
  3. Administrator intervention needed to recover
(Diagram: SQL Server accessing \\fs1\share on a two-node file server cluster, Node A and Node B)

  26. Windows Server 2012: SMB Transparent Failover
• Failover is transparent to the server application
  • Zero downtime; only a small I/O delay during failover
• Supports planned and unplanned failovers
  • Hardware/software maintenance
  • Hardware/software failures
  • Load balancing / client redirection
• Resilient for both file and directory operations
• Interoperable with both types of clustered file servers:
  • Scale-Out File Server
  • "Classic" File Server
• Requires:
  • Windows Server 2012 Failover Cluster
  • SMB client with SMB 3.0
  • File shares configured with the Continuous Availability property (default)
• Failover sequence:
  1. Normal operation
  2. Failure occurs; connections and handles are lost, with a temporary stall of I/O
  3. Connections and handles are auto-recovered; application I/O continues with no errors
(Diagram: SQL Server accessing \\fs1\share on a two-node file server cluster, Node A and Node B)
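Because Transparent Failover depends on the share's Continuous Availability property, it is worth verifying or setting it explicitly; a sketch using the SMB cmdlets (the share name follows the earlier example):

```powershell
# Check whether the share is continuously available (the default for new
# shares on a Windows Server 2012 failover cluster)
Get-SmbShare -Name vm1 | Select-Object Name, ContinuouslyAvailable

# Enable it explicitly on an existing share
Set-SmbShare -Name vm1 -ContinuouslyAvailable $true -Force
```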

  27. SMB Transparent Failover: New Components (1/2)
• SMB client (redirector)
  • Client operation replay
  • End-to-end support for replayable and non-replayable operations
• SMB server
  • Support for network state persistence
  • Files are always opened write-through
(Diagram: SMB 3.0 between the SMB redirector (operation replay) and SMB server (state persistence), the Witness protocol between the Witness client and the SMB Server Witness Service, and the Resume Key Filter above the file system)

  28. SMB Transparent Failover: New Components (2/2)
• Resume Key Filter
  • Resumes handle state after planned or unplanned failover
  • Fences handle state information
• Witness protocol
  • Enables faster unplanned failover because clients do not wait for timeouts
  • Enables dynamic reallocation of load with Scale-Out File Servers
(Diagram: SMB 3.0 between the SMB redirector (operation replay) and SMB server (state persistence), the Witness protocol between the Witness client and the SMB Server Witness Service, and the Resume Key Filter above the file system)

  29. Resume Key Filter: Overview
• Resumes handle state after planned or unplanned failover
• Persists state information only for handles with a continuous availability context
• Installs with the Failover Clustering feature
• Sits in the file server file system stack
• Attaches to all cluster disks

  30. Resume Key Filter: Features (1/3)
• Protection of handle state so the client can reconnect
  • For example, needed when a failure occurs while the client has an exclusive no-share handle
  • Blocks new handle creation until the previously known handles are resumed or cancelled (timed out)
• Protection from namespace inconsistency
  • Needed when a failure occurs while a file rename is in flight

  31. Resume Key Filter: Features (2/3)
• Enables create replay
  • Needed when failover occurs while a FILE_CREATE is in flight
  • RKF records the pre-existence state for the file BEFORE the create is passed down to NTFS
  • After failover, the client re-issues the create as a replay
  • On receipt of the replay, RKF figures out the correct processing for FILE_CREATE so that the client sees the correct result
  • Now exists: FILE_CREATE => FILE_OPEN, and the returned result is FILE_CREATED

  32. Resume Key Filter: Features (3/3)
• Restoration of Delete Pending state
  • Needed when a file has multiple handles open and has been marked for deletion when failover occurs
  • RKF holds the Delete Pending state above NTFS so that existing handles can be resumed after failover
• Handling for changes of the read-only attribute
  • Needed when the read-only attribute is changed with pre-existing writers
  • RKF undoes the RO attribute to allow the restoration of the prior granted access
• Opaque storage for remote file system specific data
  • E.g., SRV stores the information needed to resume byte-range locks

  33. Resume Key Filter: Volume instance attach

  34. SMB Witness: Overview
• Enables faster recovery from unplanned failures
  • SMB clients do not need to wait for TCP timeouts
• Enables dynamic reallocation of load with Scale-Out File Servers
  • Administrator can redirect an SMB client to a different cluster node
• Installs with the Failover Clustering feature
• Is a service and runs on all cluster nodes
• Not to be confused with the Failover Cluster File Share Witness

  35. SMB Witness: Registration Process
1. The SMB client connects to \\fs1\share on Node A and notifies the Witness client
2. The Witness client obtains the list of cluster members from the Witness service on Node A
3. The Witness client removes the data node (Node A) and selects a witness server (Node B)
4. The Witness client registers with Node B for notification of events for \\fs1
5. The Witness server on Node B registers with the cluster infrastructure for event notification on \\fs1
(Diagram: SQL Server with SMB communication to Node A and witness communication to Node B of the file server cluster)

  36. SMB Witness: Notification Process
1. Normal operation: SMB connection with Node A, witness connection with Node B
2. An unplanned failure occurs on Node A
3. The cluster infrastructure notifies the Witness server on Node B
4. The Witness server on Node B notifies the Witness client that Node A went offline
5. The Witness client notifies the SMB client
6. The SMB client drops its connection to Node A and starts reconnecting to another cluster node (Node B)
7. The Witness client attempts to select a new witness server
(Diagram: SQL Server with SMB communication failing over from Node A to Node B of the file server cluster)

  37. Enhanced and New Event Logs
• Application and Services – Microsoft – Windows – SMBClient
• Application and Services – Microsoft – Windows – SmbServer
• Application and Services – Microsoft – Windows – ResumeKeyFilter
• Application and Services – Microsoft – Windows – SMBWitnessClient
• Application and Services – Microsoft – Windows – SMBWitnessService
(Screenshot: example SMB Transparent Failover events)
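These channels can be queried with Get-WinEvent when tracing a failover; a minimal sketch (the "/Operational" suffix is the conventional channel name and may vary by component):

```powershell
# Most recent SMB client events, e.g. around a transparent failover
Get-WinEvent -LogName "Microsoft-Windows-SMBClient/Operational" -MaxEvents 20 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message
```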

  38. Demo: Scale-Out File Server (Claus Joergensen, Principal Program Manager, Windows File Server Team)

  39. The TechEd Cluster-in-a-Box Demo Stack
• Cluster-in-a-Box prototypes
  • Quanta
  • Wistron
• LSI HA-DAS MegaRAID® and SAS controllers
• Quanta application servers, JBOD expansion, and 10GbE switch
• Mellanox IB FDR NICs and switch
• OCZ SAS SSDs
• Infrastructure
  • Domain controller server
  • Power distribution unit
  • 1GbE switch
  • Keyboard & monitor
MegaRAID® is a registered trademark of LSI Corporation

  40. Remote File Storage for Server Applications • New scenario in Windows Server 2012 • Server apps storing data files on file shares • Examples: • Hyper-V VHDs, configuration files, snapshots, etc. • SQL Server database and log files • IIS content and configuration files • Benefits: • Easy provisioning and management • Share management instead of LUNs and zoning • Flexibility • Dynamically relocate servers in the datacenter without reconfiguring network or storage access • Leverage network investments • Specialized storage networking infrastructure or knowledge is not required • Lower CapEx and OpEx • (Diagram: SQL Server, Hyper-V, and IIS application servers storing data on clustered file servers backed by shared storage)

  41. Related Content • Breakout Sessions • VIR306 Hyper-V over SMB2: Remote File Storage Support in Windows Server 2012 Hyper-V • WSV303 Windows Server 2012 High-Performance, Highly-Available Storage Using SMB • WSV310 Windows Server 2012: Cluster-in-a-Box, RDMA, and More • WSV314 Windows Server 2012 NIC Teaming and Multichannel Solutions • WSV322 Update Management in Windows Server 2012: Revealing Cluster-Aware Updating • WSV330 How to Increase SQL Availability and Performance Using Windows Server 2012 SMB 3.0 Solutions • WSV334 Windows Server 2012 File and Storage Services Management

  42. SIA, WSV, and VIR Track Resources • #TE(sessioncode) • Talk to our experts at the TLC • Hands-On Labs • DOWNLOAD Windows Server 2012 Release Candidate: microsoft.com/windowsserver • DOWNLOAD Windows Azure: windowsazure.com/teched

  43. Resources • Connect. Share. Discuss. http://northamerica.msteched.com • Microsoft Certification & Training Resources: www.microsoft.com/learning • TechNet, resources for IT Professionals: http://microsoft.com/technet • MSDN, resources for Developers: http://microsoft.com/msdn

  44. Complete an evaluation on CommNet and enter to win!

  45. Please Complete an Evaluation • Your feedback is important! • Multiple ways to evaluate sessions • Be eligible to win great daily prizes and the grand prize of a $5,000 travel voucher! • Scan the tag to evaluate this session now on myTechEd Mobile

  46. © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

  47. Appendix

  48. SMB Transparent Failover Semantics (1/2). Server side: state persistence until the client reconnects
• The server obeys a contract with the client to ensure replay of operations is transparent to the application
• All race conditions are cleanly addressed
• Protocol documentation will fully define the behavior
(Diagram: SMB 3.0 between the SMB2 redirector (operation replay) and SMB2 server (state persistence), with the Resume Key Filter above the file system)

  49. SMB Transparent Failover Semantics (2/2). Client side: state recovery
• The client obeys a contract with the server to ensure replay of operations is transparent to the application
• All race conditions are cleanly addressed
• Protocol documentation will fully define the behavior
(Diagram: SMB 3.0 between the SMB2 redirector (operation replay) and SMB2 server (state persistence), with the Resume Key Filter above the file system)
