DB203 - Windows Server 2012 R2 & SQL Server 2014 Infrastruktur • Michael Frandsen • Principal Consultant, MentalNote • michaelf@mentalnote.dk
Agenda • SQL Server storage challenges • The SAN legacy • Traditional interconnects • SMB past • New ”old” interconnects • File Shares – the new Black • “DIY” Shared Storage • Microsoft vNext
Bio - Michael Frandsen • I have worked in the IT industry for just over 21 years, 17 of them as a consultant. My typical clients are Fortune 500 companies, most of them global corporations. • I have a close relationship with Microsoft R&D in Redmond: with the Windows team for 19 years, ever since the first beta of Windows NT 3.1, and with SQL Server for 18 years, since the first version Microsoft did by themselves, v4.21a. I hold various advisory positions in Redmond and am involved in vNext versions of Windows, Hyper-V, SQL Server and Office/SharePoint. • Specialty areas: • Architecture & design • High performance • Storage • Low latency • Kerberos • Scalability (scale-up & scale-out) • Consolidation (especially SQL Server) • High availability • VLDB • Data Warehouse platforms • BI platforms • High Performance Computing (HPC) clusters • Big Data platforms & architecture
SQL Server storage challenges • Capacity • Fast • Shared • Reliable
The SAN legacy • Because it’s expensive … it must be fast • SAN Vendor sales pitch • SAN typical • SAN non-match
The SAN legacy • Shared storage or Direct Attached SAN
The SAN legacy • Widespread misconception
The SAN legacy • Complex stack [diagram: SQL Server and Windows on the CPU cores → MPIO with vendor DSM and multipath algorithm → dual FC HBAs → FC switch with WWN zoning → storage controller (A/B service processors, cache, XOR engine, SCSI controller port logic) → LUNs → disks; every hop has its own limit: SQL Server read-ahead rate, CPU feed rate, HBA port rate, switch port rate, SP port rate, LUN read rate, disk feed rate]
SAN Bottleneck • Typical SAN load: low to medium I/O processor load, low cache load, low disk spindle load
SAN Bottleneck • Typical Data Warehouse / BI / VLDB SAN load: I/O processors maxed out, high cache load, low disk spindle load
SAN Bottleneck • Ideal Data Warehouse / BI / VLDB SAN load: low to medium I/O processor load, low to medium cache load, high disk spindle load
Traditional interconnects • Fibre Channel • Stalled at 8Gb/s for many years • 16Gb/s FC still very exotic • Strong movement towards FCoE (Fibre Channel over Ethernet) • iSCSI • Started in low-end storage arrays • Many still 1Gb/s • 10Gb/E storage arrays typically have few ports compared to FC • NAS • NFS, SMB, etc.
File Share reliability Is this mission critical technology?
SMB 1.0 - 100+ Commands • Protocol negotiation, user authentication and share access (NEGOTIATE, SESSION_SETUP_ANDX, TRANS2_SESSION_SETUP, LOGOFF_ANDX, PROCESS_EXIT, TREE_CONNECT, TREE_CONNECT_ANDX, TREE_DISCONNECT) • File, directory and volume access (CHECK_DIRECTORY, CLOSE, CLOSE_PRINT_FILE, COPY, CREATE, CREATE_DIRECTORY, CREATE_NEW, CREATE_TEMPORARY, DELETE, DELETE_DIRECTORY, FIND_CLOSE, FIND_CLOSE2, FIND_UNIQUE, FLUSH, GET_PRINT_QUEUE, IOCTL, IOCTL_SECONDARY, LOCK_AND_READ, LOCK_BYTE_RANGE, LOCKING_ANDX, MOVE, NT_CANCEL, NT_CREATE_ANDX, NT_RENAME, NT_TRANSACT, NT_TRANSACT_CREATE, NT_TRANSACT_IOCTL, NT_TRANSACT_NOTIFY_CHANGE, NT_TRANSACT_QUERY_QUOTA, NT_TRANSACT_QUERY_SECURITY_DESC, NT_TRANSACT_RENAME, NT_TRANSACT_SECONDARY, NT_TRANSACT_SET_QUOTA, NT_TRANSACT_SET_SECURITY_DESC, OPEN, OPEN_ANDX, OPEN_PRINT_FILE, QUERY_INFORMATION, QUERY_INFORMATION_DISK, QUERY_INFORMATION2, READ, READ_ANDX, READ_BULK, READ_MPX, READ_RAW, RENAME, SEARCH, SEEK, SET_INFORMATION, SET_INFORMATION2, TRANS2_CREATE_DIRECTORY, TRANS2_FIND_FIRST2, TRANS2_FIND_NEXT2, TRANS2_FIND_NOTIFY_FIRST, TRANS2_FIND_NOTIFY_NEXT, TRANS2_FSCTL, TRANS2_GET_DFS_REFERRAL, TRANS2_IOCTL2, TRANS2_OPEN2, TRANS2_QUERY_FILE_INFORMATION, TRANS2_QUERY_FS_INFORMATION, TRANS2_QUERY_PATH_INFORMATION, TRANS2_REPORT_DFS_INCONSISTENCY, TRANS2_SET_FILE_INFORMATION, TRANS2_SET_FS_INFORMATION, TRANS2_SET_PATH_INFORMATION, TRANSACTION, TRANSACTION_SECONDARY, TRANSACTION2, TRANSACTION2_SECONDARY, UNLOCK_BYTE_RANGE, WRITE, WRITE_AND_CLOSE, WRITE_AND_UNLOCK, WRITE_ANDX, WRITE_BULK, WRITE_BULK_DATA, WRITE_COMPLETE, WRITE_MPX, WRITE_MPX_SECONDARY, WRITE_PRINT_FILE, WRITE_RAW) • Other (ECHO, TRANS_CALL_NMPIPE, TRANS_MAILSLOT_WRITE, TRANS_PEEK_NMPIPE, TRANS_QUERY_NMPIPE_INFO, TRANS_QUERY_NMPIPE_STATE, TRANS_RAW_READ_NMPIPE, TRANS_RAW_WRITE_NMPIPE, TRANS_READ_NMPIPE, TRANS_SET_NMPIPE_STATE, TRANS_TRANSACT_NMPIPE, TRANS_WAIT_NMPIPE, TRANS_WRITE_NMPIPE) • 14 distinct WRITE operations ?!??
SMB 2.0 - 19 Commands • Protocol negotiation, user authentication and share access (NEGOTIATE, SESSION_SETUP, LOGOFF, TREE_CONNECT, TREE_DISCONNECT) • File, directory and volume access (CANCEL, CHANGE_NOTIFY, CLOSE, CREATE, FLUSH, IOCTL, LOCK, QUERY_DIRECTORY, QUERY_INFO, READ, SET_INFO, WRITE) • Other (ECHO, OPLOCK_BREAK) • TCP is a required transport • SMB2 no longer supports NetBIOS over IPX, NetBIOS over UDP or NetBEUI
SMB 2.1 • Performance improvements • Up to 1MB MTU to better utilize 10Gb/E • ! Disabled by default ! • Real benefit requires app support • Ex. Robocopy in W7 / 2K8R2 is multi-threaded • Defaults to 8 threads, range 1-128
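For example, a minimal sketch of tuning the thread count per Robocopy job (source and destination paths are hypothetical; /MT defaults to 8 threads and accepts 1-128):

```powershell
# Hypothetical shares; /E copies subdirectories (including empty ones), /MT:32 runs 32 copy threads
robocopy \\oldserver\data \\newserver\data /E /MT:32
```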
SQL Server SMB support • < 2008 • UNC paths could be enabled with a trace flag • Not an officially supported scenario • No support for system databases • No support for failover clustering • 2008 R2 • UNC paths fully supported by default • No support for system databases • No support for failover clustering
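As a hedged illustration of the pre-2008 workaround (assumption: trace flag 1807 is the flag in question; instance, share and database names are hypothetical), the UNC restriction could be lifted like this, with all the support caveats above:

```powershell
# UNSUPPORTED pre-2008 workaround: allow network database files via trace flag 1807 (assumed flag)
Invoke-Sqlcmd -ServerInstance "SQL01" -Query "DBCC TRACEON(1807, -1);"

# Create a user database whose files live on a UNC path (hypothetical share)
Invoke-Sqlcmd -ServerInstance "SQL01" -Query @"
CREATE DATABASE LegacyNetDb
ON (NAME = 'LegacyNetDb_data', FILENAME = '\\fileserver\sqldata\LegacyNetDb.mdf')
LOG ON (NAME = 'LegacyNetDb_log', FILENAME = '\\fileserver\sqldata\LegacyNetDb.ldf');
"@
```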
Two things happened • SQL Server 2012 • Windows Server 2012
SQL Server 2012 • UNC support expanded • System databases supported on SMB • Failover Clustering supports SMB as shared storage • … and in a failover cluster TempDB can now reside on non-shared (local) storage • Mark Souza commented: Great suggestion!
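A minimal sketch of what the expanded support enables (instance, share and database names are hypothetical); in SQL Server 2012 the data and log files can point straight at an SMB share, no trace flag required:

```powershell
# SQL Server 2012: user database stored on an SMB 3.0 file share (hypothetical names)
Invoke-Sqlcmd -ServerInstance "SQL01" -Query @"
CREATE DATABASE SalesDW
ON (NAME = 'SalesDW_data', FILENAME = '\\fs1\sqldata\SalesDW.mdf')
LOG ON (NAME = 'SalesDW_log', FILENAME = '\\fs1\sqldata\SalesDW.ldf');
"@
```

The SQL Server service account needs full control on both the share and the underlying NTFS folder.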
Windows Server 2012 • InfiniBand • NIC Teaming • SMB 3.0 • RDMA • Multichannel • SMB Direct
New “old” interconnects InfiniBand characteristics • Been around since 2001 • Used mainly for HPC clusters and Super Computing • High throughput • RDMA capable • Low latency • Quality of service • Failover • Scalable
InfiniBand throughput Network Bottleneck Alleviation: InfiniBand (“Infinite Bandwidth”) and High-speed Ethernet (10/40/100 GE) • Bit serial differential signaling • Independent pairs of wires to transmit independent data (called a lane) • Scalable to any number of lanes • Easy to increase clock speed of lanes (since each lane consists only of a pair of wires) • Theoretically, no perceived limit on the bandwidth
InfiniBand throughput Network Speed Acceleration with IB and HSE
InfiniBand throughput • Most commercial implementations use 4x lanes • 56Gb/s with 64/66-bit encoding • 6,8GB/s per port • SDR - Single Data Rate, DDR - Double Data Rate, QDR - Quad Data Rate, FDR - Fourteen Data Rate, EDR - Enhanced Data Rate, HDR - High Data Rate, NDR - Next Data Rate
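As a rough sanity check of that figure (assuming the usual FDR signalling rate of 14,0625 Gb/s per lane): 4 lanes × 14,0625 Gb/s = 56,25 Gb/s raw; 64/66-bit encoding leaves about 54,5 Gb/s of payload, which is roughly 6,8 GB/s per port.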
InfiniBand throughput • Trends in I/O Interfaces with Servers • PCIe Gen2 4x: 2GB/s Data Rate • 1,5GB/s Effective Rate • PCIe Gen2 8x: 4GB/s Data Rate • 3GB/s Effective Rate • (I/O links have their own headers and other overheads!)
InfiniBand throughput • Low-level Uni-directional Bandwidth Measurements • InfiniBand uses RDMA (Remote Direct Memory Access) • HSE can support RoCE (RDMA over Converged Ethernet) • RoCE makes a huge impact on small I/O
InfiniBand latency • Ethernet Hardware Acceleration • Interrupt Coalescing • Improves throughput, but degrades latency • Jumbo Frames • No latency impact; incompatible with existing switches • Hardware Checksum Engines • Checksum performed in hardware -> significantly faster • Shown to have minimal benefit independently • Segmentation Offload Engines (a.k.a. Virtual MTU) • Host processor “thinks” that the adapter supports large Jumbo frames, but the adapter splits it into regular sized (1500-byte) frames • Supported by most HSE products because of its backward compatibility -> considered “regular” Ethernet
InfiniBand latency • IB Hardware Acceleration • Some IB models have multiple hardware accelerators • E.g., Mellanox IB adapters • Protocol Offload Engines • Completely implement ISO/OSI layers 2-4 (link layer, network layer and transport layer) in hardware • Additional hardware supported features also present • RDMA, Multicast, QoS, Fault Tolerance, and many more
InfiniBand latency • HSE vs IB • Fastest 10Gb/E NICs 1-5 µs • Fastest 10Gb/E switch 2,3 µs • QDR IB 100 nanoseconds => 0,1 µs • FDR IB 160 nanoseconds => 0,16 µs - slight increase due to 64/66 encoding • Fastest HSE RoCE end to end 3+ µs • Fastest IB RDMA end to end <1 µs
InfiniBand latency • Links & Repeaters • Traditional adapters built for copper cabling • Restricted by cable length (signal integrity) • For example, QDR copper cables are restricted to 7m • Optical cables with copper-to-optical conversion hubs • Up to 100m length • 550 picoseconds copper-to-optical conversion latency • That’s 0,00055 µs or 0,00000055 ms
File Shares – the new Black Why file shares? • Massively increased stability • Cleaned up protocol • Transparent Failover between cluster nodes • with no service outage! • Massively increased functionality • Multichannel • RDMA and SMB Direct • Massively decreased complexity • No more MPIO, DSM, Zoning, HBA tuning, Fabric zoning etc.
New protocol - SMB 3.0 • Which SMB protocol version is used • The dialect is negotiated per connection: client and server agree on the highest version both sides support
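One way to see which dialect a given connection actually negotiated (a sketch using the SMB cmdlets that ship with Windows Server 2012):

```powershell
# On the SMB client: active connections and the negotiated SMB dialect (e.g. 3.00)
Get-SmbConnection | Select-Object ServerName, ShareName, Dialect

# On the file server: sessions and the dialect each client negotiated
Get-SmbSession | Select-Object ClientComputerName, Dialect
```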
Transparent Failover • SQL Server or Hyper-V Server • Failover transparent to server apps • Zero downtime, only a small IO delay during failover • Supports planned moves (load balancing, OS restart), unplanned failures and client redirection (Scale-Out only) • Supports both file and directory operations • Requires: Windows Server 2012 Failover Clusters; both the server running the application and the file server cluster must be Windows Server 2012 [diagram: normal operation against \\fs1\share on File Server Node A, then failover to Node B; connections and handles are auto-recovered and application IO continues with no errors]
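A minimal sketch of creating such a share on the file server cluster (share name, path and accounts are hypothetical); marking it continuously available is what enables the transparent failover described above:

```powershell
# Continuously available share for SQL Server data files (hypothetical names/accounts)
New-SmbShare -Name "SQLData" -Path "C:\ClusterStorage\Volume1\SQLData" `
    -FullAccess "CONTOSO\sqlsvc", "CONTOSO\SQL Admins" `
    -ContinuouslyAvailable:$true
```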
SMB Multichannel • Scenarios: single 10GbE RSS-capable NIC, multiple 1GbE NICs, multiple 10GbE NICs in a NIC team, multiple RDMA NICs • Full throughput: bandwidth aggregation with multiple NICs; multiple CPU cores engaged when using Receive Side Scaling (RSS) • Automatic failover: SMB Multichannel implements end-to-end failure detection; leverages NIC teaming if present, but does not require it • Automatic configuration: SMB detects and uses multiple network paths [diagram: client/server NIC and switch combinations for each scenario; vertical lines are logical channels, not cables]
SMB Multichannel: 1 session, without Multichannel • No failover • Can’t use full 10Gbps • Only one TCP/IP connection • Only one CPU core engaged [diagram: single RSS-capable 10GbE NIC on client and server; CPU utilization concentrated on one core]
SMB Multichannel: 1 session, with Multichannel • No failover • Full 10Gbps available • Multiple TCP/IP connections • Receive Side Scaling (RSS) helps distribute load across CPU cores [diagram: same single 10GbE NIC; CPU utilization spread across cores]
SMB Multichannel: 1 session, without Multichannel • No automatic failover • Can’t use full bandwidth • Only one NIC engaged • Only one CPU core engaged [diagram: client and server each with two 10GbE NICs and switches, only one path in use]
SMB Multichannel: 1 session, with Multichannel • Automatic NIC failover • Combined NIC bandwidth available • Multiple NICs engaged • Multiple CPU cores engaged [diagram: both 10GbE NICs and switches carrying traffic]
SMB Multichannel Performance • Pre-RTM results using four 10GbE NICs simultaneously • Linear bandwidth scaling • 1 NIC – 1150 MB/sec • 2 NICs – 2330 MB/sec • 3 NICs – 3320 MB/sec • 4 NICs – 4300 MB/sec • Leverages NIC support for RSS (Receive Side Scaling) • Bandwidth for small IOs is bottlenecked on CPU
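To check that Multichannel is actually engaged on a given client, a sketch using the in-box SMB and networking cmdlets (output varies with the NICs installed):

```powershell
# Is Multichannel enabled on the SMB client? (it is by default)
Get-SmbClientConfiguration | Select-Object EnableMultiChannel

# Which client/server interface pairs has SMB opened channels over?
Get-SmbMultichannelConnection

# Do the local NICs report RSS support?
Get-NetAdapterRss | Select-Object Name, Enabled
```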
RDMA in SMB 3.0 • SMB over TCP and RDMA • Application (Hyper-V, SQL Server) does not need to change • SMB client makes the decision to use SMB Direct at run time • NDKPI provides a much thinner layer than TCP/IP; traffic no longer flows via regular TCP/IP • Remote Direct Memory Access is performed by the network interfaces [diagram: client and file server stacks showing the application, SMB client/server, the TCP/IP path vs SMB Direct over NDKPI, and RDMA NICs connected via Ethernet and/or InfiniBand, transferring directly between client and server memory]
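A quick way to confirm SMB Direct is in play (a sketch; adapter names and capabilities differ per system):

```powershell
# RDMA status of the local network adapters
Get-NetAdapterRdma

# SMB's own view of the interfaces, including RDMA and RSS capability
Get-SmbClientNetworkInterface
Get-SmbServerNetworkInterface
```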
SMB Direct and SMB Multichannel: 1 session, without Multichannel • No automatic failover • Can’t use full bandwidth • Only one NIC engaged • RDMA capability not used [diagram: client and server each with two RDMA NICs (10GbE or 54Gb InfiniBand) and switches, only one path in use]
SMB Direct and SMB Multichannel: 1 session, with Multichannel • Automatic NIC failover • Combined NIC bandwidth available • Multiple NICs engaged • Multiple RDMA connections [diagram: both RDMA NICs and switches carrying traffic]
“DIY” Shared Storage • New paradigm for SQL Server storage design • Direct Attached Storage (DAS), now with flexibility • Converting DAS to shared storage • Fast RAID controllers become shared storage • NAND flash PCIe cards (e.g. Fusion-io) become shared storage
New Paradigm designs [diagram: several SQL Servers connecting over SMB to a file server whose storage is Fusion-io PCIe flash disks]
New Paradigm designs [diagram: several SQL Servers connecting over SMB to a pair of file servers backed by either NAND flash shared storage or traditional SAN shared storage]
Demo Storage Spaces
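Roughly what the demo boils down to, as a sketch (pool name, resiliency setting and allocation unit size are assumptions; the chain needs at least two poolable disks):

```powershell
# Pool all physical disks that are eligible for pooling
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "SQLPool" -StorageSubSystemFriendlyName "Storage Spaces*" `
    -PhysicalDisks $disks

# Carve out a mirrored virtual disk and bring it online as an NTFS volume (64K allocation units for SQL Server)
New-VirtualDisk -StoragePoolFriendlyName "SQLPool" -FriendlyName "SQLData" `
    -ResiliencySettingName Mirror -UseMaximumSize |
    Get-Disk | Initialize-Disk -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 65536
```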
SQL Server storage challenges • Capacity • Fast • Shared • Reliable
SQL Server virtualization challenges • Servers with lots of I/O • Servers using all RAM and CPU resources • Servers using more than 4 cores • Servers using large amounts of RAM