SESSION CODE: BIE07-INT
Developing a SQL Server 2008 Fast Track Data Warehouse
Ross LoForte, Technology Architect, Microsoft Corporation
Eric Kraemer, Senior Program Manager, Microsoft Corporation
Agenda • SQL Fast Track DW Overview • Fast Track DW Implementation Key Principles (Server) • Fast Track DW Implementation Key Principles (Storage Layer) • Fast Track DW Implementation Key Principles (Data Loading) • Q&A
Key to Fast Track Data Warehouse Architecture: Data Warehouse appliances for SQL Server
The Alternative: A Balanced System • Balance storage I/O capability against the server/CPU capability to process data • Ensure consistent, predictable I/O throughput • Match the storage I/O optimization to the RDBMS workload pattern • Scan or seek? IOPS or MB/s? • Configure the database software to take advantage of the optimized server, network, and storage “stack”
Fast Track SQL DW Architecture vs. Traditional DW
• Traditional SQL DW architecture: shared infrastructure – enterprise shared SAN storage and shared network bandwidth, with the data warehouse competing against OLTP applications
• Fast Track SQL DW architecture: dedicated DW infrastructure modeled after DW appliances – pre-tested 1 TB to 48 TB configurations, dedicated network bandwidth, and dedicated low-cost SAN arrays (1 for every 4 CPU cores, e.g. EMC AX4 or HP MSA2312) behind a SQL Server 2008 data warehouse on a 4-processor, 16-core server
• Benefits:
• More system predictability, and therefore a better user experience
• Pre-tested configurations lower TCO
• CPU-to-I/O-channel balance optimized for DW
• Modular building-block approach
• Scale out or up within the limits of the server and SAN
What is Fast Track Data Warehouse? • A method for designing a cost-effective, balanced system for Data Warehouse workloads • Reference hardware configurations developed in conjunction with hardware partners using this method • Best practices for data layout, loading and management • Relational database only – not SSAS, IS, RS
Fast Track Component Architecture • Fast Track focuses on balancing the major hardware components • “Major” is defined as the components relevant to overall I/O throughput and data processing capability • “Balance” for Fast Track is defined in relation to • SQL Server data processing capability • Storage system I/O capability • Total hardware component cost
Fast Track Data Warehouse Components • Software: • SQL Server 2008 Enterprise • Windows Server 2008 • Configuration guidelines: • Physical table structures • Indexes • Compression • SQL Server settings • Windows Server settings • Loading • Hardware: • Tight specifications for servers, storage and networking • ‘Per core’ building block
Fast Track Data Warehouse Benefits • Lower TCO • Minimizes risk of overspending on unbalanced hardware configurations • Commodity hardware • Choice • HW platform • Implementation vendor • Reduced risk • Validated by Microsoft • Encapsulates best practices • Known performance & scalability • Twelve SMP reference architectures • SI solution templates
Sequential I/O vs. Random I/O
• Random I/O: OLTP is usually random-read centric; discrete lookups benefit from index optimization and random-read capability. Not as predictable and scalable for data warehousing, and requires a large number of drives to match the server's I/O consumption capability.
• Sequential I/O: scans on large data stores are usually read with sequential rather than random read patterns. Scalable, predictable performance; requires 1/3 or fewer drives to match the server's I/O consumption capability.
• All databases contain both scans and seeks along with other types of reads and writes, but in DW workloads the vast majority of reads are sequential – not all.
Fast Track Data Warehouse Components – Balanced Across All Components
• The SQL Server 2008 stack spans the server CPU cores, Windows, the SQL Server cache, FC HBAs, the FC switch, storage controller cache, LUNs, and disks
• Potential performance bottlenecks at each layer: CPU feed rate, SQL Server read-ahead rate, HBA port rate, switch port rate, SP port rate, LUN read rate, and disk feed rate
• Current Fast Track architectures are rated at 200 MB/s per CPU core
System Validation • Validation is intended to confirm the proper installation and configuration of a Fast Track RA • Validation is achieved in two phases • Synthetic IO testing • Validates storage, network, and operating system • SQLIO can be used to generate IO • Perfmon can be used to monitor results • SQL Server testing • Validates performance across SQL Server stack • Final step of deployment process
Core Fast Track Metrics • These metrics are used to both validate and position Fast Track RAs • Maximum Consumption Rate (MCR) – the ability of SQL Server to process data for a specific CPU and server combination with a standard SQL query • Benchmark Consumption Rate (BCR) – the ability of SQL Server to process data for a specific CPU and server combination with a user workload or query • User Data Capacity (UDC) – the maximum available SQL Server storage for a specific Fast Track RA, assuming a 2.5:1 page compression factor and 300 GB 15K SAS drives; 30% of this storage should be reserved for DBA operations
MCR • Similar in concept to the miles-per-gallon rating for a new car • Not necessarily what you will see when you drive the car, but a good starting point • Provides a standard reference point for • Simple evaluations • Relative comparison between different Fast Track configurations • System validation and benchmarking • Current value for published Fast Track RAs • 200 MB/s per core
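To make the MCR definition concrete, the sketch below (using hypothetical table and column names) follows the general Fast Track approach: run a simple aggregation query against data that is already resident in the buffer pool and derive MB/s per core from the logical reads and CPU time that SQL Server reports.

    -- Run the query once to warm the buffer pool, then measure the second execution.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT l_returnflag, SUM(l_extendedprice)      -- hypothetical table and columns
    FROM dbo.lineitem
    WHERE l_shipdate < '1995-01-01'
    GROUP BY l_returnflag
    OPTION (MAXDOP 4);                             -- bound the query to a known number of cores

    -- MCR per core is then approximately (logical reads x 8 KB) / (CPU seconds) / MAXDOP.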
Fast Track Benchmark Results (MCR) • Actual results from Fast Track validation • HP 2-socket, 8-core configuration • Measured MCR: 1.6 GB/s • [Diagram: throughput from the server and Windows Server OS through two HBAs (minimum 2 GB/s each) and the Fibre Channel switch to the storage enclosures, roughly 500 MB/s per enclosure and 300 MB/s per storage path]
BCR • Similar to actual MPG you get with your current driving habits • Provides a workload specific reference point • Defines the ideal outcome of the Full Evaluation scenario • Can be compared to MCR to choose an appropriate FT configuration • Provides a framework for validating Fast Track data warehouse configurations.
Fast Track Benchmark Results (BCR) • Actual results from Fast Track validation • HP 2-socket, 8-core configuration • Measured BCR: 1.2 GB/s • [Diagram: throughput from the server and SQL Server/OS through two HBAs (1.2 GB/s each) and the Fibre Channel switch to the storage enclosures, roughly 300 MB/s per enclosure and 150 MB/s per storage path]
UDC • UDC is supplied by the customer and represents the data capacity required • Plan for projected growth • Based on customer projections • Needs to be allocated up front • Allocate for data management needs • Staging database requirements • Temporary objects • Allocate for TempDB • Typically 20-30% of primary data space • TempDB is not compressed
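As an illustrative capacity calculation only (assuming the 2-socket, 8-core configuration described later, with eight 2-disk RAID-1 arrays of 300 GB drives for primary data): 8 x 300 GB gives roughly 2.4 TB of raw primary space; at the assumed 2.5:1 page compression factor that corresponds to about 6 TB of user data, and reserving roughly 30% for TempDB, staging and DBA operations leaves a UDC in the neighborhood of 4 TB.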
Storage – Disk Configuration • Fast Track is very disk efficient • FT uses RAID-1 to enable sequential I/O • 2-disk RAID-1 array per CPU core • Depending on which FT system is selected, the system will have at least 16 RAID-1 arrays • Creates a virtual affinity between a RAID-1 array and a CPU core • Data is evenly split across RAID-1 arrays using partitioning • Enables the ability to load data sequentially • Sequential I/O uses about 1/3 of the number of disks that random I/O needs to reach the same level of performance
Storage – Disk Configuration • Creating RAID groups • HP MSA, EMC AX, IBM DS • 11 disks per enclosure • 10 dedicated to user data • 1 hot spare • 1 storage enclosure per 4 physical cores • A 2-socket quad-core server therefore uses 2 storage enclosures – 22 total disks • RAID configuration per enclosure • Primary data: four 2-disk RAID-1 arrays • Log: one 2-disk RAID-1 array
SQL Server 2008 Minimum Server Configuration
SMP core-balanced architecture using dual read on the HP MSA2312.
• [Diagram: a quad-core CPU server with two dual-port FC HBAs connected through a switch to MSA2312 storage processors SP A and SP B; five RAID groups (GP01–GP05) hold LUN1–LUN8 for data plus LUN0 for logs and one hot spare; target throughput is 200 MB/s per core on compressed data]
• Using 300 GB 15K FC drives, each LUN is rated at 125 MB/s; each SP controls 4 LUNs at 500 MB/s, i.e. 1,000 MB/s per MSA DAE for both SPs
• Each HBA port is rated at 4 Gb/s (400 MB/s), or 1,600 MB/s across all 4 HBA ports; each SP port is likewise rated at 4 Gb/s (400 MB/s), or 1,600 MB/s across all 4 SP ports
• Per-MSA2312 drive details:
• Each MSA can hold 12 drives; this configuration requires 11
• The MSA is 2U in total (a capacitor eliminates the need for a battery)
• Each MSA SP port controls 4 LUNs; SP-A also controls the log LUN
• Each pair of LUNs consists of two 300 GB 15K FC drives in RAID-1
• DAE = Disk Array Enclosure, HBA = Host Bus Adapter, SP = Storage Processor, FC = Fibre Channel (4 Gb/s ports)
SQL Server File Layout
• Permanent database ("Permanent FG"): Permanent_1.ndf on LUN 1, Permanent_2.ndf on LUN 2, Permanent_3.ndf on LUN 3, ... Permanent_16.ndf on LUN 16
• Stage database ("Stage FG"): Stage_1.ndf, Stage_2.ndf, Stage_3.ndf, ... Stage_16.ndf on the same LUNs
• TempDB (local drive 1 shown in the diagram): TempDB.mdf (25 GB) plus TempDB_02.ndf through TempDB_16.ndf (25 GB each)
• Logs: permanent DB log and stage DB log on dedicated log LUNs (Log LUN 1 and Log LUN 2)
SQL Server Files • User Databases • Create at least one Filegroup containing one data file per LUN • FT targets 1:1 LUN to CPU core affinity • Make all files the same size • Effectively stripes database files across data LUNs • Multiple file groups may be advantageous • Disable Auto-Grow for the database • Transaction Log is allocated to a Log LUN.
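A minimal sketch of a user database created along these lines, assuming 16 data LUNs mounted under hypothetical paths; file names, sizes and paths are illustrative, and only the first two of the 16 equally sized data files are shown.

    CREATE DATABASE FT_DW
    ON PRIMARY
        ( NAME = FT_DW_root,   FILENAME = 'C:\FTDATA\LUN01\FT_DW.mdf',        SIZE = 100MB ),
    FILEGROUP FG_Permanent
        ( NAME = Permanent_01, FILENAME = 'C:\FTDATA\LUN01\Permanent_01.ndf', SIZE = 200GB, FILEGROWTH = 0 ),
        ( NAME = Permanent_02, FILENAME = 'C:\FTDATA\LUN02\Permanent_02.ndf', SIZE = 200GB, FILEGROWTH = 0 )
        -- ...one equally sized file per data LUN, through Permanent_16 on LUN16 (auto-grow disabled)
    LOG ON
        ( NAME = FT_DW_log,    FILENAME = 'C:\FTLOG\LUN01\FT_DW_log.ldf',     SIZE = 50GB, FILEGROWTH = 10GB );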
SQL Server Files • Transaction Log • Create a single transaction log file per database and place on a dedicated Log LUN. • Enable auto-grow for log files • The transaction log size for each database should be at least twice the size of the largest operation
SQL Server Files • Tempdb • Create one Tempdb data file per LUN • Make all files the same size • Follow standard tempdb best practices • Auto-Grow should be enabled for tempdb • Use large growth increment (10% of initial size)
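A hedged sketch of that tempdb layout, reusing the same hypothetical LUN mount points and a 25 GB initial size per file; the built-in primary file is resized in place and only the first added file is shown.

    ALTER DATABASE tempdb MODIFY FILE ( NAME = tempdev, SIZE = 25GB, FILEGROWTH = 10% );
    ALTER DATABASE tempdb ADD FILE    ( NAME = tempdev_02, FILENAME = 'C:\FTDATA\LUN02\tempdb_02.ndf',
                                        SIZE = 25GB, FILEGROWTH = 10% );
    -- ...repeat ADD FILE with the same size and growth increment for each remaining data LUN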
Conventional data loads lead to fragmentation • Bulk inserts into a Clustered Index using a moderate 'batchsize' parameter • Each 'batch' is sorted independently • Overlapping batches lead to page splits • [Diagram: page IDs 1:31–1:40 shown out of the key order of the index, illustrating the resulting logical fragmentation]
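By contrast, here is a hedged sketch of a bulk insert that avoids the independent per-batch sorts illustrated above: load the whole file as a single batch, declare it to be in clustered index key order, and take a table lock (table, file and column names are hypothetical).

    BULK INSERT dbo.FactSales
    FROM 'D:\extract\sales_2009.dat'
    WITH (
        FIELDTERMINATOR = '|',
        TABLOCK,                     -- table lock allows minimal logging on an eligible target
        ORDER (SalesDateKey ASC)     -- declares that the file is already sorted on the CI key
        -- no BATCHSIZE: the whole file is committed as one batch, so it is sorted only once
    );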
Minimizing file fragmentation • Pre-allocate database files • Size files correctly to prevent growth • Do not shrink files • Do not use NTFS defragmentation tools; rebuild the table instead to ensure optimal organization at the disk-block level • Writing data • Concurrent load operations to the same file will induce fragmentation • DML change operations (Update/Delete) may induce fragmentation • Consider filegroups and partitioning to manage concurrent writes for large tables
Writing Sequential Data • Sequential scan performance starts with database creation and extent allocation • Recall that the -E startup option is used • Allocates up to 64 extents (4 MB) contiguously • Pre-allocation of user databases is strongly recommended • Autogrow should be avoided if possible • If autogrow is used, always enable trace flag 1117 (-T1117)
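The sketch below shows how these settings are typically expressed; the startup parameters themselves are set outside T-SQL (for example in SQL Server Configuration Manager), so they appear here only as comments alongside a verification command.

    -- SQL Server startup parameters per the guidance above:
    --   -E       allocate up to 64 contiguous extents (4 MB) per file at a time
    --   -T1117   if auto-grow does occur, grow all files in the filegroup together
    -- Confirm the trace flag is active globally:
    DBCC TRACESTATUS (1117, -1);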
Loading Data • Goal: minimize logical fragmentation, maximize read performance • Minimizes disk head movement • Maintains a high average request size (think ~400 KB, not 8 KB) • Implies lower I/O operation (IOP) counts than traditionally seen with SQL Server to achieve high scan rates • Sustains high average scan rates (up to 240 MB/s per RAID-1 LUN) • Key considerations for a Fast Track data load • Data architecture: destination table, partitioning, and filegroup • Source data: format & size • System resources: CPU & memory
Data Architecture • Starting point for building a Fast Track load method • Identify target table type • Structure (Heap, Clustered Index) • Define data volatility for Clustered Index targets • Choose table architecture based on data volatility • Partition geometry • Filegroup geometry • SQL filegroups enable concurrent Load/DML operations with minimal logical fragmentation; partitioning allows a single table to live in multiple filegroups
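A hedged sketch of the kind of partition-to-filegroup mapping this implies, using hypothetical yearly boundaries and filegroup names (two partitions per filegroup, matching the migration example that follows).

    CREATE PARTITION FUNCTION pf_SalesYear (int)
        AS RANGE RIGHT FOR VALUES (2003, 2004, 2005, 2006, 2007, 2008, 2009);   -- 8 partitions

    CREATE PARTITION SCHEME ps_SalesYear
        AS PARTITION pf_SalesYear
        TO (FG_A, FG_A, FG_B, FG_B, FG_C, FG_C, FG_D, FG_D);                    -- 2 partitions per filegroup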
Source Data and Resources • Consider source data architecture • Type: File or Stream • Transaction: Bulk or Row • Format: Ordered, unordered, multi-file, single-file • Flexibility: Split to multiple files, key order • System resources • CPU Cores & Memory • Review existing FT best practices for loads • msdn.microsoft.com/en-us/library/dd459178.aspx
Building a Fast Track Bulk Load Process • Scenario: single table migration (320 GB) • Destination table: page-compressed, partitioned Clustered Index • Source data considerations • Location: legacy DB • Frequency: one-time extract • Format: 8 flat files, unordered data • System information: 8 CPU cores, 192 GB memory • SQL file structure • 8 years of data, 8 total partitions: 40 GB per partition • 4 filegroups, 2 partitions per filegroup
Example: Fast Track Migration Load to Partitioned CI
• [Diagram: 8 source data files → Step 1 "Base Load": 8 concurrent BCP loads into 8 uncompressed base heap stage tables on the 8-core server → Step 2 "Stage Insert": 8 concurrent inserts (2 sets of 4 concurrent) into CI stage tables in filegroups "Stage A" and "Stage B" (2 CI stage tables per destination filegroup, each with a constraint on the CI partitioning key) → Step 3 "Transform": create the Clustered Index with compression INTO the final destination → Step 4 "Final Append": 8 concurrent partition switches into the destination partitioned CI table, whose partitions are spread across Filegroup "A" (partitions 1,2), "B" (3,4), "C" (5,6), and "D" (7,8)]
Fast Track Partitioned CI Load (Migration): Principles • The filegroup architecture is determined by • Volatility • Partition size • Available memory • Total physical CPU cores • Maximize efficiency of the initial bulk load • Avoid sorts • Avoid page-lock contention • Avoid compression prior to creating the index • Maximize the Create Index operation • Stage by partition to keep sorts in memory • Parallelize across filegroups to minimize fragmentation • Compress with the Create Index • Partition switch into the destination
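A simplified, hedged sketch of one partition's path through these steps; all object, file and filegroup names are illustrative, and the stage table is assumed to carry a CHECK constraint matching the partition boundary, which the switch requires.

    -- Bulk load into an uncompressed heap stage table (no sort, minimal contention)
    BULK INSERT Stage.Sales_P1
    FROM 'D:\extract\sales_part1.dat'
    WITH (FIELDTERMINATOR = '|', TABLOCK);

    -- Build the clustered index with page compression, placing the table on the same
    -- filegroup as destination partition 1 so the per-partition sort can stay in memory
    CREATE CLUSTERED INDEX cix_Sales_P1 ON Stage.Sales_P1 (SalesDateKey)
    WITH (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = OFF)
    ON [FG_A];

    -- Metadata-only switch into the destination partitioned CI table
    ALTER TABLE Stage.Sales_P1 SWITCH TO dbo.FactSales PARTITION 1;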
Applying Concepts from the Partitioned CI Example • Scenario: 1 GB daily incremental load to a partitioned CI • Source data • Single text file, unordered • Current data may touch the two most recent partitions • Destination table: 48 monthly partitions • System: 8 cores, 192 GB RAM • SQL file structure • Filegroup "Historical": partitions 1..46 • Filegroup "Current": partitions 47, 48
Example: Fast Track Incremental Load to Partitioned CI
• [Diagram: 1 source data file → Step 1 "Base Load": BCP load into filegroup "Stage A" on the 8-core server → destination partitioned CI table, with partitions 1–46 in filegroup "Historical" and partitions 47 and 48 in filegroup "Current"]
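A hedged sketch of the daily increment under the assumptions above (all names illustrative): bulk load the file into a heap stage table on the "Stage A" filegroup, then append to the destination CI so that only the partitions in the "Current" filegroup are touched.

    -- Stage the 1 GB daily file (the heap table is assumed to live on filegroup "Stage A")
    BULK INSERT Stage.Sales_Daily
    FROM 'D:\extract\sales_daily.dat'
    WITH (FIELDTERMINATOR = '|', TABLOCK);

    -- Append to the partitioned CI; inserting in key order limits page splits in the
    -- two most recent partitions
    INSERT INTO dbo.FactSales WITH (TABLOCK)
    SELECT *
    FROM Stage.Sales_Daily
    ORDER BY SalesDateKey;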
Resources • SQL Server Fast Track DW Home Page http://www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx • Fast Track DW 2.0 Architecture Whitepaper http://msdn.microsoft.com/en-us/library/dd459178.aspx
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Fast Track DW Implementation Key Principles: High Availability
High Availability • Core I/O: 2 USB, 1 serial, 1 video port, 3 RJ-45; PS2 keyboard/mouse support • Power supplies: 3+3 redundant power supplies • I/O slots: 11 PCIe slots standard, with the option to upgrade to 2 HTx and 7 PCIe • Fans: 6 hot-plug redundant fans (3 shown)
Clustering with Windows Server 2008 • No single point of failure with failover clustering • Make clustering simple: easy to create, use, and manage – enabling the IT generalist • Reduce total cost of ownership: making clusters a smart business choice for the enterprise • Support for 16-node clusters