1 / 42

Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design

SESSION CODE: BIE306. Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design. Jeff Spiller ESS Performance and Solutions Engineering COE (Center of Excellence) BI ATC (Business Intelligence - Advanced Technology Center).

vail
Download Presentation

Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SESSION CODE: BIE306 Best Practices in Fast Track, Parallel Data Warehouse, and Traditional Data Warehouse Design Jeff Spiller ESS Performance and Solutions Engineering COE (Center of Excellence) BI ATC (Business Intelligence - Advanced Technology Center) Fast Track, Microsoft SQL Server 2008 Parallel Data Warehouse and Traditional Data Warehouse Design BI Best Practices and Tuning for Scaling SQL Server 2008

  2. Objectives for this Session • Understand why Fast Track and the SQL Server 2008 Parallel Data Warehouse (PDW) providefaster I/O bandwidth over traditional data warehouse designs • Understand differences between scale-up and scale-out data warehouse designs • Review reference configurations for traditional database design, Fast Track and PDW appliances • Discuss best practices which will allow customers to more effectively exploit hardware to support end user BI workloads

  3. The Continuum to SQL Server 2008 R2 (scale-up) OLTP systems SSIS Data Warehouse

  4. The Continuum to SQL Server 2008 R2 (scale-up) • Fast Track = scale up OLTP systems SSIS Fast Track

  5. Mid- Tier FastTrack RDBMS The Continuum to SQL Server 2008 R2 (scale-up) Parallel Data Warehouse (PDW) • Fast Track = scale up • PDW = scale out OLTP systems SSIS PDW

  6. Mid- Tier FastTrack RDBMS SSAS The Continuum to SQL Server 2008 R2 (scale-up) Parallel Data Warehouse (PDW) • Fast Track = scale up • PDW = scale out • Hub-and-spoke architecture to include support for SMP spokes OLTP systems SSIS PDW Hub Traditional MD design PDW Mid- Tier RDBMS RDBMS SSAS

  7. Workload Differences Between OLTP and BI Environments

  8. HP Has Two Types of ROLAPReference Configurations Microsoft/HP Fast Track reference configurations OR SQL Server Parallel Data Warehouse (PDW) SQL Server/HP Traditional DW design reference configurations Mmm, what will my logical & physical DB design look like ? Lower hardware costs Different logical and physical DB design philosophies

  9. Traditional ROLAP (data warehouse/data mart) Physical Design • Create file group • 1 LUN may be physically stripped • Create table: • on 1 LUN • partition table across multiple LUNs for additional parallelism • balance workload across filegroups, LUNS - which, in turn, gets stripped across many physical disks • PRO: Data is distributed and I/Os are parallelized across multiple physical disk drives • CON: Little attention is traditionally paid to how or where data is physically loaded or indexes built RAID 5 • It is not uncommon to have hundreds of disk drives to support the I/O throughput requirements in a traditional DW environment

  10. How does Fast Track and PDW get it’s speed ? X-Ray view at the physical disk level First let’s look at a traditional DW….. How To Gain Speed While Reducing CostsLet’s lake a closer look under the hood to understand BI performance

  11. Traditional ROLAP Physical Database Design Loading data into the data warehouse/mart • ETL batch jobs are executed in parallel to complete the batch load window as fast as possible • Data from multiple fact tables data may be loaded simultaneously from to speed up ETL batch load window • Some data warehouses may use a trickle feed to load data, close to real time • Dimension table insert/updates may occur in parallel • Indexes may be updated in real time as rows are inserted or updated - or - • Indexesmay be rebuilt after data is loaded • Traditional ROLAP design tends to encourage physical disk fragmentation Sequential data Fact table 3rd day load Fact table 6th day load Fact table 5th day load Fact table 2nd day load Data is stored wherever it happens to land Fact table Initial load

  12. Traditional ROLAP physical database designTrade-offs using indexes in data warehouses/data marts • ETL batch windows may take a long time because building indexes can be resource intensive • Indexes (duplicate data) consume a large amount of disk space (4x – 10x less space) • DBAs spend a lot of time managing and tuning indexes • Indexes may reduce the number of disk I/Os to service a query. But, at the cost of slower disk service times due to extra disk head movement “directly access data that index points to” • Proof is in monitoring ad-hoc query response times using PDW or Fast Track “index-lite” designs vs. traditional “index-heavy” database designs • Customized indexes and summary tables may speed up specific queries. But, ad-hoc queries may get inconsistent response times because of slower scan rates & long disk service times) Duplicate data Column Index / Column Pre-Calculated data

  13. SummaryWhy Fast Track & PDW provide high levels of I/O throughput… Fast Track & PDW Eliminating indexes and storing data sequentially will provide the fastest disk throughput rates Summary table Fast Track & PDW Index-lite Fastest sequential scan rates Index Index-lite is faster because there is less disk head movement Traditional DW design with indexes & summary tables Disk throughput is slower with indexes, aggregates and summary tables

  14. How Does Fast Track and PDW Reduce Data Warehouse or Data Mart Storage Costs ? • Efficient data loads • Reduce or eliminate indexes Example: Average disk Seek time is typically about 4ms; Full stroke is about 7.5ms. At 15K RPM = 250 revolutions/sec. = 4ms for a full revolution = Average latency is about 2ms. Fast Track & PDW are designed to stream large blocks of data sequentially which is even faster than “average latency” because disk heads are directly over the streaming data.

  15. Why does PDW and Fast Track want data to be stored sequentially ? Why Does PDW and Fast Track Want Data to be Stored Sequentially ? Track of data Latency Virtually No seek time Disk drive manufacturers sometimes quote average seek time and max/full seek time numbers Seek time is typically 2 - 4x longer than average latency. By eliminating seek time you can have approximately 2 – 4x fewer disk drives in order to maintain a given throughput level. Fast Track & PDW are designed to stream large blocks of data sequentially!

  16. What is the Most Efficient Way for Fast Track and PDW to Store Data ? • Outer tracks • Hold more data (Good for fact tables) • Outside tracks can pass more data under the heads in a single rotation (good for fact tables) • Inner tracks better for smaller dimension tables (and they should cached anyway &/or replicated in PDW) • Organizing data on the physical disk drives to physically eliminate seek time can be the difference of requiring 64 disks vs. 200 disk drives to provide the similar levels of I/O throughput

  17. BOTTOM LINE: HP, SQL Server and Fast Track ConceptsProvide You with Speed while Reducing Costs In addition, HP and SQL Server PDW uses Massively Parallel Processing (MPP) to expand Fast Track concepts in a BI “appliance” Fast scan rates Fast Track and PDW get it’s speed from FAST scan rates !

  18. HP Has Two Types of ROLAP – Reference ConfigurationsFast Track or PDW DB Design • Traditional DB design reference configurations • HP Business Intelligence Sizer for Microsoft SQL Server 2005/2008 http://h71019.www7.hp.com/ActiveAnswers/us/en/sizers/microsoft-sql-bi.html • Fast Track http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA3-0347ENW.pdf http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA3-0314ENW.pdf • SQL Server 2008 Parallel Data Warehouse (PDW) – “Appliance” Fast Track or PDW Traditional DB design

  19. HP Fast Track Data Warehousing Continuum • Scales from SMB to Enterprise • Prescriptive guidance and optimized methodology for deploying a data warehouse • Targeted at query workloads patterned for large sequential data reads • Balanced hardware approach • Supports up to 48TB Data Warehouse at leading price/performance metrics • HP provides configurations,tested performance guidanceand best practices for deploying/operating/managing Basic6 – 12TBDL38x w/MSA2000 Mainstream12 – 24TBDL585 G6 w/MSA2000 Mainstream16 – 32 TB DL580 G5 w/MSA2000 G2 Premium24 – 48 TBDL785 G6 w/MSA2000 G2

  20. HP SQL Server 2008 Parallel Data Warehouse (PDW) Data Rack Control Rack

  21. Free Your IT Pressures . . . Get More Value Without HP Factory Express With HP Factory Express Faster time to solution Free up valuable IT resources Maximize your IT investment

  22. ProLiant Servers

  23. Chip Manufacturer Features BreakdownFast Track uses high end x86 servers • Intel Features (DL580) • Scalable to 24 cores (4 processor) • Clock Speeds (up to 2.67 GHz) – on 6-core model • Memory expandable to 256 GB • I/O expansion slots (11) • Up to 16 internal disks • x64 Architecture • 4U • AMD Features (DL785) • Scalable to 48 cores (8 processor) • Clock Speeds (up to 2.8 GHz) – on 6 core model • Memory expandable to 512 GB • I/O expansion slots (11) • Up to 8 internal disks • x64 Architecture • 7U

  24. Miscellaneous Techniques to Improve SQL Server BI Performance

  25. SQL Server 2008 Techniques to Improve Performance • Database Services • Separate the DB/NON DB workloads • SSIS/AS on a separate machine / SQL Server only machine • Compressing Fact Tables and Indexes • PDW automatically uses compression • Fast Track recommends compression • Compression Reduces I/O workload but Increases CPU utilization • Increases Query performance due to more pages being read via each I/O • Compression can be enabled selectively • Compressing Backup

  26. SQL Server 2008 Techniques to Improve Performance • Database Services • Table Partitioning • Align table storage to a business requirement driven boundary • Primary key considerations • PDW uses hash key to balance rows across the servers and storage subsystem • Indexes • Traditional – create indexes based upon query workload • Fast Track & PDW use no indexes or Index-lite

  27. SQL Server 2008Techniques to Improve Performance • Database Services • Star Schema Optimization • Queries that are directed towards star schema are optimized using bitmap filters. This greatly improves query performance by ignoring non-qualifying rows while the fact table is being read

  28. SQL Server 2008Techniques to Improve Performance • Database Services • Bit Map Filter Must Haves • In Your Database Design you MUST: • Fact tables are expected to have at least 100 pages. The optimizer considers smaller tables to be dimension tables • Only inner joins between a fact table and a dimension table are considered • The join predicate between the fact table and dimension table must be a single column join, but does not need to be a primary-key-to-foreign-key relationship. An integer-based column is preferred • Joins with dimensions are only considered when the dimension input cardinalities are smaller than the input cardinality from the fact table • Make sure table “stats” are updated

  29. SQL Server 2008 Techniques to Improve Performance • Database Services • Resource Governor • Only within a SQL Server instance • Controls CPU & memory resources used by SQL Server • Classify incoming connections and route their workloads to a specific group classification • User pool/group • Normal, Advanced and power users • Set workload group priority • Allows for Internal/Default and User created Resource Pools • PDW uses the Resource Governor

  30. SQL Server Analysis Services 2008

  31. SQL Server Analysis Services 2008 Techniques to Improve Performance • SSAS • SSAS has to major components • Formula Engine (does most of the analysis work and tries to keep cells in memory) – Fast clock speeds are best • Storage Engine(if cells are not in memory, the Storage Engine gets the data from disk) – Goal is to minimize Storage Engine use and keep data in memory for the Formula Engine to use • Faster Storage (SSD) OR more disk drives for quicker responses to Storage Engine • Manage your partitions in your AS Database by query performance required • Because Large Cubes > 100 GB may not fit in memory. So we design the partitions to get into memory as quickly as possible. • Best Practice – less than 4 million cells per partition

  32. SQL Server Analysis Services 2008Techniques to Improve Performance • SSAS – Settings to Remember • Memory\TotalMemoryLimit is 80% of physical memory on the server by default • Manually set memory for SSAS cubes • This is especially important if you run SSAS cubes on the same server as a ROLAP data warehouse/mart • If set too high, SSAS will page out of Memory • SSAS page faulting is worse than simply allocating less memory to the cubes and forcing the Storage Engine to retrieve cells fro disk • Things to Monitor with System Monitor • SSAS:Memory\Memory Usage KB • SSAS:Proc Aggregations\Temp • Context Switches/sec Tune memory

  33. SQL Server Analysis Services 2008Techniques to Improve Performance • SSAS – Process Measurement • WSRM • Windows System Resource Manager (WSRM) is a feature of Windows Server 2003, Enterprise and Datacenter editions • Using WSRM, administrators can control how CPU resources are allocated to applications, services, and processes • WSRM is useful in a consolidated environment (SQL Server, SSAS, SSIS, etc.) • Manage CPU and memory utilization • WSRM:Process\Actual Managed CPU% counters to monitor CPU utilization • Do not use WSRM to set SQL Server instance limits. SQL Server has it’s own settings you can tune

  34. SQL Server Analysis Services 2008Techniques to Improve Performance • SSAS – Optimization Strategy • Use FAST I/O via the DataDir property • Move active partitions to fast storage or SSD implementation • Use a Dedicated Temp Storage Directory via Tempdir • Keep drive contention minimized. • SSD is excellent for this purpose • Keep partitions sized reasonably < 4 million cells • Based upon the number of facts within your AS Database • Configure memory based upon your usage using the “Aggregation Design Wizard”

  35. SQL Server Integration Services 2008Techniques to Improve Performance • SSIS – Optimization Strategy • Parallelism • Multiple copies against the same source • Provides more pipeline to move data • Separate process space for each package • Must have a control process to resync the package from async mode • Provides more execution trees for more buffers to be allocated

  36. SQL Server Integration Services 2008Techniques to Improve Performance • SSIS – Optimization Strategy • Blocking “Row-at-a-time inserts” may be better than bulk load • Parallelism and More Pipeline Buffers are allocated via Execution Trees Each of these Numbered Steps represents a new Execution Tree Spawning multiple copies of the package with a horizontal partition of data will create more process space and execution trees

  37. Thank You!

  38. Required Slide Resources Learning • Sessions On-Demand & Community • Microsoft Certification & Training Resources www.microsoft.com/teched www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers http://microsoft.com/technet http://microsoft.com/msdn

  39. Required Slide Complete an evaluation on CommNet and enter to win!

  40. Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st http://northamerica.msteched.com/registration You can also register at the North America 2011 kiosk located at registrationJoin us in Atlanta next year

  41. © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

  42. Required Slide

More Related