1 / 61

Optimizing SSIS Performance

Optimizing SSIS Performance. Aaron Jackson. Primary Sponsor. Varigence software provides clients with efficient methods for developing, managing, and using Microsoft business intelligence solutions.

Download Presentation

Optimizing SSIS Performance

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Optimizing SSIS Performance Aaron Jackson

  2. Primary Sponsor Varigence software provides clients with efficient methods for developing, managing, and using Microsoft business intelligence solutions. The software ecosystem rests on the foundation of Biml, or our home-grown Business Intelligence Markup Language. www.varigence.com.au • Business Intelligence Markup Language used to accelerate business intelligence development • IDE for accelerated business intelligence solutions • Excel addin that will make anyone using PowerPivot or cubes in Excel more productive

  3. Sponsor(s) We just know Microsoft BI …and we know it well! www.agilebi.com.au Hilmax Solutions provides clients with easy to use business intelligence toolsets and solutions to extract critical and actionable insights from their business data. Our consultants are experts in business and technology solutions with focus on delivering business value. www.hilmax.com.au

  4. Sponsor(s) SQL Tools is your source for Idera SQL Server tools sales & support in the APAC Region. Idera has SQL Server tools for: • BI Monitoring • Performance & Availability • Backup & Maintenance • Security, Auditing & Governance • Administration • Development Plus, Free Tools covering most functions. Email idera@sqltools.com.au for Special Trial Offers.

  5. Aaron Jackson Aaron Jackson is an ETL Consultant and blogger. Aaron has been working on Data Warehouse solutions in the Finance industry since 2008, and has acquired significant domain knowledge. Aaron’s primary talents are: • Database Development • SSIS Development • PowerShell Development • Performance tuning

  6. Abstract With data volumes increasing in both your source data and data warehouse, and shrinking ETL windows it is important to ensure that your SSIS packages and SSIS environment are performing optimally. In this session we will have a look at common SSIS anti-patterns, how to optimise SSIS packages, how to troubleshoot / optimise SSIS performance before finishing with SSIS best practices. With plenty of practical advice make sure you attend this session.

  7. Agenda • Common SSIS Anti-Patterns • Diagnosing Performance • Performance Tuning • SSIS Best Practices

  8. Common SSIS Anti-Patterns • Blocking Operations • Not using T-SQL • Unnecessary Data • Using compression and indexes • Not following the KISS principle

  9. Blocking Operations • Must process all input rows before producing output • All downstream components must wait for the blocking operation (slows everything down) • Blocking components use a new memory buffer (uses additional memory) • Blocking components are also known as asynchronous components

  10. Blocking Operations Why is this an anti-pattern? • Inefficient use of time and resources • Entire data set must be loaded before downstream processing can continue • Can cause paging to disk • Goes against the core design principles of SSIS • Design SSIS packages to keep the data “moving” • Design SSIS packages to keep the data in memory

  11. Blocking Operations • The most common blocking operation is the Sort operator • Also common is the Aggregate operator

  12. Blocking Operations • Remove sort operations by pre-sorting data • Have extract ordered by business keys • Use order by on SQL source destination • Remove aggregates by using SQL where possible • COUNT, SUM, MAX, GROUP BY, AVG, STDEV

  13. Not using T-SQL • T-SQL is a powerful language for set based operations • T-SQL is much more efficient for computing aggregates and performing updates • Recent updates to the database engine make T-SQL even faster • Column store indexes • In-memory tables • Improved cardinality estimator

  14. Not using T-SQL • SSIS is not set based, it is row based • SSIS is better for complex logical decisions and error paths among the dataflow (think data cleansing and business defaulting rules) • T-SQL will fail upon most errors (think SELECT 1/0 ) • Aggregates and data manipulation are typically faster using T-SQL

  15. Not using T-SQL Why is this an anti-pattern? • SQL Server is a highly efficient system, that should be used where appropriate • Adding T-SQL to your arsenal can speed up your designs “If the only tool you have is a hammer, every problem looks like a nail.”

  16. Unnecessary Data • Unused columns from the source destination • Using wider data types than required • Verbose logging settings (SSIS log) • Not taking advantage of bulk-logged T-SQL statements

  17. Unnecessary Data Why is this an anti-pattern? • Unnecessary data leads to inefficient designs • Packages are doing more work than is required • Wasteful use of resources

  18. Unnecessary Data • Use truncate instead of delete statements • Take advantage of bulk insertwhere possible • Configure SSIS application logging to report OnError and OnWarningevents only (In production)

  19. Unnecessary Data • Remove unused columns • Reduce memory needed per row • Increase the amount of rows thatcan be processed

  20. Unnecessary Data • Sharpen data types

  21. Compression and Indexes • Table Compression was introduced in SQL Server 2008 • Row level or Page level can be used • Compression can reduce storage requirements • Compression can speed up read queries • Indexes can be either clustered indexes or non-clustered indexes

  22. Compression and Indexes Why is this an anti-pattern? • Compression slows down inserts • 35% penalty for row compression • 208% penalty for page compression • Indexes also add overhead and slow down inserts

  23. Compression and Indexes • Remove compression from target destination tables • Remove clustered indexes from destination tables • Use partitioning with switch to manage performance instead

  24. Not following the KISS principle • KISS stands for Keep It Simple, Stupid. • Don’t over complicate designs • SSIS does not forgive poor design • Avoid monolithic package design

  25. Not following the KISS principle Why is this an anti-pattern? • Complicated designs are hard to support, troubleshoot and debug • Complicated designs are hard to understand • Essentially this wastes people’s time • More than likely a sub-optimal design (slow performance)

  26. Not following the KISS principle • Break your design into logical units of work • Use sub-packages with a distinct purpose • Use sequence containers within packages for logical units of work • Follow naming conventions • Use package configurations

  27. Diagnosing Performance • Server O/S Performance • SSIS Performance

  28. Server Performance • Things to pay attention to in Perfmon • CPU • Memory • Disk • Network

  29. Server Performance • CPU • Process / % Processor Time (Total) for dtexec • dtexec should be close to 100% CPU load • Not reaching close to 100% could mean application contention (sql server) • Hardware contention (Disk I/O or Memory sub optimal) • Design limitation. Not using parallelism or too many single-threaded tasks

  30. Server Performance • Memory • Process / Private Bytes for dtexec – The amount of memory used by dtexec • Process / Working Set for dtexec – The amount of memory allocated by dtexec • Memory / Page Reads / sec – Total memory pressure on the system. Suggested by Microsoft that a value of over 500 indicates memory pressure

  31. Server Performance • Disk I/O • Large subject area (Victor recently did a talk as this relates to SQL Server) • Create a performance baseline when designing your system for comparison • Average Disk/sec Read – Ideally you want to see 30ms or less for OLAP • Average Disk/sec Write - Ideally you want to see 30ms or less for OLAP

  32. Server Performance • Network • Network Interface / Current Bandwidth – estimate bandwidth • Network Interface / Bytes Total / sec – the rate bytes are TX and RX over each NIC • Network Interface / Transfers/sec – to measure IOPS • SSIS will move data over your network as fast as your network can handle it

  33. SSIS Performance • Things to pay attention to in Perfmon • Buffer memory – Total memory (virtual and physical) used by dataflow engine • Buffers in use – Number of buffers used by dataflow engine • Buffers spooled – The number of buffers written to disk. When this number is rising, you arepaging memory to disk

  34. Performance Tuning • Remove Blocking Operations • Sharpen Data Types • T-SQL / Set-based Operations • Parallelism • Data Flow Tuning • Network • Lookups

  35. Remove Blocking Operations • Asynchronous components • Inefficient memory use • Slows execution • Prevents downstream components from running • Can cause paging to disk

  36. Sharpen Data Types • Make data types as narrow as possible • Use less memory and increase throughput • Note – Excessive casting will degrade performance • Be aware of precision issues when using money, decimal and float types

  37. T-SQL / Set-based Operations • T-SQL is faster for • Aggregates • Grouping • Updates • Pivot • Unpivot

  38. T-SQL / Set-based Operations • T-SQL can clean your data on the fly • SELECT LTRIM(RTRIM(FIRST_NAME)) AS [FIRST_NAME] • You can use T-SQL for type conversions, coalescing and data type sharpening • However you lack the error handling power that SSIS provides

  39. Parallelism • SSIS allows you to have concurrent streams of work • If two steams of work have no dependencies, why not do it in parallel? • The arrows point to tasksthat will execute concurrently

  40. Parallelism • In your data flow task, allocating enough to run concurrently • This example ismonolithic and not modular(Don’t do this)

  41. Data Flow Tuning • Three main levers to pull • DefaultBufferMaxRows • DefaultBufferSize • EngineThreads

  42. Data Flow Tuning • DefaultBufferMaxRows • This is the maximum amount of rows that can be loaded at any one time. • This setting will only be reached if the size in memory of DefaultBufferMaxRows is less than or equal to DefaultBufferSize

  43. Data Flow Tuning • DefaultBufferSize • This is the size of the buffer in bytes • The maximum value that can be set here is 100MB • Each EngineThread has its own buffer. Something to keep in mind when you are planning your resource capacity.

  44. Data Flow Tuning • EngineThreads • This setting dictates how many threads the package is able to use in parallel • The usefulness of this setting really depends on your package design • A bigger number is not always better

  45. Network • If you have a dedicated SQL Server and a separate application server for SSIS • Tune the connection manager packet size • Default value is 4K • Max value is 32K • Reduce network overhead

  46. Lookups • Three modes of operation • Full Cache – Uses most memory, but is fastest • Partial Cache – Loads on the fly, can be expensive to performance • No Cache – Uses no memory, takes longer • Alternative – Use second source with merge join on lookup key

  47. SSIS Best Practices • Parallel is fastest • Naming Conventions • Use annotations • Modular Designs • Use configurations • Separation Of Concerns • Emerging Technologies

  48. Parallel is fastest • Take advantage of the SSIS Architecture • It was designed to run tasks in parallel • MaxConcurrentExecutables must be set to -1 to allow parallel execution of tasks

  49. Parallel is fastest Example • Parent-Child design pattern • 23 sub packages • Parallel execution • SSIS will only execute what it can handle at any one time

  50. Naming Conventions • Control flow tasks

More Related