Optimizing SSIS Performance Aaron Jackson
Primary Sponsor Varigence software provides clients with efficient methods for developing, managing, and using Microsoft business intelligence solutions. The software ecosystem rests on the foundation of Biml, our home-grown Business Intelligence Markup Language. www.varigence.com.au • Business Intelligence Markup Language used to accelerate business intelligence development • IDE for accelerated business intelligence solutions • Excel add-in that makes anyone using PowerPivot or cubes in Excel more productive
Sponsor(s) We just know Microsoft BI …and we know it well! www.agilebi.com.au Hilmax Solutions provides clients with easy-to-use business intelligence toolsets and solutions to extract critical and actionable insights from their business data. Our consultants are experts in business and technology solutions with a focus on delivering business value. www.hilmax.com.au
Sponsor(s) SQL Tools is your source for Idera SQL Server tools sales & support in the APAC Region. Idera has SQL Server tools for: • BI Monitoring • Performance & Availability • Backup & Maintenance • Security, Auditing & Governance • Administration • Development Plus, Free Tools covering most functions. Email idera@sqltools.com.au for Special Trial Offers.
Aaron Jackson Aaron Jackson is an ETL Consultant and blogger. Aaron has been working on Data Warehouse solutions in the Finance industry since 2008, and has acquired significant domain knowledge. Aaron’s primary talents are: • Database Development • SSIS Development • PowerShell Development • Performance tuning
Abstract With data volumes increasing in both your source systems and your data warehouse, and ETL windows shrinking, it is important to ensure that your SSIS packages and SSIS environment are performing optimally. In this session we will look at common SSIS anti-patterns, how to optimise SSIS packages, and how to troubleshoot and tune SSIS performance, before finishing with SSIS best practices. With plenty of practical advice, make sure you attend this session.
Agenda • Common SSIS Anti-Patterns • Diagnosing Performance • Performance Tuning • SSIS Best Practices
Common SSIS Anti-Patterns • Blocking Operations • Not using T-SQL • Unnecessary Data • Using compression and indexes • Not following the KISS principle
Blocking Operations • Must process all input rows before producing output • All downstream components must wait for the blocking operation (slows everything down) • Blocking components use a new memory buffer (uses additional memory) • Blocking components are also known as asynchronous components
Blocking Operations Why is this an anti-pattern? • Inefficient use of time and resources • Entire data set must be loaded before downstream processing can continue • Can cause paging to disk • Goes against the core design principles of SSIS • Design SSIS packages to keep the data “moving” • Design SSIS packages to keep the data in memory
Blocking Operations • The most common blocking operation is the Sort transformation • Also common is the Aggregate transformation
Blocking Operations • Remove Sort transformations by pre-sorting data • Have the extract ordered by business keys • Use ORDER BY in the source SQL query and mark the source output as sorted • Remove Aggregate transformations by using SQL where possible • COUNT, SUM, MAX, AVG, STDEV with GROUP BY (see the sketch below)
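As a rough sketch of the idea above (the table and column names are illustrative, not from the session), the Sort and Aggregate work can be pushed into the source query itself:

    -- Pre-aggregate and pre-sort in the source query so the SSIS Sort and
    -- Aggregate components can be removed from the data flow.
    SELECT  CustomerKey,
            COUNT(*)         AS OrderCount,
            SUM(SalesAmount) AS TotalSales
    FROM    dbo.FactSales
    GROUP BY CustomerKey
    ORDER BY CustomerKey;  -- then mark the source output as sorted (IsSorted / SortKeyPosition)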
Not using T-SQL • T-SQL is a powerful language for set based operations • T-SQL is much more efficient for computing aggregates and performing updates • Recent updates to the database engine make T-SQL even faster • Column store indexes • In-memory tables • Improved cardinality estimator
Not using T-SQL • SSIS is not set based, it is row based • SSIS is better for complex logical decisions and error paths within the data flow (think data cleansing and business defaulting rules) • T-SQL will fail the whole statement on most errors (think SELECT 1/0) • Aggregates and data manipulation are typically faster using T-SQL
Not using T-SQL Why is this an anti-pattern? • SQL Server is a highly efficient engine that should be used where appropriate • Adding T-SQL to your arsenal can speed up your designs (a set-based sketch follows) “If the only tool you have is a hammer, every problem looks like a nail.”
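One hedged illustration of the set-based approach (the dimension and staging tables are hypothetical): instead of an OLE DB Command transformation firing an UPDATE per row, land the changed rows in a staging table and apply them in a single statement:

    -- One set-based UPDATE replaces thousands of per-row updates.
    UPDATE d
    SET    d.CustomerName = s.CustomerName,
           d.City         = s.City
    FROM   dbo.DimCustomer  AS d
    JOIN   staging.Customer AS s
           ON s.CustomerKey = d.CustomerKey
    WHERE  d.CustomerName <> s.CustomerName
       OR  d.City         <> s.City;   -- NULL-safe comparisons omitted for brevity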
Unnecessary Data • Unused columns from the source component • Using wider data types than required • Verbose logging settings (SSIS log) • Not taking advantage of bulk-logged T-SQL statements
Unnecessary Data Why is this an anti-pattern? • Unnecessary data leads to inefficient designs • Packages are doing more work than is required • Wasteful use of resources
Unnecessary Data • Use truncate instead of delete statements • Take advantage of bulk insert where possible • Configure SSIS application logging to report OnError and OnWarning events only (in production)
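A minimal sketch of the first two points, assuming a heap staging table (the table and column names are illustrative):

    -- TRUNCATE is minimally logged, unlike DELETE.
    TRUNCATE TABLE staging.Customer;

    -- Under the simple or bulk-logged recovery model, a TABLOCK hint on an
    -- empty heap allows the insert itself to be minimally logged.
    INSERT INTO staging.Customer WITH (TABLOCK) (CustomerKey, CustomerName, City)
    SELECT CustomerKey, CustomerName, City
    FROM   src.Customer;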
Unnecessary Data • Remove unused columns • Reduce the memory needed per row • Increase the number of rows that can be processed per buffer
Unnecessary Data • Sharpen data types
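For example (the column types here are hypothetical), the narrowing can be done in the source query so the wider types never enter the data flow:

    -- Cast columns down to the narrowest type the data actually needs.
    SELECT  CAST(CustomerKey  AS INT)          AS CustomerKey,   -- was BIGINT
            CAST(CustomerName AS NVARCHAR(50)) AS CustomerName,  -- was NVARCHAR(255)
            CAST(IsActive     AS BIT)          AS IsActive       -- was INT
    FROM    src.Customer;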
Compression and Indexes • Table compression was introduced in SQL Server 2008 • Row-level or page-level compression can be used • Compression can reduce storage requirements • Compression can speed up read queries • Indexes can be either clustered or non-clustered
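If you are unsure whether compression is worth it for a given table, SQL Server can estimate the savings before you commit (the table name below is illustrative):

    -- Estimate the space saved by page compression...
    EXEC sys.sp_estimate_data_compression_savings
         @schema_name      = 'dbo',
         @object_name      = 'FactSales',
         @index_id         = NULL,
         @partition_number = NULL,
         @data_compression = 'PAGE';

    -- ...and enable it if the trade-off makes sense for your workload.
    ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);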
Compression and Indexes Why is this an anti-pattern? • Compression slows down inserts • 35% penalty for row compression • 208% penalty for page compression • Indexes also add overhead and slow down inserts
Compression and Indexes • Remove compression from destination tables • Remove clustered indexes from destination tables • Use partitioning with partition switching to manage performance instead
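A hedged sketch of the partition-switch alternative (the table names and partition number are hypothetical): bulk load into a bare staging table that matches the destination's schema, constraints and filegroup, then switch it in as a metadata-only operation:

    -- The load hits an uncompressed, unindexed staging table;
    -- the switch into the partitioned fact table is effectively instant.
    ALTER TABLE staging.FactSales_Load
        SWITCH TO dbo.FactSales PARTITION 42;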
Not following the KISS principle • KISS stands for Keep It Simple, Stupid. • Don’t over complicate designs • SSIS does not forgive poor design • Avoid monolithic package design
Not following the KISS principle Why is this an anti-pattern? • Complicated designs are hard to support, troubleshoot and debug • Complicated designs are hard to understand • Essentially this wastes people’s time • More than likely a sub-optimal design (slow performance)
Not following the KISS principle • Break your design into logical units of work • Use sub-packages with a distinct purpose • Use sequence containers within packages for logical units of work • Follow naming conventions • Use package configurations
Diagnosing Performance • Server O/S Performance • SSIS Performance
Server Performance • Things to pay attention to in Perfmon • CPU • Memory • Disk • Network
Server Performance • CPU • Process / % Processor Time (Total) for dtexec • dtexec should be close to 100% CPU load • Not reaching close to 100% could mean: • Application contention (SQL Server on the same box) • Hardware contention (sub-optimal disk I/O or memory) • Design limitation – not using parallelism, or too many single-threaded tasks
Server Performance • Memory • Process / Private Bytes for dtexec – The amount of memory used by dtexec • Process / Working Set for dtexec – The amount of memory allocated by dtexec • Memory / Page Reads / sec – Total memory pressure on the system. Suggested by Microsoft that a value of over 500 indicates memory pressure
Server Performance • Disk I/O • Large subject area (Victor recently did a talk on this as it relates to SQL Server) • Create a performance baseline when designing your system for comparison • Avg. Disk sec/Read – ideally you want to see 30 ms or less for OLAP workloads • Avg. Disk sec/Write – ideally you want to see 30 ms or less for OLAP workloads
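One way to put rough numbers against that baseline from inside SQL Server (a sketch, not from the original slides) is the file-stats DMV:

    -- Average read/write latency per database file since the last restart,
    -- for comparison against the ~30 ms target above.
    SELECT  DB_NAME(vfs.database_id) AS database_name,
            vfs.file_id,
            vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
            vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    ORDER BY avg_read_ms DESC;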
Server Performance • Network • Network Interface / Current Bandwidth – estimate available bandwidth • Network Interface / Bytes Total/sec – the rate at which bytes are transmitted and received over each NIC • Network Interface / Transfers/sec – to measure IOPS • SSIS will move data over your network as fast as your network can handle it
SSIS Performance • Things to pay attention to in Perfmon • Buffer memory – total memory (virtual and physical) used by the data flow engine • Buffers in use – number of buffers currently used by the data flow engine • Buffers spooled – the number of buffers written to disk; when this number is rising, you are paging memory to disk
Performance Tuning • Remove Blocking Operations • Sharpen Data Types • T-SQL / Set-based Operations • Parallelism • Data Flow Tuning • Network • Lookups
Remove Blocking Operations • Asynchronous components • Inefficient memory use • Slows execution • Prevents downstream components from running • Can cause paging to disk
Sharpen Data Types • Make data types as narrow as possible • Use less memory and increase throughput • Note – Excessive casting will degrade performance • Be aware of precision issues when using money, decimal and float types
T-SQL / Set-based Operations • T-SQL is faster for • Aggregates • Grouping • Updates • Pivot • Unpivot
T-SQL / Set-based Operations • T-SQL can clean your data on the fly • SELECT LTRIM(RTRIM(FIRST_NAME)) AS [FIRST_NAME] • You can use T-SQL for type conversions, coalescing and data type sharpening (a fuller sketch follows) • However, you lose the error-handling power that SSIS provides
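A slightly fuller sketch of cleansing in the source query (column and table names are illustrative; TRY_CONVERT requires SQL Server 2012 or later):

    -- Trimming, defaulting and type conversion pushed into T-SQL;
    -- anything that needs per-row error redirection stays in SSIS.
    SELECT  LTRIM(RTRIM(FIRST_NAME))                                 AS FIRST_NAME,
            COALESCE(NULLIF(LTRIM(RTRIM(LAST_NAME)), ''), 'UNKNOWN') AS LAST_NAME,
            TRY_CONVERT(DATE, DATE_OF_BIRTH)                         AS DATE_OF_BIRTH  -- NULL instead of an error on bad values
    FROM    src.Customer;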
Parallelism • SSIS allows you to have concurrent streams of work • If two streams of work have no dependencies, why not run them in parallel? • The arrows point to tasks that will execute concurrently
Parallelism • In your data flow task, allocate enough resources for independent paths to run concurrently • This example is monolithic and not modular (don't do this)
Data Flow Tuning • Three main levers to pull • DefaultBufferMaxRows • DefaultBufferSize • EngineThreads
Data Flow Tuning • DefaultBufferMaxRows • This is the maximum number of rows that can be loaded into a single buffer at any one time • The row limit will only be reached if DefaultBufferMaxRows rows fit within DefaultBufferSize bytes
Data Flow Tuning • DefaultBufferSize • This is the size of the buffer in bytes • The maximum value that can be set here is 100MB • Each EngineThread has its own buffer. Something to keep in mind when you are planning your resource capacity.
Data Flow Tuning • EngineThreads • This setting dictates how many threads the package is able to use in parallel • The usefulness of this setting really depends on your package design • A bigger number is not always better
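A quick, hedged way to sanity-check these settings (the table name is illustrative): estimate the row width from metadata and check how many rows fit in a buffer.

    -- Rough per-row width of a source table; max_length is in bytes
    -- and is -1 for MAX types, so treat the result as an estimate only.
    SELECT  SUM(c.max_length) AS approx_row_bytes
    FROM    sys.columns AS c
    WHERE   c.object_id = OBJECT_ID('dbo.FactSales');

    -- e.g. a 10 MB DefaultBufferSize with ~200-byte rows holds roughly
    -- 50,000 rows, so a DefaultBufferMaxRows above that is never reached.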
Network • If you have a dedicated SQL Server and a separate application server for SSIS • Tune the connection manager packet size • Default value is 4 KB (4,096 bytes) • Maximum value is 32 KB (32,767 bytes) • Reduces network overhead per round trip (a quick way to verify the negotiated packet size is sketched below)
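After increasing the packet size on the connection manager, one way to confirm what was actually negotiated (a sketch; run it while the package is connected):

    -- Sessions that negotiated something other than the 4,096-byte default.
    SELECT  session_id, client_net_address, net_packet_size
    FROM    sys.dm_exec_connections
    WHERE   net_packet_size <> 4096;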
Lookups • Three modes of operation • Full Cache – uses the most memory, but is fastest • Partial Cache – loads rows on the fly, can be expensive for performance • No Cache – uses no memory, but takes longer (a query per row) • Alternative – use a second source with a Merge Join on the lookup key
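With Full Cache, keep the lookup query as narrow as possible so the cache stays small; a minimal sketch (the dimension and column names are illustrative):

    -- Return only the business key and the surrogate key,
    -- never SELECT * against the whole dimension.
    SELECT  CustomerBusinessKey,
            CustomerKey
    FROM    dbo.DimCustomer;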
SSIS Best Practices • Parallel is fastest • Naming Conventions • Use annotations • Modular Designs • Use configurations • Separation Of Concerns • Emerging Technologies
Parallel is fastest • Take advantage of the SSIS architecture • It was designed to run tasks in parallel • MaxConcurrentExecutables controls how many tasks can run in parallel; the default of -1 allows the number of logical processors plus 2
Parallel is fastest Example • Parent-Child design pattern • 23 sub packages • Parallel execution • SSIS will only execute what it can handle at any one time
Naming Conventions • Control flow tasks