160 likes | 173 Views
Discover key tips and patterns for enhancing the performance of your SSIS packages. Learn to optimize lookup operations, choose between insert and update patterns, and improve distinct row processing. Dive into performance testing methods and update strategies to boost efficiency in your data workflows.
E N D
Design Patterns for SSIS Performance Darren Green Konesans Ltd
Performance • Utility packages vs specialised work processes • Less than 10% of packages require detailed performance design or tuning • It will take at least 90% of your time! • Use simple tweaks and hints ORDER Col for ClIdx, MICS • More radical design changes? • Which way is faster? • It depends!
Lookups • Two main uses • Check if exists, no values are returned • Lookup data, returns one or more values to augment existing data • Performance Points • Size of reference data set • Number of rows • Size of row (especially with augmentation usage) • Size of input data set • Number of rows • Cache Type
Insert vs Update Patterns • Insert or Update with Lookup • Use lookup to test if exists, or return values • Standard and familiar pattern • Changing the Lookup cache type can be very helpful • Insert then Update • Use constraint failures to direct rows • Use multiple inserts to refine batch to single row • Good for high Insert to update ratio, more inserts • Use lookups without cache on update rows only, more efficient with large reference data sets • Merge Join and Conditional Split for Insert or Update • Must be sorted inputs
Insert vs Updates Performance • Test conditions • Single Virtual Machine • 3GB Ram • Virtual Disk • Packages run in pseudo random order • 18 or 23 iterations per package • Input source of 2 million (narrow) rows • Existing rows • 100K, 500K, 1M
Performance Numbers Performance metrics are for the purpose of illustration only. The organism must be present in every case of the disease. The organism can be isolated from the diseased host and grown in pure culture. The specific disease is reproduced when a pure culture of the organism is inoculated into a healthy host. The organism must be recoverable from the experimentally infected host
Performance Test Results • Uses only top 50% of results for each run • Variation means results quite unpredictable, even excluding outliers • Scale of test too small for real conclusions • External factors a big influence!
Update Methods • OLE-DB Command • Standard and familiar pattern in SSIS • Poor user experience developing SQL and mapping parameters • Poor performance • T-SQL Update Statement • Requires additional staging table and space • Good performance
Update Performance Results • 6 runs for each case in pseudo random order • 2 million rows, 500K updates, 4:1
Distinct Rows • Distinct 2 Million to 201 Rows • One integer column • Sort Transformation • Remove Duplicate Values • Aggregate Transformation • Group By
Distinct Performance Results • 6 runs for each case in pseudo random order • Variation very different • Is the sort transform more susceptible to outside influences?
Thank you • Questions? • Darren Green • darren@konesans.com • www.konesans.com • www.sqlis.com • www.sqldts.com • Feedback Online • http://www.sqlbits.com/sessionfeedback