200 likes | 208 Views
Learn the essential patterns and best practices in SSIS to optimize your ETL workflows, keep your DBA satisfied, and enhance overall efficiency. This guide provides insights into transformation types, as well as tips on handling data manipulation and processing speed techniques. Discover the nuances of synchronous and asynchronous transformation types, different modes, and key components such as Lookup, Script Task, and more. Gain valuable knowledge on handling large datasets, optimizing performance, and essential SSIS functionalities to improve your ETL processes and maintain best practices.
E N D
Patterns and Best Practices in SSIS or, how to keep your DBA happy with your crazy-ass ETL
It’s all about me... I’m a SpeakingMentor !
Mantra • C# is *not* the only way to initialise a variable • C# is *not* the only way to move files • C# is *not* the only way to call a web service • C# scripts are opaque to the SSIS runtime • C#... <sigh>
Not a pipeline • Data does not get passed to components (cough) • Components manipulate blocks of data (true)
What you *think* happens…. Get data Replace nulls Conditional Split Merge Sort
What actually happens… Get data Conditional Split Replace nulls Sort Merge
It’s all about speed… • There are 2 transformation types: • Synchronous – fastest (streaming and row-based) • Asynchronous – slower • And three ‘modes’: • Non-blocking • Semi-blocking • Full blocking
Non-blocking synchronous streaming transforms • Audit • Cache Transform • Character Map • Conditional Split • Copy Column • Data Conversion • Derived Column • Multicast • Percent Sampling • Row Count • Lookup
Non-blocking synchronous row transforms • DQS cleansing • Export Column • Import Column • OLEDB Command • Script Task • SCD • Lookup
Wait, there’s two ‘Lookup’s ? • Lookups are non-blocking streaming transforms when the ‘Full Cache’ option is used for the lookup data • Using ‘Partial Cache’ or ‘No Cache’ options in the lookup make the Lookup a row-based transform, which is necessarily slower
Semi-blocking asynchronous transforms • Data Mining • Merge • Merge Join • Pivot • Unpivot • Union All • Term Lookup
Full blocking asynchronous transforms • Aggregate • Fuzzy Grouping • Fuzzy Lookup • Row Sampling • Sort • Term Extraction • Script Task
Wait, there’s two ‘Script Task’s ?? • Script tasks are non-blocking when they’re using an outside resource (i.e. not the data that you’re working on) • They become blocking when they collect a dataset before sending it on to a destination • Set the ‘SynchronousInputID’ property on the output columns to ‘None’
Large Data Sets • BufferTempStoragePath • BlobTempStoragePath • These can either be set in your package template, or injected into the .dtsx
Really ? You can just inject stuff ? • Yes • Find and Replace default entries with your custom requirement… any decent text editor will do. BLOBTempStoragePath="F:\astDisk\WithLoadsOfSpace_temp" bufferTempStoragePath="F:\astDisk\WithLoadsOfSpace_temp”
Things that make you go hmmm • Logging • Package Checkpoints • Location, location, location • Expressions • Parallel Operations • ‘IsSorted=True’ property • Auditing • Bad Data
Time to dissect Here’s one I prepared earlier…
Take-aways – You should know.. • Your data • Your environment • Transform types • Ingest / Egress of data • What you are trying to achieve • When to give up and use C#