Integration Services in SQL Server 2008 Allan Mitchell SQL Server MVP
Who am I • SQL Server MVP • SQL Server Consultant • Joint author on Wrox Professional SSIS book • Worked with SQL Server since version 6.5 • www.SQLDTS.com and www.SQLIS.com
Today’s Schedule • SSIS in SQL Server 2008 • Behind the scenes • Threading • Pipeline Limiter • Front of shop • Lookup Component. Caching and other good stuff. • Data Profiling • Change Data Capture
Behind the Scenes Threading
Threading • Loading speed will benefit from getting as many threads spinning as possible. • In SSIS 2005 this was not always optimised • Blocking Transforms really wrecked your day. • Quaint workarounds
Threading - Multicast Transform • SSIS 2005 • Incredibly quick to generate n copies of the input • All outputs are Synchronous • All outputs on same execution tree • All outputs on same thread. • Round robin
Threading - Multicast Transform • SSIS 2008 • Incredibly quick to generate n copies of the input • All outputs are still Synchronous • All outputs on same execution Path • All outputs can get their own thread
Execution “Tree” 2005
begin execution tree 0
   output "Generated" (8)
   input "Multicast Input 1" (20)
   output "Multicast Output 1" (21)
   input "TrashInput" (25)
   output "Multicast Output 2" (32)
   input "TrashInput" (29)
   output "Multicast Output 3" (44)
end execution tree 0
Execution “Tree” 2008
Begin Path 0
   output "Generated" (7); component "Konesans Data Generator Source" (1)
   input "Multicast Input 1" (36); component "Multicast" (35)
   Begin Subpath 0
      output "Multicast Output 1" (37); component "Multicast" (35)
      input "TrashInput" (41); component "Trash Destination" (39)
   End Subpath 0
   Begin Subpath 1
      output "Multicast Output 2" (48); component "Multicast" (35)
      input "TrashInput" (45); component "Trash Destination 1" (43)
   End Subpath 1
End Path 0
Demo Multicast Transform in 2005 and 2008
Behind the Scenes Pipeline Limiter
Pipeline What? • That’s right – Limiter • Doesn’t that mean things will go slower in 2008? • No. It happens in 2005 too – you just don’t know it • Why would I want to restrict the pipeline? • Data = buffers = memory • Memory is reused when a buffer terminates • Push back from a component = no reuse of memory = run out of memory!
Pipeline Limiter (Nov CTP)
Information: 0x400492E0 at Data Flow Task, SSIS.Pipeline: During last execution the pipeline suspended output "Union All Output 1" (487) of component "Union All" (485) for 10445 milliseconds to limit the number of in-memory buffers.
Pipeline Limiter (Feb CTP/RTM) • No “Information” event is raised externally any more • Everything is sent to the log instead • Event: User:PipelineComponentTime • Filter out “Message” values starting with “The Component%” – see the query sketch below
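For anyone logging to SQL Server, a minimal sketch of that filter, assuming the SSIS SQL Server log provider is enabled and writing to its default dbo.sysssislog table (reading “filter out” as excluding the per-component timing messages so the throttling entries remain – an interpretation, not gospel):

   -- Surface pipeline-limiter entries from the SSIS log table.
   SELECT  source,      -- Data Flow task that raised the event
           starttime,
           message
   FROM    dbo.sysssislog
   WHERE   event = 'User:PipelineComponentTime'
     AND   message NOT LIKE 'The component%'  -- drop per-component timings
   ORDER BY starttime;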
Demo Pipeline Limiter
Front of Shop Lookup Component Caching Options
The only way it works in 2005 • Caching options • Full • Before the Data Flow task really gets going (Initialize) the lookup components cache the entirety of their reference datasets • Partial • The lookup component will try for a match in cache first, then go to disk. No pre-caching • None • A misnomer really. The last query result is cached but that’s it. Every other query goes to disk.
The only way it works in 2005 • Full Cache is fastest but • Might take longest to cache • Could take a large amount of memory to cache • The reference datasets are not passed around • In a loop over the same task you read the reference dataset n times • Not transferable across Data Flow tasks
The way it can work in 2008 • The Cache Transform • No transformation happens! • Can be used as a destination • Cache Connection Manager • File. Can be read by the Raw File adapter. • In Memory • Allow the reference dataset to be passed around like luggage
The way it can work in 2008 • Just because you can cache the dataset like this does not mean you have to • Default option is still to use the 2005 way.
Row Redirection • One of my main dislikes about this component in 2005 • If a lookup component gets no match then by default it fails • (Row yielded no match during lookup) • You have to configure the error output to handle this. • Redirect down the error output • Ignore (pass the looked up value as a NULL)
Row redirection • In 2008 we say hello to the “No Match Output”
Demo Caching options and redirection
Points to note • Lookup transform is still case sensitive when the match is done in cache • When done against the database via an OLE DB Connection Manager it is governed by the database’s collation rules • Watch out for the different behaviour – be careful out there. See the sketch below.
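A quick way to see the difference, as a hypothetical T-SQL illustration – the in-memory cache compares strings case sensitively, while a typical SQL Server case-insensitive collation does not:

   -- 'ABC' vs 'abc' under case-insensitive and case-sensitive collations.
   SELECT CASE WHEN 'ABC' = 'abc' COLLATE Latin1_General_CI_AS
               THEN 'match' ELSE 'no match' END AS ci_compare,  -- 'match'
          CASE WHEN 'ABC' = 'abc' COLLATE Latin1_General_CS_AS
               THEN 'match' ELSE 'no match' END AS cs_compare;  -- 'no match'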
Data Profiling Task • Control Flow task • Profiles data in your database • Quick Profile setup or fully manual configuration • Loads of different metrics you can extract • Uses an ADO.NET Connection Manager • Only SQL Server 2000 and above can be profiled
Data Profiling Task • Fantastic for seeing how rubbish the data is • Useful for identifying distribution of values • Candidate keys in tables • Lengths of columns • Can be read externally or through a variable • XML whichever way you look at it
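To make a couple of those metrics concrete, here is a hand-rolled T-SQL approximation of two of the profiles the task computes – column value distribution and column length distribution. The dbo.Customers table and City column are hypothetical; the task does all of this for you and emits the results as XML.

   -- Column value distribution: how often each value appears.
   SELECT City,
          COUNT(*)                                 AS ValueCount,
          100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS PercentOfRows
   FROM   dbo.Customers
   GROUP BY City
   ORDER BY ValueCount DESC;

   -- Column length distribution: spread of string lengths in the column.
   SELECT LEN(City) AS ValueLength,
          COUNT(*)  AS RowsAtThisLength
   FROM   dbo.Customers
   GROUP BY LEN(City)
   ORDER BY ValueLength;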
Demo Data Profiling Task
Change Data Capture (CDC) • Billed as an ETL feature • We’ll use it if it’s there • What is it? • How do I use it?
Traditional ETL vs. CDC-based ETL
Traditional BULK ETL
• Moves the entire data set
• Requires a ‘window of operation’ (hours to days)
• Frequency/latency – monthly, weekly, sometimes daily
CDC-based ETL
• Moves only the changes to the data
• No ‘window of operation’
• Frequency/latency – multiple times a day, real time
• Periodic or continuous change flow
[Diagram: operational data sources feed the ETL engine (extract, transform, load) into the DW – as a bulk load in the traditional model, as a real-time change stream with CDC]
Why CDC is Cool • Increased efficiency of the ETL process • Incremental extractions • Identify changed rows and columns • Identify the operation on the data – all changes or net changes • Log based (uses the transactional replication log reader) • Lightweight
Why CDC is Cool • Viewable as net changes or as the complete journal • The key point is the “change” – what you view about those changes is up to you • Differs from Change Tracking in that you see the actual changed data, not just the fact that a row changed • No excuses for not considering incremental extracts now.
Terminology around CDC • Capture Instance – the change table created for a base object (max 2 per object) • Capture Process – reads the log and places the change rows into the change tables • Retention Period – how long change rows are kept before cleanup
Demo Using Change Data Capture
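A minimal T-SQL sketch of the plumbing behind the demo, assuming a hypothetical MyDatabase/dbo.Orders; the procedures and functions shown (sys.sp_cdc_enable_db, sys.sp_cdc_enable_table, sys.fn_cdc_get_min_lsn, sys.fn_cdc_get_max_lsn and the generated cdc.fn_cdc_get_all_changes_* function) are the standard CDC API:

   USE MyDatabase;  -- hypothetical database
   GO
   -- Enable CDC at the database level, then on the table to track.
   EXEC sys.sp_cdc_enable_db;
   EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Orders',  -- hypothetical table
        @role_name     = NULL;       -- no gating role
   GO
   -- Read all changes captured between the low and high water marks.
   DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Orders'),
           @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();
   SELECT __$operation,  -- 1=delete, 2=insert, 3=before update, 4=after update
          __$start_lsn,
          *
   FROM   cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');

For the net-changes view mentioned earlier, enable the table with @supports_net_changes = 1 and query cdc.fn_cdc_get_net_changes_dbo_Orders instead.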
Some of the things I did not cover • C# is now a scripting option
Contact Details/Resources allan.mitchell@konesans.com www.SQLIS.com / www.SQLDTS.com www.SQLServerFAQ.com