450 likes | 629 Views
BIN302 SSIS 2008: Tips and Tricks. Steve Swartz Group Program Manager Microsoft Corporation. Overview. Before You Begin Relationship to the Database Modeling Performance Development Smart Package Design Understanding the Data Flow Deployment General Issues Tuning and Tweaking.
E N D
BIN302SSIS 2008: Tips and Tricks Steve Swartz Group Program Manager Microsoft Corporation
Overview • Before You Begin • Relationship to the Database • Modeling Performance • Development • Smart Package Design • Understanding the Data Flow • Deployment • General Issues • Tuning and Tweaking
Relation To The Database • Administration Issues • Where Do I Store Configurations? • Where Do I Deploy Packages? • Where Do I Store Log Information? • Architectural Issues • Is Integration Inside the Data Tier? • Is Integration Separate from the Data Tier?
Integration Inside the Data Tier Database Instance Load Database Upgrade Database Synchronize Databases Export Results
Integration Separate From the Data Tier Database Instance Database Instance DataMart Load / Upgrade / Synchronize / Export Database(s)
Implementing OVAL • Good Operations == Good package design • Volume, Application, Location call for a scientific approach • Measure • Hypothesize • Modify • Re-Measure
Source -> Row Count Run DTEXEC and measure time taken Volume: Measure Speed per Connection Rows / sec = Row Count / TimeData Flow
Application: The Right Tool for the Job Multiple data sources and destination Transformation requirements Workflow management Trickle feed Real-time ETL Single source and destination server BULK INSERT works just fine
Location: Know Your Environment • Four main factors for performance • CPU • Memory • Disk I/O • Network • Design For Your Environment
Think Modular • Some Languages Force Modularity on a Developer • SSIS is not one of those languages • So Modularity is a Discipline • Create a Package For Each Operation • Use Workflow To Run Individual Operations • Test, Do Perf Analysis Per Operation
Think Parallel • You can add parallelism inside and outside of your packages • Perform tasks in parallel when possible • MaxConcurrentExecutables • Default is the number of cores + 2
Load in Parallel • Partition data so it can be split up and loaded by multiple data flows • Load unrelated tables in parallel Customer Part Region Nation Part Supplier Supplier
Think In-Memory • Performance drops drastically anytime you need to land a file on disk • Reasons for unintentionally writing to disk: • Running out of memory • Lookup caches can hog memory • Blocking transforms • Too many active buffers • BLOB columns
How Does the Data Flow Work? • In memory ETL engine • Metadata set at design time • Data is moved through the pipeline in buffers
Transforms and Buffer Copies • Logically works row by row • Row Count, Derived Column • Buffer is reused RowBased (synchronous) • Works with groups of rows • Merge, Merge Join, Union All • Data copied to new buffers Partially Blocking (asynchronous) • Needs all input rows before producing any output rows • Aggregate, Sort, Pivot Blocking (asynchronous)
Chose the Right Transforms • Synchronous and Asynchronous • Beware of blocking transforms like Sort and Aggregate • Take advantage of your database engine • ORDER BY … • SELECT INTO …
Threading • Pipeline is split into Execution Trees • Execution trees end where a buffer is created • Sources • Asynchronous transforms • Threading was improved in 2008 • 2005 – one thread per execution tree • 2008 – one thread per buffer
The Data Pipeline Destination Source Transform Multicast Destination
Designing Your Data Flow Orders Customers Sort Sort Merge Join Merge Join Products Sort
Why Can’t I Share My Package? • Package protection level • Encrypt {all|sensitive} with User Key • Encrypt {all|sensitive} with Password • Don’t Save Sensitive • Server Storage
Deployment – File System • Deploy files using an MSI, or copied to a share • XML configuration
Deployment – SQL Server • Files are stored in MSDB • Execution typically done with SQL Agent • Make use of Proxy Accounts • Store configurations in SQL Server
Deployment – Package Store • Deploys to the SSIS service • Front end to File and SQL storage
What is the SSIS Service For? • Not required for execution • Tracks running packages • Component cache • Can manage multiple File or SQL locations • Remote connections to the service will require DCOM security changes
Optimize Your Query • SELECT only columns you need • Consider using NOLOCK • Remove locking overhead • Improves the speed of large tables scan • Risky side-effects - can read dirty data
Optimize Lookup Transformation • Optimize your query • Select the right Cache Mode • Use shared cache when it makes sense • Remember Miss Cache for Partial Cache mode • Check log for cache details Lookup processed 150000 rows The processing time was 28.797 seconds The cache used 84000000 bytes of memory
Optimize Data Type Usage • General Tips • Be as narrow as possible: less memory allocation • Avoid excessive casting • Avoid LOB types as they get swapped to disk • Specific Tips • Money is implemented as integer (up to 30% faster than Decimal) • Float faster than Decimal • Throughput varies with destination • Typical values between 3 and 20 MB/sec per connection
Tweak Your Pipeline Settings • Buffer size • Rows per buffer • Engine threads • Blob / Buffer temp path
Optimize Destination • SQL Destination vs. OLEDB Destination • Use OLEDB with Fast Load • Use partitions and partition SWITCH • Dropping non-clustered indexes can improve performance
Call to Action -Join Microsoft and HP NDA’s Discussion Topic- Business Insight Customer Roundtable: An HP/Microsoft Converged Application Technical NDA Discussion Time & Date- Wednesday 7am-8:30am Location-Hotel Concorde. Transport will be provided from the Hotel Concorde to the Messe at 8:30am. Breakfast will be served. Limited space. • How to Register? Register at the HP booth P2 or on http://tinyurl.com/2a9w64m
Session Evaluations Tell us what you think, and you could win! All evaluations submitted are automatically entered into a daily prize draw* Sign-in to the Schedule Builder at http://europe.msteched.com/topic/list/ * Details of prize draw rules can be obtained from the Information Desk.
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.