280 likes | 515 Views
SQL Server Integration Services Deep Dive. Warren Stevens-Baytopp BI Architect - GijimaAst Warren.stevens-baytopp@gijima.com. Integration Services Why ETL Matters. Merge data from heterogeneous data stores: Text files Mainframes Spreadsheets Multiple RDBMS
E N D
SQL Server Integration Services Deep Dive Warren Stevens-Baytopp BI Architect - GijimaAst Warren.stevens-baytopp@gijima.com
Integration Services Why ETL Matters • Merge data from heterogeneous data stores: • Text files Mainframes • Spreadsheets Multiple RDBMS • Refresh data in data warehouses and data marts • Cleanse data before loading to remove errors • High-speed load of data into online transaction processing (OLTP) and online analytical processing (OLAP) databases • Send status notifications on success/failure • Build BI into a data transformation process without the need for redundant staging environments • Automate data-administrative functions
SSIS Overview • Data sources can be diverse, including custom or scripted adapters • Transformation components shape and modify data in many ways. • Data is routed by rules or error conditions for cleansing and conforming. • Flows can be as complex as your business rules, but highly concurrent. • And finally data can be loaded in parallel to many varied destinations.
So Where to Now… • Data Sources • Excel Common problem - not all data coming through correctly • By Default Excel will determine the column types based on a “Majority Type” rule. • Overcome this by forcing a mixed type in the Data connector
Data Sources Continued • Data Sources • Verifying Connectivity / Availability ETL Tasks run through some of the steps and then fail on connectivity issues. • Why Would you want to check for this? • Use scripting task.
Data Sources Continued • Data Sources • OLE DB Provider Selecting Table or View dropdown as a source. • So what is the problem with this? • Replace with what • Select * from [TABLENAME] – not much better or is it? • Select [field list] from [TABLENAME] – resource usage
OLE DB Provider Demo • If a table is selected • SSIS issues an OPENROWSET • If a SQL statement is used • SSIS issues sp_executesql.
Sourcing Data • Common Requirement • Get all Data from one table that does not exist in another Get all rows from a staging table where the business key is not in the dimension table • Conventional T-SQL
Sourcing Data Conventional T-SQL INSERT INTO DIM_DATE SELECT s.* FROM STG_DATE s LEFT OUTER JOIN DIM_DATE d ON s.DateID = d.DateID WHERE d.DateID IS NULL INSERT INTO DIM_DATE SELECT s.* FROM STG_DATE s WHERE DateID NOT IN (SELECT DISTINCT DateID FROM DIM_DATE d)
Sourcing Data • Common Requirement • Get all Data from one table that does not exist in another Get all rows from a staging table where the business key is not in the dimension table • Conventional T-SQL • Using SSIS
Sourcing Data Using SSIS • Merge Join • Same as first T-SQL Statement • Requires a Sort and Conditional Split • Lookup • Using the SSIS functionality. • Less Coding • Uses the error output as the valid records. • Speed Comparisons
Lookups • Exact Matching Want data that matches a specific field. • Normal usage of Lookup • Range Comparisons Want data that falls between 2 values • The Caching SQL Statement • Mapping of Parameters
Date and Time Handling • Date formatting • Construct a date string in the format “YYYYMMDD HHMISS” • Get the month name • Get the formatted month and year in the form - mmm (yyyy) • Create a file using a date in the form yyyy-mm-dd • Create a file using a date in the form yyyy-mm-dd for yesterdays date • A simple yyyymmdd formatted – two ways of doing this
Date Formatting YYYYMMDD HHMISS (DT_STR,4,1252)DATEPART("yyyy",OldDate) + RIGHT("0" + (DT_STR,2,1252)DATEPART("mm",OldDate),2) + RIGHT("0" + (DT_STR,2,1252)DATEPART("dd",OldDate),2) + " " + RIGHT("0" + (DT_STR,2,1252)DATEPART("hh",OldDate),2) + RIGHT("0" + (DT_STR,2,1252)DATEPART("mi",OldDate),2) + RIGHT("0" + (DT_STR,2,1252)DATEPART("ss",OldDate),2)
Date Formatting Month Name (MONTH(NewDate) == 1 ? "January" : MONTH(NewDate) == 2 ? "February" : MONTH(NewDate) == 3 ? "March" : MONTH(NewDate) == 4 ? "April" : MONTH(NewDate) == 5 ? "May" : MONTH(NewDate) == 6 ? "June" : MONTH(NewDate) == 7 ? "July" : MONTH(NewDate) == 8 ? "August" : MONTH(NewDate) == 9 ? "September" : MONTH(NewDate) == 10 ? "October" : MONTH(NewDate) == 11 ? "November" : MONTH(NewDate) == 12 ? "December" : "InvalidMonth")
Date Formatting mmm (yyyy) (MONTH(OldDate) == 1 ? "Jan" : MONTH(OldDate) == 2 ? "Feb" : MONTH(OldDate) == 3 ? "Mar" : MONTH(OldDate) == 4 ? "Apr" : MONTH(OldDate) == 5 ? "May" : MONTH(OldDate) == 6 ? "Jun“ : MONTH(OldDate) == 7 ? "Jul" : MONTH(OldDate) == 8 ? "Aug" : MONTH(OldDate) == 9 ? "Sep" : MONTH(OldDate) == 10 ? "Oct" : MONTH(OldDate) == 11 ? "Nov" : MONTH(OldDate) == 12 ? "Dec" : "ERR") + " (" + DT_WSTR,4)YEAR(OldDate) + ")"
Date Formatting Text file with YYYY-MM-DD "C:\\Temp\\ErrorCodes\\" + (DT_WSTR,4)YEAR(NewDate) + "-" + RIGHT("0" + (DT_WSTR,2)MONTH(NewDate), 2) + "-" + RIGHT("0" + (DT_WSTR,2)DAY( NewDate), 2) + ".txt"
Date Formatting Same but for Yesterday • "C:\\Temp\\ErrorCodes\\" + • (DT_WSTR,4)YEAR(DATEADD("dd", -1, OldDate)) + "-" + • RIGHT("0" + (DT_WSTR,2)MONTH(DATEADD("dd", -1, OldDate)), 2) + "-” • + RIGHT("0" + (DT_WSTR,2)DAY(DATEADD("dd", -1, OldDate)), 2) + ".txt“ • dateadd function: • In each portion of the check
Date Formatting A Simple YYYYMMDD (DT_WSTR,4)YEAR(OldDate) + RIGHT("0" + (DT_WSTR,2)MONTH(OldDate), 2) + RIGHT("0" + (DT_WSTR,2)DAY(OldDate), 2) OR (DT_WSTR,8) ((YEAR(OldDate) * 10000) + (MONTH(OldDate) * 100) + DAY(OldDate))
Some Performance Tuning Tips • Only Select the Columns that you need • Use a SQL Server Destination instead of an OLE DB Destination • If using an OLE DB Destination – use the table or view with fast load data access mode. • Use standardized naming conventions • Where possible, filter your data in the Source Adapter rather than using a Conditional Split transform component • LOOKUP components will generally work quicker than MERGE JOIN components where the 2 can be used for the same task • Use caching in your LOOKUP components where possible. It makes them quicker. Just watch that you are not grabbing too many resources. • Use Sequence containers to organize package structure into logical units of work.
For More Information… • Visit TechNet at www.microsoft.com/technet/prodtechnol/sql/2005/technologies/ssisvcs.mspx • Learn more about SSIS on MSDN athttp://msdn2.microsoft.com/en-us/sql/aa336312.aspx • Great information available at www.sqlis.com • http://download.microsoft.com for Whitepapers and downloads of custom transformations • Jamie Thomson’s Blog http://blogs.conchango.com/jamiethomson/default.aspx • Donald Farmer’s Blog http://sqljunkies.com/WebLog/donald_farmer/default.aspx
Thank you to our Partners for their support of TechDays 2007