770 likes | 1.07k Views
SSIS/ETL Project SQL Server Integration Services (BIDS) 2008. SetFocus Business Intelligence Master’s Program Jeff Jacob, MBA P: (312) 772-6142 E1 : Jeffrey.K.Jacob@gmail.com E2 : Jeffrey.Jacob@SetFocus.com. SSIS/ETL Project: AllWorks , Inc. Documentation. Objectives :
E N D
SSIS/ETL ProjectSQL Server Integration Services (BIDS) 2008 SetFocus Business Intelligence Master’s Program Jeff Jacob, MBA P: (312) 772-6142 E1: Jeffrey.K.Jacob@gmail.com E2: Jeffrey.Jacob@SetFocus.com
SSIS/ETL Project: AllWorks, Inc.Documentation Objectives: Based on source data, design a 3NF database for AllWorks, Inc. Create the database, tables, and constraints in T-SQL Develop a SSIS Project file with ETL packages to bring data from source files into the AllWorksOLTP database; use a master package to run the ETL packages in proper succession and to effectuate database maintenance tasks Deploy the SSIS Project file to SQL Server and schedule the job to run nightly at midnight
2. Design a relational (3NF) database. * note: LaborMaster name chosen because table contains data about both employees and contractors. NOT USED
3. Script the creation of the relational (3NF) database AllWorksOLTP and its tables. USE [master] GO IF EXISTS(SELECT name FROM sys.databasesWHERE name= N'AllWorksOLTP') DROP DATABASE [AllWorksOLTP] GO CREATE DATABASE AllWorksOLTP; USE AllWorksOLTP; GO Create Table dbo.CountyMaster( CountyPKint primary key clustered, CountyDescriptionvarchar(100) not null ); Create Table dbo.ClientMaster( ClientPKint primary key clustered, ClientNamevarchar(100) not null, ProjectManagervarchar(100) not null, StreetAddressvarchar(100) not null, Cityvarchar(100) not null, [State] char(2) not null, ZipCodevarchar(50) not null, CountyPKintnot null, ClientImagevarchar(150), Constraint FK_CustMaster_CountyMaster Foreign Key (CountyPK) References dbo.CountyMaster(CountyPK) ); Create Table dbo.JobMaster( JobPKint primary key clustered , JobDescriptionvarchar(50) , ClientPKintnot null, MaterialMarkupPctdecimal (10,2) , AdditionalOverheadPctdecimal (10,2) , JobClosedFlagbit not null , JobClosedDatedatetime, JobStartedDatedatetime, Constraint FK_JobMast_ClientMast Foreign Key (ClientPK) References dbo.ClientMaster (ClientPK) );
Create Table dbo.LaborMaster( WorkerPKint primary key clustered, FirstNamevarchar(50) not null, LastNamevarchar(50) not null, EmployeeFlagbit not null, LastUpdatedatetime default GetDate(), FullNamevarchar(150) not null, CellPhonevarchar(50) null ); Create Table dbo.EmployeeRates( WorkerRatesPKint identity primary key clustered, WorkerPKintnot null, WorkerHourlyRatemoney not null, EffectiveDatedatetime, Constraint FK_EmpRates_EmpMaster Foreign Key (WorkerPK) References dbo.LaborMaster (WorkerPK) ); Create Table dbo.TimeSheets( TimeSheetsPKint identity primary key clustered, WorkerPKintnot null, WorkDatedatetimenot null, JobPKintnot null, RegHoursWorkeddecimal (10,2), Constraint FK_TimeSheets_JobMaster Foreign Key (JobPK) References dbo.JobMaster (JobPK), Constraint FK_TimeSheets_LaborMast Foreign Key (WorkerPK) References dbo.LaborMaster (WorkerPK) ); Create Table dbo.Invoices( InvoicePKint primary key clustered, InvoiceNumbervarchar(50) not null, ClientPKintnot null, InvoiceTermsvarchar(50) , WorkDatesvarchar(50) , InvoiceDatedatetimenot null, Constraint FK_Invoices_JobMaster Foreign Key (ClientPK) References dbo.ClientMaster (ClientPK) ); Create Table dbo.InvoiceReceipts( InvoiceReceiptsPKint identity primary key clustered, InvoicePKintnot null, JobPKintnot null, AmountPaidmoney, ReceiptDatedatetime, Constraint FK_InvoiceReceipts_Invoices Foreign Key (InvoicePK) References dbo.Invoices (InvoicePK), Constraint FK_InvoiceReceipts_JobMaster Foreign Key (JobPK) References dbo.JobMaster (JobPK) ); Create Table dbo.InvoiceXJobMaster( InvoiceXJobMasterPKint identity primary key clustered, InvoicePKintnot null, JobPKintnot null, InvoiceAmountmoney , AdditionalLabormoney , Constraint FK_InvoiceXJobMaster_Invoices Foreign Key (InvoicePK) References dbo.Invoices (InvoicePK) , Constraint FK_InvoiceXJobMaster_JobMaster Foreign Key (JobPK) References dbo.JobMaster (JobPK) ); Create Table dbo.OverheadMaster( OverheadPKint primary key clustered, OverheadDescriptionvarchar(100) not null );
Create Table dbo.OverheadRates( OverheadRatesPKint identity primary key clustered, OverheadPKintnot null, OverheadHourlyRatemoney not null, EffectiveDatedatetimenot null, UseEmployeeFlagbit not null, UseContractorFlagbit not null, Constraint FK_OHRates_OHMaster Foreign Key (OverheadPK) References dbo.OverheadMaster (OverheadPK) ); Create Table dbo.PurchaseTypes( TypeCodeint primary key clustered, CodeDescriptionvarchar(100) ); INSERT INTO dbo.PurchaseTypes (TypeCode, CodeDescription) VALUES (1, 'Regular Materials'),(2,'Petty Cash'),(3,'Fuel'); Create Table dbo.MaterialPurchases( MaterialPurchasesPKint primary key clustered, JobPKintnot null, TypeCodeintnot null, PurchaseAmountmoney not null, PurchaseDatedatetime, PurchaseSalePointvarchar(100) null, Constraint FK_MatPurch_JobMaster Foreign Key (JobPK) References dbo.JobMaster (JobPK), Constraint FK_MatPurch_PurchTypes Foreign Key (TypeCode) References dbo.PurchaseTypes (TypeCode) ); Note: the dbo.PurchaseTypes table did not have any source data; the only three purchase types were known but never incorporated into the AllWorks, Inc. records. Therefore, the dbo.PurchaseTypes table was created and populated in T-SQL script; this is the onlyAllWorksOLTP database table not being populated by the SSIS ETL project file.
SSIS Project File (SSISStudentProject.JeffJacob.sln) details of common variables, source files and error log files: • Common Variables for ETL Packages - SSIS project: • Run-time Variables • changedRowCounter (rows updated) • newRowCounter (rows inserted) • errorRowCounter (FK-violating rows written to error log) • unchangedRowCounter (redundant/unused rows) • Configuration Variables • Variable NameDefault (debugging) Value • DBServerlocalhost • smtpServer sfexch0003 • EmailRecipient jeffrey.jacob@setfocus.com • SOURCE FILES: • TimeSheetFiles (Pkg: TimeSheets [uses CSV files]) • EmpTime*.csv • EmployeeExcelSourceFile (Pkgs: EmployeeMaster, EmployeeRates) • Employees.xlsx • OverheadExcelSourceFile (Pkgs: OverheadMaster, OverheadRates) • Overhead.xlsx • ClientGeographiesExcelSourceFile (Pkgs: CountyMaster, ClientMaster) • ClientGeographies.xlsx • ProjectMasterExcelSourceFile (Pkg: JobMaster) • ProjectMaster.xlsx • MaterialPurchasesSourceFile [uses XML file] (Pkg: Material Purchases) • MaterialPurchases.xml / .xsd (no Connection Variable) • InvoicesExcelSourceFile (Pkgs: Invoices, InvoiceXJobMaster, InvoiceReceipts) • Invoices.xlsx • ERROR LOG FILES: • (None - no FKs) (Pkgs: EmployeeMaster, OverheadMaster, CountyMaster) • no FK violations possible • EmployeeRatesErrorLogFile (Pkg: EmployeeRates) • EmployeeRatesErrorLog.txt • OverheadRatesErrorLogFile (Pkg: OverheadRates) • OverheadRatesErrorLog.txt • ClientMasterErrorLogFile (Pkg: ClientMaster) • ClientMasterErrorLog.txt • JobMasterErrorLogFile (Pkg: JobMaster) • JobMasterErrorLog.txt • TimeSheetsErrorLogFile (Pkg: TimeSheets) • TimeSheetsErrorLog.txt • MaterialPurchasesErrorLogFile (Pkg: MaterialPurchases) • MaterialPurchasesErrorLog.txt • InvoiceErrorLogFile (Pkg: Invoices) • InvoicesErrorLog.txt • InvoiceXJobMasterErrorLogFile (Pkg: InvoiceXJobMaster) • InvoicesXJobMasterErrorLog.txt • InvoiceReceiptsErrorLogFile (Pkg: InvoiceReceipts) • InvoiceReceiptsErrorLog.txt
SSIS Project File (SSISStudentProject.JeffJacob.sln) details on project naming convention (ETL packages only): Project Naming Convention– Control Flow and Data Flow Objects: ETL Package Names <Function/Target Table> DataLoad (.dtsx) ETL Control Flow Objects Object Type Name Data Flow Task Read <data description> into (AllWorksOLTP) DB Send Mail Task Email successful (unsuccessful) ETL processing of <DB target table> Script Task Collect count of items processed (TimeSheetsDataLoad ETL package only) File System Task Delete Error File From File System (TimeSheetsDataLoad ETL package only) Foreach Loop Container Loop through TimeSheet data source files (TimeSheetsDataLoad ETL package only) ETL Data Flow Objects Object Type Name Excel Source Get Excel source data Flat File Source Get TimeSheet CSV files XML Source Get Material Purchases data Data Conversion Convert data to DB-compatible data types Lookup Transformation (FKs) Validate incoming FK values against <parent table> Lookup Transformation (PKs) Validate incoming PK values against <parent table> Conditional Split (update check) Split out real update rows from redundant rows Conditional Split (error split) Remove errors from main pipeline Row Count Transformation (new) Count new rows Row Count Transformation (changed) Count changed rows Row Count Transformation (unchanged) Count unchanged rows Row Count Transformation (errors) Count error rows OLE DB Destination Insert new rows Audit Transformation Add Processing-info audit trail fields - package name and time of processing OLE DB Command Transformation Update <Target Table> data Flat File Destination Send errors to error log Derived Column Transformation (No naming convention – named according to usage)
SSIS Project File (SSISStudentProject.JeffJacob.sln) additional details on project naming convention (ETL packages only): Project Naming Convention– Data Flow columns : Data Conversion Field Lookup Column Field Derived Table Field Transformed<source field name> Target<source field name> (purpose-dependent: no naming convention) Samples of Data Conversion, Lookup and Derived Column Transformations in project:
Sample of Lookup task to check for Foreign Key violating data rows; such rows are errors and are counted and sent to an error file (below).
Sample of Lookup task to check Primary Key values in incoming data rows; new Primary Key values are new rows to be inserted and pre-existing Primary Key values are update candidates which are checked for redundancy; inserts and updates are split, counted and sent for DB Insert/Update commands respectively.
Sample of source data: Employees.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.LaborMaster
Sample of source data: Employees.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.EmployeeRates
Sample of source data: Overhead.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.OverheadMaster
Sample of source data: Overhead.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.OverheadRates
Sample of source data: ClientGeographies.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.CountyMaster
Sample of source data: ClientGeographies.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.ClientMaster
Sample of source data: ProjectMaster.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.JobMaster
Sample of source data: EmpTime*.csv Sample (SSMS) of AllWorksOLTP database : dbo.TimeSheets
C# script to enable file and row count totals: MaterialPurchases data ETL package: Control Flow and Variables
Sample of source data: MaterialPurchases.xml Sample (SSMS) of AllWorksOLTP database : dbo.MaterialPurchases
MaterialPurchases data ETL package: Control Flow and Variables