530 likes | 691 Views
DBI322. Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012. Matt Masson Senior Program Manager Microsoft Corporation. Change Data Capture. Your First Data Warehouse. Challenges. More work to do. Less time to do it. More people using the system.
E N D
DBI322 Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012 Matt Masson Senior Program Manager Microsoft Corporation
Challenges More work to do Less time to do it More people using the system
Mo’ Data, Mo’ Problems • Processing time increases with data volume • Scales linearly (assuming no bottlenecks) Do Less Work!
Change Data Capture - How It Works • Enable it on the Source Database • Inserts, Updates, and Deletes are automatically tracked • Consume the changes
Demo CDC in SQL Server 2012
SQL Server CDC – Summary • Enable it on the source system • Automatic capture of changes • Configuration options • Uses transactional replication • SQL Server 2008 and later • -- enable CDC on the database • execsys.sp_cdc_enable_db • -- enable CDC on a table • execsys.sp_cdc_enable_table • @source_schema=N'dbo', • @source_name=N'MyTable', • @role_name=N'cdc_admin', • @supports_net_changes= 1
CDC Components for SSIS 2012 • Use before and after data flows to handle CDC state information • Read change data from source system • Uses CDC state information • Splits rows based on operation • Smart Conditional Split
Workflow Initial Load Incremental Load
Initial Load • Scenario 1: Active database • Mark Initial Load Start • Bulk load from source to destination • Mark Initial Load End • Scenario 2: Inactive database or using snapshot/flashback • Mark CDC start • Oracle – provide the System Change Number (SCN) • SQL – provide the Log Sequence Number (LSN) or snapshot name • CDC Control Task will use current value if none is provided • Bulk load data from source to destination
Incremental Load • Use CDC Control Task to get the processing range • Read from cdc_statestable • Stored in package variable • Use CDC Source to read changes since last run • Process change rows • Use CDC Control Task to mark processing end
Using the Reprocessing Indicator • Option in the CDC Source to include a reprocessing flag • Adds __$reprocessing column to data flow • Flag is set to True for rows that require special handling • Rows that overlap with initial load • Reprocessing a range after a failure
Demo CDC Components for SSIS
CDC Components - Summary • CDC Control Task • Retrieve and persist state • Use before and after your data flow • CDC Source • Reads change data from source table • CDC Splitter • Separate rows by operation type
Processing Modes • All • All with Old Values • Net • Net with Update Mask • Net with Merge
Which Processing Mode Should I Use? • All vs. Net • All changes • Do I need to record every change? • Do I enjoy complicated ETL work? • Net changes • Do I want to push the work to the source system? • Update Mask / Old Values • Do I need to do something when specific columns change?
Processing Mode - All Control Flow Data Flow
Processing Mode – All with Old Values Control Flow Data Flow
Processing Mode – Net Control Flow Data Flow
Processing Mode – Net with Update Mask Control Flow Data Flow
Processing Mode – Net with Merge Control Flow Data Flow
How it Works CDC Designer Oracle SQL Server CDC Tables LogMiner Database CDC Service Mirror Tables
Setup • Oracle CDC components are not installed by default • MSIs found on SQL media under: \Tools\AttunityCDCOracle • Oracle CDC Service • AttunityOracleCdcService.msi • Oracle CDC Designer • AttunityOracleCdcDesigner.msi
Oracle CDC Service Configuration • Name your service • Provide a service account • SQL instance information • CDC master password • One service per Oracle DB
Oracle CDC Designer • Allows you to create and manage Oracle CDC Instances • You can have multiple instances per CDC Service • An instance contains • Oracle database connection information • Tables and columns being tracked • Mirroring SQL Server instance information
Oracle CDC Designer – Managing an Instance • List all of your services and CDC capture instances • Control the instance state (start, stop, reset) • Regenerate setup scripts
Supported Versions • SQL Server (Enterprise Edition) • 2008, 2008 R2 and 2012 • Oracle • 10g Release 2 • 10.2.0.1—10.2.0.5 (patch set as of April 2010) • 11g Release 1 • 11.1.0.6—11.1.0.7 (patch set as of September 2008) • 11g Release 2 • 11.2.0.1—11.2.0.2 (patch set as of November 2010)
Common Questions • Do I need any special permissions? • Getting current LSN during initial load requires db_owner • Call made to sys.sp_replincrementlsn • Will enabling CDC impact performance? • Similar overhead to transactional replication • Asynchronous processing reduces overall impact • Increases data size • Prevents minimally logged operations • Less than 10% overhead with normal workload (and no IO issues)
Books Online Resources • CDC Flow Components http://msdn.microsoft.com/en-us/library/hh231087.aspx • Tuning the Performance of Change Data Capture http://msdn.microsoft.com/en-us/library/dd266396.aspx • Comparing Change Data Capture and Change Tracking http://msdn.microsoft.com/en-us/library/cc280519.aspx
Required Slide *delete this box when your slide is finalized Speakers, please list the Breakout Sessions, Labs, Demo Stations and Certification Exams that relate to your session. Also indicate when they can find you staffing in the TLC. Related Content • Breakout Sessions • DBI310: EIM: Bringing Together SSIS, MDS and DQS • DBI329: Upgrading SSIS Packages to SQL Server 2012 • DBI207: BI Power Hour Hands-on Labs • DBI24-HOL: Exploring Microsoft SQL Server Integration Services Product Demo Stations Breakthrough Insights: Credible, Consistent Data Related Certification Exam Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 Find Me Later At… Breakthrough Insights: Credible, Consistent Data
Contact mattmasson.com mmasson@microsoft.com @mattmasson Matt Masson
Track Resources Hands-On Labs @sqlserver @ms_teched SQL Server 2012 Eval Copy Get Certified! mva Microsoft Virtual Academy
Resources Learning TechNet • Connect. Share. Discuss. • Microsoft Certification & Training Resources http://northamerica.msteched.com www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers • http://microsoft.com/technet http://microsoft.com/msdn
Required Slide Complete an evaluation on CommNet and enter to win!
MS Tag Scan the Tag to evaluate this session now on myTechEd Mobile
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.