280 likes | 433 Views
BDBI Webcast Radio. The B aker’s D ozen B usiness I ntelligence Webcast Radio 13 SQL Server -Business Intelligence Productivity Tips Kevin S. Goff Microsoft SQL Server MVP . January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction). BDBI Radio.
E N D
BDBI Webcast Radio The Baker’s Dozen Business Intelligence Webcast Radio 13 SQL Server -Business Intelligence Productivity Tips Kevin S. Goff Microsoft SQL Server MVP January 15, 2014 13 Tips for using SQL Server Change Data Capture (Introduction)
BDBI Radio • Produced and Hosted by Kevin S. Goff, Microsoft SQL Server MVP and author of CoDe Magazine“Baker’s Dozen” Productivity Series from 2004-2013 • http://www.codemag.com/People/Bio/Kevin.Goff • Pre-recorded webcast – Available twice a month • Site: http://www.BDBIRadio.com • My overall blog/site: http://www.KevinSGoff.net • Program format: • 13 quick tips on different SQL/BI/SharePoint topics • In any week, tips might cover one topic (13 features in PowerPivot, 13 SQL Interview topics), or 13 random tips on anything from data warehousing to MDX/DAX programming, to maybe even a few .NET tips for scenarios when the .NET and BI worlds collide • (Email me at BDBIRadio@gmail.comif you have questions) 13 tips for SQL Server CDC (Intro)
BDBI Radio • Today: 13 tips for getting started with SQL Server Change Data Capture • Change Data Capture (CDC) was a new feature added to SQL Server 2008 Enterprise • A major feature added to the database engine to support “audit trail” capabilities • This is an introductory session, first of a two-part series • Next broadcast I’ll have a more advanced session 13 tips for SQL Server CDC (Intro)
BDBI Radio – Topics for today • Change Data Capture as a concept and why it’s important • History of capturing changes to data in SQL Server • How SQL Server CDC works and how it differs from past solutions • Configuring a database for CDC • Configuring tables for CDC • A simple CDC Demo with a table and some inserts, updates, deletes • Viewing the CDC Log tables • Querying the log tables • CDC Latency • Issue with CDC – capturing “who” made the change • Handling DDL Changes • Turning off SQL Server Agent • Limitations of Change Data Capture 13 tips for SQL Server CDC (Intro)
1 – Why Change Data Capture is important • For years, database developers/DBAs have looked for best way to… • Capture Changes that occur to data • Write out the changes to log (audit trail) tables • Log (Audit Trail) tables usually contain three pieces of information: • Who changed data • When was it changed • What was the change (i.e. Discount % changed from 5% to 7%) • In some industries, this functionality is absolutely critical • Need to know full history of changes – a customer profile might change 7 times in a year. Want to know each change, who made it, and when • Used in data warehousing systems – to feed a data warehouse with “what was added and what was changed” since the previous load • Challenge is capturing EVERY change – whether made by the application, by stored procedures, by import jobs, or manual updates in a query window in Management Studio 13 tips for SQL Server CDC (Intro)
1 – Why Change Data Capture is important • Just to illustrate that last point – on the challenge of capturing EVERY change – whether made by the application, by stored procedures, by import jobs, or manual updates in a query window in Management Studio Need mechanism that will capture what was changed, regardless of the original process Some solutions are more robust and more effective than others 13 tips for SQL Server CDC (Intro)
2 – History of capturing changes • Prior to SQL Server 2008, here were the solutions available: • Database Triggers: • PRO: They are “tried and true” and they work, regardless of update scenario • CONs: • Often requires large amount of code (some will generate code via scripts & metadata) • In some instances, they can adversely affect performance • Use of OUTPUT INTO statement added in SQL 2005 • PRO: Simpler than triggers – can redirect changes directly into INSERT/UPDATE statement • CONs: • Every DML statement on a table must be modified to include the OUTPUT INTO clause. Might not be practical in open-ended systems where UPDATES occur many ways • Implement logic in the application layer or manually in stored procs • CONs: same as with OUTPUT INTO (might not handle every update scenario) • Purchase third-party product (SQLAudit) • Debate on Build vs Buy, though some third-party products work well 13 tips for SQL Server CDC (Intro)
3 – How CDC works & how it differs • Found in SQL Server Enterprise, starting in 2008 • Requires the use of SQL Server Agent • Mechanism in the Database Engine • Captures insert/update/delete activity and posts the details to change tracking tables • Details include: • State of an inserted row, after insert • State of an updated row (both before and after update) • State of a row just before deletion • Change tracking log tables include all versions of the row! • Feature not enabled by default – must run a few system stored procedures to configure 13 tips for SQL Server CDC (Intro)
3 – How CDC works & how it differs • Database engine reads the transaction log asynchronously, using SQL Server Agent Jobs • Logs details associated with CRUD operations to change tracking tables • Change tracking tables contain complete version history of the rows • Note: CDC does not track “who” made the change – we need to handle that manually • CDC Jobs clean up change tracking logs rows that are 3+ days old 13 tips for SQL Server CDC (Intro)
4 – Configuring a database for CDC • SQL Server Agent must be ON • This might be a problem for organizations that manage database services using a Managed Services area – some of them might use other approaches for scheduling jobs and might not enable SQL Server Agent • Again, CDC requires SQL Server Agent • By default, CDC is not enabled on a database • To enable CDC on a database: • Exec sys.sp_cdc_enable_db 13 tips for SQL Server CDC (Intro)
5 – Configuring tables for CDC • We’ll create a simple table with three columns • Then we need to enable CDC for the table, using the system stored procedure exec sys.sp_cdc_enable_table • This will do two things: • Creates a change tracking table to store all versions of inserted & modified rows • Creates a job in SQL Server Agent to scan transaction log for DML statements • Again, SQL Server Agent MUST be turned on!!! • Next slide, we’ll see what SQL Server created Specify the schema, the table, and an optional role for user access to the change tracking table that will be created CDC creates 2 jobs in SQL Server Agent 13 tips for SQL Server CDC (Intro)
5 – Configuring tables for CDC Under system tables, we see the Change Tracking table for the base table TestEmployees. We’ll look at entries in the table in a few minutes Capture job runs system stored procedure sys.sp_MScdc_capture_job Cleanup job runs system stored procedure sys.sp_MScdc_cleanup_job 13 tips for SQL Server CDC (Intro)
6 – CDC Demo with inserts, updates, deletes • Insert two rows, then change a rate on one of the rows, then delete a row • We want to see the state of each row after the inserts, before and after each update, and when the delete occurs • Next step we’ll see what’s in the change tracking log 13 tips for SQL Server CDC (Intro)
7 – Viewing the CDC Log tables Each row represents state of the row for each insert, update (old state, new state) and delete • First two rows ($Operation = 2) represent the state of the rows after the insert (for the 2 new rows) • Second two rows ($Operation =3 and 4) represent the “old version of the row” and “new version of the row” when Katy’s rate was changed from 21.00 to 22.00 • The third row ($Operation = 1) represents the state of Katy’s row at the time of the deletion • We’ll look at the Log Sequence Number (_$start_lsn) later - this is important! • One thing missing: datetime of the update? 13 tips for SQL Server CDC (Intro)
8 – Querying the log tables • On the issue of the timestamp (datetime) not being in the log table • SQL Server stores the time in a separate table (cdc.lsn_time_mapping), with a reference to the log sequence number column from the change tracking table Time for all updates is stored in cdc.lsn_time_mapping We can join this to the Change tracking table on the LSN • Time mapping table holds the Begin and End time for each update • We can join this table to the Change tracking table by the Start_LSN column • This also holds the Transaction ID – helpful for debugging • Reason for storing this in separate table: transactions might span multiple base tables 13 tips for SQL Server CDC (Intro)
8 – Querying the log tables • Here’s the JOIN The transaction ID is useful for debugging “what actually happened” Remember, data only lives in the log tables for 3 days!!! 13 tips for SQL Server CDC (Intro)
9 – CDC Latency • After you insert/update/delete data from the core tables, you might not immediately see data in the change tracking logs • Because the CDC jobs scan the transaction log asynchronously, there is a slight period of delay/latency • A transaction that updates a million rows in five seconds might not necessarily see data in the change tracking logs for at least another five seconds • You can adjust these jobs for different levels of latency – I’ll cover that next webcast 13 tips for SQL Server CDC (Intro)
10 – Handling who made an update • Change Data Capture does not capture one potentially key piece of information: who made the change • Because CDC scans the transaction log asynchronously, after the transaction occurs, it cannot capture the user who made the change • We need to implement some kind of workaround 13 tips for SQL Server CDC (Intro)
10 – Handling who made an update • Run SSMS as a different user (SQLUser) • Run an update and change rows Even tough current user is SQLUser, an update still marks last user as original user (both in the Employees table and also the change tracking log) 13 tips for SQL Server CDC (Intro)
10 – Handling who made an update • But wait – when we created dbo.TestCustomer, didn’t we include this in the DDL? • That only works on INSERTS, not UPDATES • OK , since the Change Tracking Log always inserts new rows (even when original tables are updated), can’t we change the DDL for the Change Tracking Log and make the LastUser column use a default value for current user? • This is a great question, and often asked. • Unfortunately, when SQL Server CDC inserts a row into the Change Tracking table, the “current user” is the SQL Service admin account, not the current user • So, considering an update could happen many different ways, how can we get CDC to accurately capture the user who made the change??? 13 tips for SQL Server CDC (Intro)
10 – Handling who made an update • Only ways: • Explicitly specify the LastUser Column when the original table is originally updated • Might not always be possible or practical, if you have an open system where updates can occur many different ways • Database Trigger (ironically) to mark the LastUser in the original base table on a change – this will ripple through to the Change Tracking table Code accesses the INSERTED and DELETED system tables, which are “exposed” by the transaction log and represent the state of the row BEFORE (DELETED) and AFTER (INSERTED) the update Compare the “old value” and “new value” of each column whose change would warrant updating the Last User. This is the ONLY WAY to detect an actual change was made 13 tips for SQL Server CDC (Intro)
11 – Handling DDL Changes • Suppose you add a column to a core table • Will Change Data Capture automatically start logging that new column? • Unfortunately, no • We need to go through a series of steps: • Alter the core table to add the new column • Back up the change tracking table to another temp table • Disable CDC on that table (with a CDC stored procedure) • Re-enable CDC on that table (with a CDC stored procedure), which will add a reference to the new column (but will wipe out the data in the change tracking log, which is why we backed it up) • Select all of the data from the temp table (from step 2) back into the new change tracking table 13 tips for SQL Server CDC (Intro)
11 – Handling DDL Changes 13 tips for SQL Server CDC (Intro)
12 – Turning off SQL Server Agent • Suppose you turn off SQL Server Agent and then add rows to a table • Since the CDC scan job must be running in SQL Server Agent, turning Agent off will keep any logging from occurring • However, when you turn Agent back on, the job will run and “pick up” any changes from the transaction log that weren’t initially captured • Believe it or not, even if you attempt to truncate the transaction log in this period of time, any “uncaptured changes” will still be captured when Agent is turned back on! 13 tips for SQL Server CDC (Intro)
13 – Limitations of Change Data Capture • Only available in Enterprise Edition of SQL Server • Cannot truncate tables involved in Change Data Capture • Cannot be implemented on tables that do not have a primary key • Cannot use CDC on a table that has a columnstore index 13 tips for SQL Server CDC (Intro)
Next webcast: Advanced CDC • Tools in SSIS 2012 to work in conjunction with SSIS • Options for adjusting latency/delay of CDC scan jobs • Additional system stored procedures for using CDC • Performance benchmarks using CDC 13 tips for SQL Server CDC (Intro)
Some good links/references • http://www.codeproject.com/Articles/28535/Change-Data-Capture-SQL-Server-2008 • https://www.simple-talk.com/sql/learn-sql-server/introduction-to-change-data-capture-(cdc)-in-sql-server-2008/ 13 tips for SQL Server CDC (Intro)
Today’s Jazz Recommendation • Florian Hoefner Group • Great jazz quartet • Search YouTube, you’ll find some GREAT performances • http://www.amazon.com/Songs-Without-Words-Florian-Hoefner/dp/B00784620I/ref=sr_1_1?ie=UTF8&qid=1390110649&sr=8-1&keywords=florian+hoefner • http://www.amazon.com/Falling-Up-Florian-Hoefner-Group/dp/B00DSAUM5M/ref=sr_1_2?ie=UTF8&qid=1390111023&sr=8-2&keywords=florian+hoefner 13 tips for SQL Server CDC (Intro)