510 likes | 650 Views
T-SQL Tools:. Simplicity for Synchronizing Changes Martin Perez. T-SQL Tools: Simplicity for Synchronizing Changes Abstract: One of most common problems a data professional may face is finding and syncing data changes efficiently between multiple systems.
E N D
T-SQL Tools: Simplicity for Synchronizing Changes Martin Perez
T-SQL Tools: Simplicity for Synchronizing Changes Abstract: One of most common problems a data professional may face is finding and syncing data changes efficiently between multiple systems. In this session we will look into different methods focusing on effective use of T-SQL and explore how these methods can mixed and matched in your workflows. Target Audience: Any DBA or developer that has ever faced the problem of moving data changes across databases.
The Story We have a third party application critical to the company. Any changes we make to the database means we are now on an “unsupported configuration”. No table changes, no new indexes. Also, there is no consistent “updated date time” data for the tables we need. Querying the online system causes noticeable blocking. We could not get data in the window for daily reporting. Brute force – Scan all
Good News: A combination of differential database restore to read only standby with full table scans to find new and updated rows. By restoring the database to a reporting server with more cores and faster storage than the leased server housing the application database, we could brute force the table compare and run multiple queries in parallel to find new and updated rows to feed into the data warehouse. The simplistic approach met the needs and needed relative little work to maintain once the initial work was done. The DBAs were able to set up and schedule the back and restore jobs and the BI Developers were able to get the data. No one was a bottle-neck for the other. Brute force – Scan all
The Downside: Backup, restore and file copy all take time and need to be accounted for. Your storage and network speeds can easily become a limiting factor if you have a tight window for reporting. As table size increases the time needed to read all data can become unreasonable. Brute force – Scan all
The Upside: Relatively easy to implement using well understood technology and existing infrastructure. Easy to support, DBAs and developers could each do the work they best prepared for and it provided immediate answers to the business. It was relatively low-tech but provided immediate value. Brute force – Scan all
Recap: Dual table scans Compare every column Watch for datatype mismatch Handle NULL Values Collation mismatch Case-Sensitive v Case-Insensitive Batch your deletes Brute force
Recap: • NOT IN • Primary Key • Indexed column • Can be efficient • NOT EXISTS • Good for comparing multiple columns • OUTER JOIN • Find new and deleted rows • EXCEPT • Good for comparing multiple columns Brute force
USE StackOverflow; SELECT Id FROM dbo.Badges WHERE Id NOTIN ( SELECT Id FROM the_target.dbo.BadgesCopy ); GO Syntax – NOT IN– NOT IN
SELECT Id FROM dbo.Badges AS s WHERENOTEXISTS ( SELECT t.Id FROM the_target.dbo.BadgesCopy AS t WHERE t.Id=s.Id ); Syntax – not exists
SELECT s.Id FROM dbo.Badges AS s LEFTOUTERJOIN the_target.dbo.BadgesCopy AS t ON s.Id=t.Id WHERE t.Id ISNULL; Syntax– OUTER JOIN
Similar execution plans • Index or table scan • Large tables, problematic performance
Rowversion Change Tracking Change Data Capture Temporal Tables SQL Server for the assist
Can narrow the range of rows to compare dramatically Additional programming and maintenance overhead Rowversion Change Tracking Change Data Capture Temporal Tables SQL Server for the assist
Notpictured SQL Server Replication Triggers Third-Party Tools Hashbytes
Each database has a counter that is incremented for each insert or update operation that is performed on a table that contains a rowversion column within the database. This counter is the database rowversion. https://docs.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql Rowversion
Alias for timestamp Can be added to existing table Watch out for transform times Rowversion
One rowversion column per table Value increments with each Insert, Update operation, per row Not guaranteed to be unique Not recommended for Primary Key or Clustered Index “No-op” update increments the value Rowversion
Efficient query based on increasing values Column can be indexed Rowversion
Finding deleted rows is still a brute force operation Rowversion
System maintained Unique per database (mostly) Easily find range of modified rows Easily find new rows Does not help tracking deletes No-op (update column to itself) increments row version One rowversion column allowed per table Rowversion RECAP
An option since SQL Server 2018 is to enable change tracking. Once this is enabled at the database level we get an incrementing row version and the type of operation, insert update or delete. Deletes are available as separate operations and we have the list of columns that have changed per operation. Change tracking
Enable/Disable change tracking Set retention period Set cleanup behavior Change tracking Database setup
Enable for each table required • Set option for column changes Change tracking Table setup
Usage • Query change table function • SYS_CHANGE_OPERATION • CHAR(1) for Insert, Update, Delete operations • SYS_CHANGE_COLUMNS • VARBINARY(4100) • Bit-mask for updated columns • SYS_CHANGE_VERSION • BIGINT Incremental version Change tracking
Track columns updated allows gives us more details about update operations • Enables the update mask Change tracking Table setup
Create targeted updates Batch updates based on column groupings Change tracking
Does not require Enterprise Edition Additional storage overhead Keep track of retention period No supported way of cleaning up data manually Understand how insert and update results change based on change version Keep track of KB Articles related to reported Change Tracking bugs Change tracking recap
The Story: You need a bit more of the story as data changes. Deltas matter and maybe you want to audit the before and after values. Maybe you want to monitor for specific values as the data is being changed. Maybe you want to stream these changes to your data warehouse in near-real-time. Change data capture
Scenario: ERP system configuration changes were being made that caused time consuming closing process to fail. As year-end loomed, getting to the bottom of things became critical. Although the application had built-in auditing, the audit process added overhead and the output was not easily to read. Any solution put into place could not impact day-to-day operations. Change data capture
Good News: If you are at least SQL Server 2008 Enterprise Edition or SQL Server 2016 Service Pack 1 Standard Edition, Change Data Capture may be the solution you’re looking for. Change data capture
Prior to SQL Server 2016 SP1, requires Enterprise Edition Up to two capture instances per table Stores before and after values per column Lot’s of flexibility (pro) Lot’s of flexibility (con) Change data capture
Separate file group is recommended Additional IO overhead Make sure CDC SQL Agent Jobs are running Separate OLTP database files from Audit, Integration, ETL… Security for tracked data Restore WITH KEEP_CDC to preserve change data CDC setup Change data capture
The Upside: Track before and after values Insert update and delete actions fully logged and available Change data can be read without hitting transactional tables Full control over retention length Multiple strategies available to maximize performance Change data capture
The Downside: As you bring this solution into the mix you’ll find there’s More to Manage™. Keeping track of storage needs require your attention and to take full advantage of the solution some custom development may be in order. Use NCHAR or NVARCHAR data type for columns containing non-ASCII data could cause problems, code or table changes may be still needed. Change data capture
Capture instanceRecommended Name the capture instance Test performance with net change support enabled Index must be unique Only include required columns Monitor rate of change for data Keep CDC data in its own filegroup Change data capture
Creates system table per capture instance System columns are prefixed with “__$” Contains before and after values for updates Row is copied for deletes __$command_id – order of operation in a transaction Change data capture
LSN can be set by a date range Change data capture
New things to monitor Watch Transaction log growth Watch cleanup job performance Cleanup window and data loads need to be coordinated Change data capture
Additional IO overhead Requires more programming effort Requires more administration More options to limit interference with OLTP workloads Flexible (pro) Flexible (con) Available in SQL Server 2016 SP1 Standard Change data capture recap
Available in SQL Server 2016 System-versioned Designed to keep a full history of data changes Temporal Tables
Period start column added Period end column added New syntax for time-traveling queries Minimal maintenance compared to custom code “No-op” updates increments system time Temporal Tables
AS OF <datetime> FROM <start> TO <end> BETWEEN <start> AND <end> CONTAINED IN (<start>, <end>) Temporal Tables – FOR SYSTEM TIME
“Warm standby” Restore full/differential Read-only standby (no locks) Mitigate load on production system Combine withrowversion, change table, cdcetc… Combining Techniques