Blog About Me Posts By Tags Subscribe Elastic Search My Sessions Terraform

Pavan Kumar Aryasomayajulu


Auditing In SQL Server Using CDC Approach

Hi, In this post I would like to explain how we can audit tables in SQL server using Change data capture ( CDC) approach.

Requirement:

In the world of transactional databases, It is always required to audit our changes to track changes like what was changed and when it was changed especially with sensitve data. For example If we are dealing with some pharmacy or someother key domains related data, then as per govt regulations it is mandatory to audit certain data.

Traditional approaches to Audit Data

There many ways to audit data in Sql server, some traditional approaches are as follow:

  1. Using C# code itself Where ever we have insert and update statements , there we need to have some extra logic to insert into our tracking tables for changes.

         For example: Lets say I am trying to update "FirstName" field and In order to audit it I need to take following approach
    
                  -- Select FirstName From User where Id=10
                  -- Store this value in a variable
                  -- Update User set Firstname='New Value' where Id=10
                 -- Insert into User_Audit (Column,OldValue,NewValue,DateValue) Values ('FirstName','Old Value stored in variable','New Value',GetDate()
    

    So just to audit this change we need to write few extra lines of code and as the number of places to audit increases, this extra logic also increases which is a pain for maintaining.

  2. Using Triggers By making use of triggers, we can avoid this extra logic but there are few scenarios where triggers dont actually fire and we need some extra configuration. By default, the BULK INSERT statement do not execute triggers. However, you can enable triggers by using FIRE_TRIGGERS qualifiers. Also It is a manual process to write these triggers.

So, In these cases we can make use of SQL Server inbuilt feature for auditing.

Feature Availability:

CDC approach was indroduced in MS Sql 2008 but restricted only to developer and Enterprise editions. But from Sql server 2017, It is present in Standard Edition too.

Getting Started:

As CDC is specific to certain versions, initially lets check the version of sql server

 select @@version
 ---Output
Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64)   Aug 22 2017 17:04:49   Copyright (C) 2017 Microsoft Corporation  Express Edition (64-bit) on Windows 10 Pro 10.0 <X64> (Build 17134: ) (Hypervisor) 

I can see that my version of sql server in 2017 and Express Edition. So as per my understanding CDC feature is avaibale for Standard, Developer and Enterprise version of 2017 but not Express edition. But lets try



Check If CDC Is Enabled:

USE master 
GO 
SELECT [name], database_id, is_cdc_enabled  FROM sys.databases       
GO  



As you can see None of my databases have cdc enabled.

Enabling CDC: You can run this stored procedure in the context of each database to enable CDC at database level

	USE Pavan 
	GO 
	EXEC sys.sp_cdc_enable_db 
	GO  

When I ran this command, I got following error

Msg 22988, Level 16, State 1, Procedure sys.spcdcenabledb, Line 13 [Batch Start Line 2] This instance of SQL Server is the Express Edition (64-bit). Change data capture is only available in the Enterprise, Developer, Enterprise Evaluation, and Standard editions.


So only option I have now is to install standard or developer editions and I’ll go with developer edition for now because standard edition costs me.

After Installing Sql server devloper edition let me check the version

 select @@version
 ---Output
Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64)   Aug 22 2017 17:04:49   Copyright (C) 2017 Microsoft Corporation  Developer Edition (64-bit) on Windows 10 Pro 10.0 <X64> (Build 17134: ) (Hypervisor) 

Note: When we have 2 different versions installed, Please make sure that we connect to appropriate server

Now lets try to enable CDC by running following proc and this time it is successful.

	USE Pavan 
	GO 
	EXEC sys.sp_cdc_enable_db 
	GO  

	USE master 
	GO 
	SELECT [name], database_id, is_cdc_enabled  FROM sys.databases       
	GO  

Also we can see a new schema and few system level tables were created for cdc.



Check List Of Tables Having CDC

select is_tracked_by_cdc,* from sys.tables where is_ms_shipped=0

I have one table in my DB and we can see is_tracked_by_cdc is set to 0 ie cdc is not enabled on this table

Enable CDC

EXEC sys.sp_cdc_enable_table 
@source_schema = N'dbo', 
@source_name   = N'Test', 
@role_name     = NULL 

select is_tracked_by_cdc,* from sys.tables where is_ms_shipped=0

Here we need to pass schema and then source table name. role_name is used to restrict querying cdc tables based on roles and if we want this data to be accessible for everyone, we need to pass null for this value.


Enabling CDC for certain columns:


In case if we want to track only certain columns, then we need to pass them in the parameter
@captured_column_list = ‘[Firstname]’ to the proc sys.sp_cdc_enable_table

EXEC sys.sp_cdc_enable_table 
@source_schema = N'dbo', 
@source_name   = N'Test', 
@role_name     = NULL ,
@captured_column_list = '[Firstname]'

select is_tracked_by_cdc,* from sys.tables where is_ms_shipped=0


Now we can see few extra tables created in the system table



Testing CDC:

As we enabled CDC for our test table, now lets try to test how it is actually working. I’ll insert few records into Test table and see the result.

INSERT INTO [dbo].[Test] 
       (Id,Value) 
VALUES (1,'pavan') 


SELECT * FROM [Pavan].[cdc].[dbo_Test_CT]
select * from test
SELECT * FROM [Pavan].[cdc].[change_tables]

Unfortunately, I am not seeing any data in my CDC table which is opposite to my expectations



By looking at last query results, I can see that CDC is enabled on my “Test” table but still dbo_Test_CT table is not populated with any data.

I tried to identify errors by using these queries

exec sys.sp_cdc_help_change_data_capture
exec sys.sp_cdc_help_jobs
select * from sys.dm_cdc_errors



I see that there were no jobs for capture. And unfortunately I forgot to turn on my sql server agent.

Missing CDC capture Job when enabling CDC on table in SQL Server

The reason behind missing data in CDC tables is because this feature depends on Sql Jobs and by default sql jobs are in stopped state and we need to enable them.

Now to create these missging jobs, I need to run following statements and ultimately we can see these jobs created

EXEC sys.sp_cdc_add_job @job_type = N'capture';
EXEC sys.sp_cdc_add_job @job_type = N'cleanup';

Note By default if we have sql server agent running at the time of enabling cdc on a table, it will create these jobs and it is not required to run above scripts.

** Now lets test CDC again**

Insert

INSERT INTO [dbo].[Test] 
       (Id,Value) 
VALUES (3,'Satish') 

SELECT * FROM [Pavan].[cdc].[dbo_Test_CT]
select * from test

So I can see my inserted values in dbo_Test_CT table but not the latest inserted one… The reason for that is there is a slight deelay of 1 sec for the change to appear and here we are querying it along with insert.

There are two values which are very important to us is __$operation and __$update_mask.

Column _$operation contains value which corresponds to DML Operations. Following is quick list of value and its corresponding meaning.

Delete Statement = 1
Insert Statement = 2
Value before Update Statement = 3
Value after Update Statement = 4



Update

update test set value='Sai satish' where id=3

SELECT * FROM [Pavan].[cdc].[dbo_Test_CT]
select * from test


Row1 corresponds to insert (__operation=2) and we can see values for Id, Firstname,lastname
Row2,3 corresponds to update… row 2 has previous values and row 3 has updated values.. There I update firstname and lastname
Row 4,5 corresponds to update… Here we can see that Lastname in row 4 is null because, I am not updating it and only I updated Firstname

Understanding __update_mask:

In the above image for rows 2,3 we have update mask value as “0x06” and the binary equivalent of this is “0110”… So from this I can say that columns 2,3 were updated which are nothing but Firstname and Lastname
In the same way for rows 4,5 we have a value “0x02” and the binary equivalent is “0100”… So I can say column 1 is updated.



Identifying Records Modified In a Specific Transaction

Basically in tracking table we have data in the column __$start_lsn which is nothing but a transaction log sequence number

SELECT * FROM cdc.lsn_time_mapping 


This has info related to all transactions and it has date time

So In order to detect the time frame and group transactions in tracking table, we can use this query

select Id,Firstname,Lastname,t.* from  cdc.dbo_Users_CT u inner join
cdc.lsn_time_mapping t on u.__$start_lsn=t.start_lsn

Disabling CDC On Table:


USE Pavan; 
GO 
EXEC sys.sp_cdc_help_change_data_capture 
GO


By using above query, we will get different parameters required for disabling the CDC

USE Pavan;
GO
EXECUTE sys.sp_cdc_disable_table 
    @source_schema = N'dbo', 
    @source_name = N'Test',
    @capture_instance = N'dbo_Test';
GO


Once the CDC is disabled, It will delete all tracking tables too.


Deleting CDC on Database

USE Pavan 
GO 
EXEC sys.sp_cdc_disable_db 
GO 

Data Cleanup From CDC Tables

In a transactional database with good load, If we have this Change tracking enabled, there are chances of the database growing exponentially. So keeping that in mind, SQL server itself will clean these logs every 3 days but this is configurable.


SELECT ([retention])/((60*24)) AS Default_Retention_days ,*
FROM msdb.dbo.cdc_jobs


Initally when we created CDC on a table, it created 2 jobs. One for capture and the other for cleanup. So it is the responsibility of cleanup job to clean the data
sys.sp_cdc_cleanup_change_table


Updating this default 3 day value:

EXEC Pavan.sys.sp_cdc_change_job 
@job_type=N'Cleanup'
,@retention=20160 -- <60 min *24 hrs * 14 days >
go



Thanks,
Pavan Kumar Aryasomayajulu




#Audit_SqlServer #CDC #Tracking_Changes_In_SqlServer