Written by Aaron Crouch

From a Business Intelligence and Data Warehousing point of view, the upcoming release of SQL Server 2008 presents us with several new tools designed to aid the B.I. developer with loading, maintaining, and reporting on dimensional data warehouses. Though much of the functionality is new to the development world as a whole, many ideas were borrowed from other successful DBMSs and B.I. entities. I myself started out my development life in the land of Oracle, where King Ellison and his subjects worked closely with some of the fathers of modern data warehousing to make many of their great ideas tangible. Quite honestly, these ideas were leaps and bounds ahead of any other mainstream DBMS on the market, including Microsoft's new SQL Server 7. Technologies such as materialized views, table-based partitioning, and Rollup and Cube aggregations embedded directly into the SQL worked great, but they weren't the easiest technologies to implement, thanks to Oracle's apparent die-hard belief that all things technological must be incredibly unintuitive. Nevertheless, after making the jump into SQL Server (2000), I found myself longing for many of these great tools again and wondering when Microsoft would get on the B.I. ball.

When SQL Server 2005 was introduced, my excitement level with Microsoft DBMSs skyrocketed, and many of the missing tools were back in play for me. However, one technology that impressed me from the Oracle days, automated change data capture, was still missing. Enter SQL Server 2008, and yes, I have change data capture again! But wait: I still do most of my database development in a SQL Server 2005 environment. Now what?

There are several shrink-wrapped packages that can track changed data in a SQL 2005 environment, most of which rely on monitoring the transaction logs. These applications tend to be expensive, though, especially for smaller businesses.
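For contrast, the built-in feature in SQL Server 2008 is switched on per database and then per table with a pair of system procedures. A minimal sketch of how that looks (the database and table names here are hypothetical, not from this entry):

```sql
-- Enable change data capture at the database level (SQL Server 2008)
USE SalesDB;  -- hypothetical database
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for a single table; changes are recorded in a generated
-- change table and read back via the cdc.fn_cdc_get_all_changes_* functions
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'tblMain_BillTo',  -- hypothetical tracked table
    @role_name     = NULL;               -- NULL = no gating database role
```

The rest of this entry is about getting comparable behavior on SQL Server 2005, where none of the above exists.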
The benefit of this approach is that, for the most part, database performance is not impacted. But what happens when the data files are reorganized through truncation or shrinking? This is a fairly standard practice exercised by most DBAs for the purpose of optimization. Yet, should the shrinkage take place before the CDC utility has a chance to scan the logs, much of the change data could be lost before it's utilized.

Another alternative, and the one which I will demonstrate in this entry, relies on triggers. Triggers can be inherently dangerous; however, used correctly, they can be of great aid. The same can be said of cursors and dynamic SQL, all of which, if used incorrectly, can cause optimization issues, but all of which I actually employ in the following technique. Are there better techniques? Depending on your circumstances, I imagine so, though I have found that this one works very well in practice and have used it multiple times with success.

First, we'll want to identify which tables will need to be tracked. Obviously this approach introduces a certain amount of overhead, so the fewer the better. These candidate tables will eventually have the logging triggers applied to them. Next, we'll want to set up the table whose responsibility it is to log the changed data. The table will look like this:

CREATE TABLE [audit].[audit](
    [Audit_PK] [bigint] IDENTITY(1,1) NOT NULL,
    [TableName] [varchar](50) NOT NULL,
    [ColumnName] [varchar](50) NOT NULL,
    [RecordPKey] [bigint] NOT NULL,
    [RecordAction] [tinyint] NOT NULL,
    [NewValue] [varchar](50) NOT NULL,
    [OldValue] [varchar](50) NULL,
    [DeletedRecord] [varchar](1000) NOT NULL,
    [LastActionDate] [datetime] NOT NULL,
    [LastActionUser] [varchar](50) NOT NULL
) ON [PRIMARY]

Now that our auditing data repository is created, we need to develop the trigger(s) that will reside on the table(s) in which we want to capture changed data. This trigger will need to handle Insert, Update, and Delete events, as we want to track all three.

CREATE TRIGGER [dbo].[tblmain_billtoauditiud]
ON [dbo].[tblmain_billto]
FOR INSERT, UPDATE, DELETE
AS

-- Declare variables
DECLARE @ParentTable VARCHAR(50)
DECLARE @PriKey VARCHAR(50)
DECLARE @ColumnsUpdated VARBINARY(100)
DECLARE @ColUPD VARCHAR(50)
DECLARE @LastUpdateDate VARCHAR(20)
DECLARE @LastUpdateUser VARCHAR(50)
DECLARE @sql VARCHAR(1000)
DECLARE @EventType TINYINT

We will want to capture the name of the user as well as the date on which they fired off the DML against our tracked table. This information is really only useful for logging and auditing purposes, so if we're watching this table strictly for gathering CDC data, we can leave this bit of logic out.

SET @LastUpdateDate = CAST(GETDATE() AS VARCHAR(20))
SET @LastUpdateUser = SYSTEM_USER

Notice that we're converting the output of the GETDATE() function to a string. This is, again, because we'll need to incorporate it into the dynamic SQL statement we will be building along the way. Next up, we'll want to identify which table fired off the trigger and figure out which column is the Primary Key. For the purpose of this example, we'll assume that the Primary Key is a single identity field, but this can be altered for different situations and needs.

SELECT @ParentTable = object_name(parent_obj)
FROM sysobjects
WHERE id = @@procid
SELECT @PriKey = kcu.column_name
FROM Information_Schema.Table_Constraints tc
INNER JOIN Information_Schema.Key_Column_Usage kcu
    ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY'
    AND tc.table_name = @ParentTable

Now we need to know what type of DML just took place: an insert, an update, or a delete. To do this we'll have to look at SQL's Inserted and Deleted memory-resident tables. To figure out which action took place, we'll check for the existence of records in both of these tables. If records exist in the Inserted table but not in Deleted, then we can surmise that an insert action took place. If there are records in the Deleted table but not the Inserted table, then a record deletion occurred. Finally, if records are present in both tables, then we know that an update statement was issued.

IF EXISTS(SELECT * FROM Inserted)
    IF EXISTS(SELECT * FROM Deleted)
        SELECT @EventType = 0  -- update
    ELSE
        SELECT @EventType = 1  -- insert
ELSE
    SELECT @EventType = 2  -- delete

So, if an insert took place, we build an Insert statement to record our new changed-data information in the audit table. In this case, we store the newly assigned Primary Key so that we can look the row up in the source table if it becomes necessary.

IF @EventType = 1
BEGIN
    -- Pull contents of Inserted pseudo table into a temp table for use in dynamic SQL
    SELECT * INTO #ins FROM Inserted

    -- Create and execute Audit table insert statement
    SET @sql = 'INSERT INTO audit.audit (TableName, ColumnName, RecordPKey, RecordAction, NewValue, OldValue, DeletedRecord, LastActionDate, LastActionUser) '
             + 'SELECT ''' + @ParentTable + ''', ''-New Record-'', pt.' + @PriKey + ', 1, ''-New Record-'', ''-New Record-'', ''N/A'', '''
             + @LastUpdateDate + ''', ''' + @LastUpdateUser + ''' '
             + 'FROM ' + @ParentTable + ' pt INNER JOIN #ins ins ON pt.' + @PriKey + ' = ins.' + @PriKey

    EXEC (@sql)
END
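One hardening note of my own, not part of the original technique: because the table and key names are concatenated straight into the dynamic SQL, it is worth wrapping each identifier with QUOTENAME so that unusual names (spaces, brackets) can neither break nor inject into the generated statement. The pattern looks like this:

```sql
-- Defensive quoting of identifiers spliced into dynamic SQL:
-- QUOTENAME wraps the name in [ ] and doubles any embedded ] characters
SET @sql = 'SELECT ' + QUOTENAME(@PriKey)
         + ' FROM ' + QUOTENAME(@ParentTable)
```

The same wrapping can be applied to every @ParentTable, @PriKey, and @ColUPD reference in the trigger's dynamic statements.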
We must also check for a deletion action with the following code. Note that, for auditing purposes, we capture the entire contents of the deleted record as XML. This allows us to use the same trigger logic on any table and to capture and store the entire deleted record without altering our CDC/audit repository table. The XML can then be queried with simple XQuery syntax later on.

IF @EventType = 2
BEGIN
    -- Pull contents of Deleted pseudo table into a temp table for use in dynamic SQL
    SELECT * INTO #dlt FROM Deleted

    -- Create and execute Audit table delete statement
    SET @sql = 'INSERT INTO audit.audit (TableName, ColumnName, RecordPKey, RecordAction, NewValue, OldValue, DeletedRecord, LastActionDate, LastActionUser) '
             + 'SELECT ''' + @ParentTable + ''', ''-Record Deleted-'', dlt.' + @PriKey + ', 2, ''-Record Deleted-'', ''-Record Deleted-'', '
             + '(SELECT * FROM #dlt d WHERE d.' + @PriKey + ' = dlt.' + @PriKey + ' FOR XML AUTO), '''
             + @LastUpdateDate + ''', ''' + @LastUpdateUser + ''' '
             + 'FROM #dlt dlt'

    EXEC (@sql)
END

Last, we need to check for any columns that have been updated. This presents a bit more difficulty, but thanks to a few not-so-well-known T-SQL functions we can take care of it relatively easily. Once we identify that an update has indeed taken place, we have to figure out which column or columns were updated, and then get the old and new values to place in our changed data/audit table.

The COLUMNS_UPDATED function returns a bit map indicating which columns were updated. What that means is that it returns one byte for each group of eight columns in the table, one bit per column, with the first column mapped to the least significant bit of the first byte. If a column is updated, its mapped bit is flipped to 1. So, as an example, if I have a table with four columns and the first and fourth columns are updated, the first byte comes back as binary 00001001. Next we'll need to decode our bitmap. For this we'll use the undocumented fn_isbitsetinbitmask system function: passing in the generated bit mask along with a column's ordinal position, it returns a nonzero value when that column's bit is set, which lets us pick the updated column names out of Information_Schema.Columns.
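As a side note before moving on: once deleted rows have been captured as XML in the DeletedRecord column, individual values can be pulled back out with XQuery. A sketch, assuming the deleted record's XML carries a BillToName attribute (that column name is hypothetical, for illustration only):

```sql
-- Recover one attribute from the stored XML for each deleted row.
-- FOR XML AUTO emits column values as attributes, so /*/@BillToName
-- reaches the attribute regardless of the element name used.
SELECT CAST(DeletedRecord AS XML).value('(/*/@BillToName)[1]', 'varchar(50)') AS BillToName
FROM audit.audit
WHERE RecordAction = 2  -- 2 = delete in this scheme
```

Because DeletedRecord is stored as varchar rather than xml, the CAST is needed before the .value() method can be applied.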
These column names are then fed into a cursor, where we loop through them to create the insert statements for the audit/CDC table, one record for each updated column. We'll capture the old and new values as well as the primary key for the updated record.

IF @EventType = 0
BEGIN
    -- Prepare the COLUMNS_UPDATED function
    SET @ColumnsUpdated = COLUMNS_UPDATED()

    -- Pull contents of Inserted and Deleted pseudo tables into temp tables for use in dynamic SQL
    SELECT * INTO #ins FROM Inserted
    SELECT * INTO #dlt FROM Deleted

    -- Loop through each column getting updated
    DECLARE ccols CURSOR READ_ONLY FOR
        SELECT Column_Name
        FROM Information_Schema.Columns
        WHERE Table_Name = @ParentTable
            AND sys.fn_isbitsetinbitmask(@ColumnsUpdated,
                COLUMNPROPERTY(OBJECT_ID(Table_Schema + '.' + Table_Name), Column_Name, 'ColumnID')) <> 0

    OPEN ccols
    FETCH NEXT FROM ccols INTO @ColUPD

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Create and execute Audit table insert statement
        SET @sql = 'INSERT INTO audit.audit (TableName, ColumnName, RecordPKey, RecordAction, NewValue, OldValue, DeletedRecord, LastActionDate, LastActionUser) '
                 + 'SELECT ''' + @ParentTable + ''', ''' + @ColUPD + ''', pt.' + @PriKey + ', 0, ins.' + @ColUPD + ', dlt.' + @ColUPD + ', ''N/A'', '''
                 + @LastUpdateDate + ''', ''' + @LastUpdateUser + ''' '
                 + 'FROM ' + @ParentTable + ' pt '
                 + 'INNER JOIN #ins ins ON pt.' + @PriKey + ' = ins.' + @PriKey + ' '
                 + 'INNER JOIN #dlt dlt ON pt.' + @PriKey + ' = dlt.' + @PriKey

        EXEC (@sql)

        FETCH NEXT FROM ccols INTO @ColUPD
    END

    CLOSE ccols
    DEALLOCATE ccols
END

We are now capturing any and all updates, deletes, and inserts against our tracked table. To use this approach for change data capture, we would simply need to carry the keys of the altered records into our star schema. We could then apply the DML there and finally truncate/archive our CDC table, getting it ready for the next round of changes. Alternatively, if we're using this approach for logging and auditing, we would simply continue to amass the information and
present it via reports, etc.

An additional technique I've employed, in conjunction with the above, is to create a stored procedure which builds the triggers automatically when passed a table name. This is helpful when tracking changed data from many tables.
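That generator procedure isn't reproduced in this entry, but its skeleton is straightforward: splice the table name into a CREATE TRIGGER template and execute the result. A rough sketch of the idea (the procedure name and naming convention are my own invention, and the trigger body is elided):

```sql
CREATE PROCEDURE dbo.usp_BuildAuditTrigger  -- hypothetical name
    @TableName SYSNAME
AS
BEGIN
    DECLARE @ddl NVARCHAR(MAX)

    -- Build the trigger DDL with the table name spliced in; the body
    -- would contain the full audit logic shown earlier in this entry
    SET @ddl = N'CREATE TRIGGER dbo.' + QUOTENAME(@TableName + '_auditIUD') +
               N' ON dbo.' + QUOTENAME(@TableName) +
               N' FOR INSERT, UPDATE, DELETE AS ' +
               N' /* ... audit logic from the trigger above ... */ ' +
               N' SELECT 1'  -- placeholder body for the sketch

    EXEC (@ddl)
END
```

Run once per candidate table, this keeps all the per-table triggers structurally identical and makes it cheap to regenerate them if the audit logic changes.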