Database design for audit logging

jbochi · Jan 6, 2010 · Viewed 84.7k times

Every time I need to design a new database I spend quite some time thinking on how I should set up the database schema to keep an audit log of the changes.

Some questions have already been asked here about this, but I don't agree that there is a single best approach for all scenarios.

I have also stumbled upon this interesting article on Maintaining a Log of Database Changes that tries to list the pros and cons of each approach. It's very well written and has interesting information, but it has made my decisions even harder.

My question is: is there a reference I can use, maybe a book or something like a decision tree, to decide which way I should go based on some input variables, like:

  • The maturity of the database schema
  • How the logs will be queried
  • The probability that records will need to be recreated
  • What's more important: write or read performance
  • Nature of the values that are being logged (string, numbers, blobs)
  • Storage space available

The approaches that I know are:

1. Add columns for created and modified date and user

Table example:

  • id
  • value_1
  • value_2
  • value_3
  • created_date
  • modified_date
  • created_by
  • modified_by

Major cons: We lose the history of the modifications. Can't roll back after commit.
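To make the limitation concrete, here is a minimal sketch of approach 1. It uses Python's built-in sqlite3 module so it can be run as-is; the table name `item` and the helper functions are made up for illustration, and only one value column is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        id            INTEGER PRIMARY KEY,
        value_1       TEXT,
        created_date  TEXT NOT NULL,
        modified_date TEXT NOT NULL,
        created_by    TEXT NOT NULL,
        modified_by   TEXT NOT NULL
    )
""")

def create_item(item_id, value, user, now):
    # On insert, the "created" and "modified" columns start out identical.
    conn.execute(
        "INSERT INTO item VALUES (?, ?, ?, ?, ?, ?)",
        (item_id, value, now, now, user, user),
    )

def update_item(item_id, value, user, now):
    # On update, only the "modified" columns change; the old value is overwritten.
    conn.execute(
        "UPDATE item SET value_1 = ?, modified_date = ?, modified_by = ? WHERE id = ?",
        (value, now, user, item_id),
    )

create_item(1, "first", "alice", "2010-01-06T10:00:00")
update_item(1, "second", "bob", "2010-01-07T09:30:00")

row = conn.execute(
    "SELECT value_1, created_by, modified_by FROM item WHERE id = 1"
).fetchone()
```

After the update the table still knows who created the row and who touched it last, but the value "first" is gone, which is exactly the con listed above.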

2. Insert only tables

Table example:

  • id
  • value_1
  • value_2
  • value_3
  • from
  • to
  • deleted (Boolean)
  • user

Major cons: How to keep foreign keys up to date? Huge space needed.
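A rough sketch of approach 2, again in Python with sqlite3 so it runs standalone. Several things here are assumptions: `from`, `to` and `user` are renamed to `valid_from`, `valid_to` and `user_name` because the original names are reserved words in many SQL dialects, a surrogate `version_id` is added, and a NULL `valid_to` is taken to mean "current version". Closing the previous version is the one place where this sketch updates a row instead of inserting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_version (
        version_id INTEGER PRIMARY KEY AUTOINCREMENT,
        id         INTEGER NOT NULL,   -- logical record id, repeated per version
        value_1    TEXT,
        valid_from TEXT NOT NULL,
        valid_to   TEXT,               -- NULL marks the current version (assumption)
        deleted    INTEGER NOT NULL DEFAULT 0,
        user_name  TEXT NOT NULL
    )
""")

def save_version(item_id, value, user, now, deleted=False):
    with conn:  # both statements commit or roll back together
        # Close the currently open version, if there is one.
        conn.execute(
            "UPDATE item_version SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
            (now, item_id),
        )
        # Every change becomes a new row.
        conn.execute(
            "INSERT INTO item_version (id, value_1, valid_from, valid_to, deleted, user_name) "
            "VALUES (?, ?, ?, NULL, ?, ?)",
            (item_id, value, now, int(deleted), user),
        )

def value_as_of(item_id, when):
    # ISO-8601 strings compare correctly as plain text.
    row = conn.execute(
        "SELECT value_1 FROM item_version "
        "WHERE id = ? AND deleted = 0 AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (item_id, when, when),
    ).fetchone()
    return row[0] if row else None

save_version(1, "first", "alice", "2010-01-06T10:00:00")
save_version(1, "second", "bob", "2010-01-07T09:30:00")

old_value = value_as_of(1, "2010-01-06T12:00:00")
current_value = value_as_of(1, "2010-01-08T00:00:00")
```

Note that `id` is no longer unique in this table, which is where the foreign key problem comes from: other tables cannot reference a single row per record without some extra convention.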

3. Create a separate history table for each table

History table example:

  • id
  • value_1
  • value_2
  • value_3
  • value_4
  • user
  • deleted (Boolean)
  • timestamp

Major cons: Every audited table has to be duplicated. If the schema changes, all the logs need to be migrated too.
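One possible way to fill such a history table is with triggers. The sketch below uses Python's sqlite3 and SQLite trigger syntax, which differs from other databases; the table, trigger and column names (`item`, `item_history`, `user_name`, `changed_at`) are placeholders, and the history rows are assumed to hold the state of the record just before each change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id          INTEGER PRIMARY KEY,
        value_1     TEXT,
        modified_by TEXT NOT NULL
    );

    -- Mirrors the audited columns, plus audit metadata.
    CREATE TABLE item_history (
        id         INTEGER NOT NULL,
        value_1    TEXT,
        user_name  TEXT NOT NULL,
        deleted    INTEGER NOT NULL DEFAULT 0,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Copy the old state of the row whenever it is updated.
    CREATE TRIGGER item_audit_update AFTER UPDATE ON item
    BEGIN
        INSERT INTO item_history (id, value_1, user_name, deleted)
        VALUES (OLD.id, OLD.value_1, OLD.modified_by, 0);
    END;

    -- Copy the last state of the row when it is deleted.
    CREATE TRIGGER item_audit_delete AFTER DELETE ON item
    BEGIN
        INSERT INTO item_history (id, value_1, user_name, deleted)
        VALUES (OLD.id, OLD.value_1, OLD.modified_by, 1);
    END;
""")

conn.execute("INSERT INTO item VALUES (1, 'first', 'alice')")
conn.execute("UPDATE item SET value_1 = 'second', modified_by = 'bob' WHERE id = 1")
conn.execute("DELETE FROM item WHERE id = 1")

history = conn.execute(
    "SELECT value_1, user_name, deleted FROM item_history ORDER BY rowid"
).fetchall()
```

The schema-change con is visible here too: adding a column to `item` means altering `item_history` and both triggers.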

4. Create a consolidated history table for all tables

History table example:

  • table_name
  • field
  • user
  • new_value
  • deleted (Boolean)
  • timestamp

Major cons: Will I be able to easily recreate the records (roll back) if needed? The new_value column needs to be a huge string so it can support all the different column types.
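To get a feel for what recreating a record would involve with approach 4, here is a small sketch in Python with sqlite3. It assumes a `record_id` column, which is not in the list above but seems necessary to tell records apart; `log_id`, `user_name`, `changed_at` and the helper functions are also invented for the example, and deletions are ignored to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        log_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        table_name TEXT NOT NULL,
        record_id  INTEGER NOT NULL,  -- assumption: not part of the original list
        field      TEXT NOT NULL,
        user_name  TEXT NOT NULL,
        new_value  TEXT,              -- every column type is flattened to text
        deleted    INTEGER NOT NULL DEFAULT 0,
        changed_at TEXT NOT NULL
    )
""")

def log_change(table, record_id, field, new_value, user, now):
    conn.execute(
        "INSERT INTO audit_log (table_name, record_id, field, user_name, new_value, changed_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (table, record_id, field, user, str(new_value), now),
    )

def record_as_of(table, record_id, when):
    # Replay the field changes in order; later values overwrite earlier ones.
    rows = conn.execute(
        "SELECT field, new_value FROM audit_log "
        "WHERE table_name = ? AND record_id = ? AND changed_at <= ? "
        "ORDER BY changed_at, log_id",
        (table, record_id, when),
    )
    record = {}
    for field, value in rows:
        record[field] = value
    return record

log_change("item", 1, "value_1", "first", "alice", "2010-01-06T10:00:00")
log_change("item", 1, "value_2", 10, "alice", "2010-01-06T10:00:00")
log_change("item", 1, "value_2", 25, "bob", "2010-01-07T09:30:00")

snapshot = record_as_of("item", 1, "2010-01-06T12:00:00")
latest = record_as_of("item", 1, "2010-01-08T00:00:00")
```

The reconstruction works, but the number 10 comes back as the string "10": the original column types are lost unless they are stored or looked up separately.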

Answer

Josh Anderson · Jan 7, 2010

One method that is used by a few wiki platforms is to separate the identifying data and the content you're auditing. It adds complexity, but you end up with an audit trail of complete records, not just listings of fields that were edited that you then have to mash up to give the user an idea of what the old record looked like.

So for example, if you had a table called Opportunities to track sales deals, you would actually create two separate tables:

Opportunities
Opportunities_Content (or something like that)

The Opportunities table would have information you'd use to uniquely identify the record and would house the primary key you'd reference for your foreign key relationships. The Opportunities_Content table would hold all the fields your users can change and for which you'd like to keep an audit trail. Each record in the Content table would include its own PK and the modified-by and modified-date data. The Opportunities table would include a reference to the current version as well as information on when the main record was originally created and by whom.

Here's a simple example:

CREATE TABLE dbo.Page(
    ID int PRIMARY KEY,
    Name nvarchar(200) NOT NULL,
    CreatedByName nvarchar(100) NOT NULL,
    CurrentRevision int NOT NULL, -- points at the current row in PageContent
    CreatedDateTime datetime NOT NULL
);

And the contents:

CREATE TABLE dbo.PageContent(
    PageID int NOT NULL,
    Revision int NOT NULL,
    Title nvarchar(200) NOT NULL,
    [User] nvarchar(100) NOT NULL, -- USER is a reserved word in T-SQL, hence the brackets
    LastModified datetime NOT NULL,
    Comment nvarchar(300) NULL,
    Content nvarchar(max) NOT NULL,
    Description nvarchar(200) NULL
);

I would probably make the PK of the contents table a multi-column key on PageID and Revision, provided Revision is an identity column. You would use the Revision column as the FK target for Page.CurrentRevision. You then pull the consolidated record with a JOIN like this:

SELECT *
FROM dbo.Page AS p
JOIN dbo.PageContent AS c ON c.Revision = p.CurrentRevision AND c.PageID = p.ID;

There might be some errors up there...this is off the top of my head. It should give you an idea of an alternative pattern, though.
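For what it's worth, the write path for this pattern could look roughly like the sketch below. It is written in Python with sqlite3 rather than SQL Server so that it runs on its own, with a trimmed set of columns; `Revision INTEGER PRIMARY KEY` stands in for the identity column, `UserName` replaces `User`, no foreign keys are declared, and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Page (
        ID              INTEGER PRIMARY KEY,
        Name            TEXT NOT NULL,
        CreatedByName   TEXT NOT NULL,
        CurrentRevision INTEGER NOT NULL,
        CreatedDateTime TEXT NOT NULL
    );
    CREATE TABLE PageContent (
        Revision     INTEGER PRIMARY KEY,  -- auto-assigned, stands in for an identity column
        PageID       INTEGER NOT NULL,
        Title        TEXT NOT NULL,
        UserName     TEXT NOT NULL,
        LastModified TEXT NOT NULL,
        Content      TEXT NOT NULL
    );
""")

def add_revision(page_id, title, content, user, now):
    # Content rows are only ever inserted, never updated.
    cur = conn.execute(
        "INSERT INTO PageContent (PageID, Title, UserName, LastModified, Content) "
        "VALUES (?, ?, ?, ?, ?)",
        (page_id, title, user, now, content),
    )
    return cur.lastrowid

def create_page(page_id, name, title, content, user, now):
    with conn:  # first revision and page row are written in one transaction
        revision = add_revision(page_id, title, content, user, now)
        conn.execute(
            "INSERT INTO Page VALUES (?, ?, ?, ?, ?)",
            (page_id, name, user, revision, now),
        )

def edit_page(page_id, title, content, user, now):
    with conn:  # new revision plus pointer update, atomically
        revision = add_revision(page_id, title, content, user, now)
        conn.execute(
            "UPDATE Page SET CurrentRevision = ? WHERE ID = ?", (revision, page_id)
        )

def current_page(page_id):
    return conn.execute(
        "SELECT p.Name, c.Title, c.Content, c.UserName "
        "FROM Page AS p "
        "JOIN PageContent AS c ON c.PageID = p.ID AND c.Revision = p.CurrentRevision "
        "WHERE p.ID = ?",
        (page_id,),
    ).fetchone()

create_page(1, "home", "Home", "v1 text", "alice", "2010-01-06T10:00:00")
edit_page(1, "Home", "v2 text", "bob", "2010-01-07T09:30:00")

current = current_page(1)
revision_count = conn.execute(
    "SELECT COUNT(*) FROM PageContent WHERE PageID = 1"
).fetchone()[0]
```

Rolling back to an earlier version is then just a matter of pointing CurrentRevision at an older Revision, since every previous version is still a complete row in PageContent.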