What is the best architecture for tracking field changes on objects?

Nicole picture Nicole · Jan 25, 2010 · Viewed 7.3k times · Source

We have a web application that is built on top of a SQL database. Several different types of objects can have comments added to them, and some of these objects need field-level tracking, similar to how field changes are tracked on most issue-tracking systems (such as status, assignment, priority). We'd like to show who the change is by, what the previous value was, and what the new value is.

At a pure design level, it would be most straightforward to track each change from any object in a generic table, with columns for the object type, object primary key, primary key of the user that made the change, the field name, and the old and new values. In our case, these would also optionally have a comment ID if the user entered a comment when making the changes.

However, with how quickly this data can grow, is this the best architecture? What are some methods commonly employed to add this type of functionality to an already large-scale application?

[Edit] I'm starting a bounty on this question mainly because I'd like to find out in particular what is the best architecture in terms of handling scale very well. Tom H.'s answer is informative, but the recommended solution seems to be fairly size-inefficient (a new row for every new state of an object, even if many columns did not change) and not possible given the requirement that we must be able to track changes to user-created fields as well. In particular, I'm likely to accept an answer that can explain how a common issue-tracking system (JIRA or similar) has implemented this.

Answer

Tom H picture Tom H · Jan 25, 2010

There are several options available to you for this. You could have audit tables which basically mirror the base tables but also include a change date/time, change type and user. These can be updated through a trigger. This solution is typically better for behind the scenes auditing (IMO) though, rather than to solve an application-specific requirement.

The second option is as you've described. You can have a generic table that holds each individual change with a type code to show which attribute was changed. I personally don't like this solution as it prevents the use of check constraints on the columns and can also prevent foreign key constraints.

The third option (which would be my initial choice with the information given) would be to have a separate historical change table which is updated through the application and includes the PK for each table as well as the column(s) which you would be tracking. It's slightly different from the first option in that the application would be responsible for updating the table as needed. I prefer this over the first option in your case because you really have a business requirement that you're trying to solve, not a back-end technical requirement like auditing. By putting the logic in the application you have a bit more flexibility. Maybe some changes you don't want to track because they're maintenance updates, etc.

With the third option you can either have the "current" data in the base table or you can have each column that is kept historically in the historical table only. You would then need to look at the latest row to get the current state for the object. I prefer that because it avoids the problem of duplicate data in your database or having to look at multiple tables for the same data.

So, you might have:

Problem_Ticket (ticket_id, ticket_name) Problem_Ticket_History (ticket_id, change_datetime, description, comment, username)

Alternatively, you could use:

Problem_Ticket (ticket_id, ticket_name) Problem_Ticket_Comments (ticket_id, change_datetime, comment, username) Problem_Ticket_Statuses (ticket_id, change_datetime, status_id, username)