How to create a data model for invoices in a large data warehouse?

davidjhp picture davidjhp · Oct 24, 2013 · Viewed 11.2k times · Source

I am creating a data model for customer invoices in a large data warehouse.

The following shows the fields on a typical invoice:

enter image description here


The following is the data model I worked out so far to model the invoices:

enter image description here

Conventional wisdom is that a large data warehouse should use a star schema, which means one fact table, but it seems that to model an invoice I would need two fact tables, as shown above. Would it be correct to use two fact tables?

Answer

Nick.McDermaid picture Nick.McDermaid · Oct 25, 2013

I recommend you avoid multiple grain fact tables where possible.

Since Invoice Fact contains Total Shipping and Total Tax, to boil this down to Invoice Detail Fact, there are two basic options that I can think of:

  1. Create Tax and Freight columns in your Invoice Detail fact and distribute amongst your items. This Kimball Tip suggests exactly that: http://www.kimballgroup.com/2001/07/01/design-tip-25-designing-dimensional-models-for-parent-child-applications/.

  2. An alternative approach which has worked well for me is to create two new members in your product dimension. One for tax and one for freight. Then add these two line items to the fact just like a normal product with appropriate values.

When you analyse by Invoice ID you get the total including Tax & Freight. When you analyse by individual product you don't get a misleading Freight or Tax figure.