Benefits of using Staging Database while designing Data Warehouse

Prateek Singh picture Prateek Singh · Jan 9, 2014 · Viewed 38k times · Source

I am in process of designing a Data Warehouse Architecture. While exploring various options to Extract data from Production and putting into Data Warehouse, I came across many articles which mainly suggested following two approaches -

  1. Production DB ----> Data Warehouse (Star Schema) ----> OLAP Cube
  2. Production DB ----> Staging Database ----> Data Warehouse (Star Schema) ----> OLAP Cube

I am still not sure which one is the better approach in terms of Performance and reducing processing load on Production database.

Which approach you find better while designing Data Warehouse ?

Answer

hashbrown picture hashbrown · Jan 9, 2014

Below points are taken from, DWBI Organization's article

Staging area may be required if you have any of the following scenarios:

  1. Delta Loading: Your data is read incrementally from the source and you need an intermediate storage where incremental set of your data can be stored temporarily for transformation purpose
  2. Transformation need: You need to perform data cleansing, validation etc. before consuming the data in the warehouse
  3. De-coupling: Your processing takes lot of time and you do not want to remain connected to your source system (presumably the source system is being constantly used by the actual business users) during the entire time of your processing and, hence, prefer to just read the data from source system in one-go, disconnect from the source and then continue processing the data at your "own side"
  4. Debugging purpose: You need not go back to your source all the time and you can troubleshoot issues (if any) from staging area alone
  5. Failure Recovery: Source system may be transitory and the state of the data may be changing. If you encounter any upstream failure, you may not be in a position to re-extract your data as source has changed by that time. Having a local copy helps

Performance and reduced processing may not be only considerations. Adding a staging may sometimes increase latency (i.e. time delay between occurrence of a business incidence and it's reporting). But I hope above points will help you to make a better judgement.