How can I populate my Fact table?

Jnr picture Jnr · Oct 19, 2015 · Viewed 9.1k times · Source

I am not a seasoned BI developer so I need help populating my fact table. Firstly, I have populated all my Dimensions from my production database (I'm not using a staging database or tables) using the appropriate SSIS components.

DimParent, DimStudent, DimManager, and DimFacilitator use the natural key as the primary key. The rest of the dimensions use a surrogate key as the primary. The reason for using the natural keys is because I have the same database model for my production(OLTP) database over multiple different schemas (which act as my different Campus locations). DW Diagram

My measureable data is still in my production database and I can't seem to figure out how to populate my fact table. Production OLTP

I was thinking of using a large query with a join but it might get too complex regarding the way I populated my DimAssessmentType by using the query:

select PK_Assessment, [Description] 
from Auckland_Park.Assessment 
union 
select 3, 'International'

Answer

Nick.McDermaid picture Nick.McDermaid · Oct 19, 2015

Don't be inconsistent. Use surrogate keys for everything. Then no matter what happens (i.e. a campus comes online that does not follow this rule), you can account for it. Being inconsistent is just making work for yourself. Do the design right now. It's a hell of a job reloading a dimension and fact after you've got three years data in your fact.

Anyway. The way I populate a fact is:

  1. Load the facts into a staging table.
  2. The staging table has additional columns which contain your surrogate keys
  3. Run an update statement on your staging table that fills in the surrogate keys
  4. Pick an appropriate window in your fact. Delete and reload that window

It sounds like you might want to do an "inline lookup" in SSIS instead to find SK's. That's fine but it does make it difficult to troubleshoot. Also the SSIS lookup component doesn't scale well (i.e. works for few rows and is very very slow for many rows). Also it doesn't do SCD's very well.

Your statement " I can't seem to figure out..." is very vague. Follow the four steps above and tell me which one you can't figure out.

One issue might be that you are not preserving source system keys in your dim... so you can't look up the new surrogate keys based on source system keys.