I am building an analytics database (I have a firm understanding of the data and the business objectives and only basic-to-moderate database skills).
I have come across some references to building similar warehouses that implement the concept of 'calendar tables'. This makes sense and is easy enough to do. Most examples I see, however, are calendar tables whose granularity stops at 'day'. My data will need to be analyzed down to the hour level, possibly minutes.
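The day-level examples I have seen look roughly like this (simplified; the table and column names are just placeholders):

CREATE TABLE calendar_day (
    cal_date     date     PRIMARY KEY,  -- one row per day
    day_of_week  smallint NOT NULL,
    month_num    smallint NOT NULL,
    year_num     smallint NOT NULL,
    is_weekend   boolean  NOT NULL
);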
My question: would an implementation of calendar tables for hour/minute-level granularity be of value in terms of space-efficiency and query/sorting speed? If so, can you recommend a table structure and population method/example?
My primary data table will contain 20+ million rows of data at any given time and typical subsets for analysis are in the 1 to 5 million range. So, as you can see, that is a lot of timestamp fields.
In PostgreSQL, you can generate calendar tables of arbitrary length and granularity on the fly:
-- one row per hour of 2011-01-01 (24 rows)
SELECT CAST('2011-01-01' AS DATE) + (n || ' hour')::INTERVAL
FROM generate_series(0, 23) AS n;
This does not require recursion (as it would in some other database systems) and is the preferred way to generate result sets like this on the fly.
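The same generate_series approach also works directly with timestamps, so you can go down to minute granularity or persist the result as a real calendar table if you prefer (the table and column names below are only illustrative):

-- Minute-level series for all of 2011, generated on the fly
SELECT ts
FROM generate_series(timestamp '2011-01-01',
                     timestamp '2011-12-31 23:59',
                     interval '1 minute') AS ts;

-- Persist an hour-level calendar table with a few derived columns
CREATE TABLE calendar_hour AS
SELECT ts                          AS cal_ts,
       ts::date                    AS cal_date,
       EXTRACT(hour  FROM ts)::int AS hour_of_day,
       EXTRACT(dow   FROM ts)::int AS day_of_week,
       EXTRACT(month FROM ts)::int AS month_num
FROM generate_series(timestamp '2011-01-01',
                     timestamp '2011-12-31 23:00',
                     interval '1 hour') AS ts;

At 24 rows per day, an hour-level table stays small (under 9,000 rows per year), so materializing it costs very little even next to a 20-million-row fact table.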