Are document databases good for storing large amounts of Stock Tick data?

dvkwong · Jul 8, 2010 · Viewed 10.7k times

I was thinking of using a database like MongoDB or RavenDB to store a lot of stock tick data and wanted to know if this would be viable compared to a standard relational database such as SQL Server.

The data would not really be relational and would be a couple of huge tables. I was also thinking that I could sum/min/max rows of data by minute/hour/day/week/month etc for even faster calculations.

Example data: 500 symbols * 60 min * 60 sec * 300 days... (per record we store: date, open, high, low, close, volume, openint - all decimal/float)
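For a sense of scale, the formula above can be evaluated directly. This is only a rough sketch: it takes the formula exactly as written and assumes each of the seven fields occupies 8 bytes, which is an assumption and not something stated in the question. Real storage will differ once indexes, per-record overhead, and compression are counted.

```python
# Back-of-the-envelope sizing for the example data.
# Assumptions: the formula is used exactly as written in the question,
# and every field is stored as an 8-byte value.
SYMBOLS = 500
MINUTES = 60
SECONDS = 60
DAYS = 300
FIELDS = 7           # date, open, high, low, close, volume, openint
BYTES_PER_FIELD = 8  # assumed width, not given in the question

records = SYMBOLS * MINUTES * SECONDS * DAYS
raw_bytes = records * FIELDS * BYTES_PER_FIELD

print(f"{records:,} records")
print(f"{raw_bytes / 1e9:.2f} GB of raw field data")
```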

So what do you guys think?

Answer

Dan Dascalescu · Sep 15, 2016

Since this question was asked in 2010, several database engines have been released, or have developed features, that specifically handle time series such as stock tick data.

With MongoDB or other document-oriented databases, if you target performance, the advice is to contort your schema to organize ticks in an object keyed by seconds (or an object of minutes, each minute being another object with 60 seconds). With a specialized time series database, you can query data simply with:

SELECT open, close FROM market_data
WHERE symbol = 'AAPL' AND time > '2016-09-14' AND time < '2016-09-21'
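For contrast, here is a rough sketch of the nested minutes-then-seconds layout described above, using plain Python dictionaries in place of a real MongoDB collection. The one-document-per-hour granularity, the field names, the helper functions, and the sample prices are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def empty_hour_document(symbol, hour_start):
    """Build one document per symbol per hour (hypothetical layout)."""
    return {
        "symbol": symbol,
        "hour": hour_start,
        # ticks[minute][second] holds one tick; keys are strings because
        # document field names are strings.
        "ticks": {str(minute): {} for minute in range(60)},
    }


def put_tick(doc, ts, tick):
    """Store a tick under its minute and second within the hour."""
    doc["ticks"][str(ts.minute)][str(ts.second)] = tick


def get_tick(doc, ts):
    """Return the tick stored for that second, or None if there is none."""
    return doc["ticks"][str(ts.minute)].get(str(ts.second))


hour = datetime(2016, 9, 14, 14, 0, tzinfo=timezone.utc)
doc = empty_hour_document("AAPL", hour)

# Made-up sample values, only to show the shape of the data.
put_tick(doc, hour.replace(minute=30, second=5), {"open": 111.5, "close": 111.6})

print(get_tick(doc, hour.replace(minute=30, second=5)))
```

Even in this toy form, reading a range that spans several hours means walking multiple documents and two levels of keys, which is the kind of work the time range predicate in the query above does for you.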

"I was also thinking that I could sum/min/max rows of data by minute/hour/day/week/month etc for even faster calculations."

With InfluxDB, this is very straightforward. Here's how to get the daily minimums and maximums:

SELECT MIN("close"), MAX("close") FROM "market_data" WHERE symbol = 'AAPL'
GROUP BY time(1d)

You can group by time intervals, which can be expressed in microseconds (u), seconds (s), minutes (m), hours (h), days (d) or weeks (w).
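Conceptually, grouping by a time interval means flooring each timestamp to the start of its bucket and aggregating within each bucket. The sketch below shows that idea in plain Python. It is not how InfluxDB implements it; the function name, the epoch-aligned buckets, and the sample prices are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def min_max_by_interval(ticks, interval):
    """Group (timestamp, close) pairs into fixed-width buckets aligned to
    the Unix epoch and return {bucket_start: (min_close, max_close)}."""
    buckets = defaultdict(list)
    for ts, close in ticks:
        # Floor the timestamp to the start of the bucket it falls in.
        bucket_start = ts - (ts - EPOCH) % interval
        buckets[bucket_start].append(close)
    return {start: (min(v), max(v)) for start, v in sorted(buckets.items())}


utc = timezone.utc
# Made-up sample ticks spanning two days.
ticks = [
    (datetime(2016, 9, 14, 14, 30, tzinfo=utc), 111.5),
    (datetime(2016, 9, 14, 15, 0, tzinfo=utc), 112.0),
    (datetime(2016, 9, 15, 14, 30, tzinfo=utc), 115.0),
    (datetime(2016, 9, 15, 20, 0, tzinfo=utc), 114.2),
]

daily = min_max_by_interval(ticks, timedelta(days=1))
for start, (low, high) in daily.items():
    print(start.date(), low, high)
```

Passing `timedelta(hours=1)` or `timedelta(weeks=1)` instead changes the bucket width in the same way that changing the argument to `time()` does in the query.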

TL;DR

Time-series databases are better choices than document-oriented databases for storing and querying large amounts of stock tick data.