Is there something like Redis DB, but not limited with RAM size?

Andrey picture Andrey · Aug 26, 2013 · Viewed 17.5k times · Source

I'm looking for a database matching these criteria:

  • May be non-persistent;
  • Almost all keys of DB need to be updated once in 3-6 hours (100M+ keys with total size of 100Gb)
  • Ability to quickly select data by key (or Primary Key)
  • This needs to be a DBMS (so LevelDB doesn't fit)
  • When data is written, DB cluster must be able to serve queries (single nodes can be blocked though)
  • Not in-memory – our dataset will exceed the RAM limits
  • Horizontal scaling and replication
  • Support full rewrite of all data (MongoDB doesn't clear space after deleting data)
  • C# and Java support

Here's my process of working with such database: We've got an analytics cluster that produces 100M records (50GB) of data every 4-6 hours. The data is a "key - array[20]". This data needs to be distributed to users through a front-end system with a rate of 1-10k requests per second. In average, only ~15% of the data is requested, the rest of it will be rewritten in 4-6 hours when the next data set is generated.

What i tried:

  1. MongoDB. Datastorage overhead, high defragmentation costs.
  2. Redis. Looks perfect, but it's limited with RAM and our data exceeds it.

So the question is: is there anything like Redis, but not limited with RAM size?

Answer

FGRibreau picture FGRibreau · Aug 27, 2013

Yes, there are two alternatives to Redis that are not limited by RAM size while remaining compatible with Redis protocol:

Ardb (C++), replication(Master-Slave/Master-Master): https://github.com/yinqiwen/ardb

A redis-protocol compatible persistent storage server, support LevelDB/KyotoCabinet/LMDB as storage engine.

Edis (Erlang): https://github.com/cbd/edis

Edis is a protocol-compatible Server replacement for Redis, written in Erlang. Edis's goal is to be a drop-in replacement for Redis when persistence is more important than holding the dataset in-memory. Edis (currently) uses Google's leveldb as a backend.

And for completeness here is another data-structures database:

Hyperdex (Strings, Integers, Floats, Lists, Sets, Maps): http://hyperdex.org/doc/latest/DataTypes/#chap:data-types

HyperDex is:

  • Fast: HyperDex has lower latency, higher throughput, and lower variance than other key-value stores.
  • Scalable: HyperDex scales as more machines are added to the system.
  • Consistent: HyperDex guarantees linearizability for key-based operations. Thus, a read always returns the latest value inserted into the system. Not just “eventually,” but immediately and always.
  • Fault Tolerant: HyperDex automatically replicates data on multiple machines so that concurrent failures, up to an application-determined limit, will not cause data loss. Searchable:
  • HyperDex enables efficient lookups of secondary data attributes.
  • Easy-to-Use: HyperDex provides APIs for a variety of scripting and native languages.
  • Self-Maintaining: A HyperDex is self-maintaining and requires little user maintenance.