Efficient data structure for a leaderboard, i.e., a list of records (name, points) - Efficient Search(name), Search(rank) and Update(points)

Sachin Joseph picture Sachin Joseph · Aug 9, 2014 · Viewed 7.8k times · Source

Please suggest a data structure for representing a list of records in memory. Each record is made up of:

  • User Name
  • Points
  • Rank (based on Points) - optional field - can be either stored in the record or can be computed dynamically

The data structure should support implementation of the following operations efficiently:

  1. Insert(record) - might change ranks of existing records
  2. Delete(record) - might change ranks of existing records
  3. GetRecord(name) - Probably a hash table will do.
  4. GetRecord(rank)
  5. Update(points) - might change ranks of existing records

My main problem is efficient implementation of GetRecord(rank), because ranks can change frequently.

I guess an in-memory DBMS would be a good solution, but please don't suggest it; please suggest a data structure.

Answer

chepner picture chepner · Aug 11, 2014

Basically, you'll just want a pair of balanced search trees, which will allow O(lg n) insertion, deletion, and getRecord operations. The trick is that instead of storing the actual data in the trees, you'll store pointers to a set of record objects, where each record object will contain 5 fields:

  1. user name
  2. point value
  3. rank
  4. pointer back to the node in the name tree that references the object
  5. pointer back to the node in the point tree that references the object.

The name tree is only modified when new records are added and when records are deleted. The point tree is modified for insertions and deletions, but also for updates, where the appropriate record is found, has its point-tree pointer removed, its point count updated, then a new pointer added to the point-tree.

As you mention, you can use a hash table instead for the name tree if you like. The key here is that you simply maintain separate sorted indexes into a set of otherwise unordered records that themselves contain pointers to their index nodes.


The point tree will be some variation on an order statistic tree, which rather than being a specific data structure, is an umbrella term for a binary search tree whose operations are modified to maintain an invariant which makes the requested rank-related operations more efficient than walking the tree. The details of how the invariants are maintained depend on the underlying balanced search tree (red-black tree, avl tree, etc) used.