Hash function for 3d integer coordinates

ali · Sep 3, 2014

I have a 3D uniform grid. To save memory in large models, the empty cells (those that don't overlap with any object) don't need to be stored, so I am using a Dictionary in C# for this purpose. Although performance has decreased, this is still better than getting an exception when creating the full 3D grid. My problem now is to find a fast hash function that maps a 3D integer coordinate of the grid to a unique number.

I have already tried (x * 73856093 + y * 19349669 + z * 83492791) % n, which doesn't always generate a unique number.

Answer

MvG · Sep 5, 2014

On the one hand you state your aim as “save memory”, while on the other hand you ask for “a fast hash function that maps a 3d integer coordinate of the grid to a unique number”. These two goals are not very compatible.

Either you want to guarantee O(1) access. In that case you have to prevent hash collisions, so every input must map to a unique number. But then you also need as many cells in your hash map as there are possible inputs, so you would save no memory compared with a simple N×N×N array.
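To make the first option concrete, here is a minimal sketch of a collision-free mapping. It is written in Python for brevity rather than the asker's C#, and the names linear_index and count_distinct_indices are made up for this illustration. It assumes a cubic grid whose coordinates all lie in 0..n-1.

```python
def linear_index(x, y, z, n):
    """Map (x, y, z) with 0 <= x, y, z < n to a unique integer in [0, n**3)."""
    if not (0 <= x < n and 0 <= y < n and 0 <= z < n):
        raise ValueError("coordinate outside the grid")
    # Row-major flattening: x varies fastest, z slowest.
    return x + n * (y + n * z)


def count_distinct_indices(n):
    """Return how many distinct indices the n*n*n grid coordinates produce."""
    return len({linear_index(x, y, z, n)
                for x in range(n) for y in range(n) for z in range(n)})
```

Every cell gets its own index, but the index range is exactly n³, which is the size of the dense array the question is trying to avoid.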

Or, and this is far more likely, you only want hash collisions to be rare. Then you can use a hash map with about twice as many slots as there are objects actually stored. In this case you don't have to avoid hash collisions completely; you only have to make them sufficiently rare.
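The sketch below shows why rare collisions are harmless: a hash map compares keys for equality after hashing, so two different coordinates with the same hash are still stored separately. It is in Python rather than C#, and the Cell class is a hypothetical helper. The multipliers are the ones from the question. In C# the analogous step would presumably be overriding Equals and GetHashCode on the key type, but how the asker's Dictionary is keyed is an assumption here.

```python
class Cell:
    """Grid coordinate used as a dictionary key (hypothetical helper)."""

    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __eq__(self, other):
        return (isinstance(other, Cell)
                and (self.x, self.y, self.z) == (other.x, other.y, other.z))

    def __hash__(self):
        # Multipliers taken from the question; the result need not be unique.
        return self.x * 73856093 + self.y * 19349669 + self.z * 83492791


grid = {}
a = Cell(19349669, 0, 0)
b = Cell(0, 73856093, 0)   # same hash as `a`: both products are equal
grid[a] = "object A"
grid[b] = "object B"
```

Both entries survive even though hash(a) == hash(b). The collision only costs an extra equality comparison on lookup.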

Choosing a good hash function depends a lot on the likely patterns of your input data. If the input is fairly random and you know the size of your hash map, you should aim for a uniform distribution. If objects are more likely to be located in adjacent blocks, then you want to make sure that small changes in coordinates are unlikely to result in a collision. This is the point where it helps to make your factors primes, so that a small change in one direction is less likely to be cancelled out by a small change in another direction.
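One way to check the adjacency concern is to list the small offsets that always collide. For a linear hash a·x + b·y + c·z taken modulo the table size, two cells that differ by (dx, dy, dz) share a bucket exactly when a·dx + b·dy + c·dz is a multiple of the table size. The following is a rough sketch with a hypothetical function name. It uses Python's unbounded integers, so it ignores the 32-bit overflow a C# int would introduce.

```python
from itertools import product


def colliding_offsets(factors, table_size, radius):
    """List nonzero offsets (dx, dy, dz) within `radius` that always collide.

    Two cells p and p + d land in the same bucket exactly when
    a*dx + b*dy + c*dz is a multiple of table_size.
    """
    a, b, c = factors
    span = range(-radius, radius + 1)
    return [(dx, dy, dz)
            for dx, dy, dz in product(span, repeat=3)
            if (dx, dy, dz) != (0, 0, 0)
            and (a * dx + b * dy + c * dz) % table_size == 0]
```

With the small factors (1, 2, 3), some direct neighbours collide regardless of position. With the primes 137, 149 and 163 and an assumed table size of 1009, no offset within one cell does.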

If in doubt, you can always test things: given three prime numbers (e.g. for the hash 137x+149y+163z) and some real-world setups (i.e. the coordinates in use and the resulting hash map size), you can simply apply the hash to all coordinates, reduce modulo the hash map size, and count the number of unique values. Do the same for various triples and choose the one that maximizes that number. But I doubt that level of optimization is really worth the effort.
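The test described above could be sketched as follows. The function names are illustrative, and coords stands for whatever set of occupied cells a real model produces.

```python
def count_unique_buckets(coords, factors, table_size):
    """Count distinct buckets hit by `coords` under a linear hash."""
    a, b, c = factors
    return len({(a * x + b * y + c * z) % table_size for x, y, z in coords})


def best_triple(coords, candidates, table_size):
    """Return the candidate triple that fills the most distinct buckets."""
    return max(candidates,
               key=lambda f: count_unique_buckets(coords, f, table_size))
```

For example, for a flat 4×4 patch of cells and a table of 32 slots, the factors (1, 1, 1) reach only 7 buckets, while (137, 149, 163) reach all 16. That is the kind of difference the test is meant to expose.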