Faster alternative to nested loops?

benpage picture benpage · Apr 22, 2015 · Viewed 13.7k times · Source

I have a need to create a list of combinations of numbers. The numbers are quite small so I can use byte rather than int. However it requires many nested loops in order to get every possible combination. I'm wondering if there's a more efficient manner to do what I'm after. Code so far is:

var data = new List<byte[]>();
for (byte a = 0; a < 2; a++)
for (byte b = 0; b < 3; b++)
for (byte c = 0; c < 4; c++)
for (byte d = 0; d < 3; d++)
for (byte e = 0; e < 4; e++)
for (byte f = 0; f < 3; f++)
for (byte g = 0; g < 3; g++)
for (byte h = 0; h < 4; h++)
for (byte i = 0; i < 2; i++)
for (byte j = 0; j < 4; j++)
for (byte k = 0; k < 4; k++)
for (byte l = 0; l < 3; l++)
for (byte m = 0; m < 4; m++)
{
    data.Add(new [] {a, b, c, d, e, f, g, h, i, j, k, l, m});
}

I was considering using something like a BitArray but I'm not sure how I could incorporate it.

Any recommendations would be greatly appreciated. Alternatively, perhaps this is the fastest way of doing what I want?

EDIT Couple of quick points (and apologies I didn't put these in the original post):

  • The numbers and the order of them (2, 3, 4, 3, 4, 3, 3 etc) are very important, so using a solution such as Generating Permutations using LINQ won't help because the maximums in each 'column' are different
  • I'm not a mathematician, so I apologise if I'm not using the technical terms like 'permutations' and 'combinations' correctly :)
  • I do need to populate all of these combinations at once - I can't just grab one or another based on an index
  • Using byte is faster than using int, I guarantee it. It's also a lot better on memory usage to have 67m+ arrays of bytes rather than ints
  • My ultimate goal here is to look for a faster alternative to nested loops.
  • I considered using parallel programming, but due to the iterative nature of what I'm trying to achieve, I couldn't find a way to do it successfully (even with ConcurrentBag) - however I'm happy to be proved wrong :)

CONCLUSION

Caramiriel has provided a good micro-optimisation which shaves some time off the loops, so I've marked that answer as correct. Eric also mentioned that it is faster to pre-allocate the List. But, at this stage it seems that the nested loops are in fact the fastest possible way of doing this (depressing, I know!).

If you want to try exactly what I was trying to benchmark with StopWatch, go with 13 loops counting up to 4 in each loop - that makes about 67m+ lines in the list. On my machine (i5-3320M 2.6GHz) it takes about 2.2s to do the optimised version.

Answer

Caramiriel picture Caramiriel · Apr 22, 2015

As a reminder: you probably do not need this kind of code while developing your own solution. This can and should only used in very specific situations. Readability is often more important than speed.

You can use the properties of a struct and allocate the structure in advance. I cut off some levels in the sample below, but I'm sure you'll be able to figure out the specifics. Runs about 5-6 times faster than the original (release mode).

The block:

struct ByteBlock
{
    public byte A;
    public byte B;
    public byte C;
    public byte D;
    public byte E;
}

The loop:

var data = new ByteBlock[2*3*4*3*4];
var counter = 0;

var bytes = new ByteBlock();

for (byte a = 0; a < 2; a++)
{
    bytes.A = a;
    for (byte b = 0; b < 3; b++)
    {
        bytes.B = b;
        for (byte c = 0; c < 4; c++)
        {
            bytes.C = c;
            for (byte d = 0; d < 3; d++)
            {
                bytes.D = d;
                for (byte e = 0; e < 4; e++)
                {
                    bytes.E = e;
                    data[counter++] = bytes;
                }
            }
        }
    }
}

It's faster because it doesn't allocate a new list every time you add it to the list. Also since it's creating this list, it needs a reference to every other value (a,b,c,d,e). You can assume each value is only modified once inside the loop, so we can optimize it to do so (data locality).

Also read the comments for side-effects.

Edited the answer to use an T[] instead of a List<T>.