Using CRC32 algorithm to hash string at compile-time

Vinz243 picture Vinz243 · Feb 23, 2015 · Viewed 8.9k times · Source

Basically I want in my code to be able to do this:

 Engine.getById(WSID('some-id'));

Which should get transformed by

 Engine.getById('1a61bc96');

just before being compiled into asm. So at compile-time.

This is my try

constexpr int WSID(const char* str) {
    boost::crc_32_type result;
    result.process_bytes(str,sizeof(str));
    return result.checksum();
}

But I get this when trying to compile with MSVC 18 (CTP November 2013)

error C3249: illegal statement or sub-expression for 'constexpr' function

How can I get the WSID function, using this way or any, as long as it is done during compile time?

Tried this: Compile time string hashing

 warning C4592: 'crc32': 'constexpr' call evaluation failed; function will be called at run-time

EDIT:

I first heard about this technique in Game Engine Architecture by Jason Gregory. I contacted the author who obligingly answer to me this :

What we do is to pass our source code through a custom little pre-processor that searches for text of the form SID('xxxxxx') and converts whatever is between the single quotes into its hashed equivalent as a hex literal (0xNNNNNNNN). [...]

You could conceivably do it via a macro and/or some template metaprogramming, too, although as you say it's tricky to get the compiler to do this kind of work for you. It's not impossible, but writing a custom tool is easier and much more flexible. [...]

Note also that we chose single quotes for SID('xxxx') literals. This was done so that we'd get some reasonable syntax highlighting in our code editors, yet if something went wrong and some un-preprocessed code ever made it thru to the compiler, it would throw a syntax error because single quotes are normally reserved for single-character literals.

Note also that it's crucial to have your little pre-processing tool cache the strings in a database of some sort, so that the original strings can be looked up given the hash code. When you are debugging your code and you inspect a StringId variable, the debugger will normally show you the rather unintelligible hash code. But with a SID database, you can write a plug-in that converts these hash codes back to their string equivalents. That way, you'll see SID('foo') in your watch window, not 0x75AE3080 [...]. Also, the game should be able to load this same database, so that it can print strings instead of hex hash codes on the screen for debugging purposes [...].

But while preprocess has some main advantages, it means that I have to prepare some kind of output system of modified files (those will be stored elsewhere, and then we need to tell MSVC). So it might complicate the compiling task. Is there a way to preprocess file with python for instance without headaches? But this is not the question, and I'm still interested about using compile-time function (about cache I could use an ID index)

Answer

tux3 picture tux3 · Mar 2, 2015

Here is a solution that works entirely at compile time, but may also be used at runtime. It is a mix of constexpr, templates and macros. You may want to change some of the names or put them in a separate file since they are quite short.

Note that I reused code from this answer for the CRC table generation and I based myself off of code from this page for the implementation.

I have not tested it on MSVC since I don't currently have it installed in my Windows VM, but I believe it should work, or at least be made to work with trivial changes.

Here is the code, you may use the crc32 function directly, or the WSID function that more closely matches your question :

#include <cstring>
#include <cstdint>
#include <iostream>

// Generate CRC lookup table
template <unsigned c, int k = 8>
struct f : f<((c & 1) ? 0xedb88320 : 0) ^ (c >> 1), k - 1> {};
template <unsigned c> struct f<c, 0>{enum {value = c};};

#define A(x) B(x) B(x + 128)
#define B(x) C(x) C(x +  64)
#define C(x) D(x) D(x +  32)
#define D(x) E(x) E(x +  16)
#define E(x) F(x) F(x +   8)
#define F(x) G(x) G(x +   4)
#define G(x) H(x) H(x +   2)
#define H(x) I(x) I(x +   1)
#define I(x) f<x>::value ,

constexpr unsigned crc_table[] = { A(0) };

// Constexpr implementation and helpers
constexpr uint32_t crc32_impl(const uint8_t* p, size_t len, uint32_t crc) {
    return len ?
            crc32_impl(p+1,len-1,(crc>>8)^crc_table[(crc&0xFF)^*p])
            : crc;
}

constexpr uint32_t crc32(const uint8_t* data, size_t length) {
    return ~crc32_impl(data, length, ~0);
}

constexpr size_t strlen_c(const char* str) {
    return *str ? 1+strlen_c(str+1) : 0;
}

constexpr int WSID(const char* str) {
    return crc32((uint8_t*)str, strlen_c(str));
}

// Example usage
using namespace std;

int main() {
    cout << "The CRC32 is: " << hex << WSID("some-id") << endl;
}

The first part takes care of generating the table of constants, while crc32_impl is a standard CRC32 implementation converted to a recursive style that works with a C++11 constexpr. Then crc32 and WSID are just simple wrappers for convenience.