What are important points when designing a (binary) file format?

oliver · Nov 27, 2008

When designing a file format for recording binary data, what attributes do you think the format should have? So far, I've come up with the following important points:

  • have some "magic bytes" at the beginning, to be able to recognize the files (in my specific case, this should also help to distinguish the files from "legacy" files)
  • have a file version number at the beginning, so that the file format can be changed later without breaking compatibility
  • specify the endianness and size of all data items; or: include some space to describe endianness/size of data (I would tend towards the former)
  • possibly reserve some space for further per-file attributes that might be necessary in the future?
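The points above could be sketched as a small fixed-size header. This is only an illustration: the magic value `MYFM`, the field widths, the `flags` field, and the function names `pack_header` and `parse_header` are all assumptions made for the example, not part of any existing format.

```python
import struct

# Hypothetical layout, chosen only for illustration:
#   4 bytes  magic            b"MYFM"
#   2 bytes  format version   unsigned, little-endian
#   2 bytes  flags            unsigned, little-endian
#   8 bytes  reserved         written as zero for now
MAGIC = b"MYFM"
VERSION = 1
HEADER = struct.Struct("<4sHH8s")  # "<" = little-endian, no padding


def pack_header(version=VERSION, flags=0):
    """Serialize the fixed-size 16-byte header."""
    return HEADER.pack(MAGIC, version, flags, bytes(8))


def parse_header(data):
    """Return (version, flags); raise ValueError for foreign or newer files."""
    if len(data) < HEADER.size:
        raise ValueError("file too short to contain a header")
    magic, version, flags, _reserved = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic bytes: not this format (maybe a legacy file)")
    if version > VERSION:
        raise ValueError(f"unsupported format version {version}")
    return version, flags
```

A reader would call `parse_header` on the first 16 bytes before touching anything else. The reserved bytes are ignored on read here, so a later version can give them meaning without changing the header size.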

What else would be useful to make the format more future-proof and minimize headaches later?

Answer

Stepan Stolyarov · Nov 27, 2008

Take a look at the PNG spec. This format has some very good rationale behind it.
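One idea from PNG that seems relevant to future-proofing is its chunk structure: each chunk carries a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over type and data, so a reader can skip chunk types it does not recognize. Below is a rough sketch in that style. The function names and the chunk types `DATA` and `NOTE` are made up for the example, and the code is not meant to read or write real PNG files.

```python
import struct
import zlib


def write_chunk(chunk_type, payload):
    """Encode one chunk: length, type, payload, CRC-32 of type + payload."""
    if len(chunk_type) != 4:
        raise ValueError("chunk type must be exactly 4 bytes")
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return (struct.pack(">I", len(payload)) + chunk_type + payload
            + struct.pack(">I", crc))


def read_chunks(data):
    """Yield (type, payload) pairs, verifying each chunk's CRC."""
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        chunk_type = data[offset + 4:offset + 8]
        payload = data[offset + 8:offset + 8 + length]
        (crc,) = struct.unpack_from(">I", data, offset + 8 + length)
        if zlib.crc32(chunk_type + payload) & 0xFFFFFFFF != crc:
            raise ValueError(f"corrupt chunk {chunk_type!r}")
        yield chunk_type, payload
        offset += 12 + length  # 4 length + 4 type + 4 CRC + payload


# An older reader that only knows "DATA" simply ignores the newer "NOTE" chunk.
blob = write_chunk(b"DATA", b"\x01\x02") + write_chunk(b"NOTE", b"hello")
known = [payload for ctype, payload in read_chunks(blob) if ctype == b"DATA"]
```

Because every chunk states its own length, new chunk types can be added later without breaking readers written against the older layout.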

Also, decide what matters most for your format: compactness, compatibility, or the ability to embed other formats (for example, different compression algorithms) inside it. Another interesting example is Google's Protocol Buffers, where the size of the transferred data is king.

As for endianness, I'd suggest picking one byte order and sticking with it rather than allowing several. Otherwise, the reading and writing libraries only become more complex and slower.
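To illustrate the single-byte-order approach: if the format always stores integers little-endian (an arbitrary choice for this sketch), the reader and writer need no byte-order flag and no branching. The helper names `write_u32` and `read_u32` are hypothetical.

```python
import struct


def write_u32(value):
    """Encode an unsigned 32-bit integer, always little-endian."""
    return struct.pack("<I", value)


def read_u32(data, offset=0):
    """Decode an unsigned 32-bit little-endian integer at the given offset."""
    return struct.unpack_from("<I", data, offset)[0]


encoded = write_u32(0x12345678)  # bytes 78 56 34 12 on any host
decoded = read_u32(encoded)      # 0x12345678 again
```

The explicit `<` prefix makes the result independent of the machine's native byte order, which is what keeps files portable between platforms.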