I need to be able to store some data in a custom binary file format. I've never designed my own file format before. It needs to be a friendly format for traveling between the C#, Java and Ruby/Perl/Python worlds.
To start with, the file will consist of records, each with a GUID field and a JSON/YAML/XML packet field. I'm not sure what to use as delimiters. A comma, tab or newline seems too fragile. What does Excel do? Or the pre-XML OpenOffice formats? Should I use ASCII characters 0 or 1? I'm not sure where to begin. Are there any articles or books on the topic?
This file format may expand later to include a "header section".
Note: To start with I'll be working in .NET, but I'd like the format to be easily portable.
UPDATE:
The processing of the "packets" can be slow, but navigation within the file cannot be. So I think XML is off the table.
How about looking at "protocol buffers"? Designed as an efficient, portable, version-tolerant, general-purpose binary format, it gives you C++, Java and Python in the Google library, and C#, Perl, Ruby and others in the community ports.
Note that protocol buffers has no dedicated data type for a Guid, but you can shim one as a message holding (essentially) a byte[].
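As a rough illustration of the byte[] shim, here is a sketch in Python using only the standard `uuid` module. The helper names `guid_to_bytes` and `guid_from_bytes` and the `dotnet_layout` flag are hypothetical, not part of any protobuf library. The one thing worth deciding up front is byte order: .NET's `Guid.ToByteArray()` stores the first three fields little-endian, which corresponds to Python's `bytes_le`, while `bytes` gives the big-endian layout that matches the textual form.

```python
import uuid


def guid_to_bytes(guid: uuid.UUID, dotnet_layout: bool = False) -> bytes:
    """Return the 16 raw bytes of a GUID.

    dotnet_layout=True uses the mixed-endian order produced by
    .NET's Guid.ToByteArray(); False uses plain big-endian order.
    (Both the function and the flag are illustrative assumptions.)
    """
    return guid.bytes_le if dotnet_layout else guid.bytes


def guid_from_bytes(raw: bytes, dotnet_layout: bool = False) -> uuid.UUID:
    """Rebuild a GUID from 16 raw bytes written by guid_to_bytes."""
    if len(raw) != 16:
        raise ValueError("a GUID is exactly 16 bytes")
    return uuid.UUID(bytes_le=raw) if dotnet_layout else uuid.UUID(bytes=raw)


example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
big_endian = guid_to_bytes(example)
dotnet_style = guid_to_bytes(example, dotnet_layout=True)
```

Whichever layout you pick, write it into the format description, since every reader in every language has to agree on it.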
Normally for .NET work, I'd recommend protobuf-net (but as the author, I'm somewhat biased) - however, if you intend to use other languages later you might do better (long term) using Jon's dotnet-protobufs; that'll give you a familiar API across the platforms (whereas protobuf-net uses .NET idioms).
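On the delimiter question itself: protocol buffers avoids delimiter characters by prefixing embedded data with its length, and the same idea works for a hand-rolled file. The sketch below is not protobuf's wire format; it assumes a made-up layout of a 16-byte GUID followed by a 4-byte little-endian payload length and then the payload, and the function names are hypothetical. It shows why navigation stays fast: a reader can hop from header to header and never has to parse a packet it doesn't need.

```python
import io
import struct
import uuid

# Assumed record header: 16-byte GUID + 4-byte little-endian payload length.
HEADER = struct.Struct("<16sI")


def write_record(stream, guid: uuid.UUID, payload: bytes) -> None:
    """Append one record: header first, then the raw payload bytes."""
    stream.write(HEADER.pack(guid.bytes, len(payload)))
    stream.write(payload)


def scan_index(stream) -> dict:
    """Map each GUID to (payload_offset, payload_length), skipping payloads."""
    index = {}
    while True:
        header = stream.read(HEADER.size)
        if not header:
            break  # clean end of file
        if len(header) != HEADER.size:
            raise ValueError("truncated record header")
        raw_guid, length = HEADER.unpack(header)
        index[uuid.UUID(bytes=raw_guid)] = (stream.tell(), length)
        stream.seek(length, io.SEEK_CUR)  # jump over the payload
    return index


def read_payload(stream, index: dict, guid: uuid.UUID) -> bytes:
    """Seek straight to one record's payload and read it."""
    offset, length = index[guid]
    stream.seek(offset)
    return stream.read(length)


# Usage: two records in an in-memory "file".
first, second = uuid.uuid4(), uuid.uuid4()
buffer = io.BytesIO()
write_record(buffer, first, b'{"name": "a, b\\n"}')
write_record(buffer, second, b"<packet/>")
buffer.seek(0)
index = scan_index(buffer)
second_payload = read_payload(buffer, index, second)
```

Because lengths are explicit, the payload can contain commas, newlines or zero bytes without confusing the reader.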