"grep" offset of ascii string from binary file

mgilson · Jan 3, 2013 · Viewed 27.1k times

I'm generating binary data files that are simply a series of records concatenated together. Each record consists of a (binary) header followed by binary data. Within the binary header is an ASCII string 80 characters long. Somewhere along the way, my process of writing the files got a little messed up, and I'm trying to debug the problem by inspecting how long each record actually is.

This seems extremely related, but I don't understand Perl, so I haven't been able to get the accepted answer there to work. The other answer points to bgrep, which I've compiled, but it wants me to feed it a hex string. I'd rather have a tool that takes the ASCII string, finds it in the binary data, and prints the string and the byte offset where it was found.

In other words, I'm looking for some tool which acts like this:

tool foobar filename

or

tool foobar < filename

and its output is something like this:

foobar:10
foobar:410
foobar:810
foobar:1210
...

i.e. the string that matched and the byte offset in the file where the match started. In this example, I can infer that each record is 400 bytes long.

Other constraints:

  • The ability to search by regex would be nice, but I don't need it for this problem.
  • My binary files are big (3.5 GB), so I'd like to avoid reading the whole file into memory if possible.

Answer

Hari Menon · Jan 3, 2013
grep --byte-offset --only-matching --text foobar filename

The --byte-offset option prints the offset of each matching line.

The --only-matching option makes grep print only the matched text and, combined with --byte-offset, the offset of each individual match rather than of the line containing it.

The --text option makes grep treat the binary file as a text file.
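The effect of these three options can be checked on a small synthetic file. The following is only a sketch: the helper name make_sample and the 400-byte record layout are made up for illustration, and LC_ALL=C is an extra precaution (not part of the command above) so that grep treats the input as raw bytes.

```shell
#!/bin/sh
# Hypothetical helper: write three 400-byte records to stdout, each with
# the ASCII string "foobar" starting 10 bytes into the record.
make_sample() {
    i=0
    while [ "$i" -lt 3 ]; do
        head -c 10 /dev/zero      # 10 NUL bytes before the string
        printf 'foobar'           # 6 ASCII bytes
        head -c 384 /dev/zero     # NUL padding up to 400 bytes
        i=$((i + 1))
    done
}

sample=$(mktemp)
make_sample > "$sample"

# Same command as above, run on the sample file.
LC_ALL=C grep --byte-offset --only-matching --text foobar "$sample"
# With GNU grep this prints:
# 10:foobar
# 410:foobar
# 810:foobar

rm -f "$sample"
```

Note that the offset comes first and the matched string second, the reverse of the foobar:10 layout sketched in the question.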

You can shorten it to:

grep -oba foobar filename

This command works with GNU grep, which ships with Linux by default. It won't work with BSD grep (the default on macOS).
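Since the goal in the question is the length of each record, the offsets can be turned into distances between consecutive matches. A minimal sketch, assuming GNU grep and any POSIX awk; the function name record_gaps is hypothetical, and the LC_ALL=C setting is an added precaution rather than something the command requires:

```shell
#!/bin/sh
# Hypothetical helper: print the number of bytes between the starts of
# consecutive matches of pattern $1 in file $2.
record_gaps() {
    # grep emits "offset:match" lines; awk splits on ":" and subtracts the
    # previous offset from the current one, starting at the second match.
    LC_ALL=C grep -oba -- "$1" "$2" |
        awk -F: 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# Usage: record_gaps foobar filename
```

If every record has the same length, every printed number is the same; a value that differs points at the record where the layout went wrong. Both grep and awk process the data as a stream, although grep buffers one "line" at a time, so a binary file with very few newline bytes can still use a lot of memory.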