Regular expression to only match X number of characters from end of line

ŹV - · Mar 15, 2012 · Viewed 33.7k times

Below you'll see a small excerpt of the lines matching the string 'octeon' in a 32-bit memory dump from a proprietary routing device. Reading from the right, each line ends with a 16-character ASCII rendering of the data (non-printable bytes shown as dots), preceded by four 32-bit words (8 hex characters each, of course), preceded by the address offset.

000b27a0: 41646a75 7374206f 6374656f 6e5f6970    Adjust octeon_ip
000b2850: 73740a00 00000000 6f637465 6f6e5f72    st......octeon_r
000b2870: 5f73697a 65000000 6f637465 6f6e5f72    _size...octeon_r
000b2990: 6164696e 672e0a00 6f637465 6f6e5f72    ading...octeon_r
000b29b0: 785f7369 7a650000 6f637465 6f6e5f72    x_size..octeon_r
000b3050: 780a0000 00000000 6f637465 6f6e5f70    x.......octeon_p
000b3650: 6564204f 6374656f 6e206d6f 64656c0a    ed Octeon model.
000bade0: 20307825 71780a00 6f637465 6f6e5f6c     0x%qx..octeon_l
000bafd0: 696e6720 4f637465 6f6e2045 78656375    ing Octeon Execu
000bd710: 6564204f 6374656f 6e204d6f 64656c21    ed Octeon Model!
000bd950: 4f435445 4f4e2070 61737320 3120646f    OCTEON pass 1 do
000bda20: 6564206f 6374656f 6e206d6f 64656c3a    ed octeon model:

While that data contains some useful information, tragically, the operating system (HiveOS) makes no attempt to allocate memory contiguously or to coalesce disparate heaps (and why should it?), so the vast majority of memory is a barren yet-to-be-malloc'd heap.

0004d6b0: 00000000 00000000 00000000 00000000    ................
0004d6c0: 00000000 00000000 00000000 00000000    ................
0004d6d0: 00000000 00000000 00000000 00000000    ................
0004d6e0: 00000000 00000000 00000000 00000000    ................
0004d6f0: 00000000 00000000 00000000 00000000    ................
0004d700: 00000000 00000000 00000000 00000000    ................
0004d710: 00000000 00000000 00000000 00000000    ................
0004d720: 00000000 00000000 00000000 00000000    ................
0004d730: 00000000 00000000 00000000 00000000    ................
0004d740: 00000000 00000000 00000000 00000000    ................
0004d750: 00000000 00000000 00000000 00000000    ................

I'd like to quickly and efficiently pull out strings of a certain size matching some arbitrary regular expression pattern ([a-zA-Z] comes to mind). You might naturally think that running the perennial object dump examination favorite 'strings' would yield a result, but the md util is a cruel mistress: because its output includes ASCII-coded hexadecimal banks and addresses, 'strings' identifies every line as containing a 'string'.

Sure, we all know there exists a trivial scripting solution (for line in hexdump: f.write(line[-16:]), then grep '[A-Za-z]' f).
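A minimal runnable sketch of that scripting approach in shell, assuming the dump has the layout shown above, with the 16-character ASCII column at the very end of each line and no trailing whitespace. The function name `ascii_tail` is made up for illustration.

```shell
# Hypothetical helper: print only the last 16 characters of each input line,
# then keep the ones that contain at least one ASCII letter.
ascii_tail() {
    # substr() with no length argument returns everything from that position on;
    # lines shorter than 16 characters are printed whole.
    awk '{ print substr($0, length($0) - 15) }' | LC_ALL=C grep '[A-Za-z]'
}
```

It would be used as `ascii_tail < dump.txt`, where `dump.txt` is a placeholder for the saved dump. Cutting the tail off first means the hex letters a-f in the address and word columns never reach the letter test.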

However, sometimes I'm struck with the feeling that I should come to understand these dastardly oppressive, yet misunderstood regular expressions better, rather than slinking back to my easy-to-use newfangled programmin' languages. I really feel I can't start growing a real Unix neckbeard until I've completely replaced my entire development toolchain life with the regular expressions of various stream editors and Awk scripts.

How does one match [a-zA-Z] within a certain number of characters from the end of the line (in my case, 16)? It seems like a pretty pithy construction, but every combination of +, ?, {16} and the like that has made sense to me in the past few minutes has promptly failed.

Answer

Bohemian · Mar 17, 2012

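Use the "non-matching" switch -v, together with -E so that {16} is treated as a repeat count, and quote the pattern so the shell does not strip the backslash:

grep -Ev '\.{16}$'

This will strip out all lines ending with 16 dots.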

Here's the man documentation for it:

-v, --invert-match
Invert the sense of matching, to select non-matching lines.
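For the question as literally posed (a letter somewhere in the last 16 characters), one pattern that may work is a letter followed by at most 15 further characters and then the end of the line. This is only a sketch: it assumes a `grep` that supports `-E` interval expressions, lines with no trailing whitespace or carriage return, and the function name `letters_in_tail` is invented for illustration.

```shell
# Hypothetical helper: keep lines with an ASCII letter in their last 16 characters.
#   [A-Za-z]  one letter
#   .{0,15}   at most 15 more characters of any kind
#   $         end of line
letters_in_tail() {
    LC_ALL=C grep -E '[A-Za-z].{0,15}$'
}
```

Run as `letters_in_tail < dump.txt` (a placeholder file name), this prints whole lines, address and hex words included. Letters in the hex columns do not count, because they sit more than 16 characters from the end of the line.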