Split binary data into byte array in Perl

sdaau · Oct 31, 2012

I wanted to convert a binary string into an array/list of bytes (to allow indexing and to avoid substr, whose syntax complicates things for me), and came up with the following MWE:

#!/usr/bin/env perl

use warnings;
use strict;

# Use open ':raw';      # Unknown PerlIO layer class ':raw'
use open IO => ':raw';

binmode(STDIN);
binmode(STDOUT);

# Create original 8-bit byte array/list
my @atmp = (0x80, 0x23, 0x14, 0x0d, 0x0a, 0x00, 0x00, 0x80, 0x43, 0x00, 0x00);

# Make a copy of portion
my @atmp2 = (0) x 2;
@atmp2[0..1] = @atmp[7..8];

# Print output
print "Copied atmp2 contents as hex: " . join(", ", unpack("H2"x2, pack("C"x2,@atmp2))) . "\n";
print "Copied atmp2 as ushort (16bit) int: " . unpack("S", pack("C"x2, @atmp2));
# doublecheck value by routing through printf with format specifier:
printf(" [%d]\n", unpack("S", pack("C"x2, @atmp2)));


# Now, the same data as string:
my $indata = "\x80\x23\x14\x0d\x0a\x00\x00\x80\x43\x00\x00";

# Create byte array (by converting string $indata to array/list with `split`)
my @btmp = split('',$indata);
print "lastindex: " . $#btmp . "\n";

# Make a copy of portion
my @btmp2 = (0) x 2;
@btmp2[0..1] = @btmp[7..8];

# Print output
print "Copied btmp2 contents as hex: " . join(", ", unpack("H2"x2, pack("C"x2,@btmp2))) . "\n";
print "Copied btmp2 as ushort (16bit) int: " . unpack("S", pack("C"x2, @btmp2));
# doublecheck value by routing through printf with format specifier:
printf(" [%d]\n", unpack("S", pack("C"x2, @btmp2)));

Running this code results in:

$ perl test.pl
Copied atmp2 contents as hex: 80, 43
Copied atmp2 as ushort (16bit) int: 17280 [17280]
lastindex: 10
Argument "M-\0" isn't numeric in pack at test.pl line 38.
Argument "C" isn't numeric in pack at test.pl line 38.
Copied btmp2 contents as hex: 00, 00
Copied btmp2 as ushort (16bit) int: 0 [0]

How do I get the second part (btmp2) to behave the same as the first part (atmp2)?

Answer

sdaau · Oct 31, 2012

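It turns out that split('', $indata) does create an array with one element per byte of the original string, but each element is a one-character string (for example "\x80", or "C" for the byte 0x43), not the numeric byte value. pack("C", ...) expects numbers, so Perl tries to numify those strings, warns "Argument ... isn't numeric", and packs 0 instead; that is why btmp2 comes out as 00, 00.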
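If you would rather keep split, one alternative (a sketch, not the approach taken in this answer) is to convert each one-character string to its numeric code with ord before packing. It reuses the question's byte string; note that it swaps the question's native-order 'S' template for the explicit little-endian 'v', which matches the question's 17280 on little-endian hosts such as x86.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same bytes as $indata in the question.
my $indata = "\x80\x23\x14\x0d\x0a\x00\x00\x80\x43\x00\x00";

# split // yields one-character strings such as "\x80" and "C" (0x43).
my @chars = split //, $indata;

# ord() turns each character into its numeric code (0..255 here),
# which is what pack("C", ...) expects.
my @bytes = map { ord($_) } @chars;

# Copy bytes 7 and 8, as in the question.
my @slice = @bytes[7 .. 8];

printf "hex: %s\n", join(', ', map { sprintf '%02x', $_ } @slice);
# 'v' = unsigned 16-bit little-endian (assumption: the data is little-endian).
printf "ushort (little-endian): %d\n", unpack('v', pack('C2', @slice));
```

This prints "hex: 80, 43" and "ushort (little-endian): 17280".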

The fix is simply to replace the split line with one that uses unpack:

- my @btmp = split('',$indata);
+ my @btmp = unpack('C*',$indata);
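A trimmed, self-contained sketch of the string half of the question with this one-line change applied (variable names follow the question; the printing is condensed and the binmode lines are omitted because they do not affect an in-memory string):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $indata = "\x80\x23\x14\x0d\x0a\x00\x00\x80\x43\x00\x00";

# unpack 'C*' returns one unsigned integer (0..255) per byte.
my @btmp = unpack 'C*', $indata;

# Copy bytes 7 and 8 with an array slice, as in the question.
my @btmp2 = @btmp[7 .. 8];

print 'Copied btmp2 contents as hex: ',
    join(', ', unpack('H2' x 2, pack('C' x 2, @btmp2))), "\n";

# 'S' is a native-endian unsigned 16-bit value, as in the question.
printf "Copied btmp2 as ushort (16bit) int: %d\n",
    unpack('S', pack('C' x 2, @btmp2));
```

The hex line prints "80, 43"; the ushort line gives 17280 on a little-endian machine, matching the atmp2 output.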

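... and after that, everything works as expected (both printouts are identical). Note that in both cases "lastindex" (for the array derived from the string) is 10, which is correct: the string holds 11 bytes, so the last index is 10. Since split produced the right number of elements, I had suspected something was wrong with binmode, which is why all those binmode/open statements are in the code; they only set layers on filehandles and have no effect on a string literal built in memory.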
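A quick sketch to confirm that the element count was never the problem: both approaches produce 11 elements (last index 10), and only the contents differ. The array names here are illustrative, not from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The same 11-byte string as in the question.
my $indata = "\x80\x23\x14\x0d\x0a\x00\x00\x80\x43\x00\x00";

my @via_split  = split //, $indata;       # one-character strings
my @via_unpack = unpack 'C*', $indata;    # integers 0..255

# Same length either way, so the last index is 10 in both cases.
printf "length=%d, last index via split=%d, via unpack=%d\n",
    length($indata), $#via_split, $#via_unpack;

# Element 8 is the byte 0x43: the character 'C' vs. the number 67.
printf "split[8]=%s unpack[8]=%d\n", $via_split[8], $via_unpack[8];
```

This prints "length=11, last index via split=10, via unpack=10" followed by "split[8]=C unpack[8]=67".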