How can I extract columns from a fixed-width format in Perl?

Greg picture Greg · Sep 29, 2009 · Viewed 7.7k times · Source

I'm writing a Perl script to run through and grab various data elements such as:

1253592000
1253678400                 86400                 6183.000000
1253764800                 86400                 4486.000000 
1253851200  36.000000      86400                10669.000000
1253937600  0.000000       86400                 9126.000000
1254024000  0.000000       86400                 2930.000000
1254110400  0.000000       86400                 2895.000000
1254196800  0.000000                             8828.000000

I can grab each line of this text file no problem.

I have working regex to grab each of those fields. Once I have the line in a variable, i.e. $line - how can I grab each of those fields and place them into their own variables even though they have different delimiters?

Answer

FMc picture FMc · Sep 29, 2009

This example illustrates how to parse the line either with whitespace as the delimiter (split) or with a fixed-column layout (unpack). With unpack if you use upper-case (A10 etc), whitespace will be removed for you. Note: as brian d foy points out, the split approach does not work well for a situation with missing fields (for example, the second line of data), because the field position information will be lost; unpack is the way to go here, unless we are misunderstanding your data.

use strict;
use warnings;

while (my $line = <DATA>){
    chomp $line;
    my @fields_whitespace = split m'\s+', $line;
    my @fields_fixed = unpack('a10 a10 a12 a28', $line);
}

__DATA__
1253592000                                                  
1253678400                 86400                 6183.000000
1253764800                 86400                 4486.000000
1253851200 36.000000       86400                10669.000000
1253937600  0.000000       86400                 9126.000000
1254024000  0.000000       86400                 2930.000000
1254110400  0.000000       86400                 2895.000000
1254196800  0.000000                             8828.000000