Neatest way to remove linebreaks in Perl

Christoffer · May 19, 2009

I'm maintaining a script that can get its input from various sources and processes it line by line. Depending on the source, linebreaks might be Unix-style, Windows-style or even, for some aggregated input, mixed(!).

When reading from a file it goes something like this:

my @lines = <IN>;
process(\@lines);

...

sub process {
    my $lines = shift;    # array reference passed in as \@lines
    foreach my $line (@{$lines}) {
        chomp $line;
        #Handle line by line
    }
}
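To make the problem concrete, here is a small sketch with made-up sample lines. chomp only removes a trailing copy of $/ (by default "\n"), so a Windows-style line keeps its carriage return:

```perl
use strict;
use warnings;

# Made-up sample lines: Unix-style, Windows-style, and no linebreak.
my @lines = ("unix\n", "windows\r\n", "none");

# chomp removes a trailing $/ ("\n" by default) and nothing else.
my @chomped = map { my $l = $_; chomp $l; $l } @lines;

for my $l (@chomped) {
    # Make any leftover carriage return visible as a literal \r
    (my $shown = $l) =~ s/\r/\\r/g;
    print "[$shown]\n";
}
```

This prints `[unix]`, `[windows\r]` and `[none]`: the Windows-style line still ends in a carriage return after chomp.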

So, what I need to do is replace the chomp with something that removes either Unix-style or Windows-style linebreaks. I'm coming up with way too many ways of solving this, one of the usual drawbacks of Perl :)

What's your opinion on the neatest way to chomp off generic linebreaks? What would be the most efficient?
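For the efficiency part, one rough way to compare candidate substitutions is Perl's core Benchmark module. The two patterns below are illustrative choices (not from the original post), the input is made up, and timings will vary with the perl version and the data:

```perl
use strict;
use warnings;
use 5.010;    # \R requires Perl 5.10 or later
use Benchmark qw(cmpthese);

# Made-up input: Unix, Windows, CR-only and bare lines, repeated.
my @data = ("foo\n", "bar\r\n", "baz\r", "qux") x 250;

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    # Strip a trailing run of CR and/or LF with a plain character class
    char_class => sub {
        my @copy = @data;
        s/[\r\n]+\z// for @copy;
    },
    # Strip one trailing generic linebreak
    generic_R => sub {
        my @copy = @data;
        s/\R\z// for @copy;
    },
});
```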

Edit: A small clarification: the method 'process' gets a list of lines from somewhere, not necessarily read from a file. Each line might have:

  • No trailing linebreaks
  • Unix-style linebreaks
  • Windows-style linebreaks
  • Just a carriage return (when the original data has Windows-style linebreaks and is read with $/ = "\n")
  • An aggregated set where lines have different styles
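One approach that covers every case in the list above, and also works on perls older than 5.10, is to strip any trailing run of CR/LF characters. This is not from the original post; `strip_linebreak` is a hypothetical helper name and the samples are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: remove any trailing run of \r and/or \n.
# Uses a plain character class, so it does not need Perl 5.10's \R.
sub strip_linebreak {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

# One made-up sample per case; a mixed input is just a list of these.
my %samples = (
    none    => "alpha",
    unix    => "alpha\n",
    windows => "alpha\r\n",
    cr_only => "alpha\r",
);

for my $case (sort keys %samples) {
    printf "%-8s => [%s]\n", $case, strip_linebreak($samples{$case});
}
```

Unlike chomp, this removes several trailing linebreaks at once ("alpha\n\n" becomes "alpha"), and it ignores Unicode linebreaks such as \x{2028}, which may or may not matter for your input.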

Answer

Christoffer · May 19, 2009

After digging through the perlre docs a bit, here is my best suggestion so far, which seems to work pretty well. Perl 5.10 added the \R escape, which matches a generic linebreak sequence:

$line =~ s/\R//g;

It's the same as:

(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
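Applied to the question's process sub, a minimal sketch could look like this (sample input made up). It anchors the match with \z so only a trailing linebreak is removed, mirroring chomp; s/\R//g would also delete any linebreaks embedded in the middle of a string:

```perl
use strict;
use warnings;
use 5.010;    # \R requires Perl 5.10 or later

sub process {
    my $lines = shift;    # array reference
    foreach my $line (@{$lines}) {
        # $line aliases the array element, so this edits the caller's
        # data, just as chomp did in the original loop.
        $line =~ s/\R\z//;
        # Handle line by line
    }
}

my @input = ("unix\n", "windows\r\n", "cr_only\r", "none");
process(\@input);
print join(',', @input), "\n";    # prints "unix,windows,cr_only,none"
```

Because \R matches "\r\n" as a single unit, a Windows-style ending is removed in one go rather than leaving a stray carriage return.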

I'll keep this question open a while longer, just to see whether there are more nifty ways waiting to be suggested.