I'm parsing a CSV file in which each line looks something like this:
10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
There seem to be trailing commas at the end of each line.
I want to get the first term, in this case "10998", and count the GO terms associated with it. So my output in this case should be:
10998,7
But instead it shows 299. I realized that there are 303 commas in each line overall, and I can't figure out an easy way to remove the trailing ones. Can anyone help me solve this issue?
Thanks!
use strict;
use warnings;
open my $IN, '<', 'test.csv' or die "can't find file: $!";
open(CSV, ">GO_MF_counts_Genes.csv") or die "Error!! Cannot create the file: $!\n";
my @genes = ();
my $mf;
foreach my $line (<$IN>) {
    chomp $line;
    my @array = split(/,/, $line);
    my @GO = splice(@array, 4);
    my $GO = join(',', @GO);
    $mf = count($GO);
    print CSV "$array[0],$mf\n";
}
sub count {
    my $go = shift @_;
    my $count = my @go = split(/,/, $go);
    return $count;
}
I'd use juanrpozo's solution for counting, but if you still want to go your way, then remove the trailing commas with a regex substitution right after the chomp, before you split the line.
$line =~ s/,+$//;
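To show where the substitution fits, here is a minimal sketch of the counting step with the trailing commas stripped first. The helper name count_go_terms and the shortened sample line are made up for illustration and are not from the question. One thing to check: split already drops trailing empty fields by default, so a count of 299 suggests that something non-empty follows the last comma. A carriage return left over from Windows line endings is one possible cause, but that is only an assumption about the file, so the sketch removes a stray "\r" as well.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the first
# field and the number of non-empty fields from the fifth column onward.
sub count_go_terms {
    my ($line) = @_;
    $line =~ s/[\r\n]+$//;    # drop line endings, including a stray CR (assumed cause)
    $line =~ s/,+$//;         # strip the run of trailing commas
    my @fields = split /,/, $line;
    # Columns 0-3 are identifiers; everything after that is a GO term.
    my @go = grep { length } @fields[4 .. $#fields];
    return ($fields[0], scalar @go);
}

# Shortened version of the sample line from the question, with CRLF appended.
my $sample = "10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,"
           . "GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,,,\r\n";
my ($id, $count) = count_go_terms($sample);
print "$id,$count\n";    # prints 10998,7
```

The grep on length is a safeguard in case empty fields also show up between GO terms; if they never do, the plain array slice is enough.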