Efficiently counting the number of lines of a text file (200MB+)

Abs · Jan 29, 2010 · Viewed 156.9k times

I have just found out that my script gives me a fatal error:

Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 440 bytes) in C:\process_txt.php on line 109

That line is this:

$lines = count(file($path)) - 1;

So I think it is having difficulty loading the whole file into memory to count the lines. Is there a more efficient way to do this without running into memory issues?

The text files I need to count lines for range from 2MB to 500MB, and occasionally reach about 1GB.

Thanks all for any help.

Answer

Dominic Rodger · Jan 29, 2010

This will use less memory, since it doesn't load the whole file into memory:

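$file = "largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
if ($handle === false) {
  die("Could not open $file");
}
// fgets() returns false at end of file, so test its result rather than feof():
// looping on !feof() can count one extra "line" when the file ends with a newline.
while (($line = fgets($handle)) !== false) {
  $linecount++;
}

fclose($handle);

echo $linecount;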

fgets loads a single line into memory (if the second argument $length is omitted, it keeps reading from the stream until it reaches the end of the line, which is what we want). If you care about wall-clock time as well as memory usage, this is still unlikely to be as fast as a tool written in something other than PHP.
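As a rough illustration of the $length behaviour described above, the sketch below counts how many fgets calls it takes to read the same file with and without a length limit. The helper name countFgetsCalls and the temporary test file are hypothetical, introduced only for this example.

```php
<?php
// Hypothetical helper for illustration: returns how many fgets() calls
// are needed to read the whole file.
function countFgetsCalls(string $path, ?int $length = null): int
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $calls = 0;
    while (true) {
        // Without $length, fgets() reads up to and including the newline.
        // With $length, it returns at most $length - 1 bytes per call.
        $piece = $length === null ? fgets($handle) : fgets($handle, $length);
        if ($piece === false) {
            break;
        }
        $calls++;
    }
    fclose($handle);
    return $calls;
}

// Assumed test data: one 10,000-character line followed by one short line.
$tmp = tempnam(sys_get_temp_dir(), 'lc');
file_put_contents($tmp, str_repeat('x', 10000) . "\n" . "short\n");

echo countFgetsCalls($tmp), "\n";        // 2: one call per line
echo countFgetsCalls($tmp, 4096), "\n";  // 4: the long line needs three calls
unlink($tmp);
```

With a length limit, the number of fgets calls no longer equals the number of lines, so a counter that increments once per call would overcount long lines.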

The only danger with the line-by-line approach is a file whose lines are particularly long (what if you encounter a 2GB file without line breaks?). In that case you're better off reading it in chunks and counting end-of-line characters:

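$file = "largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
if ($handle === false) {
  die("Could not open $file");
}
// Each call returns at most 4095 bytes, so a very long line is never held in memory whole.
while (($chunk = fgets($handle, 4096)) !== false) {
  // Count "\n" rather than PHP_EOL: PHP_EOL is the platform's line ending
  // ("\r\n" on Windows) and may not match the line endings used in the file.
  $linecount += substr_count($chunk, "\n");
}

fclose($handle);

// Note: a final line with no trailing newline is not counted.
echo $linecount;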
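The same idea can also be sketched with fread, which reads fixed-size chunks and does not stop at newlines. This is a minimal variant, not the code from the answer above: the function name countLinesChunked and the 1 MiB default chunk size are assumptions made for illustration, and the function counts a final line that has no trailing newline.

```php
<?php
// Hypothetical helper: counts lines by reading fixed-size chunks with fread()
// and counting "\n" bytes, so memory use is bounded by $chunkSize.
function countLinesChunked(string $path, int $chunkSize = 1048576): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $count = 0;
    $lastByte = "\n"; // treat an empty file as having 0 lines
    // fread() returns an empty string at end of file.
    while (($chunk = fread($handle, $chunkSize)) !== false && $chunk !== '') {
        $count += substr_count($chunk, "\n");
        $lastByte = $chunk[strlen($chunk) - 1];
    }
    fclose($handle);
    // A non-empty file that does not end in "\n" has one more, unterminated line.
    if ($lastByte !== "\n") {
        $count++;
    }
    return $count;
}

// Assumed test data: three lines, the last one without a trailing newline.
$tmp = tempnam(sys_get_temp_dir(), 'lc');
file_put_contents($tmp, "a\nb\nc");
echo countLinesChunked($tmp), "\n";     // 3
echo countLinesChunked($tmp, 2), "\n";  // 3, even with a tiny chunk size
unlink($tmp);
```

Counting the single byte "\n" means a chunk boundary can never split the sequence being counted, and files with "\r\n" line endings still give the same total.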