Errno::ENOMEM: Cannot allocate memory - cat

Atith · Feb 26, 2013 · Viewed 33.3k times

I have a job running in production that processes XML files. There are around 4,000 XML files, totaling 8 to 9 GB.

After processing, we get CSV files as output. I have a cat command that merges all the CSV files into a single file, and I'm getting:

Errno::ENOMEM: Cannot allocate memory

on the cat (backtick) command.

Below are a few details:

  • System Memory - 4 GB
  • Swap - 2 GB
  • Ruby : 1.9.3p286

Files are processed using nokogiri and saxbuilder-0.0.8.

Here there is a block of code that processes the 4,000 XML files and saves the output as CSV, one per XML file (sorry, I'm not supposed to share it because of company policy).

Below is the code that merges the output files into a single file:

Dir["#{processing_directory}/*.csv"].sort_by { |file| [file.count("/"), file] }.each do |file|
  `cat #{file} >> #{final_output_file}`
end

I've taken memory consumption snapshots during processing. The job consumes almost all of the memory, but it doesn't fail there. It always fails on the cat command.

My guess is that the backtick tries to fork a new process, which can't get enough memory, so it fails.

Please let me know your opinion and any alternative to this.

Answer

Intrepidd · Feb 26, 2013

So it seems that your system is running pretty low on memory, and spawning a shell plus calling cat is too much for the little memory that is left.

If you don't mind losing some speed, you can merge the files in Ruby, with small buffers. This avoids spawning a shell, and you can control the buffer size.

This is untested, but you get the idea:

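buffer_size = 4096

# Collect the inputs before creating the output file.
files = Dir["#{processing_directory}/*.csv"].sort_by { |file| [file.count("/"), file] }

# The block form of File.open closes each file, even if an error is raised.
File.open(final_output_file, 'wb') do |output_file|
  files.each do |file|
    File.open(file, 'rb') do |f|
      # read(n) returns nil at end of file, which ends the loop.
      while (buffer = f.read(buffer_size))
        output_file.write(buffer)
      end
    end
  end
end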
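
In the same spirit as the buffered loop above, Ruby's IO.copy_stream (available since 1.9) can do the per-file copy without a hand-written read/write loop and without forking a child process. The following is only a sketch: the method name merge_csv_files and its two parameters are hypothetical, chosen for illustration, and it assumes the output file should be excluded if it happens to match the glob.

```ruby
# Hypothetical helper: the name and parameters are illustrative and do not
# come from the original post.
def merge_csv_files(processing_directory, final_output_file)
  # Build the input list first, using the same ordering as the question.
  inputs = Dir["#{processing_directory}/*.csv"].sort_by { |file| [file.count("/"), file] }

  # Assumption: never copy the output file into itself.
  target = File.expand_path(final_output_file)
  inputs.reject! { |file| File.expand_path(file) == target }

  File.open(final_output_file, 'wb') do |output_file|
    inputs.each do |file|
      # Streams the source file into the already-open output; no shell,
      # no cat, no fork.
      IO.copy_stream(file, output_file)
    end
  end

  inputs.size # number of files merged
end
```

It would be called as merge_csv_files(processing_directory, final_output_file). Whether this is faster than the manual loop on a given system is not something I have measured.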