RegEx for extracting a value from Open3.popen3 stdout

Astaar · Jun 12, 2013

How do I get the output of an external command and extract values from it?

I have something like this:

stdin, stdout, stderr, wait_thr = Open3.popen3("#{path}/foobar", configfile)

if /exit 0/ =~ wait_thr.value.to_s
    runlog.puts("Foobar exited normally.\n")
    puts "Test completed."
    someoutputvalue = stdout.read("TX.*\s+(\d+)\s+")
    puts "Output value: " + someoutputvalue
end

I'm not using the right method on stdout since Ruby tells me it can't convert String into Integer.

So for instance, if the output is

"TX So and so:     28"

I would like to get only "28". I validated that the regex above matches what I need to match, I'm only wondering how to store that extracted value in a variable.

What is the right way of doing this? I can't find the methods available for stdout anywhere in the documentation. I'm using stdout.read from Ruby 1.9.3.

Answer

the Tin Man · Jun 12, 2013

All the information needed is in the Open3.popen3 documentation, but you have to read all of it and look at the examples pretty carefully. You can also glean useful information from the Process docs.

Maybe this will 'splain it better:

require 'open3'

captured_stdout = ''
captured_stderr = ''
exit_status = Open3.popen3(ENV, 'date') {|stdin, stdout, stderr, wait_thr|
  pid = wait_thr.pid # pid of the started process.
  stdin.close
  captured_stdout = stdout.read
  captured_stderr = stderr.read
  wait_thr.value # Process::Status object returned.
}

puts "STDOUT: " + captured_stdout
puts "STDERR: " + captured_stderr
puts "EXIT STATUS: " + (exit_status.success? ? 'succeeded' : 'failed')

Running that outputs:

STDOUT: Wed Jun 12 07:07:12 MST 2013
STDERR:
EXIT STATUS: succeeded

Things to note:

  • You often have to close the stdin stream. If the called application expects input on STDIN it will hang until it sees the stream close, then will continue its processing.
  • stdin, stdout, stderr are IO handles, so you have to read the IO class documentation to find out what methods are available.
  • You write to stdin using puts, print or write, and you read from stdout and stderr using read or gets.
  • exit_status isn't a string, it's an instance of the Process::Status class. You can mess with trying to parse from its to_s version, but don't. Instead use the accessors to see what it returned.
  • I passed in the ENV hash, so the child program had access to the entire environment the parent saw. It's not necessary to do that; instead you can create a reduced environment for the child if you don't want it to have access to everything, or you can mess with its view of the environment by changing values.
  • The code stdout.read("TX.*\s+(\d+)\s+") posted in the question is, um... nonsense. I have no idea where you got that as nothing like that is documented in Ruby's IO class for IO#read or IO.read.
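To make the notes above concrete, here is a minimal sketch that writes to the child's STDIN, closes it, and then inspects the Process::Status accessors. The choice of `sort` as the child program and the `LC_ALL` override are assumptions made for illustration only; they stand in for whatever program and environment tweaks you actually need.

```ruby
require 'open3'

sorted = ''

# Assumption: `sort` is only a stand-in for a program that reads STDIN.
# The leading hash overrides LC_ALL in the child's environment only.
status = Open3.popen3({ 'LC_ALL' => 'C' }, 'sort') do |stdin, stdout, stderr, wait_thr|
  stdin.puts 'pear'    # write to the child with puts, print or write
  stdin.puts 'apple'
  stdin.close          # the child waits for end-of-input before it finishes
  sorted = stdout.read # read everything the child wrote to its STDOUT
  stderr.read          # drain STDERR as well
  wait_thr.value       # Process::Status, returned from the block
end

puts sorted
puts "exit code: #{status.exitstatus}"
puts "success?:  #{status.success?}"
```

The block form returns the value of the block, so `status` ends up holding the Process::Status object and there is no need to parse its to_s output.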

It's easier to use capture3 if you don't need to write to STDIN of the called code:

require 'open3'

stdout, stderr, exit_status = Open3.capture3('date')

puts "STDOUT: " + stdout
puts "STDERR: " + stderr
puts "EXIT STATUS: " + (exit_status.success? ? 'succeeded' : 'failed')

Which outputs:

STDOUT: Wed Jun 12 07:23:23 MST 2013
STDERR:
EXIT STATUS: succeeded

Extracting a value from a string using a regular expression is trivial, and well covered by the Regexp documentation. Starting from the last code example:

stdout[/^\w+ (\w+ \d+) .+ (\d+)$/]
puts "Today is: " + [$1, $2].join(' ')

Which outputs:

Today is: Jun 12 2013

That's using the String#[] method, which is extremely flexible.
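As a small illustration of that flexibility, String#[] also accepts a second argument that selects a capture group by index or by name, so the value can be returned directly without going through $1. The sketch below uses the sample line from the question; the variable names and the capture name `count` are made up for the example.

```ruby
line = "TX So and so:     28" # sample output line from the question

whole_match = line[/\d+$/]                    # the matched text itself
by_index    = line[/TX.*\s+(\d+)/, 1]         # capture group 1
by_name     = line[/(?<count>\d+)$/, 'count'] # named capture group

puts whole_match, by_index, by_name
```

All three expressions return a String, or nil when the pattern does not match, so check for nil before concatenating the result.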

An alternative is to use "named captures":

/^\w+ (?<mon_day>\w+ \d+) .+ (?<year>\d+)$/ =~ stdout
puts "Today is: #{ mon_day } #{ year }"

which outputs the same thing. The downside to named captures is they're slower for what I consider a minor bit of convenience.


Applied to the sample output from the question:

"TX So and so: 28"[/\d+$/]
=> "28"
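Putting the pieces together for the question's scenario might look like the sketch below. It is only an outline under stated assumptions: `echo` stands in for the question's "#{path}/foobar" program, which isn't available here, and the variable name `tx_value` is made up for the example.

```ruby
require 'open3'

# Assumption: `echo` stands in for the real external program.
stdout, stderr, status = Open3.capture3('echo', 'TX So and so:     28')

if status.success?
  # The second argument to String#[] selects capture group 1.
  tx_value = stdout[/^TX.*\s+(\d+)\s*$/, 1]
  puts "Output value: #{tx_value}"
else
  warn "Command failed (exit #{status.exitstatus}): #{stderr}"
end
```

Checking `status.success?` replaces the `/exit 0/ =~ wait_thr.value.to_s` test from the question, and the regular expression is applied to the captured string rather than being passed to read.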