Python subprocess Popen.communicate() equivalent to Popen.stdout.read()?

Christophe picture Christophe · Oct 19, 2012 · Viewed 32.6k times · Source

Very specific question (I hope): What are the differences between the following three codes?

(I expect it to be only that the first does not wait for the child process to be finished, while the second and third ones do. But I need to be sure this is the only difference...)

I also welcome other remarks/suggestions (though I'm already well aware of the shell=True dangers and cross-platform limitations)

Note that I already read Python subprocess interaction, why does my process work with Popen.communicate, but not Popen.stdout.read()? and that I do not want/need to interact with the program after.

Also note that I already read Alternatives to Python Popen.communicate() memory limitations? but that I didn't really get it...

Finally, note that I am aware that somewhere there is a risk of deadlock when one buffer is filled with one output using one method, but I got lost while looking for clear explanations on the Internet...

First code:

from subprocess import Popen, PIPE

def exe_f(command='ls -l', shell=True):
    """Function to execute a command and return stuff"""

    process = Popen(command, shell=shell, stdout=PIPE, stderr=PIPE)

    stdout = process.stdout.read()
    stderr = process.stderr.read()

    return process, stderr, stdout

Second code:

from subprocess import Popen, PIPE
from subprocess import communicate

def exe_f(command='ls -l', shell=True):
    """Function to execute a command and return stuff"""

    process = Popen(command, shell=shell, stdout=PIPE, stderr=PIPE)

    (stdout, stderr) = process.communicate()

    return process, stderr, stdout

Third code:

from subprocess import Popen, PIPE
from subprocess import wait

def exe_f(command='ls -l', shell=True):
    """Function to execute a command and return stuff"""

    process = Popen(command, shell=shell, stdout=PIPE, stderr=PIPE)

    code   = process.wait()
    stdout = process.stdout.read()
    stderr = process.stderr.read()

    return process, stderr, stdout

Thanks.

Answer

jdi picture jdi · Oct 19, 2012

If you look at the source for subprocess.communicate(), it shows a perfect example of the difference:

def communicate(self, input=None):
    ...
    # Optimization: If we are only using one pipe, or no pipe at
    # all, using select() or threads is unnecessary.
    if [self.stdin, self.stdout, self.stderr].count(None) >= 2:
        stdout = None
        stderr = None
        if self.stdin:
            if input:
                self.stdin.write(input)
            self.stdin.close()
        elif self.stdout:
            stdout = self.stdout.read()
            self.stdout.close()
        elif self.stderr:
            stderr = self.stderr.read()
            self.stderr.close()
        self.wait()
        return (stdout, stderr)

    return self._communicate(input)

You can see that communicate does make use of the read calls to stdout and stderr, and also calls wait(). It is just a matter of order of operations. In your case because you are using PIPE for both stdout and stderr, it goes into _communicate():

def _communicate(self, input):
    stdout = None # Return
    stderr = None # Return

    if self.stdout:
        stdout = []
        stdout_thread = threading.Thread(target=self._readerthread,
                                         args=(self.stdout, stdout))
        stdout_thread.setDaemon(True)
        stdout_thread.start()
    if self.stderr:
        stderr = []
        stderr_thread = threading.Thread(target=self._readerthread,
                                         args=(self.stderr, stderr))
        stderr_thread.setDaemon(True)
        stderr_thread.start()

    if self.stdin:
        if input is not None:
            self.stdin.write(input)
        self.stdin.close()

    if self.stdout:
        stdout_thread.join()
    if self.stderr:
        stderr_thread.join()

    # All data exchanged.  Translate lists into strings.
    if stdout is not None:
        stdout = stdout[0]
    if stderr is not None:
        stderr = stderr[0]

    # Translate newlines, if requested.  We cannot let the file
    # object do the translation: It is based on stdio, which is
    # impossible to combine with select (unless forcing no
    # buffering).
    if self.universal_newlines and hasattr(file, 'newlines'):
        if stdout:
            stdout = self._translate_newlines(stdout)
        if stderr:
            stderr = self._translate_newlines(stderr)

    self.wait()
    return (stdout, stderr)

This uses threads to read from multiple streams at once. Then it calls wait() at the end.

So to sum it up:

  1. This example reads from one stream at a time and does not wait for it to finish the process.
  2. This example reads from both streams at the same time via internal threads, and waits for it to finish the process.
  3. This example waits for the process to finish, and then reads one stream at a time. And as you mentioned has the potential to deadlock if there is too much written to the streams.

Also, you don't need these two import statements in your 2nd and 3rd examples:

from subprocess import communicate
from subprocess import wait

They are both methods of the Popen object.