Why is subprocess.run output different from shell output of same command?

user2346536 · Jun 9, 2016 · Viewed 23k times

I am using subprocess.run() for some automated testing, mostly to automate doing:

dummy.exe < file.txt > foo.txt
diff file.txt foo.txt

If you execute the above redirection in a shell, the two files are always identical. But whenever file.txt is too long, the Python code below does not produce the correct result.

This is the Python code:

import subprocess
import sys


def main(argv):

    exe_path = r'dummy.exe'
    file_path = r'file.txt'

    with open(file_path, 'r') as test_file:
        stdin = test_file.read().strip()
        p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE, universal_newlines=True)
        out = p.stdout.strip()
        err = p.stderr
        if stdin == out:
            print('OK')
        else:
            print('failed: ' + out)

if __name__ == "__main__":
    main(sys.argv[1:])

Here is the C++ code in dummy.cc:

#include <iostream>


int main()
{
    int size, count, a, b;
    std::cin >> size;
    std::cin >> count;

    std::cout << size << " " << count << std::endl;


    for (int i = 0; i < count; ++i)
    {
        std::cin >> a >> b;
        std::cout << a << " " << b << std::endl;
    }
}

file.txt can be anything like this:

1 100000
0 417
0 842
0 919
...

The second integer on the first line is the number of lines following, hence here file.txt will be 100,001 lines long.
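
For reference, a test file in that format can be generated with a few lines of Python (a hypothetical helper, not part of the original setup):

import random

count = 100000  # number of "a b" lines that follow the header
with open('file.txt', 'w', newline='\n') as f:
    f.write('1 {}\n'.format(count))          # header line: size, count
    for _ in range(count):
        f.write('0 {}\n'.format(random.randint(0, 1000)))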

Question: Am I misusing subprocess.run()?

Edit

My exact Python code after the comments (newlines, 'rb') have been taken into account:

import subprocess
import sys
import os


def main(argv):

    base_dir = os.path.dirname(__file__)
    exe_path = os.path.join(base_dir, 'dummy.exe')
    file_path = os.path.join(base_dir, 'infile.txt')
    out_path = os.path.join(base_dir, 'outfile.txt')

    with open(file_path, 'rb') as test_file:
        stdin = test_file.read().strip()
        p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE)
        out = p.stdout.strip()
        if stdin == out:
            print('OK')
        else:
            with open(out_path, "wb") as text_file:
                text_file.write(out)

if __name__ == "__main__":
    main(sys.argv[1:])

Here is the first diff:

[screenshot of the diff omitted]

Here is the input file: https://drive.google.com/open?id=0B--mU_EsNUGTR3VKaktvQVNtLTQ

Answer

jfs · Jun 10, 2016

To reproduce, here is the shell command invoked from Python:

subprocess.run("dummy.exe < file.txt > foo.txt", shell=True, check=True)

and here is the same thing without the shell:

with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    subprocess.run(["dummy.exe"], stdin=input_file, stdout=output_file, check=True)

It works with arbitrarily large files.
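
To cover the diff step as well, here is a small sketch (my addition, assuming a plain byte-for-byte comparison is what you want) using filecmp:

import filecmp
import subprocess

with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    subprocess.run(["dummy.exe"], stdin=input_file, stdout=output_file, check=True)

# Equivalent of `diff file.txt foo.txt`: compare the files byte for byte.
print('OK' if filecmp.cmp('file.txt', 'foo.txt', shallow=False) else 'failed')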

You could use subprocess.check_call() in this case (available since Python 2) instead of subprocess.run(), which is available only in Python 3.5+.
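
For example, a minimal sketch of the same call with check_call(), using the same file handles as above:

import subprocess

with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    # check_call() raises CalledProcessError on a non-zero exit status,
    # just like run(..., check=True).
    subprocess.check_call(["dummy.exe"], stdin=input_file, stdout=output_file)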

Works very well, thanks. But then why was the original failing? Pipe buffer size, as in Kevin's answer?

It has nothing to do with OS pipe buffers. The warning from the subprocess docs that @Kevin J. Chase cites is unrelated to subprocess.run(). You should care about OS pipe buffers only if you use process = Popen() and manually read()/write() via multiple pipe streams (process.stdin/.stdout/.stderr).
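
For illustration, the pipe-buffer concern applies to a pattern like the following, where both streams are pipes; communicate() is the safe way to drive it, while manual write()/read() can deadlock once a pipe buffer fills up (a sketch, reusing the same dummy.exe and file.txt as above):

import subprocess

with open('file.txt', 'rb') as f:
    data = f.read()

p = subprocess.Popen(["dummy.exe"],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# Safe: communicate() feeds stdin and drains stdout concurrently,
# so neither OS pipe buffer can fill up and stall the other side.
out, _ = p.communicate(input=data)

# Risky (can deadlock on large inputs):
#     p.stdin.write(data); p.stdin.close(); out = p.stdout.read()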

It turns out that the observed behavior is due to a Windows bug in the Universal CRT. Here is the same issue reproduced without Python: Why would redirection work where piping fails?

As said in the bug description, to work around it:

  • "use a binary pipe and do text mode CRLF => LF translation manually on the reader side" or use ReadFile() directly instead of std::cin
  • or wait for the Windows 10 update this summer (where the bug should be fixed)
  • or use a different C++ compiler, e.g., there is no issue if you use g++ on Windows

The bug affects only text pipes, i.e., code that uses < and > file redirection should be fine (stdin=input_file, stdout=output_file should still work, or it is some other bug).