My Python program prepares inputs, runs an external FORTRAN code, and processes the outputs in a Windows HPC 2008 environment. It works great until the external program has been executed roughly 1042-1045 times (usually the problem converges earlier). At that point, I get an exception:
WindowsError: [Error 206] The filename or extension is too long
However, the path to the file is not growing over time; the program just cleans the directory and runs again.
Here's the code:
import subprocess

inpF = open(inName)              # input deck read by the FORTRAN code
outF = open(localOutName, 'w')   # local file capturing the program's stdout
p = subprocess.Popen(pathToExe, shell=False, stdin=inpF, stdout=outF, cwd=runPath)
stdout, stderr = p.communicate()
outF.close()
inpF.close()
pathToExe is a constant string pointing to a UNC location (e.g. \\server\shared\program.exe), stdin is an open file in read-only mode on a local drive, stdout is an open file in write mode on a local drive, and cwd is a local path on the C:\ drive. I have confirmed that none of the arguments to subprocess are longer than 80 characters, even though the limit is supposed to be 32,768, according to this somewhat related post.
What am I doing wrong? Somehow something is accumulating that only becomes a problem when I run over a thousand times.
UPDATE:
To test the "too many files open" hypothesis, I made a very small example that runs very quickly with a different executable. The main difference here is that the stdin and stdout are just empty files here, whereas in the previous case, they're both large files. In this case, the code runs just fine for 2000 runs, whereas the earlier fails at ~1042. So it's not just that there are that many files. Maybe there are too many large files open?
import subprocess

nRuns = 2000   # completes without error, unlike the original ~1042-run case

for i in range(nRuns):
    if not (i % (nRuns / 10.0)):
        print('{0:.0f}% complete'.format(i / float(nRuns) * 100))
    inpF = open('in.txt')
    outF = open('out.txt', 'w')
    p = subprocess.Popen('isotxsmerge.exe', shell=False, stdin=inpF,
                         stdout=outF, cwd='.')
    stdout, stderr = p.communicate()
    outF.close()
    inpF.close()
Hmmm... actually, I think the error message text is a red herring. I don't know for sure, but it seems likely to me that you're running out of file handles. From various sources, the canonical file handle limit seems to be around 2048 files... which is curiously in the neighborhood of 2 x your ~1042 subprocesses. I don't know the internals of the Windows Python interpreter, but my guess is that the handles aren't being garbage collected fast enough, even though you're closing the files. Again, this is just a guess, but perhaps it's another line of thinking that might lead you to something more conclusive and productive.
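One way to check that guess is to log the process's open-handle count as the loop runs. This is just a sketch, reusing your small test case; it assumes the third-party psutil package is installed, and num_handles() is Windows-only:

import subprocess
import psutil   # third-party: pip install psutil

proc = psutil.Process()   # the current Python process

for i in range(2000):
    inpF = open('in.txt')
    outF = open('out.txt', 'w')
    p = subprocess.Popen('isotxsmerge.exe', shell=False,
                         stdin=inpF, stdout=outF, cwd='.')
    p.communicate()
    outF.close()
    inpF.close()
    if i % 100 == 0:
        # A steadily climbing number here would mean handles are leaking
        # even though the files are being closed.
        print('run {0}: {1} open handles'.format(i, proc.num_handles()))

If the handle count grows without bound, you've found your culprit, even if the error text never mentions handles.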
In the meantime, as a workaround, you can use the old stand-by approach of having a governor process that spawns an intermediate process, which in turn spawns the subprocesses. The intermediate process has a fixed lifespan (say, no more than 1000 subprocesses spawned) before it exits. When it expires, the governor starts a new one. This is a hack, and a clumsy one at that, but it does work. (IIRC, the Apache web server used to have a similar self-destruct limit on how many requests a child process could handle.) A rough sketch follows.
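Here's roughly how that could look; the counts, filenames, and the self-invoking script layout are all made up, but it shows the idea of a governor relaunching a short-lived worker so leaked handles die with each worker process:

import subprocess
import sys

CHUNK = 1000   # worker exits after launching this many subprocesses
TOTAL = 5000   # total number of runs needed (made-up value)

def worker(n):
    # Same launch loop as before; any leaked handles are released by the
    # OS when this worker process exits.
    for _ in range(n):
        inpF = open('in.txt')
        outF = open('out.txt', 'w')
        p = subprocess.Popen('isotxsmerge.exe', shell=False,
                             stdin=inpF, stdout=outF, cwd='.')
        p.communicate()
        outF.close()
        inpF.close()

def governor():
    done = 0
    while done < TOTAL:
        n = min(CHUNK, TOTAL - done)
        # Re-run this same script in "worker" mode as a fresh Python process.
        subprocess.check_call([sys.executable, __file__, str(n)])
        done += n

if __name__ == '__main__':
    if len(sys.argv) > 1:
        worker(int(sys.argv[1]))
    else:
        governor()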
Anyway...best of luck and happy coding.