First, I'm still a newbie with TensorFlow. I'm using v0.9 and trying to use the 2 GPUs installed in the machine we have. So, here is what's happening:
- When I run a training data script on the machine, it works only on one of the 2 GPUs. It takes the first one by default (/gpu:0).
- When I launch a second training data script to run on the second GPU (after doing the changes needed, i.e. with tf.device(...)) while keeping the first process running on the first GPU, TensorFlow kills the first process and uses only the second GPU to run the second process. So it seems only one process at a time is allowed by TensorFlow?

What I need is to be able to launch two separate training data scripts for 2 different models on the 2 different GPUs installed on the same machine. Am I missing something in this case? Is this the expected behavior? Should I go through distributed TensorFlow on a local machine to do so?
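For context, the kind of placement described above looks roughly like this (a minimal sketch in the TensorFlow 0.x API; the toy model and variable names are made up):

import tensorflow as tf

# Pin this model's ops to the second GPU (the approach described above).
with tf.device("/gpu:1"):
    x = tf.placeholder(tf.float32, shape=[None, 784])
    w = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(x, w) + b

# allow_soft_placement lets ops without a GPU kernel fall back to the CPU.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(tf.initialize_all_variables())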
TensorFlow tries to allocate some memory on every GPU it sees.
To work around this, make TensorFlow see a single (and different) GPU for each script. To do that, use the environment variable CUDA_VISIBLE_DEVICES like this:
CUDA_VISIBLE_DEVICES=0 python script_one.py
CUDA_VISIBLE_DEVICES=1 python script_two.py
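If you prefer not to prefix the command, the same effect can be obtained from inside each script, as long as the variable is set before TensorFlow is imported (the "0" below is just an example; the second script would use "1"):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # must be set before the import below

import tensorflow as tf  # this process now sees only GPU 0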
In both script_one.py and script_two.py, use tf.device("/gpu:0") to place the graph on the only GPU that the process sees.
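For example, script_one.py could look roughly like this (a minimal sketch with a toy graph; script_two.py would be identical apart from the model it builds):

import tensorflow as tf

# Inside this process the only visible GPU is exposed as /gpu:0,
# regardless of whether it is physically GPU 0 or GPU 1.
with tf.device("/gpu:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))  # the placement log should show the op on gpu:0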