nvidia-smi process hangs and can't be killed with SIGKILL either

bio · Jan 5, 2017

I'm on Ubuntu 14.04, CUDA toolkit 8, driver version 367.48.

When I run the nvidia-smi command, it just hangs indefinitely. When I log in again and try to kill that nvidia-smi process, for example with kill -9 <PID>, it is not killed. If I run nvidia-smi again, I find both processes still running - checked from yet another shell, of course, because the second command gets stuck just like the first.

Could this be a driver issue? It's not the latest version, but it is still quite recent.

Answer

lurscher · May 19, 2018

I solved this problem by running the following at every boot:

sudo nvidia-smi -pm 1

The above command enables persistence mode. This issue has affected the NVIDIA drivers for over two years, but they don't seem interested in fixing it. It appears to be a power management problem: some time after booting into the OS, if the nvidia-persistenced service is running with the no-persistence-mode option, the GPU powers down to save energy, and nvidia-smi then hangs while waiting for something to bring the device back up and hand control over to it.
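
To avoid typing the command after every reboot, one option is to run it from a small boot-time unit. The sketch below is my own suggestion, not part of the original answer: it assumes a systemd-based system, that nvidia-smi is installed at /usr/bin/nvidia-smi, and the unit name nvidia-persistence-mode.service is arbitrary. On Ubuntu 14.04 (which uses Upstart), adding the same sudo-less command to /etc/rc.local before the final exit 0 line should have a similar effect.

# /etc/systemd/system/nvidia-persistence-mode.service (hypothetical unit name)
[Unit]
Description=Enable NVIDIA persistence mode at boot

[Service]
Type=oneshot
# Same command as above; adjust the path if nvidia-smi is installed elsewhere
ExecStart=/usr/bin/nvidia-smi -pm 1

[Install]
WantedBy=multi-user.target

Then enable it once with sudo systemctl enable nvidia-persistence-mode.service, and persistence mode will be set on every boot.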