xvfb-run unreliable when multiple instances invoked in parallel

user3236315 · May 19, 2015

Can you help me understand why I sometimes (about half the time) get this error:

webkit_server.NoX11Error: Cannot connect to X. You can try running with xvfb-run.

It happens when I start several instances of the script in parallel, each as:

xvfb-run -a python script.py 

You can reproduce this yourself like so:

for ((i=0; i<10; i++)); do
  xvfb-run -a xterm &
done

Of the 10 instances of xterm this starts, 9 of them will typically fail, exiting with the message Xvfb failed to start.

Answer

Charles Duffy · May 19, 2015

xvfb-run 1.0 picks its display number as follows:

# Find a free server number by looking at .X*-lock files in /tmp.
find_free_servernum() {
    # Sadly, the "local" keyword is not POSIX.  Leave the next line commented in
    # the hope Debian Policy eventually changes to allow it in /bin/sh scripts
    # anyway.
    #local i

    i=$SERVERNUM
    while [ -f /tmp/.X$i-lock ]; do
        i=$(($i + 1))
    done
    echo $i
}

This is a check-then-use race: the test for a lock file and the later creation of that file by the X server are two separate steps. If two copies of find_free_servernum run at the same time, neither is aware of the other, so both can decide that the same number is available, even though only one of them will be able to use it.
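
To see the race without starting any X servers, here is a minimal sketch. It copies the loop above but points it at a scratch directory (the lockdir variable and the starting number 99 are assumptions made for this demo, not part of xvfb-run), then calls it twice before either caller has created a lock file:

```bash
#!/bin/bash
# Demo only: a scratch directory stands in for /tmp, so nothing real is touched.
lockdir=$(mktemp -d)

find_free_servernum() {
    i=99
    while [ -f "$lockdir/.X$i-lock" ]; do
        i=$((i + 1))
    done
    echo "$i"
}

first=$(find_free_servernum)     # caller A picks a number...
second=$(find_free_servernum)    # ...and caller B picks one before A's lock file exists
touch "$lockdir/.X$first-lock"   # only now does the lock file appear
third=$(find_free_servernum)     # a later caller sees the lock and moves on

echo "first=$first second=$second third=$third"
rm -rf -- "$lockdir"
```

The first two callers report the same number; only the third, which runs after the lock file exists, moves on to the next one.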

So, to fix this, let's write our own code to find a free display number, instead of assuming that xvfb-run -a will work reliably:

#!/bin/bash

# allow settings to be updated via environment
: "${xvfb_lockdir:=$HOME/.xvfb-locks}"
: "${xvfb_display_min:=99}"
: "${xvfb_display_max:=599}"

# assuming only one user will use this, let's put the locks in our own home directory
# avoids vulnerability to symlink attacks.
mkdir -p -- "$xvfb_lockdir" || exit

i=$xvfb_display_min     # minimum display number
while (( i < xvfb_display_max )); do
  if [ -f "/tmp/.X$i-lock" ]; then                # still avoid an obvious open display
    (( ++i )); continue
  fi
  exec 5>"$xvfb_lockdir/$i" || { (( ++i )); continue; }  # open a lockfile; on failure, move on to the next number
  if flock -x -n 5; then                          # try to lock it
    exec xvfb-run --server-num="$i" "$@" || exit  # if locked, run xvfb-run
  fi
  (( i++ ))
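done

# every candidate display was taken or locked; report it instead of exiting silently
echo "xvfb-run-safe: no free display number in [$xvfb_display_min, $xvfb_display_max)" >&2
exit 1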

If you save this script as xvfb-run-safe, you can then invoke:

xvfb-run-safe python script.py 

...and not worry about race conditions so long as no other users on your system are also running xvfb.
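
The wrapper relies on flock refusing a second exclusive lock on a file that is already locked through a different open of that file. A small sketch of that behavior, assuming the flock utility from util-linux is installed (the scratch file and the descriptor numbers 5 and 6 are arbitrary choices for this demo):

```bash
#!/bin/bash
lockfile=$(mktemp)

exec 5>"$lockfile"               # open the lock file on descriptor 5
if flock -x -n 5; then           # non-blocking exclusive lock, as in the wrapper
    first=held
else
    first=refused
fi

# A second, independent open of the same file competes for the same lock.
if ( exec 6>"$lockfile"; flock -x -n 6 ); then
    second=held
else
    second=refused
fi

echo "first=$first second=$second"
exec 5>&-                        # closing the descriptor releases the lock
rm -f -- "$lockfile"
```

Because the wrapper runs xvfb-run via exec, descriptor 5 is inherited and the lock stays held until xvfb-run and its children have exited, at which point the kernel releases it.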


The wrapper can be tested like so:

for ((i=0; i<10; i++)); do xvfb-run-safe xchat & done

...in which case all 10 instances correctly start up and run in the background, as opposed to:

for ((i=0; i<10; i++)); do xvfb-run -a xchat & done

...where, depending on your system's timing, nine out of ten will (typically) fail.