How to "undrain" slurm nodes in drain state

elm · Apr 9, 2015 · Viewed 48.5k times

Running sinfo shows that 3 nodes are in the drain state:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up   infinite      3  drain node[10,11,12]

Which command line should I use to undrain such nodes?

Answer

elm · Apr 9, 2015

I found an approach: start the scontrol interpreter (type scontrol at the command line) and then run

scontrol: update NodeName=node10 State=DOWN Reason="undraining"
scontrol: update NodeName=node10 State=RESUME

Then

scontrol: show node node10

displays, among other information,

State=IDLE
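
The same update can also be issued without entering the interpreter, since scontrol accepts a command directly on its command line. Below is a minimal sketch of a wrapper for several nodes. The function name undrain and the DRY_RUN variable are made up for illustration and are not part of Slurm. It sends only State=RESUME and skips the State=DOWN step used above, on the assumption that RESUME alone is enough to return a drained node to service on your Slurm version.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): resume each node given as an argument.
# With DRY_RUN=1 it only prints the scontrol commands instead of running them.
undrain() {
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Show what would be executed, one command per node.
            echo "scontrol update NodeName=$node State=RESUME"
        else
            # Assumption: RESUME alone is sufficient for a drained node.
            scontrol update NodeName="$node" State=RESUME
        fi
    done
}
```

Calling undrain node10 node11 node12 with DRY_RUN=1 set first lets you inspect the commands before running them for real.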

Update: some of these nodes went back to the DRAIN state. Their root partition turned out to be full: show node a10, for example, reported Reason=SlurmdSpoolDir is full. On Ubuntu I ran sudo apt-get clean to remove the /var/cache/apt contents and also gzipped some /var/log files.
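
Before resuming a node, it may help to look at why it was drained. A small sketch for pulling the Reason field out of the show node output follows. The function name node_reason is hypothetical, and it assumes Reason= begins its own indented line in that output, which may differ between Slurm versions.

```shell
#!/bin/sh
# Hypothetical helper: print the Reason field from `scontrol show node`
# output read on stdin.
# Assumption: "Reason=" starts its own (possibly indented) line.
node_reason() {
    # Strip the leading whitespace and the "Reason=" key, print the rest.
    sed -n 's/^[[:space:]]*Reason=//p'
}
```

Usage would be along the lines of scontrol show node a10 | node_reason. sinfo -R also lists the reasons for all down or drained nodes, and df -h / on the affected node shows whether the root partition is full.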