Running sinfo shows that 3 nodes are in the drain state:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
all*      up    infinite  3     drain node[10,11,12]
Which command line should I use to undrain such nodes?
I found an approach: enter the scontrol interactive prompt (type scontrol at the command line) and then run
scontrol: update NodeName=node10 State=DOWN Reason="undraining"
scontrol: update NodeName=node10 State=RESUME
Then
scontrol: show node node10
displays, among other information,
State=IDLE
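
The same update can probably be issued without entering the interactive prompt, and for all three nodes at once. The sketch below assumes that scontrol accepts a hostlist expression such as node[10-12] for NodeName, and that RESUME can be applied directly to a drained node (the scontrol man page lists DRAIN among the states RESUME applies to, so the intermediate DOWN step may not be needed; this was not verified on this cluster). The helper name undrain_nodes and the DRY_RUN switch are hypothetical, added only so the command can be previewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: resume every node in a Slurm hostlist expression.
# DRY_RUN defaults to 1, so the command is only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

undrain_nodes() {
    if [ "$DRY_RUN" = "1" ]; then
        # Preview only: show the command that would be run.
        echo "scontrol update NodeName=$1 State=RESUME"
    else
        # Quote the argument so the shell does not treat [..] as a glob.
        scontrol update "NodeName=$1" State=RESUME
    fi
}

# Node names taken from the sinfo output above.
undrain_nodes "node[10-12]"
```

Running it with DRY_RUN=0 (as root or as the Slurm administrator user) would actually send the update to the controller.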
Update: some of these nodes went back to the DRAIN state. Their root partition turned out to be full; for example,
scontrol: show node a10
showed
Reason=SlurmdSpoolDir is full
So on Ubuntu I ran
sudo apt-get clean
to remove the cached package files under /var/cache/apt, and also gzipped some files in /var/log.
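
Before resuming a node that was drained for this reason, it may help to confirm that space has really been freed. The following is a minimal sketch to run on the affected node: usage_pct is a hypothetical helper, the 90% threshold is an arbitrary assumption, and the check looks at the root filesystem because that is where the spool directory lived in this case. On the cluster, sinfo -R lists the drain reason for every affected node, and scontrol show config | grep SlurmdSpoolDir shows the configured spool directory.

```shell
#!/bin/sh
# Hypothetical helper: print the usage percentage (digits only) of the
# filesystem that holds the given path, using POSIX df output.
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Arbitrary threshold chosen for illustration.
THRESHOLD=90

pct=$(usage_pct /)
if [ "$pct" -ge "$THRESHOLD" ]; then
    echo "root filesystem is ${pct}% full; free more space before resuming the node"
else
    echo "root filesystem is ${pct}% full"
fi
```

If the spool directory is on a separate filesystem, pass its path to usage_pct instead of /.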