Reconfigure and reboot a Hudson/Jenkins slave as part of a build

Jason Swager · Apr 4, 2011 · Viewed 21.9k times

I have a Jenkins (Hudson) server setup that runs tests on a variety of slave machines. What I want to do is reconfigure the slave (using remote APIs), reboot the slave so that the changes take effect, then continue with the rest of the test. There are two hurdles that I've encountered so far:

  1. Once a Jenkins job begins to run on the slave, the slave cannot go down or break the network connection to the server; otherwise Jenkins immediately fails the test. Normally, I would say this is completely desirable behavior. But in this case, I would like Jenkins to tolerate the disruption until the slave comes back online and Jenkins can reconnect to it - or the slave reconnects to Jenkins.
  2. In a job that has been attached to the slave, I need to run some build tasks on the Jenkins master - not on the slave.

Is this possible? So far, I haven't found a way to do this using Jenkins or any of its plugins.

EDIT - Further Explanation
I really, really like the Jenkins slave architecture. Combined with the plugins already available, it makes it very easy to get jobs to a slave, run them, and pull the results back. And the ability to pick any matching slave allows for automatic job/test distribution.

In our situation, we use virtualized (VMware) slave machines. It was easy enough to write a script that would cause Jenkins to use VMware PowerCLI to start the VM up when it needed to run on a slave, then ship the job to it and pull the results back. All good.

EXCEPT: Part of the setup of each test is to slightly reconfigure the virtual machine in some fashion: disable UAC, log on as a different user, install a different driver, etc. Each of these changes requires that the test VM/slave be rebooted before the changes take effect. Although I can write slave on-demand scripts (Launch Method=Launch slave via execution of command on the master) that handle this reconfig and restart, it has to be done BEFORE the job is run. That's where the problem occurs: I cannot configure the slave that early because the required configuration changes depend on the job being run, and the job is only assigned after the slave is started.

Possible Solutions
1) Use multiple slave instances on a single VM. This wouldn't work - several of the configurations are mutually exclusive, but Jenkins doesn't know that. So it would try to start one slave configuration for one job, another slave for a different job - and both slaves would be on the same VM. Locks on the jobs don't prevent this since slave starting isn't part of the job.

2) (Optimal) A build step that allows a job to know that its slave connection MIGHT be disrupted. The build step may have to include some options so that Jenkins knows how to reconnect the slave (will the slave reconnect automatically, will Jenkins have to run a script, will simple SSH suffice). The build step would handle the disconnect of the slave, ignore the usually job-failing disconnect, then perform the reconnect. Once the slave is back up and running, the next build step can occur. Perhaps with a timeout that fails the job if the slave can't be reconnected within a certain amount of time.

Current Solution - less than optimal
Right now, I can't use the slave function of Jenkins. Instead, I use a series of build steps - run on the master - that use Windows and PowerShell scripts to power on the VM, make the configuration changes, and restart it. The VM has an SSH server running on it, and I use that to upload test files to the test VM and execute them remotely. I then download the results back to Jenkins for handling by the job. This solution is functional - but a lot more work than the typical Jenkins slave approach. Also, the scripts are targeted towards a single VM; I can't easily use a pool of slaves.
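
As a rough illustration of the upload, execute, download sequence described above, here is a minimal sketch assuming the OpenSSH `scp` and `ssh` clients are available on the master. The host name, directory names, the `results` subdirectory, and the function names are all hypothetical and not taken from the actual setup; `DRY_RUN` is only there so the plan can be printed without touching a VM.

```shell
#!/bin/sh
# Hypothetical values; none of these names come from the original setup.
VM_HOST="${VM_HOST:-testuser@test-vm}"
REMOTE_DIR="${REMOTE_DIR:-tests}"

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Upload a local test directory, run a command inside it on the VM,
# then copy the (assumed) results directory back to the master.
run_remote_test() {
    local_dir="$1"; remote_cmd="$2"; results_dir="$3"
    run scp -r "$local_dir" "$VM_HOST:$REMOTE_DIR" &&
    run ssh "$VM_HOST" "cd $REMOTE_DIR && $remote_cmd" &&
    run scp -r "$VM_HOST:$REMOTE_DIR/results" "$results_dir"
}
```

Because the target is just a variable, the same function could in principle be pointed at different VMs by changing `VM_HOST` per job, although that still does not give Jenkins' own slave selection.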

Answer

Bill Agee · May 12, 2011

Not sure if this will work for you, but you might try making the Jenkins agent node programmatically tell the master node that it's offline.

I had a situation where I needed to make a Jenkins job that performs these steps (all while running on the master node):

  • revert the Jenkins agent node VM to a powered-off snapshot
  • tell the master that the agent node is disconnected (since the master does not seem to automatically notice the agent is down, whenever I revert or hard power off my VMs)
  • power the agent node VM back on
  • as a "Post-build action", launch a separate job restricted to run on the agent node VM

I perform the agent disconnect step with a curl POST request, but there might be a cleaner way to do it:

curl -d "offlineMessage=&json=%7B%22offlineMessage%22%3A+%22%22%7D&Submit=Yes" http://JENKINS_HOST/computer/THE_NODE_TO_DISCONNECT/doDisconnect

Then when I boot the agent node, the agent launches and automatically connects, and the master notices the agent is back online (and will then send it jobs).
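
A minimal sketch of how the disconnect request and a wait for the agent's return might be wrapped in shell functions. It assumes the node status is readable at `/computer/NODE/api/json` with an `offline` field, as exposed by the Jenkins remote access API; the function names, the `JENKINS_URL` default, and the timeout values are made up for illustration. Authentication is omitted, and depending on its security settings Jenkins may also require credentials and a CSRF crumb for the POST.

```shell
#!/bin/sh
# Hypothetical default; override JENKINS_URL for your own master.
JENKINS_URL="${JENKINS_URL:-http://JENKINS_HOST}"

# Print the base URL of a node's page on the master.
node_url() {
    printf '%s/computer/%s' "$JENKINS_URL" "$1"
}

# Ask the master to disconnect the node (same POST as the curl command above).
disconnect_node() {
    curl -s -d 'offlineMessage=&json=%7B%22offlineMessage%22%3A+%22%22%7D&Submit=Yes' \
        "$(node_url "$1")/doDisconnect"
}

# Succeed if the node's JSON status, read from stdin, reports it as online.
# This is a crude text match rather than real JSON parsing.
is_online() {
    grep -q '"offline"[[:space:]]*:[[:space:]]*false'
}

# Poll the node status until it is online or the timeout (in seconds) expires.
wait_for_node() {
    node="$1"; timeout="${2:-600}"; interval="${3:-10}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if curl -s "$(node_url "$node")/api/json" | is_online; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```

A job on the master could then call `disconnect_node THE_NODE_TO_DISCONNECT`, power the VM back on, and run `wait_for_node THE_NODE_TO_DISCONNECT 900 || exit 1` before triggering the follow-up job.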

I was also able to toggle a node's availability on and off with this command (using 'toggleOffline' instead of 'doDisconnect'):

curl -d "offlineMessage=back_in_a_moment&json=%7B%22offlineMessage%22%3A+%22back_in_a_moment%22%7D&Submit=Mark+this+node+temporarily+offline" http://JENKINS_HOST/computer/NODE_TO_DISCONNECT/toggleOffline

(Running the same command again puts the node status back to normal.)
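
Since `toggleOffline` flips the current state, a script that runs it blindly can end up in the opposite state from the one intended. One possible workaround, sketched below, is to read the node's status first and only send the toggle when needed. It assumes the `temporarilyOffline` field in the node's `/api/json` output; the function names and the `JENKINS_URL` default are hypothetical, and authentication is again left out.

```shell
#!/bin/sh
# Hypothetical default; override JENKINS_URL for your own master.
JENKINS_URL="${JENKINS_URL:-http://JENKINS_HOST}"

# Succeed if the node's JSON status, read from stdin, shows it marked temporarily offline.
is_marked_offline() {
    grep -q '"temporarilyOffline"[[:space:]]*:[[:space:]]*true'
}

# $1 is the desired state ("offline" or "online"); the current JSON status
# is read from stdin. Prints "toggle" if a request is needed, else "keep".
toggle_decision() {
    if is_marked_offline; then current=offline; else current=online; fi
    if [ "$current" = "$1" ]; then echo keep; else echo toggle; fi
}

# Put the node in the desired state, sending toggleOffline only when needed.
set_node_state() {
    node="$1"; want="$2"
    status=$(curl -sf "$JENKINS_URL/computer/$node/api/json") || return 1
    if [ "$(printf '%s' "$status" | toggle_decision "$want")" = toggle ]; then
        curl -s -d 'offlineMessage=back_in_a_moment&json=%7B%22offlineMessage%22%3A+%22back_in_a_moment%22%7D&Submit=Mark+this+node+temporarily+offline' \
            "$JENKINS_URL/computer/$node/toggleOffline"
    fi
}
```

With this, `set_node_state NODE_TO_DISCONNECT offline` followed later by `set_node_state NODE_TO_DISCONNECT online` should be safe to repeat, since a node already in the requested state is left alone.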

The above may not apply to you, since it sounds like you want to do everything from one Jenkins job running on the agent node. And I'm not sure what happens if an agent node disconnects or marks itself offline in the middle of running a job. :)

Still, you might poke around in the Jenkins Remote Access API documentation a bit to see what else is possible with this kind of approach.