Jenkins slave went offline during build

RMK · Nov 29, 2016 · Viewed 7.8k times

A Jenkins slave goes offline during a build. How can I fix this? I have seen many related questions on SO and in the Jenkins issue tracker, but none of them gives a solution.

My configuration:

Jenkins version 1.651.1 and Zuul version 2.1.1.dev393, with one Jenkins master (Ubuntu) and 2 slaves (Ubuntu), each with 16GB of RAM, running builds in parallel.

The Jenkins master, devstack, and both nodepool slaves are in the same IP range.

The issue is that when one of the slaves completes its build, the java process on both slaves gets killed, so the other slave goes offline.

I found this by listing the processes running on the slaves: the java process is killed simultaneously on both slaves when one slave completes its build while the other is still running its build.

I had this issue previously, and it was resolved by switching from OpenJDK to Oracle's JDK. The slaves now use Oracle Java 1.8.0_111, but we are getting the same issue with Oracle Java 8 as well.

Build logs:

01:42:07 Slave went offline during the build
01:42:07 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
01:42:07    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
01:42:07 Caused by: java.io.EOFException
01:42:07    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2351)
01:42:07    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2820)
01:42:07    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
01:42:07    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:302)
01:42:07    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
01:42:07    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
01:42:07    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
01:42:07 
01:42:07 Build step 'Execute shell' marked build as failure 

Answer

dildeepak · Dec 7, 2016

A slave goes offline for one of these reasons:

  1. The jobs running on it are consuming more RAM than the slave has, so no memory is left.

     - If this is the case, reduce the number of executors on the slaves or give the nodes more CPU/RAM.

  2. A slave cleanup process or some orphan process might be running in the background and causing the connection to break.

     - Stop the cleanup process, or kill the orphan process that is consuming the memory.

  3. The SSH keys might have changed between the master and the slaves.

     - Send the SSH keys to the slaves via scp again and set up the connection once more.
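To check the first cause (memory exhaustion), something like the following Python sketch could be run on each slave. It is only an illustration, not part of Jenkins: the function names are made up for this example, `/var/log/syslog` is an assumed location for the kernel log on Ubuntu, and the matched phrases ("Out of memory", "Killed process") are the messages the Linux OOM killer typically writes. If the java process shows up in those lines at the time the slave went offline, the kernel killed it for lack of memory.

```python
import re

# Phrases the Linux OOM killer typically writes to the kernel log.
OOM_PATTERN = re.compile(r"Out of memory|Killed process \d+", re.IGNORECASE)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kilobytes}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mb(meminfo):
    """Return available memory in MB, falling back to MemFree on old kernels."""
    kb = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return kb // 1024


def find_oom_kills(log_text, process_name="java"):
    """Return kernel log lines where the OOM killer acted on process_name."""
    marker = "(%s)" % process_name
    return [
        line
        for line in log_text.splitlines()
        if OOM_PATTERN.search(line) and marker in line
    ]


def main(meminfo_path="/proc/meminfo", log_path="/var/log/syslog"):
    # log_path is an assumption; it may differ or need root on your system.
    try:
        with open(meminfo_path) as fh:
            print("Available MB:", available_mb(parse_meminfo(fh.read())))
        with open(log_path, errors="replace") as fh:
            for line in find_oom_kills(fh.read()):
                print(line)
    except OSError as exc:
        print("Could not read file:", exc)


if __name__ == "__main__":
    main()
```

If this prints OOM lines for java, reducing the number of executors or adding RAM, as suggested in point 1, is the place to start. If it prints none, the other two causes are more likely.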

Please try these, and also read the articles below for more help.