Jenkins slave going offline during build. How can I fix this? I have seen a lot of related questions on SO and in the Jenkins issue tracker, but none of them gives a solution.
My configuration:
Jenkins version 1.651.1 and Zuul version 2.1.1.dev393, with one Jenkins master (Ubuntu) and 2 slaves (Ubuntu), each with 16GB of RAM, running builds in parallel.
The Jenkins master, devstack and both nodepool slaves are in the same IP range.
The issue is that when one of the slaves completes its build, the java process on both slaves gets killed, so the other slave goes offline.
I found this by listing the processes running on the slaves: the java process is killed simultaneously on both slaves when one slave has completed its build and the other is still running its build.
I had this issue previously, and it was resolved by switching from OpenJDK to Oracle's JDK. The slaves now use Oracle Java 1.8.0_111, but we are getting the same issue with Oracle Java 8 as well.
Build logs:
01:42:07 Slave went offline during the build
01:42:07 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
01:42:07 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
01:42:07 Caused by: java.io.EOFException
01:42:07 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2351)
01:42:07 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2820)
01:42:07 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
01:42:07 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:302)
01:42:07 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
01:42:07 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
01:42:07 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
01:42:07
01:42:07 Build step 'Execute shell' marked build as failure
The slaves can go offline for one of several reasons. Try the following:
- If the slaves are running short of resources, reduce the number of executors on the slaves or add more CPU/RAM to the nodes.
- Stop the cleanup process, or kill any orphan process that is consuming memory.
- Send the SSH keys to the slaves again via scp and set up the connection once more.
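To check for memory-hungry orphan processes on a slave, something like the following rough sketch may help. It is not part of Jenkins: the function names and the 100 MB threshold are made up for illustration, and it assumes a Linux `ps` that supports `-eo pid,ppid,rss,comm` and that orphaned processes are re-parented to PID 1. Normal daemons also have PID 1 as parent, so treat the output only as a list of candidates to inspect, not as processes that are safe to kill.

```python
import subprocess


def parse_ps(output):
    """Parse the text printed by `ps -eo pid,ppid,rss,comm` into a list of dicts."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # ignore malformed lines
        pid, ppid, rss, comm = parts
        rows.append(
            {"pid": int(pid), "ppid": int(ppid), "rss_kb": int(rss), "comm": comm}
        )
    return rows


def orphan_candidates(rows, min_rss_kb=100_000):
    """Return processes whose parent is PID 1 and whose resident memory is at
    least min_rss_kb (hypothetical threshold), largest first."""
    hits = [r for r in rows if r["ppid"] == 1 and r["rss_kb"] >= min_rss_kb]
    return sorted(hits, key=lambda r: r["rss_kb"], reverse=True)


if __name__ == "__main__":
    # Assumption: run directly on the slave, where procps `ps` is available.
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,rss,comm"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    for r in orphan_candidates(parse_ps(out)):
        print(r["pid"], r["rss_kb"], r["comm"])
```

Run it on each slave while a build is in progress and again after a build finishes; a large process that stays in the list after its build has ended is worth a closer look before you kill it.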
Please try these once, and also read the articles below for more help.