For executing periodic tasks, I looked at Timer and ScheduledThreadPoolExecutor (with a single thread) and decided to use the latter, because the reference for Executors.newSingleThreadScheduledExecutor() says:
Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.
My plan was to use this as a safeguard against uncaught exceptions in a piece of watchdog code that monitors other operations. To make sure, I wrote the test below, which promptly failed. Was I making wrong assumptions, or is something wrong with my test?
Here's the code:
    @Test
    public void testTimer() {
        final AtomicInteger cTries = new AtomicInteger(0);
        final AtomicInteger cSuccesses = new AtomicInteger(0);

        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                cTries.incrementAndGet();
                if (true) {
                    throw new RuntimeException();
                }
                cSuccesses.incrementAndGet();
            }
        };

        /*
        Timer t = new Timer();
        t.scheduleAtFixedRate(task, 0, 500);
        */
        ScheduledExecutorService exe = Executors.newSingleThreadScheduledExecutor();
        exe.scheduleAtFixedRate(task, 0, 500, TimeUnit.MILLISECONDS);

        synchronized (this) {
            try {
                wait(3000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        exe.shutdown();
        /*
        t.purge();
        */
        Assert.assertEquals(cSuccesses.get(), 0);
        Assert.assertTrue(cTries.get() > 1, String.format("%d is not greater than 1. :(", cTries.get()));
    }
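Once a repeating task throws an uncaught exception, it is assumed to have died or to be in an error state, and its subsequent executions are suppressed. The replacement thread mentioned in the Javadoc only serves other tasks still scheduled on the executor; it does not reschedule the failed one. It is a bit of a gotcha that the task also fails silently unless you examine the returned Future to get the Error/Exception.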
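To make the silent failure visible, here is a minimal sketch of examining the Future. The class name SilentFailureDemo, the method causeOfFailure, and the exception message are made up for illustration; the 100 ms period and 2 s timeout are arbitrary.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original question.
public class SilentFailureDemo {

    /**
     * Schedules a task that always throws and returns the exception
     * that the task's Future reports, or null if none was reported.
     */
    public static Throwable causeOfFailure(AtomicInteger tries)
            throws InterruptedException, TimeoutException {
        ScheduledExecutorService exe = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<?> future = exe.scheduleAtFixedRate(() -> {
                tries.incrementAndGet();
                throw new IllegalStateException("watchdog failed");
            }, 0, 100, TimeUnit.MILLISECONDS);
            try {
                // For a periodic task, get() only returns by throwing:
                // here it rethrows the task's failure wrapped in ExecutionException.
                future.get(2, TimeUnit.SECONDS);
                return null;
            } catch (ExecutionException e) {
                return e.getCause();
            }
        } finally {
            exe.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger tries = new AtomicInteger();
        Throwable cause = causeOfFailure(tries);
        System.out.println("cause = " + cause + ", tries = " + tries.get());
    }
}
```

Since the executor suppresses further runs after the first failure, the counter should not advance past the first attempt here, which matches what the test in the question observed.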
You have to catch exceptions inside the task itself if you don't want them to kill the repeating task.
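One way to do that, sketched under a few assumptions: a small wrapper that catches whatever the task throws, so the executor never sees the exception. The names ResilientTaskDemo, guarded, and triesWithGuard are hypothetical, and catching Exception (rather than Throwable) is a choice that lets Errors still stop the task.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original question.
public class ResilientTaskDemo {

    /** Wraps a task so that exceptions are counted instead of reaching the executor. */
    public static Runnable guarded(Runnable task, AtomicInteger failures) {
        return () -> {
            try {
                task.run();
            } catch (Exception e) {
                // In real code, log the exception here; this sketch only counts it.
                failures.incrementAndGet();
            }
        };
    }

    /** Runs an always-failing but guarded task and returns how often it was attempted. */
    public static int triesWithGuard(long periodMillis, long durationMillis)
            throws InterruptedException {
        AtomicInteger tries = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        ScheduledExecutorService exe = Executors.newSingleThreadScheduledExecutor();
        exe.scheduleAtFixedRate(guarded(() -> {
            tries.incrementAndGet();
            throw new RuntimeException("boom");
        }, failures), 0, periodMillis, TimeUnit.MILLISECONDS);
        Thread.sleep(durationMillis);
        exe.shutdownNow();
        exe.awaitTermination(1, TimeUnit.SECONDS);
        return tries.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("tries = " + triesWithGuard(50, 500));
    }
}
```

With the guard in place the task keeps being rescheduled, so the attempt counter keeps growing even though every run fails. Whether it is safe to keep running after a failure is still up to the task, not the framework.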
As matt b points out in the comment above,
it would be problematic for framework code like this to assume it can safely restart a failed job - the fact that it failed with an exception means that the data might have been left in any sort of state, and potentially it would be unsafe to restart the job.