I have the following Step:
return stepBuilderFactory.get("billStep")
        .allowStartIfComplete(true)
        .chunk(20000)
        .reader(billReader)
        .processor(billProcessor)
        .faultTolerant()
        .skipLimit(Integer.MAX_VALUE)
        .skip(BillSkipException.class)
        .listener(billReaderListener)
        .listener(billSkipListener)
        .writer(billRepoItemWriter)
        .build();
Is my understanding correct that fault tolerant means that when an exception is thrown in billProcessor, it is handled by the skip listener and then the next row/item is processed by billProcessor?
After adding debug logs, I noticed that items/rows were "re-processed" when an exception was thrown in the processor (probably because of the faultTolerant config). But what if I am processing 2 million records and 300,000 of them are skipped, that is, they throw a skip exception? Isn't it a performance issue if some of these are "re-processed"?
And the big problem is that the next row/item is skipped: it is not processed in the processor at all.
If I remove faultTolerant and the SkipListener and directly save the skipped records to the database (which is what the skip listener is doing), it works, but is this solution correct?
No job is perfect! Errors happen. You may receive bad data. You may forget one null check that causes a NullPointerException at the worst of times. How you handle errors using Spring Batch is our topic today. There are many scenarios where exceptions encountered while processing should not result in Step failure, but should be skipped instead.
Spring Batch skip technique
With the skip technique you may specify certain exception types and a maximum number of skipped items. Whenever one of those skippable exceptions is thrown, the batch job doesn't fail but skips the item and goes on with the next one. Only when the maximum number of skipped items is reached will the batch job fail. For example, Spring Batch provides the ability to skip a record when a specified exception is thrown because of an error reading a record from your input. This section looks at how to use this technique to skip records based upon specific exceptions. There are two pieces involved in choosing when a record is skipped.
1. Exception: under what conditions to skip the record, specifically what exceptions you will ignore. When any error occurs during the reading process, Spring Batch throws an exception. In order to determine what to skip, you need to identify what exceptions to skip.
2. Skipped records: how many input records you will allow the step to skip before considering the step execution failed. If you skip one or two records out of a million, not a big deal; however, skipping half a million out of a million is probably wrong. It's your responsibility to determine the threshold.
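To make the two pieces concrete, here is a minimal plain-Java sketch of how a skippable exception type and a skip limit could interact. It is a model for illustration only, not Spring Batch code: the names SkipLimitSketch, LimitedSkipPolicy, SkipLimitExceeded and readAll are hypothetical, and the "reader" is assumed to be a simple Integer.parseInt over strings.

```java
import java.util.ArrayList;
import java.util.List;

class SkipLimitSketch {

    /** Hypothetical exception raised once more records fail than the limit allows. */
    static class SkipLimitExceeded extends RuntimeException {
        SkipLimitExceeded(int limit, Throwable cause) {
            super("Skip limit of " + limit + " exceeded", cause);
        }
    }

    /** Hypothetical policy: one skippable exception type plus a maximum skip count. */
    static class LimitedSkipPolicy {
        private final Class<? extends Throwable> skippableType;
        private final int skipLimit;

        LimitedSkipPolicy(Class<? extends Throwable> skippableType, int skipLimit) {
            this.skippableType = skippableType;
            this.skipLimit = skipLimit;
        }

        /** Returns true if the error may be skipped; throws once the limit is used up. */
        boolean shouldSkip(Throwable error, int skipCount) {
            if (!skippableType.isInstance(error)) {
                return false; // piece 1: not an exception we agreed to ignore
            }
            if (skipCount >= skipLimit) {
                throw new SkipLimitExceeded(skipLimit, error); // piece 2: threshold reached
            }
            return true;
        }
    }

    /** Parses every record, skipping bad ones for as long as the policy allows it. */
    static List<Integer> readAll(List<String> records, LimitedSkipPolicy policy) {
        List<Integer> parsed = new ArrayList<>();
        int skipCount = 0;
        for (String record : records) {
            try {
                parsed.add(Integer.parseInt(record));
            } catch (RuntimeException e) {
                if (!policy.shouldSkip(e, skipCount)) {
                    throw e; // non-skippable errors fail the whole run
                }
                skipCount++;
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> records = List.of("1", "x", "3", "y", "5");
        LimitedSkipPolicy policy = new LimitedSkipPolicy(NumberFormatException.class, 2);
        System.out.println(readAll(records, policy));
    }
}
```

With a limit of 2 both bad records are skipped and the run completes; with a limit of 1 the second bad record makes the run fail, which is the behaviour described in point 2.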
(Spring Batch Exception Handling Example)
Skip handling happens at the level of individual items, not at the chunk level. Hence, whenever Spring Batch is not able to process a chunk in a single go, it re-processes the chunk and digs down into the individual items to determine the exact item to skip. This is fine because batch jobs are expected to tolerate some latency, as they are usually scheduled jobs over large volumes of data.
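The cost of that re-processing can be estimated with a small simulation. The following is a deliberately simplified plain-Java model, not Spring Batch's actual implementation: it assumes that every skippable failure rolls the chunk back and restarts it from the first item, leaving out items already known to be bad. The names ChunkSkipSimulation, SkipException, Result and processChunk are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class ChunkSkipSimulation {

    /** Hypothetical stand-in for the BillSkipException from the question. */
    static class SkipException extends RuntimeException {
        SkipException(String message) {
            super(message);
        }
    }

    /** Outcome of one simulated chunk. */
    static class Result {
        final List<Integer> written = new ArrayList<>();
        final List<Integer> skipped = new ArrayList<>();
        int processorCalls = 0;
    }

    /**
     * Processes a chunk under the assumed model: on a SkipException the pass is
     * abandoned (rolled back), the failing position is remembered, and the whole
     * chunk is processed again without the positions known to be bad.
     */
    static Result processChunk(List<Integer> chunk, Function<Integer, Integer> processor) {
        Result result = new Result();
        Set<Integer> badPositions = new HashSet<>();
        while (true) {
            List<Integer> outputs = new ArrayList<>();
            boolean rolledBack = false;
            for (int i = 0; i < chunk.size(); i++) {
                if (badPositions.contains(i)) {
                    continue; // already identified as a skip, not processed again
                }
                result.processorCalls++;
                try {
                    outputs.add(processor.apply(chunk.get(i)));
                } catch (SkipException e) {
                    badPositions.add(i);
                    result.skipped.add(chunk.get(i));
                    rolledBack = true;
                    break; // discard this pass and start the chunk over
                }
            }
            if (!rolledBack) {
                result.written.addAll(outputs); // only a clean pass is "committed"
                return result;
            }
        }
    }

    public static void main(String[] args) {
        Result r = processChunk(List.of(1, 2, 3, 4, 5, 6), item -> {
            if (item % 3 == 0) {
                throw new SkipException("bad item " + item);
            }
            return item * 10;
        });
        System.out.println("written=" + r.written
                + " skipped=" + r.skipped
                + " processorCalls=" + r.processorCalls);
        // prints written=[10, 20, 40, 50] skipped=[3, 6] processorCalls=12
    }
}
```

In this model a chunk of 6 items with 2 bad ones needs 12 processor calls instead of 6, because each failure restarts the chunk. The number of calls grows roughly with the number of failures times the chunk size, so under the same assumption a chunk size of 20000 with a high failure rate would re-process a very large number of items. How closely a real step follows this model depends on its configuration.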