Spring Batch how to filter duplicated items before send it to ItemWriter

Question


spring batch-processing spring-batch

Aure77 · Dec 5, 2014

Answer

Filtering is typically done with an ItemProcessor. If the ItemProcessor returns null, the item is filtered out and not passed to the ItemWriter; otherwise, the returned item is written. In your case, you could keep a set of previously seen users in the ItemProcessor: if the user hasn't been seen before, pass it on; if it has, return null. You can read more about filtering with an ItemProcessor in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#filiteringRecords

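import java.util.HashSet;
import java.util.Set;

import org.springframework.batch.item.ItemProcessor;

/**
 * Filters out Users that have already been seen during this step.
 * This implementation assumes there is enough room in memory to store every
 * User seen so far. Otherwise, you'd want to store them somewhere you can do a look-up on.
 */
public class UserFilterItemProcessor implements ItemProcessor<User, User> {

    // Assumes User.equals() and User.hashCode() identify the duplicates
    // (HashSet relies on both).
    private final Set<User> seenUsers = new HashSet<>();

    @Override
    public User process(User user) {
        // Set.add() returns false if the user was already present;
        // returning null tells Spring Batch to filter the item out.
        if (!seenUsers.add(user)) {
            return null;
        }
        return user;
    }
}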
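Since each CSV line carries a UserId, it may be simpler to treat two lines as duplicates when their ids match, rather than relying on User.equals(). The sketch below assumes that. The `User` record and its fields are hypothetical, and `ItemProcessor` here is a local stand-in with the same `process` shape as Spring Batch's interface, declared only so the example compiles on its own (in a real job you would import `org.springframework.batch.item.ItemProcessor` instead). It also swaps the HashSet for a concurrent set, which only matters if the step runs multi-threaded.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>,
// declared only so this sketch compiles without Spring on the classpath.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical User type mirroring one CSV line: UserId;Data1;Date2
record User(String userId, String data1, String date2) {}

// Filters on the user id alone, so two lines with the same UserId but
// different data columns still count as duplicates.
class UserIdFilterItemProcessor implements ItemProcessor<User, User> {

    // Concurrent set in case the step is run with a multi-threaded executor.
    private final Set<String> seenIds = ConcurrentHashMap.newKeySet();

    @Override
    public User process(User user) {
        // add() returns false when the id is already present: filter the item.
        return seenIds.add(user.userId()) ? user : null;
    }
}

class UserIdFilterDemo {
    public static void main(String[] args) throws Exception {
        ItemProcessor<User, User> processor = new UserIdFilterItemProcessor();
        System.out.println(processor.process(new User("42", "a", "2014-12-05"))); // passed on
        System.out.println(processor.process(new User("42", "b", "2014-12-06"))); // null: filtered
    }
}
```

Keying on the id keeps each entry small, but the set still grows with the number of distinct users, so the memory caveat in the Javadoc above still applies.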

Question details

I read a flat file (for example, a .csv file with one line per User, e.g. UserId;Data1;Date2).

But how do I handle duplicated User items in the reader, when it keeps no list of previously read users?

stepBuilderFactory.get("createUserStep1")
        .<User, User>chunk(1000)
        .reader(flatFileItemReader) // FlatFileItemReader
        .writer(itemWriter)         // for example, a JDBC writer
        .build();
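The reader does not need a list of previously read users, because a filtering processor's set lives for the whole step, not just for one chunk. The sketch below is a deliberately simplified model of the chunk loop (it is not Spring Batch's implementation and ignores transactions, retries and skips): items are read up to the chunk size, each one passes through the duplicate check, and only non-null results are grouped for the writer. The class and method names and the plain string user ids are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Simplified model of chunk-oriented processing; NOT Spring Batch's actual code.
class ChunkLoopSketch {

    // Reads items, runs each through the duplicate filter, and collects the
    // survivors per chunk. Returns the chunks as they would reach the writer.
    static List<List<String>> run(Iterator<String> reader, int chunkSize) {
        Set<String> seen = new HashSet<>();           // processor state for the whole step
        List<List<String>> writes = new ArrayList<>();
        while (reader.hasNext()) {
            List<String> chunk = new ArrayList<>();
            // Read up to chunkSize items; filtered items still count as read.
            for (int i = 0; i < chunkSize && reader.hasNext(); i++) {
                String item = reader.next();
                String processed = seen.add(item) ? item : null; // the "ItemProcessor"
                if (processed != null) {
                    chunk.add(processed);             // null results never reach the writer
                }
            }
            // This sketch simply skips chunks where every item was filtered.
            if (!chunk.isEmpty()) {
                writes.add(chunk);                    // stands in for one write per chunk
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        List<String> userIds = List.of("u1", "u2", "u1", "u3", "u2", "u4");
        System.out.println(run(userIds.iterator(), 2));
    }
}
```

With a chunk size of 2, the repeated `u1` and `u2` fall into later chunks and are still dropped, so this prints `[[u1, u2], [u3], [u4]]`.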
