I read a flat file (for example, a .csv file with one line per user, e.g. UserId;Data1;Date2).
How can I handle duplicate User items in the reader, given that it keeps no list of previously read users?
stepBuilderFactory.get("createUserStep1")
    .<User, User>chunk(1000)
    .reader(flatFileItemReader) // FlatFileItemReader
    .writer(itemWriter) // for example, a JDBC writer
    .build();
Filtering is typically done with an ItemProcessor. If the ItemProcessor returns null, the item is filtered out and not passed to the ItemWriter; otherwise, it is passed on. In your case, you could keep a set of previously seen users in the ItemProcessor. If a user hasn't been seen before, pass it on; if it has, return null. You can read more about filtering with an ItemProcessor in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#filiteringRecords
import java.util.HashSet;
import java.util.Set;

import org.springframework.batch.item.ItemProcessor;

/**
 * This implementation assumes that there is enough room in memory to store every
 * distinct User seen so far. Otherwise, you'd want to store them somewhere you can
 * do a look-up on.
 */
public class UserFilterItemProcessor implements ItemProcessor<User, User> {

    // This assumes that User.equals() and User.hashCode() identify the duplicates
    private final Set<User> seenUsers = new HashSet<User>();

    @Override
    public User process(User user) {
        // Returning null tells Spring Batch to filter the item out
        if (seenUsers.contains(user)) {
            return null;
        }
        seenUsers.add(user);
        return user;
    }
}
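If User does not override equals() and hashCode(), or if keeping whole User objects in memory is too expensive, a variant of the processor above can remember only the user IDs. The following is a minimal sketch under assumptions, not a definitive implementation: the User class and its getUserId() accessor are hypothetical, and the small ItemProcessor interface is a local stand-in for org.springframework.batch.item.ItemProcessor so that the example compiles on its own.

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-in for org.springframework.batch.item.ItemProcessor,
// declared here only so the sketch compiles without Spring Batch.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical minimal User; the real class would hold the parsed CSV fields.
class User {
    private final String userId;
    private final String data;

    User(String userId, String data) {
        this.userId = userId;
        this.data = data;
    }

    String getUserId() {
        return userId;
    }

    String getData() {
        return data;
    }
}

// Filters duplicates by user ID, so only the IDs are kept in memory.
class UserIdFilterItemProcessor implements ItemProcessor<User, User> {

    private final Set<String> seenUserIds = new HashSet<>();

    @Override
    public User process(User user) {
        // Set.add returns false when the ID is already present,
        // in which case the item is filtered by returning null.
        return seenUserIds.add(user.getUserId()) ? user : null;
    }
}
```

Either processor only takes effect once it is registered on the step, by calling .processor(...) between .reader(...) and .writer(...). Because the set lives as long as the processor instance, using a fresh instance per job run avoids carrying seen IDs over from a previous run. A plain HashSet is also not thread-safe, so this sketch assumes a single-threaded step.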