Query results as a Stream with Hibernate 5.2

kmandalas · May 12, 2017

Since Hibernate 5.2, we can use the stream() method instead of scroll() when we want to fetch large amounts of data.

However, when using scroll() with ScrollableResults, we are able to hook into the retrieval process and free up memory, either by evicting each object from the persistence context after processing it, by clearing the entire session every now and then, or both.

My questions:

  1. Now, if we use the stream() method, what happens behind the scenes?
  2. Is it possible to evict objects from the persistence context?
  3. Is the session cleared periodically?
  4. How is optimal memory consumption achieved?
  5. Is it possible to use e.g. StatelessSession?
  6. Also, if we have set hibernate.jdbc.fetch_size to some number (e.g. 1000) in the JPA properties, how does this interact with scrollable results?

Answer

wild_nothing · May 19, 2017

The following works for me:

DataSourceConfig.java

@Bean
public LocalSessionFactoryBean sessionFactory() {
    // Link your data source to your session factory
    ...
}

@Bean("hibernateTxManager")
public HibernateTransactionManager hibernateTxManager(@Qualifier("sessionFactory") SessionFactory sessionFactory) {
    // Link your session factory to your transaction manager
    ...
}

MyServiceImpl.java

@Service
@Transactional(propagation = Propagation.REQUIRES_NEW, transactionManager = "hibernateTxManager", readOnly = true)
public class MyServiceImpl implements MyService {

    @Autowired
    private MyRepo myRepo;
    ...
    Stream<MyEntity> stream = myRepo.getStream();
    // Do your streaming and CLOSE the stream afterwards
    ...

MyRepoImpl.java

@Repository
@Transactional(propagation = Propagation.MANDATORY, transactionManager = "hibernateTxManager", readOnly = true)
public class MyRepoImpl implements MyRepo {

    @Autowired
    private SessionFactory sessionFactory;

    @Autowired
    private MyDataSource myDataSource;

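    public Stream<MyEntity> getStream() {

        // A StatelessSession has no persistence context, so there is nothing to evict or clear.
        // It is opened on the connection Spring associates with the current transaction.
        return sessionFactory.openStatelessSession(DataSourceUtils.getConnection(myDataSource))
            .createNativeQuery("my_query", MyEntity.class)
            .setReadOnly(true)
            .setFetchSize(1000) // hint to the JDBC driver: rows to fetch per round trip
            .stream();
    }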
    ...

Just remember that when you stream, memory only becomes a concern at the point where objects are materialised. That is the one part of the operation where consumption can get out of hand. In my case I process the stream in chunks of 1000 objects, serialise each chunk with Gson and send it to a JMS broker immediately. The garbage collector does the rest.
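To make the chunking idea concrete, here is a minimal sketch of one way to do it in plain Java. It is not my production code: StreamChunker and forEachChunk are hypothetical names, the stream of integers stands in for myRepo.getStream(), and the println stands in for the serialise-and-send step. It also closes the stream when done, which is the step the service comment above insists on.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Hibernate or Spring.
final class StreamChunker {

    private StreamChunker() {}

    /** Hands the stream to the handler in lists of at most chunkSize elements, then closes it. */
    static <T> void forEachChunk(Stream<T> source, int chunkSize, Consumer<List<T>> handler) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // try-with-resources closes the stream even if the handler throws
        try (Stream<T> stream = source) {
            Iterator<T> it = stream.iterator();
            List<T> chunk = new ArrayList<>(chunkSize);
            while (it.hasNext()) {
                chunk.add(it.next());
                if (chunk.size() == chunkSize) {
                    handler.accept(chunk);
                    // Start a fresh list so the previous chunk becomes garbage
                    // as soon as the handler no longer references it
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) {
                handler.accept(chunk); // trailing partial chunk
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for myRepo.getStream(): 2,500 integers instead of entities
        forEachChunk(IntStream.rangeClosed(1, 2500).boxed(), 1000,
                chunk -> System.out.println("sending " + chunk.size() + " items"));
    }
}
```

At most one chunk is held at a time, so peak memory depends on the chunk size rather than on the size of the result set, provided the handler does not keep references to the chunks it receives.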

It's worth noting that Spring's transaction boundary handling closes the connection to the database at the end without needing to be told explicitly.