One-To-Many relationship gets duplicate objects without using "distinct". Why?

herti picture herti · Sep 12, 2013 · Viewed 45.4k times · Source

I have 2 classes in a one-to-many relationship and a HQL query that is a bit strange. Even if I have read some questions already posted, it does not seem clear to me.

Class Department{
   @OneToMany(fetch=FetchType.EAGER, mappedBy="department")
   Set<Employee> employees;
}
Class Employee{
   @ManyToOne
   @JoinColumn(name="id_department")
   Department department;
}

When I use the following query I get duplicates Department objects:

session.createQuery("select dep from Department as dep left join dep.employees");

Thus, I have to use distinct:

session.createQuery("select distinct dep from Department as dep left join dep.employees");

Is this behaviour an expected one? I consider this unusual, as comparing it with SQL.

Answer

gerrytan picture gerrytan · Sep 12, 2013

This question is thoroughly explained on Hibernate FAQ:

First, you need to understand SQL and how OUTER JOINs work in SQL. If you do not fully understand and comprehend outer joins in SQL, do not continue reading this FAQ item but consult a SQL manual or tutorial. Otherwise you will not understand the following explanation and you will complain about this behavior on the Hibernate forum. Typical examples that might return duplicate references of the same Order object:

List result = session.createCriteria(Order.class)  
                        .setFetchMode("lineItems", FetchMode.JOIN)  
                        .list();

<class name="Order">           
    <set name="lineItems" fetch="join">
    ...
</class>

List result = session.createCriteria(Order.class)  
                        .list();  

List result = session.createQuery("select o from Order o left join fetch o.lineItems").list();  

All of these examples produce the same SQL statement:

SELECT o.*, l.* from ORDER o LEFT OUTER JOIN LINE_ITEMS l ON o.ID = l.ORDER_ID   

Want to know why the duplicates are there? Look at the SQL resultset, Hibernate does not hide these duplicates on the left side of the outer joined result but returns all the duplicates of the driving table. If you have 5 orders in the database, and each order has 3 line items, the resultset will be 15 rows. The Java result list of these queries will have 15 elements, all of type Order. Only 5 Order instances will be created by Hibernate, but duplicates of the SQL resultset are preserved as duplicate references to these 5 instances. If you do not understand this last sentence, you need to read up on Java and the difference between an instance on the Java heap and a reference to such an instance. (Why a left outer join? If you'd have an additional order with no line items, the result set would be 16 rows with NULL filling up the right side, where the line item data is for other order. You want orders even if they don't have line items, right? If not, use an inner join fetch in your HQL).
Hibernate does not filter out these duplicate references by default. Some people (not you) actually want this. How can you filter them out? Like this:

Collection result = new LinkedHashSet( session.create*(...).list() );  

A LinkedHashSet filteres out duplicate references (it's a set) and it preserves insertion order (order of elements in your result). That was too easy, so you can do it in many different and more difficult ways:

List result = session.createCriteria(Order.class)  
                        .setFetchMode("lineItems", FetchMode.JOIN)  
                        .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)  
                        .list();  


<class name="Order">  
    ...  
    <set name="lineItems" fetch="join">  

List result = session.createCriteria(Order.class)  
                        .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)  
                        .list();  

List result = session.createQuery("select o from Order o left join fetch o.lineItems")  
                      .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) // Yes, really!  
                      .list();  

List result = session.createQuery("select distinct o from Order o left join fetch o.lineItems").list();       

The last one is special. It looks like you are using the SQL DISTINCT keyword here. Of course, this is not SQL, this is HQL. This distinct is just a shortcut for the result transformer, in this case. Yes, in other cases an HQL distinct will translate straight into a SQL DISTINCT. Not in this case: you can not filter out duplicates at the SQL level, the very nature of a product/join forbids this - you want the duplicates or you don't get all the data you need. All of this filtering of duplicates happens in-memory, when the resultset is marshalled into objects. It should be also obvious why resultset row-based "limit" operations, such as setFirstResult(5) and setMaxResults(10) do not work with these kind of eager fetch queries. If you limit the resultset to a certain number of rows, you cut off data randomly. One day Hibernate might be smart enough to know that if you call setFirstResult() or setMaxResults() it should not use a join, but a second SQL SELECT. Try it, your version of Hibernate might already be smart enough. If not, write two queries, one for limiting stuff, the other for eager fetching. Do you want to know why the example with the Criteria query did not ignore the fetch="join" setting in the mapping but HQL didn't care? Read the next FAQ item.