Below is a question which supposedly was part of the official exam from Sun:
A Reader entity has a one-to-many, bidirectional relationship with a Book entity. Two Reader entities are persisted, each having two Book entities associated with them. For example, reader 1 has book a and book b, while reader 2 has book c and book d. Which query returns a Collection of fewer than four elements?
A. SELECT b.reader FROM Book b
B. SELECT r FROM Book b INNER JOIN b.reader r
C. SELECT r FROM Reader r INNER JOIN r.books b
D. SELECT r from Book b LEFT JOIN b.reader r LEFT JOIN FETCH r.books
Given answer is C, which I believe is incorrect. From what I understand, SQL with inner join of two tables will be generated by JPA provider. Therefore, in all cases we will get 4 records. I've run a test with one-to-many relation and duplicates were included.
Who is wrong, me or Sun?
Answer from Mike Keith, EJB 3.0 co-specification lead:
There are a couple of statements related to duplicates in the spec.
The JOIN FETCH is a variation of the JOIN, but it does state that similar JOIN semantics apply (except that more data is selected). The spec (section 4.4.5.3 of JPA v2.0) gives an example of duplicate Department rows being returned despite the fact that the Employee objects are not in the select clause.
The more direct reference is in the SELECT section (section 4.8 of JPA v2.0), where it clearly states
"If DISTINCT is not specified, duplicate values are not eliminated."
Many JPA providers do in fact remove the duplicates for a few reasons:
a) Convenience of the users because some users are not knowledgable enough in SQL and are not expecting them b) There is not typically a use case for requiring dups c) They may be added to a result set and if object identity is maintained the dups get eliminated automatically