I am implementing an Entity-Attribute-Value based persistence mechanism. All DB access is done via Hibernate. I have a table that contains paths for nodes. It is extremely simple: just an id and a path (string). The paths would be small in number, around a few thousand.
The main table has millions of rows, and rather than repeating the paths, I've normalized the paths into their own table. The following is the behaviour I want when inserting into the main table:
1) Check if the path exists in paths table (query via entity manager, using path value as parameter)
2) if it does not exist, insert, and get id (persist via entity manager)
3) put id as foreign key value to main table row, and insert this into main table.
This is going to happen thousands of times for a set of domain objects, which correspond to lots of rows in main table and some other tables. So the steps above are repeated using a single transaction like this:
EntityTransaction t = entityManager.getTransaction();
t.begin();
//perform steps given above, check, and then persist etc..
t.commit();
When I perform step 2, it introduces a huge performance drop in the total operation. It is begging for caching, because after a while that table will have at most 10-20k entries, with very rare new inserts. I've tried to do this with Hibernate and lost almost 2 days.
I'm using Hibernate 4.1 with JPA annotations and Ehcache. I've tried enabling query caching, even reusing the same query object throughout the inserts, as shown below:
TypedQuery<NodePath> call = entityManager.createQuery(
        "select pt from NodePath pt where pt.path = :pathStr", NodePath.class);
// Ask Hibernate to keep this query's result in the query cache
call.setHint("org.hibernate.cacheable", true);
call.setParameter("pathStr", pPath);
List<NodePath> paths = call.getResultList();
if (paths.size() > 1) {
    throw new Exception("path table should have unique paths");
} else if (paths.size() == 1) {
    return paths.get(0).getId();
} else { // no such path yet, so insert it
    NodePath newPath = new NodePath();
    newPath.setPath(pPath);
    entityManager.persist(newPath);
    return newPath.getId();
}
The NodePath entity is annotated as follows:
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
@Table(name = "node_path", schema = "public")
public class NodePath implements java.io.Serializable {
The query cache is being used, as far as I can see from the statistics, but no second-level cache activity is reported:
queries executed to database=1
query cache puts=1
query cache hits=689
query cache misses=1
....
second level cache puts=0
second level cache hits=0
second level cache misses=0
entities loaded=1
....
A simple, hand-written hashtable used as a cache works as expected, cutting down total time drastically. I guess I'm failing to trigger Hibernate's caching due to the nature of my operations.
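For reference, a hand-written cache of this kind could look roughly like the sketch below. It is only an illustration, not the exact code I used: the class name PathIdCache and the loader function are made up for this example, the loader stands in for the "query, then persist if missing" steps, and it assumes Java 8+ for computeIfAbsent.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-memory cache mapping a path string to the id of its row. */
final class PathIdCache {
    private final ConcurrentHashMap<String, UUID> idsByPath = new ConcurrentHashMap<>();
    // Assumed to wrap the database work: look the path up and insert it if missing.
    private final Function<String, UUID> loader;

    PathIdCache(Function<String, UUID> loader) {
        this.loader = loader;
    }

    /** Returns the cached id for the path, calling the loader only on a miss. */
    UUID idFor(String path) {
        return idsByPath.computeIfAbsent(path, loader);
    }

    /** Number of distinct paths currently cached. */
    int size() {
        return idsByPath.size();
    }
}
```

With this, the insert loop calls idFor(pPath) and only touches the database the first time a given path is seen. One caveat: the map knows nothing about transactions, so if the transaction that inserted a new path is rolled back, its id would have to be removed from the map by hand.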
How do I use Hibernate's second-level cache with this setup? For the record, this is my persistence.xml:
http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd" version="2.0">
<provider>org.hibernate.ejb.HibernatePersistence</provider>
<class>...</class>
<exclude-unlisted-classes>true</exclude-unlisted-classes>
<shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>
<properties>
<property name="hibernate.connection.driver_class" value="org.postgresql.Driver" />
<property name="hibernate.connection.password" value="zyx" />
<property name="hibernate.connection.url" value="jdbc:postgresql://192.168.0.194:5432/testdbforml" />
<property name="hibernate.connection.username" value="postgres"/>
<property name="hibernate.dialect" value="org.hibernate.dialect.PostgreSQLDialect"/>
<property name="hibernate.search.autoregister_listeners" value="false"/>
<property name="hibernate.jdbc.batch_size" value="200"/>
<property name="hibernate.connection.autocommit" value="false"/>
<property name="hibernate.generate_statistics" value="true"/>
<property name="hibernate.cache.use_structured_entries" value="true"/>
<property name="hibernate.cache.use_second_level_cache" value="true"/>
<property name="hibernate.cache.use_query_cache" value="true"/>
<property name="hibernate.cache.region.factory_class" value="org.hibernate.cache.ehcache.SingletonEhCacheRegionFactory"/>
</properties>
OK, I found it. My problem was that the cached query was keeping only the ids of the query results in the cache, and it was (probably) going back to the DB to get the actual values rather than getting them from the second-level cache.
The problem is, of course, that the query did not put those values into the second-level cache, since they were not selected by primary id. So the solution is to use a lookup that does put values into the second-level cache, and with Hibernate 4.1 I've managed to do this with a natural id. (For byNaturalId to work, the path property has to be mapped as the entity's natural id, e.g. with @NaturalId.) Here is the function that either inserts the value or returns it from the cache, just in case it helps anybody else:
private UUID persistPath(String pPath) throws Exception {
    // Get the native Hibernate Session behind the JPA EntityManager
    Session session = entityManager.unwrap(Session.class);
    // Look the entity up by its natural id ("path") instead of running a query
    NodePath np = (NodePath) session.byNaturalId(NodePath.class)
            .using("path", pPath)
            .load();
    if (np != null) {
        return np.getId();
    }
    // No such path entry, so let's create one
    NodePath newPath = new NodePath();
    newPath.setPath(pPath);
    entityManager.persist(newPath);
    return newPath.getId();
}