I am running queries against an Oracle 10g with JDBC (using the latest drivers and UCP as DataSource) in order to retrieve CLOBs (avg. 20k characters). However the performance seems to be pretty bad: the batch retrieval of 100 LOBs takes 4s in average. The operation is also neither I/O nor CPU nor network bound judging from my observations.
My test setup looks like this:
PoolDataSource dataSource = PoolDataSourceFactory.getPoolDataSource();
dataSource.setConnectionFactoryClassName("...");
dataSource.setConnectionPoolName("...");
dataSource.setURL("...");
dataSource.setUser("...");
dataSource.setPassword("...");
dataSource.setConnectionProperty("defaultRowPrefetch", "1000");
dataSource.setConnectionProperty("defaultLobPrefetchSize", "500000");
final LobHandler handler = new OracleLobHandler();
JdbcTemplate j = new JdbcTemplate(dataSource);
j.query("SELECT bigClob FROM ...",
new RowCallbackHandler() {
public void processRow(final ResultSet rs) throws SQLException {
String result = handler.getClobAsString(rs, "bigClob");
}
});
}
I experimented with the fetch sizes but to no avail. Am I doing something wrong? Is there a way to speed up CLOB retrieval when using JDBC?
The total size of the result set is in the ten thousands - measured over the span of the whole retrieval the initial costs
Is there an Order By in the query? 10K rows is quite a lot if it has to be sorted.
Also, retrieving the PK is not a fair test versus retrieving the entire CLOB. Oracle stores the table rows with probably many in a block, but each of the CLOBs (if they are > 4K) will be stored out of line, each in a series of blocks. Scanning the list of PK's is therefore going to be fast. Also, there is probably an index on the PK, so Oracle can just quickly scan the index blocks and not even access the table.
4 seconds does seem a little high, but it is 2MB that needs to be possible read from disk and transported over the network to your Java program. Network could be an issue. If you perform an SQL trace of the session it will point you at exactly where the time is being spent (disk reads or network).