I have 1000 dedicated Java threads, each of which polls a corresponding URL once per second.
public class Poller {
    public static Node poll(Node node) {
        GetMethod method = null;
        try {
            HttpClient client = new HttpClient(new SimpleHttpConnectionManager(true));
            ......
        } catch (IOException ex) {
            ex.printStackTrace();
        } finally {
            // method is still null if an exception was thrown before it was assigned
            if (method != null) {
                method.releaseConnection();
            }
        }
    }
}
The threads are started once per second:
for (int i = 0; i < 1000; i++) {
    MyThread thread = threads.get(i); // threads is a static field
    if (thread.isAlive()) {
        // If the previous thread is still running, let it run.
    } else {
        thread.start();
    }
}
The problem is that if I run the job every second, I get intermittent exceptions like these:
java.net.BindException: Address already in use
INFO httpclient.HttpMethodDirector: I/O exception (java.net.BindException) caught when processing request: Address already in use
INFO httpclient.HttpMethodDirector: Retrying request
But if I run the job every 2 seconds or more, everything runs fine.
I even tried shutting down the instance of SimpleHttpConnectionManager using shutdown(), with no effect.
If I run netstat, I see thousands of TCP connections in the TIME_WAIT state, which means they have been closed and are being cleaned up.
So to limit the number of connections, I tried using a single instance of HttpClient, like this:
public class MyHttpClientFactory {
    private static MyHttpClientFactory instance = new MyHttpClientFactory();
    private MultiThreadedHttpConnectionManager connectionManager;
    private HttpClient client;

    private MyHttpClientFactory() {
        init();
    }

    public static MyHttpClientFactory getInstance() {
        return instance;
    }

    public void init() {
        connectionManager = new MultiThreadedHttpConnectionManager();
        HttpConnectionManagerParams managerParams = new HttpConnectionManagerParams();
        managerParams.setMaxTotalConnections(1000);
        connectionManager.setParams(managerParams);
        client = new HttpClient(connectionManager);
    }

    public HttpClient getHttpClient() {
        if (client != null) {
            return client;
        } else {
            init();
            return client;
        }
    }
}
However, after running for exactly 2 hours, it starts throwing 'Too many open files' errors and eventually cannot do anything at all.
ERROR java.net.SocketException: Too many open files
INFO httpclient.HttpMethodDirector: I/O exception (java.net.SocketException) caught when processing request: Too many open files
INFO httpclient.HttpMethodDirector: Retrying request
I should be able to increase the number of connections allowed and make it work, but I would just be prolonging the evil. Any idea what the best practice is for using HttpClient in a situation like the one above?
Btw, I am still on HttpClient 3.1.
This happened to us a few months back. First, double check to make sure you really are calling releaseConnection() every time. But even then, the OS doesn't reclaim closed TCP connections immediately; they linger in TIME_WAIT for a while. The solution is to use the Apache HTTP Client's MultiThreadedHttpConnectionManager, which pools and reuses connections instead of opening a new one for every request.
See http://hc.apache.org/httpclient-3.x/performance.html for more performance tips.
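As a rough, back-of-the-envelope sketch of why one new connection per poll adds up: the number of sockets sitting in TIME_WAIT at any moment is roughly the rate of new connections times how long each one lingers. The 1000 pollers come from the question; the 60-second TIME_WAIT duration is an assumption (a common value on Linux, but it varies by OS), and the class and method names are made up for illustration.

```java
class TimeWaitEstimate {
    /**
     * Steady-state estimate of sockets lingering in TIME_WAIT:
     * (new connections per second) x (seconds each one stays in TIME_WAIT).
     */
    static long socketsInTimeWait(int pollers, double pollIntervalSeconds, double timeWaitSeconds) {
        double newConnectionsPerSecond = pollers / pollIntervalSeconds;
        return Math.round(newConnectionsPerSecond * timeWaitSeconds);
    }

    public static void main(String[] args) {
        int pollers = 1000;            // from the question
        double timeWaitSeconds = 60.0; // ASSUMPTION: typical Linux value, varies by OS

        System.out.println("1 s interval: " + socketsInTimeWait(pollers, 1.0, timeWaitSeconds));
        System.out.println("2 s interval: " + socketsInTimeWait(pollers, 2.0, timeWaitSeconds));
    }
}
```

With these assumed numbers that is about 60,000 lingering sockets at a one-second interval and 30,000 at two seconds, which is in the same range as the number of ephemeral ports an OS typically hands out. With pooled, reused connections the rate of new connections should drop sharply (provided the server keeps connections alive), so the pile never builds up.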
Update: Whoops, I didn't read the lower code sample. If you're already calling releaseConnection() and using MultiThreadedHttpConnectionManager, check whether your OS limit on open files per process is set high enough (on Linux, for example, see ulimit -n). We had that problem too, and needed to raise the limit a bit.
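To see how close the process actually is to that limit, something like the following minimal sketch may help. It assumes a Unix-like OS and a HotSpot/OpenJDK-style JVM that exposes com.sun.management.UnixOperatingSystemMXBean; the class and method names (FdUsage, openAndMax) are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

class FdUsage {
    /**
     * Returns {open, max} file descriptor counts for this JVM process,
     * or null if the platform does not expose them (e.g. on Windows).
     */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fds = openAndMax();
        if (fds == null) {
            System.out.println("File descriptor counts are not available on this platform");
        } else {
            System.out.println("Open file descriptors: " + fds[0] + " of " + fds[1]);
        }
    }
}
```

Logging this once a minute from the polling process would show whether the count levels off or keeps climbing. If it climbs steadily until the failure, something is probably leaking sockets, and raising the limit would only postpone the error.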