I am running a CDH 5.3.1 cluster with three ZooKeeper instances on three IPs:
133.0.127.40 n1
133.0.127.42 n2
133.0.127.44 n3
Everything worked fine at first, but these days I have noticed that node n2 keeps logging this WARN:
caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:722)
It happens every second, and only on n2; n1 and n3 are fine. I can still use the HBase shell to scan my table and the Solr web UI to run queries, but I cannot start any Flume agents; they all stop at this step:
Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
jetty-6.1.26.cloudera.4
Started [email protected]:41414.
A few minutes later I get a warning from Cloudera Manager that the Flume agent is exceeding its file descriptor threshold.
Does anyone know what is going wrong? Thanks in advance.
I recall seeing similar errors in ZK (admittedly not with Flume). I believe the problem at the time was to do with the large amount of data stored on the node and/or transferred to the client. Things to consider tweaking in zoo.cfg:
- autopurge.snapRetainCount, e.g. set it to 10
- autopurge.purgeInterval to, say, 2 (hours)
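A minimal zoo.cfg sketch with those two settings, using the example values above (they are illustrations, not tuned recommendations; adjust retention to your own needs):

# zoo.cfg on each ZooKeeper server
# keep only the 10 most recent snapshots and matching transaction logs
autopurge.snapRetainCount=10
# run the purge task every 2 hours (the default of 0 disables autopurge)
autopurge.purgeInterval=2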
If the ZK client (Flume?) is streaming large znodes to/from the ZK cluster, you may want to set the Java system property jute.maxbuffer on the client JVM(s), and possibly on the server nodes, to a large enough value. I believe the default value for this property is 1M. Determining the appropriate value for your workload is an exercise in trial and error, I'm afraid!
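As a rough sketch of how that property is usually passed (it is a JVM system property, not a zoo.cfg entry) — the 4 MB value and the env-file locations below are only illustrative, and on CDH you would normally set the equivalent Java options through Cloudera Manager:

# ZooKeeper servers: e.g. in conf/java.env, picked up by zkServer.sh
export JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=4194304"

# Flume agent (the ZK client): e.g. in flume-env.sh
export JAVA_OPTS="$JAVA_OPTS -Djute.maxbuffer=4194304"

Keep the client and server values consistent, otherwise one side may still reject packets the other considers acceptable.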