Having trouble getting DataStax OpsCenter 4.0.3 to connect to Cassandra 2.0.4 cluster

Blueplastic · Jan 30, 2014

I have a 1-node C* 2.0.4 cluster running and nodetool status shows a healthy cluster.

I then installed OpsCenter 4.0.3 on a separate machine on the same network using 'sudo yum install opscenter-free'.

In the opscenterd.conf file, I set interface to the public IP of the OpsCenter server and started the OpsCenter server.
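For reference, interface sits in the [webserver] section of opscenterd.conf, next to port (8888 by default); a value of 0.0.0.0 would make the web UI listen on every interface. A small sketch that prints just that section, assuming the package install puts the file at /etc/opscenter/opscenterd.conf (pass another path as the first argument otherwise); show_webserver_section is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the [webserver] section of opscenterd.conf.
# ASSUMPTION: /etc/opscenter/opscenterd.conf is the package-install location.
show_webserver_section() {
  local conf="${1:-/etc/opscenter/opscenterd.conf}"
  # Every "[section]" header resets the flag; lines are printed only while
  # the current section is [webserver].
  awk '/^\[/ { in_ws = ($0 == "[webserver]") } in_ws' "$conf"
}
```

Usage, after sourcing the snippet: show_webserver_section. The output should include the interface line that was edited.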

I was then able to see the OpsCenter webpage and clicked on Use Existing Cluster.

Under the Add Cluster interface, I typed in the rpc_address for the 1-node Cassandra cluster. OpsCenter accepted it and showed the cluster name correctly on the next page.

However, none of the graphs in OpsCenter load, and I see the error "0 of 0 agents connected". I also see a blinking red X over a plug icon at the top.

The firewall is currently turned off in CentOS on both the OpsCenter and C* nodes.

How do I get OpsCenter to properly connect to the C* node?

Here's what the OpsCenter log shows (note: I replaced the IP with A.B.C.D):

2014-01-30 06:43:37+0000 [Dog]  WARN: Unable to collect datacenter, rack information: Failed query to http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D : Connection was refused by other side: 111: Connection refused.
2014-01-30 06:45:37+0000 [Dog]  WARN: HTTP request http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D failed: Connection was refused by other side: 111: Connection refused.
2014-01-30 06:45:37+0000 [Dog]  WARN: Unable to collect datacenter, rack information: Failed query to http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D : Connection was refused by other side: 111: Connection refused.

On the Cassandra node, everything looks healthy:

[root@cassandra01 ~]# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  A.B.C.D  158.27 KB  256     100.0%  bc560cd6-a20d-4b36-99ca-ed477dc939b5  rack1

However, I can't curl the URL that OpsCenter is trying to reach:

[root@cassandra01 ~]# curl http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D
curl: (7) couldn't connect to host
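The curl failure matches the "Connection refused" in the OpsCenter log: nothing accepted a TCP connection on port 61621, which is the datastax-agent's HTTP API port (api_port in the agent config further down). That is consistent with no agent running on the node yet. A quick way to check the other ports involved, without nc or telnet, is bash's /dev/tcp redirection. The port list below is taken from the agent's default config in this question; port_state and check_ports are hypothetical helper names, and coreutils timeout is assumed to be installed:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report whether host:port accepts a TCP connection.
# Uses bash's /dev/tcp redirection plus coreutils `timeout` (2-second limit).
port_state() {
  local host="$1" port="$2"
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo open
  else
    echo closed   # refused, timed out, or host unresolvable
  fi
}

# Ports from the agent config shown later in this question:
#   61620 stomp_port  (opscenterd listens; agents connect to it)
#   61621 api_port    (datastax-agent listens; opscenterd connects to it)
#    9160 thrift_port (Cassandra Thrift RPC)
#    7199 jmx_port    (Cassandra JMX; the agent connects to it locally)
check_ports() {
  local host="$1" port
  shift
  for port in "$@"; do
    printf '%s:%s %s\n' "$host" "$port" "$(port_state "$host" "$port")"
  done
}
```

For example, check_ports A.B.C.D 61621 9160 7199 against the Cassandra node, and check_ports <OpsCenter IP> 61620 run from the Cassandra node, since that is the direction in which the agent connects.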

SSL is turned off (the default setting) in the opscenterd.conf file.

Here is what I see at http://<public IP of OpsCenter>:8888/Dog/nodes:

[{"load": null, "has_jna": false, "vnodes": true, "devices": {"saved_caches": null, "commitlog": null, "other": null, "data": null}, "task_progress": {}, "node_ip": "A.B.C.D", "network_interfaces": null, "ec2": {}, "node_version": {}, "dc": null, "node_name": null, "num_procs": null, "streaming": {}, "token": "5743408169174478324", "data_held": null, "mode": "unknown", "rpc_ip": "10.183.132.141", "partitions": {"saved_caches": null, "commitlog": null, "other": null, "data": null}, "os": null, "rack": null, "last_seen": 0}]
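In that response last_seen is 0, mode is "unknown", and dc, rack and load are null, which suggests OpsCenter has never received data from an agent for this node; rpc_ip is also the private 10.183.132.141 rather than node_ip. To pull just those fields out of the nodes response (Dog is the cluster name in the URL), a sketch that pipes curl into a short Python filter; summarize_nodes and the PYTHON variable are hypothetical, and a Python interpreter (python3 by default) is assumed to be on the PATH:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the JSON from http://<OpsCenter>:8888/<cluster>/nodes
# on stdin and print one summary line per node.
# ASSUMPTION: a Python interpreter is available ($PYTHON, default python3).
summarize_nodes() {
  "${PYTHON:-python3}" -c '
import json, sys
for node in json.load(sys.stdin):
    print("%s rpc_ip=%s mode=%s last_seen=%s" % (
        node.get("node_ip"), node.get("rpc_ip"),
        node.get("mode"), node.get("last_seen")))
'
}
```

Usage: curl -s "http://<public IP of OpsCenter>:8888/Dog/nodes" | summarize_nodes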

Any ideas of how to fix this?

Note: in Cassandra's YAML file (cassandra.yaml), rpc_server_type is set to sync.
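Besides rpc_server_type, the cassandra.yaml keys that decide which addresses other processes (the agent included) have to use are listen_address, broadcast_address, rpc_address, rpc_port and start_rpc. A hypothetical helper that prints whichever of them are actually set (uncommented), assuming the RPM location /etc/cassandra/conf/cassandra.yaml:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the addressing-related keys that are actually set.
# Commented-out lines start with "#" and therefore do not match "^key:".
# ASSUMPTION: /etc/cassandra/conf/cassandra.yaml is the package-install path.
show_cassandra_addresses() {
  local yaml="${1:-/etc/cassandra/conf/cassandra.yaml}"
  grep -E '^(listen_address|broadcast_address|rpc_address|rpc_port|start_rpc|rpc_server_type):' "$yaml"
}
```

The values it prints are what the agent's local_interface and agent_rpc_interface settings need to line up with.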


Update:

I also tried to manually install the OpsCenter agent on the C* node with 'yum install datastax-agent' and then edited the address.yaml file with the following settings:

stomp_interface: 'public ip of machine opscenterd is running on (public IP)'
local_interface: 'listen_address in cassandra.yaml (public IP)'
agent_rpc_interface: 'rpc_address in cassandra.yaml (private IP network)'
agent_rpc_broadcast_address: 'private network IP, on the same network as rpc_address'

I tried a few different settings for the address.yaml file, and none of them worked. For example, I tried setting only stomp_interface and deleting the other three lines; that didn't work. I also tried setting just the stomp and local interfaces, and that didn't work either.
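When cycling through variants like these, it can help to rewrite address.yaml from a script rather than by hand. The sketch below writes only stomp_interface (the address of the machine running opscenterd; the agent log further down shows the agent opening its stomp connection to that address on port 61620) and, optionally, local_interface, keeping a .bak copy of the previous file. The path /var/lib/datastax-agent/conf/address.yaml comes from the agent log; write_address_yaml is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: (re)write the agent's address.yaml.
#   $1  path to address.yaml
#   $2  stomp_interface: IP of the machine running opscenterd
#   $3  optional local_interface
write_address_yaml() {
  local conf="$1" stomp_ip="$2" local_ip="${3:-}"
  # Keep the previous version so a failed experiment is easy to undo.
  if [[ -f "$conf" ]]; then
    cp "$conf" "$conf.bak"
  fi
  {
    printf 'stomp_interface: "%s"\n' "$stomp_ip"
    if [[ -n "$local_ip" ]]; then
      printf 'local_interface: "%s"\n' "$local_ip"
    fi
  } > "$conf"
}
```

For example: write_address_yaml /var/lib/datastax-agent/conf/address.yaml <OpsCenter public IP>, then sudo service datastax-agent restart.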

Now, when I start the DataStax agent with 'service datastax-agent start', the Cassandra service suddenly crashes:

[root@cassandra01 ~]# sudo service cassandra status
cassandra dead but pid file exists

When the C* service crashes, the OpsCenter agent stays up and running. If I stop the agent service and start the C* service again (sudo service cassandra start), C* starts back up successfully and nodetool status shows a healthy 1-node cluster. But as soon as I start the agent service, the C* service crashes again. Every address.yaml variant I tried causes this same behavior.
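"dead but pid file exists" means the init script found Cassandra's pid file but no running process with that pid, so the JVM exited without cleaning up after itself. The reason is normally recorded in the rolling log that log4j-server.properties (shown further down) points at, /var/log/cassandra/system.log. A sketch of two hypothetical helpers to confirm both; the pid-file path is an assumption for the RPM install, and the ERROR/WARN match relies on the %5p level column at the start of that log4j pattern:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for "cassandra dead but pid file exists".

# Succeeds if the pid file exists but no process with that pid is running.
# ASSUMPTION: default pid-file path for the RPM install; pass the real one as $1.
pid_is_stale() {
  local pidfile="${1:-/var/run/cassandra/cassandra.pid}" pid
  [[ -r "$pidfile" ]] || return 1
  pid=$(<"$pidfile")
  [[ -n "$pid" && ! -d "/proc/$pid" ]]
}

# Print the last N (default 20) ERROR/WARN lines from Cassandra's log.
# The level is the first field because the log4j pattern starts with "%5p".
last_problems() {
  local log="${1:-/var/log/cassandra/system.log}" n="${2:-20}"
  grep -E '^ *(ERROR|WARN) ' "$log" | tail -n "$n"
}
```

If system.log simply stops without an ERROR line, one possibility worth ruling out (an assumption; nothing in the question shows it) is the kernel OOM killer, which dmesg | grep -i 'killed process' would reveal.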

Ideally, I'd rather not install the agent manually; I would like to push its installer from the OpsCenter GUI onto the C* node. Since that didn't work, I tried installing the agent manually and connecting it to OpsCenter, but unfortunately that isn't working either.

And sometimes I see this as well on the Cassandra node when the Cassandra service crashes:

[root@cassandra01 ~]# sudo service cassandra stop
log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Usage: cassandra start|stop|status|restart|reload

Here is what the Cassandra node's log4j-server.properties has in it:

log4j.rootLogger=INFO,stdout,R

# stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n

# rolling log file
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.maxFileSize=20MB
log4j.appender.R.maxBackupIndex=50
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
# Edit the next line to point to your logs directory
log4j.appender.R.File=/var/log/cassandra/system.log

# Application logging options
#log4j.logger.org.apache.cassandra=DEBUG
#log4j.logger.org.apache.cassandra.db=DEBUG
#log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG

# Adding this to avoid thrift logging disconnect errors.
log4j.logger.org.apache.thrift.server.TNonblockingServer=ERROR

And finally, here is what the agent.log from the OpsCenter agent running on the Cassandra node shows:

nohup: ignoring input
Starting DataStax agent monitor datastax_agent_monitor
 INFO [main] 2014-01-30 08:24:59,104 Loading conf files: /var/lib/datastax-agent/conf/address.yaml
 INFO [main] 2014-01-30 08:24:59,261 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_25
 INFO [main] 2014-01-30 08:24:59,546 Default config values: {:rollups300_ttl 2419200, :settings_cf "settings", :agent_rpc_interface "10.183.132.141", :my_channel_prefix "/agent", :poll_period 60, :kerberos_hostname nil, :storage_dc nil, :thrift_conn_timeout 10000, :thrift_max_frame_size 15728640, :rollups60_ttl 604800, :stomp_port 61620, :shorttime_interval 10, :longtime_interval 300, :private-conf-props ["initial_token" "listen_address" "broadcast_address" "rpc_address"], :thrift_port 9160, :async_retry_timeout 5, :agent-conf-group "global-cluster-agent-group", :jmx_host "127.0.0.1", :ec2_metadata_api_host "169.254.169.254", :metrics_enabled 1, :async_queue_size 5000, :autodiscovery_interval 120, :rollups7200_ttl 31536000, :autodiscovery_enabled true, :thrift_ssl_truststore nil, :rollup_snapshot_period 300, :is_package true, :monitor_command "/usr/share/datastax-agent/bin/datastax_agent_monitor", :thrift_socket_timeout 5000, :cassandra_log_location "/var/log/cassandra/system.log", :local_interface "23.253.64.169", :jmx_port 7199, :jmx_metrics_threadpool_size 4, :use_ssl 0, :rollups86400_ttl -1, :nodedetails_threadpool_size 3, :api_port 61621, :kerberos_service nil, :kerberos_client_principal nil, :jmx_thread_pool_size 5, :production 1, :stomp_interface "166.78.186.184", :storage_keyspace "OpsCenter", :rollup_snapshot_threshold 300, :thrift_ssl_truststore_type "JKS", :realtime_interval 5}
 INFO [main] 2014-01-30 08:24:59,554 Waiting for the config from OpsCenter
 INFO [main] 2014-01-30 08:24:59,559 Using 23.253.64.169 as the cassandra broadcast address
 INFO [main] 2014-01-30 08:24:59,568 New JMX connection (127.0.0.1:7199)
ERROR [main] 2014-01-30 08:25:00,019 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
        java.net.ConnectException: Connection refused]
 INFO [main] 2014-01-30 08:25:00,414 cassandra RPC address is  nil
 INFO [main] 2014-01-30 08:25:00,418 agent RPC broadcast address is  10.183.132.141
 INFO [main] 2014-01-30 08:25:00,474 Clearing ssl.truststore
 INFO [main] 2014-01-30 08:25:00,475 Clearing ssl.truststore.password
 INFO [main] 2014-01-30 08:25:00,476 Setting ssl.store.type to JKS
 INFO [main] 2014-01-30 08:25:00,477 Clearing kerberos.service.principal.name
 INFO [main] 2014-01-30 08:25:00,480 Clearing kerberos.principal
 INFO [main] 2014-01-30 08:25:00,480 Clearing kerberos.useTicketCache
 INFO [main] 2014-01-30 08:25:00,481 Clearing kerberos.ticketCache
 INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.useKeyTab
 INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.keyTab
 INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.renewTGT
 INFO [main] 2014-01-30 08:25:00,488 Clearing kerberos.debug
 INFO [main] 2014-01-30 08:25:00,495 Starting Stomp
 INFO [main] 2014-01-30 08:25:00,495 SSL communication is disabled
 INFO [main] 2014-01-30 08:25:00,495 Creating stomp connection to 166.78.186.184:61620
 INFO [thrift-init] 2014-01-30 08:25:00,521 Connecting to Cassandra cluster: 23.253.64.169 (port 9160)
 INFO [StompConnection receiver] 2014-01-30 08:25:00,536 Reconnecting in 0s.
 INFO [StompConnection receiver] 2014-01-30 08:25:00,561 Connected to 166.78.186.184:61620
 INFO [thrift-init] 2014-01-30 08:25:00,619 Downed Host Retry service started with queue size -1 and retry delay 10s
 INFO [thrift-init] 2014-01-30 08:25:00,662 Registering JMX me.prettyprint.cassandra.service_Agent Cluster:ServiceType=hector,MonitorType=hector
 INFO [main] 2014-01-30 08:25:00,732 Starting Jetty server: {:port 61621, :host "10.183.132.141", :ssl? false, :join? false}
ERROR [thrift-init] 2014-01-30 08:25:00,885 MARK HOST AS DOWN TRIGGERED for host 23.253.64.169(23.253.64.169):9160
ERROR [thrift-init] 2014-01-30 08:25:00,886 Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}; IsActive?: true; Active: 0; Blocked: 1; Idle: 0; NumBeforeExhausted: 1
 INFO [thrift-init] 2014-01-30 08:25:00,887 Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}
 INFO [thrift-init] 2014-01-30 08:25:00,901 Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}
 INFO [thrift-init] 2014-01-30 08:25:00,902 Host detected as down was added to retry queue: 23.253.64.169(23.253.64.169):9160
 WARN [thrift-init] 2014-01-30 08:25:00,914 Could not fullfill request on this host null
 WARN [Hector.me.prettyprint.cassandra.connection.CassandraHostRetryService-1] 2014-01-30 08:25:00,910 Downed 23.253.64.169(23.253.64.169):9160 host still appears to be down: Unable to open transport to 23.253.64.169(23.253.64.169):9160 , java.net.ConnectException: Connection refused
 WARN [thrift-init] 2014-01-30 08:25:00,926 Exception:
me.prettyprint.hector.api.exceptions.HectorTransportException: Unable to open transport to 23.253.64.169(23.253.64.169):9160 , java.net.ConnectException: Connection refused
        at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:180)
        at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:38)
        at me.prettyprint.cassandra.connection.ConcurrentHClientPool.createClient(ConcurrentHClientPool.java:162)
        at me.prettyprint.cassandra.connection.ConcurrentHClientPool.borrowClient(ConcurrentHClientPool.java:94)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:250)
        at me.prettyprint.cassandra.service.AbstractCluster.describeClusterName(AbstractCluster.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
        at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
        at clj_hector.core$cluster_name.invoke(core.clj:40)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900$fn__920.invoke(cassandra.clj:360)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900.invoke(cassandra.clj:358)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
        at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
        at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:174)
        ... 16 more
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
        ... 18 more
ERROR [thrift-init] 2014-01-30 08:25:00,965 Error when performing thrift operation:
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
        at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:395)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:249)
        at me.prettyprint.cassandra.service.AbstractCluster.describeClusterName(AbstractCluster.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
        at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
        at clj_hector.core$cluster_name.invoke(core.clj:40)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900$fn__920.invoke(cassandra.clj:360)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900.invoke(cassandra.clj:358)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Unknown Source)
 INFO [StompConnection receiver] 2014-01-30 08:25:01,024 Got new config from OpsCenter: {:kerberos_use_keytab true, :rollups300_ttl 2419200, :kerberos_use_ticket_cache true, :rollups60_ttl 604800, :thrift_port 9160, :ec2_metadata_api_host "169.254.169.254", :metrics_enabled 1, :rollups7200_ttl 31536000, :thrift_ssl_truststore nil, :metrics_ignored_column_families "", :cassandra_log_location "/var/log/cassandra/system.log", :thrift_rpc_interface "10.183.132.141", :thrift_ssl_truststore_password nil, :jmx_port 7199, :provisioning 0, :use_ssl 0, :kerberos_debug false, :rollups86400_ttl -1, :api_port "61621", :storage_keyspace "OpsCenter", :kerberos_renew_tgt true, :metrics_ignored_solr_cores "", :thrift_ssl_truststore_type "JKS", :metrics_ignored_keyspaces "system, system_traces, system_auth, dse_auth, OpsCenter", :rollup_subscriptions [], :cassandra_install_location ""}
 INFO [StompConnection receiver] 2014-01-30 08:25:01,030 Starting up agent collection.
 INFO [StompConnection receiver] 2014-01-30 08:25:01,040 New JMX connection (127.0.0.1:7199)
ERROR [StompConnection receiver] 2014-01-30 08:25:01,073 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
        java.net.ConnectException: Connection refused]
 INFO [Jetty] 2014-01-30 08:25:01,160 Jetty server started
 INFO [StompConnection receiver] 2014-01-30 08:25:01,188 Starting OS metric collectors (Linux)
 INFO [StompConnection receiver] 2014-01-30 08:25:01,199 Starting Cassandra JMX metric collectors
 INFO [install-location-finder] 2014-01-30 08:25:01,250 New JMX connection (127.0.0.1:7199)
 INFO [StompConnection receiver] 2014-01-30 08:25:01,252 New JMX connection (127.0.0.1:7199)
ERROR [install-location-finder] 2014-01-30 08:25:01,261 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
        java.net.ConnectException: Connection refused]
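Reading the agent log: its JMX connection to 127.0.0.1:7199 and its Thrift connection to 23.253.64.169:9160 (the listen_address; the agent also logs "cassandra RPC address is nil", while rpc_address in cassandra.yaml is the private 10.183.132.141) are both refused, and its own API (Jetty) binds to 10.183.132.141:61621. Comparing those addresses with what is actually listening on the node, while Cassandra is still up (once it has died, every port will look closed), is a quick cross-check. A sketch using ss from iproute; filter_listeners and listening_on are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: which local addresses are listening on a given TCP port?

# Reads `ss -ltn` output on stdin and prints the local "address:port" column
# (4th field) of every listening socket whose port equals $1. Skips the header.
filter_listeners() {
  local suffix=":$1"
  awk -v s="$suffix" 'NR > 1 && substr($4, length($4) - length(s) + 1) == s { print $4 }'
}

listening_on() {
  ss -ltn | filter_listeners "$1"
}

# Ports that appear in the agent log above: JMX, Thrift, agent API.
for port in 7199 9160 61621; do
  echo "port $port: $(listening_on "$port" | tr '\n' ' ')"
done
```

An empty result for 7199 or 9160, or a 9160 listener on a different address than 23.253.64.169, would line up with the refused connections in the log.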

Answer

Arya · Jan 30, 2014

This forum post seems to capture some of the issues that can occur with this setup:

http://www.datastax.com/support-forums/topic/opscenter-agent-not-connecting-to-opscenter