MySQL Galera node not starting (aborting with Error 'WSREP: [...]: 60: failed to reach primary view: 60 (Operation timed out)')

user2642601 · Oct 22, 2015

I am trying to set up three Galera nodes on FreeBSD 10 with MySQL 5.6.26 in VirtualBox. After I configure everything and start MySQL, it exits after about 30 seconds and never starts properly.

Here is my log:

2015-10-22 15:23:24 9402 [Note] WSREP: Read nil XID from storage engines, skipping position init
2015-10-22 15:23:24 9402 [Note] WSREP: wsrep_load(): loading provider library '/usr/local/lib/libgalera_smm.so'
2015-10-22 15:23:24 9402 [Note] WSREP: wsrep_load(): Galera 3.5(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
2015-10-22 15:23:24 9402 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2015-10-22 15:23:24 9402 [Note] WSREP: Found saved state: 9bfd9448-780a-11e5-a465-e268e80baf6e:-1
2015-10-22 15:23:24 9402 [Note] WSREP: Passing config to GCS: base_host = 192.168.1.10; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /home/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /home/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = 192.168.1.10; gmcast.segment = 0; gmcast.version = 0; ist.recv_addr = 192.168.1.10; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim 
2015-10-22 15:23:24 9402 [Note] WSREP: Service thread queue flushed.
2015-10-22 15:23:24 9402 [Note] WSREP: Assign initial position for certification: 4, protocol version: -1
2015-10-22 15:23:24 9402 [Note] WSREP: wsrep_sst_grab()
2015-10-22 15:23:24 9402 [Note] WSREP: Start replication
2015-10-22 15:23:24 9402 [Note] WSREP: Setting initial position to 9bfd9448-780a-11e5-a465-e268e80baf6e:4
2015-10-22 15:23:24 9402 [Note] WSREP: protonet asio version 0
2015-10-22 15:23:24 9402 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
2015-10-22 15:23:24 9402 [Note] WSREP: backend: asio
2015-10-22 15:23:24 9402 [Note] WSREP: GMCast version 0
2015-10-22 15:23:24 9402 [Note] WSREP: (b08a4d6e-78b7-11e5-80bf-12866e73025e, 'tcp://192.168.1.10:4567') listening at tcp://192.168.1.10:4567
2015-10-22 15:23:24 9402 [Note] WSREP: (b08a4d6e-78b7-11e5-80bf-12866e73025e, 'tcp://192.168.1.10:4567') multicast: , ttl: 1
2015-10-22 15:23:24 9402 [Note] WSREP: EVS version 0
2015-10-22 15:23:24 9402 [Note] WSREP: PC version 0
2015-10-22 15:23:24 9402 [Note] WSREP: gcomm: connecting to group 'test', peer '192.168.1.10:,192.168.1.20:,192.168.1.30:'
2015-10-22 15:23:27 9402 [Warning] WSREP: no nodes coming from prim view, prim not possible
2015-10-22 15:23:27 9402 [Note] WSREP: view(view_id(NON_PRIM,b08a4d6e-78b7-11e5-80bf-12866e73025e,1) memb {
    b08a4d6e-78b7-11e5-80bf-12866e73025e,0
} joined {
} left {
} partitioned {
})
2015-10-22 15:23:27 9402 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.6479S), skipping check
2015-10-22 15:23:57 9402 [Note] WSREP: view((empty))
2015-10-22 15:23:57 9402 [ERROR] WSREP: failed to open gcomm backend connection: 60: failed to reach primary view: 60 (Operation timed out)
     at gcomm/src/pc.cpp:connect():141
2015-10-22 15:23:57 9402 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -60 (Operation timed out)
2015-10-22 15:23:57 9402 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'test' at 'gcomm://192.168.1.10,192.168.1.20,192.168.1.30': -60 (Operation timed out)
2015-10-22 15:23:57 9402 [ERROR] WSREP: gcs connect failed: Operation timed out
2015-10-22 15:23:57 9402 [ERROR] WSREP: wsrep::connect(gcomm://192.168.1.10,192.168.1.20,192.168.1.30) failed: 7
2015-10-22 15:23:57 9402 [ERROR] Aborting

2015-10-22 15:23:57 9402 [Note] WSREP: Service disconnected.
2015-10-22 15:23:58 9402 [Note] WSREP: Some threads may fail to exit.
2015-10-22 15:23:58 9402 [Note] Binlog end
2015-10-22 15:23:58 9402 [Note] /usr/local/libexec/mysqld: Shutdown complete

151022 15:23:58 mysqld_safe mysqld from pid file /home/mysql/galera1.pid ended

Part of my.cnf regarding wsrep config:

wsrep_provider=/usr/local/lib/libgalera_smm.so
wsrep_cluster_name="test"
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.20,192.168.1.30"
wsrep_slave_threads=8
wsrep_node_address = "192.168.1.10"
wsrep_sst_receive_address = "192.168.1.10"
wsrep_node_incoming_address = "192.168.1.10"
wsrep_provider_options = "gmcast.listen_addr=192.168.1.10;gcache.size=128M;ist.recv_addr=192.168.1.10"
wsrep_auto_increment_control=1
wsrep_retry_autocommit=0
wsrep_max_ws_size=3741824
wsrep_max_ws_rows=56000
wsrep_certify_nonPK=1
wsrep_convert_LOCK_to_trx=0
wsrep_sst_donor=galera1
wsrep_sst_donor_rejects_queries=1

Node addresses:

  • Node 1 - 192.168.1.10
  • Node 2 - 192.168.1.20
  • Node 3 - 192.168.1.30

The above output is from node 1.

Networking between the nodes works properly, so I can't find a reason for this failure.

Answer

Ciprian Stoica · Oct 23, 2015

Make sure you start the first node by running the following command:

service mysql start --wsrep-new-cluster

Start the next nodes by running the command:

service mysql start

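I get exactly the same errors as yours when I forget to add the --wsrep-new-cluster parameter when starting the first node. Without it, the first node tries to join an existing cluster at the addresses in wsrep_cluster_address, finds no primary component (the "no nodes coming from prim view, prim not possible" warning in your log), and aborts once the connection attempt times out.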
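
To confirm that a failed start matches this pattern, you can look for the two telltale messages in the error log. The following is only a minimal sketch: the function name needs_bootstrap is hypothetical, and the log path in the usage comment is an assumption that depends on your log_error setting.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given mysqld error log contains
# both messages that appear in the question's log when the first node
# is started without bootstrapping a new cluster.
needs_bootstrap() {
    log_file="$1"
    grep -q "no nodes coming from prim view" "$log_file" &&
        grep -q "failed to reach primary view" "$log_file"
}

# Usage (the log path is an assumption; adjust it to your setup):
# if needs_bootstrap /home/mysql/galera1.err; then
#     echo "No primary component found; start the first node with --wsrep-new-cluster"
# fi
```

If the function succeeds on the first node's log, the likely cause is the missing bootstrap rather than a network problem.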

See the "Starting the cluster" page of the Galera documentation for details.

Just a quick edit: I personally use Galera with MariaDB, and the commands above work properly there. As you use MySQL, you might need to replace mysql with mysqld in the commands above. Try both.
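
Once the first node is up, it is worth checking that it actually formed a primary component before starting the others. A rough sketch, assuming the mysql command-line client can log in on the node: the function name cluster_is_primary is hypothetical, while wsrep_cluster_status is the Galera status variable that reports "Primary" on a healthy node.

```shell
#!/bin/sh
# Hypothetical helper: reads tab-separated "name<TAB>value" rows on stdin
# (the format printed by `mysql -N -B`) and succeeds only if
# wsrep_cluster_status is reported as Primary.
cluster_is_primary() {
    awk -F '\t' '
        $1 == "wsrep_cluster_status" { found = ($2 == "Primary") }
        END { exit !found }
    '
}

# Usage (assumes client credentials are already configured, e.g. in ~/.my.cnf):
# mysql -N -B -e "SHOW STATUS LIKE 'wsrep_cluster_status'" | cluster_is_primary \
#     && echo "node is part of the primary component"
```

After the second and third nodes join, SHOW STATUS LIKE 'wsrep_cluster_size' should report 3 on each of them.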