MariaDB Galera nodes down after reboot

Guillaume picture Guillaume · Oct 22, 2013 · Viewed 7.3k times · Source

I'm using MariaDB with Galera replication on 3 Ubuntu 12.04 servers :

    Server version: 5.5.33a-MariaDB-1~precise mariadb.org binary distribution, wsrep_23.7.6.rXXXX

They're all development servers, in the same local network, so there's no firewall between them. They were all working fine until a reboot. Now node 2 and 3 aren't in sync anymore and mysql refuse to start.

Replication is done using xtrabackup.

I can start over by doing a mysqlhotcopy on node1 and put the data on node2 and node3 but i'd like to understand why it didn't survived a reboot.

Here's my nodes :

  • Node 1 : dev-sql-001 / 192.168.3.121
  • Node 2 : dev-sql-002 / 192.168.3.221
  • Node 3 : dev-sql-003 / 192.168.3.222

  • Here's the log on node 1 when I try to start node 2 :

    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: declaring abbffcb5-3af2-11e3-a043-5b38dfb689cf stable
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: Node f743192f-3aee-11e3-a47b-97626a725dde state prim
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: view(view_id(PRIM,abbffcb5-3af2-11e3-a043-5b38dfb689cf,6) memb {
    Oct 22 10:19:29 dev-sql-001 mysqld: #011abbffcb5-3af2-11e3-a043-5b38dfb689cf,
    Oct 22 10:19:29 dev-sql-001 mysqld: #011f743192f-3aee-11e3-a47b-97626a725dde,
    Oct 22 10:19:29 dev-sql-001 mysqld: } joined {
    Oct 22 10:19:29 dev-sql-001 mysqld: } left {
    Oct 22 10:19:29 dev-sql-001 mysqld: } partitioned {
    Oct 22 10:19:29 dev-sql-001 mysqld: })
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: STATE EXCHANGE: sent state msg: ac5a1fa8-3af2-11e3-ba64-7e36060f57f6
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: STATE EXCHANGE: got state msg: ac5a1fa8-3af2-11e3-ba64-7e36060f57f6 from 0 (dev-sql-002)
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: STATE EXCHANGE: got state msg: ac5a1fa8-3af2-11e3-ba64-7e36060f57f6 from 1 (dev-sql-001)
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: Quorum results:
    Oct 22 10:19:29 dev-sql-001 mysqld: #011version    = 2,
    Oct 22 10:19:29 dev-sql-001 mysqld: #011component  = PRIMARY,
    Oct 22 10:19:29 dev-sql-001 mysqld: #011conf_id    = 5,
    Oct 22 10:19:29 dev-sql-001 mysqld: #011members    = 1/2 (joined/total),
    Oct 22 10:19:29 dev-sql-001 mysqld: #011act_id     = 968,
    Oct 22 10:19:29 dev-sql-001 mysqld: #011last_appl. = 0,
    Oct 22 10:19:29 dev-sql-001 mysqld: #011protocols  = 0/4/2 (gcs/repl/appl),
    Oct 22 10:19:29 dev-sql-001 mysqld: #011group UUID = 2896facc-2110-11e3-0800-717f6d7ec2af
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: Flow-control interval: [23, 23]
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: New cluster view: global state: 2896facc-2110-11e3-0800-717f6d7ec2af:968, view# 6: Primary, number of nodes: 2, my index: 1, protocol version 2
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Oct 22 10:19:29 dev-sql-001 mysqld: 131022 10:19:29 [Note] WSREP: Assign initial position for certification: 968, protocol version: 2
    Oct 22 10:19:32 dev-sql-001 mysqld: 131022 10:19:32 [Note] WSREP: Node 0 (dev-sql-002) requested state transfer from '*any*'. Selected 1 (dev-sql-001)(SYNCED) as donor.
    Oct 22 10:19:32 dev-sql-001 mysqld: 131022 10:19:32 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 968)
    Oct 22 10:19:32 dev-sql-001 mysqld: 131022 10:19:32 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Oct 22 10:19:32 dev-sql-001 mysqld: 131022 10:19:32 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '192.168.3.221:4444/xtrabackup_sst' --auth 'root:test' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --gtid '2896facc-2110-11e3-0800-717f6d7ec2af:968''
    Oct 22 10:19:32 dev-sql-001 mysqld: 131022 10:19:32 [Note] WSREP: sst_donor_thread signaled with 0
    Oct 22 10:19:32 dev-sql-001 mysqld: WSREP_SST: [INFO] Streaming with tar (20131022 10:19:32.294)
    Oct 22 10:19:32 dev-sql-001 mysqld: WSREP_SST: [INFO] Note: Advanced xtrabackup features - encryption,compression etc. not available with tar. (20131022 10:19:32.301)
    Oct 22 10:19:32 dev-sql-001 mysqld: WSREP_SST: [INFO] Streaming the backup to joiner at 192.168.3.221 4444 (20131022 10:19:32.330)
    Oct 22 10:19:32 dev-sql-001 mysqld: WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20131022 10:19:32.662)
    Oct 22 10:19:32 dev-sql-001 mysqld: WSREP_SST: [ERROR] Cleanup after exit with status:22 (20131022 10:19:32.668)
    Oct 22 10:19:32 dev-sql-001 mysqld: 131022 10:19:32 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup --role 'donor' --address '192.168.3.221:4444/xtrabackup_sst' --auth 'root:test' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --gtid '2896facc-2110-11e3-0800-717f6d7ec2af:968'
    Oct 22 10:19:32 dev-sql-001 mysqld: 131022 10:19:32 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'donor' --address '192.168.3.221:4444/xtrabackup_sst' --auth 'root:test' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --gtid '2896facc-2110-11e3-0800-717f6d7ec2af:968': 22 (Invalid argument)
    Oct 22 10:19:32 dev-sql-001 mysqld: 131022 10:19:32 [Warning] WSREP: 1 (dev-sql-001): State transfer to 0 (dev-sql-002) failed: -1 (Operation not permitted)
    Oct 22 10:19:32 dev-sql-001 mysqld: 131022 10:19:32 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 968)
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: Member 1 (dev-sql-001) synced with group.
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 968)
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: Synchronized with group, ready for connections
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: Node f743192f-3aee-11e3-a47b-97626a725dde state prim
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: view(view_id(PRIM,f743192f-3aee-11e3-a47b-97626a725dde,7) memb {
    Oct 22 10:19:33 dev-sql-001 mysqld: #011f743192f-3aee-11e3-a47b-97626a725dde,
    Oct 22 10:19:33 dev-sql-001 mysqld: } joined {
    Oct 22 10:19:33 dev-sql-001 mysqld: } left {
    Oct 22 10:19:33 dev-sql-001 mysqld: } partitioned {
    Oct 22 10:19:33 dev-sql-001 mysqld: #011abbffcb5-3af2-11e3-a043-5b38dfb689cf,
    Oct 22 10:19:33 dev-sql-001 mysqld: })
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: forgetting abbffcb5-3af2-11e3-a043-5b38dfb689cf (tcp://192.168.3.221:4567)
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: STATE_EXCHANGE: sent state UUID: ae9636c9-3af2-11e3-b69e-0fef55014e34
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: STATE EXCHANGE: sent state msg: ae9636c9-3af2-11e3-b69e-0fef55014e34
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: STATE EXCHANGE: got state msg: ae9636c9-3af2-11e3-b69e-0fef55014e34 from 0 (dev-sql-001)
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: Quorum results:
    Oct 22 10:19:33 dev-sql-001 mysqld: #011version    = 2,
    Oct 22 10:19:33 dev-sql-001 mysqld: #011component  = PRIMARY,
    Oct 22 10:19:33 dev-sql-001 mysqld: #011conf_id    = 6,
    Oct 22 10:19:33 dev-sql-001 mysqld: #011members    = 1/1 (joined/total),
    Oct 22 10:19:33 dev-sql-001 mysqld: #011act_id     = 968,
    Oct 22 10:19:33 dev-sql-001 mysqld: #011last_appl. = 0,
    Oct 22 10:19:33 dev-sql-001 mysqld: #011protocols  = 0/4/2 (gcs/repl/appl),
    Oct 22 10:19:33 dev-sql-001 mysqld: #011group UUID = 2896facc-2110-11e3-0800-717f6d7ec2af
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: Flow-control interval: [16, 16]
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: New cluster view: global state: 2896facc-2110-11e3-0800-717f6d7ec2af:968, view# 7: Primary, number of nodes: 1, my index: 0, protocol version 2
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Oct 22 10:19:33 dev-sql-001 mysqld: 131022 10:19:33 [Note] WSREP: Assign initial position for certification: 968, protocol version: 2
    Oct 22 10:19:38 dev-sql-001 mysqld: 131022 10:19:38 [Note] WSREP: cleaning up abbffcb5-3af2-11e3-a043-5b38dfb689cf (tcp://192.168.3.221:4567)
    

/var/lib/mysql//innobackup.backup.log on node1 :

    131022 10:19:32  innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/mysql/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/run/mysqld/mysqld.sock' as 'root'  (using password: YES).
    innobackupex: Error: Failed to connect to MySQL server: DBI connect(';mysql_read_default_file=/etc/mysql/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/run/mysqld/mysqld.sock','root',...) failed: Access denied for user 'root'@'localhost' (using password: YES) at /usr//bin/innobackupex line 2902
  • As for node2 logs :

    Oct 22 10:19:25 dev-sql-002 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
    Oct 22 10:19:25 dev-sql-002 mysqld_safe: WSREP: Running position recovery with --log_error=/tmp/tmp.gp3R8iSBdK --pid-file=/var/lib/mysql/dev-sql-002-recover.pid
    Oct 22 10:19:25 dev-sql-002 mysqld: 131022 10:19:25 [Warning] option 'table_cache': unsigned value 2097152 adjusted to 524288
    Oct 22 10:19:28 dev-sql-002 mysqld_safe: WSREP: Recovered position 2896facc-2110-11e3-0800-717f6d7ec2af:370
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Warning] option 'table_cache': unsigned value 2097152 adjusted to 524288
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: wsrep_start_position var submitted: '2896facc-2110-11e3-0800-717f6d7ec2af:370'
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: Read nil XID from storage engines, skipping position init
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: wsrep_load(): Galera 23.2.7-wheezy(r) by Codership Oy <[email protected]> loaded succesfully.
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: Passing config to GCS: base_host = 192.168.3.221; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: wsrep_sst_grab()
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: Start replication
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: protonet asio version 0
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: backend: asio
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: GMCast version 0
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: (abbffcb5-3af2-11e3-a043-5b38dfb689cf, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: (abbffcb5-3af2-11e3-a043-5b38dfb689cf, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: EVS version 0
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: PC version 0
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: gcomm: connecting to group 'dev-sql', peer 'dev-sql-001:,dev-sql-002:'
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Warning] WSREP: (abbffcb5-3af2-11e3-a043-5b38dfb689cf, 'tcp://0.0.0.0:4567') address 'tcp://192.168.3.221:4567' points to own listening address, blacklisting
    Oct 22 10:19:28 dev-sql-002 mysqld: 131022 10:19:28 [Note] WSREP: (abbffcb5-3af2-11e3-a043-5b38dfb689cf, 'tcp://0.0.0.0:4567') address 'tcp://192.168.3.221:4567' pointing to uuid abbffcb5-3af2-11e3-a043-5b38dfb689cf is blacklisted, skipping
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: declaring f743192f-3aee-11e3-a47b-97626a725dde stable
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: Node f743192f-3aee-11e3-a47b-97626a725dde state prim
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: view(view_id(PRIM,abbffcb5-3af2-11e3-a043-5b38dfb689cf,6) memb {
    Oct 22 10:19:29 dev-sql-002 mysqld: #011abbffcb5-3af2-11e3-a043-5b38dfb689cf,
    Oct 22 10:19:29 dev-sql-002 mysqld: #011f743192f-3aee-11e3-a47b-97626a725dde,
    Oct 22 10:19:29 dev-sql-002 mysqld: } joined {
    Oct 22 10:19:29 dev-sql-002 mysqld: } left {
    Oct 22 10:19:29 dev-sql-002 mysqld: } partitioned {
    Oct 22 10:19:29 dev-sql-002 mysqld: })
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: gcomm: connected
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: Opened channel 'dev-sql'
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: Waiting for SST to complete.
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: STATE_EXCHANGE: sent state UUID: ac5a1fa8-3af2-11e3-ba64-7e36060f57f6
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: STATE EXCHANGE: sent state msg: ac5a1fa8-3af2-11e3-ba64-7e36060f57f6
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: STATE EXCHANGE: got state msg: ac5a1fa8-3af2-11e3-ba64-7e36060f57f6 from 0 (dev-sql-002)
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: STATE EXCHANGE: got state msg: ac5a1fa8-3af2-11e3-ba64-7e36060f57f6 from 1 (dev-sql-001)
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: Quorum results:
    Oct 22 10:19:29 dev-sql-002 mysqld: #011version    = 2,
    Oct 22 10:19:29 dev-sql-002 mysqld: #011component  = PRIMARY,
    Oct 22 10:19:29 dev-sql-002 mysqld: #011conf_id    = 5,
    Oct 22 10:19:29 dev-sql-002 mysqld: #011members    = 1/2 (joined/total),
    Oct 22 10:19:29 dev-sql-002 mysqld: #011act_id     = 968,
    Oct 22 10:19:29 dev-sql-002 mysqld: #011last_appl. = -1,
    Oct 22 10:19:29 dev-sql-002 mysqld: #011protocols  = 0/4/2 (gcs/repl/appl),
    Oct 22 10:19:29 dev-sql-002 mysqld: #011group UUID = 2896facc-2110-11e3-0800-717f6d7ec2af
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: Flow-control interval: [23, 23]
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 968)
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: State transfer required:
    Oct 22 10:19:29 dev-sql-002 mysqld: #011Group state: 2896facc-2110-11e3-0800-717f6d7ec2af:968
    Oct 22 10:19:29 dev-sql-002 mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Note] WSREP: New cluster view: global state: 2896facc-2110-11e3-0800-717f6d7ec2af:968, view# 6: Primary, number of nodes: 2, my index: 0, protocol version 2
    Oct 22 10:19:29 dev-sql-002 mysqld: 131022 10:19:29 [Warning] WSREP: Gap in state sequence. Need state transfer.
    Oct 22 10:19:31 dev-sql-002 mysqld: 131022 10:19:31 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '192.168.3.221' --auth 'root:test' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '13950''
    Oct 22 10:19:31 dev-sql-002 mysqld: WSREP_SST: [INFO] Streaming with tar (20131022 10:19:31.981)
    Oct 22 10:19:31 dev-sql-002 mysqld: WSREP_SST: [INFO] Note: Advanced xtrabackup features - encryption,compression etc. not available with tar. (20131022 10:19:31.988)
    Oct 22 10:19:31 dev-sql-002 mysqld: WSREP_SST: [INFO] However, xbstream requires manual cleanup of data directory before SST - lp:1193240. (20131022 10:19:31.994)
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Note] WSREP: Prepared SST request: xtrabackup|192.168.3.221:4444/xtrabackup_sst
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Note] WSREP: Assign initial position for certification: 968, protocol version: 2
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (2896facc-2110-11e3-0800-717f6d7ec2af): 1 (Operation not permitted)
    Oct 22 10:19:32 dev-sql-002 mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():436. IST will be unavailable.
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Note] WSREP: Node 0 (dev-sql-002) requested state transfer from '*any*'. Selected 1 (dev-sql-001)(SYNCED) as donor.
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 968)
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Note] WSREP: Requesting state transfer: success, donor: 1
    Oct 22 10:19:32 dev-sql-002 mysqld: tar: This does not look like a tar archive
    Oct 22 10:19:32 dev-sql-002 mysqld: tar: Exiting with failure status due to previous errors
    Oct 22 10:19:32 dev-sql-002 mysqld: WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 0 2 (20131022 10:19:32.659)
    Oct 22 10:19:32 dev-sql-002 mysqld: WSREP_SST: [ERROR] Data directory /var/lib/mysql/ needs to be empty for SST: Manual intervention required in that case (20131022 10:19:32.666)
    Oct 22 10:19:32 dev-sql-002 mysqld: WSREP_SST: [ERROR] Cleanup after exit with status:32 (20131022 10:19:32.672)
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Warning] WSREP: 1 (dev-sql-001): State transfer to 0 (dev-sql-002) failed: -1 (Operation not permitted)
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():719: Will never receive state. Need to abort.
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Note] WSREP: gcomm: terminating thread
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Note] WSREP: gcomm: joining thread
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [Note] WSREP: gcomm: closing backend
    Oct 22 10:19:32 dev-sql-002 mysqld: WSREP_SST: [ERROR] Killing nc pid  (20131022 10:19:32.707)
    Oct 22 10:19:32 dev-sql-002 mysqld: WSREP_SST: [INFO] Removing the sst_in_progress file (20131022 10:19:32.725)
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '192.168.3.221' --auth 'root:test' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '13950': 32 (Broken pipe)
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [ERROR] WSREP: SST failed: 32 (Broken pipe)
    Oct 22 10:19:32 dev-sql-002 mysqld: 131022 10:19:32 [ERROR] Aborting
    Oct 22 10:19:32 dev-sql-002 mysqld:
    Oct 22 10:19:33 dev-sql-002 mysqld: 131022 10:19:33 [Note] WSREP: view(view_id(NON_PRIM,abbffcb5-3af2-11e3-a043-5b38dfb689cf,6) memb {
    Oct 22 10:19:33 dev-sql-002 mysqld: #011abbffcb5-3af2-11e3-a043-5b38dfb689cf,
    Oct 22 10:19:33 dev-sql-002 mysqld: } joined {
    Oct 22 10:19:33 dev-sql-002 mysqld: } left {
    Oct 22 10:19:33 dev-sql-002 mysqld: } partitioned {
    Oct 22 10:19:33 dev-sql-002 mysqld: #011f743192f-3aee-11e3-a47b-97626a725dde,
    Oct 22 10:19:33 dev-sql-002 mysqld: })
    Oct 22 10:19:33 dev-sql-002 mysqld: 131022 10:19:33 [Note] WSREP: view((empty))
    Oct 22 10:19:33 dev-sql-002 mysqld: 131022 10:19:33 [Note] WSREP: gcomm: closed
    Oct 22 10:19:33 dev-sql-002 mysqld: 131022 10:19:33 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    Oct 22 10:19:33 dev-sql-002 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
    Oct 22 10:19:55 dev-sql-002 /etc/init.d/mysql[14428]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
    Oct 22 10:19:55 dev-sql-002 /etc/init.d/mysql[14428]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
    Oct 22 10:19:55 dev-sql-002 /etc/init.d/mysql[14428]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)'
    Oct 22 10:19:55 dev-sql-002 /etc/init.d/mysql[14428]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
    
  • WSREP status on node1 :

    MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_%'
        -> ;
    +----------------------------+--------------------------------------+
    | Variable_name              | Value |
    +----------------------------+--------------------------------------+
    | wsrep_local_state_uuid     | 2896facc-2110-11e3-0800-717f6d7ec2af |
    | wsrep_protocol_version     | 4 |
    | wsrep_last_committed       | 968 |
    | wsrep_replicated           | 0 |
    | wsrep_replicated_bytes     | 0 |
    | wsrep_received             | 12 |
    | wsrep_received_bytes       | 1126 |
    | wsrep_local_commits        | 0 |
    | wsrep_local_cert_failures  | 0 |
    | wsrep_local_bf_aborts      | 0 |
    | wsrep_local_replays        | 0 |
    | wsrep_local_send_queue     | 0 |
    | wsrep_local_send_queue_avg | 0.500000 |
    | wsrep_local_recv_queue     | 0 |
    | wsrep_local_recv_queue_avg | 0.000000 |
    | wsrep_flow_control_paused  | 0.000000 |
    | wsrep_flow_control_sent    | 0 |
    | wsrep_flow_control_recv    | 0 |
    | wsrep_cert_deps_distance   | 0.000000 |
    | wsrep_apply_oooe           | 0.000000 |
    | wsrep_apply_oool           | 0.000000 |
    | wsrep_apply_window         | 0.000000 |
    | wsrep_commit_oooe          | 0.000000 |
    | wsrep_commit_oool          | 0.000000 |
    | wsrep_commit_window        | 0.000000 |
    | wsrep_local_state          | 4 |
    | wsrep_local_state_comment  | Synced |
    | wsrep_cert_index_size      | 0 |
    | wsrep_causal_reads         | 0 |
    | wsrep_incoming_addresses   | 192.168.3.121:3306 |
    | wsrep_cluster_conf_id      | 5 |
    | wsrep_cluster_size         | 1 |
    | wsrep_cluster_state_uuid   | 2896facc-2110-11e3-0800-717f6d7ec2af |
    | wsrep_cluster_status       | Primary |
    | wsrep_connected            | ON |
    | wsrep_local_index          | 0 |
    | wsrep_provider_name        | Galera |
    | wsrep_provider_vendor      | Codership Oy <[email protected]>    |
    | wsrep_provider_version     | 23.2.7-wheezy(r) |
    | wsrep_ready                | ON |
    +----------------------------+--------------------------------------+
    40 rows in set (0.00 sec)
    
  • Mariadb conf file on node1 :

    [mysqld]
    wsrep_cluster_name = 'dev-sql'
    wsrep_node_name = dev-sql-001
    wsrep_node_address = "192.168.3.121"
    wsrep_cluster_address = 'gcomm://'
    wsrep_retry_autocommit = 0
    wsrep_provider = /usr/lib/galera/libgalera_smm.so
    wsrep_retry_autocommit = 0
    wsrep_sst_method = xtrabackup
    wsrep_sst_auth = root:test
    
    # Other mysqld options
    binlog_format = ROW
    innodb_autoinc_lock_mode = 2
    innodb_flush_log_at_trx_commit = 2
    innodb_locks_unsafe_for_binlog = 1
    
  • Mariadb conf file on node2 :

    [mysqld]
    wsrep_cluster_name = 'dev-sql'
    wsrep_node_name = dev-sql-002
    wsrep_node_address = "192.168.3.221"
    wsrep_cluster_address = 'gcomm://dev-sql-001,dev-sql-002,dev-sql-003'
    wsrep_retry_autocommit = 0
    wsrep_provider = /usr/lib/galera/libgalera_smm.so
    wsrep_sst_method = xtrabackup
    wsrep_sst_auth = root:test
    
    # Other mysqld options
    binlog_format = ROW
    innodb_autoinc_lock_mode = 2
    innodb_flush_log_at_trx_commit = 2
    innodb_locks_unsafe_for_binlog = 1
    

All nodes have the same hosts file with the nodes name and ip in it.

Thanks,

Guillaume

Answer

G-Nugget picture G-Nugget · Jan 16, 2014

The main problem is that the credentials used for the SST are incorrect. It can't access the data so it can't transfer the state to other nodes.

For the point about needing to start the first node with the cluster address of gcomm://, that is needed when there are no running nodes in the cluster to connect to. I'm not sure about the init script on Ubuntu, but I know that the init script on RHEL and CentOS has a bootstrap command that forces gcomm:// as the address for starting a new cluster. Check to see if yours has that and if so, keep the address to connect to a running cluster in the config and use the bootstrap command in the init script to bootstrap the cluster. If yours doesn't have the bootstrap command, you should be able to add the following into the init script to add it:

'bootstrap')
    # Bootstrap the cluster, start the first node
    # that initiate the cluster
    echo $echo_n "Bootstrapping the cluster"
    $0 start $other_args --wsrep-new-cluster
    ;;

That's from my RHEL init script and it might need to be modified slightly, but that's at least generally what it needs.