There are two DBs in cluster. Master (db3.example.com):
[mysqld]
wsrep_cluster_address = 'gcomm://'
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_retry_autocommit = 0
wsrep_sst_method = rsync
wsrep_node_address = 12.79.29.209
And slave (db4.example.com):
[mysqld]
wsrep_cluster_address='gcomm://db3.example.com'
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_sst_method=rsync
wsrep_sst_receive_address=12.79.29.206
wsrep_node_address = 12.79.29.206
The slave won't connect to the master. Log below.
130821 20:12:04 [Warning] option 'table_cache': unsigned value 2097152 adjusted to 524288
130821 20:12:04 [Warning] The syntax '--log-slow-queries' is deprecated and will be removed in a future release. Please use '--slow-query-log'/'--slow-query-log-file' instead.
130821 20:12:04 [Note] WSREP: Read nil XID from storage engines, skipping position init
130821 20:12:04 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
130821 20:12:04 [Note] WSREP: wsrep_load(): Galera 23.2.4(r147) by Codership Oy <[email protected]> loaded succesfully.
130821 20:12:04 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
130821 20:12:04 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
130821 20:12:04 [Note] WSREP: Passing config to GCS: base_host = 12.79.29.206; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
130821 20:12:04 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
130821 20:12:04 [Note] WSREP: wsrep_sst_grab()
130821 20:12:04 [Note] WSREP: Start replication
130821 20:12:04 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
130821 20:12:04 [Note] WSREP: protonet asio version 0
130821 20:12:04 [Note] WSREP: backend: asio
130821 20:12:04 [Note] WSREP: GMCast version 0
130821 20:12:04 [Note] WSREP: (2ecb4dfa-0a8d-11e3-0800-d1d310d8d2ff, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
130821 20:12:04 [Note] WSREP: (2ecb4dfa-0a8d-11e3-0800-d1d310d8d2ff, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
130821 20:12:04 [Note] WSREP: EVS version 0
130821 20:12:04 [Note] WSREP: PC version 0
130821 20:12:04 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'db3.example.com:'
130821 20:12:04 [Note] WSREP: declaring 1694931b-0a8b-11e3-0800-11c89939f034 stable
130821 20:12:04 [Note] WSREP: Node 1694931b-0a8b-11e3-0800-11c89939f034 state prim
130821 20:12:04 [Note] WSREP: view(view_id(PRIM,1694931b-0a8b-11e3-0800-11c89939f034,4) memb {
1694931b-0a8b-11e3-0800-11c89939f034,
2ecb4dfa-0a8d-11e3-0800-d1d310d8d2ff,
} joined {
} left {
} partitioned {
})
130821 20:12:05 [Note] WSREP: gcomm: connected
130821 20:12:05 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
130821 20:12:05 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
130821 20:12:05 [Note] WSREP: Opened channel 'my_wsrep_cluster'
130821 20:12:05 [Note] WSREP: Waiting for SST to complete.
130821 20:12:05 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
130821 20:12:05 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
130821 20:12:05 [Note] WSREP: STATE EXCHANGE: sent state msg: 2f15888a-0a8d-11e3-0800-6af8c4db9fd1
130821 20:12:05 [Note] WSREP: STATE EXCHANGE: got state msg: 2f15888a-0a8d-11e3-0800-6af8c4db9fd1 from 0 (db3.example.com)
130821 20:12:05 [Note] WSREP: STATE EXCHANGE: got state msg: 2f15888a-0a8d-11e3-0800-6af8c4db9fd1 from 1 (db4.example.com)
130821 20:12:05 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 3,
members = 1/2 (joined/total),
act_id = 44,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 16a07f81-0a8b-11e3-0800-cbc8be9154ae
130821 20:12:05 [Note] WSREP: Flow-control interval: [23, 23]
130821 20:12:05 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 44)
130821 20:12:05 [Note] WSREP: State transfer required:
Group state: 16a07f81-0a8b-11e3-0800-cbc8be9154ae:44
Local state: 00000000-0000-0000-0000-000000000000:-1
130821 20:12:05 [Note] WSREP: New cluster view: global state: 16a07f81-0a8b-11e3-0800-cbc8be9154ae:44, view# 4: Primary, number of nodes: 2, my index: 1, protocol version 2
130821 20:12:05 [Warning] WSREP: Gap in state sequence. Need state transfer.
130821 20:12:07 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '12.79.29.206' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '4643''
cat: /var/lib/mysql//rsync_sst.pid: No such file or directory
130821 20:12:07 [Note] WSREP: Prepared SST request: rsync|12.79.29.206:4444/rsync_sst
130821 20:12:07 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130821 20:12:07 [Note] WSREP: Assign initial position for certification: 44, protocol version: 2
130821 20:12:07 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (16a07f81-0a8b-11e3-0800-cbc8be9154ae): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():442. IST will be unavailable.
130821 20:12:07 [Note] WSREP: Node 1 (db4.example.com) requested state transfer from '*any*'. Selected 0 (db3.example.com)(SYNCED) as donor.
130821 20:12:07 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 44)
130821 20:12:07 [Note] WSREP: Requesting state transfer: success, donor: 0
130821 20:13:10 [Warning] WSREP: 0 (db3.example.com): State transfer to 1 (db4.example.com) failed: -1 (Operation not permitted)
130821 20:13:10 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():719: Will never receive state. Need to abort.
130821 20:13:10 [Note] WSREP: gcomm: terminating thread
130821 20:13:10 [Note] WSREP: gcomm: joining thread
130821 20:13:10 [Note] WSREP: gcomm: closing backend
130821 20:13:11 [Note] WSREP: view(view_id(NON_PRIM,1694931b-0a8b-11e3-0800-11c89939f034,4) memb {
2ecb4dfa-0a8d-11e3-0800-d1d310d8d2ff,
} joined {
} left {
} partitioned {
1694931b-0a8b-11e3-0800-11c89939f034,
})
130821 20:13:11 [Note] WSREP: view((empty))
130821 20:13:11 [Note] WSREP: gcomm: closed
130821 20:13:11 [Note] WSREP: mysqld: Terminated.
Aborted
root@db4:~# Parent mysqld process (PID:4643) terminated unexpectedly.
/usr//bin/wsrep_sst_common: line 94: /dev/stderr: Permission denied
After endless googling (with no result), we found by chance the problem was that port 4444 used by rsync replication was closed between the servers. After opening the port on firewall (accept everything from each other) everything works fine.
The message was not very descriptive. So if you end up with:
[Warning] WSREP: 0 (db3.example.com): State transfer to 1 (db4.example.com) failed: -1 (Operation not permitted)
It could mean that your replication settings are good, but the connection itself is not working.
EDIT
It could also be a problem where the rsync daemon does not stop properly and is not detected by mariadb as running. It needs to be killed manually to free up the 4444 port.