I've searched google thoroughly for a definitive solution or set of steps to resolve this issue, but there don't seem to be many high quality results, and I haven't found the question on stack overflow. We're trying to set up MySQL replication using one slave. The slave appears to be replicating fine, and then the following error occurs:
Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
In order to benefit the large number of people who will inevitably stumble upon this question from a search, it would be helpful if someone who responds provided an overview of what could be going wrong and what steps to take to resolve this issue, but I will also provide more details below related to my particular situation in hopes that someone can help me solve it.
The dump that we imported into the slave to get it started was created using the following command on the master:
mysqldump --opt --allow-keywords -q -uroot -ppassword dbname > E:\Backups\dbname.sql
The script that performs this backup also logs the master's current binary log position. We then took the following steps to start replication on the slave:
1. STOP SLAVE;
2. DROP DATABASE dbname;
3. SOURCE dbname.sql;
(... waited a few hours for the 10gb dump to import)
4. RESET SLAVE;
5. CHANGE MASTER TO MASTER_HOST='[masterhostname]', MASTER_USER='[slaveusername]', MASTER_PASSWORD='[slaveuserpassword]', MASTER_PORT=[port], MASTER_LOG_FILE='[masterlogfile]', MASTER_LOG_POS=[masterlogposition];
6. START SLAVE;
After about a day of replication working fine, it failed again at 3:43 AM. The first thing that appeared in MySQL's error log was the error above. Then another generic error appeared after with the same timestamp:
Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '[masterlogfile]' position [masterlogpos]
For more logging information, I had set up a batch script to run "SHOW SLAVE STATUS" and "SHOW FULL PROCESSLIST" every hour. Here are the results before and after the failure:
--Monitoring: 3:00:00.15
Slave Status:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.xxx.xxx
Master_User: slave_user
Master_Port: xxxx
Connect_Retry: 60
Master_Log_File: mysql-bin.000xxx
Read_Master_Log_Pos: 316611912
Relay_Log_File: dbname-relay-bin.00000x
Relay_Log_Pos: 404287513
Relay_Master_Log_File: mysql-bin.000xxx
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: dbname
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 316611912
Relay_Log_Space: 404287513
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
*************************** 1. row ***************************
Id: 98
User: system user
Host:
db: NULL
Command: Connect
Time: 60547
State: Waiting for master to send event
Info: NULL
*************************** 2. row ***************************
Id: 99
User: system user
Host:
db: NULL
Command: Connect
Time: 5
State: Has read all relay log; waiting for the slave I/O thread to update it
Info: NULL
*************************** 3. row ***************************
Id: 119
User: root
Host: localhost:xxxx
db: NULL
Command: Query
Time: 0
State: NULL
Info: SHOW FULL PROCESSLIST
--Monitoring: 4:00:02.71
Slave Status:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.xxx.xxx
Master_User: slave_user
Master_Port: xxxx
Connect_Retry: 60
Master_Log_File: mysql-bin.000xxx
Read_Master_Log_Pos: 324365637
Relay_Log_File: dbname-relay-bin.00000x
Relay_Log_Pos: 410327741
Relay_Master_Log_File: mysql-bin.000xxx
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB: dbname
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 322652140
Relay_Log_Space: 412041238
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
*************************** 1. row ***************************
Id: 98
User: system user
Host:
db: NULL
Command: Connect
Time: 64149
State: Waiting for master to send event
Info: NULL
*************************** 2. row ***************************
Id: 122
User: root
Host: localhost:3029
db: NULL
Command: Query
Time: 0
State: NULL
Info: SHOW FULL PROCESSLIST
I tried following the instructions from the error and ran mysqlbinlog on the slave's relay log with a start_position thousands of statements before, and stop_position thousands of statements after the point of failure, and redirected the output to a text file. I did not see any corruption errors on the command line or in the log file. This is what the log file said around the point of failure:
...
# at 410327570
#120816 3:43:26 server id 1 log_pos 322651969 Intvar
SET INSERT_ID=3842697;
# at 410327598
#120816 3:43:26 server id 1 log_pos 322651997 Query thread_id=762340 exec_time=0 error_code=0
SET TIMESTAMP=1345113806
insert into LOGTABLENAME (UpdateDate, Description) values (now(), "Invalid floating point operation");
# at 410327741
#120816 3:44:26 server id 1 log_pos 322754486 Intvar
SET INSERT_ID=3842701;
# at 410327769
#120816 3:43:26 server id 1 log_pos 322754514 Query thread_id=762340 exec_time=0 error_code=0
SET TIMESTAMP=1345113866;
insert into LOGTABLENAME (UpdateDate, Description) values (now(), "Invalid floating point operation");
# at 410327912
...
Interesting that it's logging an Invalid floating point operation at that point, but I'm not sure how that could cause replication to break at that position. I ran mysqlbinlog on the master's binary log found in SHOW SLAVE STATUS from above, and did not see any errors on the command line (but did not get a chance to open the 100mb log file that was generated since I didn't want to bog down the production server).
So right now I'm at a loss for what else to try. I'm basically just looking for any insights as to what might be going wrong or any suggestions for what steps to take next. Thanks!
I'm not sure what the root cause may be. But to recover from this situation, you'd want to instruct MySQL to clear out all the relay-bin-logs beyond the following point
by doing the following:
STOP SLAVE; CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000xxx', MASTER_LOG_POS = 322652140; START SLAVE;
NOTE: To readers out there, do not be confused by Relay_Master_Log_File, it is NOT the same as Read_Master_Log_Pos. And do not confuse Exec_Master_Log_Pos with Read_Master_Log_Pos. The Read_* is a read-ahead strategy that MySQL does to download the replication bin logs from the master ahead of the actual implementation of the replication being executed locally.