MySQL Replication fails with error "Could not parse relay log event entry."

nolliecrook · Aug 23, 2012

I've searched Google thoroughly for a definitive solution or set of steps to resolve this issue, but there don't seem to be many high-quality results, and I haven't found the question on Stack Overflow. We're trying to set up MySQL replication with a single slave. The slave appears to replicate fine for a while, and then the following error occurs:

Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

To benefit the many people who will inevitably stumble upon this question from a search, it would be helpful if an answer gave an overview of what can go wrong and the steps to take to resolve it. I will also provide more details about my particular situation below, in the hope that someone can help me solve it.


The dump that we imported into the slave to get it started was created using the following command on the master:

mysqldump --opt --allow-keywords -q -uroot -ppassword dbname > E:\Backups\dbname.sql
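
(Side note for readers: adding --master-data=2 to a dump command like the one above makes mysqldump write a commented-out CHANGE MASTER TO statement containing the master's binary log coordinates into the dump itself, so the position doesn't have to be recorded separately. A sketch of that variant, not the command we ran:)

mysqldump --opt --allow-keywords -q --master-data=2 -uroot -ppassword dbname > E:\Backups\dbname.sql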

The script that performs this backup also logs the master's current binary log position. We then took the following steps to start replication on the slave:

1. STOP SLAVE;
2. DROP DATABASE dbname;
3. SOURCE dbname.sql;
    (... waited a few hours for the 10gb dump to import)
4. RESET SLAVE;
5. CHANGE MASTER TO MASTER_HOST='[masterhostname]', MASTER_USER='[slaveusername]', MASTER_PASSWORD='[slaveuserpassword]', MASTER_PORT=[port], MASTER_LOG_FILE='[masterlogfile]', MASTER_LOG_POS=[masterlogposition];
6. START SLAVE;
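
At this point the slave's health can be sanity-checked with the following (field names as printed by SHOW SLAVE STATUS on this MySQL version):

SHOW SLAVE STATUS\G
-- Slave_IO_Running and Slave_SQL_Running should both say Yes,
-- Last_Error should be empty, and Seconds_Behind_Master should shrink toward 0.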

After about a day of replication running fine, it failed again at 3:43 AM. The first thing to appear in MySQL's error log was the error above, followed by another, more generic error with the same timestamp:

Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '[masterlogfile]' position [masterlogpos]

For more logging information, I had set up a batch script to run "SHOW SLAVE STATUS" and "SHOW FULL PROCESSLIST" every hour (a minimal sketch of that script follows the output below). Here are the results from the hour before and the hour after the failure:

--Monitoring: 3:00:00.15 

Slave Status: 
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 192.168.xxx.xxx
                Master_User: slave_user
                Master_Port: xxxx
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000xxx
        Read_Master_Log_Pos: 316611912
             Relay_Log_File: dbname-relay-bin.00000x
              Relay_Log_Pos: 404287513
      Relay_Master_Log_File: mysql-bin.000xxx
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: dbname
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 0
                 Last_Error: 
               Skip_Counter: 0
        Exec_Master_Log_Pos: 316611912
            Relay_Log_Space: 404287513
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: 0

*************************** 1. row ***************************
     Id: 98
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 60547
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 99
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 5
  State: Has read all relay log; waiting for the slave I/O thread to update it
   Info: NULL
*************************** 3. row ***************************
     Id: 119
   User: root
   Host: localhost:xxxx
     db: NULL
Command: Query
   Time: 0
  State: NULL
   Info: SHOW FULL PROCESSLIST

--Monitoring: 4:00:02.71 

Slave Status: 
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 192.168.xxx.xxx
                Master_User: slave_user
                Master_Port: xxxx
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000xxx
        Read_Master_Log_Pos: 324365637
             Relay_Log_File: dbname-relay-bin.00000x
              Relay_Log_Pos: 410327741
      Relay_Master_Log_File: mysql-bin.000xxx
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB: dbname
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 0
                 Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
               Skip_Counter: 0
        Exec_Master_Log_Pos: 322652140
            Relay_Log_Space: 412041238
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: NULL

*************************** 1. row ***************************
     Id: 98
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 64149
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 122
   User: root
   Host: localhost:3029
     db: NULL
Command: Query
   Time: 0
  State: NULL
   Info: SHOW FULL PROCESSLIST
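
For anyone who wants to reproduce the hourly check, a batch script along these lines is all that's needed (a rough sketch; the credentials and log file path are placeholders):

echo --Monitoring: %TIME% >> E:\Backups\slave_monitor.log
echo Slave Status: >> E:\Backups\slave_monitor.log
mysql -uroot -ppassword -e "SHOW SLAVE STATUS\G" >> E:\Backups\slave_monitor.log
mysql -uroot -ppassword -e "SHOW FULL PROCESSLIST\G" >> E:\Backups\slave_monitor.log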

I tried following the instructions in the error message and ran mysqlbinlog on the slave's relay log, with --start-position set thousands of entries before the point of failure and --stop-position set thousands of entries after it, redirecting the output to a text file. I did not see any corruption errors on the command line or in the resulting file. This is what it contained around the point of failure:

...
# at 410327570
#120816 3:43:26 server id 1 log_pos 322651969    Intvar
SET INSERT_ID=3842697;
# at 410327598
#120816 3:43:26 server id 1 log_pos 322651997    Query    thread_id=762340    exec_time=0   error_code=0
SET TIMESTAMP=1345113806
insert into LOGTABLENAME (UpdateDate, Description) values (now(), "Invalid floating point operation");
# at 410327741
#120816 3:44:26 server id 1 log_pos 322754486    Intvar
SET INSERT_ID=3842701;
# at 410327769
#120816 3:43:26 server id 1 log_pos 322754514    Query    thread_id=762340    exec_time=0   error_code=0
SET TIMESTAMP=1345113866;
insert into LOGTABLENAME (UpdateDate, Description) values (now(), "Invalid floating point operation");
# at 410327912
...

Interesting that an "Invalid floating point operation" is being logged by the application at that point, but I'm not sure how that could cause replication to break at that position. I also ran mysqlbinlog on the master's binary log named in the SHOW SLAVE STATUS output above and saw no errors on the command line (but I did not get a chance to open the 100 MB file it produced, since I didn't want to bog down the production server).
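
For reference, the two mysqlbinlog inspections amounted to roughly the following (the file names are the masked ones from SHOW SLAVE STATUS, and the byte positions are placeholders bracketing the failure point):

mysqlbinlog --start-position=410000000 --stop-position=410400000 dbname-relay-bin.00000x > relay_log_check.txt
mysqlbinlog mysql-bin.000xxx > master_binlog_check.txt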

So right now I'm at a loss for what else to try. I'm basically just looking for any insights as to what might be going wrong or any suggestions for what steps to take next. Thanks!

Answer

Wood Guardian · Jan 21, 2013

I'm not sure what the root cause may be, but to recover from this situation you want to instruct MySQL to discard its existing relay logs and resume from the following point

  • Relay_Master_Log_File: mysql-bin.000xxx
  • Exec_Master_Log_Pos: 322652140

by doing the following:

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000xxx', MASTER_LOG_POS = 322652140;
START SLAVE;

NOTE: To readers out there, do not confuse Relay_Master_Log_File with Master_Log_File, and do not confuse Exec_Master_Log_Pos with Read_Master_Log_Pos. Master_Log_File and Read_Master_Log_Pos describe how far the slave's I/O thread has read ahead and downloaded the master's binary log, while Relay_Master_Log_File and Exec_Master_Log_Pos describe how far the SQL thread has actually executed. When repointing the slave, always use the Exec_* coordinates.
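
To make that concrete with the numbers from the question: the failed status shows Read_Master_Log_Pos at 324365637 but Exec_Master_Log_Pos at only 322652140, so the CHANGE MASTER TO above uses 322652140; pointing at the Read_* position would silently skip the unexecuted events in between. After START SLAVE, the recovery can be confirmed with a quick check like this (a sketch, not an additional required step):

SHOW SLAVE STATUS\G
-- Slave_SQL_Running should be back to Yes, Last_Error should be empty,
-- and Exec_Master_Log_Pos should advance past 322652140 as the slave catches up.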