We run a MySQL master-master replication cluster on version 5.1.37. One of the nodes, A, went through an abnormal OS reboot; after the database came back up everything looked normal. But before long the other node, B, started reporting errors:
091127 21:50:21 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)
091127 21:50:21 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
091127 21:50:21 [Note] Slave I/O thread exiting, read up to log 'MySQL-bin.000535', position 193022771
Slave_IO_Running had gone to No, i.e. the slave I/O thread had exited. Reading the error message carefully: the slave is trying to resume replication from position 193022771 in MySQL-bin.000535, but no such position exists in that log.
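On node B, SHOW SLAVE STATUS is where these numbers come from. A minimal sketch of the relevant fields (the file name, position and error code are taken from the log above; the other values are illustrative, not copied from the actual server):

mysql> SHOW SLAVE STATUS\G
...
        Master_Log_File: MySQL-bin.000535
    Read_Master_Log_Pos: 193022771
       Slave_IO_Running: No
      Slave_SQL_Running: Yes
          Last_IO_Errno: 1236
          Last_IO_Error: Got fatal error 1236 from master when reading data from binary log
...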
On node A I dumped the log with mysqlbinlog and found that the last valid position is 193009460, so the requested 193022771 is already past the end of the file. At first I couldn't explain this. Could it be that the crash on A left the tail of its binlog unflushed to disk, while B's slave had already read and recorded events beyond the point that survived?
$ mysqlbinlog MySQL-bin.000535 > 1.txt
$ tail -n 7 1.txt
# at 193009460
#091127 20:50:21 server id 1 end_log_pos 193009487 Xid = 194299849
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
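If the unflushed-binlog theory is right, the knob that matters is sync_binlog: with the default sync_binlog=0 the server leaves binlog flushing to the OS, so an OS crash can truncate the tail of the binlog even though B has already pulled those events. A sketch of how one could check and tighten this on A (the values are a suggestion, not what these servers were actually running; setting them to 1 costs an extra fsync on every commit):

mysql> SHOW GLOBAL VARIABLES LIKE 'sync_binlog';
mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- force an fsync of the binlog and the InnoDB redo log on every commit
mysql> SET GLOBAL sync_binlog = 1;
mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 1;
-- persist the same values in my.cnf so they survive a restart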
As a temporary fix, I pointed B's slave back at the last valid position with CHANGE MASTER TO, and replication resumed:
change master to master_log_file='MySQL-bin.000535', master_log_pos=193009460;
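For the record, the full sequence on B was roughly the following (CHANGE MASTER TO refuses to run against a running slave, hence the STOP SLAVE first; note that any events B had already pulled beyond position 193009460 no longer exist on A's disk, so the two nodes should be compared for data drift afterwards):

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO
    ->   MASTER_LOG_FILE='MySQL-bin.000535',
    ->   MASTER_LOG_POS=193009460;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
-- confirm that Slave_IO_Running and Slave_SQL_Running are both Yes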
A quick web search showed that logzgh had hit the same problem before, on version 5.0.51.