最近MySQL经常报主从错误,如果是网络问题,应该是间断性的,但是今天早上一直在报错。上去看了一下log,发现一直在报这个错误:
150527 11:30:01 [Note] Slave I/O thread killed while connecting to master
150527 11:30:01 [Note] Slave I/O thread exiting, read up to log 'master-bin.000052', position 308678027
150527 11:30:01 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000052' at position 308678027, relay log './copyer1-relay-bin.002681' position: 5382766
150527 11:30:02 [ERROR] Slave I/O: error connecting to master 'copyer@10.242.73.34:3306' - retry-time: 60 retries: 86400, Error_code: 1129
在命令行手动执行远程连接:
[xxx@xxx-xxx-xxx.xxx mysql-5.1.73-linux-x86_64-glibc23]$ bin/mysql -h10.242.73.34 -ucoyper -p
Enter password:
ERROR 1129 (HY000): Host 'xxxxx' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
这下子对1129错误码有了更明确的错误信息和指示。谷歌一下,根据官方文档B.5.2.6 Host ‘host_name’ is blocked 进行相应的操作:
登陆到master,执行:
mysql> flush hosts;
Query OK, 0 rows affected (0.00 sec)
slave立马就可以连接成功了。
再把允许的错误连接调大:
mysql> SET GLOBAL max_connect_errors=10000;
Query OK, 0 rows affected (0.00 sec)
固化在配置文件中: [root@xxxx ~]# vim /etc/my.cnf max_connect_errors = 10000
但是正如官方文档所说的,这个错误一般是因为网络原因,但是如果是网络原因,调整这个参数其实对于主从同步没有什么帮助。。
If you get the Host ‘host_name’ is blocked error message for a given host, you should first verify that there is nothing wrong with TCP/IP connections from that host. If you are having network problems, it does you no good to increase the value of the max_connect_errors variable.
但是同步问题还是一直存在,统计了一下,发现主要是如下错误码
[xxx@xxx log]$ grep 'Error_code:' mysqld.log| awk '{print $NF}'|sort | uniq -c | sort -n
1 2003
2 1159
17 1129
83 2013
对照官方错误码文档Appendix B Errors, Error Codes, and Common Problems,主要是如下错误:
Error: 1129 SQLSTATE: HY000 (ER_HOST_IS_BLOCKED) Message: Host ‘%s’ is blocked because of many connection errors; unblock with ‘mysqladmin flush-hosts’
Error: 1159 SQLSTATE: 08S01 (ER_NET_READ_INTERRUPTED) Message: Got timeout reading communication packets
Error: 2003 (CR_CONN_HOST_ERROR) Message: Can’t connect to MySQL server on ‘%s’ (%d)
Error: 2013 (CR_SERVER_LOST) Message: Lost connection to MySQL server during query
一看很明显,都是网络问题,1129前面已经说过了。现在看报的最多的2013。同样在官方文档上也有相应的说明:
建议是调整timeout相关的参数看看5.1.4 Server System Variables:
- connect_timeout
- net_read_timeout
- net_write_timeout
- slave_net_timeout
- wait_timeout
使用show variables;
可以看到现在的值:
- connect_timeout = 10
- interactive_timeout = 28800
-
net_read_timeout = 30 - net_retry_count = 10
- net_write_timeout = 60
- slave_net_timeout = 3600
- wait_timeout = 28800
可以用SET语句进行动态修改:
mysql> SET GLOBAL net_read_timeout=60;
mysql> SET GLOBAL net_write_timeout=120;
改了一下,貌似好一些,但是还是会不时报错。。网络问题,没有太好的办法:(