Check slave DB servers
Based on Maatkit software: http://www.maatkit.org/
Setup before you begin
Problem: You have a Slave with wrong data or missing data.
Solution: You force resynchronizing the Master to the Slave. This is not a built-in feature of MySQL, but Maatkit does what you need.
Prerequisits:
Download Maatkit from http://code.google.com/p/maatkit/downloads/list and install it.
You need a table which will store CRC checks of other tables. Create it on the Master, which will be replicated to the Slaves. Our CRC table here will be 000_checksum. Stored in database MyDatabase, as this is the replicated database here.
If the table doesn't exist, create it on the Master with:
CREATE TABLE 000_checksum (
db char(64) NOT NULL,
tbl char(64) NOT NULL,
chunk int NOT NULL,
boundaries char(100) NOT NULL,
this_crc char(40) NOT NULL,
this_cnt int NOT NULL,
master_crc char(40) NULL,
master_cnt int NULL,
ts timestamp NOT NULL,
PRIMARY KEY (db, tbl, chunk)
);
Once you have the needed stuff, you can start the work. We have 2 steps:
1.Start the CRC checks on the Master, the Master will fill the 000_checksum table with his own CRCs, of course they match. The Slaves will also fill the 000_checksum table with their own CRCs, these differ in case of bad data on the Slave.
2.Once the CRCs are available, go to each Slave to check the CRCs. If they differ, run on this Slave the command to start replicating the wrong data to good data. Even if you have too much data on a Slave (which shouldn't be the case), this bad data will be deleted!
Find the problems on a Slave
On the Master DB server (we check the table “MyTable” here):
fred@db01:~$ mk-table-checksum --replicate=MyDatabase.000_checksum --databases MyDatabase --tables MyTable --chunk-size 300000 h=localhost,u=root
On the a Slave DB server:
fred@db02:~$ mysql -u root MyDatabase
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 546695 to server version: 5.0.22-Debian_0ubuntu6.06.5-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
1:SLAVE root@localhost [MyDatabase]>select * from 000_checksum;
+-------+-----+-------+------------------------------------+----------+----------+------------+------------+---------------------+
| db | tbl | chunk | boundaries | this_crc | this_cnt | master_crc | master_cnt | ts |
+-------+-----+-------+------------------------------------+----------+----------+------------+------------+---------------------+
| MyDatabase | MyTable | 0 | `id` < 413945 | d2251c68 | 234805 | d2251c68 | 234805 | 2009-07-08 12:23:22 |
[...]
+-------+-----+-------+------------------------------------+----------+----------+------------+------------+---------------------+
17 rows in set (0.00 sec)
You see that this Slave has not the same data as his Master. Column this_crc is the CRC of the Slave, master_crc is the CRC of ths Slave's Master. Here you have a row in 000_checksum per +-300000 data rows in MyTable because we started the checking command with the “--chunk-size 300000” option.
Fix the data problems on a Slave
Run on the bad Slave
fred@db02:~$ mk-table-sync --print --sync-to-master --replicate=MyDatabase.000_checksum h=db01,u=MyLogin,p=MyPass,D=MyDatabase,t=MyTable h=localhost,u=root,p=
And you will see the queries flowing on your screen that will fix the bad data on the Slave. Once you've checked Maatkit will do the right thing (check if indeed data is not the same as on the Master, if the data is missing on the Slave, etc), run the same command but without --print. This will execute the queries …