Q: How do I configure a slave if the master is running and I do not want to stop it?
A: There are several
possibilities. If you have taken a snapshot backup of the master
at some point and recorded the binary log filename and offset
(from the output of SHOW MASTER STATUS
)
corresponding to the snapshot, use the following procedure:
Make sure that the slave is assigned a unique server ID.
Execute the following statement on the slave, filling in appropriate values for each option:
mysql>CHANGE MASTER TO
->MASTER_HOST='
->master_host_name
',MASTER_USER='
->master_user_name
',MASTER_PASSWORD='
->master_pass
',MASTER_LOG_FILE='
->recorded_log_file_name
',MASTER_LOG_POS=
recorded_log_position
;
Execute START SLAVE
on the slave.
If you do not have a backup of the master server, here is a quick procedure for creating one. All steps should be performed on the master host.
Issue this statement to acquire a global read lock:
mysql> FLUSH TABLES WITH READ LOCK;
With the lock still in place, execute this command (or a variation of it):
shell> tar zcf /tmp/backup.tar.gz /var/lib/mysql
Issue this statement and record the output, which you will need later:
mysql> SHOW MASTER STATUS;
Release the lock:
mysql> UNLOCK TABLES;
An alternative to using the preceding procedure to make a binary copy is to make an SQL dump of the master. To do this, you can use mysqldump --master-data on your master and later load the SQL dump into your slave. However, this is slower than making a binary copy.
Regardless of which of the two methods you use, afterward follow the instructions for the case when you have a snapshot and have recorded the log filename and offset. You can use the same snapshot to set up several slaves. Once you have the snapshot of the master, you can wait to set up a slave as long as the binary logs of the master are left intact. The two practical limitations on the length of time you can wait are the amount of disk space available to retain binary logs on the master and the length of time it takes the slave to catch up.
You can also use LOAD DATA FROM MASTER
. This is
a convenient statement that transfers a snapshot to the slave and
adjusts the log filename and offset all at once. Be warned,
however, that it works only for MyISAM
tables
and it may hold a read lock for a long time. It is not yet
implemented as efficiently as we would like. If you have large
tables, the preferred method is still to make a binary snapshot on
the master server after executing FLUSH TABLES WITH READ
LOCK
.
Q: Does the slave need to be connected to the master all the time?
A: No, it does not. The slave can go down or stay disconnected for hours or even days, and then reconnect and catch up on updates. For example, you can set up a master/slave relationship over a dial-up link where the link is up only sporadically and for short periods of time. The implication of this is that, at any given time, the slave is not guaranteed to be in synchrony with the master unless you take some special measures.
Q: How do I know how late a slave is compared to the master? In other words, how do I know the date of the last statement replicated by the slave?
A: You can read the
Seconds_Behind_Master
column in SHOW
SLAVE STATUS
. See
Section 6.3, “Replication Implementation Details”.
When the slave SQL thread executes an event read from the master,
it modifies its own time to the event timestamp. (This is why
TIMESTAMP
is well replicated.) In the
Time
column in the output of SHOW
PROCESSLIST
, the number of seconds displayed for the
slave SQL thread is the number of seconds between the timestamp of
the last replicated event and the real time of the slave machine.
You can use this to determine the date of the last replicated
event. Note that if your slave has been disconnected from the
master for one hour, and then reconnects, you may immediately see
Time
values like 3600 for the slave SQL thread
in SHOW PROCESSLIST
. This is because the slave
is executing statements that are one hour old.
Q: How do I force the master to block updates until the slave catches up?
A: Use the following procedure:
On the master, execute these statements:
mysql>FLUSH TABLES WITH READ LOCK;
mysql>SHOW MASTER STATUS;
Record the replication cooredinates (the log filename and
offset) from the output of the SHOW
statement.
On the slave, issue the following statement, where the
arguments to the MASTER_POS_WAIT()
function
are the replication coordinate values obtained in the previous
step:
mysql> SELECT MASTER_POS_WAIT('log_name
', log_offset
);
The SELECT
statement blocks until the slave
reaches the specified log file and offset. At that point, the
slave is in synchrony with the master and the statement
returns.
On the master, issue the following statement to allow the master to begin processing updates again:
mysql> UNLOCK TABLES;
Q: What issues should I be aware of when setting up two-way replication?
A: MySQL replication currently does not support any locking protocol between master and slave to guarantee the atomicity of a distributed (cross-server) update. In other words, it is possible for client A to make an update to co-master 1, and in the meantime, before it propagates to co-master 2, client B could make an update to co-master 2 that makes the update of client A work differently than it did on co-master 1. Thus, when the update of client A makes it to co-master 2, it produces tables that are different from what you have on co-master 1, even after all the updates from co-master 2 have also propagated. This means that you should not chain two servers together in a two-way replication relationship unless you are sure that your updates can safely happen in any order, or unless you take care of mis-ordered updates somehow in the client code.
You should also realize that two-way replication actually does not improve performance very much (if at all) as far as updates are concerned. Each server must do the same number of updates, just as you would have a single server do. The only difference is that there is a little less lock contention, because the updates originating on another server are serialized in one slave thread. Even this benefit might be offset by network delays.
Q: How can I use replication to improve performance of my system?
A: You should set up one server
as the master and direct all writes to it. Then configure as many
slaves as you have the budget and rackspace for, and distribute
the reads among the master and the slaves. You can also start the
slaves with the --skip-innodb
,
--skip-bdb
,
--low-priority-updates
, and
--delay-key-write=ALL
options to get speed
improvements on the slave end. In this case, the slave uses
non-transactional MyISAM
tables instead of
InnoDB
and BDB
tables to get
more speed by eliminating transactional overhead.
Q: What should I do to prepare client code in my own applications to use performance-enhancing replication?
A: If the part of your code that is responsible for database access has been properly abstracted/modularized, converting it to run with a replicated setup should be very smooth and easy. Change the implementation of your database access to send all writes to the master, and to send reads to either the master or a slave. If your code does not have this level of abstraction, setting up a replicated system gives you the opportunity and motivation to it clean up. Start by creating a wrapper library or module that implements the following functions:
safe_writer_connect()
safe_reader_connect()
safe_reader_statement()
safe_writer_statement()
safe_
in each function name means that the
function takes care of handling all error conditions. You can use
different names for the functions. The important thing is to have
a unified interface for connecting for reads, connecting for
writes, doing a read, and doing a write.
Then convert your client code to use the wrapper library. This may be a painful and scary process at first, but it pays off in the long run. All applications that use the approach just described are able to take advantage of a master/slave configuration, even one involving multiple slaves. The code is much easier to maintain, and adding troubleshooting options is trivial. You need modify only one or two functions; for example, to log how long each statement took, or which statement among those issued gave you an error.
If you have written a lot of code, you may want to automate the conversion task by using the replace utility that comes with standard MySQL distributions, or write your own conversion script. Ideally, your code uses consistent programming style conventions. If not, then you are probably better off rewriting it anyway, or at least going through and manually regularizing it to use a consistent style.
Q: When and how much can MySQL replication improve the performance of my system?
A: MySQL replication is most beneficial for a system that processes frequent reads and infrequent writes. In theory, by using a single-master/multiple-slave setup, you can scale the system by adding more slaves until you either run out of network bandwidth, or your update load grows to the point that the master cannot handle it.
To determine how many slaves you can use before the added benefits
begin to level out, and how much you can improve performance of
your site, you need to know your query patterns, and to determine
empirically by benchmarking the relationship between the
throughput for reads (reads per second, or
reads
) and for writes
(writes
) on a typical master and a typical
slave. The example here shows a rather simplified calculation of
what you can get with replication for a hypothetical system.
Let's say that system load consists of 10% writes and 90% reads,
and we have determined by benchmarking that
reads
is 1200 – 2 ×
writes
. In other words, the system can do 1,200
reads per second with no writes, the average write is twice as
slow as the average read, and the relationship is linear. Let us
suppose that the master and each slave have the same capacity, and
that we have one master and N
slaves.
Then we have for each server (master or slave):
reads = 1200 – 2 × writes
reads = 9 × writes / (
(reads are split, but writes go to all servers)
N
+ 1)
9 × writes / (
N
+ 1) + 2
× writes = 1200
writes = 1200 / (2 +
9/(
N
+1))
The last equation indicates the maximum number of writes for
N
slaves, given a maximum possible read
rate of 1,200 per minute and a ratio of nine reads per write.
This analysis yields the following conclusions:
If N
= 0 (which means we have no
replication), our system can handle about 1200/11 = 109 writes
per second.
If N
= 1, we get up to 184 writes
per second.
If N
= 8, we get up to 400 writes
per second.
If N
= 17, we get up to 480 writes
per second.
Eventually, as N
approaches
infinity (and our budget negative infinity), we can get very
close to 600 writes per second, increasing system throughput
about 5.5 times. However, with only eight servers, we increase
it nearly four times.
Note that these computations assume infinite network bandwidth and
neglect several other factors that could be significant on your
system. In many cases, you may not be able to perform a
computation similar to the one just shown that accurately predicts
what will happen on your system if you add
N
replication slaves. However,
answering the following questions should help you decide whether
and by how much replication will improve the performance of your
system:
What is the read/write ratio on your system?
How much more write load can one server handle if you reduce the reads?
For how many slaves do you have bandwidth available on your network?
Q: How can I use replication to provide redundancy or high availability?
A: With the currently available features, you would have to set up a master and a slave (or several slaves), and to write a script that monitors the master to check whether it is up. Then instruct your applications and the slaves to change master in case of failure. Some suggestions:
To tell a slave to change its master, use the CHANGE
MASTER TO
statement.
A good way to keep your applications informed as to the
location of the master is by having a dynamic DNS entry for
the master. With bind
you can use
nsupdate
to dynamically update your DNS.
Run your slaves with the --log-bin
option and
without --log-slave-updates
. In this way, the
slave is ready to become a master as soon as you issue
STOP SLAVE
; RESET
MASTER
, and CHANGE MASTER TO
statement on the other slaves. For example, assume that you
have the following setup:
WC \ v WC----> M / | \ / | \ v v v S1 S2 S3
In this diagram, M
means the master,
S
the slaves, WC
the
clients issuing database writes and reads; clients that issue
only database reads are not represented, because they need not
switch. S1
, S2
, and
S3
are slaves running with
--log-bin
and without
--log-slave-updates
. Because updates received
by a slave from the master are not logged in the binary log
unless --log-slave-updates
is specified, the
binary log on each slave is empty initially. If for some
reason M
becomes unavailable, you can pick
one of the slaves to become the new master. For example, if
you pick S1
, all WC
should be redirected to S1
, which will log
updates to its binary log. S2
and
S3
should then replicate from
S1
.
The reason for running the slave without
--log-slave-updates
is to prevent slaves from
receiving updates twice in case you cause one of the slaves to
become the new master. Suppose that S1
has
--log-slave-updates
enabled. Then it will
write updates that it receives from M
to
its own binary log. When S2
changes from
M
to S1
as its master,
it may receive updates from S1
that it has
already received from M
Make sure that all slaves have processed any statements in
their relay log. On each slave, issue STOP SLAVE
IO_THREAD
, then check the output of SHOW
PROCESSLIST
until you see Has read all
relay log
. When this is true for all slaves, they
can be reconfigured to the new setup. On the slave
S1
being promoted to become the master,
issue STOP SLAVE
and RESET
MASTER
.
On the other slaves S2
and
S3
, use STOP SLAVE
and
CHANGE MASTER TO MASTER_HOST='S1'
(where
'S1'
represents the real hostname of
S1
). To CHANGE MASTER
,
add all information about how to connect to
S1
from S2
or
S3
(user
,
password
,
port
). In CHANGE
MASTER
, there is no need to specify the name of
S1
's binary log or binary log position to
read from: We know it is the first binary log and position 4,
which are the defaults for CHANGE MASTER
.
Finally, use START SLAVE
on
S2
and S3
.
Then instruct all WC
to direct their
statements to S1
. From that point on, all
updates statements sent by WC
to
S1
are written to the binary log of
S1
, which then contains every update
statement sent to S1
since
M
died.
The result is this configuration:
WC / | WC | M(unavailable) \ | \ | v v S1<--S2 S3 ^ | +-------+
When M
is up again, you must issue on it
the same CHANGE MASTER
as that issued on
S2
and S3
, so that
M
becomes a slave of S1
and picks up all the WC
writes that it
missed while it was down. To make M
a
master again (because it is the most powerful machine, for
example), use the preceding procedure as if
S1
was unavailable and M
was to be the new master. During this procedure, do not forget
to run RESET MASTER
on M
before making S1
, S2
,
and S3
slaves of M
.
Otherwise, they may pick up old WC
writes
from before the point at which M
became
unavailable.
Note that there is no synchronization between the different slaves to a master. Some slaves might be ahead of others. This means that the concept outlined in the previous example might not work. In practice, however, the relay logs of different slaves will most likely not be far behind the master, so it would work, anyway (but there is no guarantee).
Q: How do I prevent GRANT and REVOKE statements from replicating to slave machines?
A: Start the server with the
--replicate-wild-ignore-table=mysql.%
option.
Q: Does replication work on mixed operating systems (for example, the master runs on Linux while slaves run on Mac OS X and Windows)?
A: Yes.
Q: Does replication work on mixed hardware architectures (for example, the master runs on a 64-bit machine while slaves run on 32-bit machines)?
A: Yes.