15.8. On-line Backup of MySQL Cluster

15.8.1. Cluster Backup Concepts

A backup is a snapshot of the database at a given time. The backup consists of three main parts:

Metadata: the names and definitions of all database tables
Table records: the data actually stored in the database tables at the time that the backup was made
Transaction log: a sequential record telling how and when data was stored in the database

Each of these parts is saved on all nodes participating in the backup. During backup, each node saves these three parts into three files on disk:

BACKUP-backup_id.node_id.ctl
A control file containing control information and metadata. Each node saves the same table definitions (for all tables in the cluster) to its own version of this file.
BACKUP-backup_id-0.node_id.data
A data file containing the table records, which are saved on a per-fragment basis. That is, different nodes save different fragments during the backup. The file saved by each node starts with a header that states the tables to which the records belong. Following the list of records there is a footer containing a checksum for all records.
BACKUP-backup_id.node_id.log
A log file containing records of committed transactions. Only transactions on tables stored in the backup are stored in the log. Nodes involved in the backup save different records because different nodes host different database fragments.

In the listing above, backup_id stands for the backup identifier and node_id is the unique identifier for the node creating the file.

15.8.2. Using The Management Client to Create a Backup

Before starting a backup, make sure that the cluster is properly configured for performing one. (See Section 15.8.4, “Configuration for Cluster Backup”.)

Creating a backup using the management client involves the following steps:

Start the management client (ndb_mgm).
Execute the command START BACKUP.
The management client will reply with the message Start of backup ordered. This means that the management client has submitted the request to the cluster, but has not yet received any response.
The management client will reply Backup backup_id started, where backup_id is the unique identifier for this particular backup. (This identifier will also be saved in the cluster log, if it has not been configured otherwise.) This means that the cluster has received and processed the backup request. It does not mean that the backup has finished.
The management client will signal that the backup is finished with the message Backup backup_id completed.

To abort a backup that is already in progress:

Start the management client.
Execute the command ABORT BACKUP backup_id. The number backup_id is the identifier of the backup that was included in the response of the management client when the backup was started (in the message Backup backup_id started).
The management client will acknowledge the abort request with Abort of backup backup_id ordered; note that it has received no actual response to this request yet.
After the backup has been aborted, the management client will report Backup backup_id has been aborted for reason XYZ. This means that the cluster has terminated the backup and that all files related to this backup have been removed from the cluster filesystem.

It is also possible to abort a backup in progress from the system shell using this command:

shell> ndb_mgm -e "ABORT BACKUP backup_id"

Note: If there is no backup with ID backup_id running when it is aborted, the management client makes no explicit response. However, the fact that an invalid abort command was sent is indicated in the cluster log.

15.8.3. How to Restore a Cluster Backup

The cluster restoration program is implemented as a separate command-line utility ndb_restore, which reads the files created by the backup and inserts the stored information into the database. The restore program must be executed once for each set of backup files. That is, as many times as there were database nodes running when the backup was created.

The first time you run the ndb_restore restoration program, you also need to restore the metadata. In other words, you must re-create the database tables. (Note that the cluster should have an empty database when starting to restore a backup.) The restore program acts as an API to the cluster and therefore requires a free connection to connect to the cluster. This can be verified with the ndb_mgm command SHOW (you can accomplish this from a system shell using ndb_mgm -e SHOW). The -c connectstring option may be used to locate the MGM node (see Section 15.4.4.2, “The MySQL Cluster connectstring”, for information on connectstrings). The backup files must be present in the directory given as an argument to the restoration program.

It is possible to restore a backup to a database with a different configuration than it was created from. For example, suppose that a backup with backup ID 12, created in a cluster with two database nodes having the node IDs 2 and 3, is to be restored to a cluster with four nodes. Then ndb_restore must be run twice — once for each database node in the cluster where the backup was taken. However, ndb_restore cannot always restore backups made from a cluster running one version of MySQL to a cluster running a different MySQL version. See Section 15.5.2, “Cluster Upgrade and Downgrade Compatibility”, for more information.

Note: For rapid restoration, the data may be restored in parallel, provided that there is a sufficient number of cluster connections available. However, the data files must always be applied before the logs.

15.8.4. Configuration for Cluster Backup

Four configuration parameters are essential for backup:

BackupDataBufferSize
The amount of memory used to buffer data before it is written to disk.
BackupLogBufferSize
The amount of memory used to buffer log records before these are written to disk.
BackupMemory
The total memory allocated in a database node for backups. This should be the sum of the memory allocated for the backup data buffer and the backup log buffer.
BackupWriteSize
The size of blocks written to disk. This applies for both the backup data buffer and the backup log buffer.

More detailed information about these parameters can be found in Section 15.4.4.5, “Defining MySQL Cluster Data Nodes”.

15.8.5. Backup Troubleshooting

If an error code is returned when issuing a backup request, the most likely cause is insufficient memory or insufficient disk space. You should check that there is enough memory allocated for the backup. Also check that there is enough space on the hard drive partition of the backup target.

NDB does not support repeatable reads, which can cause problems with the restoration process. Although the backup process is “hot”, restoring a MySQL Cluster from backup is not a 100% “hot” process. This is due to the fact that, for the duration of the restore process, running transactions get non-repeatable reads from the restored data. This means that the state of the data is inconsistent while the restore is in progress.