Backup and Restore

The following sections describe backup and restoration for a mongod instance.

Backup and Restore with Filesystem Snapshots – This section describes a procedure for creating backups of MongoDB systems using system-level tools, such as LVM or a storage appliance, as well as the corresponding restoration strategies. These filesystem snapshots, or “block-level” backup methods, use system-level tools to create copies of the device that holds MongoDB’s data files. These methods complete quickly and work reliably, but require more system configuration outside of MongoDB.

Snapshots Overview – Snapshots work by creating pointers between the live data and a special snapshot volume. These pointers are theoretically equivalent to “hard links.” As the working data diverges from the snapshot, the snapshot process uses a copy-on-write strategy. As a result, the snapshot only stores modified data. After making the snapshot, you mount the snapshot image on your file system and copy data from the snapshot. The resulting backup contains a full copy of all data. Snapshots have the following limitations:

The database must be valid when the snapshot takes place. This means that all writes accepted by the database need to be fully written to disk: either to the journal or to data files. If all writes are not on disk when the backup occurs, the backup will not reflect these changes. If writes are in progress when the backup occurs, the data files will reflect an inconsistent state. With journaling all data-file states resulting from in-progress writes are recoverable; without journaling you must flush all pending writes to disk before running the backup operation and must ensure that no writes occur during the entire backup procedure. If you do use journaling, the journal must reside on the same volume as the data.
Snapshots create an image of an entire disk. Unless you need to back up your entire system, consider isolating your MongoDB data files, journal (if applicable), and configuration on one logical disk that doesn’t contain any other data. Alternatively, store all MongoDB data files on a dedicated device so that you can make backups without duplicating extraneous data.
Copy data from snapshots onto other systems so that the data is safe from site failures.
Although different snapshot methods provide different capabilities, the LVM method outlined below does not provide any capacity for capturing incremental backups.

Snapshots With Journaling – If your mongod instance has journaling enabled, then you can use any kind of file system or volume/block level snapshot tool to create backups. If you manage your own infrastructure on a Linux-based system, configure your system with LVM to provide your disk packages and provide snapshot capability. You can also use LVM-based setups within a cloud/virtualized environment. Running LVM provides additional flexibility and enables the possibility of using snapshots to back up MongoDB.

Snapshots with Amazon EBS in a RAID 10 Configuration – If your deployment depends on Amazon’s Elastic Block Store (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform’s snapshot tool. As an alternative, you can do one of the following:

Flush all writes to disk and create a write lock to ensure consistent state during the backup process.
Configure LVM to run and hold your MongoDB data files on top of the RAID within your system. If you choose this option, perform the LVM backup operation described in Create a Snapshot.

Backup and Restore Using LVM on a Linux System – This section provides an overview of a simple backup process using LVM on a Linux system. While the tools, commands, and paths may be (slightly) different on your system, the following steps provide a high-level overview of the backup operation. Only use the following procedure as a guideline for a backup system and infrastructure. Production backup systems must consider a number of application-specific requirements and factors unique to specific environments.

Create a Snapshot – To create a snapshot with LVM, issue a command as root in the following format:

lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb

This command creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb volume in the vg0 volume group. The snapshot is located at /dev/vg0/mdb-snap01. The location and paths to your system’s volume groups and devices may vary slightly depending on your operating system’s LVM configuration. The snapshot has a cap of 100 megabytes, because of the parameter --size 100M. This size does not reflect the total amount of the data on the disk, but rather the quantity of differences between the current state of /dev/vg0/mongodb and the state captured when the snapshot was created (i.e. /dev/vg0/mdb-snap01).

Ensure that you create snapshots with enough space to account for data growth, particularly for the period of time that it takes to copy data out of the system or to a temporary image. If your snapshot runs out of space, the snapshot image becomes unusable. Discard this logical volume and create another.
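Because a snapshot that fills up becomes unusable, it can help to watch how much of the reserved space is in use while you copy data out. The following is a minimal sketch rather than part of the procedure: it assumes that your version of lvs supports the data_percent report field and prints a dot as the decimal separator. The function names snapshot_usage and snapshot_nearly_full are hypothetical.

```shell
# snapshot_usage DEVICE
# Prints the fill level of a snapshot volume in percent (requires root and LVM).
# Assumption: lvs supports the data_percent report field.
snapshot_usage() {
    lvs --noheadings -o data_percent "$1" | tr -d ' '
}

# snapshot_nearly_full USED LIMIT
# Succeeds when USED (a percentage such as 85.32) has reached LIMIT.
snapshot_nearly_full() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used + 0 >= limit + 0) }'
}

# Example use (not executed here):
#   used=$(snapshot_usage /dev/vg0/mdb-snap01)
#   if snapshot_nearly_full "$used" 80; then
#       echo "mdb-snap01 is ${used}% full" >&2
#   fi
```

The 80 percent limit in the example is arbitrary. Choose a limit that leaves enough headroom for the writes you expect while the copy runs.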

The snapshot will exist when the command returns. You can restore directly from the snapshot at any time, or you can create a new logical volume and restore from the snapshot to that alternate image. While snapshots are great for creating high quality backups very quickly, they are not ideal as a format for storing backup data. Snapshots typically depend on and reside on the same storage infrastructure as the original disk images. Therefore, it’s crucial that you archive these snapshots and store them elsewhere.

Archive a Snapshot – After creating a snapshot, mount the snapshot and move the data to separate storage. Your system might try to compress the backup images as you move them offline. The following procedure fully archives the data from the snapshot:

umount /dev/vg0/mdb-snap01

dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz

The above command sequence does the following:

Ensures that the /dev/vg0/mdb-snap01 device is not mounted.
Performs a block level copy of the entire snapshot image using the dd command and compresses the result in a gzipped file in the current working directory.

This command will create a large gz file in your current working directory. Make sure that you run this command in a file system that has enough free space.
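An archive is only useful if it decompresses to the same bytes as the snapshot. The following sketch wraps the dd and gzip pipeline shown above and then compares checksums of the source and of the decompressed archive. It is an illustration under stated assumptions: the function name archive_snapshot is hypothetical, the snapshot must be unmounted so that it does not change between the two reads, and reading the device a second time doubles the I/O cost.

```shell
# archive_snapshot SOURCE ARCHIVE
# Copies SOURCE block by block into a gzip archive and verifies the result.
archive_snapshot() {
    src=$1
    archive=$2
    # Block-level copy of the image, compressed on the fly.
    dd if="$src" bs=1048576 2>/dev/null | gzip > "$archive" || return 1
    # Checksum the source and the decompressed archive; both are read
    # from standard input so that cksum prints no file name.
    want=$(cksum < "$src") || return 1
    got=$(gzip -dc "$archive" | cksum) || return 1
    [ "$want" = "$got" ]
}

# Example use (not executed here):
#   umount /dev/vg0/mdb-snap01
#   archive_snapshot /dev/vg0/mdb-snap01 mdb-snap01.gz || echo "archive failed" >&2
```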

Restore a Snapshot – To restore a snapshot created with the above method, issue the following sequence of commands:

lvcreate --size 1G --name mdb-new vg0

gzip -d -c mdb-snap01.gz | dd of=/dev/vg0/mdb-new

mount /dev/vg0/mdb-new /srv/mongodb

The above sequence does the following:

Creates a new logical volume named mdb-new, in the /dev/vg0 volume group. The path to the new device will be /dev/vg0/mdb-new. This volume will have a maximum size of 1 gigabyte. The original file system must have had a total size of 1 gigabyte or smaller, or else the restoration will fail. Change 1G to your desired volume size.
Uncompresses and unarchives the mdb-snap01.gz into the mdb-new disk image.
Mounts the mdb-new disk image to the /srv/mongodb directory. Modify the mount point to correspond to your MongoDB data file location, or other location as needed.

The restored snapshot will have a stale mongod.lock file. If you do not remove this file from the snapshot, MongoDB may assume that the stale lock file indicates an unclean shutdown. If you’re running with storage.journal.enabled enabled and you do not use db.fsyncLock(), you do not need to remove the mongod.lock file. If you use db.fsyncLock(), you will need to remove the lock.
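For the case in which the lock file must be removed, the step can be sketched as a small helper. This is an illustration only: the function name remove_stale_lock is hypothetical, and the helper assumes that no mongod process is using the directory when you call it.

```shell
# remove_stale_lock DBPATH
# Deletes DBPATH/mongod.lock if it exists and reports what happened.
# Assumption: no mongod process is running against DBPATH.
remove_stale_lock() {
    lock="$1/mongod.lock"
    if [ -e "$lock" ]; then
        rm -f "$lock" && echo "removed $lock"
    else
        echo "no lock file in $1"
    fi
}

# Example use after mounting the restored volume (not executed here):
#   remove_stale_lock /srv/mongodb
```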

Restore Directly from a Snapshot – To restore a backup without writing to a compressed gz file, use the following sequence of commands:

umount /dev/vg0/mdb-snap01

lvcreate --size 1G --name mdb-new vg0

dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new

mount /dev/vg0/mdb-new /srv/mongodb

Remote Backup Storage – You can implement off-system backups using the combined process and SSH. This sequence is identical to procedures explained above, except that it archives and compresses the backup on a remote system using SSH. Consider the following procedure:

umount /dev/vg0/mdb-snap01

dd if=/dev/vg0/mdb-snap01 | ssh username@example.com 'gzip > /opt/backup/mdb-snap01.gz'

lvcreate --size 1G --name mdb-new vg0

ssh username@example.com gzip -d -c /opt/backup/mdb-snap01.gz | dd of=/dev/vg0/mdb-new

mount /dev/vg0/mdb-new /srv/mongodb

Create Backups on Instances that do not have Journaling Enabled – If your mongod instance does not run with journaling enabled, or if your journal is on a separate volume, obtaining a functional backup of a consistent state is more complicated. As described in this section, you must flush all writes to disk and lock the database to prevent writes during the backup process. If you have a replica set configuration, then for your backup use a secondary which is not receiving reads (i.e. a hidden member).

To flush writes to disk and to “lock” the database (to prevent further writes), issue the db.fsyncLock() method in the mongo shell:

db.fsyncLock();

Perform the backup operation described in Create a Snapshot. To unlock the database after the snapshot has completed, use the following command in the mongo shell:

db.fsyncUnlock();

Changed in version 2.0: MongoDB 2.0 added db.fsyncLock() and db.fsyncUnlock() helpers to the mongo shell. Prior to this version, use the fsync command with the lock option, as follows:

db.runCommand( { fsync: 1, lock: true } );

db.runCommand( { fsync: 1, lock: false } );

The database cannot be locked with db.fsyncLock() while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock(). Disable profiling using db.setProfilingLevel() as follows in the mongo shell:

db.setProfilingLevel(0)

Changed in version 2.2: When used in combination with fsync or db.fsyncLock(), mongod may block some reads, including those from mongodump, when a queued write operation waits behind the fsync lock.
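The lock, snapshot, and unlock steps are easy to get wrong when run by hand, because a failed snapshot can leave the database locked. The following sketch strings the three steps together and always issues the unlock. It rests on several assumptions: the names run and locked_snapshot are hypothetical, the mongo shell connects to the local instance without credentials, and a lock taken by one shell invocation can be released by a later one. By default the script only prints the commands. Set DRY_RUN=0 to execute them.

```shell
DRY_RUN=${DRY_RUN:-1}

# Prints the command in dry-run mode; executes it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# locked_snapshot NAME SIZE ORIGIN
# Flushes and locks the database, creates the LVM snapshot, then unlocks.
locked_snapshot() {
    name=$1
    size=$2
    origin=$3
    run mongo --quiet --eval 'printjson(db.fsyncLock())' || return 1
    run lvcreate --size "$size" --snapshot --name "$name" "$origin"
    status=$?
    # Unlock even when lvcreate failed, so that writes can resume.
    run mongo --quiet --eval 'printjson(db.fsyncUnlock())' || return 1
    return "$status"
}

# Example use (prints the three commands in dry-run mode):
#   locked_snapshot mdb-snap01 100M /dev/vg0/mongodb
```

The function returns the exit status of lvcreate, so a caller can tell a failed snapshot apart from a successful one even though the unlock ran in both cases.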

Restore a Replica Set from MongoDB Backups – This procedure outlines the process for taking MongoDB data and restoring that data into a new replica set. Use this approach for seeding test deployments from production backups as well as part of disaster recovery. You cannot restore a single data set to three new mongod instances and then create a replica set. In this situation MongoDB will force the secondaries to perform an initial sync. The procedures in this document describe the correct and efficient ways to deploy a replica set.

Restore Database into a Single Node Replica Set –

Obtain backup MongoDB database files. These files may come from a file system snapshot. The MongoDB Management Service (MMS) produces MongoDB database files for stored snapshots and point in time snapshots. You can also use mongorestore to restore database files using data created with mongodump.
Start a mongod using data files from the backup as the dbpath. In the following example, /data/db is the dbpath to the data files:

mongod --dbpath /data/db

Convert your standalone mongod process to a single node replica set by shutting down the mongod instance, and restarting it with the --replSet option, as in the following example:

mongod --dbpath /data/db --replSet <replName>

Consider explicitly setting an oplogSizeMB to control the size of the oplog created for this replica set member.

Connect to the mongod instance.
Use rs.initiate() to initiate the new replica set.

Add Members to the Replica Set – MongoDB provides two options for restoring secondary members of a replica set:

Manually copy the database files to each data directory.
Allow initial sync to distribute data automatically.

If your database is large, initial sync can take a long time to complete. For large databases, it might be preferable to copy the database files onto each host. The following sections outline both approaches.

Copy Database Files and Restart mongod Instance – Use the following sequence of operations to “seed” additional members of the replica set with the restored data by copying MongoDB data files directly.

Shut down the mongod instance that you restored. Use --shutdown or db.shutdownServer() to ensure a clean shutdown.
Copy the primary’s data directory into the dbPath of the other members of the replica set. The dbPath is /data/db by default.
Start the mongod instance that you restored.
In a mongo shell connected to the primary, add the secondaries to the replica set using rs.add().

Update Secondaries using Initial Sync – Use the following sequence of operations to “seed” additional members of the replica set with the restored data using the default initial sync operation.

Ensure that the data directories on the prospective replica set members are empty.
Add each prospective member to the replica set. Initial Sync will copy the data from the primary to the other members of the replica set.

Back Up and Restore with MongoDB Tools – This document describes the process for writing and restoring backups to files in binary format with the mongodump and mongorestore tools. Use these tools for backups if other backup methods, such as the MMS Backup Service or file system snapshots, are unavailable.

Back Up a Database with mongodump – mongodump does not dump the content of the local database. To back up all the databases in a cluster via mongodump, you should have the backup role. The backup role provides all the needed privileges for backing up all databases. The role confers no additional access, in keeping with the policy of least privilege.

To back up a given database, you must have read access on the database. Several roles provide this access, including the backup role. To back up the system.profile collection in a database, you must have read access on certain system collections in the database. Several roles provide this access, including the clusterAdmin and dbAdmin roles.

Changed in version 2.6: To back up users and user-defined roles for a given database, you must have access to the admin database. MongoDB stores the user data and role definitions for all databases in the admin database. Specifically, to back up a given database’s users, you must have the find action on the admin database’s admin.system.users collection. The backup and userAdminAnyDatabase roles both provide this privilege.

To back up the user-defined roles on a database, you must have the find action on the admin database’s admin.system.roles collection. Both the backup and userAdminAnyDatabase roles provide this privilege.

Basic mongodump Operations – The mongodump utility can back up data by either:

connecting to a running mongod or mongos instance, or
accessing data files without an active instance.

The utility can create a backup for an entire server, database or collection, or can use a query to back up just part of a collection. When you run mongodump without any arguments, the command connects to the MongoDB instance on the local system (e.g. 127.0.0.1 or localhost) on port 27017 and creates a database backup named dump/ in the current directory. To back up data from a mongod or mongos instance running on the same machine and on the default port of 27017, use the following command:

mongodump

The data format used by mongodump from version 2.2 or later is incompatible with earlier versions of mongod. Do not use recent versions of mongodump to back up older data stores. To limit the amount of data included in the database dump, you can specify --db and --collection as options to the mongodump command. For example:

mongodump --dbpath /data/db/ --out /data/backup/

mongodump --host mongodb.example.net --port 27017

mongodump will write BSON files that hold a copy of data accessible via the mongod listening on port 27017 of the mongodb.example.net host.

mongodump --collection collection --db test

This command creates a dump of the collection named collection from the database test in a dump/ subdirectory of the current working directory.

Point in Time Operation Using Oplogs – Use the --oplog option with mongodump to collect the oplog entries to build a point-in-time snapshot of a database within a replica set. With --oplog, mongodump copies all the data from the source database as well as all of the oplog entries from the beginning of the backup procedure until the backup procedure completes. This backup procedure, in conjunction with mongorestore --oplogReplay, allows you to restore a backup that reflects the specific moment in time that corresponds to when mongodump completed creating the dump file.

Create Backups Without a Running mongod Instance – If your MongoDB instance is not running, you can use the --dbpath option to specify the location of your MongoDB instance’s database files. mongodump reads from the data files directly with this operation. This locks the data directory to prevent conflicting writes. The mongod process must not be running or attached to these data files when you run mongodump in this configuration. Consider the following example of backing up a MongoDB instance without a running mongod. Given a MongoDB instance that contains the customers, products, and suppliers databases, the following mongodump operation backs up the databases using the --dbpath option, which specifies the location of the database files on the host:

mongodump --dbpath /data -o dataout

The --out option (abbreviated as -o in the command above) allows you to specify the directory where mongodump will save the backup. mongodump creates a separate backup directory for each of the backed up databases: dataout/customers, dataout/products, and dataout/suppliers.

Create Backups from Non-Local mongod Instances – The --host and --port options for mongodump allow you to connect to and back up from a remote host. Consider the following example:

mongodump --host mongodb1.example.net --port 3017 --username user --password pass --out /opt/backup/mongodump-2013-10-24

On any mongodump command you may, as above, specify username and password credentials to authenticate to the database.
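The example above writes to a directory whose name ends in the date of the backup. For scheduled backups, that name can be derived rather than typed. The following sketch shows one way to do so. The function name backup_dir is hypothetical, and the mongodump-YYYY-MM-DD naming scheme is taken from the example path rather than required by mongodump.

```shell
# backup_dir BASE [DATE]
# Prints BASE/mongodump-DATE; DATE defaults to today in YYYY-MM-DD form.
backup_dir() {
    echo "$1/mongodump-${2:-$(date +%Y-%m-%d)}"
}

# Example use (not executed here; add credentials as your deployment requires):
#   mongodump --host mongodb1.example.net --port 3017 \
#       --out "$(backup_dir /opt/backup)"
```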

Restore a Database with mongorestore – Changed in version 2.6. To restore users and user-defined roles on a given database, you must have access to the admin database. MongoDB stores the user data and role definitions for all databases in the admin database. Specifically, to restore users to a given database, you must have the insert action on the admin database’s admin.system.users collection. The restore role provides this privilege. To restore user-defined roles to a database, you must have the insert action on the admin database’s admin.system.roles collection. The restore role provides this privilege.

Basic mongorestore Operations – The mongorestore utility restores a binary backup created by mongodump. By default, mongorestore looks for a database backup in the dump/ directory. The mongorestore utility can restore data either by:

connecting to a running mongod or mongos directly, or
writing to a set of MongoDB data files without use of a running mongod.

mongorestore can restore either an entire database backup or a subset of the backup. To use mongorestore to connect to an active mongod or mongos, use a command with the following prototype form:

mongorestore --port <port number> <path to the backup>

To use mongorestore to write to data files without using a running mongod, use a command with the following prototype form:

mongorestore --dbpath <database path> <path to the backup>

Consider the following example:

mongorestore dump-2013-10-25/

Here, mongorestore imports the database backup in the dump-2013-10-25 directory to the mongod instance running on the localhost interface.

Restore Point in Time Oplog Backup – If you created your database dump using the --oplog option to ensure a point-in-time snapshot, call mongorestore with the --oplogReplay option, as in the following example:

mongorestore --oplogReplay

You may also consider using the mongorestore --objcheck option to check the integrity of objects while inserting them into the database, or you may consider the mongorestore --drop option to drop each collection from the database before restoring from backups.
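Replaying the oplog only makes sense for a dump that was taken with --oplog. The following sketch pairs the two operations and refuses to replay when the dump directory lacks the oplog.bson file that mongodump --oplog writes at the top level of its output. The names run, dump_with_oplog, and restore_with_oplog are hypothetical, and connection options are left out for brevity. By default the script only prints the commands. Set DRY_RUN=0 to execute them.

```shell
DRY_RUN=${DRY_RUN:-1}

# Prints the command in dry-run mode; executes it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# dump_with_oplog DIR
# Dumps all data plus the oplog entries recorded during the dump.
dump_with_oplog() {
    run mongodump --oplog --out "$1"
}

# restore_with_oplog DIR
# Restores DIR and replays its oplog; fails if DIR has no oplog.bson.
restore_with_oplog() {
    if [ ! -f "$1/oplog.bson" ]; then
        echo "no oplog.bson in $1; was the dump taken with --oplog?" >&2
        return 1
    fi
    run mongorestore --oplogReplay "$1"
}
```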

Restore a Subset of data from a Binary Database Dump – mongorestore also includes the ability to apply a filter to all input before inserting it into the new database. Consider the following example:

mongorestore --filter '{"field": 1}'

Here, mongorestore only adds documents to the database from the dump located in the dump/ folder if the documents have a field named field that holds a value of 1. Enclose the filter in single quotes (e.g. ') to prevent the filter from interacting with your shell environment.

Restore Without a Running mongod – mongorestore can write data to MongoDB data files without needing to connect to a mongod directly. As an example of restoring a database without a running mongod, consider a set of backed up databases in the /data/backup/ directory:

/data/backup/customers,

/data/backup/products, and

/data/backup/suppliers

The following mongorestore command restores the products database. The command uses the --dbpath option to specify the path to the MongoDB data files:

mongorestore --dbpath /data/db --journal /data/backup/products

mongorestore imports the database backup in the /data/backup/products directory into the data files located in /data/db. Because the command uses the --dbpath option, the operation imports the backup without a running mongod. The --journal option ensures that mongorestore records all operations in the durability journal. The journal prevents data file corruption if anything (e.g. power failure, disk failure, etc.) interrupts the restore operation.

Restore Backups to Non-Local mongod Instances – By default, mongorestore connects to a MongoDB instance running on the localhost interface (e.g. 127.0.0.1) and on the default port (27017). If you want to restore to a different host or port, use the --host and --port options.

Consider the following example:

mongorestore --host mongodb1.example.net --port 3017 --username user --password pass /opt/backup/mongodump-2013-10-24

As above, you may specify username and password credentials if your mongod requires authentication.