Sharded Cluster Maintenance

View Cluster Configuration

List Databases with Sharding Enabled – To list the databases that have sharding enabled, query the databases collection in the Config Database. A database has sharding enabled if the value of the partitioned field is true. Connect to a mongos instance with a mongo shell, and run the following operation to get a full list of databases with sharding enabled:

use config

db.databases.find( { "partitioned": true } )

As an example, you can use the following sequence of commands to return a list of all databases in the cluster:

use config

db.databases.find()

If this returns the following result set:

{ "_id" : "admin", "partitioned" : false, "primary" : "config" }

{ "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" }

{ "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" }

Then sharding is only enabled for the animals database.
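The same filtering can be applied to a result set that has already been fetched. The following is a minimal sketch, not part of the shell's API: shardedDatabaseNames is a hypothetical helper name, and the sample documents are copied from the result set above.

```javascript
// Return the names of databases whose "partitioned" field is true.
// `docs` is an array of documents shaped like those in config.databases.
function shardedDatabaseNames(docs) {
  return docs
    .filter(function (doc) { return doc.partitioned === true; })
    .map(function (doc) { return doc._id; });
}

// Sample documents copied from the example result set.
var sample = [
  { _id: "admin", partitioned: false, primary: "config" },
  { _id: "animals", partitioned: true, primary: "m0.example.net:30001" },
  { _id: "farms", partitioned: false, primary: "m1.example2.net:27017" }
];

var names = shardedDatabaseNames(sample); // ["animals"]
```

In the mongo shell, the input could come from db.getSiblingDB("config").databases.find().toArray().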

List Shards – To list the current set of configured shards, use the listShards command, as follows:

use admin

db.runCommand( { listShards : 1 } )

View Cluster Details – To view cluster details, issue db.printShardingStatus() or sh.status(). Both methods return the same output. In the following example output from sh.status():

sharding version displays the version number of the shard metadata.
shards displays a list of the mongod instances used as shards in the cluster.
databases displays all databases in the cluster, including databases that do not have sharding enabled.
The chunks information for the foo.contacts namespace displays how many chunks are on each shard and displays the range of each chunk.

--- Sharding Status ---

sharding version: { "_id" : 1, "version" : 3 }

shards:

{ "_id" : "shard0000", "host" : "m0.example.net:30001" }

{ "_id" : "shard0001", "host" : "m3.example2.net:50000" }

databases:

{ "_id" : "admin", "partitioned" : false, "primary" : "config" }

{ "_id" : "contacts", "partitioned" : true, "primary" : "shard0000" }

foo.contacts

shard key: { "zip" : 1 }

chunks:

shard0001 2

shard0002 3

shard0000 2

{ "zip" : { "$minKey" : 1 } } -->> { "zip" : "56000" } on : shard0001 { "t" : 2, "i" : 0 }

{ "zip" : 56000 } -->> { "zip" : "56800" } on : shard0002 { "t" : 3, "i" : 4 }

{ "zip" : 56800 } -->> { "zip" : "57088" } on : shard0002 { "t" : 4, "i" : 2 }

{ "zip" : 57088 } -->> { "zip" : "57500" } on : shard0002 { "t" : 4, "i" : 3 }

{ "zip" : 57500 } -->> { "zip" : "58140" } on : shard0001 { "t" : 4, "i" : 0 }

{ "zip" : 58140 } -->> { "zip" : "59000" } on : shard0000 { "t" : 4, "i" : 1 }

{ "zip" : 59000 } -->> { "zip" : { "$maxKey" : 1 } } on : shard0000 { "t" : 3, "i" : 3 }

{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
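The per-shard chunk counts in this output can be recomputed from the chunk ranges themselves. The sketch below assumes each chunk is represented by a document with a "shard" field, as the documents in config.chunks are; chunksPerShard is a hypothetical helper name, and the input lists the seven ranges from the example in order.

```javascript
// Count chunks per shard. `chunks` is an array of documents that each carry
// a "shard" field naming the shard that holds the chunk.
function chunksPerShard(chunks) {
  var counts = {};
  chunks.forEach(function (chunk) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  });
  return counts;
}

// The seven chunk ranges from the example output, reduced to their shard.
var exampleChunks = [
  { shard: "shard0001" }, { shard: "shard0002" }, { shard: "shard0002" },
  { shard: "shard0002" }, { shard: "shard0001" }, { shard: "shard0000" },
  { shard: "shard0000" }
];

var counts = chunksPerShard(exampleChunks);
// counts.shard0001 is 2, counts.shard0002 is 3, counts.shard0000 is 2
```

These totals match the three summary lines printed under chunks in the example output.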

Replace a Config Server – This procedure replaces an inoperable config server in a sharded cluster. Use this procedure only to replace a config server that has become inoperable (e.g. hardware failure). This process assumes that the hostname of the instance will not change. If you must change the hostname of the instance, use the procedure to migrate a config server and use a new hostname.

Disable the cluster balancer process temporarily.
Provision a new system, with the same hostname as the previous host. You will have to ensure that the new system has the same IP address and hostname as the system it’s replacing or you will need to modify the DNS records and wait for them to propagate.
Shut down one (and only one) of the existing config servers. Copy all of this host’s dbPath file system tree from the current system to the system that will provide the new config server. This command, issued on the system with the data files, may resemble the following:

rsync -az /data/configdb mongodb.config2.example.net:/data/configdb

Restart the config server process that you used in the previous step to copy the data files to the new config server instance.
Start the new config server instance. The default invocation is:

mongod --configsvr

Re-enable the balancer to allow the cluster to resume normal balancing operations.

Backup Cluster Metadata – This procedure shuts down the mongod instance of a config server in order to create a backup of a sharded cluster’s metadata. The cluster’s config servers store all of the cluster’s metadata, most importantly the mapping from chunks to shards. When you perform this procedure, the cluster remains operational.

Disable the cluster balancer process temporarily.
Shut down one of the config databases.
Create a full copy of the data files (i.e. the path specified by the dbPath option for the config instance.)
Restart the original configuration server.
Re-enable the balancer to allow the cluster to resume normal balancing operations.

Manage Sharded Cluster Balancer – This section describes common administrative procedures related to balancing.

Check the Balancer State – The following command checks if the balancer is enabled (i.e. that the balancer is allowed to run). The command does not check if the balancer is active (i.e. if it is actively balancing chunks). To see if the balancer is enabled in your cluster, issue the following command, which returns a boolean:

sh.getBalancerState()

Check the Balancer Lock – To see if the balancer process is active in your cluster, do the following:

Connect to any mongos in the cluster using the mongo shell.
Issue the following command to switch to the Config Database: use config
Use the following query to return the balancer lock:

db.locks.find( { _id : "balancer" } ).pretty()

When this command returns, you will see output like the following:

{ "_id" : "balancer",

"process" : "mongos0.example.net:1292810611:1804289383",

"state" : 2,

"ts" : ObjectId("4d0f872630c42d1978be8a2e"),

"when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)",

"who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886",

"why" : "doing balance round" }

This output confirms that:

The balancer originates from the mongos running on the system with the hostname mongos0.example.net.
The value in the state field indicates that a mongos has the lock. For version 2.0 and later, the value of an active lock is 2; for earlier versions the value is 1.
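The two observations above can be read off a lock document programmatically. This is a sketch only: describeBalancerLock is a hypothetical helper, and it assumes the "process" field begins with the hostname followed by colon-separated values, as in the example output.

```javascript
// Interpret a balancer lock document from config.locks.
// `activeState` defaults to 2 (version 2.0 and later); pass 1 for earlier versions.
function describeBalancerLock(lockDoc, activeState) {
  var active = activeState === undefined ? 2 : activeState;
  return {
    held: lockDoc.state === active,
    // Assumption: the hostname is everything before the first ":" in "process".
    host: String(lockDoc.process).split(":")[0]
  };
}

// Fields copied from the example lock document.
var lock = {
  _id: "balancer",
  process: "mongos0.example.net:1292810611:1804289383",
  state: 2,
  why: "doing balance round"
};

var info = describeBalancerLock(lock);
// info.held is true and info.host is "mongos0.example.net"
```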

Schedule the Balancing Window – In some situations, particularly when your data set grows slowly and a migration can impact performance, it’s useful to be able to ensure that the balancer is active only at certain times. Use the following procedure to specify a window during which the balancer will be able to migrate chunks:

Connect to any mongos in the cluster using the mongo shell.
Issue the following command to switch to the Config Database: use config
Use an operation modeled on the following example update() operation to modify the balancer’s window:

db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "<start-time>", stop : "<stop-time>" } } }, true )

Replace <start-time> and <stop-time> with time values using two-digit hour and minute values (e.g. HH:MM) that describe the beginning and end boundaries of the balancing window. These times will be evaluated relative to the time zone of each individual mongos instance in the sharded cluster. If your mongos instances are physically located in different time zones, use a common time zone (e.g. GMT) to ensure that the balancer window is interpreted correctly. For instance, running the following will restrict the balancer to run only between 11PM and 6AM local time:

db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "06:00" } } }, true )

The balancer window must be sufficient to complete the migration of all data inserted during the day. As data insert rates can change based on activity and usage patterns, it is important to ensure that the balancing window you select will be sufficient to support the needs of your deployment.
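A window whose start time is later than its stop time, such as 23:00 to 06:00, wraps past midnight. The sketch below illustrates that wrap-around for HH:MM strings. It is not the balancer's actual implementation: toMinutes and inWindow are hypothetical helpers, and treating the start as inclusive and the stop as exclusive is an assumption made for illustration.

```javascript
// Convert "HH:MM" to minutes after midnight; throws on malformed input.
function toMinutes(hhmm) {
  var m = /^(\d{2}):(\d{2})$/.exec(hhmm);
  if (!m || Number(m[1]) > 23 || Number(m[2]) > 59) {
    throw new Error("expected HH:MM, got " + hhmm);
  }
  return Number(m[1]) * 60 + Number(m[2]);
}

// True when `now` falls inside the window [start, stop).
// A window whose start is later than its stop wraps past midnight.
function inWindow(start, stop, now) {
  var s = toMinutes(start), e = toMinutes(stop), n = toMinutes(now);
  return s <= e ? (n >= s && n < e) : (n >= s || n < e);
}

var atNight = inWindow("23:00", "06:00", "02:30"); // true
var atNoon = inWindow("23:00", "06:00", "12:00");  // false
```

A check like this can help confirm that a planned window covers the hours you intend before you write it to the settings collection.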

Remove a Balancing Window Schedule – If you have set the balancing window and wish to remove the schedule so that the balancer is always running, issue the following sequence of operations:

use config

db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true } })

Disable the Balancer – By default the balancer may run at any time and only moves chunks as needed. To disable the balancer for a short period of time and prevent all migration, use the following procedure:

Connect to any mongos in the cluster using the mongo shell.
Issue the following operation to disable the balancer: sh.setBalancerState(false)

If a migration is in progress, the system will complete the in-progress migration before stopping.

To verify that the balancer has stopped, issue the following command, which returns false if the balancer is stopped:

sh.getBalancerState()

Optionally, to verify no migrations are in progress after disabling, issue the following operation in the mongo shell:

use config

while( sh.isBalancerRunning() ) {

print("waiting...");

sleep(1000);

}

To disable the balancer from a driver that does not have the sh.setBalancerState() helper, issue the following command from the config database:

db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true )

Enable the Balancer – Use this procedure if you have disabled the balancer and are ready to re-enable it:

Connect to any mongos in the cluster using the mongo shell.
Issue the following operation to enable the balancer: sh.setBalancerState(true)

From a driver that does not have the sh.startBalancer() helper, issue the following from the config database:

db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true )
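The disable and enable updates differ only in the value of the stopped field, and the trailing true in the shell's update() call is the upsert flag. A driver-side sketch of the same update might build its arguments as follows. The helper name balancerSettingsUpdate and the filter/update/options property names are hypothetical, so map them onto whatever your driver's update call expects.

```javascript
// Build the pieces of the config.settings update that enables or disables
// the balancer. "stopped" is the inverse of "enabled"; upsert creates the
// balancer document if it does not exist yet.
function balancerSettingsUpdate(enabled) {
  return {
    filter: { _id: "balancer" },
    update: { $set: { stopped: !enabled } },
    options: { upsert: true }
  };
}

var disable = balancerSettingsUpdate(false);
// disable.update is { $set: { stopped: true } }
var enable = balancerSettingsUpdate(true);
// enable.update is { $set: { stopped: false } }
```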

Disable Balancing During Backups – If MongoDB migrates a chunk during a backup, you can end up with an inconsistent snapshot of your sharded cluster. Never run a backup while the balancer is active. To ensure that the balancer is inactive during your backup operation, do one of the following:

Set the balancing window so that the balancer is inactive during the backup. Ensure that the backup can complete while you have the balancer disabled.
Manually disable the balancer for the duration of the backup procedure.

If you turn the balancer off while it is in the middle of a balancing round, the shutdown is not instantaneous. The balancer completes the chunk move in progress and then ceases all further balancing rounds. Before starting a backup operation, confirm that the balancer is not active. The following expression returns true when the balancer is disabled and no balancing round is in progress:

!sh.getBalancerState() && !sh.isBalancerRunning()

When the backup procedure is complete you can reactivate the balancer process.
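A backup script can wrap the pre-backup check in a function so that it is easy to test. This is a minimal sketch: isSafeToBackup and fakeHelper are hypothetical names, and it assumes both helper methods return booleans as described in this document.

```javascript
// True only when the balancer is disabled and no balancing round is in progress.
// `helper` is any object exposing getBalancerState() and isBalancerRunning(),
// such as the shell's `sh` object.
function isSafeToBackup(helper) {
  return !helper.getBalancerState() && !helper.isBalancerRunning();
}

// Hypothetical stand-in for `sh`, used here so the sketch runs without a cluster.
function fakeHelper(enabled, running) {
  return {
    getBalancerState: function () { return enabled; },
    isBalancerRunning: function () { return running; }
  };
}

var idle = isSafeToBackup(fakeHelper(false, false));         // true
var stillMigrating = isSafeToBackup(fakeHelper(false, true)); // false
```

In the mongo shell, the call would be isSafeToBackup(sh).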

Remove Shards from an Existing Sharded Cluster – To remove a shard you must ensure the shard’s data is migrated to the remaining shards in the cluster. This procedure describes how to safely migrate data and remove a single shard. Do not use this procedure to migrate an entire cluster to new hardware. To migrate an entire cluster to new hardware, migrate individual shards as if they were independent replica sets.

To remove a shard, first connect to one of the cluster’s mongos instances using mongo shell. Then use the sequence of tasks in this document to remove a shard from the cluster.

Ensure the Balancer Process is Enabled – To successfully migrate data from a shard, the balancer process must be enabled. Check the balancer state using the sh.getBalancerState() helper in the mongo shell.

Determine the Name of the Shard to Remove – To determine the name of the shard, connect to a mongos instance with the mongo shell and either:

Use the listShards command, as in the following: db.adminCommand( { listShards: 1 } )
Run either the sh.status() or the db.printShardingStatus() method.

The shards._id field lists the name of each shard.
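To pull just the shard names out of a listShards result, a sketch like the following could be used. It assumes the result is a document with a "shards" array and an "ok" field; shardNames is a hypothetical helper, and the sample hosts are taken from the sh.status() example earlier in this document.

```javascript
// Extract the shard names (the _id of each entry) from a listShards result.
function shardNames(listShardsResult) {
  if (!listShardsResult || listShardsResult.ok !== 1) {
    throw new Error("listShards did not succeed");
  }
  return listShardsResult.shards.map(function (shard) { return shard._id; });
}

// Assumed result shape, for illustration only.
var result = {
  shards: [
    { _id: "shard0000", host: "m0.example.net:30001" },
    { _id: "shard0001", host: "m3.example2.net:50000" }
  ],
  ok: 1
};

var shardIds = shardNames(result); // ["shard0000", "shard0001"]
```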

Remove Chunks from the Shard – From the admin database, run the removeShard command. This begins “draining” chunks from the shard you are removing to other shards in the cluster. For example, for a shard named mongodb0, run:

use admin

db.runCommand( { removeShard: "mongodb0" } )

This operation returns immediately, with the following response:

{

"msg" : "draining started successfully",

"state" : "started",

"shard" : "mongodb0",

“ok” : 1

}

Depending on your network capacity and the amount of data, this operation can take from a few minutes to several days to complete.

Check the Status of the Migration – To check the progress of the migration at any stage in the process, run removeShard from the admin database again. For example, for a shard named mongodb0, run:

use admin

db.runCommand( { removeShard: "mongodb0" } )

The command returns output similar to the following:

{ "msg" : "draining ongoing",

"state" : "ongoing",

"remaining" : {

"chunks" : 42,

"dbs" : 1

},

"ok" : 1

}

In the output, the remaining document displays the remaining number of chunks that MongoDB must migrate to other shards and the number of MongoDB databases that have “primary” status on this shard. Continue checking the status of the removeShard command until the number of chunks remaining is 0. Always run the command on the admin database. If you are on a database other than admin, you can use sh._adminCommand to run the command on admin.
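When polling removeShard from a script, it can help to reduce each response to the few values that matter. The following sketch assumes responses shaped like the examples in this document; drainingStatus is a hypothetical helper, and a response without a remaining document is treated as having unknown counts.

```javascript
// Summarize a removeShard response.
function drainingStatus(response) {
  var remaining = response.remaining;
  return {
    state: response.state,
    // null means the response carried no "remaining" document.
    chunksLeft: remaining ? remaining.chunks : null,
    // Databases that still have "primary" status on this shard.
    databasesLeft: remaining ? remaining.dbs : null,
    chunksDrained: response.state === "completed" ||
      (Boolean(remaining) && remaining.chunks === 0)
  };
}

var status = drainingStatus({
  msg: "draining ongoing",
  state: "ongoing",
  remaining: { chunks: 42, dbs: 1 },
  ok: 1
});
// status.chunksLeft is 42, status.databasesLeft is 1, status.chunksDrained is false
```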

Move Unsharded Data – If the shard is the primary shard for one or more databases in the cluster, then the shard will have unsharded data. If the shard is not the primary shard for any databases, skip to the next task, Finalize the Migration. In a cluster, a database with unsharded collections stores those collections only on a single shard. That shard becomes the primary shard for that database. (Different databases in a cluster can have different primary shards.) Do not perform this procedure until you have finished draining the shard.

To determine if the shard you are removing is the primary shard for any of the cluster’s databases, issue one of the following methods:

sh.status()

db.printShardingStatus()

In the resulting document, the databases field lists each database and its primary shard. For example, the following databases entry shows that the products database uses mongodb0 as the primary shard:

{ "_id" : "products", "partitioned" : true, "primary" : "mongodb0" }
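If the cluster has many databases, the entries can be filtered by primary shard rather than read by eye. This is a sketch under the assumption that the entries are available as an array of documents like the one above; databasesWithPrimary is a hypothetical helper, and the inventory entry is invented for illustration.

```javascript
// Return the names of databases whose primary shard is `shardName`.
function databasesWithPrimary(docs, shardName) {
  return docs
    .filter(function (doc) { return doc.primary === shardName; })
    .map(function (doc) { return doc._id; });
}

var entries = [
  { _id: "products", partitioned: true, primary: "mongodb0" },
  { _id: "inventory", partitioned: false, primary: "mongodb1" } // hypothetical entry
];

var toMove = databasesWithPrimary(entries, "mongodb0"); // ["products"]
```

Each name returned is a database whose unsharded data must be moved before the shard can be removed.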

To move a database to another shard, use the movePrimary command. For example, to migrate all remaining unsharded data from mongodb0 to mongodb1, issue the following command:

db.runCommand( { movePrimary: "products", to: "mongodb1" })

This command does not return until MongoDB completes moving all data, which may take a long time. The response from this command will resemble the following:

{ "primary" : "mongodb1", "ok" : 1 }

Finalize the Migration – To clean up all metadata information and finalize the removal, run removeShard again. For example, for a shard named mongodb0, run:

use admin

db.runCommand( { removeShard: "mongodb0" } )

A success message appears at completion:

{ "msg" : "removeshard completed successfully",

"state" : "completed",

"shard" : "mongodb0",

“ok” : 1

}

Once the value of the state field is “completed”, you may safely stop the processes comprising the mongodb0 shard.