Tag-aware sharding allows DBAs to optimize the performance of their MongoDB databases by helping the balancer organize shards so that a collection’s data can be accessed quickly. You can apply tags based on how frequently the data is accessed, the physical location of the users or data center, and the amount of system memory the shard requires, among other data characteristics.
The performance of a sharded MongoDB cluster is all about balance: each shard in the cluster should hold the right number of chunks, and each chunk should be composed of related data. One of the best ways to achieve the right balance of chunks in a shard, and shards in a cluster, is by using tag-aware sharding.
The basics of tag-aware sharding are presented in a GitHub text file. When you run the balancer on a sharded collection, it migrates the collection’s chunks to the shard associated with a tag whose shard key range has an upper bound greater than the chunk’s lower bound. Chunks that violate the configured tag are moved to the appropriate shard.
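The rule above can be modeled in a few lines of JavaScript. This is a simplified sketch, not the balancer's actual implementation: it assumes a single numeric shard key and half-open tag ranges, and the function names (tagForChunk, shardsForChunk, violatesTag), tags, and shard names are all hypothetical.

```javascript
// Simplified model of tag-aware balancing on a single numeric shard key.
// All names and sample data below are hypothetical.

// Tag ranges are treated as half-open: min is inclusive, max is exclusive.
const tagRanges = [
  { tag: "archive", min: 0, max: 1000 },
  { tag: "recent", min: 1000, max: 5000 },
];

// Which tags each shard carries.
const shardTags = {
  shard0000: ["archive"],
  shard0001: ["recent"],
  shard0002: [],
};

// Return the tag whose range covers the chunk's lower bound, or null.
function tagForChunk(chunk, ranges) {
  const hit = ranges.find((r) => r.min <= chunk.min && chunk.min < r.max);
  return hit ? hit.tag : null;
}

// Return the shards a chunk may live on.
// Assumption: a chunk outside every tag range may live on any shard.
function shardsForChunk(chunk, ranges, tagsByShard) {
  const tag = tagForChunk(chunk, ranges);
  const shards = Object.keys(tagsByShard);
  if (tag === null) return shards;
  return shards.filter((s) => tagsByShard[s].includes(tag));
}

// A chunk violates its tag when its current shard is not an allowed one.
function violatesTag(chunk, ranges, tagsByShard) {
  return !shardsForChunk(chunk, ranges, tagsByShard).includes(chunk.shard);
}

// Example: a chunk in the "recent" range that sits on the archive shard.
const misplaced = { min: 1200, max: 1800, shard: "shard0000" };
console.log(tagForChunk(misplaced, tagRanges));
console.log(violatesTag(misplaced, tagRanges, shardTags));
```

In this model, a chunk flagged by violatesTag is one the balancer would be expected to move to a shard returned by shardsForChunk.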
In the real world, this relatively straightforward process can get complicated very quickly. The folks behind the Bugsnag web-monitoring tool found this out soon after applying tag-aware sharding to their MongoDB sharded cluster. Simon Maynard explains in an October 7, 2014, blog post that the company added tags for each of its sharded collections to address slow responses by its unsharded collections when the primary shard was getting a lot of hits.
Tags were applied only to Bugsnag’s large shards, which were used to store crashes; users’ collections were stored on a smaller machine with sufficient memory to hold the entire dataset. When old data was deleted, it left the shards out of balance because the balancing algorithm ignores the size of each chunk when it moves chunks across shards. While MongoDB 2.6 added a command that merges empty chunks with their neighbors, the process is manual. Maynard wrote a script to automate the process.
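To give a feel for what automating the merge step might involve, here is a minimal sketch. It is not Maynard's script. It assumes you have already collected each chunk's bounds, shard, and document count into an array sorted by lower bound, and that pairing each empty chunk with one contiguous neighbor on the same shard is acceptable. The planMerges helper, the app.events namespace, and the ts shard key are hypothetical; mergeChunks is the MongoDB 2.6 admin command the post refers to.

```javascript
// Plan mergeChunks commands for empty chunks (hypothetical helper).
// Assumes `chunks` is sorted by lower bound and that each entry carries a
// document count gathered beforehand.
function planMerges(namespace, key, chunks) {
  const commands = [];
  let i = 0;
  while (i + 1 < chunks.length) {
    const a = chunks[i];
    const b = chunks[i + 1];
    // Only neighbors on the same shard with touching ranges can be merged.
    const contiguous = a.max === b.min && a.shard === b.shard;
    // Conservative rule for this sketch: at least one of the pair is empty.
    const oneIsEmpty = a.count === 0 || b.count === 0;
    if (contiguous && oneIsEmpty) {
      commands.push({
        mergeChunks: namespace,
        bounds: [{ [key]: a.min }, { [key]: b.max }],
      });
      i += 2; // both chunks are consumed by this merge
    } else {
      i += 1;
    }
  }
  return commands;
}

// Made-up sample data: two shards, three empty chunks.
const sampleChunks = [
  { min: 0, max: 100, shard: "shard0000", count: 0 },
  { min: 100, max: 200, shard: "shard0000", count: 50 },
  { min: 200, max: 300, shard: "shard0000", count: 40 },
  { min: 300, max: 400, shard: "shard0001", count: 0 },
  { min: 400, max: 500, shard: "shard0001", count: 0 },
];

// Each planned command could then be passed to db.adminCommand() on a mongos.
console.log(JSON.stringify(planMerges("app.events", "ts", sampleChunks), null, 2));
```

Planning the merges separately from running them makes it easy to review the list before touching the cluster.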
Maynard also wrote a script that resizes chunks that have become too large, and he explains how Bugsnag was able to optimize storage by eliminating orphan documents, removing chunks that were no longer necessary, and using shell commands to monitor shard distribution: db.collection.getShardDistribution(), db.stats(), and sh.status().
Tag-aware use cases: Archives, shard by location, shard to a specific server
The power and versatility of tag-aware sharding are highlighted in a November 5, 2014, post on the MongoDB blog by Francesca Krihely. For example, much of an organization’s data is rarely accessed, so storing that data on high-performance hardware is wasteful. You can use tag-aware sharding to assign tags to various storage tiers, apply a unique shard key range, and have the documents moved to the appropriate shard during balancing.
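As a rough illustration of the storage-tier case, the sketch below tags two shards and assigns each tier a shard key range using the shell helpers sh.addShardTag() and sh.addTagRange(). The shard names, tag names, created shard key, and the applyTierTags wrapper are assumptions made for the example. The sh helper is passed in as an argument so the function can be tried against a stub before it is run on a real cluster.

```javascript
// Tag shards by storage tier and pin a shard key range to each tier.
// `sh` is the mongo shell's sharding helper. Shard names, tag names, and
// the `created` shard key are hypothetical.
function applyTierTags(sh, namespace, lower, cutoff, upper) {
  // Associate each shard with a tier.
  sh.addShardTag("shard0000", "archive");
  sh.addShardTag("shard0001", "recent");

  // Tag ranges include the lower bound and exclude the upper bound,
  // so the two ranges meet at `cutoff` without overlapping.
  sh.addTagRange(namespace, { created: lower }, { created: cutoff }, "archive");
  sh.addTagRange(namespace, { created: cutoff }, { created: upper }, "recent");
}

// In the mongo shell, connected to a mongos, a call might look like:
//   applyTierTags(sh, "app.events", MinKey, ISODate("2014-01-01"), MaxKey);
```

Once the ranges are in place, the balancer moves chunks to the matching tier on its next rounds. Shifting the cutoff later means removing the old ranges and adding new ones.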
Similarly, it isn’t uncommon to want to store user information at a specific data-center location. The MongoDB Manual includes a tutorial on data-center awareness, but there are a few caveats that apply. For instance, since you can’t change the value of a shard key, you’ll have to delete and then reinsert the document for any user who changes location.
One of the most useful applications of tag-aware sharding is memory optimization. Collections with heavy indexing can be tagged to shards on a physical server that has sufficient RAM to accommodate them.
One of the most efficient ways to monitor and optimize heterogeneous MySQL, MongoDB, Redis, and ElasticSearch databases is via the new Morpheus Virtual Appliance, which seamlessly provisions and manages SQL, NoSQL, and in-memory databases across public, private, and hybrid clouds. Morpheus lets you bring up a new instance of a database in just seconds via a point-and-click interface.
A free full replica set is provisioned for each database instance, and your MySQL and Redis databases are backed up. Morpheus supports a range of database tools for connecting to, configuring, and managing your databases. Visit the Morpheus site to create a free account.