How to Store Large Lists in MongoDB

By: Morpheus Data

TL;DR: When storing large lists in MongoDB, a common first thought is to place the items into an array. In most situations, arrays take up less disk space than objects with keys and values. However, the way MongoDB handles constantly growing arrays can cause performance problems over time.

Data Storage Decisions

When storing data, you may need to keep a record that grows regularly as new items arrive. For example, you may store every post a user makes to an online forum. One way to do this would be to include an array in the user's document that holds the content of each post, as in the following example:

 

An example MongoDB document with an array.
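
Such a document could look roughly like the following Python dictionary. This is only a sketch: the field names (`username`, `posts`) and the sample values are assumptions for illustration, and the final line merely imitates what an update that appends one element (such as MongoDB's `$push` operator) would do.

```python
# Hypothetical forum-user document; all field names are illustrative assumptions.
user_doc = {
    "_id": 1,
    "username": "alice",
    # Every new post is appended here, so this array keeps growing.
    "posts": [
        "First post!",
        "A follow-up question",
        "Thanks for the help",
    ],
}

# Imitates the effect of appending one element to the array (e.g. via $push).
user_doc["posts"].append("One more post")
```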

In most cases, this would seem like an excellent way of storing the data. In programming, arrays are often a very efficient means of storing related values—they tend to be lightning fast for both data storage and retrieval.

How MongoDB Handles Growing Arrays

In MongoDB, arrays work a little differently than they do in a programming language. There are several key points to consider when using arrays for storage in MongoDB:

  • Expansion: An array that grows frequently also increases the size of its containing document. When the document no longer fits where it is stored, it is moved on disk rather than rewritten in place. This kind of move tends to be slow, because every index that points to the document must also be updated.
  • Indexing: If an array field in MongoDB is indexed, then a single document within the collection is responsible for a distinct entry in that index for every element in the array. Inserting or deleting a document with an indexed array therefore requires about as much indexing work as inserting or deleting as many documents as the array has elements, which is a lot of additional work for the database.
  • BSON Format: Finding one or more elements at the end of a large array can take quite a long time, because the BSON data format uses a linear memory scan to manipulate documents.
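
The indexing and BSON points can be made concrete with a small simulation. This is a simplified model in plain Python, not MongoDB's actual implementation: `multikey_index_entries` and `bson_style_lookup` are hypothetical helpers, and the `tags` field is an assumed example. The lookup relies on the fact that BSON stores an array as a sequence of key/value pairs with the keys "0", "1", "2", and so on, so a reader has to step through the elements in order.

```python
def multikey_index_entries(doc, field):
    """Simplified model: one (value, _id) index entry per distinct array element."""
    return sorted({(value, doc["_id"]) for value in doc[field]})


def bson_style_lookup(elements, position):
    """Walk the (key, value) pairs in order and count the elements visited."""
    steps = 0
    for key, value in elements:
        steps += 1
        if key == str(position):
            return value, steps
    raise IndexError(position)


doc = {"_id": 1, "tags": ["a", "b", "c", "d"]}

# A single document contributes four entries to an index on "tags".
entries = multikey_index_entries(doc, "tags")

# The array as BSON lays it out: pairs keyed "0", "1", "2", "3".
elements = [(str(i), v) for i, v in enumerate(doc["tags"])]

# Reaching the last element means visiting every element before it.
value, steps = bson_style_lookup(elements, 3)
```

In this model, both the number of index entries and the number of steps needed to reach the last element grow linearly with the length of the array.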

Addressing the Array Issues

One suggestion for alleviating these performance issues is to model the data differently, so that you do not simply have a single, ever-growing array. For example, you can use nested subdocuments for data storage, as the following example shows:

 

Nested subdocuments used for data storage. Source: MongoSoup
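
One possible shape for such a model is sketched below in plain Python. The bucketing scheme, the `to_nested` helper, and the `bucket_size` parameter are assumptions made for illustration; they are not necessarily the layout shown in the MongoSoup figure.

```python
def to_nested(posts, bucket_size=100):
    """Group a flat list into subdocuments keyed by bucket, then by slot."""
    nested = {}
    for i, post in enumerate(posts):
        bucket = str(i // bucket_size)  # outer key: which bucket
        slot = str(i % bucket_size)     # inner key: position inside the bucket
        nested.setdefault(bucket, {})[slot] = post
    return nested


posts = [f"post {i}" for i in range(250)]
nested = to_nested(posts)

# Post 249 lives in bucket "2", slot "49".
last_post = nested["2"]["49"]
```

With a layout like this, reaching a given post means stepping over a few bucket keys and then through one bucket, rather than through every element of a single long array.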

This method improves performance by dramatically decreasing the amount of MongoDB storage space needed for the data, as shown in the following comparison:

Storage space required for several data models. Option 1: plain array; Option 2: array with documents; Option 3: document with subdocuments. Source: MongoSoup

As you can see, the storage space required for the nested subdocuments (Option 3) was far less than that required for the single array (Option 1).

Where to Get MongoDB

MongoDB is a great database for applications with large amounts of data that needs to be queried quickly. One way to set up a MongoDB database easily is to have it hosted remotely as a cloud service.

Using Morpheus, you can get MongoDB (and several other databases) as a service in the cloud. It runs on a high performance infrastructure with Solid State Drives, and also has an easy setup as well as automatic backups and replication. Why not open a free account today?